performance issue with generating large DFT batch file. - Mirth Community

#1 | 01-06-2012, 04:12 AM
StefanScholte (Join Date: May 2009 | Location: Harderwijk, Netherlands | Posts: 321)
performance issue with generating large DFT batch file

Hi All,

I have the following situation:

I have a JavaScript Reader source connector that queries a database.

The result set usually contains more than 13,000 records. These records are sent to the destination connector in CSV format.

The destination connector builds a DFT batch file from them and sends it via LLP to the receiver.

The channel is up and running, but there is a performance issue.

This is probably because the outbound message first has to be built as XML (tmp[...] = value, etc.).

Is there another way to speed things up? Splitting the records into smaller batches is unfortunately not an option.

I really could use some help here.

With regards
Stefan Scholte
#2 | 01-06-2012, 07:39 AM
panickc (OBX.3 Kenobi | Join Date: Dec 2007 | Posts: 127)

Turn off 'Store Message Data'?

-cp
#3 | 01-06-2012, 11:17 AM
StefanScholte

I already did that.

The bottleneck lies in the parsing of the outbound DFT message.
#4 | 01-06-2012, 11:33 AM
dans (Mirth Employee | Join Date: Apr 2007 | Location: Irvine, CA | Posts: 590)

My guess is that the issue is in parsing the HL7 message to/from XML. With 13,000 segments, that can take quite a while.

One thing you could try is to build up the header segments in tmp, and build up all the repeating segments in a string that you put in the channel map (let's call it additionalSegments). On the LLP destination you would then put something like this in the Template:
${message.encodedData}
${additionalSegments}
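A rough sketch of that approach, in plain JavaScript so it runs outside Mirth: the buildAdditionalSegments helper and the FT1 field layout are made-up illustrations, and a plain Map stands in for Mirth's channel map (inside a transformer you would call channelMap.put instead).

```javascript
// Build all repeating segments in one string instead of appending them
// to the outbound XML (tmp) one field at a time.
function buildAdditionalSegments(rows) {
  var parts = [];
  for (var i = 0; i < rows.length; i++) {
    var f = rows[i].split(',');
    // One FT1 segment per CSV row; joined into a single string once,
    // rather than re-serializing the whole outbound message per row.
    parts.push('FT1|' + (i + 1) + '|' + f[0] + '|' + f[1] + '\r');
  }
  return parts.join('');
}

var channelMap = new Map(); // stand-in for Mirth's channel map
channelMap.set('additionalSegments',
  buildAdditionalSegments(['123,50.00', '456,75.00']));
```

The LLP destination's Template then emits ${message.encodedData} (the header segments built in tmp) followed by ${additionalSegments}, so the repeating segments never pass through the XML serializer at all.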
__________________
Daniel Svanstedt
Software Engineer
Mirth Corporation

#5 | 01-09-2012, 01:21 AM
StefanScholte

Thanks Daniel,

I will give it a try and let you know whether it works.
#6 | 01-09-2012, 06:32 AM
StefanScholte

Well Daniel,

It worked, but somehow the channel is leaking memory. I have already cleared all the channel map variables, and I only use StringBuilders.

Are there other options to reclaim the used memory?
#7 | 10-18-2012, 08:56 AM
tcannon (What's HL7? | Join Date: Oct 2009 | Posts: 3)
performance issues

I too am having performance issues with large CSV files. I have written two test channels to demonstrate this. The first channel reads a CSV off the filesystem and sends it to the second channel. The second channel converts the received CSV to Mirth's native XML and iterates over the XML to build up a JSON representation.

I have found that a 40-column by 6,000-row CSV file processes in under 2 seconds; however, a 40-column by 60,000-row CSV file takes over 24 hours. I believe the entire XML document is being read into memory and acted on, but even that should not account for such a drastic, seemingly exponential increase in processing time. The 60,000-row CSV is only 30 MB in size, and I have 2 GB of RAM allocated to Mirth.
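For what it's worth, one hypothetical contributor to that kind of superlinear blow-up (whether it applies to this channel depends on the transformer code) is building the output with repeated string concatenation: each `out += piece` can copy the entire accumulated string, so 10x the rows can cost roughly 100x the copying in older Rhino engines such as the one embedded in Mirth 2.x. A minimal plain-JavaScript illustration with made-up row data:

```javascript
// Two ways to build the same JSON array from CSV rows. The += version
// can do quadratic total work (the accumulated string is recopied on
// each iteration); the join version does a single final copy.
function buildJsonConcat(rows) {
  var out = '[';
  for (var i = 0; i < rows.length; i++) {
    out += (i > 0 ? ',' : '') + JSON.stringify({ row: rows[i] });
  }
  return out + ']';
}

function buildJsonJoin(rows) {
  var parts = [];
  for (var i = 0; i < rows.length; i++) {
    parts.push(JSON.stringify({ row: rows[i] }));
  }
  return '[' + parts.join(',') + ']';
}
```

Both produce identical output; only the amount of intermediate copying differs.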

I considered opening a new thread for my issue, but I believe this thread describes a very similar problem, even though the OP's output is DFT rather than JSON.

I have attached a zip containing both exported channels and the sample.csv file. Any help in diagnosing and/or solving this issue will be greatly appreciated.

TC
Attached Files
File Type: zip sample_files.zip (5.45 MB, 37 views)
#8 | 10-18-2012, 10:26 AM
narupley (Mirth Employee | Join Date: Oct 2010 | Posts: 7,123)

The first channel, "tc 0 CSV Inbound"... is that just a testing channel, or is it actually being used? In that channel you're multiplying your memory usage several times over by loading the file into a JavaScript variable, then placing it in the global channel map, and so on. Also, you're not returning anything from that JavaScript Reader, so in 2.1.1 it will return an Undefined instance (which gets serialized into a memory address, like "org.mozilla.javascript.Undefined@20724356"). In 2.2.1 you'll encounter an NPE if you return undefined.

Anyway, the first channel notwithstanding: the second channel is reading the entire file into memory. Multiple times, in fact, since each connector holds its own raw/transformed/encoded copy of the data.

Unfortunately, batching for large files is currently a bit deficient, but it will be MUCH improved in 3.0. In the meantime, I would suggest using a JavaScript Reader to create a BufferedReader and place it in the global channel map (similar to what your first channel is doing). Then just return something like "dummy" (or whatever you want), simply to allow the channel to process a message. From there, you can use your destination transformer to step through the input stream and convert it to JSON as you go. That way you're not loading so much into memory at one time.
__________________
Step 1: JAVA CACHE...DID YOU CLEAR ...wait, ding dong the witch is dead?

Nicholas Rupley
Work: 949-237-6069
Always include what Mirth Connect version you're working with. Also include (if applicable) the code you're using and full stacktraces for errors (use CODE tags). Posting your entire channel is helpful as well; make sure to scrub any PHI/passwords first.


- How do I foo?
- You just bar.
#9 | 10-18-2012, 10:46 AM
upstart33 (Mirth Guru | Join Date: Dec 2010 | Location: Chicago, IL | Posts: 459)

I also have trouble sometimes when reading large XML files, parsing them, and writing them to a database. I get Java heap memory alerts from Mirth, the GUI freezes, and the only way to get everything back to normal is to restart the service, which is a pain.
#10 | 10-18-2012, 10:53 AM
tcannon

Quote:
Originally Posted by narupley
...I would suggest using a JavaScript Reader to create a BufferedReader and placing it in the global channel map... you can use your destination transformer to step through the input stream and convert it to JSON as you go...
The second channel is what is causing the delay. Glad to read that the efficiency of processing large batch files is being addressed in 3.0. I like your idea of placing a reader in the global channel map and converting the stream in real time. I'll kick that around a bit.

Thanks,

TC
Tags
dft, performance



Powered by vBulletin® Version 3.8.7
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Mirth Corporation