Reading very large EDI X12 Files

Dec 19, 2014 at 7:54 PM
Hello,

I was wondering if there is a common way, or an example, of using EDI Fabric to deserialize extremely large EDI X12 files (e.g. > 1 GB) one Message object at a time, without needing to load the entire file into memory?

Thanks,
Tim M.
Coordinator
Jan 16, 2015 at 11:07 AM
There isn't one, unfortunately. There are many improvement points around performance and the handling of large messages, which may be addressed in the future.
You are mentioning a VERY large file; I presume it's an aggregated group of many messages. How do you normally go about such files? Do you need all of the transactions loaded into memory at the same time?
It's interesting to see what the exact requirement is. The parser currently builds an XML document for the full interchange (all grouped transactions). I can think of situations where this is not needed, and the parser could instead debatch the envelope and produce a separate XML document for each transaction sequentially.
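Purely as a sketch of the idea, nothing like this exists in the parser today: a debatcher could stream the file and hand out one ST..SE set at a time. The '~' segment terminator and '*' element separator below are assumptions; real files declare them in the ISA segment.

// Sketch only: lazily yield each ST..SE transaction set from a large file,
// so that only one set is ever held in memory at a time.
using System.Collections.Generic;
using System.IO;
using System.Text;

static class Debatcher
{
    public static IEnumerable<string> TransactionSets(TextReader reader)
    {
        var segment = new StringBuilder();
        StringBuilder set = null;
        int c;
        while ((c = reader.Read()) != -1)
        {
            if ((char)c != '~') { segment.Append((char)c); continue; }

            var seg = segment.ToString().Trim();
            segment.Clear();

            if (seg.StartsWith("ST*")) set = new StringBuilder();
            if (set != null) set.Append(seg).Append('~');
            if (seg.StartsWith("SE*") && set != null)
            {
                yield return set.ToString();
                set = null;
            }
        }
    }
}

Each yielded set could then be wrapped in a copy of the original ISA/GS envelope and parsed on its own.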

Don
Feb 4, 2015 at 4:22 PM
Hello Don,

I am working with aggregated EDI X12 837, 834, 274, and 271 files. I have yet to see a 1 GB file; however, I have been told to expect them in the future. The largest file I've seen so far was about 72 MB and contained around 182,000 transactions/messages. I've been informed the maximum number of transactions per file is 9,999,999.

There is typically one interchange per file (ISA - IEA), with one group (GS - GE) containing tens of thousands of transactions/messages (ST - SE) in the envelope.
So something like:
ISA
GS
    ST
        SET CONTENT
    SE
    .
    .
    .
    ST
        SET CONTENT
    SE
GE
IEA


I have been using the library like so:

// Reads the whole file from stdin into a string, then again into a byte
// array; the parser then builds the full interchange on top of that.
var x12str = Console.In.ReadToEnd();
byte[] strAsBytes = new System.Text.UTF8Encoding().GetBytes(x12str);
var interchange = INTERCHANGE.LoadFrom(new System.IO.MemoryStream(strAsBytes));
foreach (var group in interchange.GROUPS)
{
    foreach (var message in group.MESSAGES)
    {
        var typedMessage = message.DeserializeItem<M_837>();
        .
        .
        .
    }
}

However, it loads the entire contents of the file into memory (and in the case of the 72 MB file it uses a ton of memory, making multiple copies of the data), when really I only need to process one transaction at a time to produce a few flat delimited lines from it. So I have written a preprocessor that splits the file into more manageable chunks of ~4 MB or 10,000 transactions per file, whichever comes first.
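In spirit the preprocessor does something like this (a simplified sketch, not the actual code; it assumes '~'/'*' delimiters, a single ISA/GS envelope, and it rebuilds the GE/IEA trailers naively without proper control-number handling):

// Simplified sketch of a splitting preprocessor: copies the ISA and GS
// headers onto each output chunk and closes the chunk with GE/IEA after
// a fixed number of transaction sets.
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

class EdiSplitter
{
    const int SetsPerChunk = 10000; // split threshold

    static void Main(string[] args)
    {
        string isa = null, gs = null;
        StreamWriter writer = null;
        int setsInChunk = 0, chunkNo = 0;

        foreach (var seg in ReadSegments(args[0]))
        {
            if (seg.StartsWith("ISA*")) { isa = seg; continue; }
            if (seg.StartsWith("GS*"))  { gs = seg; continue; }
            if (seg.StartsWith("GE*") || seg.StartsWith("IEA*")) continue;

            if (writer == null)
            {
                writer = new StreamWriter("chunk" + (++chunkNo) + ".x12");
                writer.Write(isa + "~" + gs + "~");
            }
            writer.Write(seg + "~");

            // SE closes a transaction set; start a new chunk once full
            if (seg.StartsWith("SE*") && ++setsInChunk >= SetsPerChunk)
            {
                CloseChunk(writer, setsInChunk);
                writer = null;
                setsInChunk = 0;
            }
        }
        if (writer != null) CloseChunk(writer, setsInChunk);
    }

    static void CloseChunk(StreamWriter writer, int sets)
    {
        // Naive trailers: real control numbers/counts must match the headers
        writer.Write("GE*" + sets + "*1~IEA*1*000000001~");
        writer.Dispose();
    }

    static IEnumerable<string> ReadSegments(string path)
    {
        using (var reader = new StreamReader(path))
        {
            var sb = new StringBuilder();
            int c;
            while ((c = reader.Read()) != -1)
            {
                if ((char)c == '~')
                {
                    var seg = sb.ToString().Trim();
                    sb.Clear();
                    if (seg.Length > 0) yield return seg;
                }
                else sb.Append((char)c);
            }
        }
    }
}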

Tim
Coordinator
Feb 5, 2015 at 2:33 PM
Thanks for the details Tim, fair point. I'll add this to the new features list and will probably implement it as:

group.MESSAGES.GetNextMessage()

which will process them one by one.
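Usage would look roughly like this (hypothetical, of course; returning null at the end of the group is an assumption):

// Hypothetical usage of the proposed API; none of this exists yet:
var message = group.MESSAGES.GetNextMessage();
while (message != null)
{
    var typedMessage = message.DeserializeItem<M_837>();
    // process typedMessage, then discard it before fetching the next one
    message = group.MESSAGES.GetNextMessage();
}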

Cheers,
Don