
TxtSplit: A Text File Parsing Utility

Click here to download this program.

TXTSPLIT is a tool for chopping apart big text files on specific lines. It was designed with a very specific application in mind: extracting individual messages from a single, big, globby file.

Where would you get such a thing? Why would you ever have a "single, big globby file" which contained lots of separate messages? I can think of two examples: email programs, and command-line newsreaders.

If you don't know what a command-line newsreader is, chances are you'll never need to know. The basic use of this program is the same regardless of the source of the text file, however, so I'm going to discuss its use in conjunction with email.

Suppose you subscribe to a mailing list. Every day, along with your daily dose of Spam, you receive about a dozen messages from the mailing list. You want to keep them for future reference, but you don't want to keep them in your mailbox. No problem... you can just save them. But it's very time-consuming to save them one by one: you have to name each file separately, and you have to remember to save them all in the same place. Moreover, let's suppose that you're like most people in this Windows-dominated world. You're probably using Outlook to read your mail.

Wouldn't it be nice if you could just select five hundred messages and save them all at once? With Outlook, you actually can do that. Ah, but what does that give you? Hint: the correct answer contains the words "big" and "globby." (OK, OK, so "globby" is not an actual word, but you get the idea: it means fat and sloppy.)

So now you have this big, giant, fat text file, and it contains ALL of your saved messages, crammed into one colossal document. (This, by the way, is basically what you'll get if you use an old-style shell program such as Pine to read email, or if you use a newsreader like Tin. But I digress. Oh... and another thing... I'm not going to talk about email attachments. That's a whole 'nother problem.)

What do you do with this file? After all, the reason you're saving these messages in the first place is that they might contain information that would be useful to you. It isn't very convenient to read through the same fifty-page document every time you want to find a little info. You want to leverage the power of your computer: let the computer find things for you. If you use your computer's "Find" or "Search" feature to look for text inside files, you'll find what you want fast, but only if each message sits in its own file. Otherwise, every search just points you back to the same giant document. So you'll want to break that big file apart again. That's where TXTSPLIT comes in.

If you open the big text file you got when you saved everything from Outlook, you'll see that the message headers are retained. They are also consistent, in that each message has the same sequence of headers.

TXTSPLIT allows you to specify text strings to identify the message headers. It will then save each individual message into a separate file, using a filename prefix that you've defined, and appending a number to the prefix to create a unique filename automatically.
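To make the idea concrete, here's a rough sketch of that splitting logic in Python. It is not the actual TXTSPLIT code, which is QuickBASIC. It also assumes a few things this page doesn't spell out:

- A new message starts when a line beginning with the first header text is immediately followed by a line beginning with the second.
- Output files are named prefix plus a number plus ".txt", counting from 1.

The function names `split_messages` and `write_messages` are made up for this example.

```python
from pathlib import Path


def split_messages(lines, header1, header2):
    """Group lines into messages.

    Assumed rule: a message starts at a line beginning with header1 that is
    immediately followed by a line beginning with header2. Lines before the
    first such match are dropped.
    """
    messages = []
    current = None
    for i, line in enumerate(lines):
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        if line.startswith(header1) and next_line.startswith(header2):
            current = []              # a header pair found: start a new message
            messages.append(current)
        if current is not None:       # skip any text before the first match
            current.append(line)
    return messages


def write_messages(messages, out_dir, prefix):
    """Save each message as <prefix><n>.txt in out_dir (hypothetical naming)."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    paths = []
    for n, message in enumerate(messages, start=1):
        path = folder / f"{prefix}{n}.txt"
        path.write_text("".join(message))      # lines keep their own newlines
        paths.append(path)
    return paths
```

With the default settings, you'd call it along the lines of `write_messages(split_messages(lines, "From:", "Sent:"), "txt_out", "msg")`, where `lines` came from `open("rawfile.txt").readlines()`. Unlike TXTSPLIT, this sketch simply creates the output folder if it doesn't exist yet.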

TXTSPLIT is configured via an .ini file (txtsplit.ini). The .ini file contains five lines. If you open the default .ini file, you'll find that it looks like this:

0
msg
txt_out
From:
Sent:

What do those lines mean?
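The rest of this page suggests how the file is laid out:

- The second line looks like the filename prefix ("msg").
- The third line is the default output directory (txt_out).
- The fourth and fifth lines hold the text that the first and second header lines must begin with.

What the first line ("0") controls isn't explained here, so the Python sketch below just keeps it as-is. The field names and the `parse_ini`/`reset_ini` functions are hypothetical labels for this example, not part of TXTSPLIT. Treating fewer than five lines as an error is also an assumption, based on the "format is incorrect" complaint mentioned further down.

```python
from dataclasses import dataclass
from pathlib import Path

# Default contents, copied from the txtsplit.ini listing above.
DEFAULT_INI = "0\nmsg\ntxt_out\nFrom:\nSent:\n"


@dataclass
class Settings:
    line1: str     # "0" by default; its meaning isn't described on this page
    prefix: str    # assumed: output filename prefix ("msg")
    out_dir: str   # assumed: output directory ("txt_out")
    header1: str   # assumed: start of the first header line ("From:")
    header2: str   # start of the second header line ("Sent:"); "" = one-line match


def parse_ini(text):
    """Parse the .ini text; fewer than five lines is treated as an error."""
    lines = text.splitlines()  # a blank fifth line still counts as a line
    if len(lines) < 5:
        raise ValueError(f"txtsplit.ini needs 5 lines, found {len(lines)}")
    return Settings(*lines[:5])


def reset_ini(path):
    """Rewrite the .ini file with the default contents shown above."""
    Path(path).write_text(DEFAULT_INI)
```

Note that "0\nmsg\ntxt_out\nError\n\n" still parses (its fifth line is empty), while dropping that last line break leaves only four lines.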

So how do we use this program? It's a command-line program, which means that you should be running it from a DOS prompt, or from within a batch file. The syntax is simple:

txtsplit [target_file]

If you do not specify a filename, it will assume that you want to use its default input filename, rawfile.txt.

When the program runs, it will first display its current settings. At this point, you'll have four options: you can type "H" to view the program help, "R" to reset the .ini file to its original defaults, "D" to create the default output directory (txt_out), or "X" to exit without doing anything at all. You'll have 15 seconds to make a choice. If you don't select any of these four options, the program will run and will (presumably) extract your messages from the original file. (The original file, incidentally, will not be altered in any way.)
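If you're curious how a "press a key within 15 seconds, otherwise carry on" prompt can work, here's a minimal Python sketch. It's an illustration, not TXTSPLIT's own logic. It reads the answer on a background thread and stops waiting when the timeout expires.

A few assumptions are built in:

- Python's portable standard library has no single-keypress read, so the user is expected to press Enter after the letter.
- Any answer other than H, R, D, or X is treated as no answer.
- The names `input_file` and `choose_action`, and the action strings, are hypothetical.

```python
import queue
import threading

DEFAULT_INPUT = "rawfile.txt"   # from the page: used when no filename is given
# Hypothetical action names for the four menu keys described above.
MENU = {"H": "help", "R": "reset_ini", "D": "make_output_dir", "X": "exit"}


def input_file(argv):
    """Return the file named on the command line, or rawfile.txt if none."""
    return argv[1] if len(argv) > 1 else DEFAULT_INPUT


def choose_action(read_key=input, timeout=15.0):
    """Wait up to `timeout` seconds for H, R, D or X; otherwise return "split".

    read_key runs on a daemon thread so the main thread can stop waiting
    when the timeout expires. Any other answer counts as "no choice".
    """
    answers = queue.Queue()
    threading.Thread(target=lambda: answers.put(read_key()), daemon=True).start()
    try:
        key = answers.get(timeout=timeout)
    except queue.Empty:
        return "split"  # nobody answered in time: go ahead and split
    return MENU.get(key.strip().upper(), "split")
```

In a real script you'd print the settings and then call `choose_action()`. For testing, you can pass a fake reader: `choose_action(lambda: "x", timeout=1)` returns `"exit"`.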

What if you want to split a file using a one-line match? You can do that by modifying txtsplit.ini. Suppose you had a big file that you wanted to split apart on every occurrence of a line beginning with the word "Error." The fifth line of the .ini file contains the text that must match the second header line, so put "Error" on the fourth line (the first header line) and leave the fifth line blank. By setting that line to an empty string, you're telling TXTSPLIT, "Screw the second header line. Just split the thing when you find a match for the first header line." One important point, though: leave the fifth line blank, but don't delete the whole line! You still need a carriage return there, or TXTSPLIT will complain that the format of the .ini file is incorrect.
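Here's the matching rule for that trick as a small Python sketch. It makes two assumptions the page doesn't state: the second header must be on the line right after the first, and "matching" means "the line begins with this text." An empty second header simply switches off the second check. The function name `starts_message` is made up for this example.

```python
def starts_message(lines, i, header1, header2):
    """Return True if lines[i] begins a new message.

    header2 == "" means "ignore the second header line", so a match on
    header1 alone is enough (a one-line split).
    """
    if not lines[i].startswith(header1):
        return False
    if header2 == "":
        return True
    # Assumed: the second header must be on the very next line.
    return i + 1 < len(lines) and lines[i + 1].startswith(header2)
```

For a log with "Error 1", "detail", and "Error 2" lines, the header1 "Error" and an empty header2 mark lines 0 and 2 as message starts.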

There's one potential problem that should be noted: if your text file doesn't begin with the search string, any text before the first matching line will be ignored.
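If you want to know in advance whether anything will be lost, a check like this Python sketch lists the lines that come before the first match. It assumes a new message starts where a line beginning with the first header text is immediately followed by one beginning with the second (an empty second header matches anything). `preamble` is a hypothetical name, not part of TXTSPLIT.

```python
def preamble(lines, header1, header2=""):
    """Return the lines that come before the first message start.

    If nothing matches, every line is returned, since nothing would be split.
    """
    for i, line in enumerate(lines):
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        if line.startswith(header1) and next_line.startswith(header2):
            return lines[:i]
    return list(lines)
```

An empty result means the file starts right at a message header and nothing will be skipped.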

If I really wanted to be thorough, I'd probably explain a lot more about why you might want to save your emails or newsgroup postings this way. However, this is such an obscure, single-purpose tool that I doubt many people will need it. If you *do* need it, you probably already know WHY you need it, and you're probably savvy enough to figure out whether or not this will help you.

One question I've been asked from time to time: is there a limit to the size of the file that can be split? I don't believe there is, beyond the maximum file size allowed by your drive's file system. There is, however, a limit to the maximum length of a line within the file. See "Bugs and Known Flaws," below.

On the off chance that you're sure this is the tool you need, but you can't figure out how to use it because the documentation is so damn skimpy, go ahead and email me. If you put "TXTSPLIT" in the subject, I'll try to help you out.

Also available: a "super-Canadian" version of TXTSPLIT, known as QSPLIT, optimized for use in batch files. QSPLIT differs from TXTSPLIT in that it does not have the fifteen-second delay during which you can choose other actions, such as showing the help screen, rebuilding the .ini file, or creating the default output folder. (You can, however, still do those things with QSPLIT by passing it the command-line parameters /h, /r, or /d.) Click here to download QSPLIT.
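A batch-friendly front end like QSPLIT mostly comes down to checking the first argument for a switch. Here's a hedged Python sketch of that. It assumes the switch comes first, that switches are case-insensitive, and that QSPLIT falls back to rawfile.txt like TXTSPLIT does. The page confirms none of these. `qsplit_action` and the action names are hypothetical.

```python
DEFAULT_INPUT = "rawfile.txt"  # assumed to match TXTSPLIT's default
# The three switches named on the page, mapped to hypothetical action names.
SWITCHES = {"/h": "help", "/r": "reset_ini", "/d": "make_output_dir"}


def qsplit_action(args):
    """Return (action, input_file) for QSPLIT-style arguments, with no prompt.

    args excludes the program name, e.g. sys.argv[1:].
    """
    if args and args[0].lower() in SWITCHES:
        return SWITCHES[args[0].lower()], None
    # No switch: split the named file (or the default) right away.
    return "split", (args[0] if args else DEFAULT_INPUT)
```

Because there's no prompt to wait on, a batch file can call this kind of program in a loop without stalling for 15 seconds per file.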

Bugs and Known Flaws

Elegant error handling? Haaaahaaahaaa!!! This program has no such thing. I wrote this to meet a very specific need, and it was done in a big hurry. I'll let you in on a secret: this is a truly BASIC program. That is, BASIC as in "Microsoft QuickBASIC." (Yes, there was such a thing as a compiler for QuickBasic.) I'm not even going to try to create a comprehensive list of its flaws; I'll just list a few significant limitations, and beyond that, you're on your own.

Version History

Builds 1 - 17 were debugging builds, and were not released. Build 18 is the first release version.

Download

You can download TxtSplit here:

TxtSplit
Program Description: A quick-and-dirty tool for splitting a text file into new files on each occurrence of matching lines of text. This is Build 18.

The zip file contains the following items:

txtsplit.exe - (the executable file)
txtsplit.ini - (configuration file)
txtsplit.txt - (program notes and documentation)
install.bat - (batch file to create the default output directory)
rawfile.txt - (a sample file to demonstrate the use of the program)
txtsplit.bas - (QuickBasic source code)
source - (directory containing source code)
txt_out - (empty directory, used as default output destination)

Program and documentation by Bruce Sharp. Last update: Nov. 2007.