Processing data from a 'foreign' database with mysql, or tools to pre-process data.
I am working on a scheme to bring some data across from an old, but still in use, system so that we can process it for use with our (Linux-based) mailing list software (relax, it's part of our customer base - not yet more spam!).
The data is hosted on a server running an operating system called THEOS and I have no problems pulling over the files by FTP. The file structure is described as 'THEOS ISAM' and as far as I can see from a quick glance, comprises fixed length records with various field delimiters.
I need to either get the data into tab-delimited format or perhaps directly into a mysql database so it can be manipulated. In the past when I have done this in the MS world, I'd use a combination of BASIC and Excel, but since the target app is Linux-based, I'd like to go straight from THEOS -> Linux.
I can hack around in perl, PHP etc. to varying degrees, but wondered if there were any import pre-processing front ends for use with mysql that make it easy to define and test record structure definitions that can then be used as import filters? Any tools worth having a look at before I do some manual data structure dissection?
Hope that make sense!?
This kind of work I do with AWK. For current project I need to convert the output of a huge Paradox export to something which I can import in MySQL.
What the script does:
- Filter records matching certain criteria
- Check values for null values, either generate an error message or replace it by \N if null is allowed
- Concatenate strings
- Replace strings with numbers if they must become enumerated values for my database
- Print each record in a tab delimited format.
The Paradox files are sent to me in a regular fashion, I call awk from a bask script, and loads the result in the database.
It takes you about a day or so to understand awk and crank out the basic code.
If your export is a CSV file with quoted strings, do not try to process this with AWK. Simple cases are fine, but if you find a string like:
|All times are GMT -5. The time now is 04:04 AM.|