How to parse files with variable record length
Hello. Below is a sample of my input file. I would like to extract the room number, last name, first name, invoice number (205880080), arrival date, departure date, and total (229.46). Can you at least give me a hint on how to proceed? I have tried a lot, but I have been stumped from the beginning. Thanks.
------------------------------------------------------------------------
***History***
Room: 124 B Payment: Bell/TRAVELSCAPE.COM
Lastname*FIT*,Firstname 4A, 0K, 0B
Guest Bell *205880080 FT Bell *205880080 July 31, 2010 ____ 00 August 1, 2010 00000
Date Trans Room Debit Credit Balance
Jul31'10ROOM 124 206.75 206.75
Jul31'10TAX 124 21.71 228.46
Jul31'10TID 124 1.00 229.46
Aug 1'10EX 124 229.46 CR 0.00
Account Bell/TRAVELSCAPE.COM
_____________________________________________________________________
Hmm... you can try GNU Awk, using the gensub function with strict regular expressions to extract specific parts of each line. First, however, you have to identify which items appear in every record. In other words, you need to define the format of the input text.
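As a small illustration of the gensub idea, the sketch below pulls the invoice number out of a line by capturing the digits after the asterisk. It needs GNU Awk, since gensub is a gawk extension. The function name invoice_of and the rule that the invoice is the run of digits after an asterisk are assumptions based on the sample, not something the poster confirmed.

```sh
# Hypothetical helper: print the invoice number from each line that has one.
# Assumes the invoice is the run of digits right after an asterisk.
invoice_of() {
    gawk '
    /\*[0-9]+/ {
        # "\\1" is the text captured by the parenthesised group.
        print gensub(/^.*\*([0-9]+).*$/, "\\1", 1)
    }' "$@"
}

# Example: printf "%s\n" "Guest Bell *205880080 FT" | invoice_of
```

Lines without an asterisk followed by digits are skipped, so the function can be fed the whole file.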
For example, I tried to extract the desired information based on these assumptions:
1. The word History is at the start of each section.
2. The first line after History contains the keywords Room: and Payment:.
3. The second line contains Lastname and Firstname, separated by a single, unique comma.
4. The third line contains the invoice, preceded by the payment method (?), a.k.a. Spaceship or Bill in your samples.
5. Arrival and departure dates are in the format Month [D]D, YYYY.
6. The total is on the line above the one that repeats the payment method (?), and CR is a keyword that follows the total amount.
Well... based on my (surely wrong) guess, I can think of something like this:
Code:
#!/usr/bin/awk -f
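A minimal sketch of this approach, under the six assumptions listed above. It is not the script from this reply: it swaps gensub for the portable match(), substr() and sub(), so it should run under any POSIX awk, and it wraps the program in a shell function. The name parse_history, the pipe-separated output, and the line layout of the embedded sample are assumptions made for illustration.

```sh
#!/bin/sh
# Sketch only: the field rules follow the assumptions in the post above.
parse_history() {
    awk '
    # Print the record collected so far, then reset all fields.
    function emit() {
        if (room != "")
            print room, last, first, invoice, arrive, depart, total
        room = last = first = invoice = arrive = depart = total = ""
    }

    BEGIN {
        OFS = "|"
        # Month [D]D, YYYY, written without interval expressions
        datere = "[A-Z][a-z]+ [0-9][0-9]?, [0-9][0-9][0-9][0-9]"
    }

    # A History marker starts a new section.
    /\*\*\*History\*\*\*/ { emit(); next }

    # Room line: keep the text between "Room:" and "Payment:".
    /Room:/ {
        room = $0
        sub(/^.*Room: */, "", room)
        sub(/ *Payment:.*$/, "", room)
        next
    }

    # First line with a comma after the Room line: Lastname,Firstname.
    room != "" && last == "" && index($0, ",") > 0 {
        last = $0
        sub(/,.*$/, "", last)          # text before the first comma
        sub(/\*.*$/, "", last)         # drop a marker such as *FIT*
        gsub(/^ +| +$/, "", last)
        first = $0
        sub(/^[^,]*, */, "", first)    # text after the first comma
        sub(/ .*$/, "", first)         # keep only the first word
        next
    }

    # Invoice: first run of digits that follows an asterisk.
    invoice == "" && match($0, /\*[0-9]+/) {
        invoice = substr($0, RSTART + 1, RLENGTH - 1)
    }

    # Dates: the first match is the arrival, the second the departure.
    {
        rest = $0
        while (depart == "" && match(rest, datere)) {
            d = substr(rest, RSTART, RLENGTH)
            if (arrive == "") arrive = d; else depart = d
            rest = substr(rest, RSTART + RLENGTH)
        }
    }

    # Total: the amount immediately before the CR keyword.
    match($0, /[0-9]+\.[0-9][0-9] +CR/) {
        total = substr($0, RSTART, RLENGTH)
        sub(/ +CR$/, "", total)
    }

    END { emit() }
    ' "$@"
}

# Demo on the sample record (line breaks are assumed).
parse_history <<'EOF'
***History***
Room: 124 B Payment: Bell/TRAVELSCAPE.COM
Lastname*FIT*,Firstname 4A, 0K, 0B
Guest Bell *205880080 FT Bell *205880080 July 31, 2010 ____ 00 August 1, 2010 00000
Date Trans Room Debit Credit Balance
Jul31'10ROOM 124 206.75 206.75
Aug 1'10EX 124 229.46 CR 0.00
Account Bell/TRAVELSCAPE.COM
EOF
```

For the sample this should print 124 B|Lastname|Firstname|205880080|July 31, 2010|August 1, 2010|229.46. Most rules key on patterns rather than line numbers, so a record with extra or missing lines in the transaction table is still handled; only the name rule depends on position (the first comma-bearing line after the Room line).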
Moved: This thread is more suitable in <PROGRAMMING> and has been moved accordingly to help your thread/question get the exposure it deserves.
Like colucix, I feel we need more information, but assuming the format is always the same as shown, the following is an alternative on the same theme:
Code:
#!/usr/bin/awk -f
At last, I got it. Based on all your input, I was able to get a working script. Thanks again to all of you.