Old 08-06-2010, 03:05 PM   #1
Registered: May 2009
Posts: 32

Rep: Reputation: 15
How to parse files with variable record length

Hello. Below is a sample of my input file. I would like to extract the room number, Lastname, Firstname, invoice (205880080), arrival date, departure date, and total (229.46). Can you at least give me a hint on how to proceed? I have tried a lot but I am stumped from the beginning. Thanks.
Room: 124 B Payment: Bell/TRAVELSCAPE.COM
Lastname*FIT*,Firstname 4A, 0K, 0B Guest
Bell *205880080 FT
Bell *205880080 July 31, 2010
____ 00 August 1, 2010
Date Trans Room Debit Credit Balance
Jul31'10ROOM 124 206.75 206.75
Jul31'10TAX 124 21.71 228.46
Jul31'10TID 124 1.00 229.46
Aug 1'10EX 124 229.46 CR 0.00
Attached Files
File Type: txt xtest.txt (894 Bytes, 11 views)
Old 08-06-2010, 03:57 PM   #2
LQ Guru
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1978
Hmm.. you can try GNU Awk, using the gensub function with strict regular expressions to extract specific parts of each line. However, you first have to define which items appear in every record. In other words, it is necessary to define the format of the input text.
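
To make the gensub idea concrete, here is a minimal sketch applied to the Room/Payment line from the first post. It assumes GNU Awk is installed as gawk (gensub is a gawk extension); the function name extract_room_payment and the "|" separator are only illustrative.

```shell
# Sketch only: requires GNU Awk, because gensub is a gawk extension.
# The sample line is copied from the first post.
line='Room: 124 B Payment: Bell/TRAVELSCAPE.COM'

extract_room_payment() {
    printf '%s\n' "$1" | gawk '{
        # "\\1" stands for the first parenthesised group. With no fourth
        # argument gensub works on $0 and returns the result; $0 itself
        # is left unchanged.
        room = gensub(/.*Room: ([0-9]+).*/, "\\1", 1)
        paym = gensub(/.*Payment: ([^\/]*)\/.*/, "\\1", 1)
        print room "|" paym
    }'
}

# Run the demo only when gawk is available.
if command -v gawk >/dev/null 2>&1; then
    extract_room_payment "$line"
fi
```

With gawk available this should print 124|Bell. Anchoring each pattern with .* on both sides makes gensub replace the whole line with just the captured group.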

For example I tried to extract the desired information based on these assumptions:
1. The word History is at the start of each section
2. First line after History contains the keywords Room: and Payment:
3. The second line contains Lastname and Firstname separated by a single and unique comma
4. The third line contains the invoice preceded by the payment method (?) a.k.a. Spaceship or Bill in your samples
5. Arrival and departure dates are in the format Month [D]D, YYYY
6. The total is on the line above the one that repeats the payment method (?), and CR is a keyword following the total amount.
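
Assumptions 5 and 6 can be checked on their own before writing a full script. The following is a minimal sketch in portable awk, which uses match() with RSTART and RLENGTH rather than gawk-only features. The function name extract_dates_total and the date=/total= labels are made up for illustration, and the input lines are copied from the sample in the first post.

```shell
# Sketch only: POSIX awk, so match() sets RSTART and RLENGTH instead of
# filling an array as the three-argument gawk form does.
extract_dates_total() {
    awk '
    # Month [D]D, YYYY, for example "July 31, 2010" or "August 1, 2010".
    match($0, /[A-Z][a-z]+ [0-9][0-9]?, [0-9][0-9][0-9][0-9]/) {
        print "date=" substr($0, RSTART, RLENGTH)
    }
    # The total is the amount immediately before the CR keyword.
    # RLENGTH - 4 drops the trailing " CR " from the match.
    match($0, /[0-9]+\.[0-9][0-9] CR /) {
        print "total=" substr($0, RSTART, RLENGTH - 4)
    }'
}

extract_dates_total <<'EOF'
Bell *205880080 July 31, 2010
____ 00 August 1, 2010
Jul31'10TID 124 1.00 229.46
Aug 1'10EX 124 229.46 CR 0.00
EOF
```

The first date printed would be the arrival and the second the departure. Transaction lines such as Jul31'10TID are skipped because the date pattern requires a space after the month name and a comma after the day.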

Well.. based on my (surely wrong) guess, I can think of something like this:
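#!/usr/bin/awk -f
# Requires GNU Awk (gensub and three-argument match are gawk extensions).
# The getline calls, closing braces and final print are reconstructed on
# the assumption that each item sits on its own line after "History".

/History/ {
   getline   # Room/Payment line
   room = gensub(/.*Room: ([0-9]*).*Payment.*/,"\\1","g")
   paym = gensub(/.*Payment: (.*)\/.*/,"\\1","g")
   getline   # name line
   lastname = gensub(/(.*),.*/,"\\1","g",$1)
   frstname = gensub(/.*,(.*)/,"\\1","g",$1)
   getline   # invoice line
   if ( $1 ~ paym ) sub(/^\*/,"",$2)   # strip the leading asterisk
   invoice = $2
   getline   # arrival line
   match($0,/[JFMASOND][a-z]* [1-3]?[0-9], 20[1-9][0-9]/,arrival)
   getline   # departure line
   match($0,/[JFMASOND][a-z]* [1-3]?[0-9], 20[1-9][0-9]/,departure)
   while ( $0 !~ paym ) {
     if ($0 ~ / CR / ) total = gensub(/.* ([0-9.]*) CR.*/,"\\1","g")
     if ((getline) <= 0) break   # stop at end of input
   }
   print room, lastname, frstname, invoice, arrival[0], departure[0], total
}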
Just to give you an idea. What is your skill in regular expressions, anyway? And in awk?

Last edited by colucix; 08-06-2010 at 04:00 PM.
Old 08-06-2010, 06:24 PM   #3
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 910
Moved: This thread is more suitable in <PROGRAMMING> and has been moved accordingly to help your thread/question get the exposure it deserves.
Old 08-07-2010, 01:58 AM   #4
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,552

Rep: Reputation: 2898
Like colucix I feel we need more information, but assuming the format is always the same as shown, the following is an alternative on the same theme:
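#!/usr/bin/awk -f
# Reconstructed sketch: the RS setting and the split call are assumptions
# inferred from the field numbers used below.

BEGIN { RS = "" }   # assumed: each section is one blank-line-separated record

{
    room=$3" "$4

    split($7, names, ",")   # "Lastname,Firstname" into names[1], names[2]

    arrival=$17" "$18" "$19
    departure=$22" "$23" "$24

    print room,names[1],names[2],$13,arrival,departure,$(NF - 9)
}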
Obviously you need to work on the formatting, but you get the idea.
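
A field-number approach like this is easier to adapt if you first list every field with its index. Below is a minimal sketch in portable awk. It assumes each section is a single blank-line-separated record (RS = "") that starts with a History line, as in colucix's first assumption; neither detail is visible in the posted sample. The function name list_fields is only illustrative.

```shell
# Sketch only: assumes each section is one blank-line-separated record.
list_fields() {
    awk '
    BEGIN { RS = "" }   # paragraph mode: newlines also separate fields
    { for (i = 1; i <= NF; i++) printf "%d\t%s\n", i, $i }'
}

# The "History" line is an assumption; the rest is from the first post.
list_fields <<'EOF'
History
Room: 124 B Payment: Bell/TRAVELSCAPE.COM
Lastname*FIT*,Firstname 4A, 0K, 0B Guest
Bell *205880080 FT
Bell *205880080 July 31, 2010
____ 00 August 1, 2010
EOF
```

Under those assumptions the listing shows the combined name in field 7, the invoice in field 13, and the two dates starting at fields 17 and 22. Any index counted back from NF depends on how many lines follow in the real file, so it is worth checking against the full input.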
Old 08-11-2010, 11:49 AM   #5
Registered: May 2009
Posts: 32

Original Poster
Rep: Reputation: 15
At last, I got it. Based on all your input, I was able to get a working script. Thanks again to all of you.

