LinuxQuestions.org
Support LQ: Use code LQCO20 and save 20% on CrossOver Office
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices

Reply
 
LinkBack Search this Thread
Old 11-08-2006, 08:39 AM   #1
bingo
Member
 
Registered: Apr 2006
Distribution: Slackware
Posts: 57

Rep: Reputation: 15
Filtering an ugly text file


My company is currently receiving a production release from our customer that is very difficult to read (this week's release is 365 pages!) It contains 100+ item numbers, and fields that I want to pull out, seperated by spaces or tabs. Here is the format for each item:

EDIT - see formatted code in my post below...

The above repeats for each item number. What I want to do is to pull out the "Item Number" value towards the top left, along with the following:
- In Transit Qty (top right)
- Cum Received (top right)
- Cum Req Qty at a certain date row ("matrix" at bottom)

I'm trying to write a shell script to filter this information out for each part number, and echo to an output file for easier reading. Any suggestions would help a bunch. Thanks! ...Steve

Last edited by bingo; 11-08-2006 at 11:07 AM.
 
Old 11-08-2006, 08:44 AM   #2
bingo
Member
 
Registered: Apr 2006
Distribution: Slackware
Posts: 57

Original Poster
Rep: Reputation: 15
EDIT - Thanks to druuna for helping me with the code format (see my post below)

Last edited by bingo; 11-08-2006 at 11:10 AM.
 
Old 11-08-2006, 09:37 AM   #3
druuna
LQ Veteran
 
Registered: Sep 2003
Location: the Netherlands
Distribution: lfs, debian, rhel
Posts: 7,514
Blog Entries: 1

Rep: Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140
Hi,

You should put your example input between code tags [ code ]text goes here[ /code ]. You must not use the spaces between the square bracket and the word 'code'. Take a look here for all the possible code tag's and their meaning.

About your question:

What do you mean by this: - Cum Req Qty at a certain date row ("matrix" at bottom)?
Do you need to see this for all dates present or a specific date (if so, how's that determined?)?

Could you give a rough example of the output you would like (if at all possible based on part of a 'real' infile)? I think it's something like this:

Code:
Item Number: XXXXXXX-XXX0
   Transit Qty       : 1,638.0
   Cum Received  : 47,502.0
   Cum Req Qty   : 1,234.0

tem Number: XXXXXXX-XXX1
   Transit Qty       : 2,336.0
   Cum Received  : 51,401.0
   Cum Req Qty   : 671.0
I won't promise a working version, but I will take a good look at it
 
Old 11-08-2006, 11:05 AM   #4
bingo
Member
 
Registered: Apr 2006
Distribution: Slackware
Posts: 57

Original Poster
Rep: Reputation: 15
That's great feedback, thanks for working with me! The output I would like to see is exactly what you did. For the "Cum Req Qty", I need the value at a date (no addition or anything). I'm good with just making this date a variable I can prompt for and read in to the script. Here is the code again:

Code:
SUPPLIER SCHEDULE / MATERIAL RELEASE


   Supplier: XXXXXX                            Ship-To: XXXX
   XXXXXXX XXXXXXXX                            X.X.X.X. X XX XX XX XX
   XXX X XXXXXXXXX, XXX XXX                    (XXXXXX XXXXX XXX XXXXX)
   XXXXXX                                      XXXXXXXXX # X XXX. XXXXXX
   XXXXXXX, XX XXXXX                           XXXXXX XXXXXXXXXX XXXXX
   UNITED STATES OF AMERICA                    XXXXX XXXXXX, XXXX XXXXX
                                               XXXXXX
   Attention:                                  Attention:
   Telephone:                                  Telephone:
   Fax/Telex:                                  Fax/Telex:

    Release ID: 20061106-001                   Release Date: 11/06/06
Purchase Order: XXXXXXXX                              Buyer: SAPPO
   Item Number: XXXXXXX-XXXX       UM: EA    In Transit Qty: 1,638.0
               XXX,XXX XXX XXXXX               Receipt Date: 11/03/06 23:53
                                           Receipt Quantity: 1,764.0
 Supplier Item:                                Cum Received: 47,502.0
       Contact:
                                       Packing Slip/Shipper: 140611

Ship/Delv Pattern: 12

    Item Ship     Ship
Int Rev  Date     Time  Reference  Q      Req Qty   Cum Req Qty  Net Req Qty
--- ---- -------- ----- ---------- - ------------ ------------- ------------
         Prior                                         47,502.0          0.0

W        10/30/06                  P        504.0      48,006.0        504.0
         11/06/06                  P      3,150.0      51,156.0      3,150.0
         11/13/06                  P          0.0      51,156.0          0.0
         11/20/06                  P      1,764.0      52,920.0      1,764.0
         11/27/06                  P      1,890.0      54,810.0      1,890.0
         12/04/06                  P      1,260.0      56,070.0      1,260.0
         12/11/06                  P      1,134.0      57,204.0      1,134.0
         12/18/06                  P      2,142.0      59,346.0      2,142.0

M        12/25/06                  P          0.0      59,346.0          0.0
         01/29/07                  P          0.0      59,346.0          0.0
         02/26/07                  P          0.0      59,346.0          0.0
         03/26/07                  P          0.0      59,346.0          0.0

Fab Authorization Cum Qty: 51,156.0      Thru: 11/19/06
Raw Authorization Cum Qty: 52,920.0      Thru: 12/03/06
 
Old 11-08-2006, 11:32 AM   #5
druuna
LQ Veteran
 
Registered: Sep 2003
Location: the Netherlands
Distribution: lfs, debian, rhel
Posts: 7,514
Blog Entries: 1

Rep: Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140
Hi,

This is what I came up with using awk:
Code:
#!/bin/bash
#
# filter.sh - show relevant information
#
# Usage: filter.sh <date> <infile>
#        date => MM/DD/YY

case $# in
  2);;
  *|1) echo "Usage: $0 <date> <infile>";exit;;
esac;

thisDate=$1
inFile=$2

awk -v thisDate=${thisDate} '
  /   Item Number: / { 
                      print ""
                      print "Item Number    : " $3
                      print "  Transit Qty  : " $9
                  }
  / Cum Received: / { print "  Cum Received : " $5 }
  $1 == thisDate    { print "  Cum Req Qty  : " $4 }
' $inFile
A sample run with your input (times 3 to mimick 'huge' file):
Code:
$ ./filter.sh 12/18/06 infile 

Item Number    : XXXXXXX-XXXX
  Transit Qty  : 1,638.0
  Cum Received : 47,502.0
  Cum Req Qty  : 59,346.0

Item Number    : XXXXXXX-XXXX
  Transit Qty  : 1,638.0
  Cum Received : 47,502.0
  Cum Req Qty  : 59,346.0

Item Number    : XXXXXXX-XXXX
  Transit Qty  : 1,638.0
  Cum Received : 47,502.0
  Cum Req Qty  : 59,346.0
Hope this is what you are looking for.
 
Old 11-08-2006, 04:07 PM   #6
bingo
Member
 
Registered: Apr 2006
Distribution: Slackware
Posts: 57

Original Poster
Rep: Reputation: 15
Holy crap, that's exactly what I was looking for :-) Let me play with it some more, but what a huge help for us! I'll probably insert some math to calculate required shipping numbers for each week, but that's easy enough. Thanks again! Do you do any commercial contract work for bigger jobs? I'm not a computer guy, but I'm pitching Linux at my company.
 
Old 11-08-2006, 06:54 PM   #7
osor
HCL Maintainer
 
Registered: Jan 2006
Distribution: (H)LFS, Gentoo
Posts: 2,450

Rep: Reputation: 64
Quote:
Originally Posted by bingo
Holy crap, that's exactly what I was looking for :-) Let me play with it some more, but what a huge help for us! I'll probably insert some math to calculate required shipping numbers for each week, but that's easy enough. Thanks again! Do you do any commercial contract work for bigger jobs? I'm not a computer guy, but I'm pitching Linux at my company.
You might want to learn 0a little about textflow-oriented a scripting language such as perl, awk, or even sed. It comes in handy in a lot of situations (even if your not a computer guy) and it would probably make your pitch for linux more qualified-sounding to a manager
 
Old 11-09-2006, 01:32 AM   #8
druuna
LQ Veteran
 
Registered: Sep 2003
Location: the Netherlands
Distribution: lfs, debian, rhel
Posts: 7,514
Blog Entries: 1

Rep: Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140
Hi,

Quote:
Originally Posted by bingo
Holy crap, that's exactly what I was looking for :-) Let me play with it some more, but what a huge help for us! I'll probably insert some math to calculate required shipping numbers for each week, but that's easy enough. Thanks again!
Glad to be of service

The script isn't that complicated (from my point of view), if you have some AWK experience it shouldn't be too hard to edit/add to/change the above code. And I guess you'll ask if you cannot get the changes to work the way you want them to.

Osor gave some good advice: Do learn the basics of sed/awk, it will make your computer life a lot easier. I don't know about the perl part. If you occasionally have to write a script/tool learning perl could be a bit much. On the other hand: Perl is a lot more powerfull then sed and awk combined.

Quote:
Do you do any commercial contract work for bigger jobs? I'm not a computer guy, but I'm pitching Linux at my company.
I'm an Unix System Admin by trade and have been playing with computers since late 1980. Helping people, whenever possible, on this forum is just for fun and relaxation. And to answer your question: Nope, I'm not for hire
 
Old 11-09-2006, 07:35 AM   #9
bingo
Member
 
Registered: Apr 2006
Distribution: Slackware
Posts: 57

Original Poster
Rep: Reputation: 15
This is how I learn, so thanks again for the time. I have done a little with awk, but I'm amazed at how quickly you can whip this up!

Code:
awk -v thisDate=${thisDate} '
  /   Item Number: / { 
                      print ""
                      print "Item Number    : " $3
                      print "  Transit Qty  : " $9
                  }
  / Cum Received: / { print "  Cum Received : " $5 }
  $1 == thisDate    { print "  Cum Req Qty  : " $4 }
' $inFile
I do have a case where the "Cum Req Qty" does not work correctly. I think the issue is when there is a character in front of the ship date (like the "4" in this case, or sometimes there is a "W" or "M" as the first string)...

Code:
    Item Ship     Ship
Int Rev  Date     Time  Reference  Q      Req Qty   Cum Req Qty  Net Req Qty
--- ---- -------- ----- ---------- - ------------ ------------- ------------
         Prior                                      1,383,750.0          0.0

W   4    10/30/06                  P     49,500.0   1,433,250.0     49,500.0
    4    11/06/06                  P    141,750.0   1,575,000.0    141,750.0
    4    11/13/06                  P          0.0   1,575,000.0          0.0
Is there an easy way to make it work whether this string is the 4th, 5th, or 6th string in the line?

Thanks for being patient with me. I've been a Slackware user for years at home, but haven't done anything like this. I'm actually not even a computer guy, but I love it. Thanks! ...Steve

Last edited by bingo; 11-09-2006 at 09:26 AM.
 
Old 11-09-2006, 09:49 AM   #10
druuna
LQ Veteran
 
Registered: Sep 2003
Location: the Netherlands
Distribution: lfs, debian, rhel
Posts: 7,514
Blog Entries: 1

Rep: Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140
Hi,

In general awk reads one input line at the time. Taking my script as an example, every line is checked against the 3 'triggers':

1) / Item Number: / => a regular expression that looks for a space followed by Item Number: and a space. If that is the case 3 things happen: print an empty line (print ""), print Item Number :and the third field, print Transit Qty : and the ninth field.

2) / Cum Received: / => look for Cum Received (again with a space in front and behind). If a hit is made print Cum Received : and field 5.

3) $1 == thisDate => If field number one equals the datestring given it will print Cum Req Qty : and the fourth field.

If a line does not meet any of the above criteria, it is discarded (= not printed to screen).

If 'words' on a line are separated by spaces/tabs, awk sees this 'word' as a field. The following is seen as a line with three fields:
foo bar whatever

foo can be represented as $1, bar as $2 and whatever as $3.
The so called field separator can be set to be just about anything you like, by default it is a space/tab. This for example will print all users and the uid in the /etc/passwd file: awk -F: '{ print $1, $3 }' /etc/passwd. The field separator is changed to : (the -F: part).

About the '4' in you second example, if it is irregular (I did not see it in the other infile example) it could render my third rule 'useless'. If I'm honest, the lines with W or M in front are also discarded in the above script But this can be solved (and maybe the 4/number problem):

Add these 2 lines to the script (bold are the new lines):

/ Cum Received: / { print " Cum Received : " $5 }
$1 ~ /[0-9MW]/ && $2 == thisDate { print " Cum Req Qty : " $5 }
$1 ~ /[0-9MW]/ && $3 == thisDate { print " Cum Req Qty : " $6 }

$1 == thisDate { print " Cum Req Qty : " $4 }

This will take care of the W, M and/or 4 (actually: 0-9).

And finally. You can redirect the output to a file. Easiest way:

./filter.sh 11/06/06 infile > /path/to/outfile

I hope this cleared things up a bit, if not...... Just ask!

Last edited by druuna; 11-09-2006 at 09:50 AM.
 
Old 11-09-2006, 03:18 PM   #11
bingo
Member
 
Registered: Apr 2006
Distribution: Slackware
Posts: 57

Original Poster
Rep: Reputation: 15
It looks like you were replying when I was editing my last message. I found a very helpful awk guide at http://www.oracle.com/technology/pub...laney_awk.html and I answered most of my original questions.

Your suggestion for the irregularities works perfectly as I assumed :-) Plus, I now understand everything in the script!

I have modified what we have to output very close to what I'm looking for. Only 2 things now that I'm stuck on...

- I know how to format the output per "trigger", but I can't find how to format between "triggers". What I want to do is to make the output all on the same row, with a space separation. I can do this easily with the first 1 fields, but I want to keep the other fields on the same line as well. This will help us import into a spreadsheet and use through the whole plant.

- I haven't been able to get a mathematical formula to work within the awk command. What I want to do is to create a new variable that will equal "Cum Req Qty" minus "Cum Received" minus "In Transit Qty" and output for each "Item Number".

Thanks again!
 
Old 11-09-2006, 08:29 PM   #12
bingo
Member
 
Registered: Apr 2006
Distribution: Slackware
Posts: 57

Original Poster
Rep: Reputation: 15
I found the answer to my first question in the man page... my solution is to re-define the ORS separator. Works perfectly!

With some further research, I'm thinking the reason why my equations do not work is because awk is seeing strings, not numbers? The man page says to convert, "add 0 to it", but I'm not sure how I'm supposed to do that?

I'm having fun now... next thing you know, the only Linux box in the whole company is going to be busy doing everyone's busy-work all day!
 
Old 11-10-2006, 02:24 AM   #13
druuna
LQ Veteran
 
Registered: Sep 2003
Location: the Netherlands
Distribution: lfs, debian, rhel
Posts: 7,514
Blog Entries: 1

Rep: Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140
Hi again,

About your first question: Changing the ORS is one way of solving the output problem you have/had. Another would be using printf instead of print, which is a more powerful and elegant solution. Elegance isn't always important, sometimes the script just has to do what you want it to do, no matter what

About the arithmetic: variables with a comma in them are seen as string, not a decimal integer. Placing a 0 (zero) in front will not help in this situation. You need to rewrite some parts of the script and add some extra lines. I came up with the following:
Code:
#!/bin/bash
#
# filter.sh - show relevant information
#
# Usage: filter.sh <date> <infile>
#        date => MM/DD/YY

case $# in
  2);;
  *|1) echo "Usage: $0 <date> <infile>";exit;;
esac;

thisDate=$1
inFile=$2

nawk -v thisDate=${thisDate} '
BEGIN {
       print "Item Number  Transit Qty  Cum Received  Cum Req Qty  Calculated"
      }
/   Item Number: / {
                    gsub(/,/,"")
                    printf "%-13s", $3
                    printf "%11s", $9
                    trnqty = $9
                   }
/ Cum Received: /  {
                    gsub(/,/,"")
                    printf "%14s", $5
                    cumrec = $5
                   }
$1 ~ /[0-9MW]/ && $2 == thisDate {
                                  gsub(/,/,"")
                                  printf "%13s", $5
                                  reqqty = $5
                                  printf " %11.1f\n", reqqty - cumrec - trnqty
                                 }
$1 ~ /[0-9MW]/ && $3 == thisDate {
                                  gsub(/,/,"")
                                  printf "%13s", $6
                                  reqqty = $6
                                  printf " %11.1f\n", reqqty - cumrec - trnqty
                                 }
$1 == thisDate {
                gsub(/,/,"")
                printf "%13s", $4
                reqqty = $4
                printf " %11.1f\n", reqqty - cumrec - trnqty
               }
' $inFile
A sample run looks like this:
Code:
$ ./filter.sh 10/30/06 infile
Item Number  Transit Qty  Cum Received  Cum Req Qty  Calculated
XXXXXXX-XXX1      2638.0       47502.0      52920.0      2780.0
XXXXXXX-XXX2       638.0         502.0     433250.0    432110.0
XXXXXXX-XXX3      1638.0       47502.0    1575000.0   1525860.0
As you can see I introduced some 'new' awk stuff

I'll let you try to figure it all out, I know you will ask if something is unclear.
Before you start looking for printf in the awk manuals: Check man printf first. printf is used in different languages (C, perl, awk to name just 3).

I thought I mentioned it already, but I haven't: If you like a paper reference, take a look at the O'Reilly's Sed & Awk book.
 
Old 11-11-2006, 09:27 AM   #14
bingo
Member
 
Registered: Apr 2006
Distribution: Slackware
Posts: 57

Original Poster
Rep: Reputation: 15
Apparently Slackware isn't packaged with nawk, so I just combined some of my solutions with yours, and I can now do my math, and output exactly as I wanted. Thanks! With your guidance, I have this script now finding and calculating current numbers, along with forecasts directly into a comma-separated file my co-workers can open into Excel.

Not to keep this thread going forever, but... there's 1 more thing I can't find how to do. We have a production team that is responsible for each "Item Number". Is there an easy way to direct my awk outputs to a different file depending on what team is responsible for that item? I was trying to do this with an array, but it hasn't worked so far. Obviously I would need to produce some input that includes what "Item Number" goes with each team. Right now I am directing my outputs like this:
Code:
print $9 >> outputFile
Thanks again! ...Steve
 
Old 11-11-2006, 10:20 AM   #15
druuna
LQ Veteran
 
Registered: Sep 2003
Location: the Netherlands
Distribution: lfs, debian, rhel
Posts: 7,514
Blog Entries: 1

Rep: Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140Reputation: 1140
Hi,

Sorry about the nawk/awk mix up. I was testing it at work on a SUN box, I won't bore you with details, but you need to use nawk instead of awk on a SUN box. My mistake, it will work with awk on a 'normal' linux box.

I'm not sure if I understand your question. I think you want to send (Item Number) XXXXXXX-XXXX to group X, XXXXXXX-YYYYY to group Y and XXXXXXX-ZZZZ to group Z (send to a specific group based on item number).

You could do something like this:
Code:
#!/bin/bash
#
# filter.sh - show relevant information
#
# Usage: filter.sh <date> <infile>
#        date => MM/DD/YY

case $# in
  2);;
  *|1) echo "Usage: $0 <date> <infile>";exit;;
esac;

thisDate=$1
inFile=$2

awk -v thisDate=${thisDate} '
BEGIN {
      # Fill array with itemnumber -> team pairs:
      toTeam["XXXXXXX-AAAA"] = "z-team.outfile"
      toTeam["XXXXXXX-XXXX"] = "x-team.outfile"
      toTeam["XXXXXXX-YYYY"] = "y-team.outfile"
      toTeam["XXXXXXX-ZZZZ"] = "z-team.outfile"
      }
/   Item Number: / {
                    gsub(/,/,"")
                    printf "%-13s", $3 >> toTeam[$3]
                    printf "%11s", $9 >> toTeam[$3]
                    trnqty = $9
                    outFile = toTeam[$3]
                   }
/ Cum Received: /  {
                    gsub(/,/,"")
                    printf "%14s", $5 >> outFile
                    cumrec = $5
                   }
$1 ~ /[0-9MW]/ && $2 == thisDate {
                                  gsub(/,/,"")
                                  printf "%13s", $5 >> outFile
                                  reqqty = $5
                                  printf " %11.1f\n", reqqty - cumrec - trnqty >> outFile
                                 }
$1 ~ /[0-9MW]/ && $3 == thisDate {
                                  gsub(/,/,"")
                                  printf "%13s", $6 >> outFile
                                  reqqty = $6
                                  printf " %11.1f\n", reqqty - cumrec - trnqty >> outFile
                                 }
$1 == thisDate {
                gsub(/,/,"")
                printf "%13s", $4 >> outFile
                reqqty = $4
                printf " %11.1f\n", reqqty - cumrec - trnqty >> outFile
               }
' $inFile
You will end up with the following (input file has 4 entries. 2 XXXX, 1 YYYY and 1 ZZZZ):
Code:
$ ./filter.sh 12/18/06 infile 

$ ls -l *team*
-rw-r-----  1 druuna internet 128 Nov 11 17:17 x-team.outfile
-rw-r-----  1 druuna internet  64 Nov 11 17:17 y-team.outfile
-rw-r-----  1 druuna internet  64 Nov 11 17:17 z-team.outfile

$ cat x-team.outfile 
XXXXXXX-XXXX      1638.0       47502.0      59346.0     10206.0
XXXXXXX-XXXX      1638.0       47502.0      59346.0     10206.0

$ cat y-team.outfile 
XXXXXXX-YYYY      1638.0       47502.0      59346.0     10206.0

$ cat z-team.outfile 
XXXXXXX-ZZZZ      1638.0       47502.0      59346.0     10206.0
This solution is only practical if there aren't too many different item numbers and/or too many new item numbers. This will also make it possible to send 2 or more different item numbers to the same team (see dummy XXXXX-AAAA entry).

Another solution would be to use the item numbers as filename. But this only works if you (your co-workers) know which item number is for which team.

I decided to not print the header line. If the output(file) is imported in excel, you don't need it (ok, I'm lazy as well, takes a lot of extra lines to put a header line in the file )

Hope this is what you wanted.
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are Off
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
how to change some text of a certain line of a text file with bash and *nix scripting alred Programming 6 07-10-2006 11:55 AM
scrip for filtering text files irfanhab Programming 3 04-09-2006 02:18 AM
Which light text editor can copy text from file and paste in browser? davidas Linux - Software 9 03-06-2006 11:28 AM
Open Office fonts look ugly (and print ugly too) TheOneAndOnlySM Linux - Software 6 10-17-2003 07:12 PM
Text filtering Dark_Helmet Programming 7 06-04-2003 12:17 PM


All times are GMT -5. The time now is 04:30 AM.

Main Menu
 
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: @linuxquestions
Open Source Consulting | Domain Registration