LinuxQuestions.org
Old 02-06-2007, 03:38 PM   #1
cdog
Member
 
Registered: Dec 2005
Posts: 65

Rep: Reputation: 15
How to scan a file with 2 different field separators?


I have a Linux assignment - I'm not asking for any code, just an idea - and the task is to scan a text file and sort it first by one field separator, then sort every resulting field by another separator. Awk looks like a good choice, but I don't know how to use 2 FS in one awk script or how to call an awk script from another awk script.
 
Old 02-06-2007, 04:12 PM   #2
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
If the scanning process is sequential (that is, the output of the first scan is in toto the input for the second scan), you can pipe two awk calls, as in
Code:
 awk -F, 'some_awk_code_here' filename | awk -F: 'some_other_awk_code_here'
In this example the -F option sets the field separator ("," in the first call, ":" in the second one). For a more sophisticated interaction between input and output, please post an example of what has to be done. It will be my pleasure to help!
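As a rough illustration of the pipe, here is a minimal sketch. The sample record and the choice to print one group per line are assumptions made for the example, not part of the original task:

```shell
# Hypothetical sample record: comma-separated groups, colon-separated items.
line='alpha:beta,gamma:delta'

# The first awk splits on "," and prints each group on its own line;
# the second awk splits each group on ":" and prints its first item.
out=$(printf '%s\n' "$line" |
  awk -F, '{ for (i = 1; i <= NF; i++) print $i }' |
  awk -F: '{ print $1 }')

printf '%s\n' "$out"
```

With this input the two lines printed are alpha and gamma.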
 
Old 02-06-2007, 04:31 PM   #3
anomie
Senior Member
 
Registered: Nov 2004
Location: Texas
Distribution: RHEL, Scientific Linux, Debian, Fedora
Posts: 3,935
Blog Entries: 5

Rep: Reputation: Disabled
For the hell of it, here's another suggestion. We'll go with the same assumption -- that one delimiter is a comma and the other is a colon.

First, run it through tr to change all delimiters to the same type:
Code:
tr ',' ':' < some-data-file
After that, all commas will be translated to colons. So you can use colucix's awk invocation above, except only for colons. (In other words, once everything has been changed to a single delimiter, life becomes easier.)
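A small sketch of that pipeline, assuming a made-up sample line. Note that after tr runs there is no way to tell which separator was originally a comma:

```shell
# Hypothetical sample line mixing both delimiters.
line='foo,bar:foobar'

# tr rewrites every comma as a colon, so awk only needs one separator.
out=$(printf '%s\n' "$line" | tr ',' ':' | awk -F: '{ print NF, $2 }')

printf '%s\n' "$out"
```

This prints "3 bar": three fields, the second of which is bar.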

[ caveat: It's entirely possible I am misunderstanding what you are trying to do. I'm not too sure what you mean when you say "sort by separator". ]
 
Old 02-06-2007, 05:19 PM   #4
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,187

Rep: Reputation: 354
Here's an extract from the info file for sort which may suggest a way to accomplish your task.
Code:
   * Sort a set of log files, primarily by IPv4 address and secondarily
     by time stamp.  If two lines' primary and secondary keys are
     identical, output the lines in the same order that they were
     input.  The log files contain lines that look like this:

          4.150.156.3 - - [01/Apr/2004:06:31:51 +0000] message 1
          211.24.3.231 - - [24/Apr/2004:20:17:39 +0000] message 2

     Fields are separated by exactly one space.  Sort IPv4 addresses
     lexicographically, e.g., 212.61.52.2 sorts before 212.129.233.201
     because 61 is less than 129.

          sort -s -t ' ' -k 4.9n -k 4.5M -k 4.2n -k 4.14,4.21 file*.log |
          sort -s -t '.' -k 1,1n -k 2,2n -k 3,3n -k 4,4n

     This example cannot be done with a single `sort' invocation, since
     IPv4 address components are separated by `.' while dates come just
     after a space.  So it is broken down into two invocations of
     `sort': the first sorts by time stamp and the second by IPv4
     address.  The time stamp is sorted by year, then month, then day,
     and finally by hour-minute-second field, using `-k' to isolate each
     field.  Except for hour-minute-second there's no need to specify
     the end of each key field, since the `n' and `M' modifiers sort
     based on leading prefixes that cannot cross field boundaries.  The
     IPv4 addresses are sorted lexicographically.  The second sort uses
     `-s' so that ties in the primary key are broken by the secondary
     key; the first sort uses `-s' so that the combination of the two
     sorts is stable.
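
The same two-pass idea can be tried on a much smaller input. This is only a sketch: the "major.minor tag" records are invented for the example, and LC_ALL=C is set so the tag comparison does not depend on the locale.

```shell
# Hypothetical sample data: "major.minor tag", one record per line.
data='2.10 beta
2.9 beta
1.5 beta
2.9 alpha'

# Pass 1 sorts by the secondary key (the tag after the space).
# Pass 2 sorts numerically by the dot-separated components; -s keeps
# the order from pass 1 for records whose numeric keys tie.
out=$(printf '%s\n' "$data" |
  LC_ALL=C sort -s -t ' ' -k 2,2 |
  LC_ALL=C sort -s -t '.' -k 1,1n -k 2,2n)

printf '%s\n' "$out"
```

The result lists 1.5 beta, then 2.9 alpha, 2.9 beta, and finally 2.10 beta, so the two 2.9 records keep the alphabetical order set by the first pass.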
 
Old 02-07-2007, 07:41 AM   #5
cdog
Member
 
Registered: Dec 2005
Posts: 65

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by colucix
If the scanning process is sequential (that is, the output of the first scan is in toto the input for the second scan), you can pipe two awk calls, as in
Code:
 awk -F, 'some_awk_code_here' filename | awk -F: 'some_other_awk_code_here'
In this example the -F option sets the field separator ("," in the first call, ":" in the second one). For a more sophisticated interaction between input and output, please post an example of what has to be done. It will be my pleasure to help!
Excellent idea, colucix. Is there a way to implement this in one awk script?
anomie, I cannot transform every separator, because I need to run different functions on different fields.
PTrenholme, also a good idea, but I don't want to sort the fields.
Thank you all.
 
Old 02-07-2007, 08:25 AM   #6
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Quote:
Originally Posted by cdog
is there a way to implement this into one awk script?
Do you mean a single script that makes a single call to awk? The answer is always YES, since awk is a very powerful scripting language. In this case, however, I suggest setting the first field separator in the BEGIN section of the script, e.g.
Code:
BEGIN { FS = "," }
Then you can process each field by breaking it into subfields with the split function, e.g.
Code:
split($2,names,":")
This is just an example that splits the 2nd field using ":" as the separator and assigns the resulting subfields to the array "names". You can then do further processing on each element of the array.
A precise answer to your question requires knowing the task you have to accomplish. In any case, this is a general technique in the great awk language!
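Putting the two fragments together, a minimal sketch could look like this. The sample record is hypothetical; the point is that split returns the number of subfields and stores them under the indices 1..n:

```shell
# Hypothetical record: three comma-separated fields, the 2nd holding
# colon-separated names.
line='id1,john:mary:sue,x'

out=$(printf '%s\n' "$line" | awk '
  BEGIN { FS = "," }
  {
    n = split($2, names, ":")    # n is the number of subfields found
    print n, names[1], names[n]  # count, first subfield, last subfield
  }')

printf '%s\n' "$out"
```

This prints "3 john sue".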
 
Old 02-07-2007, 08:59 AM   #7
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Hi,

(GNU) Awk accepts multiple separators, given as a regular expression:
Code:
awk -F",|:" '{ .........}' infile

Both , and : are then used as field separators.
Code:
$ cat infile 
foo,bar:foobar,barfoo:end

$ awk -F":|," '{ print $2, $5 }' infile 
bar end
Hope this helps.
 
Old 02-07-2007, 10:33 AM   #8
cdog
Member
 
Registered: Dec 2005
Posts: 65

Original Poster
Rep: Reputation: 15
Thanks guys, but I cannot use your ideas:
colucix: I need the subfields inside the big field in order, and using arrays I cannot accomplish this.
druuna: I need to distinguish between the fields separated by ":" and the ones separated by ",".
 
Old 02-07-2007, 10:39 AM   #9
anomie
Senior Member
 
Registered: Nov 2004
Location: Texas
Distribution: RHEL, Scientific Linux, Debian, Fedora
Posts: 3,935
Blog Entries: 5

Rep: Reputation: Disabled
cdog,

Post some sample data and how you want the results to come out. That'll make this less ambiguous and get you help quicker.
 
Old 02-07-2007, 10:56 AM   #10
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Hi,

If you need to distinguish between the field separators, it seems that PTrenholme gave the answer (post #4). Sort can do this.

See man sort or info sort for details.
 
Old 02-07-2007, 11:56 AM   #11
cdog
Member
 
Registered: Dec 2005
Posts: 65

Original Poster
Rep: Reputation: 15
druuna, I don't want to sort the input. Is there a way to use sort with its options and not sort the input?
anomie, here is an example: the input january,february,june:sunday,saturday,monday should produce 1,2,6:1,7,2, or something like that.
 
Old 02-07-2007, 12:20 PM   #12
cdog
Member
 
Registered: Dec 2005
Posts: 65

Original Poster
Rep: Reputation: 15
colucix, I take it back: your idea works. Running through the array using for (i in array) does not return the elements in order, but using for (i=1; i<=array_size; i++) does. Thanks.
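For reference, a short sketch of the ordered loop, assuming the array was filled by split (the variable names are made up for the example). The count returned by split serves as the array size, and the order of a for-in loop is not specified by awk:

```shell
out=$(awk 'BEGIN {
  # split returns the element count; elements are stored at 1..n.
  n = split("january,february,june", parts, ",")

  # Walking the indices 1..n visits the elements in input order.
  for (i = 1; i <= n; i++) {
    joined = joined sep parts[i]
    sep = "-"
  }
  print joined
}')

printf '%s\n' "$out"
```

This prints january-february-june, in the same order as the input.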
 
Old 02-07-2007, 12:28 PM   #13
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Hi,

If your input has a fixed layout you could use something like this (shortened, but you probably get the idea):
Code:
#!/bin/bash

awk '
  BEGIN {
          FS = "[,:]"

          # Fill month array with months/number pairs
          month["january"] = "1"; month["february"] = "2"
          month["june"] = "6"

          # Fill week array with week/number pairs
          week["sunday"] = "1"   ; week["monday"] = "2"
          week["saturday"] = "7"
  }
  {
    print month[$1]","month[$2]","month[$3]":"week[$4]","week[$5]","week[$6]
  } ' infile
Hope this helps.
 
Old 02-07-2007, 01:38 PM   #14
cdog
Member
 
Registered: Dec 2005
Posts: 65

Original Poster
Rep: Reputation: 15
druuna, the input is not fixed, but I managed to solve it using something similar. Thanks.
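The final script was not posted, so the following is only a guess at what a version for non-fixed input might look like. It assumes ":" separates the groups and "," the items inside each group, and it uses a single shortened lookup table (the name num is made up) instead of separate month and week arrays:

```shell
line='january,february,june:sunday,saturday,monday'

out=$(printf '%s\n' "$line" | awk '
  BEGIN {
    FS = ":"
    # Hypothetical, shortened lookup table (name -> number).
    num["january"] = 1; num["february"] = 2; num["june"] = 6
    num["sunday"] = 1;  num["monday"] = 2;   num["saturday"] = 7
  }
  {
    result = ""
    for (f = 1; f <= NF; f++) {        # each colon-separated group
      if (f > 1) result = result ":"
      n = split($f, items, ",")        # items inside the group, in order
      for (i = 1; i <= n; i++) {
        if (i > 1) result = result ","
        result = result num[items[i]]
      }
    }
    print result
  }')

printf '%s\n' "$out"
```

For the sample line from post #11 this prints 1,2,6:1,7,2, and it works for any number of groups or items per group. If different groups need different functions, the group index f can be used to pick one.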
 
  

