LinuxQuestions.org

cdog 02-06-2007 03:38 PM

How to scan a file with 2 different field separators?
 
I have a Linux assignment - I'm not asking for any code, just an idea - and the task is to scan a text file and sort it first by one field separator, then sort every field by another separator. Awk looks like a good choice, but I don't know how to use 2 FS in one awk script or how to call an awk script from another awk script.

colucix 02-06-2007 04:12 PM

If the scanning process is sequential (that is the output of the first scanning represents in toto the input for the second scanning) you can pipe two awk calls, as in
Code:

awk -F, 'some_awk_code_here' filename | awk -F: 'some_other_awk_code_here'
In this example the -F option sets the field separator ("," in the first call, ":" in the second one). For a more sophisticated interaction between input and output, please post an example of what has to be done. It will be my pleasure to help!
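To make the pipeline concrete, here is a minimal sketch. The sample line, the choice of fields, and the function name second_pair are made up for illustration; only the two -F options come from the command above.
Code:

```shell
# Hypothetical helper: the first awk splits on "," and keeps the 2nd field,
# the second awk splits that field on ":" and prints its two parts.
second_pair() {
  awk -F, '{ print $2 }' | awk -F: '{ print $1, $2 }'
}

# Made-up input: comma-separated records holding colon-separated parts.
printf '%s\n' 'alice:30,bob:25,carol:41' | second_pair
# prints: bob 25
```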

anomie 02-06-2007 04:31 PM

For the hell of it, here's another suggestion. We'll go with the same assumption -- that one delimiter is a comma and the other is a colon.

First, run it through tr to change all delimiters to the same type:
Code:

tr ',' ':' < some-data-file
After that, all commas will be translated to colons. So you can use colucix's awk invocation above, except only for colons. (In other words, once everything has been changed to a single delimiter, life becomes easier.)
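A minimal sketch of that idea, assuming a made-up input line; the function name unify_delims is hypothetical. Note that after the translation there is no way to tell which fields were originally separated by commas.
Code:

```shell
# Hypothetical helper: turn every comma into a colon so that a single
# field separator is enough for the awk call that follows.
unify_delims() {
  tr ',' ':'
}

# Made-up input mixing both delimiters; awk then sees three ":" fields.
printf '%s\n' 'foo,bar:baz' | unify_delims | awk -F: '{ print NF, $3 }'
# prints: 3 baz
```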

[ caveat: It's entirely possible I am misunderstanding what you are trying to do. I'm not too sure what you mean when you say "sort by separator". ]

PTrenholme 02-06-2007 05:19 PM

Here's an extract from the info file for sort which may suggest a way to accomplish your task.
Code:

  * Sort a set of log files, primarily by IPv4 address and secondarily
    by time stamp.  If two lines' primary and secondary keys are
    identical, output the lines in the same order that they were
    input.  The log files contain lines that look like this:

          4.150.156.3 - - [01/Apr/2004:06:31:51 +0000] message 1
          211.24.3.231 - - [24/Apr/2004:20:17:39 +0000] message 2

    Fields are separated by exactly one space.  Sort IPv4 addresses
    lexicographically, e.g., 212.61.52.2 sorts before 212.129.233.201
    because 61 is less than 129.

          sort -s -t ' ' -k 4.9n -k 4.5M -k 4.2n -k 4.14,4.21 file*.log |
          sort -s -t '.' -k 1,1n -k 2,2n -k 3,3n -k 4,4n

    This example cannot be done with a single `sort' invocation, since
    IPv4 address components are separated by `.' while dates come just
    after a space.  So it is broken down into two invocations of
    `sort': the first sorts by time stamp and the second by IPv4
    address.  The time stamp is sorted by year, then month, then day,
    and finally by hour-minute-second field, using `-k' to isolate each
    field.  Except for hour-minute-second there's no need to specify
    the end of each key field, since the `n' and `M' modifiers sort
    based on leading prefixes that cannot cross field boundaries.  The
    IPv4 addresses are sorted lexicographically.  The second sort uses
    `-s' so that ties in the primary key are broken by the secondary
    key; the first sort uses `-s' so that the combination of the two
    sorts is stable.
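
As a smaller illustration of the same two-pass idea, the sketch below uses made-up lines in which the secondary key follows a space and the primary key follows a dot. It assumes a sort that supports -s (GNU and BSD sort do; POSIX does not require it). The function name two_pass_sort is hypothetical.
Code:

```shell
# Hypothetical helper: the first pass orders by the secondary key (the
# 2nd space-separated field); the second pass orders by the primary key
# (the number after the dot). -s keeps the first ordering for ties.
two_pass_sort() {
  sort -s -t ' ' -k 2,2 | sort -s -t '.' -k 2,2n
}

printf '%s\n' 'a.2 y' 'a.1 z' 'a.2 x' 'a.1 w' | two_pass_sort
# prints:
# a.1 w
# a.1 z
# a.2 x
# a.2 y
```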


cdog 02-07-2007 07:41 AM

Quote:

Originally Posted by colucix
If the scanning process is sequential (that is the output of the first scanning represents in toto the input for the second scanning) you can pipe two awk calls, as in
Code:

awk -F, 'some_awk_code_here' filename | awk -F: 'some_other_awk_code_here'
In this example the -F option sets the field separator ("," in the first call, ":" in the second one). For a more sophisticated interaction between input and output, please post an example of what has to be done. It will be my pleasure to help!

Excellent idea, colucix. Is there a way to implement this in one awk script?
anomie, I cannot transform every separator because I need different functions to be run on different fields.
PTrenholme, also a good idea, but I don't want to sort the fields.
Thank you all.

colucix 02-07-2007 08:25 AM

Quote:

Originally Posted by cdog
is there a way to implement this into one awk script?

Do you mean a single script that performs a single call to awk? The answer is always YES, since awk is a very powerful scripting language. In this case I suggest specifying the first field separator in the BEGIN section of the script, e.g.
Code:

BEGIN { FS = "," }
Then we can process each field by splitting it into subfields with the split function, e.g.
Code:

split($2,names,":")
This is just an example which splits the 2nd field using ":" as the separator and assigns the resulting pieces to the array "names". Then you can do some other processing on each element of the array.
A precise answer to your question depends on knowing the task you have to accomplish. That is true of awk questions in general!
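Putting the two fragments together, a minimal sketch might look like this. The input line and the function name split_names are made up; split returns the number of pieces, which is stored in n here.
Code:

```shell
# Hypothetical helper: outer fields are separated by ",", and the 2nd
# field is split again on ":" into the array names (indexed from 1).
split_names() {
  awk 'BEGIN { FS = "," }
  {
    n = split($2, names, ":")
    print n, names[1], names[n]
  }'
}

printf '%s\n' 'id7,ann:bob:cy,extra' | split_names
# prints: 3 ann cy
```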

druuna 02-07-2007 08:59 AM

Hi,

(GNU) Awk accepts multiple separators.

awk -F",|:" '{ .........}' infile

Both the , and the : are then treated as separators.
Code:

$ cat infile
foo,bar:foobar,barfoo:end

$ awk -F":|," '{ print $2, $5 }' infile
bar end

Hope this helps.

cdog 02-07-2007 10:33 AM

thanks guys but I cannot use your ideas:
colucix: I need the fields inside the big field in order, and using arrays I cannot accomplish this.
druuna: I need to distinguish between the fields separated by ":" and the ones separated by ","

anomie 02-07-2007 10:39 AM

cdog,

Post some sample data and how you want the results to come out. That'll make this less ambiguous and get you help quicker.

druuna 02-07-2007 10:56 AM

Hi,

If you need to distinguish between the field separators, it seems that PTrenholme gave the answer (post #4). Sort can do this.

See man sort or info sort for details.

cdog 02-07-2007 11:56 AM

druuna, I don't want to sort the input. Is there a way to use sort with its options and not sort the input?
anomie, here is an example: the input is january,february,june:sunday,saturday,monday and the output will be 1,2,6:1,7,2, or something like that.

cdog 02-07-2007 12:20 PM

colucix, I take it back: your idea works. Running through the array using for (index in array) does not get the elements in order, but using for (i=1; i<=array_size; i++) does. Thanks.
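A minimal sketch of the ordered loop, assuming a made-up input line; the function name in_order is hypothetical. The order of a for (key in array) loop is unspecified in awk, while counting from 1 to the value returned by split visits the pieces in their original order.
Code:

```shell
# Hypothetical helper: split the line on "," and rebuild it with spaces,
# walking the array by numeric index so the original order is kept.
in_order() {
  awk '{
    n = split($0, part, ",")
    out = ""
    for (i = 1; i <= n; i++)
      out = out (i > 1 ? " " : "") part[i]
    print out
  }'
}

printf '%s\n' 'sunday,saturday,monday' | in_order
# prints: sunday saturday monday
```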

druuna 02-07-2007 12:28 PM

Hi,

If your input has a fixed layout you could use something like this (shortened, but you probably get the idea):
Code:

#!/bin/bash

awk '
  BEGIN {
          FS = "[,:]"

          # Fill month array with months/number pairs
          month["january"] = "1"; month["february"] = "2"
          month["june"] = "6"

          # Fill week array with week/number pairs
          week["sunday"] = "1"  ; week["monday"] = "2"
          week["saturday"] = "7"
  }
  {
    print month[$1]","month[$2]","month[$3]":"week[$4]","week[$5]","week[$6]
  } ' infile

Hope this helps.

cdog 02-07-2007 01:38 PM

druuna, the input is not fixed, but I managed to solve it using something similar. Thanks.
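The actual solution was never posted. For input where the number of items per group varies, one possible shape is sketched below, combining FS = ":" with split on "," and a counter loop. The function name to_numbers and the single lookup table num are assumptions made for illustration, and only the names from the example above are filled in.
Code:

```shell
# Hypothetical helper: ":" separates groups, "," separates items in a
# group, and each item is replaced by its number from the lookup table.
to_numbers() {
  awk '
    BEGIN {
      FS = ":"
      # Shortened lookup table; names not listed map to an empty string.
      num["january"] = 1; num["february"] = 2; num["june"] = 6
      num["sunday"] = 1;  num["monday"] = 2;   num["saturday"] = 7
    }
    {
      line = ""
      for (f = 1; f <= NF; f++) {
        n = split($f, item, ",")
        group = ""
        for (i = 1; i <= n; i++)
          group = group (i > 1 ? "," : "") num[item[i]]
        line = line (f > 1 ? ":" : "") group
      }
      print line
    }'
}

printf '%s\n' 'january,february,june:sunday,saturday,monday' | to_numbers
# prints: 1,2,6:1,7,2
```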


All times are GMT -5.