[SOLVED] awk

kcapple · 05-16-2013, 12:01 AM

Hi, is there a way to read multiple files in a single awk command?

For example:

Code:

awk -f file1 file2 file3

I've searched for it on Google, and most results suggest using FNR, but I don't understand how it works. It would be a great help if someone could explain it in simple terms with some examples.

*I'm new to the Unix environment*

Ygrex · 05-16-2013, 12:14 AM

do you mean this: awk -f file1 -f file2 -f file3
or this: cat file2 file3 | awk -f file1
?

kcapple · 05-16-2013, 12:20 AM

I'm sorry, I left out a part. The command is supposed to be like this:

Code:

awk -f awk_script file1 file2 file3

What I'm trying to do is write an awk script that reads multiple files and processes them. The problem is that I don't understand how to do it.

Ygrex · 05-16-2013, 02:11 AM

is this wrong for you?

Code:

cat file[123] | awk -f awk_script

alternatively, provided you want to preserve per-file line numbers (note that this runs awk_script once per file):

Code:

for f in file[123] ; do awk -f awk_script "$f" ; done
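
To see what the two forms above change, here is a small sketch. The file names and contents are made up for illustration, and the one-line awk program stands in for awk_script. With cat, awk sees a single stream on standard input, so FNR never restarts; with the loop, every file gets its own awk process, so the numbering restarts but no variables or arrays are shared between files.

```shell
# Hypothetical sample data in a scratch directory.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/file1"
printf 'c\nd\n' > "$dir/file2"

# One stream on stdin: FNR keeps counting across both files.
piped=$(cat "$dir/file1" "$dir/file2" | awk '{ print FNR, $0 }')

# One awk process per file: FNR restarts at 1 for each file.
looped=$(for f in "$dir/file1" "$dir/file2"; do awk '{ print FNR, $0 }' "$f"; done)

printf '%s\n' "cat | awk:" "$piped" "for loop:" "$looped"
rm -r "$dir"
```

Passing the files directly as arguments (awk -f awk_script file1 file2) gives both: one process, and FNR restarting per file.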

druuna · 05-16-2013, 02:34 AM

Awk can handle multiple input files (the file1 file2 file3 part of your command) by default. In the example given, awk will start reading file1 one line at a time, and when no more lines are present it will continue with file2 and then file3.

The -f awk_script part tells awk to get its commands from a file called awk_script. The programming logic ("commands") can be found inside it.

I'm not sure if and why you need the FNR variable. It holds the number of the current line within the current input file; with multiple input files it starts at 1 again for each new file.

Here's a very basic example:
Content of the input files:

Code:

$ cat file1
a
b
c
$ cat file2
1
2
3
$ cat file3
A
B
C

Content of the awk_script file:

Code:

$ cat awk_script
BEGIN { print "Start awk script" }
{
  print "File line number:", FNR, " - Line content:", $0
}
END { print "End awk script" }

And if you execute the above you will get this:

Code:

$ awk -f awk_script file1 file2 file3
Start awk script
File line number: 1  - Line content: a
File line number: 2  - Line content: b
File line number: 3  - Line content: c
File line number: 1  - Line content: 1
File line number: 2  - Line content: 2
File line number: 3  - Line content: 3
File line number: 1  - Line content: A
File line number: 2  - Line content: B
File line number: 3  - Line content: C
End awk script

Maybe these links will help:

O'Reilly: Sed/Awk

AnanthaP · 05-16-2013, 02:56 AM

Yes. Very much possible.

The syntax is awk '{commands ..}' file1 file2 ..

FNR combined with NR and FILENAME are the way to go.

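NR tells the current record number counted across all files, and FNR the record number within the current file.
E.g. assume that file1 has 2000 records, file2 has 1921 and file3 has 4000. While file1 is being read, NR and FNR both run from 1 to 2000; on the first record of file2, FNR goes back to 1 while NR becomes 2001.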

So long as FNR==NR, you are on the first file.
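
A small sketch of how the two counters behave, using two made-up files of three and two lines instead of the thousands of records above. The file names and the first-file/later-file labels are only for illustration.

```shell
dir=$(mktemp -d)
printf 'x\ny\nz\n' > "$dir/first"
printf 'p\nq\n'    > "$dir/second"

# Print both counters for every record and flag whether NR still equals FNR.
counters=$(awk '{ print NR, FNR, (NR == FNR ? "first-file" : "later-file") }' \
    "$dir/first" "$dir/second")
printf '%s\n' "$counters"
rm -r "$dir"
```

One known catch: if the first file is empty, NR==FNR is true while the second file is read instead.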

FILENAME tells the name of the current file. So you can decide what to do based on this.

Actually the usage is not uncommon.

First file could have a different format/content and so on. So you need to differentiate between the files.
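
As a sketch of deciding by FILENAME: the two files below are hypothetical, one holding a price per name and the other a quantity per name. Records are handled differently depending on which file they come from. The names prices, orders and the array price are assumptions for the example.

```shell
dir=$(mktemp -d)
printf 'Orange 4\nTomatoes 5\n'  > "$dir/prices"   # name price
printf 'Tomatoes 10\nOrange 3\n' > "$dir/orders"   # name quantity

# Records from "prices" fill a lookup table; records from "orders" use it.
costs=$(cd "$dir" && awk '
    FILENAME == "prices" { price[$1] = $2; next }
    FILENAME == "orders" { print $1, $2 * price[$1] }
' prices orders)
printf '%s\n' "$costs"
rm -r "$dir"
```

FILENAME holds the name exactly as it was given on the command line, which is why the sketch changes into the directory before running awk.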

OK

David the H. · 05-16-2013, 01:07 PM

IMO, FNR is best used with only two files, and using it gets more complex when you get into three or more. It was also, I believe, introduced with nawk and so isn't available in the original (pre-nawk) awk, though POSIX specifies it and current awks such as gawk and mawk provide it.

A more robust solution would probably require testing the ARGC/ARGV values that keep track of the input arguments.

http://www.gnu.org/software/gawk/man...o_002dset.html
http://www.gnu.org/software/gawk/man...C-and-ARGV.htm
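
A minimal sketch of that idea for three or more files. It relies only on FNR, FILENAME and ARGC, which are all part of POSIX awk; the counter name fileno and the sample files are made up. The counter goes up at each file's first record, so the program always knows which input file it is in and how many were given.

```shell
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/f1"
printf 'c\n'    > "$dir/f2"
printf 'd\ne\n' > "$dir/f3"

# fileno counts input files: it goes up whenever a file's first record arrives.
# Caveat: an empty file produces no records, so it would not be counted.
tagged=$(cd "$dir" && awk '
    FNR == 1 { fileno++ }
    { print fileno "/" (ARGC - 1), FILENAME, FNR, $0 }
' f1 f2 f3)
printf '%s\n' "$tagged"
rm -r "$dir"
```

ARGC - 1 equals the number of files here only because every argument is a file name; variable assignments on the command line would be counted as well.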

Now, as for your specific question, we really need more than some vaguely-worded half-explanations about what you want to do. Please explain your exact goals in more detail, along with some examples of both input and output, and perhaps a bit of the overall coding context, so that we can understand you better. The exact methods to use often depend very much on the particulars of the coding situation, and without proper background knowledge we can only give you guesses and general suggestions.

kcapple · 05-17-2013, 06:33 AM

I have a few files in this format:
File1

Code:

Red Apple 8 3
Orange 10 4
Tomatoes 10 5

File2

Code:

Orange 5 5
Red Apple 10 4
Tomatoes 11 3

File3

Code:

Tomatoes 5 4
Orange 5
Red Apple 3

$2 is the quantity and $3 is the price. I'm required to process these 3 files with an awk script. I've looked at several sites and they recommend using FNR==NR or FILENAME. I'm not sure how to use them or which is the better option. It would be a great help if you could show me an example and explain the usage (I'm the kind of guy who picks things up slowly).

druuna · 05-17-2013, 06:49 AM

As you might have noticed we are willing to help you, but.....

You still haven't told us what it is that needs to be done with the content of these files.

- Add the similar entries (quantities and/or price) per file?
- Add the similar entries (quantities and/or price) for all files?
- Calculate the cost for each entry per file?
- Calculate the total cost for each file?
- Calculate the total cost for all the files?
- ???
- ???

Please tell us what needs to be done so we can point you in the correct direction (which might or might not need the use of FNR).

kcapple · 05-17-2013, 07:11 AM

I'm really sorry!

I'm not really good at explaining things.
What I'm trying to do is calculate the total cost of each fruit/vegetable across all the files and output the result sorted by the total. For example, the total cost of Red Apple over the files is 8*3 + 10*4 + 3*3=81, Tomatoes is 103, Orange is 80. The output would look like this:

Code:

Tomatoes 103
Red Apple 81
Orange 80

druuna · 05-17-2013, 07:35 AM

I believe I understand what you are after, but I need some more info:

- Are the given examples correct? file3 seems to be missing some information.
- Are all fields separated by spaces? One fruit (Red Apple) has a space in its name, which makes this more challenging.
- Your math is also not correct: 8*3 + 10*4 + 3*3 = 73, not 81.

Please be careful with the examples you post; they need to reflect the actual input used.

EDIT: You mention that the output needs to be sorted: On what field? The name or the highest/lowest total price?
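
The Red Apple sum questioned above can be checked with awk itself. This one-liner only evaluates the expression exactly as it was posted; the variable name sum is just for the example.

```shell
# Evaluate the Red Apple sum as posted (the third term uses 3*3).
sum=$(awk 'BEGIN { print 8*3 + 10*4 + 3*3 }')
printf '%s\n' "$sum"
```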

kcapple · 05-17-2013, 07:45 AM

I'm sorry, I'm too clumsy.

file3:

Code:

Tomatoes 5 4
Orange 5 4
Red Apple 3 4

Yes, the fields are separated by spaces, and you're right about the calculation too.

druuna · 05-17-2013, 08:19 AM

The Red Apple entry makes it more challenging due to the extra field it creates (the Red Apple lines have 4 fields and the rest have 3 fields). But it is possible; here's one way:

Code:

$ cat awk_script
!/^Red/ { fruits[$1]      = fruits[$1] + $2 * $3 }
 /^Red/ { fruits[$1" "$2] = fruits[$1" "$2] + $3 * $4 }
END {
  for ( item in fruits )
    print item, fruits[item]
}

The above code uses an array called fruits to store each fruit and its total cost.
The !/^Red/ line handles lines that do not (the !) start with Red. For those, the name of the fruit ($1) is used as the unique index, and fields 2 and 3 are multiplied and added to the value already present.

The /^Red/ line handles entries that start with Red, but now the index consists of 2 fields ($1 = Red and $2 = Apple). The amount and price are now $3 and $4.

Once all the files are processed, the END block is executed. This prints all the entries in the array (index and value).

A sample run with the 3 files you posted as input:

Code:

$ awk -f awk_script file1 file2 file3 
Tomatoes 103
Red Apple 76
Orange 85

And you might have noticed that FNR isn't needed.
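
If other names can also contain spaces, a variant that does not hard-code Red may be handy. This is only a sketch under one assumption: the last two fields of every line are always quantity and price, and everything before them is the name. The variable name and the loop are additions for the illustration, and the three files are recreated in a scratch directory so the snippet runs on its own.

```shell
dir=$(mktemp -d)
printf 'Red Apple 8 3\nOrange 10 4\nTomatoes 10 5\n' > "$dir/file1"
printf 'Orange 5 5\nRed Apple 10 4\nTomatoes 11 3\n' > "$dir/file2"
printf 'Tomatoes 5 4\nOrange 5 4\nRed Apple 3 4\n'   > "$dir/file3"

# Assumption: last field = price, second to last = quantity, the rest = name.
# The trailing sort only makes the order predictable; for-in order is unspecified.
totals=$(awk '
    NF >= 3 {
        name = $1
        for (i = 2; i <= NF - 2; i++) name = name " " $i
        fruits[name] += $(NF - 1) * $NF
    }
    END { for (item in fruits) print item, fruits[item] }
' "$dir/file1" "$dir/file2" "$dir/file3" | sort)
printf '%s\n' "$totals"
rm -r "$dir"
```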

BTW: You did not answer the question about sorting the output. If that is needed use the sort command. Sorting in (g)awk is possible but cumbersome.
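
One way to do that with sort, sketched here: have awk print a tab between name and total so the space inside Red Apple does not confuse sort, then sort numerically, highest first, on the second tab-separated field. The tab separator is an assumption made for the illustration, and the input files are recreated in a scratch directory so the snippet runs on its own.

```shell
dir=$(mktemp -d)
printf 'Red Apple 8 3\nOrange 10 4\nTomatoes 10 5\n' > "$dir/file1"
printf 'Orange 5 5\nRed Apple 10 4\nTomatoes 11 3\n' > "$dir/file2"
printf 'Tomatoes 5 4\nOrange 5 4\nRed Apple 3 4\n'   > "$dir/file3"
tab=$(printf '\t')

# Same logic as awk_script, but END separates name and total with a tab.
sorted=$(awk '
    !/^Red/ { fruits[$1]        += $2 * $3 }
     /^Red/ { fruits[$1 " " $2] += $3 * $4 }
    END { for (item in fruits) print item "\t" fruits[item] }
' "$dir/file1" "$dir/file2" "$dir/file3" | sort -t "$tab" -k2,2nr)
printf '%s\n' "$sorted"
rm -r "$dir"
```

Dropping the r from -k2,2nr would list the lowest total first.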

kcapple · 05-17-2013, 09:04 AM

Thanks for the explanation and example! It's helpful!! <3