LinuxQuestions.org
Old 05-16-2013, 12:01 AM   #1
kcapple
LQ Newbie
 
Registered: May 2013
Posts: 19

Rep: Reputation: Disabled
awk - how to read multiple files?


Hi, is there a way to read multiple files in a single awk command?

For example:
Code:
awk -f file1 file2 file3
I've searched for this on Google, and most results suggest using FNR, but I don't understand how it works. It would be a great help if someone could explain it in simple terms with some examples.

*I'm new to the Unix environment*

Last edited by kcapple; 05-16-2013 at 12:02 AM.
 
Old 05-16-2013, 12:14 AM   #2
Ygrex
Member
 
Registered: Nov 2004
Location: Russia (St.Petersburg)
Distribution: Debian
Posts: 641

Rep: Reputation: 66
do you mean this: awk -f file1 -f file2 -f file3
or this: cat file2 file3 | awk -f file1
?
 
Old 05-16-2013, 12:20 AM   #3
kcapple
LQ Newbie
 
Registered: May 2013
Posts: 19

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by Ygrex
do you mean this: awk -f file1 -f file2 -f file3
or this: cat file2 file3 | awk -f file1
?
I'm sorry, I missed a part. The command is supposed to be like this:
Code:
awk -f awk_script file1 file2 file3
What I'm trying to do is write an awk script that reads multiple files and processes them. The problem is that I don't understand how to do it.

Last edited by kcapple; 05-16-2013 at 12:22 AM.
 
Old 05-16-2013, 02:11 AM   #4
Ygrex
Member
 
Registered: Nov 2004
Location: Russia (St.Petersburg)
Distribution: Debian
Posts: 641

Rep: Reputation: 66
Does this not do what you want?
Code:
cat file[123] | awk -f awk_script
Alternatively, if you want the line numbers to restart for each file (this runs awk once per file, so any BEGIN and END blocks also run once per file):
Code:
for f in file[123] ; do awk -f awk_script "$f" ; done
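To make the difference between the two commands visible, here is a small sketch. The scratch directory, the two input files, and the one-line awk program are made up for illustration; they are not the original poster's data.

```shell
# Scratch directory with two made-up input files (hypothetical contents).
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/file1"
printf 'c\nd\n' > "$dir/file2"

# Illustrative awk program: number each line, report the count at the end.
prog='{ print NR, $0 } END { print "END after", NR, "lines" }'

# One awk process reading the concatenated stream: NR keeps counting.
piped=$(cat "$dir/file1" "$dir/file2" | awk "$prog")

# One awk process per file: NR restarts and END runs once per file.
looped=$(for f in "$dir"/file[12]; do awk "$prog" "$f"; done)

printf 'cat | awk:\n%s\n\nfor loop:\n%s\n' "$piped" "$looped"
rm -rf "$dir"
```

With the pipe, the lines are numbered 1 to 4 and END fires once; with the loop, each file is numbered 1 to 2 and END fires twice.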
 
Old 05-16-2013, 02:34 AM   #5
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2371
Quote:
Originally Posted by kcapple
I'm sorry, I missed a part. The command is supposed to be like this:
Code:
awk -f awk_script file1 file2 file3
What I'm trying to do is write an awk script that reads multiple files and processes them. The problem is that I don't understand how to do it.
Awk can handle multiple input files by default (the file1 file2 file3 part of the command). In the example given, awk starts reading file1 one line at a time; when no more lines are present, it continues with file2 and then file3.

The -f awk_script part tells awk to get its commands from a file called awk_script. The programming logic ("commands") can be found inside it.

I'm not sure if and why you need the FNR variable. This variable holds the number of the line being processed within the current input file (if multiple input files are used, it starts at 1 again whenever awk starts a new input file).

Here's a very basic example:
Content of the input files:
Code:
$ cat file1
a
b
c
$ cat file2
1
2
3
$ cat file3
A
B
C
Content of the awk_script file:
Code:
$ cat awk_script
BEGIN { print "Start awk script" }
{
  print "File line number:", FNR, " - Line content:", $0
}
END { print "End awk script" }
And if you execute the above you will get this:
Code:
$  awk -f awk_script file1 file2 file3
Start awk script
File line number: 1  - Line content: a
File line number: 2  - Line content: b
File line number: 3  - Line content: c
File line number: 1  - Line content: 1
File line number: 2  - Line content: 2
File line number: 3  - Line content: 3
File line number: 1  - Line content: A
File line number: 2  - Line content: B
File line number: 3  - Line content: C
End awk script
Maybe these links will help:

Last edited by druuna; 05-16-2013 at 02:38 AM.
 
Old 05-16-2013, 02:56 AM   #6
AnanthaP
Member
 
Registered: Jul 2004
Location: Chennai, India
Distribution: UBUNTU 5.10 since Jul-18,2006 on Intel 820 DC
Posts: 583

Rep: Reputation: 121
Yes. Very much possible.

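The syntax is awk '{commands ..}' file1 file2 ..

FNR combined with NR and FILENAME is the way to go.

NR gives the current record number counted across all files, and FNR gives the record number within the current file.
E.g. assume that file1 has 2000 records, file2 has 1921, and file3 has 4000. While awk reads file1, NR and FNR both run from 1 to 2000. On file2, FNR restarts at 1 and runs to 1921 while NR continues from 2001 to 3921. On file3, FNR runs from 1 to 4000 and NR from 3922 to 7921.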

So long as FNR==NR, you are on the first file.
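A tiny sketch of the two counters side by side may help here. The two input files and their contents are invented for the demo; only NR, FNR, and FILENAME are standard awk variables.

```shell
# Two made-up input files in a scratch directory.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/file1"
printf 'c\nd\ne\n' > "$dir/file2"

# Print both counters and label each line using the NR == FNR test.
result=$(cd "$dir" && awk '{
  where = (NR == FNR) ? "first file" : "later file"
  print FILENAME, "NR=" NR, "FNR=" FNR, where
}' file1 file2)

printf '%s\n' "$result"
rm -rf "$dir"
# Prints:
# file1 NR=1 FNR=1 first file
# file1 NR=2 FNR=2 first file
# file2 NR=3 FNR=1 later file
# file2 NR=4 FNR=2 later file
# file2 NR=5 FNR=3 later file
```

Note that the NR == FNR test only identifies the first file reliably if that file is not empty.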

FILENAME tells the name of the current file. So you can decide what to do based on this.

Actually the usage is not uncommon.

First file could have a different format/content and so on. So you need to differentiate between the files.
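As an illustration of treating the first file differently, here is a sketch of the common two-file lookup idiom. The files prices and orders and their layouts are assumptions made up for this example, not the original poster's data.

```shell
# Hypothetical inputs: a price list, then a list of ordered quantities.
dir=$(mktemp -d)
printf 'Orange 4\nTomatoes 5\n' > "$dir/prices"
printf 'Orange 10\nTomatoes 3\n' > "$dir/orders"

result=$(cd "$dir" && awk '
NR == FNR { price[$1] = $2; next }     # first file only: remember the prices
          { print $1, $2 * price[$1] } # later file(s): quantity * price
' prices orders)

printf '%s\n' "$result"
rm -rf "$dir"
# Prints:
# Orange 40
# Tomatoes 15
```

The next statement is what keeps the second rule from also running on lines of the first file.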

OK

Last edited by AnanthaP; 05-16-2013 at 03:01 AM.
 
Old 05-16-2013, 01:07 PM   #7
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1946
IMO, FNR is best used with only two files, and using it gets more complex when you get into three or more. As far as I know, it was introduced with nawk rather than the original awk, but it is part of POSIX awk, so gawk, mawk, and nawk all provide it.

A more robust solution would probably require testing the ARGC/ARGV values that keep track of the input arguments.

http://www.gnu.org/software/gawk/man...o_002dset.html
http://www.gnu.org/software/gawk/man...C-and-ARGV.htm
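For three or more files, one possible sketch is to list the arguments from ARGV in BEGIN and to keep your own file counter that is bumped whenever FNR is 1. The variable name fileno and the one-line input files are assumptions for illustration only.

```shell
# Three made-up one-line input files.
dir=$(mktemp -d)
printf 'a\n' > "$dir/file1"
printf 'b\n' > "$dir/file2"
printf 'c\n' > "$dir/file3"

result=$(cd "$dir" && awk '
BEGIN    { for (i = 1; i < ARGC; i++) print "arg", i, ARGV[i] }
FNR == 1 { fileno++ }                  # a new input file has started
         { print fileno, FILENAME, $0 }
' file1 file2 file3)

printf '%s\n' "$result"
rm -rf "$dir"
# Prints:
# arg 1 file1
# arg 2 file2
# arg 3 file3
# 1 file1 a
# 2 file2 b
# 3 file3 c
```

One caveat: an empty input file produces no records, so the FNR == 1 rule never fires for it and fileno would no longer match the position in ARGV.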

Now, as for your specific question, we really need more than some vaguely-worded half-explanations about what you want to do. Please explain your exact goals in more detail, along with some examples of both input and output, and perhaps a bit of the overall coding context, so that we can understand you better. The exact methods to use often depend very much on the particulars of the coding situation, and without proper background knowledge we can only give you guesses and general suggestions.
 
Old 05-17-2013, 06:33 AM   #8
kcapple
LQ Newbie
 
Registered: May 2013
Posts: 19

Original Poster
Rep: Reputation: Disabled
I have a few files with the following format:
File1
Code:
Red Apple 8 3
Orange 10 4
Tomatoes 10 5
File2
Code:
Orange 5 5
Red Apple 10 4
Tomatoes 11 3
File3
Code:
Tomatoes 5 4
Orange 5
Red Apple 3
$2 is the quantity and $3 is the price. I'm required to process these 3 files with an awk script. I've looked at several sites and they recommend using FNR==NR or FILENAME. I'm not sure how to use them or which is the best option. It would be a great help if you could show me an example and explain the usage (I'm the kind of guy who picks things up slowly).

Last edited by kcapple; 05-17-2013 at 06:36 AM.
 
Old 05-17-2013, 06:49 AM   #9
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2371
As you might have noticed we are willing to help you, but.....

You still haven't told us what it is that needs to be done with the content of these files.

- Add the similar entries (quantities and/or price) per file?
- Add the similar entries (quantities and/or price) for all files?
- Calculate the cost for each entry per file?
- Calculate the total cost for each file?
- Calculate the total cost for all the files?
- ???
- ???

Please tell us what needs to be done so we can point you in the correct direction (which might or might not need the use of FNR).
 
Old 05-17-2013, 07:11 AM   #10
kcapple
LQ Newbie
 
Registered: May 2013
Posts: 19

Original Poster
Rep: Reputation: Disabled
I'm really sorry! I'm not really good at explaining things.
What I'm trying to do is calculate the total cost of each fruit/vegetable across all the files and output the results sorted by the total. For example, the total cost of Red Apple over the three files is 8*3 + 10*4 + 3*3=81, Tomatoes is 103, Orange is 80. The output would look like this:
Code:
Tomatoes 103
Red Apple 81
Orange 80
 
Old 05-17-2013, 07:35 AM   #11
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2371
I do believe I understand what it is you are after, but I do need some more info:

- Are the given examples correct? file3 seems to be missing some information.
- Are all fields separated by spaces? One fruit (Red Apple) has a space in its name, which makes this more challenging.
- Your math is also not correct: 8*3 + 10*4 + 3*3 = 73, not 81.

Please be careful with the examples posted; they need to reflect the actual input used.

EDIT: You mention that the output needs to be sorted: On what field? The name or the highest/lowest total price?

Last edited by druuna; 05-17-2013 at 07:38 AM.
 
Old 05-17-2013, 07:45 AM   #12
kcapple
LQ Newbie
 
Registered: May 2013
Posts: 19

Original Poster
Rep: Reputation: Disabled
I'm sorry I'm too clumsy
file3:
Code:
Tomatoes 5 4
Orange 5 4
Red Apple 3 4
Yes, the fields are separated by spaces, and you're right about the calculation too.
 
Old 05-17-2013, 08:19 AM   #13
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2371
The Red Apple entry makes it more challenging due to the extra field it creates (the Red Apple lines have 4 fields and the rest have 3 fields). But it is possible; here's one way:
Code:
$ cat awk_script
!/^Red/ { fruits[$1]      = fruits[$1] + $2 * $3 }
 /^Red/ { fruits[$1" "$2] = fruits[$1" "$2] + $3 * $4 }
END {
  for ( item in fruits )
    print item, fruits[item]
}
The above code uses an array called fruits to store each fruit and its total cost.
The first rule (!/^Red/) matches lines that do not (the !) start with Red. For those lines the name of the fruit ($1) is used as the unique index, and fields 2 and 3 are multiplied and added to the value already present.

The second rule (/^Red/) matches entries that start with Red, but now the index consists of 2 fields ($1 = Red and $2 = Apple). The amount and price are now $3 and $4.

Once all the files are processed, the END block is executed. This prints all the entries in the array (index and value).

A sample run with the 3 files you posted as input:
Code:
$ awk -f awk_script file1 file2 file3 
Tomatoes 103
Red Apple 76
Orange 85
And you might have noticed that FNR isn't needed.
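If more multi-word names than Red Apple can show up, a possible variation (a sketch, not the script above) is to count fields from the end of the line: the last field is taken as the price, the one before it as the quantity, and everything in front as the name. The array name total, the tab separator in the output, and the scratch files holding the corrected sample data are choices made for this example.

```shell
# Recreate the three sample files (with the corrected file3) in a scratch dir.
dir=$(mktemp -d)
printf 'Red Apple 8 3\nOrange 10 4\nTomatoes 10 5\n' > "$dir/file1"
printf 'Orange 5 5\nRed Apple 10 4\nTomatoes 11 3\n' > "$dir/file2"
printf 'Tomatoes 5 4\nOrange 5 4\nRed Apple 3 4\n' > "$dir/file3"

result=$(cd "$dir" && awk '
NF >= 3 {                              # skip lines missing quantity or price
  name = $1
  for (i = 2; i <= NF - 2; i++)        # rebuild names made of several words
    name = name " " $i
  total[name] += $(NF - 1) * $NF       # quantity * price
}
END { for (name in total) print name "\t" total[name] }
' file1 file2 file3 | LC_ALL=C sort)

printf '%s\n' "$result"
rm -rf "$dir"
```

The sort at the end is only there to give a stable order by name, since the order of a for (name in total) loop is not defined in awk.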

BTW: You did not answer the question about sorting the output. If that is needed use the sort command. Sorting in (g)awk is possible but cumbersome.
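A sketch of the sort approach, assuming the output should be ordered by total cost, highest first: because Red Apple contains a space, the awk part here prints a tab between name and total (a change made for this example) so that sort can pick out the numeric column, and tr turns the tab back into a space afterwards. The scratch files again hold the corrected sample data.

```shell
# Recreate the three sample files (with the corrected file3) in a scratch dir.
dir=$(mktemp -d)
printf 'Red Apple 8 3\nOrange 10 4\nTomatoes 10 5\n' > "$dir/file1"
printf 'Orange 5 5\nRed Apple 10 4\nTomatoes 11 3\n' > "$dir/file2"
printf 'Tomatoes 5 4\nOrange 5 4\nRed Apple 3 4\n' > "$dir/file3"

tab=$(printf '\t')   # literal tab, used as the field separator for sort

result=$(cd "$dir" && awk '
!/^Red/ { fruits[$1]        += $2 * $3 }
 /^Red/ { fruits[$1 " " $2] += $3 * $4 }
END { for (item in fruits) print item "\t" fruits[item] }
' file1 file2 file3 | sort -t "$tab" -k2,2nr | tr '\t' ' ')

printf '%s\n' "$result"
rm -rf "$dir"
# Prints:
# Tomatoes 103
# Orange 85
# Red Apple 76
```

Dropping the r from -k2,2nr would sort from the lowest total to the highest instead.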
 
Old 05-17-2013, 09:04 AM   #14
kcapple
LQ Newbie
 
Registered: May 2013
Posts: 19

Original Poster
Rep: Reputation: Disabled
Thanks for the explanation and example! It's helpful!! <3
 
  

