LinuxQuestions.org
Old 03-11-2010, 10:43 AM   #1
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443
Blog Entries: 3

Rep: Reputation: 723
AWK Multiple Passes


Is it possible to make an AWK script that goes over the file multiple times, every time doing something different to it?
 
Old 03-11-2010, 11:39 AM   #2
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
The gawk manual suggests a function to re-read the file multiple times: http://www.gnu.org/manual/gawk/gawk....ewind-Function. Once, I used something slightly different without using a function:
Code:
NR == 1 {

  #
  #  Retrieve the number of lines in the current file
  #
  ( "cat " FILENAME " | wc -l" ) | getline NL   
  
}

{

  #
  #  Code here
  #

}

FNR == NL {

  #
  #  Rewind up to 4 times (file parsed 5 times).
  #  nextfile leaves this rule immediately, so a plain "if" is enough.
  #  ARGIND is specific to GNU awk.
  #
  if ( ++count < 5 ) {
    ARGC++
    ARGV[ARGIND+1] = FILENAME
    nextfile
  }
  
}
Note that the line count is performed only once, since NR == 1 matches only the first record parsed (you can't do this in the BEGIN section, since FILENAME is not assigned there). By contrast, the rewind is applied every time the end of the file is reached, since FNR is the record number within the current file.
 
Old 03-11-2010, 11:40 AM   #3
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2401
Hi,

Could you elaborate a bit? Maybe give an example of what you want to accomplish?

Doing multiple actions on a specific line, maybe only when meeting certain criteria, can be done in one go (just an example).
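
A minimal sketch of the "one go" idea, for illustration only: a single awk program can hold several rules, and each record is tested against every rule in order. The function name `one_pass` and the three sample actions are assumptions made up for this example, not something from the thread.

```shell
# one_pass FILE: apply three independent actions to each line in one read.
# (Hypothetical example; the actions are placeholders.)
one_pass() {
    awk '
        /^#/ { next }                  # action 1: drop comment lines
        { gsub(/[ \t]+$/, "") }        # action 2: strip trailing whitespace
        NF { print ++n ": " $0 }       # action 3: number the non-empty lines
    ' "$1"
}
```

Called as `one_pass somefile`, it reads the file once and applies all three actions to each record as it goes by.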
 
Old 03-11-2010, 12:43 PM   #4
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Rep: Reputation: 723
Basically I would like it to be like running different awk scripts one after another on the file.

I just wonder if this is possible, or would it be better to resort to a bash script?
 
Old 03-11-2010, 12:57 PM   #5
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Quote:
Originally Posted by MTK358 View Post
Basically I would like it to be like running different awk scripts one after another on the file.

I just wonder if this is possible, or would it be better to resort to a bash script?
Maybe you don't really need to rewind, since you can perform multiple actions on each line - as druuna pointed out. On the other hand, if you think it is a requirement, you can rely on the "count" variable in the example above to perform different actions at each passage. Just take care to initialize "count" to zero in the BEGIN section, otherwise it will be the null string upon the first passage.
 
Old 03-11-2010, 08:37 PM   #6
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Quote:
Originally Posted by MTK358 View Post
Is it possible to make an AWK script that goes over the file multiple times, every time doing something different to it?
Sure. Besides the rewind function in the gawk manual, if you have a specific number of times you want to process the file, e.g. 6 times:
Code:
awk '{
 ...
}' file file file file file file
or, if the number of passes is variable:

Code:
awk 'BEGIN{
 
   while(1){
       while(( getline line <"file") > 0 ){
           print line
       }      
       close("file")
       printf "Quit?: "
       getline choice<"-"
       if (choice~/^(q|Q)$/) exit
   }
}
'
or use a for loop with a start and end range
Code:
awk 'BEGIN{ 
   for(i=1;i<100;i++){
       while(( getline line <"file") > 0 ){
           print line
       }      
       close("file")
       printf "Quit?: "
       getline choice<"-"
       if (choice~/^(q|Q)$/) exit
   }
}
'

or just simply
Code:
for i in {1..100}
do
  awk '
   .....
  ' file
done
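
The "file file file" approach can also be driven by a variable, as a rough sketch: the BEGIN block appends the file name to ARGV as many times as needed, so awk reads it that many times. The function name `repeat_passes` and the variables `passes`, `file` and `pass` are made up for this illustration.

```shell
# repeat_passes N FILE: read FILE N times in a single awk process.
# (Hypothetical helper; it only prints each line tagged with its pass number.)
repeat_passes() {
    awk -v passes="$1" '
        BEGIN {
            file = ARGV[1]                 # the one file given on the command line
            for (i = 2; i <= passes; i++)
                ARGV[ARGC++] = file        # queue it again for each extra pass
        }
        FNR == 1 { pass++ }                # first record of a pass: bump the counter
        { print pass ": " $0 }
    ' "$2"
}
```

For example, `repeat_passes 3 file` reads `file` three times. Note that an empty file never triggers `FNR == 1`, so the counter would not advance in that case.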

Last edited by ghostdog74; 03-11-2010 at 08:44 PM.
 
Old 03-12-2010, 07:39 AM   #7
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Rep: Reputation: 723
But I want it to do a different script each time!
 
Old 03-12-2010, 08:01 AM   #8
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Quote:
Originally Posted by MTK358 View Post
But I want it to do a different script each time!
If the code you want to run at each passage is totally different every time, maybe there is no need to have all in a single awk program. In this case you can stick with running multiple awk statements (or programs).

If your requirement (which, honestly, is still not clear) is to have all the code in a single program, the method I suggested in posts #2 and #5 could be a good starting point. You can select the code to run at each pass by means of the counter variable "count". For example, something like:
Code:
<omitted>

count == 0 {
   #
   # code to run at first pass, here
   #
}

count == 1 {
   #
   # code to run at second pass, here
   #
}

<omitted>
or you can use explicit if conditions inside any block (action).
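
As a self-contained illustration of per-pass rules (a sketch under assumptions, not the code omitted above): here the file is simply named twice on the command line, and a counter called `pass` is bumped whenever `FNR == 1`. The first pass only collects a total; the second pass uses it. The function name `two_pass` and the "share of the total" task are invented for the example.

```shell
# two_pass FILE: pass 1 sums column 1, pass 2 prints each value
# together with its share of that sum. (Hypothetical example.)
two_pass() {
    awk '
        FNR == 1  { pass++ }                  # the file (re)starts: next pass
        pass == 1 { total += $1; next }       # first pass: accumulate only
        pass == 2 { printf "%s %.2f\n", $1, (total ? $1 / total : 0) }
    ' "$1" "$1"
}
```

Because the counter starts at 1 on the first record, the rules read `pass == 1` and `pass == 2` rather than `count == 0` and `count == 1`; the principle is the same.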
 
Old 03-12-2010, 08:13 AM   #9
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Rep: Reputation: 723
Basically I wanted the equivalent of this:

Code:
mv oldfile tempfile
awk -f script1.awk tempfile > newfile
mv newfile tempfile
awk -f script2.awk tempfile > newfile
mv newfile tempfile
awk -f script3.awk tempfile > newfile
rm tempfile
 
Old 03-12-2010, 08:18 AM   #10
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,248
Blog Entries: 8

Rep: Reputation: 235
Do you mean many awk scripts in many awk processes simultaneously processing a single file? Or a single awk process processing many files ... processing another file while holding (not yet closing) a previous file?

Edit: Please ignore.. message was delayed.

Last edited by konsolebox; 03-12-2010 at 08:19 AM.
 
Old 03-12-2010, 09:15 AM   #11
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Quote:
Originally Posted by MTK358 View Post
Basically I wanted the equivalent of this:

Code:
mv oldfile tempfile
awk -f script1.awk tempfile > newfile
mv newfile tempfile
awk -f script2.awk tempfile > newfile
mv newfile tempfile
awk -f script3.awk tempfile > newfile
rm tempfile
OK. The difference from your earlier description is that each pass must run on a different file, namely the output of the previous pass, not on the same unchanged file. In other words, each of these awk programs should progressively modify the original content of the file, shouldn't it?

In my opinion, the suggestion druuna gave in post #3 is the most suitable: try to find a logic that parses the original file only once, performing multiple actions on the same record, and do all the modifications at once.

Otherwise, you have to move the output to the input file before the code rewinds it. GNU awk offers many ways to do this, with "using getline from a pipe" perhaps being the most suitable. But honestly, I'm afraid this is a waste of (machine and human) time.
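
For the chain of scripts in post #9, where each stage consumes the previous stage's output, a shell pipeline is one possible equivalent that needs no temporary files. The sketch below is an illustration only: the function name `chain_stages` and the three inline programs are placeholders standing in for script1.awk, script2.awk and script3.awk.

```shell
# chain_stages FILE: run three awk programs back to back, each reading
# the output of the one before it. (Hypothetical placeholder stages.)
#   stage 1: drop empty lines
#   stage 2: upper-case every line
#   stage 3: number the surviving lines
chain_stages() {
    awk 'NF' "$1" |
    awk '{ print toupper($0) }' |
    awk '{ print NR ": " $0 }'
}
```

With the real scripts this would presumably look like `awk -f script1.awk oldfile | awk -f script2.awk | awk -f script3.awk > newfile`. Unlike the mv version, it leaves oldfile in place.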
 
Old 03-12-2010, 11:13 PM   #12
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 9,431
Blog Entries: 4

Rep: Reputation: 3375
Several things to consider here:

(1) When you expressed "what you wanted to do" in terms of a series of invocations of awk, followed by a series of mv commands ... well, what's so wrong with that? What if you simply wrote one script that generated another script as output, then executed the generated script? If that expression of the problem is sensible and effective and easy for a person to understand ... "Go for it! You're done!"

(2) TMTOWTDI = There's More Than One Way To Do It. This, of course, is a fundamental mantra of the Perl community, and they just happen to have a programming-language that is, in some ways (and yet, not quite) a big-brother of awk. Still, the essential idea of their "mantra" is well worth considering. There is, always, "more than one way to do it."

So... go for something that is simple, understandable, maintainable, and "it works." These days, the computer will accomplish the task with blistering speed, no matter how you write it. If you can easily explain the rationale behind what you did, and if "you can, repeatably, do it again at will," then it's perfectly sensible to just implement that solution and call it "done."

The FISI Rule = " it, ship it!"

Last edited by sundialsvcs; 03-12-2010 at 11:15 PM.
 
Old 03-12-2010, 11:31 PM   #13
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454
Quote:
Originally Posted by sundialsvcs View Post
...
These days, the computer will accomplish the task with blistering speed, no matter how you write it.
...
Well, no. I still remember a coworker's program written in C++ running orders of magnitude slower than my Perl script doing the same thing, because the C++ program used an O(N^2) algorithm.

So the manager cursed and rewrote that C++ program.
 
Old 03-12-2010, 11:32 PM   #14
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Quote:
Originally Posted by sundialsvcs View Post
(2) TMTOWTDI = There's More Than One Way To Do It.
Here's mine: TMTOWTDIBNAATB = There's More Than One Way To Do It, But Not All Are The Best.
 
Old 03-13-2010, 02:03 AM   #15
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,248
Blog Entries: 8

Rep: Reputation: 235
Quote:
Originally Posted by sundialsvcs View Post
So... go for something that is simple, understandable, maintainable, and "it works." These days, the computer will accomplish the task with blistering speed, no matter how you write it. If you can easily explain the rationale behind what you did, and if "you can, repeatably, do it again at will," then it's perfectly sensible to just implement that solution and call it "done."
Good rationale but I hope it's not an alibi for not being able to make code better.
 
  


All times are GMT -5.