LinuxQuestions.org
Forums > Non-*NIX Forums > Programming
03-19-2011, 03:25 PM #1
Richard Rahl
gawk getline eats next line for processing


Good afternoon everyone,

I'm new to awk/gawk, but I'm working on a simple data-processing script.

My script uses a loop with getline to check the value on the next line and decide whether it's time to terminate the loop.

This works dandy, but the problem is that getline eats that line, so it never gets run through the script's rules from the top (even though I want it to be). To illustrate what I mean, consider this simple gawk script:

Code:
{   
  print $0
}
which prints every line in the file. Now consider this next script:

Code:
{   
  print $0
  getline
  #Do stuff with next line
}
This will print only every other line of the file (the 1st, 3rd, 5th, ...), since getline eats the next line of input and that line doesn't get run through the script proper.
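A quick way to reproduce this is to feed the same rule a numbered input. This is a minimal sketch; the shell function name `every_other` is hypothetical and only wraps the rule from the post above:

```shell
# Hypothetical helper: runs the two-statement rule from the post on stdin.
every_other() {
    awk '{
        print $0
        getline   # overwrites $0 with the next record; the rule then ends,
                  # so that record is never printed
    }'
}
```

For example, `printf '1\n2\n3\n4\n5\n' | every_other` prints 1, 3 and 5. Lines 2 and 4 are read by getline and dropped when the rule ends; on line 5 getline hits end of input, returns 0 and leaves $0 unchanged.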

My question is basically: how do I reset the stream so that, after I call getline, the line it read is still the next line processed by the script?

Thanks for your help!
 
03-19-2011, 04:11 PM #2
alunduil
Are you trying to manipulate the line you grab with getline outside of the gawk script? Can we see the larger context of what you're trying to do? At first glance it seems like you should be using gawk's redirection, but I'm simply guessing at this point.

Regards,

Alunduil
 
03-19-2011, 04:23 PM #3
Richard Rahl
Original Poster
restructuring to the rescue

Imagine a file with a bunch of lines like this:

FUN
bla N
FUN
bla2
bla3 N
FUN bla4 N

I want to turn it into this:
FUN bla N
FUN bla2;bla3 N
FUN bla4 N

So I was reading the next line to see whether it was a "FUN" record, in order to break out of the loop.
The problem was that this next "FUN" line would get totally ignored. In other words, the line starting with "FUN" that was used to break the loop never got processed by the script proper, so the "FUN bla4 N" line wouldn't get printed.

Code:
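#!/usr/bin/gawk -f
# Match all lines containing "FUN"
/FUN/ {
    # It's a FUN record: start a new output line
    currentLine = $0
    sep = " "                  # first continuation is joined with a space
    do {
        getline
        if ($1 == "FUN") {
            # Into the next record: this line is consumed here and
            # never gets run through the /FUN/ rule itself
            break
        }
        currentLine = currentLine sep $0
        sep = ";"              # later continuations are joined with ";"
    } while ($NF != "N")
    print currentLine
}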
Granted, minutes after posting I realized that using a do-while loop was the mistake: putting the condition in a plain while loop solved the problem, since getline is then never called when the "N" record is already present on the current line! Silly me.
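The restructuring described above can be sketched as follows. This is a reconstruction, not the poster's actual script: it assumes every record ends with a line whose last field is N, recognises headers with `$1 == "FUN"`, drops the FUN check inside the loop, and wraps the program in a hypothetical shell function `join_while`.

```shell
# Hypothetical helper: the while-loop version of the joining script, on stdin.
join_while() {
    awk '
    $1 == "FUN" {
        currentLine = $0
        sep = " "
        # Test the terminator BEFORE reading, so a one-line record such
        # as "FUN bla4 N" never calls getline and nothing is swallowed.
        while ($NF != "N") {
            if ((getline) <= 0)
                break                # end of input: stop rather than loop forever
            currentLine = currentLine sep $0
            sep = ";"                # later lines are joined with ";"
        }
        print currentLine
    }'
}
```

On the sample data above it prints `FUN bla N`, `FUN bla2;bla3 N` and `FUN bla4 N`. A FUN line appearing before the N terminator would still be swallowed, so this only works while the data respects that assumption.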

Anyway, I suppose it's still an interesting question once you understand the context. Basically, getline eats the next line of input so that it can't be processed by the script from the top, even if you want it to be, so I was wondering how, after calling getline, you can make the script process that line as if getline had never been called.
 
03-19-2011, 05:14 PM #4
hda7
It appears (from reading the gawk manual) that you cannot perform normal processing of the line stolen with getline. From the gawk manual:
Quote:
When `getline' changes the value of `$0' and `NF', `awk' does _not_ automatically jump to the start of the program and start testing the new record against every pattern. However, the new record is tested against any subsequent rules.
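The rule ordering the manual describes can be checked with a small sketch (the function name `rule_order` is hypothetical, chosen only for this illustration): the middle rule calls getline, and only the rule after it ever sees the record that getline pulled in.

```shell
# Hypothetical helper: three rules, with getline in the middle one.
rule_order() {
    awk '
    { print "rule 1: " $0 }   # only sees records fetched by the main loop
    { getline }               # reads the next record into $0 and advances NR
    { print "rule 3: " $0 }   # sees the record that getline just read
    '
}
```

Fed the four lines a, b, c and d, it prints `rule 1: a`, `rule 3: b`, `rule 1: c`, `rule 3: d`: records b and d reach the third rule but never the first.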
 
03-19-2011, 05:15 PM #5
alunduil
Interesting. On a side note, perhaps sed would be an easier tool for the parsing you want to do?
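One possible sed version, offered as a sketch rather than anything posted in the thread. It assumes, as in the sample data, that every record ends with a line ending in " N" and that a header on its own line is exactly FUN; the wrapper name `join_sed` is hypothetical.

```shell
# Hypothetical helper: joins continuation lines with sed instead of awk.
join_sed() {
    # :a  / N$/!{N;ba}  -> keep appending the next line until the
    #                      pattern space ends in " N"
    # s/\n/;/g          -> join the collected lines with ";"
    # s/^FUN;/FUN /     -> the line after a bare FUN header gets a space instead
    sed -e ':a' -e '/ N$/!{N;ba' -e '}' \
        -e 's/\n/;/g' \
        -e 's/^FUN;/FUN /'
}
```

On the sample data it produces the same three lines as the awk versions. Splitting the script across several `-e` options keeps the branch label and the closing brace on separate lines, which POSIX sed expects.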

Regards,

Alunduil
 
03-20-2011, 12:28 AM #6
grail
Getline is inappropriate here, as all calls to it change NR and FNR (and sometimes other variables as well), so the next iteration will always read the line after the one getline consumed.

Your options are either to save the current line each time and do your comparison on the next pass through the main loop, or to try something like:
Code:
awk 'ORS=/ N$/?"\n":(/[0-9]$/)?";":" "' file
Of course, how well this works will depend on how your data is set up.
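The first option, saving the current record and deciding what to do when the next line arrives, can be sketched like this. It is an illustration under stated assumptions, not grail's code: record headers are assumed to have FUN as their first field, and the wrapper name `join_buffered` is hypothetical. Because no getline is involved, every line goes through the rules exactly once.

```shell
# Hypothetical helper: buffer each record and flush it when the next one starts.
join_buffered() {
    awk '
    $1 == "FUN" {
        # A new record begins: print the previous one, then start buffering
        if (buf != "") print buf
        buf = $0
        sep = " "       # first continuation line is joined with a space
        next
    }
    {
        buf = buf sep $0
        sep = ";"       # later continuation lines are joined with ";"
    }
    END { if (buf != "") print buf }   # flush the final record
    '
}
```

On the sample data from post #3 it prints `FUN bla N`, `FUN bla2;bla3 N` and `FUN bla4 N`. The END rule is what prints the last record, since no further FUN line arrives to trigger the flush; this version also does not depend on the trailing N.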
 
03-22-2011, 09:03 AM #7
hda7
Quote:
Originally Posted by grail View Post
Getline is inappropriate here, as all calls to it change NR and FNR (and sometimes other variables as well), so the next iteration will always read the line after the one getline consumed.
That's not strictly true. You can get input from other files with the "getline <filename" syntax, which doesn't modify NR or FNR (but does modify $0 and NF), and if you use the "getline variable <filename" variant, only the variable is modified.
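The difference between the getline variants can be seen in a short sketch. The function name `nr_demo` is hypothetical, and it must be given a regular file rather than standard input because it re-opens FILENAME:

```shell
# Hypothetical helper: shows which variables each getline form changes.
nr_demo() {
    awk 'FNR == 1 {
        getline line < FILENAME      # sets only line
        print "getline line <file: NR=" NR ", line=" line ", $0=" $0
        getline < FILENAME           # sets $0 and NF, not NR or FNR
        print "getline <file: NR=" NR ", $0=" $0
        getline                      # reads the main input: NR and FNR advance
        print "plain getline: NR=" NR ", $0=" $0
    }' "$1"
}
```

For a file containing the lines a, b and c it prints `getline line <file: NR=1, line=a, $0=a`, then `getline <file: NR=1, $0=b` (the auxiliary reader's second line, with NR still 1), then `plain getline: NR=2, $0=b`.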
It is even possible to read from the same file that is currently being processed, as the following example shows:
Code:
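FNR == 1 {
    # A second reader on the current input file, with its own position
    AUXFILE = FILENAME
    getline line <AUXFILE   # sets only "line"; $0, NF, NR and FNR are untouched
    print
    print "Aux line 1: " line
    getline line <AUXFILE
    print "Aux line 2: " line
}
FNR == 2 {
    print
}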
though, since the auxiliary reader starts from the beginning of the file, this may not be very useful.
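The same technique does give a way to look at the next line without consuming it, which is close to what the original question asked for. A minimal sketch, assuming a single regular input file (not stdin) that is not named twice on the command line; the function name `peek_demo` and the `<EOF>` marker are illustrative only:

```shell
# Hypothetical helper: print each line together with the line after it.
peek_demo() {
    awk '
    FNR == 1 {
        # Open a second, independent reader on the same file and move it
        # past line 1, so it stays one line ahead of the main input.
        getline peek < FILENAME
    }
    {
        # "getline var < file" sets var (and, in gawk, RT) but leaves
        # $0, NF, NR and FNR alone, so the main loop still sees every line.
        has_next = ((getline peek < FILENAME) > 0)
        print FNR ": " $0 " (next: " (has_next ? peek : "<EOF>") ")"
    }' "$1"
}
```

For a file with the lines a, b and c it prints `1: a (next: b)`, `2: b (next: c)` and `3: c (next: <EOF>)`. Each line is still handled by the main loop; the second reader merely stays one line ahead.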
 
03-22-2011, 10:09 AM #8
grail
Well, in a way you are still proving my point, although I do see where you are coming from. Whilst in your example getline is not altering NR and FNR for the original file, it is in fact altering these values for the file being used by getline.
 
03-22-2011, 10:44 AM #9
hda7
Quote:
Originally Posted by grail View Post
Well, in a way you are still proving my point, although I do see where you are coming from. Whilst in your example getline is not altering NR and FNR for the original file, it is in fact altering these values for the file being used by getline.
It is not correct to associate NR with any particular file, as it is a global count of the number of records read across all normal input files (not including "getline <filename" files).
Also, while "getline <filename" does advance the file pointer for the file it is reading, you can close() the file and start reading from the beginning again.
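Both points can be checked in a BEGIN rule, where no main input has been read yet. This is a sketch; the function name `rewind_demo` is hypothetical and the file name is taken from ARGV[1]:

```shell
# Hypothetical helper: read two lines, close the file, and read again.
rewind_demo() {
    awk 'BEGIN {
        f = ARGV[1]
        getline first < f
        getline second < f
        close(f)              # discard the reader and its position in f
        getline again < f     # reopened, so reading starts at line 1 again
        print first, second, again, "NR=" NR
    }' "$1"
}
```

For a file containing a, b and c it prints `a b a NR=0`: close() sends the next read back to line 1, and none of the redirected reads are counted in NR.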
 
  


All times are GMT -5.
