LinuxQuestions.org

Richard Rahl 03-19-2011 04:25 PM

gawk getline eats next line for processing
 
Good afternoon everyone,

I'm new to awk/gawk but I'm working on a simple data processing script.

My script uses a loop with getline to check for the value on the next line to decide if it's time to terminate the loop.

This works dandy, but the problem is that getline eats that line, which then isn't processed by the rules in the remainder of the script (even though I want it to be). To illustrate what I mean, consider this simple gawk script:

Code:


{
  print $0
}

which prints every line in the file. Now consider this next script:

Code:


{
  print $0
  getline
  # Do stuff with next line
}

This will only print every *second* line in the file, since getline ate the next line of input and it doesn't get run through the script proper.

My question is basically: how do I reset the stream so that, after I do a getline, the line read by getline is still the next line processed by the script?

Thanks for your help!

alunduil 03-19-2011 05:11 PM

Are you trying to manipulate the line you grab with getline outside of the gawk script? Can we see the larger context of what you're trying to do? It seems at first glance like you should be using gawk's redirection, but I'm simply guessing at this point.

Regards,

Alunduil

Richard Rahl 03-19-2011 05:23 PM

restructuring to the rescue
 
Imagine a file with a bunch of lines like this:

FUN
bla N
FUN
bla2
bla3 N
FUN bla4 N

I want to turn it into this:
FUN bla N
FUN bla2;bla3 N
FUN bla4 N

So I was reading the next line to see whether it was a "FUN" record, in order to break the loop.
The problem was that the next "FUN" line would get totally ignored. In other words, the line starting with "FUN" that was used to break the loop didn't get processed by the script proper, and so the "FUN bla4 N" line wouldn't get printed.

Code:

#!/usr/bin/gawk -f
#Match all lines matching "FUN"
/FUN/ { 
    #It's a fun record
    currentLine = $0
    do {
      getline
      if ($1 == "FUN") {
        #Into the next record
        break
      }
      currentLine = currentLine";"$0
    } while ($(NF) != "N")
    print currentLine
}

Granted, minutes after posting I realized that using a do-while loop was the mistake. Putting the condition in a plain while loop solved the problem, since getline is never called if the "N" field is already present on the current line! Silly me.
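
A minimal sketch of that restructuring, not the exact script from the post. It assumes records start with "FUN" at the beginning of a line and end with a last field of "N"; the variable names `record` and `first`, the temporary input file, and the space-then-semicolon separator (chosen to match the desired output shown earlier) are illustrative assumptions.

```shell
# Sample input from the thread, written to a temporary file (path is arbitrary).
input=$(mktemp)
printf '%s\n' 'FUN' 'bla N' 'FUN' 'bla2' 'bla3 N' 'FUN bla4 N' > "$input"

result=$(awk '
/^FUN/ {
    record = $0
    first = 1
    # Test the terminator before reading, so getline never runs
    # when the current line already ends in "N".
    while ($NF != "N" && (getline) > 0) {
        record = record (first ? " " : ";") $0
        first = 0
    }
    print record
}' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

Checking the return value of getline also keeps the loop from spinning forever if the file ends in the middle of a record.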

Anyway, I suppose it's still an interesting question if you understand the context. Basically "getline" eats the next line of input so that it can't be processed by the script, even if you want it to be, so I was wondering how, even if you use getline, you can make the script process the next line as if "getline" hadn't been called.

hda7 03-19-2011 06:14 PM

It appears (from reading the gawk manual) that you cannot perform normal processing of the line stolen with getline. From the gawk manual:
Quote:

When `getline' changes the value of `$0' and `NF', `awk' does _not_ automatically jump to the start of the program and start testing the new record against every pattern. However, the new record is tested against any subsequent rules.
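
To illustrate the last sentence of that quote, here is a small sketch with two rules. The input words and the message text are made up for the demonstration.

```shell
result=$(printf '%s\n' one two three four | awk '
# Rule 1: consume the following line with getline.
{ print "rule 1 saw: " $0; getline }
# Rule 2: comes later in the program, so it is tested against
# the record that getline just loaded.
{ print "rule 2 saw: " $0 }
')
printf '%s\n' "$result"
```

Rule 1 only ever sees "one" and "three", while rule 2 only sees "two" and "four". So a line fetched by getline can still be processed, but only by rules placed after the one that called getline.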

alunduil 03-19-2011 06:15 PM

Interesting. On a side note, perhaps sed would be an easier way to do the parsing you want?

Regards,

Alunduil

grail 03-20-2011 01:28 AM

Getline is inappropriate as all calls to it change NR and FNR, sometimes other items as well, so the following iteration will always read the following line.

Your options are either to save the current line each time and then do your comparison the next time through, or you could try something like:
Code:

awk 'ORS=/ N$/?"\n":(/[0-9]$/)?";":" "' file
Of course, how well this works will depend on how your data is set up.
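
The first option (carrying state from one line to the next instead of calling getline) could look roughly like this. It is only a sketch under the same assumptions about the data as above: a record opens with a line starting with "FUN" and closes on a line whose last field is "N". The names `record` and `sep` are invented for the example.

```shell
result=$(printf '%s\n' 'FUN' 'bla N' 'FUN' 'bla2' 'bla3 N' 'FUN bla4 N' | awk '
# A line starting with FUN opens a new record; flush any unfinished one first.
/^FUN/ {
    if (record != "") print record
    record = $0
    sep = " "
    if ($NF == "N") { print record; record = "" }
    next
}
# Any other line is appended to the record in progress.
{
    record = record sep $0
    sep = ";"
    if ($NF == "N") { print record; record = "" }
}
END { if (record != "") print record }
')
printf '%s\n' "$result"
```

Because every line goes through the normal pattern matching, no line is ever skipped, which sidesteps the original problem entirely.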

hda7 03-22-2011 10:03 AM

Quote:

Originally Posted by grail (Post 4296693)
Getline is inappropriate as all calls to it change NR and FNR, sometimes other items as well, so the following iteration will always read the following line.

That's not strictly true. You can get input from other files with the "getline <filename" syntax, which doesn't modify NR or FNR (but does modify $0 and NF), and if you use the "getline variable <filename" variant, only variable is modified.
It is actually possible to get the input of the file being read, as the following example shows:
Code:

FNR == 1 {
    AUXFILE = FILENAME
    getline line <AUXFILE
    print
    print "Aux line 1: " line
    getline line <AUXFILE
    print "Aux line 2: " line
}
FNR == 2 {
    print
}

though since the auxiliary file starts from the beginning, this may not be very useful.

grail 03-22-2011 11:09 AM

Well in a way you are still proving my point, although I do see where you are coming from. Whilst in your example getline is not altering NR and FNR of original file it is in fact altering these values for the file being used by getline.

hda7 03-22-2011 11:44 AM

Quote:

Originally Posted by grail (Post 4299174)
Well in a way you are still proving my point, although I do see where you are coming from. Whilst in your example getline is not altering NR and FNR of original file it is in fact altering these values for the file being used by getline.

It is not correct to associate NR with any particular file, as it is a global count of the number of records read across all normal input files (not including "getline <filename" files).
Also, while "getline <filename" does advance the file pointer for the file it is reading, you can close() the file and start reading from the beginning again.
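
A short sketch of both points, using a throwaway two-line file. The file contents and the awk variable name `aux` are assumptions made for the demonstration.

```shell
aux=$(mktemp)
printf '%s\n' alpha beta > "$aux"

result=$(awk -v aux="$aux" '
BEGIN {
    getline line < aux; print "first read: " line
    getline line < aux; print "second read: " line
    close(aux)                      # the next read reopens the file from the top
    getline line < aux; print "after close: " line
    print "NR is still " NR         # getline var < file leaves NR untouched
}')
printf '%s\n' "$result"
rm -f "$aux"
```

The read after close() returns "alpha" again, and NR stays at 0 throughout because no record was read from the main input.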


All times are GMT -5.