03-19-2011, 03:25 PM
|
#1
|
Member
Registered: Aug 2006
Location: Cape Breton, D'Hara
Distribution: Ubuntu, Redhat
Posts: 83
Rep:
|
gawk getline eats next line for processing
Good afternoon everyone,
I'm new to awk/gawk but I'm working on a simple data processing script.
My script uses a loop with getline to check for the value on the next line to decide if it's time to terminate the loop.
This works dandy, but the problem is that getline eats that line, which then isn't processed by the rules in the remainder of the script (even though I want it to be). To illustrate what I mean, consider a simple gawk script consisting of the single rule { print $0 }, which prints every line in the file. Now consider this next script:
Code:
{
    print $0
    getline
    # Do stuff with next line
}
This will only print every *second* line in the file, since getline ate the next line of input and it doesn't get run through the script proper.
My question is basically: how do I reset the stream such that, after I do a getline, the line read by getline continues to be the next line executed by the script?
Thanks for your help!
|
|
|
03-19-2011, 04:11 PM
|
#2
|
Member
Registered: Feb 2005
Location: San Antonio, TX
Distribution: Gentoo
Posts: 684
Rep:
|
Are you trying to manipulate the line you grab with getline outside of the gawk script? Can we see the larger context of what you're trying to do? It seems at first glance like you should be using gawk's redirection, but I'm simply guessing at this point.
Regards,
Alunduil
|
|
|
03-19-2011, 04:23 PM
|
#3
|
Member
Registered: Aug 2006
Location: Cape Breton, D'Hara
Distribution: Ubuntu, Redhat
Posts: 83
Original Poster
Rep:
|
restructuring to the rescue
Imagine a file with a bunch of lines like this:
FUN
bla N
FUN
bla2
bla3 N
FUN bla4 N
I want to turn it into this:
FUN bla N
FUN bla2;bla3 N
FUN bla4 N
So I was reading the next line to see if it was a "FUN" record in order to break the loop.
The problem was that the next "FUN" line would get totally ignored. In other words, the line starting with "FUN" that was used to break the loop didn't get processed by the script proper, so the "FUN bla4 N" line wouldn't get printed.
Code:
#!/usr/bin/gawk -f
# Match all lines matching "FUN"
/FUN/ {
    # It's a fun record
    currentLine = $0
    do {
        getline
        if ($1 == "FUN") {
            # Into the next record
            break
        }
        currentLine = currentLine";"$0
    } while ($(NF) != "N")
    print currentLine
}
Granted, minutes after posting I realized that using a do-while loop was stupid. Putting the condition in a while loop solved the problem, since getline is never called if the "N" record is present on the current line! Silly me.
Anyway, I suppose it's still an interesting question if you understand the context. Basically "getline" eats the next line of input so that it can't be processed by the script, even if you want it to be, so I was wondering how, even if you use getline, you can make the script process the next line as if "getline" hadn't been called.
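For reference, here is a minimal sketch of the restructured loop described above, with the exit condition tested before getline is called. The sample input is the one from this post. The sep variable and the anchored /^FUN/ pattern are assumptions added here so that the output matches the desired format; they are not taken from the original script.

```shell
# Sample input from this post, piped into awk (plain POSIX awk features only).
out=$(printf 'FUN\nbla N\nFUN\nbla2\nbla3 N\nFUN bla4 N\n' | awk '
/^FUN/ {
    currentLine = $0
    sep = " "                      # hypothetical helper: space after FUN, then ";"
    # Test the current record first; getline runs only if it does not end in "N".
    while ($NF != "N" && (getline) > 0) {
        currentLine = currentLine sep $0
        sep = ";"
    }
    print currentLine
}')
printf '%s\n' "$out"
# FUN bla N
# FUN bla2;bla3 N
# FUN bla4 N
```

Because a record that already ends in "N" never triggers getline, the following "FUN" line is left for awk's main loop to read and match normally.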
|
|
|
03-19-2011, 05:14 PM
|
#4
|
Member
Registered: May 2009
Distribution: Debian wheezy
Posts: 252
Rep:
|
It appears (from reading the gawk manual) that you cannot perform normal processing of the line stolen with getline. From the gawk manual:
Quote:
When `getline' changes the value of `$0' and `NF', `awk' does _not_ automatically jump to the start of the program and start testing the new record against every pattern. However, the new record is tested against any subsequent rules.
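A small sketch of what the quoted passage means in practice, assuming a four-line input invented for illustration: the rule that calls getline does not restart the program, but a rule placed after it sees the record that getline loaded.

```shell
out=$(printf 'a\nb\nc\nd\n' | awk '
NR % 2 == 1 { getline }            # replaces $0 with the next record
{ print "later rule saw: " $0 }    # runs on whatever $0 holds now
')
printf '%s\n' "$out"
# later rule saw: b
# later rule saw: d
```

Lines "a" and "c" are never printed, because by the time the second rule runs they have already been replaced in $0.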
|
|
|
|
03-19-2011, 05:15 PM
|
#5
|
Member
Registered: Feb 2005
Location: San Antonio, TX
Distribution: Gentoo
Posts: 684
Rep:
|
Interesting. On a side note, perhaps sed would be an easier way to do the parsing you want?
Regards,
Alunduil
|
|
|
03-20-2011, 12:28 AM
|
#6
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,037
|
Getline is inappropriate as all calls to it change NR and FNR, sometimes other items as well, so the following iteration will always read the following line.
Your options are either to save the current line each time and then do your comparison on the next time through, or you could try something like:
Code:
awk 'ORS=/ N$/?"\n":(/[0-9]$/)?";":" "' file
Of course, how well this works will depend on how your data is set up.
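The first option (saving the line and deciding on the next pass) could look roughly like the sketch below. It avoids getline entirely. The variable names buf and sep are hypothetical, and the sample input is the one posted earlier in the thread.

```shell
out=$(printf 'FUN\nbla N\nFUN\nbla2\nbla3 N\nFUN bla4 N\n' | awk '
/^FUN/ {                           # a new record starts: flush the saved one
    if (buf != "") print buf
    buf = $0; sep = " "
    next
}
{ buf = buf sep $0; sep = ";" }    # continuation line: append to the saved record
END { if (buf != "") print buf }   # flush the last record
')
printf '%s\n' "$out"
# FUN bla N
# FUN bla2;bla3 N
# FUN bla4 N
```

Since every line goes through the normal rule matching, no input is ever skipped; the cost is that each record is printed one step late, which is why the END rule is needed.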
|
|
|
03-22-2011, 09:03 AM
|
#7
|
Member
Registered: May 2009
Distribution: Debian wheezy
Posts: 252
Rep:
|
Quote:
Originally Posted by grail
Getline is inappropriate as all calls to it change NR and FNR, sometimes other items as well, so the following iteration will always read the following line.
|
That's not strictly true. You can get input from other files with the "getline < filename" syntax, which doesn't modify NR or FNR (but does modify $0 and NF), and if you use the "getline variable < filename" variant, only variable is modified.
It is actually possible to get the input of the file being read, as the following example shows:
Code:
FNR == 1 {
    AUXFILE = FILENAME
    getline line <AUXFILE
    print
    print "Aux line 1: " line
    getline line <AUXFILE
    print "Aux line 2: " line
}
FNR == 2 {
    print
}
though since the auxiliary file starts from the beginning, this may not be very useful.
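To make the difference between the two getline forms concrete, here is a short sketch using a throwaway three-line file (the file contents and the variable names plain, redir and aux are invented for illustration):

```shell
tmp=$(mktemp)
printf 'x\ny\nz\n' > "$tmp"
# Plain getline advances the main input, so NR goes from 1 to 2.
plain=$(awk 'NR == 1 { getline; print NR, $0 }' "$tmp")
# getline var < file reads a separate stream; NR and $0 stay as they were.
redir=$(awk -v aux="$tmp" 'NR == 1 { getline line < aux; print NR, $0, line }' "$tmp")
rm -f "$tmp"
printf '%s\n%s\n' "$plain" "$redir"
# 2 y
# 1 x x
```

In the second command the main input is still on its first record after the read, so the following iteration picks up "y" as usual.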
|
|
|
03-22-2011, 10:09 AM
|
#8
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,037
|
Well in a way you are still proving my point, although I do see where you are coming from. Whilst in your example getline is not altering NR and FNR of the original file, it is in fact altering these values for the file being used by getline.
|
|
|
03-22-2011, 10:44 AM
|
#9
|
Member
Registered: May 2009
Distribution: Debian wheezy
Posts: 252
Rep:
|
Quote:
Originally Posted by grail
Well in a way you are still proving my point, although I do see where you are coming from. Whilst in your example getline is not altering NR and FNR of the original file, it is in fact altering these values for the file being used by getline.
|
It is not correct to associate NR with any particular file, as it is a global count of the number of records read across all normal input files (not including "getline < filename" files).
Also, while "getline < filename" does advance the file pointer for the file it is reading, you can close() the file and start reading from the beginning again.
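A brief sketch of the close() point, assuming a two-line scratch file created just for the demonstration:

```shell
tmp=$(mktemp)
printf 'one\ntwo\n' > "$tmp"
out=$(awk -v aux="$tmp" 'BEGIN {
    getline first < aux            # reads "one"
    getline second < aux           # reads "two"
    close(aux)                     # the next read reopens the file
    getline again < aux            # reads "one" again
    print first, second, again
}')
rm -f "$tmp"
printf '%s\n' "$out"
# one two one
```

Everything happens in a BEGIN block here, so no main input is read at all and NR is never touched.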
|
|
|
All times are GMT -5.