LinuxQuestions.org
Old 05-01-2014, 12:34 PM   #1
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

Rep: Reputation: 3193
Peculiar awk behaviour confusing me :(


So I was working on a solution for a different problem when I came across an unusual result whilst using awk.

Input data:
Code:
1 , 2
3 , 4
5 , 6
7 , 8
9 , 10
Now the data is trivial and not at all important, but the format will help demonstrate the issue. The first objective is to print each number on its own line using awk.
Trivially this can be solved with (of course there are other ways):
Code:
$ awk '1' RS='[,\n ]+' file
1
2
3
4
5
6
7
8
9
10
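As an illustration of one of the "other ways" (a sketch, not the approach used in this thread): leave RS at its default and split fields with -F instead. POSIX leaves the behaviour of a multi-character RS unspecified, so this variant does not rely on the regex RS support found in gawk and mawk. The temporary file created with mktemp is an assumption standing in for "file".

```shell
# Recreate the sample input from the thread in a temporary file
# (the mktemp-based name is an assumption; the thread just calls it "file").
input=$(mktemp)
printf '1 , 2\n3 , 4\n5 , 6\n7 , 8\n9 , 10\n' > "$input"

# RS stays at its default (newline). Each line is split on runs of
# spaces and commas, and every field is printed on its own line.
out=$(awk -F'[ ,]+' '{ for (i = 1; i <= NF; i++) print $i }' "$input")
printf '%s\n' "$out"

rm -f "$input"
```

This relies on the sample lines having no leading or trailing blanks; with such blanks a regex FS would produce empty fields.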
So far so good. Now we want to print every second entry from the list.
Again there are alternate solutions, but my quirk comes from using getline:
Code:
$ awk 'getline' RS='[,\n ]+' file
2
4
6
8
I would have expected the number 10 to be displayed as well. The first script shows that this RS setting yields 10 records, so after the number 8 is printed, the 9th record should be read by the main loop and getline should then retrieve the number 10 and display it.

If someone could explain where my thinking has gone awry, I would be very appreciative.
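For reference, one of the alternate solutions alluded to above can be sketched without getline at all, by selecting records on NR. This is only a sketch: it assumes an awk that accepts a regex RS (as gawk and mawk do), and the temporary input file is an assumption standing in for "file".

```shell
# Recreate the sample input from the thread in a temporary file.
input=$(mktemp)
printf '1 , 2\n3 , 4\n5 , 6\n7 , 8\n9 , 10\n' > "$input"

# NR counts records as the main loop reads them, so an even NR
# selects every second record. getline is not involved.
out=$(awk 'NR % 2 == 0' RS='[,\n ]+' "$input")
printf '%s\n' "$out"

rm -f "$input"
```

Because every record is read by the main loop, this form does not exercise the getline code path that produces the surprising result.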
 
Old 05-01-2014, 03:02 PM   #2
smallpond
Senior Member
 
Registered: Feb 2011
Location: Massachusetts, USA
Distribution: Fedora
Posts: 4,147

Rep: Reputation: 1264
Does it depend on whether the last line of your file ends with a newline?
 
Old 05-01-2014, 10:34 PM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

Original Poster
Rep: Reputation: 3193
No, unfortunately. I should have added another example to show that it also seems to have something to do with the plus in the regex.

Using the same input file, if we remove the space and the plus sign from the regex:
Code:
$ awk '{print "|"$0"|"}' RS='[,\n]' file
|1 |
| 2|
|3 |
| 4|
|5 |
| 6|
|7 |
| 8|
|9 |
| 10|
The pipes just make it clearer that we also have white space. If we now add getline we get:
Code:
$ awk '{getline;print "|"$0"|"}' RS='[,\n]' file
| 2|
| 4|
| 6|
| 8|
| 10|
This is exactly the output I would have expected, i.e. every second entry is returned from our 10 records based on RS.
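A small diagnostic sketch (not part of the original test) makes the mechanics visible: a plain getline reads the next record into $0 and increments NR, returning 1 on success, 0 at end of input and -1 on error. The variable names "before" and "rc" and the temporary input file are illustrative assumptions.

```shell
# Recreate the sample input from the thread in a temporary file.
input=$(mktemp)
printf '1 , 2\n3 , 4\n5 , 6\n7 , 8\n9 , 10\n' > "$input"

# before: NR when the rule starts (record read by the main loop).
# rc:     return value of getline (1 = record read, 0 = end of input).
# After a successful getline, NR has advanced by one and $0 holds the new record.
out=$(awk '{ before = NR; rc = getline
             printf("NR %d -> %d, getline=%d, $0=|%s|\n", before, NR, rc, $0) }' \
      RS='[,\n]' "$input")
printf '%s\n' "$out"

rm -f "$input"
```

Each output line covers one pair of records: the odd-numbered one consumed by the main loop and the even-numbered one consumed by getline.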
 
Old 05-02-2014, 11:35 AM   #4
smallpond
Senior Member
 
Registered: Feb 2011
Location: Massachusetts, USA
Distribution: Fedora
Posts: 4,147

Rep: Reputation: 1264
This works in awk (which I like to call "mini Perl") on Ubuntu:

Code:
awk 'BEGIN {RS="[,\n ]+"} getline' file
2
4
6
8
10
 
Old 05-02-2014, 12:21 PM   #5
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

Original Poster
Rep: Reputation: 3193
Are you using the default awk on Ubuntu, i.e. mawk?

Under gawk (latest version) it does not work.

What is even more interesting is that, depending on how getline is used, I get different results:
Code:
$ awk 'BEGIN{RS="[,\n ]+"}getline' file
2
4
6
8
$ awk 'BEGIN{RS="[,\n ]+"}{getline;print}' file
2
4
6
8
9
The second result does make partial sense: the last getline, called while on record 9, retrieves nothing, so print displays the record it is still on.
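The stale 9 can be suppressed by testing getline's return value before printing. This is a hedged sketch rather than a fix: on an affected gawk it should only avoid printing the leftover record, and the 10 would still be missing. The temporary input file is an assumption standing in for "file".

```shell
# Recreate the sample input from the thread in a temporary file.
input=$(mktemp)
printf '1 , 2\n3 , 4\n5 , 6\n7 , 8\n9 , 10\n' > "$input"

# Print only when getline actually read a record (return value > 0).
# The parentheses keep "getline > 0" from being parsed ambiguously.
out=$(awk 'BEGIN { RS = "[,\n ]+" } { if ((getline) > 0) print }' "$input")
printf '%s\n' "$out"

rm -f "$input"
```

Whether the final 10 appears depends on the awk implementation and version, as the two runs above show.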
 
Old 05-02-2014, 03:15 PM   #6
smallpond
Senior Member
 
Registered: Feb 2011
Location: Massachusetts, USA
Distribution: Fedora
Posts: 4,147

Rep: Reputation: 1264
Yes, it is mawk.

Also, I got this with gawk on CentOS:
Code:
 awk -v RS="[,\n ]+" '{getline;print}  END {print ERRNO}' file
2
4
6
8
9
0

Last edited by smallpond; 05-02-2014 at 03:27 PM.
 
Old 05-03-2014, 03:50 AM   #7
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

Original Poster
Rep: Reputation: 3193
Thanks for confirming it is gawk-specific. So I wonder if this is a bug I should raise or am I misunderstanding how getline is functioning when RS is a computed regex using a modifier like plus??

Here is more proof that RS is working as expected:
Code:
$ awk 'BEGIN{RS="[,\n ]+"}NR<9{getline}1' file
2
4
6
8
9
10
So getline functions correctly for all records prior to the ninth, and records nine and ten do exist.
 
Old 05-03-2014, 04:16 AM   #8
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,137

Rep: Reputation: 4122
Not a week goes by that I don't find something bewildering in awk.

Welcome to the club ....
 
Old 05-03-2014, 05:29 AM   #9
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

Original Poster
Rep: Reputation: 3193
Well, I thought I was getting to the point of understanding it better ... generally
 
Old 05-03-2014, 04:52 PM   #10
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,784

Rep: Reputation: 2083
Quote:
Originally Posted by grail
So I wonder if this is a bug I should raise or am I misunderstanding how getline is functioning when RS is a computed regex using a modifier like plus??
It looks like a bug to me. There is some weird dependency on the exact form of the regex:
Code:
% gawk --version | head -1
GNU Awk 4.1.1, API: 1.1 (GNU MPFR 3.1.2-p3, GNU MP 6.0.0)
% cat input
1
2
3
4
5
6
7
8
9
10
% gawk -vRS='[\n]+' '{old = $0; got = getline; new = $0; printf("%s|getline=%s|%s\n", old, got, new)}' input
1|getline=1|2
3|getline=1|4
5|getline=1|6
7|getline=1|8
9|getline=0|9
% gawk -vRS='\n+' '{old = $0; got = getline; new = $0; printf("%s|getline=%s|%s\n", old, got, new)}' input
1|getline=1|2
3|getline=1|4
5|getline=1|6
7|getline=1|8
9|getline=1|10
% gawk -vRS='(\n)+' '{old = $0; got = getline; new = $0; printf("%s|getline=%s|%s\n", old, got, new)}' input
1|getline=1|2
3|getline=1|4
5|getline=1|6
7|getline=1|8
9|getline=0|9
% gawk -vRS='(\n){1,}' '{old = $0; got = getline; new = $0; printf("%s|getline=%s|%s\n", old, got, new)}' input
1|getline=1|2
3|getline=1|4
5|getline=1|6
7|getline=1|8
9|getline=1|10
% mawk -vRS='(\n)+' '{old = $0; got = getline; new = $0; printf("%s|getline=%s|%s\n", old, got, new)}' input
1|getline=1|2
3|getline=1|4
5|getline=1|6
7|getline=1|8
9|getline=1|10
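The comparison above can be repeated against any awk with a small loop. This is a sketch under assumptions: the AWK environment variable used to pick the interpreter is a hypothetical convention (it defaults to plain awk), the input is a temporary copy of the ten-line file, and some awk implementations may not support the {1,} interval syntax.

```shell
# Interpreter to test; override with e.g. AWK=gawk or AWK=mawk (hypothetical convention).
awk_bin=${AWK:-awk}

# Ten numbers, one per line, as in the "input" file above.
input=$(mktemp)
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > "$input"

# Run the same program once per RS variant and label each run.
report=$(
  for rs in '[\n]+' '\n+' '(\n)+' '(\n){1,}'; do
    printf 'RS=%s\n' "$rs"
    "$awk_bin" -v RS="$rs" '{ old = $0; got = getline; new = $0
        printf("%s|getline=%s|%s\n", old, got, new) }' "$input"
  done
)
printf '%s\n' "$report"

rm -f "$input"
```

A last line of 9|getline=0|9 for a given RS reproduces the problem; 9|getline=1|10 means that variant behaves as expected on the awk under test.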
 
Old 05-04-2014, 04:32 AM   #11
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

Original Poster
Rep: Reputation: 3193
Well, I have forwarded the issue as a bug and will leave this question open until I hear back.

Should anyone else have any answers to prove it is not a bug, by all means please let us know.
 
  

