LinuxQuestions.org
Old 02-27-2017, 11:14 AM   #1
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,879

awk -- asorti produces unexpected result


With this InFile ...
Code:
112 MAIN ST
110 MAIN ST
201 MAIN ST
203 MAIN ST
205 MAIN ST
207 MAIN ST
206 MAIN ST
204 MAIN ST
202 MAIN ST
108 MAIN ST
... this awk ...
Code:
awk '{a[NR]=$0} 
 END {asort(a,b);
   for (j=1;j<=NR;j++) print b[j]}' $InFile
... produced this ...
Code:
108 MAIN ST
110 MAIN ST
112 MAIN ST
201 MAIN ST
202 MAIN ST
203 MAIN ST
204 MAIN ST
205 MAIN ST
206 MAIN ST
207 MAIN ST
... which is the desired and expected result.

With the same InFile this awk ...
Code:
awk '{a[NR]=$0} 
 END {n=asorti(a,b);
   for (j=1;j<=n;j++) print a[b[j]]}' $InFile
... produced this ...
Code:
112 MAIN ST
108 MAIN ST
110 MAIN ST
201 MAIN ST
203 MAIN ST
205 MAIN ST
207 MAIN ST
206 MAIN ST
204 MAIN ST
202 MAIN ST
... which is not the desired result.

Please advise.

Daniel B. Martin
 
Old 02-27-2017, 12:47 PM   #2
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,770

Array indices in awk are strings, not numbers. Thus "10" is greater than "1" but less than "2".

A couple of alternatives when storing the original array:
Code:
a[sprintf "%0.8d", NR] = $0
a[100000000 + NR] = $0
The first pads the number to 8 digits with leading zeros. The second does pretty much the equivalent by adding a large (enough?) offset.

Modern versions of awk do have the ability to use numeric array indices, but that is not the default.

Last edited by rknichols; 02-27-2017 at 12:50 PM.
 
Old 02-27-2017, 02:00 PM   #3
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,879

Original Poster
Thanks for taking an interest but your suggestions didn't resolve the mystery.
Code:
a[sprintf "%0.8d", NR] = $0
produced a syntax error which was corrected by adding parentheses.
Code:
a[sprintf("%0.8d", NR)] = $0
produced the same incorrect sort as shown in post #1.

Code:
a[100000000 + NR] = $0
produced the same incorrect sort as shown in post #1.

Daniel B. Martin
 
Old 02-27-2017, 02:13 PM   #4
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,770

Sorry about the syntax error. I keep forgetting that in awk, "printf" is a statement while "sprintf()" is a function.

But, you got the same incorrect result?? Either one works fine for me. The code
Code:
awk '{a[sprintf("%0.8d", NR)] = $0}
 END {n=asorti(a,b);
   for (j=1;j<=n;j++) print a[b[j]]}' $InFile
does what it should and reproduces the lines in the same order that they were read. The original index was by line number, and sorting by that number prints the lines in that same order.
 
Old 02-27-2017, 02:24 PM   #5
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,258
Blog Entries: 24

Quote:
Originally Posted by rknichols View Post
But, you got the same incorrect result?? Either one works fine for me. The code
Code:
awk '{a[sprintf("%0.8d", NR)] = $0}
 END {n=asorti(a,b);
   for (j=1;j<=n;j++) print a[b[j]]}' $InFile
does what it should and reproduces the lines in the same order that they were read. The original index was by line number, and sorting by that number prints the lines in that same order.
I was about to post and ask what you were expecting the sort by index to produce? As rknichols says, it prints the lines in the same order they were read (with zero padding of the indexes).

Please try that again with sprintf(...) as quoted after a cup of coffee or tea and verify if it indeed produces the incorrect result from post #1... what version of awk and OS?

Last edited by astrogeek; 02-27-2017 at 03:11 PM. Reason: Added retry comment...
 
Old 02-27-2017, 07:42 PM   #6
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,879

Original Poster
Mint 17.3, GNU Awk 4.0.1.

Allow me to reframe my question in a more direct manner.

With this InFile ...
Code:
112 MAIN ST
110 MAIN ST
201 MAIN ST
203 MAIN ST
205 MAIN ST
207 MAIN ST
206 MAIN ST
204 MAIN ST
202 MAIN ST
108 MAIN ST
... how may I code awk with asorti to produce this sorted result?
Code:
108 MAIN ST
110 MAIN ST
112 MAIN ST
201 MAIN ST
202 MAIN ST
203 MAIN ST
204 MAIN ST
205 MAIN ST
206 MAIN ST
207 MAIN ST
Daniel B. Martin
 
Old 02-27-2017, 08:02 PM   #7
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,258
Blog Entries: 24

How about this?

Code:
awk '{a[sprintf("%0.8d", $1)]=$0}
 END {n=asorti(a,b);
    for (j=1;j<=n;j++) print a[b[j]]}' $InFile

Last edited by astrogeek; 02-27-2017 at 08:05 PM. Reason: InFile sub
 
Old 02-27-2017, 08:28 PM   #8
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,879

Original Poster
Thank you, astrogeek.

This works...
Code:
awk '{a[sprintf("%0.8d", $1)]=$0}
 END {n=asorti(a,b);
    for (j=1;j<=n;j++) print a[b[j]]}' $InFile
... and so does this ...
Code:
awk '{a[$0]=$0}
 END {n=asorti(a,b);
    for (j=1;j<=n;j++) print a[b[j]]}' $InFile
SOLVED!

Daniel B. Martin
 
Old 02-28-2017, 09:44 AM   #9
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,999

Just thought I would put in 2 cents. Firstly, the original solution shown would not have worked as the indices were set to NR, so the best you could have hoped for was the data returned in the exact same order. Obviously the padding option works, but probably the more (g)awk way would be:
Code:
awk 'BEGIN{PROCINFO["sorted_in"] = "@ind_num_asc"}{a[$1]=$0}END{n=asorti(a,b);for(i=1;i<=n;i++)print a[b[i]]}' file
You can check out the sorted_in options in the manual here
 
Old 02-28-2017, 10:47 AM   #10
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,770

Quote:
Originally Posted by grail View Post
but probably the more (g)awk way would be:
Code:
awk 'BEGIN{PROCINFO["sorted_in"] = "@ind_num_asc"}{a[$1]=$0}END{n=asorti(a,b);for(i=1;i<=n;i++)print a[b[i]]}' file
You can check out the sorted_in options in the manual here
That does require gawk version >=4.0. Earlier versions lack that feature.
 
Old 02-28-2017, 11:37 AM   #11
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,999

Quote:
Originally Posted by rknichols View Post
That does require gawk version >=4.0. Earlier versions lack that feature.
Post #6 covers that one: the OP is running GNU Awk 4.0.1.
 
Old 02-28-2017, 08:02 PM   #12
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,879

Original Poster
In post #9 grail proposed ...
Code:
awk 'BEGIN{PROCINFO["sorted_in"] = "@ind_num_asc"}
          {a[$1]=$0}
       END{n=asorti(a,b);
           for(i=1;i<=n;i++) print a[b[i]]}' $InFile
... which works.

In post #8 I proposed ...
Code:
awk '{a[$0]=$0}
 END {n=asorti(a,b);
    for (j=1;j<=n;j++) print a[b[j]]}' $InFile
... which works.

Please explain what is gained by the PROCINFO.

Daniel B. Martin
 
Old 02-28-2017, 08:51 PM   #13
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,770

Quote:
Originally Posted by danielbmartin View Post
In post #8 I proposed ...
Code:
awk '{a[$0]=$0}
 END {n=asorti(a,b);
    for (j=1;j<=n;j++) print a[b[j]]}' $InFile
... which works.
I presume that this is part of some larger awk script. Otherwise, it makes no sense to write a script to produce the same result as "sort $InFile".

Second, you could just as well have used "asort()" rather than "asorti()" on the original array ( a[NR] = $0 ) rather than create an array where each index is identical to the line. I don't understand the reason for insisting on "asorti()". There is one difference in the current approach ( a[$0] = $0 ). It will have the effect of eliminating duplicates, since a subsequent occurrence of the same line will overwrite the first. Perhaps that is what you want, though your original code did not do that.
 
Old 02-28-2017, 09:55 PM   #14
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,879

Original Poster
Quote:
Originally Posted by rknichols View Post
I presume that this is part of some larger awk script. ...
Yes, of course. When posting here I make a small program to illustrate the question and avoid extraneous details.
Quote:
Second, you could just as well have used "asort()" rather than "asorti()" ...
Yes, and I did that in my own (larger) script. However I wanted to use asorti in the belief that it is more efficient if the InFile is large. Better to manipulate indices rather than actual data. Is this correct?
Quote:
There is one difference in the current approach ( a[$0] = $0 ). It will have the effect of eliminating duplicates, since a subsequent occurrence of the same line will overwrite the first.
The actual source data is real estate ownership data from Wake County, North Carolina, USA. These are public records, freely downloadable by anyone. There is only one line for each street address.

Daniel B. Martin
 
Old 03-01-2017, 04:34 AM   #15
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,999

The purpose of setting the PROCINFO is that it guarantees the numbers will be sorted as numbers, whereas by simply placing the data in the array as is, it will firstly be as a string, which then allows gawk to choose its own interpretation of whether the data is sorted numerically or as a complete string. So the short answer is, your method has no guarantee it will work.
 
  


Tags
asort, asorti, awk

