Sorry about the syntax error. I keep forgetting that in awk, "printf" is a statement while "sprintf()" is a function.
But you got the same incorrect result? Either one works fine for me. The code
Code:
awk '{a[sprintf("%0.8d", NR)] = $0}
END {n=asorti(a,b);
for (j=1;j<=n;j++) print a[b[j]]}' "$InFile"
does what it should and reproduces the lines in the same order that they were read. The original index was by line number, and sorting by that number prints the lines in that same order.
I was about to post and ask what you were expecting the sort by index to produce. As rnichols says, it prints the lines in the same order they were read (with zero padding of the indexes).
Please try that again with sprintf(...) as quoted, after a cup of coffee or tea, and verify whether it really produces the incorrect result from post #1. Also, which version of awk and which OS are you using?
Last edited by astrogeek; 02-27-2017 at 03:11 PM.
Reason: Added retry comment...
Just thought I would put in my 2 cents. Firstly, the original solution shown would not have worked, as the indices were set to NR, so the best you could have hoped for was the data returned in the exact same order. Obviously the padding option works, but probably the more (g)awk way would be:
Code:
awk '{a[$0]=$0}
END {n=asorti(a,b);
for (j=1;j<=n;j++) print a[b[j]]}' "$InFile"
... which works.
I presume that this is part of some larger awk script. Otherwise, it makes no sense to write a script to produce the same result as "sort $InFile".
Second, you could just as well have used "asort()" rather than "asorti()" on the original array ( a[NR] = $0 ) rather than create an array where each index is identical to the line. I don't understand the reason for insisting on "asorti()". There is one difference in the current approach ( a[$0] = $0 ). It will have the effect of eliminating duplicates, since a subsequent occurrence of the same line will overwrite the first. Perhaps that is what you want, though your original code did not do that.
Quote:
I presume that this is part of some larger awk script. ...
Yes, of course. When posting here I make a small program to illustrate the question and avoid extraneous details.
Quote:
Second, you could just as well have used "asort()" rather than "asorti()" ...
Yes, and I did that in my own (larger) script. However, I wanted to use asorti in the belief that it is more efficient if the InFile is large: better to manipulate indices than the actual data. Is this correct?
Quote:
There is one difference in the current approach ( a[$0] = $0 ). It will have the effect of eliminating duplicates, since a subsequent occurrence of the same line will overwrite the first.
The actual source data is real estate ownership data from Wake County, North Carolina, USA. These are public records, freely downloadable by anyone. There is only one line for each street address.
The purpose of setting PROCINFO is that it guarantees the numbers will be sorted as numbers. If you simply place the data in the array as is, it is first stored as a string, which then allows gawk to choose its own interpretation of whether the data is sorted numerically or as a complete string. So the short answer is, your method has no guarantee it will work.