Linux - Newbie
This Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-to's this is the place!
I hesitate to post something while the OP is still studying up on it, but I've been working on my own *advanced* solution, and I'm a bit stumped on one point. Hopefully it will work to inspire him, rather than act as a spoon-feeding.
I decided to try my hand at using gawk's new arrays of arrays feature, and I've managed to get each line in the desired order (c1-4 + (all) c5's + c6's + c7's + c8's).
(I could probably dig deeper into array nesting to make it neater, but I've decided to stop at two dimensions for the time being.)
What I can't quite figure out is how to keep the lines themselves in their original input order, since awk likes to process arrays according to some internal logic. I can't directly use asort/asorti or a simple counting loop, because the main index is a complex string, rather than a simple series.
Can someone suggest a fix for this, or am I just barking up the wrong tree here?
Code:
#!/usr/local/bin/gawk -f
# Requires gawk v.4.0+
# (it's in the above location on my system)

BEGIN {
    SUBSEP = " "
}

{
    ar[$1,$2,$3,$4][1] = ar[$1,$2,$3,$4][1] " " $5
    ar[$1,$2,$3,$4][2] = ar[$1,$2,$3,$4][2] " " $6
    ar[$1,$2,$3,$4][3] = ar[$1,$2,$3,$4][3] " " $7
    ar[$1,$2,$3,$4][4] = ar[$1,$2,$3,$4][4] " " $8
}

END {
    for ( i in ar ) {
        printf "%s", i
        for ( j = 1 ; j <= 4 ; j++ ) { printf "%s", ar[i][j] }
        printf "\n"
    }
}
Last edited by David the H.; 11-10-2011 at 08:31 PM.
Reason: just noticed the link was broken
Yes, I've already looked into that. It suffers from the same limitations as the asort options. The only built-in choices are numerical and string sorting. Indeed, it appears that asort simply accesses PROCINFO["sorted_in"] internally.
So what I need is something that keeps track of the original input order as it comes in. I'm thinking along the lines of a separate array to keep track of the indexes as they come along, and a related sorting function to reorder the output, but I can't quite wrap my mind around how to do it.
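The "separate array that tracks the indexes as they arrive" idea can be sketched on its own, independent of the column-merging problem. This is only a minimal illustration under my own assumptions: the function name `first_seen_order` and the array names `seen`, `order`, and `count` are made up for the example, and it simply counts how often each first field occurs. It uses plain POSIX awk features, so it should not need gawk 4.

```shell
# Hypothetical helper: count occurrences of field 1, reporting the keys
# in the order they first appeared in the input.
first_seen_order() {
    awk '
    {
        key = $1
        if (!(key in seen)) {    # first time this key shows up
            seen[key] = 1
            order[++n] = key     # remember its arrival position
        }
        count[key]++
    }
    END {
        # Walk the numeric index 1..n instead of using "for (k in count)",
        # whose traversal order is not guaranteed.
        for (i = 1; i <= n; i++)
            print order[i], count[order[i]]
    }'
}
```

For example, `printf 'b\na\nb\nc\na\n' | first_seen_order` prints `b 2`, `a 2`, `c 1`, in that order. The numeric counter does the job that a custom sorting function would otherwise have to do.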
Well, unless you are using version 4+ of gawk (as mentioned by David), this will never work, as previous versions do not support arrays of arrays.
Maybe you also missed this line:
Quote:
I hesitate to post something while the OP is still studying up on it, but I've been working on my own *advanced* solution, and I'm a bit stumped on one point. Hopefully it will work to inspire him, rather than act as a spoon-feeding.
Which generally means that if you're not sure what you're doing, this is probably not the solution you want to try to understand first.
Domo arigato, grail. An additional array level was one of the solutions I was thinking about. I just couldn't work out the implementation of it. At a certain level of complexity my brain apparently starts to overheat and I can't keep track of how everything is supposed to work.
I'm not sure what you gave works quite right though. I believe that if i was, for example, 2 on an input line, then 1 on the next input line, then there was no match on the line after that, then i++ would mistakenly increment it back to 2.
It took surprisingly long for me to work out the kinks, but here's my final solution. I also made the variables more regular and descriptive, as well as making it properly respect the output separator:
Code:
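#!/usr/local/bin/gawk -f
# Requires gawk v.4.0+ (arrays of arrays)

BEGIN {
    SUBSEP = OFS = " "
}

{
    # Look for an existing group that already holds this c1-c4 key.
    found = 0
    for ( g in ar ) {
        if ( ($1,$2,$3,$4) in ar[g] ) {
            found = 1
            break
        }
    }

    # New key: give it the next group number, which records input order.
    if ( ! found ) {
        g = length(ar) + 1
    }

    # Append columns 5-8 to the per-column strings for this key.
    ar[g][$1,$2,$3,$4][1] = ar[g][$1,$2,$3,$4][1] OFS $5
    ar[g][$1,$2,$3,$4][2] = ar[g][$1,$2,$3,$4][2] OFS $6
    ar[g][$1,$2,$3,$4][3] = ar[g][$1,$2,$3,$4][3] OFS $7
    ar[g][$1,$2,$3,$4][4] = ar[g][$1,$2,$3,$4][4] OFS $8
}

END {
    # Walk the groups numerically so lines come out in first-seen order.
    for ( g = 1 ; g <= length(ar) ; g++ ) {
        for ( first4 in ar[g] ) {
            printf "%s", first4
            for ( i = 1 ; i <= 4 ; i++ ) { printf "%s", ar[g][first4][i] }
            printf "\n"
        }
    }
}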
@eagal, Sorry if I led you down the wrong track. I was really posting for my own edification more than anything else.
However, you might take some clues from the code I posted. awk's traditional arrays can likely do it as well, if you think about it creatively. I may see what I can come up with as well, if I have time to work on it.
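One way the traditional-array approach might look is sketched below. Treat it as an assumption-laden illustration rather than the definitive answer: the function name `merge_rows` and the arrays `seen`, `order`, and `val` are invented for the example, the input is assumed to be eight whitespace-separated columns, and the output is assumed to be wanted as c1-c4 followed by all the c5 values, then c6, c7, and c8. It relies only on POSIX awk's comma subscripts, so gawk 4 should not be required.

```shell
# Hypothetical helper: merge rows that share columns 1-4, keeping the
# merged lines in the order their key first appeared.
merge_rows() {
    awk '
    {
        key = $1 OFS $2 OFS $3 OFS $4
        if (!(key in seen)) {        # first time this c1-c4 combination appears
            seen[key] = 1
            order[++n] = key         # remember its input position
        }
        for (c = 5; c <= 8; c++)     # append each column to its own string
            val[key, c] = val[key, c] OFS $c
    }
    END {
        for (i = 1; i <= n; i++) {
            line = order[i]
            for (c = 5; c <= 8; c++)
                line = line val[order[i], c]
            print line
        }
    }'
}
```

Fed the three lines `a b c d 1 2 3 4`, `x y z w 9 8 7 6`, and `a b c d 5 6 7 8`, it prints `a b c d 1 5 2 6 3 7 4 8` followed by `x y z w 9 8 7 6`. The `val[key, c]` subscript plays the role of the second array level, and the `order` array replaces the group-number search.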
Quote:
I'm not sure what you gave works quite right though. I believe that if i was, for example, 2 on an input line, then 1 on the next input line, then there was no match on the line after that, then i++ would mistakenly increment it back to 2.
Good catch, David. I knocked it up before leaving work, so I hadn't really had a chance to test it.
@eagal - the general solution is actually easier to follow than using this more advanced feature (hint)