LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Old 11-09-2011, 12:15 PM   #16
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037

I hesitate to post something while the OP is still studying up on it, but I've been working on my own *advanced* solution, and I'm a bit stumped on one point. Hopefully it will work to inspire him, rather than act as a spoon-feeding.

I decided to try my hand at using gawk's new arrays of arrays feature, and I've managed to get each line in the desired order (c1-4 + (all) c5's + c6's + c7's + c8's).

(I could probably dig deeper into array nesting to make it neater, but I've decided to stop at two dimensions for the time being.)

What I can't quite figure out is how to keep the lines themselves in their original input order, since awk likes to process arrays according to some internal logic. I can't directly use asort/asorti or a simple counting loop, because the main index is a complex string, rather than a simple series.

Can someone suggest a fix for this, or am I just barking up the wrong tree here?
Code:
#!/usr/local/bin/gawk -f
# Requires gawk v.4.0+
#(it's in the above location on my system)

BEGIN{
     SUBSEP=" "
     }

{
     ar[$1,$2,$3,$4][1] = ar[$1,$2,$3,$4][1]" "$5
     ar[$1,$2,$3,$4][2] = ar[$1,$2,$3,$4][2]" "$6
     ar[$1,$2,$3,$4][3] = ar[$1,$2,$3,$4][3]" "$7
     ar[$1,$2,$3,$4][4] = ar[$1,$2,$3,$4][4]" "$8
}

END{
     for ( i in ar ) {
          printf "%s", i
          for ( j = 1 ; j <= 4 ; j++ ) { printf "%s", ar[i][j] }
          printf "\n"
     }
}

Last edited by David the H.; 11-10-2011 at 08:31 PM. Reason: just noticed the link was broken
 
Old 11-09-2011, 11:51 PM   #17
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
How about setting the sort option just before the for loop:
Code:
PROCINFO["sorted_in"] = "@ind_num_asc"
for( i in ar)...
See Here for details.
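
For reference, a minimal sketch of what this setting does, assuming gawk 4.0 or later is installed as `gawk`. The function name `sorted_indices` is made up for illustration. It collects the first field of each line as an array index and then walks the array in ascending numeric order of index. Note that this sorts the indices; it does not preserve the order in which they arrived.

```shell
# Hypothetical helper: print the distinct first fields of the input in
# ascending numeric order. Assumes gawk 4.0+ (for PROCINFO["sorted_in"]).
# Usage: printf '10\n9\n100\n' | sorted_indices
sorted_indices() {
    gawk '
    { seen[$1] = 1 }
    END {
        # Ask gawk to traverse the array by index, compared as numbers, ascending.
        PROCINFO["sorted_in"] = "@ind_num_asc"
        for (i in seen) print i
    }'
}
```

With string indices such as the composite "c1 c2 c3 c4" keys used in this thread, a numeric or string sort like this would reorder the groups rather than keep them as they appeared in the input.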
 
Old 11-10-2011, 12:58 AM   #18
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
Yes, I've already looked into that. It suffers from the same limitations as the asort options. The only built-in choices are numerical and string sorting. Indeed, it appears that asort simply accesses PROCINFO["sorted_in"] internally.

So what I need is something that keeps track of the original input order as it comes in. I'm thinking along the lines of a separate array to keep track of the indexes as they come along, and a related sorting function to reorder the output, but I can't quite wrap my mind around how to do it.
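
The idea of a separate array that records indexes as they come along can be sketched roughly as follows. This is only an illustration of the idiom, not a solution to the original task: the function name `first_seen_order` and the arrays `order` and `count` are hypothetical, and it uses plain awk features only, so it should not need gawk 4.

```shell
# Hypothetical helper: print each distinct first field once, in the order it
# first appeared in the input, followed by how many times it occurred.
# Usage: printf 'b\na\nb\n' | first_seen_order
first_seen_order() {
    awk '
    {
        # First time this key is seen: remember its position in arrival order.
        if (!($1 in count)) order[++n] = $1
        count[$1]++
    }
    END {
        # A counting loop over order[] replays the keys as they arrived,
        # instead of relying on the unspecified order of "for (k in count)".
        for (i = 1; i <= n; i++) print order[i], count[order[i]]
    }'
}
```

Because `order` is indexed by a simple 1..n series, no sorting function is needed at output time; the counting loop is enough.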
 
Old 11-10-2011, 02:13 AM   #19
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Sorry ... misunderstood requirement ... how about:
Code:
#!/usr/bin/awk -f
# Requires gawk v.4.0+ (arrays of arrays), so awk must be gawk here

BEGIN{
     SUBSEP=" "
     }

{
    found = 0

    for(p in ar)
        if(($1,$2,$3,$4) in ar[p])
            found = p

    if ( found )
        i = found
    else
        i++

     ar[i][$1,$2,$3,$4][1] = ar[i][$1,$2,$3,$4][1]" "$5
     ar[i][$1,$2,$3,$4][2] = ar[i][$1,$2,$3,$4][2]" "$6
     ar[i][$1,$2,$3,$4][3] = ar[i][$1,$2,$3,$4][3]" "$7
     ar[i][$1,$2,$3,$4][4] = ar[i][$1,$2,$3,$4][4]" "$8
}

END{
    for ( x = 1; x <= i; x++)
        for (f in ar[x]){
            printf "%s", f
            for ( j=1 ; j <= 4; j ++ ) { printf "%s", ar[x][f][j] }
            printf "\n"
        }
}
 
1 members found this post helpful.
Old 11-10-2011, 09:18 AM   #20
eagal
LQ Newbie
 
Registered: Nov 2011
Posts: 8

Original Poster
Rep: Reputation: Disabled
Thanks a lot for your help. I am trying to understand it and test it. I don't know why the test showed:
gawk: test1:10: ar[$1,$2,$3,$4][1] = ar[$1,$2,$3,$4][1]" "$5
gawk: test1:10: ^ syntax error
gawk: test1:10: ar[$1,$2,$3,$4][1] = ar[$1,$2,$3,$4][1]" "$5
gawk: test1:10: ^ syntax error
gawk: test1:11: ar[$1,$2,$3,$4][2] = ar[$1,$2,$3,$4][2]" "$6
gawk: test1:11: ^ syntax error
gawk: test1:11: ar[$1,$2,$3,$4][2] = ar[$1,$2,$3,$4][2]" "$6
gawk: test1:11: ^ syntax error
gawk: test1:12: ar[$1,$2,$3,$4][3] = ar[$1,$2,$3,$4][3]" "$7
gawk: test1:12: ^ syntax error
gawk: test1:12: ar[$1,$2,$3,$4][3] = ar[$1,$2,$3,$4][3]" "$7
gawk: test1:12: ^ syntax error
gawk: test1:13: ar[$1,$2,$3,$4][4] = ar[$1,$2,$3,$4][4]" "$8
gawk: test1:13: ^ syntax error
gawk: test1:13: ar[$1,$2,$3,$4][4] = ar[$1,$2,$3,$4][4]" "$8
gawk: test1:13: ^ syntax error
gawk: test1:19: for ( j=1 ; j <= 4; j ++ ) { printf "%s", ar[i][j] }
gawk: test1:19: ^ syntax error
 
Old 11-10-2011, 10:46 AM   #21
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Well, unless you are using version 4+ of gawk (as mentioned by David), this will never work, as previous versions do not support arrays of arrays.

Maybe you also missed this line:
Quote:
I hesitate to post something while the OP is still studying up on it, but I've been working on my own *advanced* solution, and I'm a bit stumped on one point. Hopefully it will work to inspire him, rather than act as a spoon-feeding.
Which generally means that if you're not sure what you're doing, this is probably not the solution you want to try to understand first.
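
One way to check the version requirement before running either script is to look at the first line printed by `gawk --version`. The sketch below assumes that line has the usual "GNU Awk X.Y.Z" form; the function name `gawk_has_nested_arrays` is made up for illustration.

```shell
# Hypothetical helper: succeed if a "gawk --version" banner line such as
# "GNU Awk 4.0.0" reports major version 4 or later (arrays of arrays).
# Usage: gawk_has_nested_arrays "$(gawk --version | head -n 1)" && echo ok
gawk_has_nested_arrays() {
    banner=$1
    # Strip the "GNU Awk " prefix, then keep only the text before the first dot.
    version=${banner#GNU Awk }
    major=${version%%.*}
    case $major in
        ''|*[!0-9]*) return 2 ;;   # banner is not in the assumed form
    esac
    [ "$major" -ge 4 ]
}
```

A banner from gawk 3.x makes the function fail, which would match the syntax errors shown in post #20.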

Last edited by grail; 11-10-2011 at 10:48 AM.
 
Old 11-10-2011, 06:27 PM   #22
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
Domo arigato, grail. An additional array level was one of the solutions I was thinking about. I just couldn't work out the implementation of it. At a certain level of complexity my brain apparently starts to overheat and I can't keep track of how everything is supposed to work.

I'm not sure what you gave works quite right though. I believe that if i was, for example, 2 on an input line, then 1 on the next input line, then there was no match on the line after that, then i++ would mistakenly increment it back to 2.

It took surprisingly long for me to work out the kinks, but here's my final solution. I also made the variables more regular and descriptive, as well as making it properly respect the output separator:
Code:
#!/usr/local/bin/gawk -f

BEGIN{
	SUBSEP=OFS=" "
}

{
	found = 0
	for( g in ar ) {
		if( ($1,$2,$3,$4) in ar[g] ) {
			found = 1
			break
		}
	}

	if ( ! found ) {
		g = length(ar) + 1
	}

	ar[g][$1,$2,$3,$4][1] = ar[g][$1,$2,$3,$4][1] OFS $5
	ar[g][$1,$2,$3,$4][2] = ar[g][$1,$2,$3,$4][2] OFS $6
	ar[g][$1,$2,$3,$4][3] = ar[g][$1,$2,$3,$4][3] OFS $7
	ar[g][$1,$2,$3,$4][4] = ar[g][$1,$2,$3,$4][4] OFS $8

}

END{
	for ( g = 1 ; g <= length(ar) ; g++ ) {
		for ( first4 in ar[g] ) {
			printf "%s", first4
			for ( i=1 ; i <= 4 ; i++ ) { printf "%s", ar[g][first4][i] }
			printf "\n"
		}
	}
}

@eagal, Sorry if I led you down the wrong track. I was really posting for my own edification more than anything else.

However, you might take some clues from the code I posted. awk's traditional arrays can likely do it as well, if you think about it creatively. I may see what I can come up with as well, if I have time to work on it.
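
As a rough sketch of how traditional one-dimensional arrays might handle it, assuming the input is eight whitespace-separated columns and the goal is the one described earlier in the thread (c1-4, then all c5's, c6's, c7's, c8's, with groups in input order). The function name `merge_by_first4` and the array names are hypothetical, and this is one possible approach rather than the definitive one.

```shell
# Hypothetical helper: group 8-column lines by their first four columns and
# print each group once, in first-seen order, with columns 5-8 collected.
# Usage: merge_by_first4 < input.txt
merge_by_first4() {
    awk '
    {
        key = $1 OFS $2 OFS $3 OFS $4
        # Record each new key in arrival order so output can follow the input.
        if (!(key in seen)) { seen[key] = 1; order[++n] = key }
        # One flat array per column; append the value from this line.
        c5[key] = c5[key] OFS $5
        c6[key] = c6[key] OFS $6
        c7[key] = c7[key] OFS $7
        c8[key] = c8[key] OFS $8
    }
    END {
        for (i = 1; i <= n; i++) {
            key = order[i]
            print key c5[key] c6[key] c7[key] c8[key]
        }
    }'
}
```

Using four flat arrays plus an `order` array avoids nested arrays entirely, so this should run on awks older than gawk 4.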
 
Old 11-10-2011, 08:20 PM   #23
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Quote:
I'm not sure what you gave works quite right though. I believe that if i was, for example, 2 on an input line, then 1 on the next input line, then there was no match on the line after that, then i++ would mistakenly increment it back to 2.
Good catch, David. I knocked it up before leaving work, so I hadn't really had a chance to test it.

@eagal - the general solution is actually easier to follow than using this more advanced feature (hint)
 
1 members found this post helpful.
  


All times are GMT -5. The time now is 05:21 PM.
