07-10-2012, 12:38 AM
|
#1
|
Member
Registered: Feb 2012
Posts: 89
Rep: 
|
Issue for going back to previous line under conditions
I have this input:
Code:
Joe|info.1
Bob|info.1
Bob|info.2
I would like to merge the info for the same person onto a single line, like this:
Code:
Joe|info.1
Bob|info.1|info.2
I tried:
Code:
awk 'BEGIN{FS=OFS="|"} {if(a[$1]++ == 0) {print; stored = $0}; else if(a[$1]++ > 0) print stored FS $2}'
But the original line for Bob gets printed as well:
Code:
Joe|info.1
Bob|info.1
Bob|info.1|info.2
That's because I print in the first if branch, but if I don't, I lose the first line...
Any advice?
Thanks in advance
|
|
|
07-10-2012, 12:52 AM
|
#2
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,038
|
As a quick alternative:
Code:
awk -F"|" 'NR==1{printf "%s",$0}x{if(x!=$1)printf "\n%s",$0;else printf "|%s",$2}{x=$1}END{if(NR)print ""}' file
|
|
|
07-10-2012, 01:09 AM
|
#3
|
Member
Registered: Feb 2012
Posts: 89
Original Poster
Rep: 
|
Thanks for your help, but this alternative is too quick ! :-)
It doesn't work for me...
The point here is: if a[$1] occurs only once, print the entire line; if a[$1] occurs more than once, go back to the first occurrence and append the extra fields from the later occurrences.
Last edited by Trd300; 07-10-2012 at 01:32 AM.
|
|
|
07-10-2012, 02:14 AM
|
#4
|
Bash Guru
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852
|
How about something like this?
Code:
#example file input
$ cat file.txt
Joe|info.1
Bob|info.1
Bob|info.2
David|info.1
Bob|info.3
Grail|info.1
David|info.2
Trd|info.1
Foo|info.1
$ awk 'BEGIN{ FS="|" } { a[$1]=(a[$1]?a[$1]:$1) FS $2 } END{ for (i in a){ print a[i] } }' file.txt
Foo|info.1
Grail|info.1
David|info.1|info.2
Bob|info.1|info.2|info.3
Trd|info.1
Joe|info.1
Caveats: it assumes there are only two fields per line, and the output is (as you can see) not in the original order, because awk's "for (i in a)" walks the array in an unspecified order.
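If lines can carry more than two fields, one possible workaround is to append everything after the first separator instead of just $2. This is only a sketch: the function name merge_all_fields is made up for illustration, and it assumes the first field never contains a "|" and that lines without any separator can be skipped.

```shell
# Hypothetical helper: merge lines by first field, keeping ALL remaining fields.
merge_all_fields() {
    awk -F'|' '
        NF < 2 { next }                          # assumption: ignore lines with no separator
        {
            rest = substr($0, length($1) + 2)    # everything after the first "|"
            if ($1 in a)
                a[$1] = a[$1] FS rest            # name seen before: append
            else
                a[$1] = $1 FS rest               # first time: start with the name
        }
        END { for (k in a) print a[k] }          # order is still unspecified
    ' "$@"
}

# Usage: pipe through sort only to get a stable order for display
printf '%s\n' 'Bob|info.1|extra' 'Joe|info.1' 'Bob|info.2' | merge_all_fields | LC_ALL=C sort
```

The explicit if/else is used here rather than a ternary so that the "in" test is evaluated before the array element is ever touched.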
Last edited by David the H.; 07-10-2012 at 02:19 AM.
Reason: formatting clean-up
|
|
|
07-10-2012, 03:10 AM
|
#5
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,038
|
From the initial data I understood that the input was sorted by column 1 (hence my suggested solution).
Your current approach cannot work: once print has written a line, later entries can no longer be appended to it.
Therefore, for an unsorted list, David's solution is the way to go (it works for sorted input too, of course, but it has to store everything before printing).
Further to David's solution, if name order is important you could use asorti in the END block.
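A rough sketch of what the asorti variant might look like, assuming gawk is installed (asorti is a gawk extension, not part of POSIX awk). The function name merge_sorted_names is invented for this example.

```shell
# Hypothetical helper: David's merge, printed in sorted name order (gawk only).
merge_sorted_names() {
    gawk -F'|' '
        { a[$1] = (a[$1] ? a[$1] : $1) FS $2 }
        END {
            n = asorti(a, names)                 # names[1..n] = indices of a, sorted as strings
            for (i = 1; i <= n; i++)
                print a[names[i]]
        }
    ' "$@"
}

# Usage: merge_sorted_names file.txt
```

asorti leaves the original array untouched and fills a second array with the sorted indices, so the merged lines can then be looked up one by one.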
|
|
|
07-10-2012, 08:53 AM
|
#6
|
Bash Guru
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852
|
Yeah, maybe I should've added that to my caveats. If the list is unsorted, then you're going to have to store every line in memory and print everything out at the end. That's not a problem for small amounts of input text, but it won't work if there's more than the system memory can handle.
As for controlling array sorting, see here:
http://www.gnu.org/software/gawk/man...y-Sorting.html
Rather than using asorti though, if you want the output sorted alphabetically, for example, you can simply add a PROCINFO setting to the BEGIN section:
Code:
BEGIN{ PROCINFO["sorted_in"]="@ind_str_asc" ; FS=OFS="|" }
Note that only recent versions of gawk (4.0 and later) can do this. Older gawk versions and other awk implementations have no built-in way to control the order of a for-in loop, and you'd have to roll your own index tracking. You'll also have to do so if you need the output order to be identical to the input, and it isn't already in one of the pre-set sorting types.
(And wouldn't it be nice if the gawk developers added a setting or two for "input order"?)
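For what it's worth, rolling your own tracking for "input order" can be fairly short. The following is only a sketch (the function name merge_in_input_order and the helper array "order" are made up here); it should run in any POSIX awk, and it keeps each name in the position where it first appeared.

```shell
# Hypothetical helper: merge by first field, output in first-seen order.
merge_in_input_order() {
    awk -F'|' '
        !($1 in a) { order[++n] = $1 }           # remember each name the first time it shows up
        { a[$1] = (a[$1] ? a[$1] : $1) FS $2 }   # same merge as before
        END {
            for (i = 1; i <= n; i++)             # numeric loop, so the order is well defined
                print a[order[i]]
        }
    ' "$@"
}

# Usage with the sample data from the earlier post
printf '%s\n' 'Joe|info.1' 'Bob|info.1' 'Bob|info.2' 'David|info.1' 'Bob|info.3' \
    'Grail|info.1' 'David|info.2' 'Trd|info.1' 'Foo|info.1' | merge_in_input_order
```

Like the original, this still holds every merged line in memory until END.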
|
|
|
07-10-2012, 07:14 PM
|
#7
|
Member
Registered: Feb 2012
Posts: 89
Original Poster
Rep: 
|
Thanks David & grail !
The order it returns the output doesn't really matter.
I didn't know this syntax:
Code:
{ a[$i]=(a[$i]?a[$i]:$i) FS $j }
it's very handy, and asorti and PROCINFO as well.
Thanks guys !
Last edited by Trd300; 07-10-2012 at 09:48 PM.
|
|
|
07-10-2012, 09:47 PM
|
#8
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,038
|
If the input is sorted, then my solution avoids having to store the data.
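For input that is not already grouped, one option is to let sort do the grouping first and then stream the result through awk. A minimal sketch, with some assumptions: the function name merge_sorted is invented, the -s (stable) flag is an extension found in GNU and BSD sort rather than POSIX, and the output comes out in name order rather than input order.

```shell
# Hypothetical helper: group with sort, then merge line by line without an awk array.
merge_sorted() {
    LC_ALL=C sort -s -t'|' -k1,1 "$@" |          # -s keeps same-name lines in their original order
    awk -F'|' '
        NR == 1 || $1 != prev {                  # a new name starts a new output line
            if (NR > 1) printf "\n"
            printf "%s", $0
            prev = $1
            next
        }
        { printf "|%s", $2 }                     # same name as before: append its info
        END { if (NR) printf "\n" }              # finish the last line
    '
}

# Usage
printf '%s\n' 'Joe|info.1' 'Bob|info.1' 'David|info.1' 'Bob|info.2' | merge_sorted
```

Here awk only ever remembers the previous name, so the memory cost moves to sort, which is generally better equipped to handle large inputs.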
|
|
|
07-11-2012, 10:42 AM
|
#9
|
Bash Guru
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852
|
Quote:
Originally Posted by Trd300
I didn't know this syntax:
Code:
{ a[$i]=(a[$i]?a[$i]:$i) FS $j }
it's very handy, and asorti and PROCINFO as well.
|
Yeah, "condition?value1:value2" is the ternary operator, a kind of short form of if/then/else.
In this case, if a previously-set value for array entry "a[$i]" exists, then use it, otherwise use "$i".
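To make the equivalence concrete, here is the same merge written out the long way. It is just an illustration: the function name merge_long_form and the variable "start" are invented for this sketch.

```shell
# Hypothetical helper: long form of  a[$1] = (a[$1] ? a[$1] : $1) FS $2
merge_long_form() {
    awk -F'|' '
        {
            if (a[$1])                # true once a non-empty value has been stored
                start = a[$1]
            else
                start = $1            # first time this name is seen
            a[$1] = start FS $2
        }
        END { for (k in a) print a[k] }
    ' "$@"
}

# Usage
printf '%s\n' 'Bob|info.1' 'Bob|info.2' | merge_long_form
```

Strictly speaking the test checks whether the stored value is "true" (non-empty and non-zero) rather than whether the entry exists, but since every stored value here starts with the name followed by "|", that distinction should not matter for this data.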
|
|
|
All times are GMT -5.