merge data from two lines within a "group" onto one line
Distribution: at work: Red Hat Enterprise Linux Server release 5.8 (Tikanga); at home: what do you recommend?
Posts: 24
I have a text file with data something like the sample below. I was trying to figure out how to parse through it and, in a sense, merge data from two lines within a "group" onto one line.
prompt> cat sample.txt
ID1 NAME FIRST TOM
ID1 NAME LAST SMITH
ID1 ADDRESS MYTOWN USA
ID2 NAME FIRST DAVE
ID2 NAME LAST BROWN
ID2 ADDRESS ANYTOWN USA
ID3 NAME LAST JONES
ID3 ADDRESS SOMETOWN USA
I want to turn this into a new file like the one below, putting the first and last names together on one line and leaving the address line alone.
ID1 TOM SMITH
ID1 ADDRESS MYTOWN USA
ID2 DAVE BROWN
ID2 ADDRESS ANYTOWN USA
ID3 JONES
ID3 ADDRESS SOMETOWN USA
I thought I had figured out how to parse through the IDs, but I am not so sure:
prompt> cat my.awk
BEGIN{OFS=FS=" "}
{if($1 in a)
{a[$1]=a[$1]} else {a[$1]=$0}}
END {n=asort(a); for(i=1;i<=n;i++) print a[i]}
What I am getting:
prompt> /bin/gawk -f my.awk sample.txt
ID1 NAME FIRST TOM
ID2 NAME FIRST DAVE
ID3 NAME LAST JONES
Then I thought, what about this:
prompt> cat my2.awk
BEGIN{OFS=FS=" "}
{if($1 in a) {a[$1]=a[$1] " " $NF} else {a[$1]=$0}}
END {n=asort(a); for(i=1;i<=n;i++) print a[i]}
This resulted in the output below. It got the first and last names together, but it also picked up the USA from the address, there is still no address on its own line, and the last name on the second record came out as "JR" instead of "BROWN". So I think I need to specify the fields I want in the print, but I wasn't sure how to do that either.
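For reference, a small sketch of how awk's field variables select parts of a line: `$1` is the first field, `$4` the fourth, and `$NF` always the last one, which is why appending `$NF` pulled "USA" in from the ADDRESS lines. The helper names `show_fields` and `name_values` are illustrative only, and the input line with a "JR" suffix is a hypothetical example of how a trailing word can become `$NF` instead of the surname.

```shell
#!/bin/sh
# Illustrative helpers; the names are hypothetical, not from the thread.

# Print the first field, the fourth field and the last field of every line.
show_fields() {
  awk '{ print $1, "4th=" $4, "last=" $NF }'
}

# Keep only NAME lines, and only their value field ($4).
name_values() {
  awk '$2 == "NAME" { print $1, $4 }'
}

printf '%s\n' 'ID1 NAME LAST SMITH' 'ID1 ADDRESS MYTOWN USA' 'ID2 NAME LAST BROWN JR' | show_fields
# ID1 4th=SMITH last=SMITH
# ID1 4th=USA last=USA
# ID2 4th=BROWN last=JR

printf '%s\n' 'ID1 NAME FIRST TOM' 'ID1 NAME LAST SMITH' 'ID1 ADDRESS MYTOWN USA' | name_values
# ID1 TOM
# ID1 SMITH
```

Appending `$4` from NAME lines only, rather than `$NF` from every line, is the field selection the my2.awk idea needs; the ADDRESS line still has to be stored or printed separately.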
ID1 NAME FIRST TOM SMITH USA
ID2 NAME FIRST DAVE JR USA
ID3 NAME LAST JONES USA
Original Poster
Yes, that is a solution, though I realize I should have been clearer that I was trying to do this within awk specifically. I can certainly close this, give you credit for solving it, and rephrase my question if you feel that is best.
Thank you.
bop-a-nator
I was looking to do it within the awk script itself, as I am already parsing through a file that contains other data too. The merge applies only to a subset of the data in that file: I need to merge data from two lines together, and I was trying to find a simple way to illustrate the problem within an awk script. Basically, as the bigger awk script goes through the file and finds the records that begin with ID, it needs to look within those records for the NAME identifiers FIRST and LAST and put their values on the same line.
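Since the real file contains other data as well, one way to scope the merge is to guard the rules with a pattern on the first field and let every other line fall through to a plain `{ print }`. This is a minimal sketch, assuming single-word names and that each ID's NAME lines come before its ADDRESS line, as in the sample; the wrapper name `merge_in_place` and the HEADER/FOOTER lines in the usage example are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: merges FIRST/LAST names only on lines whose first field
# starts with "ID"; every other line of the larger file is printed unchanged.
merge_in_place() {
  awk '
    $1 ~ /^ID/ && $2 == "NAME" {
      if ($3 == "FIRST") fn[$1] = $4
      if ($3 == "LAST")  ln[$1] = $4
      next                       # hold the name until the ADDRESS line arrives
    }
    $1 ~ /^ID/ && $2 == "ADDRESS" {
      # add the separating space only when a first name was seen
      name = (fn[$1] != "") ? fn[$1] OFS ln[$1] : ln[$1]
      print $1, name             # merged name line; the ADDRESS line follows below
    }
    { print }                    # ADDRESS lines and all non-ID data pass through
  '
}

merge_in_place <<'EOF'
HEADER other data
ID1 NAME FIRST TOM
ID1 NAME LAST SMITH
ID1 ADDRESS MYTOWN USA
ID3 NAME LAST JONES
ID3 ADDRESS SOMETOWN USA
FOOTER more data
EOF
```

The design choice here is that the merge rules only consume the NAME lines (via `next`); everything else, including the ADDRESS line itself, reaches the final catch-all rule, so the rest of a bigger script can keep working on those lines.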
To do this entirely in awk, I think we need to be a bit more exacting in our matching logic. It also helps to write it out as a stand-alone script, rather than try to cram it all onto the command line.
Code:
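#!/usr/bin/awk -f
{
    if ( $2 == "NAME" )
    {
        # remember the first/last name for this ID, then skip the line
        if ( $3 == "FIRST" ) { fn[$1]=$4 }
        if ( $3 == "LAST" ) { ln[$1]=$4 }
        next
    }
    if ( $2 == "ADDRESS" )
    {
        # join the names; the space (OFS) is added only when a first name exists
        name = fn[$1] ? fn[$1] OFS ln[$1] : ln[$1]
        print $1 , name
        print $0
    }
}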
The above assumes that there's always a "LAST" name, but "FIRST" is optional. You'll have to redo the name variable setting if it can be otherwise. It also assumes that the "ADDRESS" line always follows the name fields. If not, you'll have to save the address too and print everything out in an END section after the main processing is complete.
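If the ADDRESS line can come before the NAME lines of its ID, the END-section approach described above might look like the following sketch. It records the order in which IDs first appear (a `for (k in arr)` loop does not guarantee any particular order), stores names and addresses per ID, and prints everything once the input is exhausted. The wrapper name `merge_at_end` is hypothetical, and names are still assumed to be single words.

```shell
#!/bin/sh
# Hypothetical wrapper: collects names and addresses per ID and prints them only
# after all input has been read, so ADDRESS may come before or after the NAME lines.
merge_at_end() {
  awk '
    # record the order in which IDs first appear
    !($1 in seen) { seen[$1] = 1; order[++n] = $1 }

    $2 == "NAME" && $3 == "FIRST" { fn[$1] = $4; next }
    $2 == "NAME" && $3 == "LAST"  { ln[$1] = $4; next }
    $2 == "ADDRESS"               { addr[$1] = $0 }

    END {
      for (i = 1; i <= n; i++) {
        id = order[i]
        # the space appears only when a first name exists
        name = (fn[id] != "") ? fn[id] OFS ln[id] : ln[id]
        print id, name
        if (id in addr) print addr[id]
      }
    }
  '
}

# ID2 has its ADDRESS line first here, to show the reordering is handled.
merge_at_end <<'EOF'
ID1 NAME FIRST TOM
ID1 NAME LAST SMITH
ID1 ADDRESS MYTOWN USA
ID2 ADDRESS ANYTOWN USA
ID2 NAME FIRST DAVE
ID2 NAME LAST BROWN
ID3 NAME LAST JONES
ID3 ADDRESS SOMETOWN USA
EOF
```

The trade-off is memory: everything is held until the end of input, which is fine for a subset of records but not ideal for very large files.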
There's also a final assumption that the names are all single words. The code would have to get more complex if there could be a $5 field on the "NAME" lines.
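If a name can span more than one word, one option is to join fields 4 through `NF` instead of taking only `$4`. A sketch along the lines of the script above, using a hypothetical helper function `rest()`, a hypothetical wrapper `merge_multiword`, and a hypothetical input line with a "JR" suffix:

```shell
#!/bin/sh
# Hypothetical wrapper: like the script above, but a name may span several fields.
merge_multiword() {
  awk '
    # rest(k): fields k..NF joined by single spaces (s and i are locals)
    function rest(k,    s, i) {
      s = $k
      for (i = k + 1; i <= NF; i++) s = s " " $i
      return s
    }
    $2 == "NAME" {
      if ($3 == "FIRST") fn[$1] = rest(4)
      if ($3 == "LAST")  ln[$1] = rest(4)
      next
    }
    $2 == "ADDRESS" {
      name = (fn[$1] != "") ? fn[$1] OFS ln[$1] : ln[$1]
      print $1, name
      print
    }
  '
}

merge_multiword <<'EOF'
ID2 NAME FIRST DAVE
ID2 NAME LAST BROWN JR
ID2 ADDRESS ANYTOWN USA
EOF
```

Note that rebuilding the value with single spaces collapses any runs of whitespace inside a name, which is usually harmless for data like this.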
PS: Please use [code][/code] tags around your code and data, to preserve the original formatting and to improve readability. Do not use quote tags, bolding, colors, "start/end" lines, or other creative techniques.
Last edited by David the H.; 01-27-2013 at 10:40 AM.
Reason: more compact code
@David:
Indeed, you've given a stricter (and perfect) solution. Could you explain the following line in your code, i.e. what do ? and : do here, and how does it store all this inside 'name'?
Code:
name = fn[$1] ? fn[$1] OFS ln[$1] : ln[$1]
In this case I used it to ensure that the space between the two names only appears when both are present. It's kind of hard to handle optional spaces without something like it.
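The `?:` in that line is awk's conditional operator: `cond ? a : b` yields `a` when `cond` is true (a non-empty string or a non-zero number) and `b` otherwise. Because string concatenation binds more tightly than `?:`, `fn[$1] OFS ln[$1]` is built as one string, and whichever branch is chosen is then assigned to `name`. A small sketch; the function `join_name` and its `-v` variables are illustrative only.

```shell
#!/bin/sh
# Illustrative only: join_name FIRST LAST prints the joined name in brackets.
join_name() {
  awk -v fn="$1" -v ln="$2" 'BEGIN {
    # The space (OFS) between the names is added only when fn is non-empty.
    name = fn ? fn OFS ln : ln
    print "[" name "]"
  }'
}

join_name TOM SMITH   # prints [TOM SMITH]
join_name "" JONES    # prints [JONES]
```

One caveat of testing the string directly: a value that looks like the number 0 also counts as false, so `fn[$1] != ""` is the stricter test if that could ever occur.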