merge data from two lines within a "group" onto one line
I have a text file with some different data, something like this. I was trying to figure out how to parse through it to, in a sense, merge data from two lines within a "group" onto one line.

Code:
prompt> cat sample.txt
ID1 NAME FIRST TOM
ID1 NAME LAST SMITH
ID1 ADDRESS MYTOWN USA
ID2 NAME FIRST DAVE
ID2 NAME LAST BROWN
ID2 ADDRESS ANYTOWN USA
ID3 NAME LAST JONES
ID3 ADDRESS SOMETOWN USA

I want to make this into a new file like this, putting the first and last name together on one line and leaving the address line alone.

Code:
ID1 TOM SMITH
ID1 ADDRESS MYTOWN USA
ID2 DAVE BROWN
ID2 ADDRESS ANYTOWN USA
ID3 JONES
ID3 ADDRESS SOMETOWN USA

I thought I had figured out how to parse through the IDs, but I am not so sure:

Code:
prompt> cat my.awk
BEGIN{OFS=FS=" "}
{if($1 in a) {a[$1]=a[$1]} else {a[$1]=$0}}
END {asort(a); for(i in a) print a[i]}

What I am getting:

Code:
prompt> /bin/gawk -f my.awk sample.txt
ID1 NAME FIRST TOM
ID2 NAME FIRST DAVE
ID3 NAME LAST JONES

Then I thought, what about this:

Code:
prompt> cat my2.awk
BEGIN{OFS=FS=" "}
{if($1 in a) {a[$1]=a[$1] " " $NF} else {a[$1]=$0}}
END {asort(a); for(i in a) print a[i]}

This resulted in the output below, which got the first and last name together, but I got the USA from the address too, and there is still no address on its own line. The last name on the second record also did not pick up the "BROWN". So I think I need to specify the fields I want in the print, but I wasn't sure how to do that either.

Code:
ID1 NAME FIRST TOM SMITH USA
ID2 NAME FIRST DAVE JR USA
ID3 NAME LAST JONES USA

Thanks for helping a newbie!
bop-a-nator
You can try this:
Code:
#!/bin/bash
Yes, that is a solution, though I realize I should have been clearer that I was trying to do this within awk specifically. I can certainly close this, give you credit for solving it, and re-phrase my question if you feel that is best.
Thank you.
bop-a-nator

I was looking to do it within the awk script itself, as I am already parsing through the file, which contains other data too. I simply have a subset of data within a file where I need to merge data from two lines together, and I was trying to find a simple way to illustrate the problem I was trying to solve within an awk script. Basically, as it goes through the bigger awk script and finds the records that begin with ID, it needs to loop through these to find the NAME identifiers FIRST and LAST, then put the values of those on the same line.
To be honest, I am also a beginner in awk. But whenever awk is combined with the shell, it creates magic, so I prefer using both instead of awk or the shell alone.
In your case, I will try to write the whole script in awk itself.
To do this entirely in awk, I think we need to be a bit more exacting in our matching logic. It also helps to write it out as a stand-alone script, rather than try to cram it all onto the command line.
Code:
#!/usr/bin/awk -f

There's also a final assumption that the names are all single words. The code would have to get more complex if there could be a $5 field on the "NAME" lines.

PS: Please use [code][/code] tags around your code and data, to preserve the original formatting and to improve readability. Do not use quote tags, bolding, colors, "start/end" lines, or other creative techniques.
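For illustration, here is a minimal sketch of a stricter, awk-only approach along these lines. It is not the original script. The `fn` and `ln` array names follow the line quoted in the next post; the `merge_names` wrapper, the `pending` variable and the `flush_name` function are hypothetical names made up for this example. It assumes that names are single words in `$4` and that the NAME lines of an ID come before its other lines.

```shell
#!/bin/sh
# merge_names (hypothetical name): run the awk program on the given files, or on stdin.
merge_names() {
    awk '
    # Emit the buffered name line for the pending ID, if there is one.
    function flush_name(   name) {
        if (pending == "")
            return
        # Only put a separator between the names when a first name exists.
        name = fn[pending] ? fn[pending] OFS ln[pending] : ln[pending]
        print pending, name
        pending = ""
    }

    # Strict match: field 2 must be NAME and field 3 must be FIRST or LAST.
    $2 == "NAME" && ($3 == "FIRST" || $3 == "LAST") {
        if (pending != $1)
            flush_name()          # a new ID started before the old one was printed
        pending = $1
        if ($3 == "FIRST")
            fn[$1] = $4
        else
            ln[$1] = $4
        next                      # do not print the raw NAME line
    }

    # Every other line: print the merged name first, then the line unchanged.
    { flush_name(); print }

    # Handle an ID whose NAME lines are the last lines of the input.
    END { flush_name() }
    ' "$@"
}
```

Calling `merge_names sample.txt > new.txt` on the sample data from the first post should give the six desired lines. The awk program could equally be saved in its own file under a `#!/usr/bin/awk -f` line. Note that an ID with a first name but no last name would get a trailing space with this form of the ternary.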
@David:
Indeed, you've given a stricter (and perfect) solution. Could you explain the following line in your code, i.e. what do ? and : do here, and how does it store all this inside 'name'?
Code:
name = fn[$1] ? fn[$1] OFS ln[$1] : ln[$1]
It's called a ternary operator, a kind of simplified if/else pattern available in several programming languages.
http://www.gnu.org/software/gawk/man...ional-Exp.html

In this case I used it to ensure that the space between the two names only appears when both are present. It's kind of hard to handle optional spaces without something like it.
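To make the comparison with if/else concrete, here is a small sketch that builds the name both ways and prints the two results. The `join_name` wrapper and the `first`, `last` and `alt` variables are hypothetical names used only for this illustration.

```shell
#!/bin/sh
# join_name (hypothetical name): $1 is the first name (may be empty), $2 is the last name.
join_name() {
    awk -v first="$1" -v last="$2" 'BEGIN {
        # Ternary form: condition ? value-if-true : value-if-false
        name = first ? first OFS last : last

        # The same decision written as a plain if/else.
        if (first)
            alt = first OFS last
        else
            alt = last

        print "ternary: " name
        print "if/else: " alt
    }'
}
```

`join_name TOM SMITH` should print "TOM SMITH" on both lines, while `join_name "" JONES` should print just "JONES" with no leading space. An empty string counts as false in awk, which is why a missing first name selects the second branch.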