LinuxQuestions.org
Old 01-23-2013, 03:33 PM   #1
bop-a-nator
LQ Newbie
 
Registered: Sep 2012
Location: North East USA
Distribution: at work: Red Hat Enterprise Linux Server release 5.8 (Tikanga); at home: what do you recommend?
Posts: 24

Rep: Reputation: Disabled
merge data from two lines within a "group" onto one line


I have a text file with data like the sample below, and I am trying to figure out how to parse through it and,
in a sense, merge data from two lines within a "group" onto one line.

prompt> cat sample.txt
ID1 NAME FIRST TOM
ID1 NAME LAST SMITH
ID1 ADDRESS MYTOWN USA
ID2 NAME FIRST DAVE
ID2 NAME LAST BROWN
ID2 ADDRESS ANYTOWN USA
ID3 NAME LAST JONES
ID3 ADDRESS SOMETOWN USA

I want to turn this into a new file like the one below, with the first and last name together on one line and the address line left alone.

ID1 TOM SMITH
ID1 ADDRESS MYTOWN USA
ID2 DAVE BROWN
ID2 ADDRESS ANYTOWN USA
ID3 JONES
ID3 ADDRESS SOMETOWN USA

I thought I had figured out how to parse through the IDs, but I am not so sure:

prompt> cat my.awk
BEGIN{OFS=FS=" "}
{if($1 in a)
{a[$1]=a[$1]} else {a[$1]=$0}}
END {asort(a); for(i in a) print a[i]}

What I am getting:

prompt> /bin/gawk -f my.awk sample.txt
ID1 NAME FIRST TOM
ID2 NAME FIRST DAVE
ID3 NAME LAST JONES

Then I thought what about this:

prompt> cat my2.awk
BEGIN{OFS=FS=" "}
{if($1 in a) {a[$1]=a[$1] " " $NF} else {a[$1]=$0}}
END {asort(a); for(i in a) print a[i]}

This resulted in the output below. It put the first and last name together, but I got the USA from the address too,
there is still no address on its own line, and the last name on the second record did not pick up "BROWN".
So I think I need to specify the fields I want in the print, but I wasn't sure how to do that either.

ID1 NAME FIRST TOM SMITH USA
ID2 NAME FIRST DAVE JR USA
ID3 NAME LAST JONES USA

Thanks for helping a newbie!
bop-a-nator
 
Old 01-23-2013, 10:44 PM   #2
shivaa
Senior Member
 
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,800
Blog Entries: 4

Rep: Reputation: 286
You can try this:
Code:
#!/bin/bash
INFILE=/home/username/sample.txt  # path to your sample.txt input file
TEMP=$(mktemp)                    # scratch file holding the unique IDs
awk '!seen[$1]++ {print $1}' "$INFILE" > "$TEMP"
while read -r id
do
    # Name line: the ID, then the FIRST name (if any), then the LAST name
    first=$(awk -v id="$id" '$1 == id && $2 == "NAME" && $3 == "FIRST" {print $4}' "$INFILE")
    last=$(awk -v id="$id" '$1 == id && $2 == "NAME" && $3 == "LAST" {print $4}' "$INFILE")
    echo "$id${first:+ $first}${last:+ $last}"
    # Address line, printed unchanged
    awk -v id="$id" '$1 == id && $2 == "ADDRESS"' "$INFILE"
done < "$TEMP"
rm -f "$TEMP"
 
Old 01-24-2013, 01:51 PM   #3
bop-a-nator
LQ Newbie
 
Registered: Sep 2012
Location: North East USA
Distribution: at work: Red Hat Enterprise Linux Server release 5.8 (Tikanga); at home: what do you recommend?
Posts: 24

Original Poster
Rep: Reputation: Disabled
Yes, that is a solution, though I realize I should have been clearer that I was trying to do this within awk specifically. I can certainly close this, give you credit for solving it, and rephrase my question if you feel that is best.

Thank you.
bop-a-nator

I was looking to do it within the awk script itself, as I am already parsing through the file, which contains other data too. I have a subset of data within the file where I need to merge data from two lines together, and I was trying to find a simple way to illustrate the problem I am trying to solve within an awk script. Basically, as the bigger awk script goes through the file and finds the records that begin with an ID, it needs to loop through them to find the NAME identifiers FIRST and LAST, then put their values on the same line.
 
Old 01-24-2013, 02:18 PM   #4
shivaa
Senior Member
 
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,800
Blog Entries: 4

Rep: Reputation: 286
To be honest, I am also a beginner in awk. But whenever awk is combined with the shell, it creates magic, so I prefer using both rather than awk or the shell alone.

In your case, I will try to write the whole script in awk itself.
 
Old 01-27-2013, 11:03 AM   #5
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian + kde 4 / 5
Posts: 6,834

Rep: Reputation: 1976
To do this entirely in awk, I think we need to be a bit more exacting in our matching logic. It also helps to write it out as a stand-alone script, rather than try to cram it all onto the command line.

Code:
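#!/usr/bin/awk -f

{
    # NAME lines: remember the first and last name for this ID, print nothing yet
    if ( $2 == "NAME" )
    {
        if ( $3 == "FIRST" ) { fn[$1]=$4 }
        if ( $3 == "LAST"  ) { ln[$1]=$4 }
        next
    }

    # ADDRESS line: print the merged name line, then the address line unchanged
    if ( $2 == "ADDRESS" )
    {
        # put a space between the names only when a first name exists
        name = fn[$1] ? fn[$1] OFS ln[$1] : ln[$1]
        print $1 , name
        print $0
    }
}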
The above assumes that there's always a "LAST" name, but that "FIRST" is optional. You'll have to rework how the name variable is set if it can be otherwise. It also assumes that the "ADDRESS" line always follows the name lines. If not, you'll have to save the address too and print everything out in an END section after the main processing is complete.
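
For what it's worth, the END-section variant mentioned above might look roughly like the sketch below. It is only an illustration under a few assumptions: the function name merge_by_id is made up for this example, each ID has at most one ADDRESS line, and output should follow the order in which IDs first appear in the input. It sticks to plain POSIX awk, so no gawk-only functions such as asort() are needed.

```shell
#!/bin/sh
# Sketch only: buffer names and addresses per ID, then print in END.
# merge_by_id is a hypothetical helper name, not something from the thread.
merge_by_id() {
    awk '
    NF == 0 { next }                      # skip blank lines

    # Record each ID the first time it appears so output keeps input order.
    !($1 in seen) { seen[$1] = 1; order[++n] = $1 }

    $2 == "NAME" && $3 == "FIRST" { fn[$1] = $4; next }
    $2 == "NAME" && $3 == "LAST"  { ln[$1] = $4; next }
    $2 == "ADDRESS"               { addr[$1] = $0; next }

    END {
        for (i = 1; i <= n; i++) {
            id = order[i]
            # Join whichever name parts exist; a space only between two parts.
            name = fn[id]
            if (ln[id] != "") name = (name == "" ? ln[id] : name OFS ln[id])
            if (name != "") print id, name
            if (id in addr) print addr[id]
        }
    }' "$@"
}
```

Called as merge_by_id sample.txt, this should give the layout asked for in the first post, and it no longer matters whether the ADDRESS line comes before or after the NAME lines. The trade-off is that nothing is printed until the whole file has been read.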

There's also a final assumption that the names are all single words. The code would have to get more complex if there could be a $5 field on the "NAME" lines.
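
If a "NAME" line can carry extra fields, one possible way to handle it is to join everything from $4 to the end of the line instead of taking $4 alone. The sketch below is only that, a sketch: merge_long_names and the rest() helper are names invented here, and it keeps the earlier assumption that the "ADDRESS" line follows the name lines.

```shell
#!/bin/sh
# Sketch only: allow multi-word names by joining fields 4..NF.
# merge_long_names and rest() are hypothetical names for illustration.
merge_long_names() {
    awk '
    # Return fields 4 through NF joined with OFS (i and s are locals).
    function rest(   i, s) {
        s = $4
        for (i = 5; i <= NF; i++) s = s OFS $i
        return s
    }

    $2 == "NAME" && $3 == "FIRST" { fn[$1] = rest(); next }
    $2 == "NAME" && $3 == "LAST"  { ln[$1] = rest(); next }

    $2 == "ADDRESS" {
        # Space between the names only when a first name was seen.
        name = (fn[$1] != "") ? fn[$1] OFS ln[$1] : ln[$1]
        print $1, name
        print $0
    }' "$@"
}
```

With this, a line such as "ID2 NAME LAST BROWN JR" would contribute "BROWN JR" to the merged name line rather than just "BROWN".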


PS: Please use [code][/code] tags around your code and data, to preserve the original formatting and to improve readability. Do not use quote tags, bolding, colors, "start/end" lines, or other creative techniques.

Last edited by David the H.; 01-27-2013 at 11:40 AM. Reason: more compact code
 
Old 01-27-2013, 01:20 PM   #6
shivaa
Senior Member
 
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,800
Blog Entries: 4

Rep: Reputation: 286
@David:
Indeed, you've given a stricter (and perfect) solution. Could you explain the following line in your code, i.e. what the ? and : do here, and how the result ends up stored in 'name'?
Code:
name = fn[$1] ? fn[$1] OFS ln[$1] : ln[$1]

Last edited by shivaa; 01-27-2013 at 01:22 PM.
 
Old 01-27-2013, 05:23 PM   #7
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian + kde 4 / 5
Posts: 6,834

Rep: Reputation: 1976
It's called a ternary operator, a kind of simplified if/else pattern available in several programming languages.

http://www.gnu.org/software/gawk/man...ional-Exp.html

In this case I used it to ensure that the space between the two names only appears when both are present. It's kind of hard to handle optional spaces without something like it.
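
As a small illustration of that point, the sketch below builds the same string twice, once with the ternary operator and once with a plain if/else, so the two forms can be compared side by side. The join_name wrapper and the variable names first, last, and same are invented for this example only.

```shell
#!/bin/sh
# Sketch only: ternary versus if/else in awk.
# join_name is a hypothetical helper; $1 is the first name (may be empty), $2 the last.
join_name() {
    awk -v first="$1" -v last="$2" 'BEGIN {
        # Ternary form: condition ? value_if_true : value_if_false
        name = (first != "") ? first OFS last : last

        # Equivalent if/else form
        if (first != "") { same = first OFS last } else { same = last }

        print name
        print same
    }'
}
```

Running join_name TOM SMITH should print "TOM SMITH" twice, and join_name "" JONES should print "JONES" twice with no leading space, which is the optional-space behaviour described above.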
 
  


All times are GMT -5.
