LinuxQuestions.org
Old 08-29-2006, 11:33 AM   #1
Kvetch
Member
 
Registered: Mar 2004
Posts: 35

Rep: Reputation: 15
sed/awk sort help


I stink at scripting and am trying to learn some and was hoping I could get some help.
I have a file that contains thousands of entries like the following (all comma separated)

foo,foo bar,foobar,blah,123,boo,boo bar,boobar
foo,foo bar,foobar,blah,345,boo,boo bar,boobar
foo,foo bar,foobar,blah,345,boo,boo bar,boobar
foo,foo bar,foobar,blah,234,boo,boo bar,boobar
foo,foo bar,foobar,blah,113,boo,boo bar,boobar
foo,foo bar,foobar,blah,123,boo,boo bar,boobar
foo,foo bar,foobar,blah,113,boo,boo bar,boobar

I want to sort the file based on the column with the numbers (the 5th column).
I can do something like

awk -F "," '{ print $5 | "sort" }' file

but this only prints out the 5th column. How can I get it to print out the entire line?

In a separate script I want to sort the same file based on the 5th column but only keep one record of the line if the 5th column has the same value on another line. So I want to remove duplicate lines based on the 5th column's value. So I am looking for the end result to look something like

foo,foo bar,foobar,blah,113,boo,boo bar,boobar
foo,foo bar,foobar,blah,123,boo,boo bar,boobar
foo,foo bar,foobar,blah,234,boo,boo bar,boobar
foo,foo bar,foobar,blah,345,boo,boo bar,boobar

If anyone has any suggestions I would greatly appreciate it.

Thanks,
Nick
 
Old 08-29-2006, 12:24 PM   #2
AnanthaP
Member
 
Registered: Jul 2004
Location: Chennai, India
Distribution: UBUNTU 5.10 since Jul-18,2006 on Intel 820 DC
Posts: 583

Rep: Reputation: 121Reputation: 121
See `man sort`. AFAIR, it has a -u flag.

ATB.

HTH.

End
 
Old 08-29-2006, 01:02 PM   #3
Kvetch
Member
 
Registered: Mar 2004
Posts: 35

Original Poster
Rep: Reputation: 15
Thanks. That gives me something like
113
123
345
How would I print the whole line ($0)?

Last edited by Kvetch; 08-29-2006 at 03:05 PM.
 
Old 08-29-2006, 01:12 PM   #4
ramram29
Member
 
Registered: Jul 2003
Location: Miami, Florida, USA
Distribution: Debian
Posts: 848
Blog Entries: 1

Rep: Reputation: 47
awk '{print $5; printf $0}' file | sort
 
Old 08-29-2006, 01:17 PM   #5
homey
Senior Member
 
Registered: Oct 2003
Posts: 3,057

Rep: Reputation: 56
How about just using sort?
Code:
sort -k5 < file.txt | sort -u
 
Old 08-29-2006, 01:20 PM   #6
AnanthaP
Member
 
Registered: Jul 2004
Location: Chennai, India
Distribution: UBUNTU 5.10 since Jul-18,2006 on Intel 820 DC
Posts: 583

Rep: Reputation: 121Reputation: 121
Send (pipe) the sorted file to awk and do something like this (in awk):

if (MyVal != $5) {
print $0 ;
MyVal = $5
}
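
For reference, the snippet above is only the body of an awk rule, so it needs a sorted input and a field separator to run. A minimal sketch of one way to wire it up, assuming the comma-separated sample from the first post; the temporary file and the function name `dedupe_by_field5` are made up for this illustration, and the `NR == 1` test is added so the first line is always printed:

```shell
# Sample data copied from the first post, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
foo,foo bar,foobar,blah,123,boo,boo bar,boobar
foo,foo bar,foobar,blah,345,boo,boo bar,boobar
foo,foo bar,foobar,blah,345,boo,boo bar,boobar
foo,foo bar,foobar,blah,234,boo,boo bar,boobar
foo,foo bar,foobar,blah,113,boo,boo bar,boobar
foo,foo bar,foobar,blah,123,boo,boo bar,boobar
foo,foo bar,foobar,blah,113,boo,boo bar,boobar
EOF

# Hypothetical helper: sort on field 5, then print a line only when its
# 5th field differs from the 5th field of the previous line.
dedupe_by_field5() {
  sort -t, -k5,5n "$1" |
    awk -F, 'NR == 1 || $5 != MyVal { print $0; MyVal = $5 }'
}

dedupe_by_field5 "$sample"
```

On the sample this leaves one line each for 113, 123, 234 and 345.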


End
 
Old 08-29-2006, 01:59 PM   #7
schneidz
Senior Member
 
Registered: May 2005
Location: boston, usa
Distribution: fc-15/ fc-19-live-usb/ aix
Posts: 3,562

Rep: Reputation: 522Reputation: 522Reputation: 522Reputation: 522Reputation: 522Reputation: 522
um, i might just be showing my ignorance but how about some combination of
sort | uniq
 
Old 08-29-2006, 02:26 PM   #8
Kvetch
Member
 
Registered: Mar 2004
Posts: 35

Original Poster
Rep: Reputation: 15
thanks guys I appreciate all the suggestions.

homey - using sort like that doesn't work. I believe sort can only split fields on spaces, not commas.

schneidz - yeah, that is what I am trying to figure out, but I can't manage to get the syntax down

ramram29 - that still only prints column 5 not the entire line

AnanthaP - I am not following you. How are you saying I should run the if statement?
awk -F "," '{ print $5 | "sort -u" }' file > file2
then somehow run your if statement against file2?
 
Old 08-29-2006, 03:06 PM   #9
Kvetch
Member
 
Registered: Mar 2004
Posts: 35

Original Poster
Rep: Reputation: 15
So far if I do something like
sort -k 5 file |uniq -f 4 > file2
It gives me
foo,foo bar,foobar,blah,113,boo,boo bar,boobar
foo,foo bar,foobar,blah,113,boo,boo bar,boobar
foo,foo bar,foobar,blah,113,boo,boo bar,boobar
foo,foo bar,foobar,blah,123,boo,boo bar,boobar
foo,foo bar,foobar,blah,123,boo,boo bar,boobar
foo,foo bar,foobar,blah,123,boo,boo bar,boobar
foo,foo bar,foobar,blah,234,boo,boo bar,boobar
foo,foo bar,foobar,blah,345,boo,boo bar,boobar
foo,foo bar,foobar,blah,345,boo,boo bar,boobar
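
A likely reason this does not help: without a delimiter option, `sort -k` and `uniq -f` both count blank-separated fields, not comma-separated ones. A small sketch that counts the fields both ways, assuming a line in the format shown in the first post (the variable names are made up for the demo):

```shell
line='foo,foo bar,foobar,blah,123,boo,boo bar,boobar'

# Fields when splitting on blanks, which is what sort -k and uniq -f
# do when no delimiter is given.
blank_fields=$(printf '%s\n' "$line" | awk '{ print NF }')

# Fields when splitting on commas, which is what sort does with -t,
comma_fields=$(printf '%s\n' "$line" | awk -F, '{ print NF }')

echo "blank-separated fields: $blank_fields"
echo "comma-separated fields: $comma_fields"
```

Split on blanks, the line has only 3 fields, so there is no 5th field to sort on or to skip past. Split on commas it has 8, and the number is the 5th.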
 
Old 08-29-2006, 04:48 PM   #10
spirit receiver
Member
 
Registered: May 2006
Location: Frankfurt, Germany
Distribution: SUSE 10.2
Posts: 424

Rep: Reputation: 33
Was there anything wrong with AnanthaP's first suggestion?
Code:
ada@barnabas:~/tmp> sort -t"," -k5 -nu filename
foo,foo bar,foobar,blah,113,boo,boo bar,boobar
foo,foo bar,foobar,blah,123,boo,boo bar,boobar
foo,foo bar,foobar,blah,234,boo,boo bar,boobar
foo,foo bar,foobar,blah,345,boo,boo bar,boobar
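
A note on the key: `-k5` on its own means "from field 5 to the end of the line". It works here because `-n` only reads the leading number, but `-k5,5` limits the key to the 5th field explicitly. A minimal sketch of that variant, assuming the sample data from the first post; the temporary file and the function name `sort_unique_by_field5` are made up for the demo:

```shell
# Sample data copied from the first post, written to a temporary file.
data=$(mktemp)
cat > "$data" <<'EOF'
foo,foo bar,foobar,blah,123,boo,boo bar,boobar
foo,foo bar,foobar,blah,345,boo,boo bar,boobar
foo,foo bar,foobar,blah,345,boo,boo bar,boobar
foo,foo bar,foobar,blah,234,boo,boo bar,boobar
foo,foo bar,foobar,blah,113,boo,boo bar,boobar
foo,foo bar,foobar,blah,123,boo,boo bar,boobar
foo,foo bar,foobar,blah,113,boo,boo bar,boobar
EOF

# Hypothetical helper wrapping a single sort call.
sort_unique_by_field5() {
  # -t,    : fields are separated by commas
  # -k5,5n : the key is field 5 only, compared numerically
  # -u     : output one line per distinct key
  sort -t, -k5,5n -u "$1"
}

sort_unique_by_field5 "$data"
```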
 
Old 08-29-2006, 06:24 PM   #11
homey
Senior Member
 
Registered: Oct 2003
Posts: 3,057

Rep: Reputation: 56
Quote:
Was there anything wrong with AnanthaP's first suggestion?
sort -t"," -k5 -nu filename

How did you get that from this?
Quote:
See `man sort`. AFAIR, it has a -u flag.

ATB.

HTH.

End
 
Old 08-29-2006, 09:52 PM   #12
Kvetch
Member
 
Registered: Mar 2004
Posts: 35

Original Poster
Rep: Reputation: 15
Thanks AnanthaP, homey and spirit receiver. I kept leaving off the -k5 by mistake. Sorry for the confusion and thanks again everyone.
 
Old 08-30-2006, 04:55 AM   #13
spirit receiver
Member
 
Registered: May 2006
Location: Frankfurt, Germany
Distribution: SUSE 10.2
Posts: 424

Rep: Reputation: 33
Quote:
Originally Posted by homey
How did you get that from this?
That accounted for the -u switch in particular. I was wondering why everybody tried piping the file through more than a single command.
 
Old 08-30-2006, 07:38 AM   #14
AnanthaP
Member
 
Registered: Jul 2004
Location: Chennai, India
Distribution: UBUNTU 5.10 since Jul-18,2006 on Intel 820 DC
Posts: 583

Rep: Reputation: 121Reputation: 121
Why the pipe? Because he wanted to print the first instance of the same key. Awk is necessary since uniq would look at the full record.
To quote from the original post:

Quote:
but only keep one record of the line if the 5th column has the same value on another line. So I want to remove duplicate lines based on the 5th column's value
 
Old 08-30-2006, 08:02 AM   #15
spirit receiver
Member
 
Registered: May 2006
Location: Frankfurt, Germany
Distribution: SUSE 10.2
Posts: 424

Rep: Reputation: 33
Quote:
Originally Posted by AnanthaP
Because he wanted to print the first instance of the same key.
Well, that's precisely what sort's -u switch does.
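
To make the difference concrete: when `-u` is combined with `-k`, sort compares only the key, while a plain `uniq` compares whole lines. A small sketch, assuming made-up rows whose other columns differ (the data, the temporary file and both function names are invented for this illustration). Which of the lines sharing a key survives is left to the sort implementation, so the example only counts lines:

```shell
# Invented rows: two share the value 123 in field 5 but differ elsewhere.
rows=$(mktemp)
cat > "$rows" <<'EOF'
a,x,x,x,123,first
b,x,x,x,113,only
c,x,x,x,123,second
EOF

# One line per distinct 5th field: sort -u looks at the key only.
unique_by_key() {
  sort -t, -k5,5n -u "$1"
}

# Whole-line comparison: uniq drops nothing here because no two
# complete lines are identical.
unique_by_line() {
  sort -t, -k5,5n "$1" | uniq
}

echo "sort -u on key: $(unique_by_key "$rows" | awk 'END { print NR }') lines"
echo "sort | uniq:    $(unique_by_line "$rows" | awk 'END { print NR }') lines"
```

The first command prints 2 lines (one for 113, one for 123); the second prints all 3.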
 
  

