LinuxQuestions.org
LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
08-19-2005, 08:46 PM #1
carl.waldbieser
Member
 
Registered: Jun 2005
Location: Pennsylvania
Distribution: Kubuntu
Posts: 197

bash: Unique lines based on specific fields.


I was wondering if anyone knows a simpler way to sort the lines in a file and keep only unique lines, based on particular (non-adjacent) fields. Consider this sample file:

Code:
three apple 1
two banana 2
one pear 3
zero pineapple 10
one orange 5
one lime 3
two lemon 7
four grape 5
Say I want to keep only one entry for each distinct combination of fields 1 and 3. I know I can do:
Code:
$ awk '{print $1,$3,$2}' temp.txt | sort -k 1,2 -u | awk '{print $1,$3,$2}'
four grape 5
one pear 3
one orange 5
three apple 1
two banana 2
two lemon 7
zero pineapple 10
However, I was wondering if there was a more compact way to do this. Juggling the fields with awk and then juggling them back can be somewhat challenging, especially when there are a lot of fields in a record.
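A more compact alternative, offered here as a sketch rather than anything posted in this thread, is to have awk remember which field-1/field-3 pairs it has already printed, so no fields need to be moved around. The array name `seen` is arbitrary, and the script recreates the sample file as `temp.txt` in a scratch directory so it runs on its own. Unlike `sort -u`, the awk filter keeps the first matching line in input order; the trailing `sort` is only there to get sorted output, and it orders by the whole line rather than by fields 1 and 3.

```shell
#!/bin/sh
# Work in a scratch directory so an existing temp.txt is not overwritten.
cd "$(mktemp -d)" || exit 1

cat > temp.txt <<'EOF'
three apple 1
two banana 2
one pear 3
zero pineapple 10
one orange 5
one lime 3
two lemon 7
four grape 5
EOF

# seen[$1, $3] counts how often each (field 1, field 3) pair has appeared.
# The post-increment yields the old count, which is 0 the first time, so
# !seen[$1, $3]++ is true only for the first line with a given pair.
result=$(awk '!seen[$1, $3]++' temp.txt | sort)
printf '%s\n' "$result"
```

With many fields, only the subscript changes (for example `seen[$1, $3, $7]`), so there is nothing to juggle back afterwards.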

Any thoughts?

Last edited by carl.waldbieser; 08-19-2005 at 11:57 PM.
 
08-20-2005, 09:03 PM #2
jonaskoelker
Senior Member
 
Registered: Jul 2004
Location: Denmark
Distribution: Ubuntu, Debian
Posts: 1,524

Code:
$ sort -k 1 -k 3 -u myfile
filters out duplicates based on the combination of fields 1 and 3, not on each field by itself. This is what you want, right?

Otherwise, try
Code:
$ sort -k 1 -u | sort -k 3 -u
hth --Jonas
 
08-21-2005, 02:27 AM #3
carl.waldbieser
Member
 
Registered: Jun 2005
Location: Pennsylvania
Distribution: Kubuntu
Posts: 197

Original Poster
Quote:
Originally posted by jonaskoelker
Code:
$ sort -k 1 -k 3 -u myfile
filters out duplicates based on the combination of fields 1 and 3, not on each field by itself. This is what you want, right?

Otherwise, try
Code:
$ sort -k 1 -u | sort -k 3 -u
hth --Jonas
Well, you understand what I want to do. However, neither of the solutions you proposed seems to work.

Code:
$ sort -k 1 -k 3 -u temp.txt

four grape 5
one lime 3
one orange 5
one pear 3
three apple 1
two banana 2
two lemon 7
zero pineapple 10

$ sort -k 1 -u temp.txt | sort -k 3 -u

three apple 1
zero pineapple 10
two banana 2
one lime 3
four grape 5
two lemon 7
The output in the first case contains a duplicate ("one lime 3" and "one pear 3").
The output in the second case eliminated "one orange 5", which was unique.

I kept scratching my head because I thought the first solution ought to work. Then I tried:
Code:
sort -k 1,1 -k 3 -u temp.txt

four grape 5
one lime 3
one orange 5
one pear 3
three apple 1
two banana 2
two lemon 7
zero pineapple 10
And it gave me the result I was looking for. After studying the man page, I think it is because if you only specify one argument for the key, it sorts from that field to the last field. So in essence, the sort was by f1,f2,f3,f3. All the lines were considered unique because all the fields were included.
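The key-extent behaviour described above can be checked directly with two lines from the sample file. In this minimal sketch, `-k 1` (start field only) runs to the end of the line, so the two lines are different keys and `-u` keeps both; `-k 1,1` stops at field 1, so the lines compare equal and `-u` keeps only one.

```shell
#!/bin/sh
# Two lines that share field 1 but differ later in the line.
lines='one lime 3
one pear 3'

# -k 1: the key starts at field 1 and extends to the end of the line,
# so the lines have different keys and -u keeps both.
open_key=$(printf '%s\n' "$lines" | sort -k 1 -u)

# -k 1,1: the key is field 1 only, so -u treats the lines as duplicates.
closed_key=$(printf '%s\n' "$lines" | sort -k 1,1 -u)

printf '%s\n' "-k 1:" "$open_key" "-k 1,1:" "$closed_key"
```

Recent GNU coreutils versions of sort also have a `--debug` option that underlines the part of each line covered by each key, which makes this kind of mistake easy to spot.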

Thanks, I knew there had to be an easier way!

EDIT: I accidentally posted the wrong output in the final solution. Corrected in my next post.

Last edited by carl.waldbieser; 08-21-2005 at 11:37 AM.
 
08-21-2005, 06:07 AM #4
jonaskoelker
Senior Member
 
Registered: Jul 2004
Location: Denmark
Distribution: Ubuntu, Debian
Posts: 1,524

Quote:
I kept scratching my head because I thought the first solution ought to work...
Well, I missed a thing or two--but it got you in the right direction :-)

Congrats on getting it solved.

--Jonas
 
08-21-2005, 10:02 AM #5
eddiebaby1023
Member
 
Registered: May 2005
Posts: 378

Quote:
And it gave me the result I was looking for.
I was going to post that solution yesterday, but it doesn't give you the result you said you wanted in your first post! You've got
Code:
one lime 3
one pear 3
in your result, which you said you didn't want, "one" and "3" being a duplicate.
 
08-21-2005, 11:31 AM #6
carl.waldbieser
Member
 
Registered: Jun 2005
Location: Pennsylvania
Distribution: Kubuntu
Posts: 197

Original Poster
Quote:
Originally posted by eddiebaby1023
I was going to post that solution yesterday, but it doesn't give you the result you said you wanted in your first post! You've got
Code:
one lime 3
one pear 3
in your result, which you said you didn't want, "one" and "3" being a duplicate.
My bad. I think I just copied the wrong output into my last post.

The actual output I get is:
Code:
$ sort -k 1,1 -k 3 -u temp.txt

four grape 5
one pear 3
one orange 5
three apple 1
two banana 2
two lemon 7
zero pineapple 10
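For reference, the corrected command can be wrapped in a self-contained script. This sketch uses `-k 3,3` instead of the thread's `-k 3`: in this three-field file they behave the same because field 3 is the last one, but a closed key stays limited to field 3 if more fields are added later. `LC_ALL=C` is an added assumption that makes the ordering byte-wise rather than locale-dependent.

```shell
#!/bin/sh
# Work in a scratch directory so an existing temp.txt is not overwritten.
cd "$(mktemp -d)" || exit 1

cat > temp.txt <<'EOF'
three apple 1
two banana 2
one pear 3
zero pineapple 10
one orange 5
one lime 3
two lemon 7
four grape 5
EOF

# -k 1,1 : first key is field 1 only.
# -k 3,3 : second key is field 3 only (closed, so it cannot run into later fields).
# -u     : output only the first of each run of lines whose keys compare equal.
result=$(LC_ALL=C sort -k 1,1 -k 3,3 -u temp.txt)
printf '%s\n' "$result"
```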

Last edited by carl.waldbieser; 08-21-2005 at 11:35 AM.
 
08-21-2005, 03:26 PM #7
archtoad6
Senior Member
 
Registered: Oct 2004
Location: Houston, TX (usa)
Distribution: MEPIS, Debian, Knoppix,
Posts: 4,727
Blog Entries: 15

Quote:
Originally posted by carl.waldbieser
After studying the man page, I think it is because if you only specify one argument for the key, it sorts from that field to the last field. So in essence, the sort was by f1,f2,f3,f3. All the lines were considered unique because all the fields were included.
There is no "I think" about it -- you are exactly right about "f1,f2,f3,f3".

BTW, it's not in the man page (perhaps in the <shudder /> info page), but you can fine-tune your keys down to the character position:
Code:
sort -k f.n,g.m
where f and g are field numbers and n and m are character positions within those fields.
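A small illustration of the f.n,g.m form, with an arbitrarily chosen key: `-k 2.2,2.2` sorts on the second character of field 2, i.e. the second letter of each fruit name. `-t ' '` is an added assumption: with the default separator a field includes its leading blanks, so character positions would be counted from the blank unless `-b` is also given. `LC_ALL=C` makes the ordering byte-wise.

```shell
#!/bin/sh
# Work in a scratch directory so an existing temp.txt is not overwritten.
cd "$(mktemp -d)" || exit 1

cat > temp.txt <<'EOF'
three apple 1
two banana 2
one pear 3
zero pineapple 10
one orange 5
one lime 3
two lemon 7
four grape 5
EOF

# -t ' '     : split on single spaces, so field 2 is "apple", not " apple".
# -k 2.2,2.2 : the key starts and ends at character 2 of field 2.
# Lines whose keys are equal fall back to a whole-line comparison.
result=$(LC_ALL=C sort -t ' ' -k 2.2,2.2 temp.txt)
printf '%s\n' "$result"
```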