I was wondering if anyone knows a simpler way to sort the lines in a file based on particular (non-adjacent) fields. Consider this sample file:
Code:
three apple 1
two banana 2
one pear 3
zero pineapple 10
one orange 5
one lime 3
two lemon 7
four grape 5
Say I want entries such that fields 1 and 3 must be unique for a given record. I know I can do:
Code:
$ awk '{print $1,$3,$2}' temp.txt | sort -k 1,2 -u | awk '{print $1,$3,$2}'
four grape 5
one pear 3
one orange 5
three apple 1
two banana 2
two lemon 7
zero pineapple 10
However, I was wondering if there was a more compact way to do this. Juggling the fields with awk and then juggling them back can be somewhat challenging, especially when there are a lot of fields in a record.
Any thoughts?
Last edited by carl.waldbieser; 08-19-2005 at 10:57 PM.
filters out entries where the combination of field 1 and 3 is unique, not each field in itself. This is what you want, right?
Otherwise, try
Code:
$ sort -k 1 -u | sort -k 3 -u
hth --Jonas
Well, you understand what I want to do. However, neither of the solutions you proposed seems to work.
Code:
$ sort -k 1 -k 3 -u temp.txt
four grape 5
one lime 3
one orange 5
one pear 3
three apple 1
two banana 2
two lemon 7
zero pineapple 10
$ sort -k 1 -u temp.txt | sort -k 3 -u
three apple 1
zero pineapple 10
two banana 2
one lime 3
four grape 5
two lemon 7
The output in the first case contains a duplicate ("one lime 3" and "one pear 3").
The output in the second case eliminated "one orange 5", which was unique.
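A quick way to see why the two-stage pipeline loses a record is to count how many lines share each value of field 3. This is only a sketch: it recreates the sample data in a scratch directory (`workdir` is an illustrative name, not something from the thread), and it relies on the fact that, on these three-field lines, `-k 3` compares field 3 alone.

```shell
# Recreate the sample file from the first post in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/temp.txt" <<'EOF'
three apple 1
two banana 2
one pear 3
zero pineapple 10
one orange 5
one lime 3
two lemon 7
four grape 5
EOF

# Count the lines per field-3 value and report values that occur more than once.
# awk's for-in order is unspecified, so the report is sorted afterwards.
shared_field3=$(awk '{ n[$3]++ } END { for (k in n) if (n[k] > 1) print k, n[k] }' \
    "$workdir/temp.txt" | LC_ALL=C sort)
printf '%s\n' "$shared_field3"

rm -rf "$workdir"
```

Values 3 and 5 each appear on two lines, so a second pass that is unique on field 3 alone keeps only one line for each of them, regardless of what field 1 holds.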
I kept scratching my head because I thought the first solution ought to work. Then I tried:
Code:
sort -k 1,1 -k 3 -u temp.txt
four grape 5
one lime 3
one orange 5
one pear 3
three apple 1
two banana 2
two lemon 7
zero pineapple 10
And it gave me the result I was looking for. After studying the man page, I think it is because if you only specify one argument for the key, it sorts from that field to the last field. So in essence, the sort was by f1,f2,f3,f3. All the lines were considered unique because all the fields were included.
Thanks, I knew there had to be an easier way!
EDIT: I accidentally posted the wrong output in the final solution. Corrected in my next post.
Last edited by carl.waldbieser; 08-21-2005 at 10:37 AM.
Originally posted by eddiebaby1023: I was going to post that solution yesterday, but it doesn't give you the result you said you wanted in your first post! You've got
Code:
one lime 3
one pear 3
in your result, which you said you didn't want, "one" and "3" being a duplicate.
My bad. I think I just copied the wrong output into my last post.
The actual output I get is:
Code:
$ sort -k 1,1 -k 3 -u temp.txt
four grape 5
one pear 3
one orange 5
three apple 1
two banana 2
two lemon 7
zero pineapple 10
Last edited by carl.waldbieser; 08-21-2005 at 10:35 AM.
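For records with more than three fields, it may be safer to bound both keys rather than only the first. The sketch below makes a few assumptions that are not from the thread: `workdir` is an illustrative scratch directory, `LC_ALL=C` is set so the ordering does not depend on the locale, and `-b` is added so leading blanks are not counted as part of a key. It also shows an awk alternative that keeps the first line seen for each (field 1, field 3) pair without sorting at all.

```shell
# Recreate the sample file from the first post in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/temp.txt" <<'EOF'
three apple 1
two banana 2
one pear 3
zero pineapple 10
one orange 5
one lime 3
two lemon 7
four grape 5
EOF

# Key 1 is field 1 only, key 2 is field 3 only; -u keeps one line per key pair.
sorted_unique=$(LC_ALL=C sort -b -k 1,1 -k 3,3 -u "$workdir/temp.txt")

# No sorting: print a line only the first time its (field 1, field 3) pair appears.
first_seen=$(awk '!seen[$1, $3]++' "$workdir/temp.txt")

printf '%s\n' "$sorted_unique"
printf '%s\n' "$first_seen"

rm -rf "$workdir"
```

Both variants leave seven of the eight lines. The awk form preserves the input order, which may or may not be what you want. Which of the two "one ... 3" lines `sort -u` keeps is best treated as implementation-dependent.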
Originally posted by carl.waldbieser: After studying the man page, I think it is because if you only specify one argument for the key, it sorts from that field to the last field. So in essence, the sort was by f1,f2,f3,f3. All the lines were considered unique because all the fields were included.
There is no "I think" about it -- you are exactly right about "f1,f2,f3,f3".
BTW, it's not in the man page (perhaps in the <shudder /> info page), but you can fine tune your keys to the character position:
Code:
sort -k f.n,g.m
where f & g are field numbers and n & m are position numbers.
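As a small illustration of the `f.n,g.m` form, the sketch below sorts on a single character inside a field. The input lines are made up for the example and `LC_ALL=C` is an added assumption to keep the ordering locale-independent. For fields after the first, the blanks that separate fields count as part of the field unless `-b` is given, so the second command uses `-b`.

```shell
# Sort on the third character of field 1 (the digit in ab3, zz1, mm2).
by_third_char=$(printf '%s\n' 'ab3 x' 'zz1 y' 'mm2 z' | LC_ALL=C sort -k 1.3,1.3)
printf '%s\n' "$by_third_char"

# Sort on the second character of field 2; -b skips the blank before the field
# so that position 2 is the digit rather than the first letter.
by_field2_char2=$(printf '%s\n' 'x b2' 'y a3' 'z c1' | LC_ALL=C sort -b -k 2.2,2.2)
printf '%s\n' "$by_field2_char2"
```

The first command should list the lines in the order zz1, mm2, ab3, and the second in the order c1, b2, a3.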