[SOLVED] Sorting output by a column with numbers AND letters.
Hi all,
I have a program that prints out lines like:
A FK5108 C10 1
or
A FK5108 O6 3
and I want to be able to pipe it to sort on that third column, by letter first, then number. But I keep getting output sorted like:
A FK5108 C10 1
A FK5108 C11 1
A FK5108 C1 4
A FK5108 C12 2
A FK5108 O6 3
(field separations all start at same place, so columns are not jagged like above.)
I have read the sort man pages, and have tried -n for the numbers, and -k for the position to start sorting, among other things. I also tried inputting a second position to start sorting, which sort should supposedly refer to if the two entries are identical at the first place being compared, but it seems to just ignore the second one. I just can't get it to sort the numbers properly...
Thanks for any help I can get with this! For now I am manually opening the file in emacs and changing them around, needless to say, very time consuming.
** I have also looked through page after page on this forum and other sort help pages without finding anyone posting about my specific problem. (Perhaps I am just using the wrong keywords in my search?) So I am sorry if this question has already been asked/answered somewhere else.
Last edited by velocity_girl; 05-22-2011 at 03:23 PM.
Reason: Making output I get more clear
But it's actually just the 3rd field I need to sort by. I want it to first sort by whether it's a C or O (or an N or F, which aren't shown in my example), then sort them by the number that follows, so
A FK5108 C1 4
A FK5108 C10 1
A FK5108 C11 1
A FK5108 C12 2
A FK5108 N5 6
A FK5108 O5 3
A FK5108 O6 3
Last edited by velocity_girl; 05-22-2011 at 03:37 PM.
Reason: typo
$ cat infile
A FK5108 C11 1
A FK5108 O6 3
A FK5108 N5 6
A FK5108 C10 1
A FK5108 O5 3
A FK5108 C12 2
A FK5108 C1 4
$ sort -k 3,3n infile
A FK5108 C1 4
A FK5108 C10 1
A FK5108 C11 1
A FK5108 C12 2
A FK5108 N5 6
A FK5108 O5 3
A FK5108 O6 3
Hope this helps.
Last edited by druuna; 05-22-2011 at 03:46 PM.
Reason: highlighted actual command
A FK5108 C10 1
A FK5108 C11 1
A FK5108 C1 2
A FK5108 C12 2
A FK5108 O6 3
However, I looked at some more of the real lines of my file (I had made up the last column in my example above, since I didn't think it mattered when I wasn't sorting on it). The unwanted ordering seems to appear when the 4th field of a line whose 3rd field has one letter and one digit is the same as the 3rd character of the 3rd field of another line, one with two digits. When you use -k and specify 3,3n, does it read on past the blank space separating the fields? And if so, is there a way to have it stop after it hits the blank space after the 3rd field?
Last edited by velocity_girl; 05-22-2011 at 04:08 PM.
Reason: Clarifying wording
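A note on the question above, as far as GNU sort's documented behaviour goes: the key given by -k 3,3n does stop at the end of field 3. The catch is that -n only reads a leading number, and a key such as C10 starts with a letter, so every key evaluates to 0. With all keys tied, sort falls back to comparing the whole line using the locale's collation rules, which is probably why the 4th field seems to leak into the ordering. Below is a minimal sketch, assuming GNU sort; the function name show_ties is made up for illustration. The -s flag disables the whole-line fallback, so if the keys really do tie, the lines come back in their input order:

```shell
#!/bin/bash
# show_ties is a hypothetical helper, not something from the thread.
# -s (stable) turns off sort's last-resort whole-line comparison, so lines
# whose keys compare equal keep their input order.
show_ties() {
    LC_ALL=C sort -s -k3,3n
}

# Every key in field 3 starts with a letter, so -n treats each one as 0.
printf '%s\n' \
    'A FK5108 C12 2' \
    'A FK5108 C1 4' \
    'A FK5108 O6 3' \
    'A FK5108 C10 1' | show_ties
```

If the output matches the input order exactly, the numeric key is not distinguishing any of the lines, and the ordering seen without -s comes entirely from the fallback comparison.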
Sort on my machine(s) shows the same behaviour as druuna's ... what version is your sort?
Code:
sort --version
sort (GNU coreutils) 8.5
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Mike Haertel and Paul Eggert.
sort -k 3,3 file0
A FK5108 C1 4
A FK5108 C10 1
A FK5108 C11 1
A FK5108 C12 2
A FK5108 N5 6
A FK5108 O5 3
A FK5108 O6 3
sort -k 3,3n file0
A FK5108 C10 1
A FK5108 C11 1
A FK5108 C12 2
A FK5108 C1 4
A FK5108 N5 6
A FK5108 O5 3
A FK5108 O6 3
sort -k 3.3 file0
A FK5108 C10 1
A FK5108 C11 1
A FK5108 C12 2
A FK5108 C1 4
A FK5108 O5 3
A FK5108 N5 6
A FK5108 O6 3
sort -k 3.3n file0
A FK5108 C1 4
A FK5108 N5 6
A FK5108 O5 3
A FK5108 O6 3
A FK5108 C10 1
A FK5108 C11 1
A FK5108 C12 2
sort -n file0
A FK5108 C10 1
A FK5108 C11 1
A FK5108 C12 2
A FK5108 C1 4
A FK5108 N5 6
A FK5108 O5 3
A FK5108 O6 3
sort file0
A FK5108 C10 1
A FK5108 C11 1
A FK5108 C12 2
A FK5108 C1 4
A FK5108 N5 6
A FK5108 O5 3
A FK5108 O6 3
sort --version
sort (GNU coreutils) 8.5
Copyright (C) 2010 Free Software Foundation, Inc.
Also, in general, if you can persuade the program that generates that data to pad the numbers out to the same length, i.e. C3 => C03, then this sort of manipulation will be easier in any language.
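To illustrate the padding idea, here is a rough sketch that does the padding after the fact with awk instead of in the generating program. The function name pad_and_sort and the width of three digits are assumptions for the example, and it assumes four whitespace-separated fields with a single-letter prefix in field 3. Once the numbers have equal width, a plain text sort on field 3 gives the wanted order:

```shell
#!/bin/bash
# pad_and_sort is a hypothetical helper: it rewrites field 3 as
# <letter><3-digit number> (C3 -> C003), then sorts on that field as text.
# Assumes four whitespace-separated fields and a one-letter prefix.
pad_and_sort() {
    awk '{ printf "%s %s %s%03d %s\n", $1, $2, substr($3, 1, 1), substr($3, 2), $4 }' |
        LC_ALL=C sort -k3,3
}

printf '%s\n' \
    'A FK5108 C10 1' \
    'A FK5108 C3 6' \
    'A FK5108 O2 9' \
    'A FK5108 C1 2' | pad_and_sort
```

Note that this changes the labels in the output (C3 becomes C003), so it only helps when whatever reads the data afterwards accepts padded names.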
Here's a way to temporarily set the C locale and run sort:
Code:
LC_ALL=C sort -k3,3n infile
So running that line took care of the C1 being out of place, but now I get output like:
Code:
A FK5108 C1 2
A FK5108 C10 1
A FK5108 C11 1
A FK5108 C12 2
A FK5108 C14 1
A FK5108 C15 2
A FK5108 C17 1
A FK5108 C2 1
A FK5108 C24 1
A FK5108 C27 1
A FK5108 C3 6
A FK5108 C30 2
A FK5108 O1 1
A FK5108 O10 2
A FK5108 O2 9
I don't think sort (not limited to the sort command) can handle a string the way you want it to. I did a few tests and cannot find any way to do this using sort (or asort in awk).
If at all possible I would change the output that is created (see chrism01's reply post #8).
If that is not possible you could write a script that is able to do what you want/need. Here's an example based on the posted input from post #10 (+ some extra):
Code:
$ cat foo.sh
#!/bin/bash
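# Split field 3 into its letter and its number so sort can key on each part.
while read -r A B C D
do
  printf "%s %s %s %s %s\n" "$A" "$B" "${C:0:1}" "${C:1}" "$D"
done < infile |
sort -k3,3 -k4,4n |
while read -r A B C D E
do
  # Rejoin the letter and number, and print the columns tab-separated.
  printf "%s\t%s\t%s%s\t%s\n" "$A" "$B" "$C" "$D" "$E"
done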
$ cat infile
A FK5108 C30 2
A FK5108 O34 9
A FK5108 C10 1
A FK5108 C14 1
A FK5108 O1 1
A FK5108 C15 2
A FK5108 C24 1
A FK5108 C27 1
A FK5108 C3 6
A FK5108 O10 2
A FK5108 C1 2
A FK5108 O2 9
A FK5108 C12 2
A FK5108 O23 9
A FK5108 O3 9
A FK5108 C11 1
A FK5108 C17 1
A FK5108 C2 1
$ ./foo.sh
A FK5108 C1 2
A FK5108 C2 1
A FK5108 C3 6
A FK5108 C10 1
A FK5108 C11 1
A FK5108 C12 2
A FK5108 C14 1
A FK5108 C15 2
A FK5108 C17 1
A FK5108 C24 1
A FK5108 C27 1
A FK5108 C30 2
A FK5108 O1 1
A FK5108 O2 9
A FK5108 O3 9
A FK5108 O10 2
A FK5108 O23 9
A FK5108 O34 9
It is not clear from your posts whether the input examples you posted are the actual data; if not, the above script might need some tuning.
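For comparison, GNU sort's character-offset keys can express the same split inside a single sort call. This is only a sketch under a few assumptions: GNU sort, exactly one letter at the start of field 3, and a made-up function name (sort_atoms). The first key is the first non-blank character of field 3; the second key runs from the second character to the end of field 3 and is compared numerically. The b modifiers skip the leading blanks that sort otherwise counts as part of the field:

```shell
#!/bin/bash
# sort_atoms is a hypothetical helper, not something from the thread.
#   -k3.1b,3.1b  first non-blank character of field 3 (the letter)
#   -k3.2bn,3    rest of field 3, compared as a number
# LC_ALL=C keeps the letter comparison independent of the locale.
sort_atoms() {
    LC_ALL=C sort -k3.1b,3.1b -k3.2bn,3
}

printf '%s\n' \
    'A FK5108 C11 1' \
    'A FK5108 O6 3' \
    'A FK5108 N5 6' \
    'A FK5108 C10 1' \
    'A FK5108 C2 1' \
    'A FK5108 O5 3' \
    'A FK5108 C1 4' | sort_atoms
```

Unlike the script above, this leaves the spacing of each line untouched, which may matter if the columns are aligned with runs of blanks.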
Sorry it took so long for me to get back to this... I'm a grad student, so busyness tends to come in waves. And strong waves at that!
Quote:
Originally Posted by druuna
If at all possible I would change the output that is created (see chrism01's reply post #8).
Unfortunately, I cannot change the output. It's the standard format for the atom names in a certain file type, and later programs that I might feed this data into would pitch a fit if I changed that numbering/labeling.
Quote:
Originally Posted by druuna
If that is not possible you could write a script that is able to do what you want/need. Here's an example based on the posted input from post #10 (+ some extra):
Thanks a lot for this script, it seems to do the trick for what I need! (And will save me A LOT of time in not having to go in and manually switch the out-of-order entries...)