I'm trying to get the Unix command-line sort (e.g., /usr/bin/sort) to sort a tab-delimited file based on 2 fields:
1) An alphanumeric field (e.g. "abc1")
2) A numeric-only field (e.g. 1538921)
I want to sort on #1 first and, where the alphanumeric field is the same, sort on field #2. Sorting on a single field seems fairly straightforward, but I have no clue how to do this two-level sorting.
An example:
bbb 234
aaa 123
aaa 111
bbb 891
Would give:
aaa 111
aaa 123
bbb 234
bbb 891
I know how to accomplish it in programming languages, but this is going to be called by a Ruby program and needs to sort input data that may be several gigabytes in size, and Ruby just won't cut it for doing this quickly and without running out of memory (whereas Unix sort is very fast!).
I may have oversimplified my problem. My file format has about 10 other tab-delimited fields that are irrelevant to the sorting and would in fact throw the default sort off. My fields of interest are at indexes 5 and 8, so there are other fields in between as well.
Okay, I'm having problems even sorting on a *single* field, let alone two. Here is an example: sort, numerically, based on the 2nd field, delimited by tabs:
Example file:
a bc 987 asd
d e f 876 lpoae
g hi 123 liead
j kl 234 ppoasd
m no 873 apoie
The gaps on either side of the number are tabs; every other gap is an ordinary space. So the first line is actually:
a bc\t987\tasd
The default "delimiter" seems to be any white space. However, as noted above, some of my fields may have values that contain spaces, so I want the delimiter to use tabs (\t) instead. I have attempted the following:
sort -t \t +1n tmpfile
And it does ... apparently nothing. If I do:
sort +1n tmp
It is clear there is some sorting going on, but it is using all whitespace as a separator, which gives incorrect results.
What on earth am I getting wrong here? This seems straightforward enough ... but any time I specify -t \t I don't get the expected results.
Given the file, "junk"
Code:
a bc 987 asd
d ef 876 lpoae
g hi 123 liead
j kl 234 ppoasd
m no 873 apoie
where the lines are a<space>bc<tab>987<space>asd, ... you do not need to specify the tab delimiter (any "white space" is treated the same, including multiple spaces); you would sort on the numeric column with
Code:
sort +2n junk
g hi 123 liead
j kl 234 ppoasd
m no 873 apoie
d ef 876 lpoae
a bc 987 asd
On the other hand, if you had a non white space delimiter, say a vertical bar
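As a hedged illustration of the whitespace-versus-tab point discussed in this thread: splitting on any blank only works while no field contains a space of its own. The sketch below uses made-up rows modelled on the example file (the variable names are hypothetical) and the newer -k key syntax instead of +POS. It compares the default splitting with a tab-only delimiter.

```shell
# Sample rows modelled on the thread's example: tabs separate the fields,
# and the first field may itself contain spaces.
tab=$(printf '\t')
sample=$(printf 'a bc\t987\tasd\nd e f\t876\tlpoae\ng hi\t123\tliead\nj kl\t234\tppoasd\nm no\t873\tapoie\n')

# Default splitting on blanks: the third key of the "d e f" row is "f",
# which is not a number, so that row is compared as 0.
by_blanks=$(printf '%s\n' "$sample" | sort -k 3,3n)

# Tab as the only separator: the second key is the number on every row.
by_tabs=$(printf '%s\n' "$sample" | sort -t "$tab" -k 2,2n)

printf 'split on blanks:\n%s\n\nsplit on tabs:\n%s\n' "$by_blanks" "$by_tabs"
```

With blank splitting the "d e f" row jumps to the top because its third key is not numeric; with the tab delimiter the rows come out in plain numeric order.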
Look in the sort info manual. It has several examples of sorting using multiple keys with different options. You use more than one -k argument.
Code:
* Sort the password file on the fifth field and ignore any leading
blanks. Sort lines with equal values in field five on the numeric
user ID in field three. Fields are separated by `:'.
sort -t : -k 5b,5 -k 3,3n /etc/passwd
sort -t : -n -k 5b,5 -k 3,3 /etc/passwd
sort -t : -b -k 5,5 -k 3,3n /etc/passwd
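To try the multiple -k idea without touching /etc/passwd, here is a minimal sketch on invented colon-separated records (the name:group:id layout is an assumption made up for this example). The first key is compared as text, the second as a number.

```shell
# Hypothetical records in the form name:group:id
records=$(printf 'carol:staff:30\nalice:admin:20\nbob:staff:4\ndave:admin:100\n')

# Primary key: field 2 as text. Secondary key: field 3 as a number.
# LC_ALL=C keeps the text comparison byte-wise, whatever the user's locale.
sorted=$(printf '%s\n' "$records" | LC_ALL=C sort -t : -k 2,2 -k 3,3n)

printf '%s\n' "$sorted"
```

Writing each key as N,N stops it at the end of that field; a bare -k 2 would run from field 2 to the end of the line. The n suffix on the second key is what puts 20 before 100 within the admin group.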
Thanks all for the pointers. My main problem lay in the fact that I thought "-t \t" or "-t\t" would specify tab as a delimiter, but that wasn't working.
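A likely reason "-t \t" misbehaves is that the shell processes the backslash before sort ever sees the argument. The sketch below (variable names are only for illustration) shows what each spelling actually hands to a command.

```shell
# Unquoted: the shell removes the backslash, so the command receives the letter "t".
unquoted=$(printf '%s' \t)

# Single-quoted: the backslash survives, but the argument is two characters long.
quoted=$(printf '%s' '\t')

# printf interprets \t in its format string, which yields one real tab character.
tab=$(printf '\t')

printf 'unquoted: [%s] length %d\n' "$unquoted" "${#unquoted}"
printf 'quoted:   [%s] length %d\n' "$quoted" "${#quoted}"
printf 'tab:      [%s] length %d\n' "$tab" "${#tab}"
```

So "sort -t \t" most likely splits fields on the letter t, and a two-character delimiter such as '\t' is typically rejected by GNU sort. In bash, $'\t' produces the same single tab as the printf form above.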
For anyone finding this thread looking for a similar answer ... this is what I finally got working:
sort -t $'\t' +4 -5 +5n -6 fileName
It sorts a tab-delimited file, first on field 4 and then numerically on field 5 where field 4 is the same. (The old +POS syntax counts fields from zero, so these are the fifth and sixth columns of each line.)
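Newer versions of sort may not accept the +POS -POS form by default. Assuming the same intent, the equivalent -k spelling is -k 5,5 -k 6,6n. Here is a small sketch on invented six-column rows (the data and variable names are hypothetical), with the tab built by printf so it does not depend on bash's $'\t'.

```shell
tab=$(printf '\t')

# Hypothetical rows: column 1 is a row label, columns 5 and 6 are the sort keys.
rows=$(printf 'r1\tx\tx\tx\tbbb\t234\nr2\tx\tx\tx\taaa\t123\nr3\tx\tx\tx\taaa\t111\nr4\tx\tx\tx\tbbb\t891\n')

# Column 5 as text first, then column 6 numerically where column 5 ties.
sorted=$(printf '%s\n' "$rows" | LC_ALL=C sort -t "$tab" -k 5,5 -k 6,6n)

# Show only the label and the two key columns.
printf '%s\n' "$sorted" | cut -f1,5,6
```

The rows come out as r3, r2, r1, r4: the aaa rows first (111 before 123), then the bbb rows (234 before 891).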