LinuxQuestions.org
Old 11-11-2007, 01:28 PM   #1
belorion
Member
 
Registered: Aug 2003
Distribution: Slackware 10.0
Posts: 124

Rep: Reputation: 16
Unix sort on multiple fields


I'm trying to get the Unix command-line sort (e.g., /usr/bin/sort) to sort a tab-delimited file based on 2 fields:

1) An alphanumeric field (e.g. "abc1")
2) A numeric-only field (e.g. 1538921)

I want to sort on #1 first and, where the alphanumeric field is the same, sort on field #2. Sorting on a single field seems fairly straightforward, but I have no clue how to do this two-level sorting.

An example:

bbb 234
aaa 123
aaa 111
bbb 891

Would give:

aaa 111
aaa 123
bbb 234
bbb 891


I know how to accomplish it in programming languages, but this is going to be called by a Ruby program and needs to sort input data that may be several gigabytes in size. Ruby just won't cut it for doing this quickly and without running out of memory (whereas Unix sort is very fast!).

Any suggestions?

thanks,
Matt
 
Old 11-11-2007, 01:51 PM   #2
/bin/bash
Senior Member
 
Registered: Jul 2003
Location: Indiana
Distribution: Mandrake Slackware-current QNX4.25
Posts: 1,802

Rep: Reputation: 47
Actually sort does what you want:
$ cat file
bbb 234
aaa 123
aaa 111
bbb 891

$ sort file

aaa 111
aaa 123
bbb 234
bbb 891
 
Old 11-11-2007, 03:37 PM   #3
belorion
Member
 
Registered: Aug 2003
Distribution: Slackware 10.0
Posts: 124

Original Poster
Rep: Reputation: 16
I may have oversimplified my problem. My file format has about 10 other tab-delimited fields that are irrelevant to the sorting and would in fact throw this default sort off. My fields of interest are at indexes 5 and 8, so there are other fields in there as well.
 
Old 11-11-2007, 05:01 PM   #4
belorion
Member
 
Registered: Aug 2003
Distribution: Slackware 10.0
Posts: 124

Original Poster
Rep: Reputation: 16
Okay, I'm having problems even sorting on a *single* field, let alone two. Here is an example: sort, numerically, based on the 2nd field, delimited by tabs:

Example file:
a bc 987 asd
d e f 876 lpoae
g hi 123 liead
j kl 234 ppoasd
m no 873 apoie

The gaps on either side of the number are tabs; all other whitespace is plain spaces. So the first line is actually:
a bc\t987\tasd

The default "delimiter" seems to be any whitespace. However, as noted above, some of my fields may have values that contain spaces, so I want tabs (\t) to be the delimiter instead. I have attempted the following:

sort -t \t +1n tmpfile

And it does ... apparently nothing. If I do:

sort +1n tmp

It is clear there is some sorting going on, but it is using all whitespace as a separator, producing incorrect results.

What on earth am I getting wrong here? This seems straightforward enough, but any time I specify -t \t I don't get the expected results.
 
Old 11-11-2007, 05:34 PM   #5
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,241

Rep: Reputation: 1407
Note that
Code:
~$ echo \t
t
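Sorting on `t'-delimited fields obviously isn't what you want: the shell strips the unquoted backslash, so sort receives a plain t as its delimiter.

Escaping the backslash doesn't help either, because sort itself does not interpret "\t" as a tab: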
Code:
~$ sort -t \\t -k2 -n example
sort: multi-character tab `\\t'
You can enter a tab using the shell's $'<backslash-code>' quoting (supported by bash, among others):
Code:
~$ sort -t $'\t' -k2 -n example
g hi    123     liead
j kl    234     ppoasd
m no    873     apoie
d e f   876     lpoae
a bc    987     asd
Entering a literal tab by typing Ctrl+V followed by TAB also works:
Code:
~$ sort -t'        ' -k2 -n blah 
g hi    123     liead
j kl    234     ppoasd
m no    873     apoie
d e f   876     lpoae
a bc    987     asd
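
A small follow-up sketch, not taken from the thread: the same tab-delimited sort written so that it does not depend on $'\t' (the tab comes from printf instead), and with the key limited to field 2 via -k2,2n. A bare -k2 extends the key from field 2 to the end of the line, which happens to work here but can surprise on other data. The variable names tab, rows and sorted are made up for illustration.

```shell
# Portable way to get a literal tab into a variable (no $'\t' needed).
tab=$(printf '\t')

# Sample rows from the thread; real tabs separate the three fields.
rows=$(printf '%s\t%s\t%s\n' \
    'a bc'  987 asd \
    'd e f' 876 lpoae \
    'g hi'  123 liead \
    'j kl'  234 ppoasd \
    'm no'  873 apoie)

# Sort numerically on field 2 only, with tab as the field delimiter.
# LC_ALL=C keeps the result independent of the current locale.
sorted=$(printf '%s\n' "$rows" | LC_ALL=C sort -t "$tab" -k2,2n)
printf '%s\n' "$sorted"
```

The spaces inside the first field ("d e f") are left alone because only the tab counts as a separator.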

Last edited by ntubski; 11-11-2007 at 05:41 PM.
 
Old 11-11-2007, 06:09 PM   #6
belorion
Member
 
Registered: Aug 2003
Distribution: Slackware 10.0
Posts: 124

Original Poster
Rep: Reputation: 16
Fantastic! Thank you so much!
 
Old 11-11-2007, 06:12 PM   #7
tronayne
Senior Member
 
Registered: Oct 2003
Location: Northeastern Michigan, where Carhartt is a Designer Label
Distribution: Slackware 32- & 64-bit Stable
Posts: 3,541

Rep: Reputation: 1060
Given the file "junk"
Code:
a bc    987 asd
d ef    876 lpoae
g hi    123 liead
j kl    234 ppoasd
m no    873 apoie
where the lines are a<space>bc<tab>987<space>asd, ... you do not need to specify the tab delimiter (any "white space" is treated the same, including multiple spaces); you would sort on the numeric column with
Code:
sort +2n junk

g hi    123 liead
j kl    234 ppoasd
m no    873 apoie
d ef    876 lpoae
a bc    987 asd
On the other hand, if you had a non-whitespace delimiter, say a vertical bar:
Code:
cat junk

a|bc|987|asd
d|ef|876|lpoae
g|hi|123|liead
j|kl|234|ppoasd
m|no|873|apoie

sort -t'|' +2n junk

g|hi|123|liead
j|kl|234|ppoasd
m|no|873|apoie
d|ef|876|lpoae
a|bc|987|asd
That what you're trying to do?
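
A note on syntax, added as a hedged sketch rather than as part of the post above: the zero-based +POS form is obsolete, and newer versions of sort may treat an argument such as +2n as a file name. The 1-based -k form is the usual replacement; +2n corresponds to -k3n, and -k3,3n additionally stops the key at the end of field 3. The variable names below are illustrative.

```shell
# Same '|'-delimited data as in the post above.
rows='a|bc|987|asd
d|ef|876|lpoae
g|hi|123|liead
j|kl|234|ppoasd
m|no|873|apoie'

# -k3,3n: numeric key that starts and ends at field 3 (fields count from 1).
sorted=$(printf '%s\n' "$rows" | LC_ALL=C sort -t '|' -k3,3n)
printf '%s\n' "$sorted"
```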
 
Old 11-11-2007, 06:21 PM   #8
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 671
Look in the sort info manual. It has several examples of sorting using multiple keys with different options. You use more than one -k argument.
Code:
   * Sort the password file on the fifth field and ignore any leading
     blanks.  Sort lines with equal values in field five on the numeric
     user ID in field three.  Fields are separated by `:'.

          sort -t : -k 5b,5 -k 3,3n /etc/passwd
          sort -t : -n -k 5b,5 -k 3,3 /etc/passwd
          sort -t : -b -k 5,5 -k 3,3n /etc/passwd
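
Applied to the original question, the same multi-key idea might look like the sketch below. It assumes a tab-delimited input whose first field is the alphanumeric key and whose second field is the number. The rows are the ones from the first post plus one extra row ("aaa 9"), added here only to show that the second key really is compared numerically.

```shell
tab=$(printf '\t')

# Two tab-separated fields per row: alphanumeric key, then a number.
rows=$(printf '%s\t%s\n' \
    bbb 234 \
    aaa 123 \
    aaa 111 \
    bbb 891 \
    aaa 9)

# First key: field 1 as text. Second key: field 2 as a number,
# consulted only when field 1 compares equal.
sorted=$(printf '%s\n' "$rows" | LC_ALL=C sort -t "$tab" -k1,1 -k2,2n)
printf '%s\n' "$sorted"
```

Without the n modifier on the second key, "aaa 9" would sort after "aaa 123", because the comparison would be character by character.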
 
Old 11-12-2007, 02:15 PM   #9
belorion
Member
 
Registered: Aug 2003
Distribution: Slackware 10.0
Posts: 124

Original Poster
Rep: Reputation: 16
Thanks all for the pointers. My main problem lay in the fact that I thought "-t \t" or "-t\t" would specify tab as a delimiter, but that wasn't working.

For anyone finding this thread looking for a similar answer ... this is what I finally got working:

sort -t $'\t' +4 -5 +5n -6 fileName

It sorts a tab-delimited file, first on field 4 and then numerically on field 5 where field 4 is the same (fields counted from zero, as the +POS syntax does).
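
For readers whose sort no longer accepts the obsolete +POS -POS form, a sketch of the equivalent -k invocation: "+4 -5" becomes -k5,5 and "+5n -6" becomes -k6,6n, since -k counts fields from 1. The six-field sample rows and the variable names are invented for illustration and are not the original poster's data.

```shell
tab=$(printf '\t')

# Six tab-separated fields per row; only fields 5 and 6 matter for the sort.
rows=$(printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
    'z z' q r s bbb 234 \
    'a a' q r s aaa 123 \
    'm m' q r s aaa 111 \
    'b b' q r s bbb 891)

# Field 5 as text first, then field 6 numerically where field 5 ties.
sorted=$(printf '%s\n' "$rows" | LC_ALL=C sort -t "$tab" -k5,5 -k6,6n)

# Show just the first field and the two sort keys.
printf '%s\n' "$sorted" | cut -f1,5,6
```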
 
All times are GMT -5. The time now is 09:58 PM.
