LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
05-22-2011, 03:18 PM   #1
velocity_girl
LQ Newbie
Registered: May 2009
Location: Boston, MA
Distribution: SuSE 11.4
Posts: 5
Sorting output by a column with numbers AND letters.


Hi all,

I have a program that prints out lines like:

A FK5108 C10 1

or

A FK5108 O6 3


and I want to pipe it to sort on the third column, by letter first and then by number. But I keep getting output sorted like:

A FK5108 C10 1
A FK5108 C11 1
A FK5108 C1 4
A FK5108 C12 2
A FK5108 O6 3

(In the real output the fields all start at the same position, so the columns are not jagged like above.)

I have read the sort man page and have tried -n for the numbers and -k for the position to start sorting, among other things. I also tried giving a second position to sort on, which sort is supposed to use when two entries are identical at the first position being compared, but it seems to just ignore it. I just can't get it to sort the numbers properly...

Thanks for any help I can get with this! For now I am manually opening the file in emacs and moving the lines around, which, needless to say, is very time-consuming.


** I have also looked through page after page on this forum and other sort help pages without finding anyone posting about my specific problem. (Perhaps I am just using the wrong keywords in my search?) So I am sorry if this question has already been asked/answered somewhere else.

Last edited by velocity_girl; 05-22-2011 at 03:23 PM. Reason: Making output I get more clear
 
05-22-2011, 03:29 PM   #2
druuna
LQ Veteran
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7
Hi,

Is this what you are looking for:
Code:
sort -k 1,1 -k 2,2 -k 3,3n infile
This sorts on field 1, then on field 2 if needed, and then on field 3 (numerically) if needed. If field 1 is always a capital A you can shorten it to:
Code:
sort -k 1,2 -k 3,3n infile
Hope this helps.
 
05-22-2011, 03:36 PM   #3
velocity_girl
LQ Newbie
Registered: May 2009
Location: Boston, MA
Distribution: SuSE 11.4
Posts: 5
Original Poster
Thanks for the fast response druuna!

But it's actually just the 3rd field I need to sort by. I want it to sort first by whether it's a C or an O (or an N or F, which aren't shown in my example), and then by the number that follows, so:

A FK5108 C1 4
A FK5108 C10 1
A FK5108 C11 1
A FK5108 C12 2
A FK5108 N5 6
A FK5108 O5 3
A FK5108 O6 3

Last edited by velocity_girl; 05-22-2011 at 03:37 PM. Reason: typo
 
05-22-2011, 03:44 PM   #4
druuna
LQ Veteran
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7
Hi,

Code:
$ cat infile
A FK5108 C11 1
A FK5108 O6 3
A FK5108 N5 6
A FK5108 C10 1
A FK5108 O5 3
A FK5108 C12 2
A FK5108 C1 4

$ sort -k 3,3n infile
A FK5108 C1 4
A FK5108 C10 1
A FK5108 C11 1
A FK5108 C12 2
A FK5108 N5 6
A FK5108 O5 3
A FK5108 O6 3
Hope this helps.

Last edited by druuna; 05-22-2011 at 03:46 PM. Reason: highlighted actual command
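Why does `-k 3,3n` appear to work in the demo above? Field 3 starts with a letter, so as far as I can tell `-n` finds no leading number in any key and treats them all as equal (zero); the order shown then comes from sort's last-resort comparison of the whole line. A small sketch to check this, assuming GNU sort: its `-s` (stable) option switches off that last-resort comparison, and the lines then come out in their input order. `numeric_field3_stable` is a hypothetical helper name, not something from the thread.

```shell
#!/bin/sh
# Hypothetical helper: numeric sort on field 3 with the last-resort
# whole-line comparison switched off (-s = stable).
numeric_field3_stable() {
  sort -s -k 3,3n
}

# "C11", "O6" and "C1" all start with a letter, so -n sees no number and
# every key compares equal; the lines therefore keep their input order.
printf '%s\n' 'A FK5108 C11 1' 'A FK5108 O6 3' 'A FK5108 C1 4' |
  numeric_field3_stable
```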
 
05-22-2011, 04:04 PM   #5
velocity_girl
LQ Newbie
Registered: May 2009
Location: Boston, MA
Distribution: SuSE 11.4
Posts: 5
Original Poster
Again, I appreciate the fast reply.

So I just tried your suggestion, but I still get


A FK5108 C10 1
A FK5108 C11 1
A FK5108 C1 2
A FK5108 C12 2
A FK5108 O6 3

However, I looked at more of the real lines in my file (I had made up the last column in my example above, since I didn't think it mattered when I wasn't sorting on it). It seems the unwanted ordering appears when a line whose 3rd field has one letter and one digit has a 4th field equal to the 3rd character of the 3rd field of another line (one with two digits). When you use -k and specify 3,3n, does sort read past the blank space that separates the fields? And if so, is there a way to make it stop at the blank space after the 3rd field?

Last edited by velocity_girl; 05-22-2011 at 04:08 PM. Reason: Clarifiying wording
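On the question above: with `-k 3,3n` the key itself stops at the end of field 3. What brings the 4th field into play is, as far as I understand GNU sort, the last-resort comparison: when all keys tie (and here every key is 0, because the field starts with a letter), sort compares the entire lines using the current locale's collation. In many glibc locales such as en_US.UTF-8 blanks are ignored at the first level of that comparison, so "C1 2" can compare like "C12", which would match the pattern described. In the C locale the comparison is byte by byte instead. A minimal sketch of the C-locale case, assuming GNU sort:

```shell
#!/bin/sh
# Both keys (" C10" and " C1") are read by -n as 0, so they tie and sort
# compares the whole lines. With LC_ALL=C that is a byte comparison:
# after "A FK5108 C1" the next bytes are ' ' (0x20) versus '0' (0x30),
# so the C1 line comes first whatever the 4th field contains.
printf '%s\n' 'A FK5108 C10 1' 'A FK5108 C1 2' |
  LC_ALL=C sort -k 3,3n
```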
 
05-22-2011, 07:46 PM   #6
Tinkster
Moderator
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,978
Blog Entries: 11
Sort on my machine(s) shows the same behaviour as druuna's ... what version/number is your sort?
Code:
sort --version
sort (GNU coreutils) 8.5
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and Paul Eggert.

Cheers,
Tink

Last edited by Tinkster; 05-22-2011 at 07:52 PM.
 
05-22-2011, 09:01 PM   #7
Karl Godt
Member
Registered: Mar 2010
Location: Kiel, Germany
Distribution: once:SuSE6.2,Debian3.1, aurox9.2+3,Mandrake?,DSL? then:W7st,WVHB, #!8.10.02,PUPPY4.3.1 now:Macpup
Posts: 308
Code:
sort -k 3,3 file0
A FK5108 C1 4
A FK5108 C10 1
A FK5108 C11 1
A FK5108 C12 2
A FK5108 N5 6
A FK5108 O5 3
A FK5108 O6 3

sort -k 3,3n file0
A FK5108 C10 1
A FK5108 C11 1
A FK5108 C12 2
A FK5108 C1 4
A FK5108 N5 6
A FK5108 O5 3
A FK5108 O6 3

sort -k 3.3 file0
A FK5108 C10 1
A FK5108 C11 1
A FK5108 C12 2
A FK5108 C1 4
A FK5108 O5 3
A FK5108 N5 6
A FK5108 O6 3

sort -k 3.3n file0
A FK5108 C1 4
A FK5108 N5 6
A FK5108 O5 3
A FK5108 O6 3
A FK5108 C10 1
A FK5108 C11 1
A FK5108 C12 2

sort -n file0
A FK5108 C10 1
A FK5108 C11 1
A FK5108 C12 2
A FK5108 C1 4
A FK5108 N5 6
A FK5108 O5 3
A FK5108 O6 3

sort file0
A FK5108 C10 1
A FK5108 C11 1
A FK5108 C12 2
A FK5108 C1 4
A FK5108 N5 6
A FK5108 O5 3
A FK5108 O6 3

sort --version
sort (GNU coreutils) 8.5
Copyright (C) 2010 Free Software Foundation, Inc.
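It might be related to the locale (UTF-8 locales in particular). Check it with:
Code:
echo $LANG
On one Puppy install I have to use a comma as the decimal separator:
Code:
dc 2,3 2,3 \* p
while on most others it is a dot:
Code:
dc 2.3 2.3 \* p
5.29
echo $LANG
en_US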
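A note on the `-k 3.3` results above: without `-t`, GNU sort counts a field as starting at the blank(s) in front of it, so in `A FK5108 C10 1` field 3 is ` C10`; character 1 is the blank, character 2 the letter and character 3 the first digit. That is why `-k 3.3n` orders by the number alone. Building on that, a sketch that uses the letter and the number as two separate keys, assuming GNU sort, single spaces between columns and a one-letter prefix; `sort_single_spaced` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper for single-space-separated lines like "A FK5108 C10 1".
# Key 1: character 2 of field 3 only (the letter; character 1 is the blank).
# Key 2: from character 3 to the end of field 3 (the digits), numerically.
sort_single_spaced() {
  LC_ALL=C sort -k 3.2,3.2 -k 3.3,3n
}

printf '%s\n' 'A FK5108 C10 1' 'A FK5108 O6 3' 'A FK5108 C1 4' 'A FK5108 C2 1' |
  sort_single_spaced
```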
 
05-23-2011, 01:15 AM   #8
chrism01
Guru
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,280
Also, in general, if you can persuade the program that generates that data to pad the numbers out to the same length, i.e. C3 => C03, then this sort of manipulation will be easier in any language.
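To illustrate the padding point: once the numbers have the same width, an ordinary text sort already gives the desired order. A sketch, assuming labels of one letter plus a plain decimal number below 100 (no leading zeros in the input); `pad_label` is a hypothetical helper, not something from the thread.

```shell
#!/bin/sh
# Hypothetical helper: print LETTER followed by NUMBER zero-padded to 2 digits,
# e.g. pad_label C 3 -> C03. Assumes NUMBER is a plain decimal below 100.
pad_label() {
  printf '%s%02d\n' "$1" "$2"
}

# With equal-width numbers, a plain byte-wise sort is already correct.
{
  pad_label C 10
  pad_label C 3
  pad_label O 1
  pad_label C 1
} | LC_ALL=C sort
```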
 
05-23-2011, 01:43 AM   #9
druuna
LQ Veteran
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7
Hi,

The output of sort is locale-specific.

Here's a way to temporarily set the C locale and run sort:
Code:
LC_ALL=C sort -k3,3n infile
In the above example the locale is set only for the sort command; the shell's own locale settings are not changed.

Hope this helps.
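The "only for the sort command" part can be checked directly: a `VAR=value command` prefix puts the variable into that one command's environment and leaves the shell's own value alone. A small sketch, assuming a POSIX shell and that LC_ALL starts out unset (the script unsets it for the demo):

```shell
#!/bin/sh
unset LC_ALL   # assumption for the demo: start with LC_ALL unset

# The prefix assignment is visible inside the child command ...
LC_ALL=C sh -c 'echo "inside the command: LC_ALL=$LC_ALL"'

# ... but the shell running this script never had it set.
echo "back in the shell: LC_ALL=${LC_ALL-<unset>}"
```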
 
05-23-2011, 06:58 AM   #10
velocity_girl
LQ Newbie
Registered: May 2009
Location: Boston, MA
Distribution: SuSE 11.4
Posts: 5
Original Poster
Quote:
Originally Posted by druuna

Here's a way to temporarily set the C locale and run sort:
Code:
LC_ALL=C sort -k3,3n infile

So running that line took care of the C1 being out of place, but now I get output like:

Code:
 
A       FK5108   C1     2
A       FK5108   C10    1
A       FK5108   C11    1
A       FK5108   C12    2
A       FK5108   C14    1
A       FK5108   C15    2
A       FK5108   C17    1
A       FK5108   C2     1
A       FK5108   C24    1
A       FK5108   C27    1
A       FK5108   C3     6
A       FK5108   C30    2
A       FK5108   O1     1
A       FK5108   O10    2
A       FK5108   O2     9
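With GNU sort, the letter and the number inside field 3 can also be given as two separate keys by character position. A sketch, assuming GNU coreutils sort, aligned whitespace-separated columns like the output above, and labels that are always exactly one letter followed by digits (as with the C/N/O/F labels in this thread); `sort_atom_labels` is a hypothetical wrapper name. The `b` modifier on a position makes sort skip the padding blanks in front of field 3 before counting characters.

```shell
#!/bin/sh
# Hypothetical wrapper for aligned lines such as "A       FK5108   C10    1".
# Key 1: first non-blank character of field 3 only (the letter);
#        'b' is needed on both ends so the padding blanks are skipped.
# Key 2: field 3 from its second non-blank character to the end of the
#        field (the digits), compared numerically, so 2 < 10 < 30.
sort_atom_labels() {
  LC_ALL=C sort -k 3.1b,3.1b -k 3.2b,3n "$@"
}

sort_atom_labels <<'EOF'
A       FK5108   C30    2
A       FK5108   O10    2
A       FK5108   C2     1
A       FK5108   C10    1
A       FK5108   O2     9
A       FK5108   C1     2
EOF
```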
 
05-23-2011, 09:22 AM   #11
druuna
LQ Veteran
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7
Hi,

I don't think sort (or sorting in general, not just the sort command) can handle a string the way you want. I did a few tests and could not find any way to do this using sort (or asort in awk).

If at all possible I would change the output that is created (see chrism01's reply post #8).

If that is not possible you could write a script that is able to do what you want/need. Here's an example based on the posted input from post #10 (+ some extra):
Code:
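$ cat foo.sh
#!/bin/bash

# Split field 3 into its letter and its number (C10 -> "C" "10"),
# sort on the letter and then numerically on the number,
# then glue the two parts back together.
while read -r A B C D
do
  printf "%s %s %s %s %s\n" "$A" "$B" "${C:0:1}" "${C:1}" "$D"
done < infile | \
sort -k3,3 -k4,4n | \
while read -r A B C D E
do
  printf "%s\t%s\t%s%s\t%s\n" "$A" "$B" "$C" "$D" "$E"
done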

$ cat infile
A       FK5108   C30    2
A       FK5108   O34    9
A       FK5108   C10    1
A       FK5108   C14    1
A       FK5108   O1     1
A       FK5108   C15    2
A       FK5108   C24    1
A       FK5108   C27    1
A       FK5108   C3     6
A       FK5108   O10    2
A       FK5108   C1     2
A       FK5108   O2     9
A       FK5108   C12    2
A       FK5108   O23    9
A       FK5108   O3     9
A       FK5108   C11    1
A       FK5108   C17    1
A       FK5108   C2     1

$ ./foo.sh
A       FK5108  C1      2
A       FK5108  C2      1
A       FK5108  C3      6
A       FK5108  C10     1
A       FK5108  C11     1
A       FK5108  C12     2
A       FK5108  C14     1
A       FK5108  C15     2
A       FK5108  C17     1
A       FK5108  C24     1
A       FK5108  C27     1
A       FK5108  C30     2
A       FK5108  O1      1
A       FK5108  O2      9
A       FK5108  O3      9
A       FK5108  O10     2
A       FK5108  O23     9
A       FK5108  O34     9
It is not clear from your posts whether the input examples you posted are the actual data; if not, the above script might need some tuning.

Hope this helps.
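For completeness: newer GNU sort versions (coreutils 7.0 and later, as far as I know) also have a version-sort option, `-V`, which compares runs of digits numerically and the rest as text. If it is available on your system, it may do the same job as the script above in one command. A sketch, assuming GNU sort with `-V` and the column layout from post #10; `sort_atom_labels_v` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: version-sort on field 3 only.
# 'b' skips the padding blanks before field 3; 'V' then orders "C2" before
# "C10" (digits compared as numbers) and "C30" before "O1" (letters as text).
sort_atom_labels_v() {
  LC_ALL=C sort -k 3b,3V "$@"
}

sort_atom_labels_v <<'EOF'
A       FK5108   C30    2
A       FK5108   O10    2
A       FK5108   C3     6
A       FK5108   O1     1
A       FK5108   C12    2
A       FK5108   C1     2
EOF
```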
 
05-23-2011, 10:54 AM   #12
Karl Godt
Member
Registered: Mar 2010
Location: Kiel, Germany
Distribution: once:SuSE6.2,Debian3.1, aurox9.2+3,Mandrake?,DSL? then:W7st,WVHB, #!8.10.02,PUPPY4.3.1 now:Macpup
Posts: 308
Do you mean something like this? (It took me hours, because the uniq command would not work the way I thought it would.)
Code:
#!/bin/sh

LANG=C 
sort -k3,3 -t '|' file0 | cut -f 3 -d '|' > file3
cat file3 | grep -o -E '[0-9]*|[0-9]*[[:punct:]]*[0-9]*' > file9
N=`cat file9 |sed 's#\.#\\\.#g' | sort -u| sort -g|tr '\n' ' '`
rm -f file10
for i in $N ; do 
 echo $i; echo ;
 grep ".*|.*|.*$i[a-zA-Z]*|.*" file0 | grep -v "|.*|.*$i[0-9]" >> file10; 
 
 echo ; 
done
rm -f file30
rm -f file40
VAR=
BULK=
cp -f file10 file20
cat file10 | while read line ; do echo $line ; done
 
cat file10 | while read line ; do 
echo $line ; 
LINE=`echo "$line" | sed 's#\.#\\\.#g'`
echo "$LINE"
grep -m 1 "$LINE" file20 >> file30
VAR=`grep -m 1 "$LINE" file20`
if [ -z "`echo "$BULK" | grep "$LINE"`" ] ; then
 BULK="`echo -e "$BULK""\n""$line"`"
 echo "$BULK"
 echo "$BULK" | tail -n 1 >> file40
fi
done
 
diff file0 file40
I used '|' as the field separator in the original input file infile (file0).
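With '|' as the separator, GNU sort's `-t` option makes the character positions simpler, because a field then has no leading blanks: character 1 of field 3 is the letter and character 2 the first digit. A sketch of the same letter-then-number ordering for that layout, assuming GNU sort and one-letter labels; `sort_pipe_separated` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper for '|'-separated lines such as "A|FK5108|C10|1".
# Key 1: character 1 of field 3 (the letter).
# Key 2: character 2 to the end of field 3 (the digits), numerically.
sort_pipe_separated() {
  LC_ALL=C sort -t '|' -k 3.1,3.1 -k 3.2,3n "$@"
}

printf '%s\n' 'A|FK5108|C10|1' 'A|FK5108|O2|9' 'A|FK5108|C2|1' 'A|FK5108|C1|2' |
  sort_pipe_separated
```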
 
06-08-2011, 03:50 PM   #13
velocity_girl
LQ Newbie
Registered: May 2009
Location: Boston, MA
Distribution: SuSE 11.4
Posts: 5
Original Poster
Thanks again for all the help everyone.

Sorry it took so long for me to get back to this... I'm a grad student, so busyness tends to come in waves. And strong waves at that!

Quote:
Originally Posted by druuna
If at all possible I would change the output that is created (see chrism01's reply post #8).
Unfortunately, I cannot change the output. It's the standard format for the atom names in a certain file type, and later programs that I might feed this data into would pitch a fit if I changed that numbering/labeling.

Quote:
Originally Posted by druuna
If that is not possible you could write a script that is able to do what you want/need. Here's an example based on the posted input from post #10 (+ some extra):
Thanks a lot for this script, it seems to do the trick for what I need! (And it will save me A LOT of time, since I won't have to go in and manually switch the out-of-order entries...)
 
06-09-2011, 01:03 AM   #14
druuna
LQ Veteran
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7
Hi,

@velocity_girl: You're welcome
 
  

