LinuxQuestions.org
11-14-2013, 11:20 AM   #1
fincher69
Member
Registered: Jan 2005
Location: Tallahassee, FL
Distribution: Kubuntu 13.04
Posts: 69
sort seems to ignore one row


I am having an odd issue with sort. I have a file with 362 tab-separated columns and ~23k rows, and I want to sort it by the value in the last column, from highest to lowest. My sort command is:

Code:
sort -k 362,362rn file_unsorted.txt > file_sorted.txt
It seems to work and the entire file is sorted, with the exception of one value that is out of order. I have attempted to attach a histogram illustrating this. I haven't been able to discern anything different about this line or why it seems to be out of order. Any ideas about what might be happening and how to remedy it would be greatly appreciated.
Attached image: hist_of_na_values.png (12.1 KB)
 
11-14-2013, 11:31 AM   #2
druuna
LQ Veteran
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7
Without seeing the input I can only assume that the entry that is out of order is different somehow.

Here's an example:
Code:
$ cat input
1   2   3   4   5   6   05
1   2   3   4   5   6   04
1   2   3   4   5   6   06
1   2   3   4   5   6   03
1   2   3   4   5   6   07
1   2   3   4   5   6   O2
1   2   3   4   5   6   08
1   2   3   4   5   6   01
$ sort -k 7,7nr input
1   2   3   4   5   6   08
1   2   3   4   5   6   07
1   2   3   4   5   6   06
1   2   3   4   5   6   05
1   2   3   4   5   6   04
1   2   3   4   5   6   03
1   2   3   4   5   6   01
1   2   3   4   5   6   O2
That happens when O2 (capital letter O, two) is used instead of 02 (zero, two): a numeric sort treats the non-numeric O2 as 0, so it ends up below 01.
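To hunt for this kind of entry in a larger file, one option is to list the rows whose last field is not a plain number. This is only a sketch: the function name find_non_numeric_last_field and the sample file are made up for illustration, and it relies on awk's default whitespace splitting, like the example above.

```shell
# Hypothetical helper (not from the thread): print "line number: value" for every
# row whose last whitespace-separated field is not a plain decimal number.
find_non_numeric_last_field() {
    awk '$NF !~ /^-?[0-9]+([.][0-9]+)?$/ { print NR ": " $NF }' "$1"
}

# Made-up sample in the same shape as the example input.
sample=$(mktemp)
printf '1   2   05\n1   2   O2\n1   2   08\n' > "$sample"
find_non_numeric_last_field "$sample"
rm -f "$sample"
```

On the sample it reports line 2 with the value O2; the rows ending in 05 and 08 pass the check.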
 
11-14-2013, 11:45 AM   #3
fincher69
Original Poster
Since it has so many columns, it is a bit difficult to display much of the data, but I should have tried. Here's a snippet of the column I am sorting on, taken from around the offending line.

1
1
1
1
105
0
0
0
0
 
11-14-2013, 12:11 PM   #4
druuna
Can you attach a file with the offending line and say 30 lines before and after?

Without the actual data it's going to be a guessing game.
 
11-14-2013, 02:13 PM   #5
fincher69
Original Poster
Here's a snippet. When I sort this snippet with the same command, it still mis-sorts that line. I tried using cat -v and :set list in vim to see if there were hidden characters that might be messing things up, but nothing struck me as out of place.
Attached file: data_unsorted.txt (248.1 KB)
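One thing that may help with this kind of check: cat -v on its own leaves tabs untouched, so a tab and a space look much the same on screen. A small sketch, assuming a cat that accepts -e and -t (GNU and BSD cat both do); the sample row is invented.

```shell
# Invented sample row: three tab-separated columns, with a space inside column 2.
# -t shows each tab as ^I and -e marks the line end with $, so the plain space
# between "y" and "z" stands out as not being a column separator.
visible=$(printf 'b\ty z\t105\n' | cat -et)
printf '%s\n' "$visible"
```

Any gap that is not shown as ^I is a space inside a column rather than a separator between columns.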
 
11-14-2013, 03:20 PM   #6
druuna
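The line ending with 105 has 363 fields, not 362 like all the other lines. By default both awk and sort (without -t) treat spaces as well as tabs as field separators, so presumably a space inside one of the columns on that line adds an extra field, and field 362 is no longer the last column there.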
Code:
$ awk '{ print "Number of fields: " NF "\tContent last field: " $NF }' data_unsorted.txt
.
.
Number of fields: 362   Content last field: 11
Number of fields: 362   Content last field: 57
Number of fields: 362   Content last field: 89
Number of fields: 363   Content last field: 105
Number of fields: 362   Content last field: 94
Number of fields: 362   Content last field: 44
Number of fields: 362   Content last field: 87
.
.
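If the file is meant to be tab-separated, comparing the two ways of counting fields points straight at rows like this one. A rough sketch: find_embedded_blanks is a made-up name, the sample is invented, and a mismatch can also come from empty columns, not only from spaces inside a column.

```shell
# Hypothetical helper: report rows where whitespace splitting and tab splitting
# disagree on the number of fields.
find_embedded_blanks() {
    awk '{
        ws = NF                          # count under default whitespace splitting
        tabs = split($0, parts, "\t")    # count when only a tab separates fields
        if (ws != tabs) print NR ": " ws " whitespace fields, " tabs " tab fields"
    }' "$1"
}

# Invented two-row sample; the second row has a space inside its second column.
sample=$(mktemp)
printf 'a\tx\t1\nb\ty z\t105\n' > "$sample"
find_embedded_blanks "$sample"
rm -f "$sample"
```

Only the second sample row is reported, because "y z" counts as two fields under whitespace splitting but as one under tab splitting.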
 
11-14-2013, 06:33 PM   #7
fincher69
Original Poster
Ah, thanks! The file is tab-separated, so when I process it that way every line appears to have the right number of elements. I used the -t option to set sort's field separator to a tab, and it worked. Thanks for your help!
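For anyone landing here later, the working command presumably looked something like the sketch below. The exact command was not posted, so this is an assumption; the three-column sample and the variable names are invented to keep it small.

```shell
# Invented sample: tab-separated, with a space inside column 2 of the "b" row.
sample=$(mktemp)
printf '%s\t%s\t%s\n' a x 1  b 'y z' 105  c w 0  d v 94 > "$sample"

tab=$(printf '\t')   # a literal tab; in bash, $'\t' does the same

# Default splitting treats the space as a separator, so on the "b" row field 3
# is "z" (numerically 0) and the 105 is never looked at.
default_sorted=$(sort -k 3,3nr "$sample")

# With -t, only tabs separate fields, so field 3 is the last column on every row.
tab_sorted=$(sort -t "$tab" -k 3,3nr "$sample")

printf '%s\n' "$tab_sorted"
rm -f "$sample"
```

With -t the 105 row comes out first; without it the 94 row does. Applied to the original file, the same idea would be sort -t "$tab" -k 362,362nr file_unsorted.txt > file_sorted.txt.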
 
  

