LinuxQuestions.org

thiyagusham 04-10-2014 01:06 PM

sort command in linux
 
Hi ;

I have a doubt.
How does sort work? I don't understand the logic for numeric sorting.

Code:

cat file5
15
23
125
225
456
456
1025 

>> What logic is applied here?

Code:

sort file5
1025
125
15
225
23
456
456 

As far as I can tell it sorts based on ASCII, but I don't understand the logic.

smallpond 04-10-2014 01:08 PM

The first columns (characters) are in order. Where the first columns are the same, the second columns are in order, etc.

jdkaye 04-10-2014 01:17 PM

Another way of looking at it is that the sort command treats strings of digits like strings of letters. It sorts them not according to their value but according to their position in the "alphabet" (0-9 instead of a-z), in the way that smallpond described. So 1025 comes before 125 for the same reason that bacf comes before bcf in normal alphabetization.
Hope that's clear.
jdk
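
A quick way to see that analogy in a shell, as a minimal sketch: it assumes a POSIX shell and forces the C locale so that comparison is by plain byte value. The variable names are made up for illustration.

```shell
# Sort two "words" and the two matching "numbers" the same way.
letters=$(printf '%s\n' bcf bacf | LC_ALL=C sort)
digits=$(printf '%s\n' 125 1025 | LC_ALL=C sort)

printf '%s\n' "$letters"   # bacf, then bcf
printf '%s\n' "$digits"    # 1025, then 125
```

Both lists come out in the same relative order, because sort sees only characters in each case.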

suicidaleggroll 04-10-2014 01:23 PM

In addition to the above, if you want to sort them numerically use the "-n" switch.

schneidz 04-10-2014 01:24 PM

It's in lexicographical order.

Check man sort.
If you want it in numerical order, use the -n option.
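
For comparison, a small sketch of the two orderings side by side. It assumes a POSIX shell and feeds the lines of file5 from the first post on standard input instead of reading the file; the variable names are illustrative only.

```shell
# Default (character-based) order versus numeric order for the same lines.
default_order=$(printf '%s\n' 15 23 125 225 456 456 1025 | LC_ALL=C sort)
numeric_order=$(printf '%s\n' 15 23 125 225 456 456 1025 | sort -n)

echo "default:"
printf '%s\n' "$default_order"
echo "with -n:"
printf '%s\n' "$numeric_order"
```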

thiyagusham 04-10-2014 01:32 PM

Hi ;

Thanks jdkaye ;

Could you please elaborate some better example ?

I am NOT clear. Some confusions.

thiyagusham 04-10-2014 01:37 PM

Hi All ;

Clearly i have posted , what is the logic behind for my output .
I am NOT asking abt -n option. Please provide answers for my EXACT Question

suicidaleggroll 04-10-2014 01:39 PM

Why are you getting so fussy?

As you've been told three times already, it sorts it alphabetically. 1 comes before 2, 2 comes before 3, etc. It does this on individual characters, NOT on the entire number. If two lines start with the same character, it moves to the second character, then the third, same way you alphabetize any list.

Start by just looking at the first character
Code:

$ sort file5
1
1
1
2
2
4
4

OK, there are three 1s, so move to the next character, and what do you get?
Code:

0
2
5

How about the 2s?
Code:

2
3

And the 456s are the same, so their "order" doesn't matter.
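
The three steps above can be reproduced with cut and grep. This is only a sketch, assuming a POSIX shell; the data is the content of file5 from the first post, and the variable names are invented here.

```shell
# The sorted lines of file5, using plain byte-value comparison.
sorted=$(printf '%s\n' 15 23 125 225 456 456 1025 | LC_ALL=C sort)

# Step 1: the first character of every sorted line.
first=$(printf '%s\n' "$sorted" | cut -c1 | tr '\n' ' ')
# Step 2: the second character of the lines that start with 1.
ones=$(printf '%s\n' "$sorted" | grep '^1' | cut -c2 | tr '\n' ' ')
# Step 3: the second character of the lines that start with 2.
twos=$(printf '%s\n' "$sorted" | grep '^2' | cut -c2 | tr '\n' ' ')

echo "first characters: $first"
echo "after a 1:        $ones"
echo "after a 2:        $twos"
```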

TB0ne 04-10-2014 03:15 PM

Quote:

Originally Posted by thiyagusham (Post 5150247)
Hi All ;
Clearly i have posted , what is the logic behind for my output . I am NOT asking abt -n option. Please provide answers for my EXACT Question

Please READ AND UNDERSTAND the answers you have been given. smallpond, suicidaleggroll, and jdkaye have ALL told you exactly why you're getting that output. You were also pointed to the man page, and as far as I know, there is NOTHING keeping you from going to Google and doing further research, or even downloading the source code for sort and looking at it.

If you don't like the answers you get here, you can always ask elsewhere...perhaps they can fit your 'exact' needs. If you don't UNDERSTAND the answers you get here, being snotty sure won't get you much further.

Aquarius_Girl 04-11-2014 04:34 AM

Here is a simplification of the excellent explanation of jdkaye.
Code:

$ sort file5
1025
125
15
225
23
456
456

You are thinking that 1025 is greater than 125 because you are reading 1025 as one thousand twenty-five and 125 as one hundred twenty-five.

The point here is that sort doesn't read those numbers the way you are reading them.

By default, sort compares these lines character by character (here, digit by digit).

First it compares the 1 from 1025 with the 1 from 125 and finds that both are equal. Then it compares the 0 from 1025 with the 2 from 125 and finds that 0 is smaller than 2, hence it places 1025 ahead of 125.
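
The comparison just described can be walked through by hand in a shell loop. This is an illustration of the idea only, not how sort is actually implemented; it assumes a POSIX shell, and every variable name is made up.

```shell
a=1025
b=125
i=1
while :; do
    # Pick out the i-th character of each string (empty once a string runs out).
    ca=$(printf '%s\n' "$a" | cut -c"$i")
    cb=$(printf '%s\n' "$b" | cut -c"$i")
    # Stop at the first difference, or when both strings are exhausted.
    if [ "$ca" != "$cb" ] || [ -z "$ca" ]; then
        break
    fi
    echo "position $i: '$ca' equals '$cb', move on"
    i=$((i + 1))
done

# Turn the differing characters into their character codes.
# An exhausted string counts as 0, so a shorter prefix sorts first.
code_a=0
code_b=0
if [ -n "$ca" ]; then code_a=$(printf '%d' "'$ca"); fi
if [ -n "$cb" ]; then code_b=$(printf '%d' "'$cb"); fi

if [ "$code_a" -le "$code_b" ]; then first=$a; else first=$b; fi
echo "position $i: '$ca' ($code_a) vs '$cb' ($code_b), so $first goes first"
```

The loop stops at position 2, where 0 (code 48) is compared with 2 (code 50).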

pan64 04-11-2014 04:57 AM

just try to replace 0->a, 1->b, 2->c and so on, sort those strings and replace again a->0, b->1, ...
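
That round trip can be tried with tr. A minimal sketch, assuming a POSIX shell and the C locale, with the lines of file5 supplied on standard input and illustrative variable names:

```shell
# Map digits to letters (0->a ... 9->j) and sort the resulting "words".
as_letters=$(printf '%s\n' 15 23 125 225 456 456 1025 \
    | tr '0-9' 'a-j' | LC_ALL=C sort)

# Map the sorted words back to digits.
back_to_digits=$(printf '%s\n' "$as_letters" | tr 'a-j' '0-9')

printf '%s\n' "$as_letters"
printf '%s\n' "$back_to_digits"
```

The second list is the same as the plain `sort file5` output from the first post.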

jdkaye 04-11-2014 05:29 AM

Quote:

Originally Posted by pan64 (Post 5150615)
just try to replace 0->a, 1->b, 2->c and so on, sort those strings and replace again a->0, b->1, ...

That's exactly what I said in my response and the OP could only say
Quote:

Could you please elaborate some better example ?

I am NOT clear. Some confusions.
So much for our good intentions. ;)
jdk

AnanthaP 04-11-2014 06:26 AM

It used to be called "sort-merge". That should explain the internal logic to you, though it could be another thread. Briefly, the `sort` command does line-by-line, left-to-right, character-based sorting unless you give it options. The `man` page says it simply: "SORT LINES OF TEXT FILES".

You really should READ THE MANUAL.

SO, AS OTHERS HAVE SAID REPEATEDLY, UNLESS YOU USE OPTIONS, SORT TREATS THE LINES TO BE SORTED AS CHARACTERS, NOT AS NUMBERS, AND THE ASCII VALUE OF EACH CHARACTER USUALLY DETERMINES THE SEQUENCE.

You can use options to change the column delimiter, column number and nature of data to be sorted.
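
A small sketch of those options, using the standard -t (field delimiter), -k (key field) and -n (numeric) switches. The comma-separated sample data is invented for this example and is not from the thread; a POSIX shell is assumed.

```shell
# Hypothetical sample data: a name and a number, separated by a comma.
data='pear,1025
apple,125
fig,15'

# Sort on the second field as characters.
by_text=$(printf '%s\n' "$data" | LC_ALL=C sort -t, -k2,2)
# Sort on the second field as numbers.
by_number=$(printf '%s\n' "$data" | LC_ALL=C sort -t, -k2,2n)

echo "second field as text:"
printf '%s\n' "$by_text"
echo "second field as numbers:"
printf '%s\n' "$by_number"
```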

OK

jpollard 04-11-2014 06:39 AM

People, he may be having trouble with English.

Don't think of the file as "numbers"; they aren't.

Your sequence
Code:

1025
125


is actually the following sequence of bytes:

Code:

0000000   1   0   2   5  \n   1   2   5  \n
        061 060 062 065 012 061 062 065 012

Where the second line is the octal value for each byte of the input.

Sort compares the values from the second line of the dump. The second byte of 1025 is 060, which is less than 062, the second byte of 125, so 1025 is placed first.
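
A dump like this can be produced with od. The post does not say which command was used, so the invocation below is an assumption; the exact column spacing may differ between od implementations, and the variable name is illustrative.

```shell
# Character view and octal view of the same bytes.
printf '1025\n125\n' | od -c -b

# Octal values only, without the offset column.
octal=$(printf '1025\n125\n' | od -An -b)
echo "$octal"
```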

Using the -n option makes sort compare the lines by their numeric value instead of byte by byte.


All times are GMT -5.