LinuxQuestions.org


anu_1 01-28-2010 11:55 PM

sort command help
 
A file contains two entries:
Windows NT
Windows2008

On AIX, the command "sort filename" produces:
Windows NT
Windows2008

On Linux, the same command on the same file produces:
Windows2008
Windows NT

Could anyone please explain this? Is it because sort treats the space differently on AIX and Linux?

Thanks for the help.

druuna 01-29-2010 02:50 AM

Hi,

sort uses the locale specified in the environment (for example the LC_ALL=xxx setting); that is probably why the output differs.

Although not all sort versions support it, try AIX sort's -A option, which sorts byte by byte in ASCII order instead of using the current locale's collation. You could also set LC_ALL to C (LC_ALL=C), but that may influence more than just sort! Be careful if this is a production environment.

Hope this clears things up a bit.

David the H. 01-29-2010 02:54 AM

I don't know about Windows, but in Unix the sort order depends on the locale. The collation order of UTF-8 locales in particular differs from the C/POSIX byte order. If you set your LC_COLLATE environment variable to either C or POSIX, Linux sorts the entries above the same way AIX does.

Edit: Aargh, beaten by druuna. But I can at least point out that setting only LC_COLLATE is more targeted than setting LC_ALL: it changes just the sort order and leaves the other locale categories alone.

jschiwal 01-29-2010 03:18 AM

I'm not sure of the answer, because on my system a space sorts ahead of a "2", yet "Windows2008" sorts before "Windows NT".
I tried using -t' ' to change the field separator, but it made no difference; the order was always the same.

I took a peek at sort.c:
Code:

#ifdef POSIX_UNSPECIFIED
  /* The following block of code makes GNU sort incompatible with
    standard Unix sort, so it's ifdef'd out for now.
    The POSIX spec isn't clear on how to interpret this.
    FIXME: request clarification.

    From: kwzh@gnu.ai.mit.edu (Karl Heuer)
    Date: Thu, 30 May 96 12:20:41 -0400
    [Translated to POSIX 1003.1-2001 terminology by Paul Eggert.]

    [...]I believe I've found another bug in `sort'.

    $ cat /tmp/sort.in
    a b c 2 d
    pq rs 1 t
    $ textutils-1.15/src/sort -k1.7,1.7 </tmp/sort.in
    a b c 2 d
    pq rs 1 t
    $ /bin/sort -k1.7,1.7 </tmp/sort.in
    pq rs 1 t
    a b c 2 d

    Unix sort produced the answer I expected: sort on the single character
    in column 7.  GNU sort produced different results, because it disagrees
    on the interpretation of the key-end spec "M.N".  Unix sort reads this
    as "skip M-1 fields, then N-1 characters"; but GNU sort wants it to mean
    "skip M-1 fields, then either N-1 characters or the rest of the current
    field, whichever comes first".  This extra clause applies only to
    key-ends, not key-starts.
    */

  /* Make LIM point to the end of (one byte past) the current field.  */
  if (tab != NULL)
    {
      char *newlim;
      newlim = memchr (ptr, tab, lim - ptr);
      if (newlim)
        lim = newlim;
    }
  else
    {
      char *newlim;
      newlim = ptr;
      while (newlim < lim && blanks[to_uchar (*newlim)])
        ++newlim;
      while (newlim < lim && !blanks[to_uchar (*newlim)])
        ++newlim;
      lim = newlim;
    }
#endif

Actually, in this case the original GNU interpretation is what I would have expected: the dot in a key spec like 1.7 implies that what follows is a character position within a single field, not a span that may run across several fields.

anu_1 01-29-2010 04:34 AM

Thank you all for the explanations.


All times are GMT -5.