LinuxQuestions.org


ta0kira 07-17-2011 05:01 PM

disable '--ignore-case' in 'sort'
 
[edit: The title should be regarding --dictionary-order, not --ignore-case.]

It seems that Ubuntu has --dictionary-order enabled for sort by default, which is extremely inconvenient. Is there a way to disable this? There are many other ordering options, but no "just give me ASCII order" option. I checked the man pages, the info pages, /etc/profile, and my aliases, and I also tried calling /usr/bin/sort directly; nothing worked. Is this setting compiled in? This isn't just how GNU sort behaves, because sort on Slackware doesn't do this.

Thanks!

Kevin Barry

PS Kubuntu 10.04, but this also happened on a 9.* or 8.* Ubuntu I used a while ago. I'm thinking the 9.* one did this and the 8.* one was normal.

grail 07-17-2011 10:49 PM

So this will seem like a silly question, but what is dictionary order? I mean I read the man page, but are you saying you have non-alphanumeric characters that need to be sorted?

ta0kira 07-18-2011 08:48 AM

Here is an example:
Code:

file.txt,a8546f1ee0fe69012b25ef7ee9f872c7
file.txt~,648c72675d229be256d3efba134dad5d
file.txt~
file.txt

Sorted with sort -s in Slackware:
Code:

file.txt
file.txt,a8546f1ee0fe69012b25ef7ee9f872c7
file.txt~
file.txt~,648c72675d229be256d3efba134dad5d

Sorted with sort -s in Ubuntu or with sort -s --dictionary-order in Slackware:
Code:

file.txt
file.txt~
file.txt~,648c72675d229be256d3efba134dad5d
file.txt,a8546f1ee0fe69012b25ef7ee9f872c7

I need the first result, not the second. Converting the data to hex first does the trick, but I don't want to make that a permanent part of what I'm writing just to account for a sort bug on one distribution.

I downloaded the source and patches for coreutils on Ubuntu 10.04 (http://packages.ubuntu.com/lucid/coreutils). I built sort with and without the patches on Slackware, and both versions gave me the "correct" results. Built on Ubuntu 10.04, however, it had the same --dictionary-order problem both before and after the patches were applied. The build on Ubuntu also went differently: it seemed to do more than the Slackware build did, and a lot of the output was visibly different from "standard". I used ./configure --prefix=`pwd`/install && make install in all cases.

Kevin Barry

PS Statically linking when building on Ubuntu 10.04 (export LDFLAGS=-static before ./configure) eliminates the problem; therefore, it must be a problem with another .so. ldd shows that the Ubuntu version relies on librt whereas the Slackware-built version doesn't.

druuna 07-18-2011 09:07 AM

Hi,

Could the cause be the locale setting?

From the sort manual page:
Quote:

*** WARNING *** The locale specified by the environment affects sort
order. Set LC_ALL=C to get the traditional sort order that uses native
byte values.

Do you see the same behaviour if you do this:
Code:

LANG=C sort ........

Hope this helps.
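
A note on the LANG=C suggestion above: in the POSIX locale model, LC_ALL takes precedence over the individual LC_* variables such as LC_COLLATE, and those in turn take precedence over LANG. So if the login environment already sets LC_ALL or LC_COLLATE, LANG=C on its own may not change the sort order, while LC_ALL=C will. The following is a minimal sketch of how to check, not a definitive test; the variable names are made up for illustration, and en_US.UTF-8 is only an assumed example of a non-C locale.

```shell
# Sample data copied from the example earlier in the thread.
data=$(printf '%s\n' \
  'file.txt,a8546f1ee0fe69012b25ef7ee9f872c7' \
  'file.txt~,648c72675d229be256d3efba134dad5d' \
  'file.txt~' \
  'file.txt')

# Print whichever locale variables are currently set (possibly none).
env | grep -E '^(LANG|LC_)' || true

# LC_ALL overrides LC_COLLATE and LANG, so this sorts by byte value
# regardless of what LANG is set to.
by_lc_all=$(printf '%s\n' "$data" | LANG=en_US.UTF-8 LC_ALL=C sort -s)
printf '%s\n' "$by_lc_all"
```

If the env line shows LC_ALL or LC_COLLATE already set to something other than C or POSIX, that is the variable to override.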

Guttorm 07-18-2011 09:13 AM

Hi

Are you sure it's not locale settings making the difference? I think it uses strcoll().

From "man sort":

Quote:

*** WARNING *** The locale specified by the environment affects sort order. Set LC_ALL=C to get the traditional sort order that uses native byte values.


Code:

LC_ALL=C sort -s test.txt
file.txt
file.txt,a8546f1ee0fe69012b25ef7ee9f872c7
file.txt~
file.txt~,648c72675d229be256d3efba134dad5d

Edit:
Slow typer again. :)

ta0kira 07-18-2011 09:29 AM

It is the locale, which I figured out between my last post and reading your posts. I found this out a much more difficult way, however. I traced it to this line in sort.c:
Code:

3060:  hard_LC_COLLATE = hard_locale (LC_COLLATE);

Setting this to 0 instead of reading the locale fixes it, which means it's definitely the locale. I'm not sure how I missed that warning in the manpage, but it's definitely there.

Kevin Barry

PS In case it's ambiguous, export LC_ALL=C is the solution I'm going with, not making a hard-coded change in C.
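
For anyone landing here later: export LC_ALL=C changes the locale for every later command in that shell or script, not only for sort. If only the sort step needs byte order, the variable can be set for that single command instead. A minimal sketch, assuming a POSIX shell; the function name bytesort is hypothetical and not part of coreutils.

```shell
# Hypothetical helper: run sort in the C locale without changing the
# caller's environment. Any arguments are passed straight through to sort.
bytesort() {
    LC_ALL=C sort "$@"
}

# Usage: the assignment applies only to the sort process inside the helper.
sorted=$(printf '%s\n' 'file.txt~' 'file.txt,abc' 'file.txt' | bytesort -s)
printf '%s\n' "$sorted"
```

Which form is better depends on the script: export is simpler when everything in it should use byte order, while the per-command form leaves messages and character handling in the user's own locale.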

