LinuxQuestions.org
Old 03-03-2014, 06:53 AM   #1
lpwevers
Member
 
Registered: Apr 2005
Location: The Netherlands
Distribution: SuSE, CentOS
Posts: 181

Rep: Reputation: 21
Difference in perl sort order on Linux and Solaris, even when using locale.


Hello,

I'm hoping someone can help me out with an issue regarding perl sort. I'm seeing a different sort order on Solaris than on Linux. I was hoping to avoid this with "use locale", but the results still differ when the strings contain special characters such as underscores.

Using the code below:
Code:
#!/usr/bin/perl

use strict;
use warnings;
use locale;

my $i;
my @sortedList;
my @toSort = ('SortTest', 
              'TestSort',
              'Sort_Test',
              'Test_Sort',
              'Test1_Sort',
              'Sort1_Test',
              'Sort_1Test',
              'Test_1Sort');
@sortedList = sort (@toSort);

for ($i = 0; $i <= $#sortedList; $i++)
{
	print "$i:\t$sortedList[$i]\n";
}
I get this output:
Code:
Linux                   Solaris
0:      Sort_1Test      0:      Sort_1Test
1:      Sort1_Test      1:      Sort_Test
2:      SortTest        2:      Sort1_Test
3:      Sort_Test       3:      SortTest
4:      Test_1Sort      4:      Test_1Sort
5:      Test1_Sort      5:      Test_Sort
6:      TestSort        6:      Test1_Sort
7:      Test_Sort       7:      TestSort
Both systems use the same locale settings, although Linux reports a few extra categories that I doubt have any influence here. (To make sure, I tried to unset them, but their value remains POSIX.)
Code:
Linux                             Solaris
LANG=                             LANG=
LC_CTYPE=en_US.ISO8859-1          LC_CTYPE=en_US.ISO8859-1
LC_NUMERIC=en_US.ISO8859-1        LC_NUMERIC=en_US.ISO8859-1
LC_TIME=en_US.ISO8859-1           LC_TIME=en_US.ISO8859-1
LC_COLLATE=en_US.ISO8859-1        LC_COLLATE=en_US.ISO8859-1
LC_MONETARY=en_US.ISO8859-1       LC_MONETARY=en_US.ISO8859-1
LC_MESSAGES=C                     LC_MESSAGES=C
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=                           LC_ALL=
I've tried various combinations of these settings, and they sometimes do affect the sort order, but Linux always prefers numbers over special characters, no matter what I try.

Can someone please help me out and provide a solution, so that the sort-order on both Linux and Solaris is the same?

Many thanks,
Louis
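
For context on why the two systems can disagree: with "use locale" in effect, sort compares strings through the platform's LC_COLLATE tables, and different C libraries can ship different tables for a locale with the same name. A minimal sketch, assuming plain byte order is acceptable for these ASCII names: scoping the sort under "no locale" makes perl compare by character code, which does not depend on those tables. The helper name sort_by_codepoint is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: sort by raw character code, ignoring LC_COLLATE.
sub sort_by_codepoint {
    no locale;    # make sure cmp is a plain code-point comparison here
    return sort { $a cmp $b } @_;
}

my @toSort = ('SortTest', 'TestSort', 'Sort_Test', 'Test_Sort',
              'Test1_Sort', 'Sort1_Test', 'Sort_1Test', 'Test_1Sort');

my @sorted = sort_by_codepoint(@toSort);
print "$_:\t$sorted[$_]\n" for 0 .. $#sorted;

# In ASCII, '1' (49) < 'T' (84) < '_' (95), which decides the order.
printf "%s = %d\n", $_, ord($_) for '1', 'T', '_';
```

With this comparison digits sort before uppercase letters, and uppercase letters before the underscore, so the result should not depend on which system runs the script.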
 
Old 03-03-2014, 08:41 AM   #2
linosaurusroot
Member
 
Registered: Oct 2012
Distribution: OpenSuSE,RHEL,Fedora,OpenBSD
Posts: 982
Blog Entries: 2

Rep: Reputation: 244Reputation: 244Reputation: 244
Have you set these locale variables in the beginning of the perl script to make sure they are not being reset somewhere between your shell prompt and the call to sort() ?

Code:
%ENV = ();
$ENV{"LC_MESSAGES"} = "C";
# ...
# other locale and PATH settings here
# ...

Last edited by linosaurusroot; 03-03-2014 at 08:43 AM.
 
Old 03-04-2014, 01:06 AM   #3
lpwevers
Member
 
Registered: Apr 2005
Location: The Netherlands
Distribution: SuSE, CentOS
Posts: 181

Original Poster
Rep: Reputation: 21
Quote:
Originally Posted by linosaurusroot View Post
Have you set these locale variables in the beginning of the perl script to make sure they are not being reset somewhere between your shell prompt and the call to sort() ?

Code:
%ENV = ();
$ENV{"LC_MESSAGES"} = "C";
# ...
# other locale and PATH settings here
# ...
Hi Linosaurusroot,

Thanks for the tip. Unfortunately, it didn't do the trick for me: I added the lines below to the beginning of the script, but the results remain the same.
Code:
$ENV{LANG}        = 'en_US';
$ENV{LC_CTYPE}    = 'en_US.ISO8859-1';
$ENV{LC_NUMERIC}  = 'en_US.ISO8859-1';
$ENV{LC_TIME}     = 'en_US.ISO8859-1';
$ENV{LC_COLLATE}  = 'en_US.ISO8859-1';
$ENV{LC_MONETARY} = 'en_US.ISO8859-1';
$ENV{LC_MESSAGES} = 'C';
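
One thing worth noting here: perl picks up the locale from the environment once, at startup, so assigning to %ENV later in the script affects child processes but not the collation already in effect for the running script. A minimal sketch of how to check this, assuming the POSIX module is available (it ships with perl); the one-argument form of setlocale only queries the current setting, and the sub name current_collate is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);

# Query form: with one argument, setlocale() reports the current locale.
sub current_collate {
    my $loc = setlocale(LC_COLLATE);
    return defined $loc ? $loc : '(unknown)';
}

my $before = current_collate();

# Changing the environment alone does not touch the running process ...
$ENV{LC_COLLATE} = 'C';
my $after_env = current_collate();

# ... the locale only changes once setlocale() is called to apply it.
setlocale(LC_COLLATE, 'C');
my $after_set = current_collate();

print "at startup:       $before\n";
print "after %ENV edit:  $after_env\n";
print "after setlocale:  $after_set\n";
```

The first two lines printed should be identical; only the explicit setlocale call changes what the script sorts by.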
 
Old 03-04-2014, 05:28 PM   #4
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,659
Blog Entries: 4

Rep: Reputation: 3941Reputation: 3941Reputation: 3941Reputation: 3941Reputation: 3941Reputation: 3941Reputation: 3941Reputation: 3941Reputation: 3941Reputation: 3941Reputation: 3941
I also cordially suggest that you pose this question at http://www.perlmonks.org.
 
Old 03-15-2014, 10:28 PM   #5
makyo
Member
 
Registered: Aug 2006
Location: Saint Paul, MN, USA
Distribution: {Free,Open}BSD, CentOS, Debian, Fedora, Solaris, SuSE
Posts: 735

Rep: Reputation: 76
Hi.

I almost always use "C" as the locale, and I rarely run into trouble.
Code:
       NOTE: Not all systems have the "POSIX" locale (not all systems are
       POSIX-conformant), so use "C" when you need explicitly to specify this
       default locale.
from man perllocale
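
If the calling shell cannot be controlled, the "C" collation mentioned above can probably be selected from inside the script as well. A minimal sketch, assuming POSIX::setlocale is available; it switches only LC_COLLATE to "C" and restores the previous value afterwards. The sub name sorted_in_c_locale is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);
use locale;

# Hypothetical helper: sort under the "C" collation, then restore the old one.
sub sorted_in_c_locale {
    my @items = @_;
    my $old = setlocale(LC_COLLATE);          # remember current collation
    setlocale(LC_COLLATE, 'C')
        or die "could not switch LC_COLLATE to C\n";
    my @sorted = sort @items;                 # locale-aware sort, now using "C"
    setlocale(LC_COLLATE, $old) if defined $old;
    return @sorted;
}

my @names = ('SortTest', 'Sort_Test', 'Sort1_Test', 'Sort_1Test');
print "$_\n" for sorted_in_c_locale(@names);
```

Restoring the old value keeps the rest of the script's locale-dependent behaviour unchanged, which may or may not matter for a given program.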

Here are sample runs from Debian GNU/Linux and Solaris:
Code:
#!/usr/bin/env bash

# @(#) s1	Demonstrate perl sort, different platforms.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && . $C perl

pl " Expected output:"
cat expected-output.txt

pl " Results:"
./p1

exit 0
producing:
Code:
$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian 5.0.8 (lenny, workstation) 
bash GNU bash 3.2.39
perl 5.10.0

-----
 Expected output:
0:	Sort1_Test
1:	SortTest
2:	Sort_1Test
3:	Sort_Test
4:	Test1_Sort
5:	TestSort
6:	Test_1Sort
7:	Test_Sort

-----
 Results:
0:	Sort1_Test
1:	SortTest
2:	Sort_1Test
3:	Sort_Test
4:	Test1_Sort
5:	TestSort
6:	Test_1Sort
7:	Test_Sort
and producing:
Code:
$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: SunOS, 5.10, i86pc
Distribution        : Solaris 10 10/08 s10x_u6wos_07b X86
bash GNU bash 3.00.16
perl 5.8.4

-----
 Expected output:
0:      Sort1_Test
1:      SortTest
2:      Sort_1Test
3:      Sort_Test
4:      Test1_Sort
5:      TestSort
6:      Test_1Sort
7:      Test_Sort

-----
 Results:
0:      Sort1_Test
1:      SortTest
2:      Sort_1Test
3:      Sort_Test
4:      Test1_Sort
5:      TestSort
6:      Test_1Sort
7:      Test_Sort
with your perl code being:
Code:
#!/usr/bin/perl

use strict;
use warnings;
use locale;

my $i;
my @sortedList;
my @toSort = ('SortTest', 
              'TestSort',
              'Sort_Test',
              'Test_Sort',
              'Test1_Sort',
              'Sort1_Test',
              'Sort_1Test',
              'Test_1Sort');
@sortedList = sort (@toSort);

for ($i = 0; $i <= $#sortedList; $i++)
{
	print "$i:\t$sortedList[$i]\n";
}
Best wishes ... cheers, makyo
 
Old 03-17-2014, 02:31 PM   #6
lpwevers
Member
 
Registered: Apr 2005
Location: The Netherlands
Distribution: SuSE, CentOS
Posts: 181

Original Poster
Rep: Reputation: 21
Quote:
Originally Posted by makyo View Post
Hi.

I almost always use "C" as the locale, and I rarely run into trouble.
Code:
       NOTE: Not all systems have the "POSIX" locale (not all systems are
       POSIX-conformant), so use "C" when you need explicitly to specify this
       default locale.
from man perllocale

Here are sample runs from Debian GNU/Linux and Solaris:
Code:
#!/usr/bin/env bash

# @(#) s1	Demonstrate perl sort, different platforms.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && . $C perl

pl " Expected output:"
cat expected-output.txt

pl " Results:"
./p1

exit 0
...

Thanks, that actually did the trick. And I must admit, I never would have thought of this solution.
 
  

