LinuxQuestions.org
Old 04-24-2015, 11:10 PM   #1
centguy
Member
 
Registered: Feb 2008
Posts: 597
Blog Entries: 1

Rep: Reputation: 46
what character is stronger than the minus or the hyphen character ?


I use a naming convention to help sort files in a directory.

Under the "ls" command on Linux,
my project directories are listed in the following order:
Quote:
04.07_01-SProject
04.07:01-SProject
04.07+01-SProject
04.07.02-SProject
04.070-SProject
04.07a01-SProject
04.07A01-SProject
04.07=SProject
04.07-SProject
04.07-T02-SProject
The names are created to illustrate my point.

Now, I would really like "ls" to
list a project called 04.07+01-SProject,
which was created later than 04.07-SProject,
below 04.07-SProject, like

04.07-SProject
04.07+01-SProject

but under ls,

I will get
04.07+01-SProject
04.07-SProject

It is a bit complicated but it is crucial for me to keep track
of different projects in "chronological" order.

ls seems to rank the "strength" of characters in this order:

_ : + . 012 abcz ABC Z = -


I think the solution is to use the weaker characters such
as _ : + rather than the strongest character - from the very beginning (I have ruled out the weakest one, _,
in file names for other reasons). Or to override the alphabetical order that Linux uses
by convention.

I would have liked the precedence to go like

- = + . 0 12 abc AB C Z

Can I change the precedence at will?

I hope my question did not confuse you. This is an academic exercise.

Last edited by centguy; 04-24-2015 at 11:11 PM.
 
Old 04-24-2015, 11:40 PM   #2
michaelk
Moderator
 
Registered: Aug 2002
Posts: 24,390

Rep: Reputation: 5470
Using the date as part of the name makes it easy to list files/directories in chronological order. I've done this for many years.

yearmonthday_data_file, e.g. 20150424-data_file
or
20150415-4.07-SProject
20150416-4.07+01-SProject
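
A minimal shell sketch of this date-prefix scheme. The directory names are the ones from the post; `new_name` and `sorted` are hypothetical variables used only for illustration. Because each name starts with an eight-digit date, the digits decide the order before any punctuation is reached.

```shell
# Build a date-prefixed name for a project created today (YYYYMMDD).
new_name="$(date +%Y%m%d)-4.07-SProject"
printf 'new project name: %s\n' "$new_name"

# Date-prefixed names sort chronologically, whatever punctuation follows.
sorted=$(printf '%s\n' 20150416-4.07+01-SProject 20150415-4.07-SProject | LC_ALL=C sort)
printf '%s\n' "$sorted"
```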
 
Old 04-25-2015, 12:09 AM   #3
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 20,257

Rep: Reputation: 6838
ls has a lot of different sorting options, like -t, -c or --sort
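
For example, `ls -t` sorts by modification time (newest first) and `-r` reverses that, so `ls -1tr` lists the oldest entry first regardless of the names. A small sketch, assuming a scratch directory from `mktemp` and made-up timestamps set with `touch -t`. Note that a directory's modification time changes whenever entries are added to or removed from it, so this matches creation order only while the top level of each project directory stays unchanged.

```shell
demo_dir=$(mktemp -d)
# Hypothetical timestamps: 04.07-SProject is one day older.
mkdir "$demo_dir/04.07-SProject" "$demo_dir/04.07+01-SProject"
touch -t 201504150900 "$demo_dir/04.07-SProject"
touch -t 201504160900 "$demo_dir/04.07+01-SProject"

# -t: sort by modification time, -r: reverse (oldest first), -1: one per line.
by_time=$(cd "$demo_dir" && ls -1tr)
printf '%s\n' "$by_time"
rm -rf "$demo_dir"
```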
 
Old 04-25-2015, 12:44 AM   #4
centguy
Member
 
Registered: Feb 2008
Posts: 597

Original Poster
Blog Entries: 1

Rep: Reputation: 46
michaelk:

The restriction is that I do not want to change the original directory names; the directories contain
large files. I have
input files in many other places that refer to the original directories. If I rename one, I have to replace
the directory name in those input files in many places.

[linux]$ ls -1
070-SProject
07ASProject
07-SProject
07--SProject
07SSProject
07-T02-SProjec

Now I do not understand why - should come between A and S. What is the logic? It contradicts my observation
in the first post.
I was hoping to use -- to make it stronger than -S in sorting.
 
Old 04-25-2015, 12:50 AM   #5
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 20,257

Rep: Reputation: 6838
You may have "invisible" characters inside the filenames (try ls | od -xc to check). You can also try a different locale setting to change the sorting behavior.
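
As a quick illustration of the `od` check: `od -c` prints every byte of its input, so a tab, a trailing space or a multi-byte character in a name becomes visible. This sketch uses two made-up names rather than a real directory listing. `-An` only suppresses the offset column, and the `tr` call squeezes out od's padding to keep the output short.

```shell
# A clean name: only the expected characters appear.
clean=$(printf '07-Sb' | od -An -c | tr -d ' \n')
# A name with a hidden tab: od shows it as \t.
hidden=$(printf '07\tSb' | od -An -c | tr -d ' \n')
printf '%s\n%s\n' "$clean" "$hidden"
```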
 
Old 04-25-2015, 01:00 AM   #6
centguy
Member
 
Registered: Feb 2008
Posts: 597

Original Poster
Blog Entries: 1

Rep: Reputation: 46
No invisible characters:

A-
AA
AS
B--
B-A
B-B
E-04.07ASb
E-04.07-Sb
E-04.07--Sb
E-04.07SSb

Test it yourself if you like.
 
Old 04-25-2015, 01:38 AM   #7
centguy
Member
 
Registered: Feb 2008
Posts: 597

Original Poster
Blog Entries: 1

Rep: Reputation: 46
Tested on Linux Mint 14 and CentOS 6. Same thing.
 
Old 04-25-2015, 01:45 AM   #8
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 20,257

Rep: Reputation: 6838
I cannot see what you did. Please post all your commands exactly as they were executed, along with the full output.
 
Old 04-25-2015, 02:03 AM   #9
centguy
Member
 
Registered: Feb 2008
Posts: 597

Original Poster
Blog Entries: 1

Rep: Reputation: 46
Even more bizarre results:

$ ls -1
07ASb
07-Sb
07--Sb
07SSb
A-
AA
aASb
AS
a-Sb
a--Sb
aSSb
B--
B-A
B-B
C--A
C-AA
C-ZA
ZASb
Z-Sb
Z--Sb
ZSSb

Now "a" appears between one "A" and another "A".
 
Old 04-25-2015, 02:13 AM   #10
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,091
Blog Entries: 23

Rep: Reputation: 4077
Oops... think before typing...

My man page says it sorts alphabetically if no other sort option is specified...

What does "which ls" return? (i.e., does your shell have a built-in with different behavior?)

Also, many distros alias ls. Is there an alias that changes the expected sort behavior?
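
One way to check both points, as a sketch: `type` reports whether `ls` resolves to an alias, a function, a builtin or a file on disk, and `command ls` runs it while skipping any alias or function. `ls_kind` is a hypothetical variable name.

```shell
# How does the shell resolve "ls"? (alias, function, builtin or file)
ls_kind=$(type ls)
printf '%s\n' "$ls_kind"

# Show the alias definition, if there is one.
alias ls 2>/dev/null || echo "no alias for ls"

# Run ls while bypassing any alias or shell function.
command ls -1 >/dev/null
```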

Last edited by astrogeek; 04-25-2015 at 02:21 AM. Reason: Think before inserting foot...
 
Old 04-25-2015, 02:16 AM   #11
centguy
Member
 
Registered: Feb 2008
Posts: 597

Original Poster
Blog Entries: 1

Rep: Reputation: 46
Good question. I have used ls -1 for many years but never noticed anything bizarre.

Anyway, on the IBM machine,

I have

% /usr/bin/ls -1
07--Sb
07-Sb
07ASb
07SSb
A-
AA
AS
B--
B-A
B-B
C--A
C-AA
C-ZA
Z--Sb
Z-Sb
ZASb
ZSSb
a--Sb
a-Sb
aASb
aSSb

Seems IBM got it right.
 
Old 04-25-2015, 02:29 AM   #12
centguy
Member
 
Registered: Feb 2008
Posts: 597

Original Poster
Blog Entries: 1

Rep: Reputation: 46
astrogeek:
I just run ls without any customization.

I bet that if you cut and paste my directory names into a file and run a script to make the directories, you will
see what I see.

So far, IBM AIX gives the right sort.

All the Linux systems I have tested (even an IBM machine installed with Linux) give:

$ uname -a
Linux 2.6.32-279.el6.ppc64 #1 SMP Wed Jun 13 18:19:27 EDT 2012 ppc64 ppc64 ppc64 GNU/Linux
$ ls -1
07ASb
07-Sb
07--Sb
07SSb
A-
AA
aASb
AS
a-Sb
a--Sb
aSSb
B--
B-A
B-B
C--A
C-AA
C-ZA
ZASb
Z-Sb
Z--Sb
ZSSb
 
Old 04-25-2015, 02:54 AM   #13
centguy
Member
 
Registered: Feb 2008
Posts: 597

Original Poster
Blog Entries: 1

Rep: Reputation: 46
I ran /bin/ls -1, and so far none of the Linux systems gives the expected result.
So I shall leave it to others to
show me how silly I have been, or to let the community fix this "bug", if there is one.

Here again is the test I have done:

$ uname -a
Linux centos61 2.6.32-131.0.15.el6.x86_64 #1 SMP Sat Nov 12 15:11:58 CST 2011 x86_64 x86_64 x86_64 GNU/Linux
$ /bin/ls -1
07ASb
07-Sb
07--Sb
07SSb
A-
AA
aASb
AS
a-Sb
a--Sb
aSSb
B--
B-A
B-B
C--A
C-AA
C-ZA
ZASb
Z-Sb
Z--Sb
ZSSb
 
Old 04-25-2015, 03:52 AM   #14
allend
LQ 5k Club
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware64-15.0
Posts: 6,173

Rep: Reputation: 2647
The answer lies in the setting of the LC_COLLATE environment variable.
With LC_COLLATE=C
Code:
bash-4.3$ LC_COLLATE=C; ls -1
07--Sb
07-Sb
07ASb
07SSb
A-
AA
AS
B--
B-A
B-B
C--A
C-AA
C-ZA
Z--Sb
Z-Sb
ZASb
ZSSb
a--Sb
a-Sb
aASb
aSSb
With LC_COLLATE=en_US.utf8
Code:
bash-4.3$ LC_COLLATE=en_US.utf8; ls -1
07ASb
07-Sb
07--Sb
07SSb
A-
AA
aASb
AS
a-Sb
a--Sb
aSSb
B--
B-A
B-B
C--A
C-AA
C-ZA
t.sc
ZASb
Z-Sb
Z--Sb
ZSSb
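
The C-locale ordering can be reproduced in a scratch directory, as a sketch. It assumes `mktemp` is available and uses a subset of the names from this thread. `LC_ALL`, when set, overrides `LC_COLLATE`, so the example puts `LC_ALL=C` on the command itself to force plain byte-order sorting for that one invocation. In byte order, `-` comes before the digits, the digits before the upper-case letters, and all upper-case letters before all lower-case letters.

```shell
demo_dir=$(mktemp -d)
( cd "$demo_dir" && touch 07ASb 07-Sb 07--Sb 07SSb aASb AS a-Sb )

# Byte-order (C locale) listing, for this one command only.
c_order=$(cd "$demo_dir" && LC_ALL=C ls -1)
printf '%s\n' "$c_order"
rm -rf "$demo_dir"
```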
 
1 members found this post helpful.
Old 04-25-2015, 08:46 PM   #15
centguy
Member
 
Registered: Feb 2008
Posts: 597

Original Poster
Blog Entries: 1

Rep: Reputation: 46
On my bash, I need an export.

But this raises a question: why does en_US.utf8 (apparently the default) put "a" between two "A"s?

Is there a locale that lists "a" first, then "A", then "b", then "B"? "export LC_COLLATE=C"
puts "a" all the way behind all the capital letters. That may be scientifically correct, but it is useless for
a common person, who prefers to see "a" and "A" files listed close to one another. For this reason I won't use
LC_COLLATE=C and will instead put up with the weird quirks I saw in the first post. My theory is that
some operating systems
treat files named "a" and "A" as the same, which is why the sorting algorithm produces results
that are hard to predict. I would be interested if anyone can explain the sorting offered
by en_US.utf8.
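
A rough model of why en_US.utf8 produces this order, assuming the glibc collation rules in use at the time of this thread: names are compared in several passes. The first pass ignores punctuation such as "-" and treats "a" and "A" as equal; case and punctuation only break ties in later passes. That is why 07-Sb is compared as if it were 07Sb and lands between 07ASb and 07SSb, and why aASb sits between AA and AS. The sketch approximates only that first pass. `sort_by_primary_key` is a hypothetical helper, not how glibc implements collation, and it leaves tied names (such as 07-Sb and 07--Sb) in their input order.

```shell
# Hypothetical helper: sort names read from stdin by an approximate first-pass key.
sort_by_primary_key() {
    tab=$(printf '\t')
    while IFS= read -r name; do
        # Key = name without punctuation, folded to lower case.
        key=$(printf '%s' "$name" | LC_ALL=C tr -d '[:punct:]' | LC_ALL=C tr '[:upper:]' '[:lower:]')
        printf '%s\t%s\n' "$key" "$name"
    done | LC_ALL=C sort -s -t "$tab" -k1,1 | cut -f2-
}

approx=$(printf '%s\n' 07-Sb 07ASb 07SSb AS a-Sb aASb | sort_by_primary_key)
printf '%s\n' "$approx"
```

For these six names the result is 07ASb, 07-Sb, 07SSb, aASb, AS, a-Sb, the same relative order as in the en_US.utf8 listings above.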

One way to go about it is to write my own "ls -1" command in bash and then use Python to display the
order I want, rather than be utterly confused by a standard that may be understood by only a handful of people.
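
Such a custom order does not strictly need a new locale. A sketch in plain shell, assuming the precedence requested in the first post (- = + . then digits, then a-z, then A-Z): each name is translated into a key whose byte order matches that precedence, and the keys are sorted in the C locale. `sort_custom` is a hypothetical helper. Characters outside that list (such as _ and :) keep their own byte values, which is an arbitrary choice made here.

```shell
# Hypothetical helper: sort names read from stdin with the precedence
#   -  =  +  .  0-9  a-z  A-Z
sort_custom() {
    tab=$(printf '\t')
    while IFS= read -r name; do
        # Map - = + . to ! " # $ and swap the case of letters, so that plain
        # byte order on the key gives the wanted precedence.
        # "-" is placed last in the first set so tr takes it literally.
        key=$(printf '%s' "$name" | LC_ALL=C tr '=+.a-zA-Z-' '"#$A-Za-z!')
        printf '%s\t%s\n' "$key" "$name"
    done | LC_ALL=C sort -s -t "$tab" -k1,1 | cut -f2-
}

custom=$(printf '%s\n' 04.07+01-SProject 04.07.02-SProject 04.070-SProject \
    04.07a01-SProject 04.07A01-SProject 04.07=SProject 04.07-SProject \
    04.07-T02-SProject | sort_custom)
printf '%s\n' "$custom"
```

In an actual project directory the same helper could be fed with `ls -1 | sort_custom`, provided no file name contains a tab or a newline.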

Now, the results on my system:

Note, this is not working.
Quote:
$ LC_COLLATE=C; /bin/ls -1
07ASb
07-Sb
07--Sb
07SSb
A-
AA
aASb
AS
a-Sb
a--Sb
aSSb
B--
B-A
B-B
C--A
C-AA
C-ZA
ZASb
Z-Sb
Z--Sb
ZSSb
Note: This is working.
Quote:
$ export LC_COLLATE=C; /bin/ls -1
07--Sb
07-Sb
07ASb
07SSb
A-
AA
AS
B--
B-A
B-B
C--A
C-AA
C-ZA
Z--Sb
Z-Sb
ZASb
ZSSb
a--Sb
a-Sb
aASb
aSSb
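
The difference between the two attempts comes down to how the shell passes variables to child processes. A plain assignment creates a shell variable that programs such as ls never see, unless the variable was already exported earlier (for instance by a login script, which would presumably explain why the version without export worked in post #14). A sketch with three variants, using a child `sh` that simply reports what it inherited; `plain`, `prefix` and `exported` are hypothetical variable names. Keep in mind that `LC_ALL`, if set, overrides `LC_COLLATE`.

```shell
unset LC_COLLATE

# 1. Plain assignment: a shell variable only, child processes do not inherit it.
LC_COLLATE=C
plain=$(sh -c 'printf "%s" "${LC_COLLATE:-unset}"')

# 2. Prefix form, no semicolon: set in the environment of this one command only.
unset LC_COLLATE
prefix=$(LC_COLLATE=C sh -c 'printf "%s" "${LC_COLLATE:-unset}"')

# 3. export: every later command started from this shell sees it.
export LC_COLLATE=C
exported=$(sh -c 'printf "%s" "${LC_COLLATE:-unset}"')

printf 'plain=%s prefix=%s exported=%s\n' "$plain" "$prefix" "$exported"
```

The prefix form, as in `LC_COLLATE=C ls -1`, is handy for a one-off listing because it leaves the rest of the session on its usual locale.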
 
  


All times are GMT -5. The time now is 10:57 PM.
