LinuxQuestions.org


centguy 04-24-2015 11:10 PM

what character is stronger than the minus or the hyphen character ?
 
I use a naming convention to help sort files in a directory.

Under the Linux "ls" command,
all my project directories are listed in the following order:
Quote:

04.07_01-SProject
04.07:01-SProject
04.07+01-SProject
04.07.02-SProject
04.070-SProject
04.07a01-SProject
04.07A01-SProject
04.07=SProject
04.07-SProject
04.07-T02-SProject

The names were made up to illustrate my point.

Now, I would really like "ls" to
list a project called 04.07+01-SProject,
which was created later than 04.07-SProject,
below 04.07-SProject, like this:

04.07-SProject
04.07+01-SProject

but with ls I get:
04.07+01-SProject
04.07-SProject

It is a bit complicated, but it is crucial for me to keep track
of different projects in "chronological" order.

ls seems to rank the characters in this order (weakest first):

_ : + . 012 abcz ABC Z = -


I think the solution is either to use the weaker characters such
as _ : + instead of the strongest character - from the very beginning (though I have ruled out _, the weakest one,
in my file names for other reasons), or to override the alphabetical order imposed by the Linux
convention.

I would like the precedence to be:

- = + . 0 12 abc AB C Z

Can I change the precedence at will?
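Short of defining a custom locale with localedef, ls offers no option for an arbitrary character precedence; its name order comes from the locale's collation rules. A workaround is to sort the output of ls -1 yourself with a key that encodes the desired ranking. This is a minimal sketch under stated assumptions: precedence_sort is a hypothetical helper, names are assumed to contain no spaces, and characters missing from the order string (such as _ and :) are ranked last.

```shell
#!/bin/sh
# Sketch: sort names (one per line on stdin) by a custom character precedence.
# precedence_sort is a hypothetical helper, not a standard tool.
# Assumption: names contain no spaces.
precedence_sort() {
    awk '
    BEGIN {
        # Ranking requested in the post: - = + . digits, lower case, upper case
        order = "-=+.0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    }
    {
        key = ""
        for (i = 1; i <= length($0); i++) {
            r = index(order, substr($0, i, 1))
            if (r == 0) r = 999              # unlisted characters rank last
            key = key sprintf("%03d", r)     # fixed width: byte order == rank order
        }
        print key " " $0
    }' | LC_ALL=C sort -k1,1 -k2,2 | cut -d' ' -f2
}

ls -1 | precedence_sort
```

With this ranking 04.07-SProject comes before 04.07+01-SProject, but 04.07-T02-SProject still lands between them, because the first differing character decides.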

I hope my question did not confuse you. This is an academic exercise.

michaelk 04-24-2015 11:40 PM

Using the date as part of the name makes it easy to list files/directories in chronological order. I've done this for many years.

yearmonthday_data_file, e.g. 20150424-data_file
or
20150415-4.07-SProject
20150416-4.07+01-SProject
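A minimal sketch of the date-prefix idea, assuming a zero-padded YYYYMMDD prefix. Because every prefix has the same width and contains only digits, sorting by name also sorts by date. The helper dated_name is hypothetical.

```shell
#!/bin/sh
# Sketch: prefix a name with today's date as YYYYMMDD (zero-padded, fixed
# width), so that sorting by name also sorts by date.
# dated_name is a hypothetical helper for illustration.
dated_name() {
    printf '%s-%s\n' "$(date +%Y%m%d)" "$1"
}

dated_name 4.07-SProject    # e.g. 20150415-4.07-SProject when run on 15 April 2015

# Fixed-width date prefixes sort chronologically even in plain byte order:
printf '%s\n' 20150416-4.07+01-SProject 20150415-4.07-SProject | LC_ALL=C sort
```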

pan64 04-25-2015 12:09 AM

ls has many different sorting options, such as -t, -c or --sort.
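For example, sorting by time instead of by name. A sketch; note that a directory's modification time changes whenever entries are added to or removed from it, so for project directories this approximates "last changed", not "created".

```shell
#!/bin/sh
# Sketch: list entries by time instead of by name.
# -t sorts by modification time, newest first; -r reverses it (oldest first).
ls -1tr

# GNU ls also accepts the sort key spelled out:
ls -1r --sort=time

# -c (together with -t) uses the inode change time (ctime) instead of mtime.
ls -1trc
```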

centguy 04-25-2015 12:44 AM

michaelk:

The restriction is that I do not want to change the original directory names, since the directories contain
large files and
input files in many other places refer to them. If I renamed a directory, I would have to replace
its name in the input files in many places.

[linux]$ ls -1
070-SProject
07ASProject
07-SProject
07--SProject
07SSProject
07-T02-SProjec

Now I do not understand why - should sort between A and S. What is the logic? It contradicts my observation
in the first post.
I was hoping to use -- to make it sort ahead of -S.
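A commonly cited explanation, offered here as an approximation rather than glibc's actual algorithm: in the glibc UTF-8 locales of that era, punctuation such as - carries no weight at the first comparison level and only breaks ties later. The sketch below emulates just that first level by sorting on a copy of each name with the hyphens removed. hyphen_blind_sort is a hypothetical helper; names are assumed to contain no spaces, and case folding is ignored.

```shell
#!/bin/sh
# Sketch: approximate a locale that ignores '-' at the first comparison level.
# Decorate each name with a hyphen-free key, sort on the key in byte order,
# then drop the key. -s keeps tied names in their input order.
# hyphen_blind_sort is hypothetical; names must not contain spaces.
hyphen_blind_sort() {
    awk '{ key = $0; gsub(/-/, "", key); print key " " $0 }' |
        LC_ALL=C sort -s -k1,1 |
        cut -d' ' -f2
}

printf '%s\n' 07SSProject 07-SProject 070-SProject 07--SProject 07ASProject |
    hyphen_blind_sort
# prints 070-SProject, 07ASProject, 07-SProject, 07--SProject, 07SSProject
```

Compared as 07SProject, the name 07-SProject naturally falls between 07ASProject and 07SSProject, which matches the listing above.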

pan64 04-25-2015 12:50 AM

You may have "invisible" characters inside filenames (try ls | od -xc to check). You can also try a different locale setting to change the sorting behavior.
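Both checks as a short sketch: od shows every byte of every name, and locale shows which collation setting is in effect.

```shell
#!/bin/sh
# Sketch of the two checks suggested above.

# 1) Dump the bytes of every name. Hidden characters such as a trailing
#    space or a carriage return show up explicitly (e.g. as \r).
ls -1 | od -c

# 2) Show the locale settings in effect. LC_COLLATE governs name ordering,
#    and LC_ALL overrides it when set.
locale
```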

centguy 04-25-2015 01:00 AM

Nothing unusual is visible:

A-
AA
AS
B--
B-A
B-B
E-04.07ASb
E-04.07-Sb
E-04.07--Sb
E-04.07SSb

Test it yourself if you like.

centguy 04-25-2015 01:38 AM

Tested on Linux Mint 14 and CentOS 6. Same thing.

pan64 04-25-2015 01:45 AM

I cannot see what you did. Please post all your commands exactly as they were executed, along with the full output.

centguy 04-25-2015 02:03 AM

Even more bizarre results:

$ ls -1
07ASb
07-Sb
07--Sb
07SSb
A-
AA
aASb
AS
a-Sb
a--Sb
aSSb
B--
B-A
B-B
C--A
C-AA
C-ZA
ZASb
Z-Sb
Z--Sb
ZSSb

now "a" appears between an "A" and another "A".

astrogeek 04-25-2015 02:13 AM

Oops... think before typing...

My man page says it sorts alphabetically if no other sort option is specified...

What does "which ls" return? (I.e., does your shell have a built-in with different behavior?)

Also, many distros alias ls; is there an alias that changes the expected sort behavior?
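A sketch of how to check this at the interactive prompt (aliases are normally not expanded in scripts). An external which generally only searches $PATH, while the shell's own type also reports aliases and functions.

```shell
#!/bin/sh
# Sketch: see what "ls" actually resolves to in the current shell.
# "type" reports aliases and shell functions as well as the path on disk.
type ls

# Run ls while bypassing any alias or shell function of the same name:
command ls -1
```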

centguy 04-25-2015 02:16 AM

Good question. I have used ls -1 for many years but never noticed anything bizarre.

Anyway, on the IBM machine I get:

% /usr/bin/ls -1
07--Sb
07-Sb
07ASb
07SSb
A-
AA
AS
B--
B-A
B-B
C--A
C-AA
C-ZA
Z--Sb
Z-Sb
ZASb
ZSSb
a--Sb
a-Sb
aASb
aSSb

Seems IBM got it right.

centguy 04-25-2015 02:29 AM

astrogeek:
I just run ls without any customization. :)

I bet that if you copy and paste my directory names into a file and run a script to create the directories,
you will see what I see.
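A minimal version of such a script, using the names from the listings in this thread and a scratch directory from mktemp so that nothing real is touched. Prefixing the final ls with LC_ALL=C lets you compare the two orders.

```shell
#!/bin/sh
# Sketch: recreate the test directories in a scratch location and list them.
# The names are copied from the listings in this thread.
dir=$(mktemp -d)
cd "$dir" || exit 1
for n in 07ASb 07-Sb 07--Sb 07SSb A- AA aASb AS a-Sb a--Sb aSSb \
         B-- B-A B-B C--A C-AA C-ZA ZASb Z-Sb Z--Sb ZSSb; do
    mkdir -- "$n"
done

ls -1            # order depends on the current locale
LC_ALL=C ls -1   # plain byte order, for comparison
```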

So IBM AIX gives the right sort.

All the Linux systems I have tested (even an IBM machine running Linux) seem to give:

$ uname -a
Linux 2.6.32-279.el6.ppc64 #1 SMP Wed Jun 13 18:19:27 EDT 2012 ppc64 ppc64 ppc64 GNU/Linux
$ ls -1
07ASb
07-Sb
07--Sb
07SSb
A-
AA
aASb
AS
a-Sb
a--Sb
aSSb
B--
B-A
B-B
C--A
C-AA
C-ZA
ZASb
Z-Sb
Z--Sb
ZSSb

centguy 04-25-2015 02:54 AM

I ran /bin/ls -1, and so far no Linux OS gives the expected result.
So I shall leave it to others to
show me how silly I have been, or let the community fix this "bug", if there is one.

Here is the test again:

$ uname -a
Linux centos61 2.6.32-131.0.15.el6.x86_64 #1 SMP Sat Nov 12 15:11:58 CST 2011 x86_64 x86_64 x86_64 GNU/Linux
$ /bin/ls -1
07ASb
07-Sb
07--Sb
07SSb
A-
AA
aASb
AS
a-Sb
a--Sb
aSSb
B--
B-A
B-B
C--A
C-AA
C-ZA
ZASb
Z-Sb
Z--Sb
ZSSb

allend 04-25-2015 03:52 AM

The answer lies in the setting of the LC_COLLATE locale variable.
With LC_COLLATE=C
Code:

bash-4.3$ LC_COLLATE=C; ls -1
07--Sb
07-Sb
07ASb
07SSb
A-
AA
AS
B--
B-A
B-B
C--A
C-AA
C-ZA
Z--Sb
Z-Sb
ZASb
ZSSb
a--Sb
a-Sb
aASb
aSSb

With LC_COLLATE=en_US.utf8
Code:

bash-4.3$ LC_COLLATE=en_US.utf8; ls -1
07ASb
07-Sb
07--Sb
07SSb
A-
AA
aASb
AS
a-Sb
a--Sb
aSSb
B--
B-A
B-B
C--A
C-AA
C-ZA
t.sc
ZASb
Z-Sb
Z--Sb
ZSSb


centguy 04-25-2015 08:46 PM

In my bash, I need an export.

But this raises a question: why does en_US.utf8 (which seems to be the default) put "a" between two "A"s?

Is there a locale that lists "a" first, then "A", then "b", then "B"? "export LC_COLLATE=C"
puts "a" after all the capital letters. That may be technically correct, but it is useless for
an ordinary person, who prefers to see "a" and "A" files listed close to one another. For this reason I won't use
LC_COLLATE=C, and will instead put up with the weird quirks I saw in the first post. My theory is that
some operating systems treat the file names "a" and "A" as the same, which makes the sorting algorithm
produce results that are hard to predict. I would like to know if anyone can explain the sorting used
by en_US.utf8.

One way to go about it is to write my own "bash -1" command and then use Python to display the
order I want, rather than be utterly confused by a standard that perhaps only a handful of people understand.
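As far as I understand glibc's rules, en_US.utf8 compares letters case-insensitively at its first level and treats case only as a later tie-breaker, which is why aASb lands between AA and AS. The same "sort it yourself" idea can stay in the shell. The sketch below groups names case-insensitively and, for names that differ only in case, puts the lower-case form first (a, A, b, B). case_sort is a hypothetical helper; names are assumed to contain no spaces.

```shell
#!/bin/sh
# Sketch: case-insensitive order with lower case first on ties (a, A, b, B).
# case_sort is a hypothetical helper; names must not contain spaces.
# Key 1 is the lower-cased name, compared in byte order. Key 2 is the original
# name, compared in reverse, which puts lower case (0x61..) ahead of
# upper case (0x41..) when the folded names are equal.
case_sort() {
    awk '{ print tolower($0) " " $0 }' |
        LC_ALL=C sort -k1,1 -k2,2r |
        cut -d' ' -f2
}

printf '%s\n' B A b a | case_sort    # prints a, A, b, B (one per line)
```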

Now, here are the results on my system.

Note: this does not work.
Quote:

$ LC_COLLATE=C; /bin/ls -1
07ASb
07-Sb
07--Sb
07SSb
A-
AA
aASb
AS
a-Sb
a--Sb
aSSb
B--
B-A
B-B
C--A
C-AA
C-ZA
ZASb
Z-Sb
Z--Sb
ZSSb
Note: this works.
Quote:

$ export LC_COLLATE=C; /bin/ls -1
07--Sb
07-Sb
07ASb
07SSb
A-
AA
AS
B--
B-A
B-B
C--A
C-AA
C-ZA
Z--Sb
Z-Sb
ZASb
ZSSb
a--Sb
a-Sb
aASb
aSSb
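A sketch of why the export matters: a plain assignment such as LC_COLLATE=C; creates a shell variable, and child processes such as /bin/ls only see it if it is exported (it already is if LC_COLLATE was inherited from the environment, which may explain why the plain form worked in the earlier example). A prefix assignment applies to a single command only.

```shell
#!/bin/sh
# Sketch: what a child process sees for each way of setting LC_COLLATE.
unset LC_COLLATE

# Plain assignment: a shell variable only, not passed to children.
LC_COLLATE=C; sh -c 'echo "child sees LC_COLLATE=${LC_COLLATE:-unset}"'

# Prefix assignment: passed to this one command only.
LC_COLLATE=C sh -c 'echo "child sees LC_COLLATE=$LC_COLLATE"'

# Exported: passed to every later child process.
export LC_COLLATE=C; sh -c 'echo "child sees LC_COLLATE=$LC_COLLATE"'
```

If LC_ALL is set, it overrides LC_COLLATE, so LC_ALL=C ls -1 is the more robust one-off form.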


All times are GMT -5.