LinuxQuestions.org

deleted/ 01-23-2006 07:03 PM

Sorting files in BASH
 
I've been trying to find an answer to this all night with no luck. Maybe I've been looking in the wrong place. Anyway, is there a way in BASH to sort files by upper- or lowercase letter?

For example, list all files beginning with an uppercase.

Any help would be appreciated.

IBall 01-23-2006 07:16 PM

See "man ls".

By default, ls sorts in alphabetical order ignoring the case of the file name.

To list all files starting with a capital letter, you need a pattern (strictly speaking a shell glob rather than a regular expression), such as (I haven't tested this): "ls [A-Z]*"

I hope this helps
--Ian
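
A note on the pattern above: `[A-Z]*` is expanded by the shell before ls ever runs, and how a range like A-Z is matched can depend on the locale. A minimal sketch that avoids the range by using the POSIX character class `[[:upper:]]` instead; the scratch directory and file names are made up for illustration:

```shell
# Scratch directory with hypothetical file names (illustration only).
demo_dir=$(mktemp -d)
touch "$demo_dir/NEW.TXT" "$demo_dir/AAZ.TXT" "$demo_dir/dref.txt"

# [[:upper:]] matches one uppercase letter without relying on how the
# A-Z range is ordered in the current locale.
# The shell expands the glob; LC_ALL=C here only fixes ls's sort order.
# -d keeps ls from listing the contents of any matching directories.
upper_files=$(cd "$demo_dir" && LC_ALL=C ls -d -- [[:upper:]]*)

printf '%s\n' "$upper_files"
rm -rf "$demo_dir"
```

If no file matches, the shell passes the pattern through unchanged and ls reports that it cannot find it.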

muha 01-24-2006 07:21 AM

Hmm, "ls [A-Z]*" does not work.
I'd think it'd be something like: ls --ignore [a-z]*
but that does not work either ..

Dtsazza 01-24-2006 07:45 AM

I think you'd be better off sticking the output from ls into a pipe and letting another program sort it. There is, of course, the sort utility for sorting, but I can't see any concept of character case (from its man page at least), and you already have a case-insensitive sort. The first thing that comes to mind is grep - a simple pipe like
Code:

ls | grep ^[A-Z]
will achieve what you want. Apparently grep is pretty resource intensive, and I doubt you need its power for something like this, so there are probably better tools. sed and awk also spring to mind, though sed would basically be emulating grep, as in
Code:

ls | sed -n 's/^[A-Z]/&/p'
and awk is probably even more intensive than grep.

Anyway, that's really just an aside as I don't think a standard desktop would notice the performance hit of grepping the output from ls. Though I'll admit it would be nicer to specify an argument to ls itself, so you can play around with columns (either manually or by specifying the -l argument) without having to worry about your output format. If it's important that you can do this, it shouldn't be too tough to write a script that looks at its input to work out what columns it has in what order, and sorts on the filename.
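
One caveat with the pipes above: an unquoted ^[A-Z] is also a glob as far as the shell is concerned, so quoting the pattern keeps it intact until grep or sed sees it. A small sketch under that assumption, with made-up names fed through printf instead of ls so the result is reproducible, and `[[:upper:]]` swapped in for the A-Z range:

```shell
# Hypothetical file names standing in for the output of ls.
names=$(printf '%s\n' AAZ.TXT NEW.TXT dref.txt xret.txt)

# Quote the pattern so the shell cannot expand it as a glob.
with_grep=$(printf '%s\n' "$names" | grep '^[[:upper:]]')

# sed needs no substitution here: -n plus /regex/p prints matching lines.
with_sed=$(printf '%s\n' "$names" | sed -n '/^[[:upper:]]/p')

printf '%s\n' "$with_grep"
```

Both variables end up holding the same two names, so either tool will do for this job.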

muha 01-24-2006 08:47 AM

@Dtsazza: that's weird, your first solution gives me the same output as a plain: ls
The second one works as supposed though.
Have you tried that first one yourself?

deleted/ 01-24-2006 02:25 PM

Thanks for the input, guys. Dtsazza, the grep pipe is perfect, cheers.

I suppose it will take a while to move from GUI-based thinking to shell-style thinking.

Thanks again.

muha 01-25-2006 04:36 AM

Can someone tell me how I can find out why this grep command is not working for me?
(just trying to learn something here ..)
Code:

$ ls
dref.txt  NEW.TXT

$ ls | grep ^[A-Z]
dref.txt
NEW.TXT

$ ls | grep ^[a-z]
dref.txt
NEW.TXT

$ ls | grep ^[0-9]
(gives nothing, so that works)

Thanks! I copied it from the command line, so it can't be typos.

sohny 01-25-2006 05:07 AM

But those work perfectly. Those commands worked properly for me in Red Hat 9 with bash.

muha 01-25-2006 05:59 AM

Yeah, that's why I was wondering why this is not working in my shell :?
cat /etc/passwd shows I'm running /bin/bash
I'm working under SUSE 10.0
If somebody knows how i can find out why
Code:

$ ls | grep ^[A-Z]
does not work for me (see post above), please let me know :)

Dtsazza 01-25-2006 08:15 AM

That is rather strange, and I tested the command as working before posting it (always good to check, if only for typos). Looking at your command, I can't see why it wouldn't work... it rather straightforwardly says "OK, match the start of the line, followed by one of the characters A..Z".

As a test, can you try
Code:

$ ls | grep ^N
and
Code:

$ ls | grep [A-Z]
to see if it's the caret or the range that's causing the problem (both of these should just output NEW.TXT)? Also, it might sound obvious, but is NEW.TXT all uppercase ASCII characters? If you're using a Unicode shell and the filename was generated in some way other than you typing in its name yourself, there's a chance the characters won't be in the range U+0041-U+005A ('normal' uppercase characters) but will look the same. Yep, it's unlikely, but then that grep really should be working...
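
The look-alike character idea can be checked without guessing. A rough sketch: the helper name `is_ascii` and the sample names are hypothetical, and the second name deliberately starts with a Greek capital Nu (U+039D) that looks like a Latin N:

```shell
# Hypothetical names: the second starts with U+039D (Greek capital Nu),
# written as its two UTF-8 bytes in octal.
plain='NEW.TXT'
lookalike=$(printf '\316\235EW.TXT')

# In the C locale the range from space to tilde covers exactly the
# printable ASCII characters, so any other byte makes the match fail.
is_ascii() {
    printf '%s\n' "$1" | LC_ALL=C grep -q '^[ -~]*$'
}

is_ascii "$plain" && echo "plain: ASCII only"
is_ascii "$lookalike" || echo "lookalike: contains non-ASCII bytes"

# od shows the actual bytes when a name looks suspicious.
printf '%s' "$lookalike" | od -c | head -n 2
```

Running something like `ls | od -c` in the directory itself gives the same kind of byte dump for the real file names.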

muha 01-25-2006 08:47 AM

It's still weird: the minus (the range) in ^[A-Z] doesn't work properly.
Code:

$ ls -all
total 20
drwxr-xr-x  2 user users 168 2006-01-25 14:52 .
drwxr-xr-x  4 user users 248 2006-01-24 18:29 ..
-rw-r--r--  1 user users  8 2006-01-25 14:52 AAZ.TXT
-rw-r--r--  1 user users  26 2006-01-24 18:51 dref.txt
-rw-r--r--  1 user users  26 2006-01-25 14:52 dret.txt
-rw-r--r--  1 user users  8 2006-01-24 18:29 NEW.TXT
-rw-r--r--  1 user users  26 2006-01-25 14:52 xret.txt
$ ls | grep ^N
NEW.TXT
$ ls | grep [A-Z]
AAZ.TXT
dref.txt
dret.txt
NEW.TXT
xret.txt
$ ls | grep ^[A-Z]
AAZ.TXT
dref.txt
dret.txt
NEW.TXT
xret.txt
$ ls | grep ^[N]
NEW.TXT
$ ls | grep ^[NZ]
NEW.TXT
$ ls | grep ^[N-Z]
NEW.TXT
xret.txt
$ echo $SHELL
/bin/bash
$ which bash
/bin/bash

whereas with sed it does work:
Code:

$ ls | sed -n 's/^[A-Z]/&/p'
AAZ.TXT
NEW.TXT
$ ls | sed -n 's/^[a-z]/&/p'
dref.txt
dret.txt
xret.txt


Dtsazza 01-25-2006 01:50 PM

That's really weird. My first thought was that maybe it wasn't recognising the dash as a special character - but it is more than just a literal, because of the difference between 'grep ^[NZ]' and 'grep ^[N-Z]'. It seems that it's interpreting it as a case-insensitive range.

My thoughts are that perhaps your shell's locale is defining some kind of default search order that either makes all ranges case-insensitive, or somehow inserts a-z between A and Z (less likely). But then, it's weird that sed doesn't use the same information to process its own regexes. Besides, I'm no good at locales, so moving swiftly on...

I think a
Code:

grep -V
could be interesting at this point.
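
The guess that the locale changes what A-Z covers can also be probed directly, by testing a few sample letters against the range one at a time. A sketch only: the function name `letters_in_range` is made up, and what the second call prints will vary with the system and the grep version:

```shell
# Print which of a few sample letters the range [A-Z] matches under the
# locale given as $1 (C if omitted).
letters_in_range() {
    for ch in A a N n Z z; do
        if printf '%s\n' "$ch" | LC_ALL="${1:-C}" grep -q '^[A-Z]$'; then
            printf '%s ' "$ch"
        fi
    done
    echo
}

# Byte-order locale: only the uppercase samples should be listed.
letters_in_range C

# The session's own locale (result varies, so no expected output here).
letters_in_range "${LC_ALL:-${LC_COLLATE:-${LANG:-C}}}"
```

If the second line lists lowercase letters too, the range is being interpreted through the locale's collation order rather than by byte value.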

muha 01-25-2006 02:29 PM

thanks for thinking along. I'm still just trying to make sense of linux so ..
I have no real problem atm but am just trying to learn.

Code:

$ grep -V
grep (GNU grep) 2.5.1

Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

According to 'man setlocale', LC_COLLATE="en_US.UTF-8" and LC_CTYPE="en_US.UTF-8"
are the ones defining regexp behaviour. But they look pretty standard to me. Maybe LC_ALL= should also be defined :?

/edit: ahhh! It is indeed LC_ALL which should be set to LC_ALL=C
as explained here: http://www.linuxquestions.org/questi...hreadid=327916

/edit: to change this setting to LC_ALL=C do
Code:

$ export LC_ALL='C'

$ ls -all
total 20
drwxr-xr-x  3 user users 200 Jan 25 21:21 .
drwxr-xr-x  4 user users 248 Jan 24 18:29 ..
-rw-r--r--  1 user users  8 Jan 25 20:59 AAZ.TXT
-rw-r--r--  1 user users  8 Jan 25 20:59 NEW.TXT
-rw-r--r--  1 user users  26 Jan 25 20:59 dref.txt
-rw-r--r--  1 user users  26 Jan 25 20:59 dret.txt
drwxr-xr-x  2 user users 176 Jan 25 20:59 new
-rw-r--r--  1 user users  26 Jan 25 20:59 x_aet.txt
$ ls |grep ^[A-Z]
AAZ.TXT
NEW.TXT

Thanks to Dtsazza for the hints! Funny to see that the behaviour of ls -all has also changed according to the LC_ALL setting.
And also that SUSE 10.0 does not set LC_ALL to C by default for me.
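
Exporting LC_ALL changes the whole session (including date formats and sort order, as seen in the listing above). If only one command needs byte-order ranges, the variable can be set for that command alone. A small sketch with made-up names in place of real ls output:

```shell
# Hypothetical names standing in for the output of ls.
names=$(printf '%s\n' dref.txt NEW.TXT x_aet.txt AAZ.TXT new)

# Prefixing the assignment sets LC_ALL for this one grep only;
# the rest of the session keeps its locale.
upper_only=$(printf '%s\n' "$names" | LC_ALL=C grep '^[A-Z]')

# In the C locale sort compares byte values, so uppercase names come first.
c_sorted=$(printf '%s\n' "$names" | LC_ALL=C sort)

printf '%s\n' "$upper_only"
printf '%s\n' "$c_sorted"
```

The sorted result has the same uppercase-first order that ls showed after the export.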

Dtsazza 01-25-2006 04:36 PM

Wow... I didn't know that myself, so thanks for taking my vague hints and making them into something workable! I've got the exact same version of grep so it really must be something else. And my LC_ALL variable is also unset, so it wasn't that.

Still, if we've got something that sorts in 'ls' itself, that's much better than piping to grep, both from a resources point of view and so you can play around with ls switches and still get what you want.

muha 01-26-2006 03:35 AM

Besides LC_ALL it can be other 'locale' settings as well: from man grep:
Quote:

A locale LC_foo is specified by examining the three environment variables LC_ALL, LC_foo, LANG, in that order. The first of these variables that is set specifies the locale. For example, if LC_ALL is not set, but LC_MESSAGES is set to pt_BR, then Brazilian Portuguese is used for the LC_MESSAGES locale. The C locale is used if none of these environment variables are set, or if the locale catalog is not installed, or if grep was not compiled with national language support (NLS).
Pretty complex with all the or, or, ors :p
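
The lookup order in that quote can be written out as a short function. This is only a sketch: the name `effective_locale` is made up, and it treats an empty variable the same as an unset one, which is an assumption based on the empty LC_ALL= line in the locale output earlier:

```shell
# Resolve the locale a category would use: LC_ALL first, then the
# category's own variable, then LANG, otherwise C.
# Assumption: empty values count as unset.
effective_locale() {
    category=$1                      # e.g. LC_COLLATE
    eval "specific=\${$category:-}"  # value of the variable named in $1
    if [ -n "${LC_ALL:-}" ]; then
        echo "$LC_ALL"
    elif [ -n "$specific" ]; then
        echo "$specific"
    elif [ -n "${LANG:-}" ]; then
        echo "$LANG"
    else
        echo C
    fi
}

effective_locale LC_COLLATE
```

It does not cover the other cases from the quote (locale catalog not installed, grep built without NLS); those cannot be seen from the environment variables alone.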


All times are GMT -5.