[SOLVED] Don't use 'ls' in shell scripts... why?

bartonski · 04-13-2012, 10:44 PM

Long long ago in a thread far away, I remember one of the LQ gurus recommending that 'ls' should not be used for programmatic purposes. There were several justifications given.

I've used this as a rule of thumb ever since... but I don't remember why...

I can see it in a case like this, where launching 'ls' would spawn a separate process...

Code:

for i in *.txt
do
    echo "$i"
done

I think that the argument was more along the lines that the output of 'ls' is unstable in some way.

Am I imagining this?

Asido · 04-13-2012, 11:36 PM

You should never parse `ls` output if you are not the only one who is going to use the script. People make aliases, for example with the `-F` flag, which appends extra characters. Your script will simply break.

grail · 04-14-2012, 12:24 AM

See if this helps:

http://mywiki.wooledge.org/ParsingLs

bartonski · 04-14-2012, 11:07 AM

Quote:

Originally Posted by grail

See if this helps:

http://mywiki.wooledge.org/ParsingLs

The link was exactly what I needed.

Funny thing is that I knew 90% of what was there... I just couldn't articulate it. Good to have it in one place.

catkin · 04-14-2012, 01:11 PM

Quote:

Originally Posted by Asido

You should never parse `ls` output if you are not the only one who is going to use the script. People make aliases, for example with the `-F` flag, which appends extra characters. Your script will simply break.

But aliases are disabled by default in bash scripts. From the GNU bash reference section on aliases: "Aliases are not expanded when the shell is not interactive, unless the expand_aliases shell option is set using shopt". That doesn't mean it's OK to parse ls output; it just means aliases are not a reason for avoiding it in scripts.
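
A quick way to check this on your own system is sketched below. It assumes `bash` is on the PATH and that nothing in the environment (such as BASH_ENV or POSIXLY_CORRECT) changes the defaults; the variable name `alias_state` is made up for illustration.

```bash
#!/usr/bin/env bash
# Start a fresh non-interactive bash and ask whether alias expansion is on.
# 'shopt -q expand_aliases' returns 0 when the option is set, non-zero otherwise.
alias_state=$(bash -c 'shopt -q expand_aliases && echo on || echo off')
echo "expand_aliases in a non-interactive bash: $alias_state"
```

If the option reports as off, an `ls` alias defined in someone's interactive startup files has no effect on the script.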

theNbomr · 04-14-2012, 02:13 PM

My take on the matter is that it relies on the consistent behavior, over time, of another application. As soon as that behavior changes, which happens often enough, your script breaks. Add to that, that it is simply wasteful, when bash already has the facility to access the filesystem and its structure. Why add one or more extra steps that, at best, serve only to get in the way?

--- rod.

bartonski · 04-14-2012, 03:27 PM

Quote:

Originally Posted by theNbomr

My take on the matter is that it relies on the consistent behavior, over time, of another application. As soon as that behavior changes, which happens often enough, your script breaks. Add to that, that it is simply wasteful, when bash already has the facility to access the filesystem and its structure. Why add one or more extra steps that, at best, serve only to get in the way?

--- rod.

Well... there are times when glob expansion won't do the trick in terms of finding files... sometimes you have to resort to using 'find'. I agree that the overlap between shell globs and what you can do with 'ls' is pretty large, and in those cases, you should simply use the glob. In the case of 'ls -r -d' (recursively list directory entries), the case for using bash alone isn't nearly as clear cut... that's where you reach for 'find', because 'ls' isn't up to the job.

In terms of having the behaviour of the program change... that's why there are standards. Sticking with standards compliant behaviour should ensure that you get the same thing every time.

grail · 04-15-2012, 06:28 AM

I think I understand where you are coming from but maybe next time you should pick a better example:

Code:

$ ls -r -d
.

I am pretty sure bash could handle this one

ta0kira · 04-15-2012, 10:39 AM

I don't think differences in versions are the main reason; if that was the case, people would recommend against sed, find, ps, tar, etc. The two main reasons I can think of are:

1. ls doesn't expand *; the shell does, then it passes the names to ls, which echoes those that exist, one per line (when stdout isn't a tty). If you use for file in *.txt, though, you'll iterate once with the literal "*.txt" if there aren't any matching files. And if files have spaces in the names, those names will get split. So it's almost like you get some sort of "validation" by using ls *.txt | while read file.
2. ls treats directories and files differently by default, which can ruin things, but the -d option helps.
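
The "iterate once with the pattern" behaviour can be seen with a small sketch like the one below. It assumes default shell options (no nullglob) and works in a throwaway directory from `mktemp -d`; the variable names `raw_count` and `guarded_count` are invented for the demonstration.

```bash
#!/usr/bin/env bash
# Work in an empty scratch directory so that no *.txt file can match.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

raw_count=0
guarded_count=0
for file in *.txt; do
    raw_count=$((raw_count + 1))        # runs once, with file set to the pattern itself
    [ -e "$file" ] || continue          # skip the unexpanded pattern
    guarded_count=$((guarded_count + 1))
done
echo "loop iterations: $raw_count, real files seen: $guarded_count"

cd "$OLDPWD" && rm -rf "$tmpdir"
```

The `[ -e "$file" ]` guard is one common way to ignore the unmatched pattern without involving ls at all.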

Kevin Barry

PS In case it wasn't clear, this is one vote for "it's fine to use ls in a script." ls -l is a different story, though.

bartonski · 04-15-2012, 11:17 AM

Quote:

Originally Posted by grail

I think I understand where you are coming from but maybe next time you should pick a better example:

Code:

$ ls -r -d
.

I am pretty sure bash could handle this one

Ok, I'm missing something... how does bash handle directory recursion natively?

bartonski · 04-15-2012, 11:28 AM

Quote:

Originally Posted by ta0kira

PS In case it wasn't clear, this is one vote for "it's fine to use ls in a script." ls -l is a different story, though.

Hm. That's an interesting take... I'll have to think about that.

grail · 04-15-2012, 11:52 AM

Quote:

Ok, I'm missing something... how does bash handle directory recursion natively?

My point is not that bash handles recursion (although you can write a recursive function??), but rather that the demonstrated output of the dot directory by the command you supplied is trivial for bash to provide.

I also agree in part with ta0kira that using ls on its own may not necessarily be dangerous, assuming that the following portion:

Quote:

And if files have spaces in the names, those names will get split.

Is aimed at ls in the for loop scenario and not globbing. That being said, given the things that can go wrong with the different switches being used, and not just -l, I personally try to steer clear of using it at all.
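
The difference between the two scenarios can be sketched as follows, assuming the default IFS and a scratch directory; the counter names are made up for illustration.

```bash
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch "two words.txt"

# Unquoted command substitution: the shell splits the ls output on whitespace.
ls_loop_count=0
for file in $(ls); do
    ls_loop_count=$((ls_loop_count + 1))
done

# Plain glob: each matching name arrives as a single word.
glob_loop_count=0
for file in *; do
    glob_loop_count=$((glob_loop_count + 1))
done

echo "for over \$(ls): $ls_loop_count iterations; for over the glob: $glob_loop_count"
cd "$OLDPWD" && rm -rf "$tmpdir"
```

With a single file whose name contains one space, the first loop sees two words while the glob loop sees one name.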

tuxdev · 04-15-2012, 04:18 PM

Quote:

I don't think differences in versions are the main reason; if that was the case, people would recommend against sed, find, ps, tar, etc.

It's really quite a huge reason, though. sed, find, and tar have POSIX mandated behavior (though tar has some weirdities). ps is actually not recommended for largely the same reasons as ls: it's meant for *humans*, and completely inappropriate for scripts. pgrep is the script-safe way of going about most things you would want to use ps for.

Nominal Animal · 04-15-2012, 05:44 PM

Quote:

Originally Posted by ta0kira

this is one vote for "it's fine to use ls in a script."

Unless, of course, you have file names with newlines in them, in which case you only get partial file names if you parse the output.
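
A rough way to see the effect is sketched below. It assumes Bash (for the `$'...'` quoting and arrays) and a scratch directory; what `ls` prints for the newline differs between implementations, so only the glob result should be relied on.

```bash
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# One single file whose name contains a newline.
touch $'first line\nsecond line'

# Counting lines of ls output; on systems where ls prints the newline
# as-is, the one file shows up as more than one line.
ls_lines=$(ls | wc -l)

# The glob hands the name to the shell as one word, newline included.
names=(*)
glob_count=${#names[@]}

echo "ls | wc -l reports: $ls_lines"
echo "the glob reports:   $glob_count"

cd "$OLDPWD" && rm -rf "$tmpdir"
```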

Weird file names are not nearly as rare as one might think. You can create one very easily by accident, for example by typing

Code:

touch Isn't it cool?
touch Yes, it's cool.

which gives you two files, one named

Code:

Isnt it cool?
touch Yes, its

and one named

Code:

cool.

A Bash loop,

Code:

shopt -s nullglob
for FILE in * ; do
    ...
done

will handle all names correctly. So does

Code:

shopt -s nullglob
FILESDIRS=(*)
DIRSONLY=(*/)

which collects all files and directory names in current directory (except, by default, those that start with a dot) into array FILESDIRS, and only directory names into array DIRSONLY.

The shopt -s nullglob tells Bash that non-matching glob patterns expand to nothing; the default is to expand to the pattern itself. It is enough (and a good practice) to set it once at the start of the script. (To be honest, I always forget it, and end up testing if FILE exists within the loop..)
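
Here is a small sketch of what the arrays end up holding, run in a scratch directory. The file names and the extra array NOMATCH are invented for the demonstration.

```bash
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
mkdir "a dir"
touch "plain file" ".hidden"

shopt -s nullglob
FILESDIRS=(*)        # "a dir" and "plain file"; .hidden is skipped by default
DIRSONLY=(*/)        # only "a dir/" (note the trailing slash)
NOMATCH=(*.nothing)  # no match, so with nullglob the array is empty

echo "entries: ${#FILESDIRS[@]}, dirs: ${#DIRSONLY[@]}, unmatched: ${#NOMATCH[@]}"

shopt -u nullglob
cd "$OLDPWD" && rm -rf "$tmpdir"
```

Checking `${#NOMATCH[@]}` against zero is then enough to detect "no files", with no need to test each name inside a loop.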

I normally use

Code:

LANG=C LC_ALL=C
find DIR(s)... -type f -print0 | while IFS="" read -rd '' FILE ; do
    ...
done

myself, or in fact, the -printf '...%p\0' variant, which allows me to extract not only the file names (no matter what characters they might have), but also file timestamp, size, and/or access mode, at the same time, with just Bash string operations.
It is the safest and most robust way I know of. If you need to compute something and yield it to the script outside the loop, you should use

Code:

LANG=C LC_ALL=C
while IFS="" read -rd '' FILE ; do
    ...
done < <(find DIR(s)... -type f -print0)

because in the former example, the while loop is a subshell (right side of a pipe), and thus any changes it makes are never propagated to the parent, the actual shell running the script. This latter form creates or uses a (temporary) pipe to supply the input to the while loop. As the while loop is run in the original shell, not a subshell, the changes it makes to Bash variables are visible outside the loop too.
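
The subshell difference can be sketched with a simple counter. This assumes Bash with its default options (in particular, lastpipe not set) and a find that supports -print0; the scratch directory and counter names are made up for illustration.

```bash
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
touch "$tmpdir/one" "$tmpdir/two words" "$tmpdir/three"

# Pipe form: the while loop runs in a subshell, so its count is lost.
piped_count=0
find "$tmpdir" -type f -print0 | while IFS= read -r -d '' FILE; do
    piped_count=$((piped_count + 1))
done

# Process substitution: the loop runs in the current shell, so the count survives.
kept_count=0
while IFS= read -r -d '' FILE; do
    kept_count=$((kept_count + 1))
done < <(find "$tmpdir" -type f -print0)

echo "after the pipe: $piped_count, after process substitution: $kept_count"
rm -rf "$tmpdir"
```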

While you can technically do a recursive directory walk in Bash, it is nigh impossible to do safely, because parent directory names may change mid-walk. find won't get confused by that; at worst, you may see file names that are no longer there.

Note: I haven't used the Bash 4 **/ notation David the H. mentions below. It should work just as well as find does I think, as long as you set the proper Bash shell options.

David the H. · 04-16-2012, 01:12 AM

Bash from version 4 up has a new globstar feature for recursive globbing.

Code:

shopt -s globstar	#it's not enabled by default.

printf '%s\n' **	#lists all files and directories recursively.

printf '%s\n' **/	#lists directories only.

printf '%s\n' **/*.txt	#lists all .txt files recursively.

So for the most part, just prefix your glob with "**/" to make it recursive. You may need to play with dotglob and the GLOBIGNORE variable if you intend to work with hidden files.
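
As a rough illustration, the sketch below collects a recursive match into an array. It assumes Bash 4 or newer and a scratch directory with no hidden files or symlinks; the file names and the array name `txt_files` are invented.

```bash
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
mkdir -p sub/deeper
touch top.txt sub/mid.txt sub/deeper/low.txt sub/skip.log

shopt -s globstar nullglob
txt_files=(**/*.txt)     # matches at the top level and in every subdirectory
shopt -u globstar nullglob

printf '%s\n' "${txt_files[@]}"
echo "number of .txt files found: ${#txt_files[@]}"

cd "$OLDPWD" && rm -rf "$tmpdir"
```

Storing the result in an array keeps each path as one element, whatever characters it contains.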

Of course you'll still need to use find for more advanced matches, such as by mtime.
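
A minimal sketch of such an mtime match, reusing the null-delimited read loop shown earlier in the thread, might look like this. The file names, the array `recent`, and the 24-hour window are arbitrary choices for the example.

```bash
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
touch "$tmpdir/fresh file.txt"
touch -t 200001010000 "$tmpdir/stale file.txt"   # mtime set to 1 January 2000

# -mtime -1 selects files modified less than 24 hours ago.
recent=()
while IFS= read -r -d '' FILE; do
    recent+=("$FILE")
done < <(find "$tmpdir" -type f -mtime -1 -print0)

recent_count=${#recent[@]}
printf 'modified in the last 24 hours: %s\n' "${recent[@]##*/}"
rm -rf "$tmpdir"
```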