Long long ago in a thread far away, I remember one of the LQ gurus recommending that 'ls' should not be used for programmatic purposes. There were several justifications given.
I've used this as a rule of thumb ever since... but I don't remember why...
I can see it in a case like this, where launching 'ls' would spawn a separate process...
Code:
for i in *.txt
do
echo "$i"
done
I think that the argument was more along the lines that the output of 'ls' is unstable in some way.
You should never parse `ls` output if you are not the only one who is going to use the script. People make aliases, for example with the `-F` flag, which appends extra characters. Your script will simply break.
But aliases are disabled by default in bash scripts. From the GNU bash reference section on aliases: "Aliases are not expanded when the shell is not interactive, unless the expand_aliases shell option is set using shopt". That doesn't mean it's OK to parse ls output; it just means aliases are not a reason for avoiding it in scripts.
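A quick way to check the quoted behaviour is to run the same alias in two throwaway scripts, one with default settings and one that sets expand_aliases. This is only a sketch: lq_demo is a made-up alias name, and it assumes bash is not running in POSIX mode (where aliases are expanded anyway).

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)

# Script 1: defines an alias and tries to use it with default settings.
cat > "$tmp/plain.sh" <<'EOF'
alias lq_demo='echo alias-was-expanded'
lq_demo
EOF

# Script 2: the same, but opts in to alias expansion first.
cat > "$tmp/opted_in.sh" <<'EOF'
shopt -s expand_aliases
alias lq_demo='echo alias-was-expanded'
lq_demo
EOF

# Both scripts run non-interactively; "command not found" goes to stderr.
plain=$(bash "$tmp/plain.sh" 2>/dev/null)
opted=$(bash "$tmp/opted_in.sh" 2>/dev/null)

printf 'default:        [%s]\n' "$plain"
printf 'expand_aliases: [%s]\n' "$opted"
rm -rf "$tmp"
```

Only the second script should print the alias text, which matches the manual's wording.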
My take on the matter is that it relies on the consistent behavior, over time, of another application. As soon as that behavior changes, which happens often enough, your script breaks. Add to that that it is simply wasteful, when bash already has the facility to access the filesystem and its structure. Why add one or more extra steps that, at best, serve only to get in the way?
--- rod.
Well... there are times when glob expansion won't do the trick in terms of finding files... sometimes you have to resort to using 'find'. I agree that the overlap between shell globs and what you can do with 'ls' is pretty large, and in those cases, you should simply use the glob. In the case of 'ls -r -d' (recursively list directory entries), the case for using bash alone isn't nearly as clear cut... that's where you reach for 'find', because 'ls' isn't up to the job.
In terms of having the behaviour of the program change... that's why there are standards. Sticking with standards compliant behaviour should ensure that you get the same thing every time.
I don't think differences in versions are the main reason; if that was the case, people would recommend against sed, find, ps, tar, etc. The two main reasons I can think of are:
ls doesn't expand *; the shell does, then it passes the names to ls, which echoes those that exist, one per line (when stdout isn't a tty). If you use for file in *.txt, though, you'll iterate once with "*.txt" if there aren't any matching files. And if files have spaces in the names, those names will get split. So it's almost like you get some sort of "validation" by using ls *.txt | while read file.
ls treats directories and files differently by default, which can ruin things, but the -d option helps.
Kevin Barry
PS In case it wasn't clear, this is one vote for "it's fine to use ls in a script." ls -l is a different story, though.
Ok, I'm missing something... how does bash handle directory recursion natively?
My point is not that bash handles recursion (although you can write a recursive function??), but rather that the demonstrated output of the dot directory by the command you supplied is trivial for bash to provide.
I also agree in part with ta0kira that using ls on its own may not necessarily be dangerous, assuming that the following portion:
Quote:
And if files have spaces in the names, those names will get split.
is aimed at ls in the for loop scenario and not globbing. That being said, given the things that can go wrong with different switches being used, and not just -l, I personally try to steer clear of using it at all.
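The difference between looping over a glob and looping over ls output can be checked in a scratch directory. A minimal sketch, assuming default shell options (no nullglob or failglob) and the default IFS; the file names are made up for the demonstration.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch 'my notes.txt' 'todo.txt'

# 1) Glob loop: each file name arrives as one word, spaces included.
glob_count=0
for f in *.txt; do
    glob_count=$((glob_count + 1))
done

# 2) Unquoted command substitution of ls: the output is word-split,
#    so 'my notes.txt' turns into two separate words.
ls_count=0
for f in $(ls *.txt); do
    ls_count=$((ls_count + 1))
done

# 3) No match, default settings: the loop body runs once with the literal pattern.
unmatched=()
for f in *.nomatch; do
    unmatched+=("$f")
done

printf 'glob loop: %d  ls loop: %d  unmatched: %s\n' \
    "$glob_count" "$ls_count" "${unmatched[*]}"
cd / && rm -rf "$tmp"
```

With two files present, the glob loop should run twice and the ls loop three times, so the splitting problem belongs to the ls-in-a-for-loop form rather than to globbing.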
I don't think differences in versions are the main reason; if that was the case, people would recommend against sed, find, ps, tar, etc.
It's really quite a huge reason, though. sed, find, and tar have POSIX mandated behavior (though tar has some weirdities). ps is actually not recommended for largely the same reasons as ls: it's meant for *humans*, and completely inappropriate for scripts. pgrep is the script-safe way of going about most things you would want to use ps for.
this is one vote for "it's fine to use ls in a script."
Unless, of course, you have file names with newlines in them, in which case you only get partial file names if you parse the output.
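To see the newline problem concretely, here is a small sketch that creates one such file and counts entries both ways. It assumes GNU ls, which writes names unmodified when its output is not a terminal; other ls implementations may substitute the newline instead.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch $'first\nsecond' 'plain'    # two files; one has a newline in its name

# Line-based parsing of ls output: the embedded newline looks like a separator.
ls_lines=0
while IFS= read -r line; do
    ls_lines=$((ls_lines + 1))
done < <(ls -1)

# Globbing into an array keeps every name in one piece.
names=(*)

printf 'lines read from ls: %d  names from glob: %d\n' "$ls_lines" "${#names[@]}"
cd / && rm -rf "$tmp"
```

The ls loop sees one more "file" than actually exists, and two of its entries are only fragments of a real name.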
Weird file names are not nearly as rare as one might think. You can create one very easily by accident, for example by typing
Code:
touch Isn't it cool?
touch Yes, it's cool.
which gives you two files, one named
Code:
Isnt it cool?
touch Yes, its
and one named
Code:
cool.
A Bash loop,
Code:
shopt -s nullglob
for FILE in * ; do
...
done
will handle all names correctly. So does
Code:
shopt -s nullglob
FILESDIRS=(*)
DIRSONLY=(*/)
which collects all file and directory names in the current directory (except, by default, those that start with a dot) into array FILESDIRS, and only directory names into array DIRSONLY.
The shopt -s nullglob tells Bash that non-matching glob patterns expand to nothing; the default is to expand to the pattern itself. It is enough (and a good practice) to set it once at the start of the script. (To be honest, I always forget it, and end up testing if FILE exists within the loop..)
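The effect of nullglob is easiest to see in an empty directory. A minimal sketch; the array names are only for illustration.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
cd "$tmp" || exit 1          # empty directory, so nothing matches *.txt

shopt -u nullglob
default_matches=(*.txt)      # one element: the literal string '*.txt'

shopt -s nullglob
nullglob_matches=(*.txt)     # zero elements

printf 'default: %d element(s)  nullglob: %d element(s)\n' \
    "${#default_matches[@]}" "${#nullglob_matches[@]}"
cd / && rm -rf "$tmp"
```

A for loop over the second form simply does not run, which removes the need for an existence test inside the loop.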
I normally use
Code:
LANG=C LC_ALL=C
find DIR(s)... -type f -print0 | while IFS="" read -rd '' FILE ; do
...
done
myself, or in fact, the -printf '...%p\0' variant, which allows me to extract not only the file names (no matter what characters they might have), but also file timestamp, size, and/or access mode, at the same time, with just Bash string operations.
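As a rough illustration of the -printf variant, the sketch below asks find for the size and path of each file in one NUL-terminated record and splits the record with plain Bash string operations. -printf is a GNU find extension, and the '%s %p\0' format and the variable names are my own choice for the example.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
printf 'hello' > "$tmp/five bytes.txt"   # 5 bytes, name contains a space
: > "$tmp/empty.txt"                     # 0 bytes

report=$(
    LC_ALL=C find "$tmp" -type f -printf '%s %p\0' |
    while IFS= read -rd '' record; do
        size=${record%% *}     # everything before the first space
        path=${record#* }      # everything after the first space
        printf '%s bytes: %s\n' "$size" "${path##*/}"
    done | LC_ALL=C sort       # find's order is unspecified, so sort for stable output
)

printf '%s\n' "$report"
rm -rf "$tmp"
```

Because the size never contains a space, splitting at the first space is safe even when the path itself has spaces.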
It is the safest and most robust way I know of. If you need to compute something and yield it to the script outside the loop, you should use
Code:
LANG=C LC_ALL=C
while IFS="" read -rd '' FILE ; do
...
done < <(find DIR(s)... -type f -print0)
because in the former example, the while loop is a subshell (right side of a pipe), and thus any changes it makes are never propagated to the parent, the actual shell running the script. This latter form creates or uses a (temporary) pipe to supply the input to the while loop. As the while loop is run in the original shell, not a subshell, the changes it makes to Bash variables are visible outside the loop too.
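The two forms can be compared side by side with a counter. A small sketch, assuming the lastpipe shell option is left at its default (off); the file names are arbitrary.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
touch "$tmp/a" "$tmp/b c" "$tmp/d"

# Pipe form: the while loop runs in a subshell, so the increments are lost.
piped=0
find "$tmp" -type f -print0 | while IFS= read -rd '' f; do
    piped=$((piped + 1))
done

# Process substitution form: the loop runs in the current shell.
procsub=0
while IFS= read -rd '' f; do
    procsub=$((procsub + 1))
done < <(find "$tmp" -type f -print0)

printf 'after pipe: %d  after process substitution: %d\n' "$piped" "$procsub"
rm -rf "$tmp"
```

The first counter stays at zero after the loop, while the second reflects the three files.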
While you can technically do a recursive directory walk in Bash, it is nigh impossible to do safely, because parent directory names may change mid-walk. find won't get confused by that; at worst you may see file names that are no longer there.
Note: I haven't used the Bash 4 **/ notation David the H. mentions below. It should work just as well as find does I think, as long as you set the proper Bash shell options.
Last edited by Nominal Animal; 04-16-2012 at 01:48 AM.
Reason: Need 'while IFS="" read' ... to avoid a Bash bug.
Bash from version 4 up has a new globstar feature for recursive globbing.
Code:
shopt -s globstar #it's not enabled by default.
printf '%s\n' ** #lists all files and directories recursively.
printf '%s\n' **/ #lists directories only.
printf '%s\n' **/*.txt #lists all .txt files recursively.
So for the most part, just prefix your glob with "**/" to make it recursive. You may need to play with dotglob and the GLOBIGNORE variable if you intend to work with hidden files.
Of course you'll still need to use find for more advanced matches, such as by mtime.
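For scripts, the same globs can be collected into arrays rather than printed. A minimal sketch, assuming Bash 4 or later; the directory layout is made up, and nullglob is added so that a pattern with no matches yields an empty array.

```shell
#!/usr/bin/env bash
shopt -s globstar nullglob
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir -p docs/old
touch top.txt docs/a.txt docs/old/b.txt docs/readme.md

txt=(**/*.txt)    # .txt files at any depth, including the top level
dirs=(**/)        # directories only, each with a trailing slash

printf '%d .txt file(s):\n' "${#txt[@]}"
printf '  %s\n' "${txt[@]}"
printf 'directories:\n'
printf '  %s\n' "${dirs[@]}"
cd / && rm -rf "$tmp"
```

Each array element is one complete name, so the usual "${txt[@]}" quoting keeps names with spaces or newlines intact.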