LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - Server (https://www.linuxquestions.org/questions/linux-server-73/)
-   -   Bash - Testing directories and Files where one folder is unknown? (https://www.linuxquestions.org/questions/linux-server-73/bash-testing-directories-and-files-where-one-folder-is-unknown-4175459431/)

David the H. 05-07-2013 11:51 AM

OK, now we're getting somewhere. I'd still like to have a little more detail about the context of your script: the exact matching criteria you want to use, and what you intend to do with the matches. But at least we can work with this.

Since you do indeed expect to test multiple entries, yes, a loop is what you need. But do not use a for loop if you intend to use an external command like find.

Code:

while IFS='' read -d '' -r dname; do
    echo "$dname"
done < <( find . -type d -iname '*xyz*' -print0 )

This will print out all directories containing the string 'xyz' in or under the current directory. Of course the use of echo here is just for demo, since find could just do the printing itself. Replace it with whatever actions you want to take.

Using -print0 (null separators) and the corresponding settings in read makes it possible to safely handle all file names, including ones containing spaces or newlines.

Notice also the syntax of find. You need to give it one or more starting directories, followed by your matching options, and finally one or more actions to perform on them (-print is the default). The -name/-iname options use globbing patterns, so you have to specify something that will match the entire file name (and don't forget to quote it to protect it from shell expansion).

The input is fed into the loop with a bash/ksh-style process substitution, so it's not POSIX portable. There are more portable ways to handle it if needed.
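For instance, one portable (and still filename-safe) option is to skip the shell loop entirely and have find itself run a small shell over its matches with -exec ... +. A rough sketch, with made-up directory names for the demo:

Code:

```shell
# Portable, filename-safe alternative to process substitution:
# find hands its matches straight to a child shell via -exec ... +.
tmp=$(mktemp -d)                      # demo setup only
mkdir -p "$tmp/foo_xyz_bar" "$tmp/other"

found=$(find "$tmp" -type d -iname '*xyz*' -exec sh -c '
    for dname in "$@"; do
        printf "%s\n" "$dname"
    done
' sh {} +)

printf '%s\n' "$found"
rm -rf "$tmp"
```

The trade-off is that any variables you set inside that child shell are lost when it exits, so this works best when the per-file action is self-contained.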

On the other hand, a for loop is just fine if you use a simple globbing pattern. If you know what directory level to search and don't need to do recursive searching, then this is probably what you really want.

Code:

shopt -s dotglob nullglob

for name in * ; do

    if [[ -d $name ]]; then
        echo "$name is a directory"
    elif [[ -f $name ]]; then
        echo "$name is a regular file"
    else
        echo "$name is a special file of some kind"
    fi

done

dotglob turns on matching for hidden files, and nullglob keeps it from using the raw string if nothing is expanded.

Actually, you could use '*/' as the globbing pattern to expand directories only and skip the other file types.
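A quick sketch of that, with a throwaway directory just for the demo:

Code:

```shell
# The trailing slash makes the glob expand to directories only.
tmp=$(mktemp -d)                 # demo setup only
mkdir "$tmp/somedir"
touch "$tmp/somefile"
cd "$tmp"

shopt -s nullglob
for dir in */ ; do
    printf '%s\n' "${dir%/}"     # strip the trailing slash if unwanted
done
```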

And again, you'll still have to decide exactly what to do with whatever it detects. You could add them to arrays, for example, for later use.

Code:

if [[ -d $name ]]; then
    dirarray+=( "$name" )
...

This may be useful if, as you seem to be saying, you want to use it to limit the search paths of a subsequent find command.

Code:

find "${dirarray[@]}" <searchoptions> <actions>

But if that's the case, I'm not convinced that this whole exercise is all that worthwhile. If you use find properly, it's probably just as efficient on its own. Check out the -prune action in particular to eliminate directory trees that don't need to be searched.
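For example, here's a sketch that skips any '.git' subtree while looking for '*.conf' files elsewhere (the names are just for illustration):

Code:

```shell
# -prune stops find from descending into matching directories at all,
# which is cheaper than filtering the results afterwards.
tmp=$(mktemp -d)                 # demo setup only
mkdir -p "$tmp/.git" "$tmp/etc"
touch "$tmp/.git/skip.conf" "$tmp/etc/app.conf"

found=$(find "$tmp" -type d -name '.git' -prune -o -type f -name '*.conf' -print)
printf '%s\n' "$found"
rm -rf "$tmp"
```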

You can also use a globbing pattern directly in the startdir part of find, BTW, as long as it would expand into a list of directories.
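Something like this sketch, assuming a hypothetical layout where each project keeps its logs in a 'logs' subdirectory:

Code:

```shell
# The shell expands the glob into find's starting directories
# before find ever runs (nullglob avoids passing the raw pattern).
tmp=$(mktemp -d)                      # demo setup only
mkdir -p "$tmp/app/logs" "$tmp/db/logs"
touch "$tmp/app/logs/a.log" "$tmp/db/logs/b.log"
cd "$tmp"

shopt -s nullglob
startdirs=( */logs/ )
found=$(find "${startdirs[@]}" -type f -name '*.log')
printf '%s\n' "$found"
```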

Here are a couple of good links about using find:
http://mywiki.wooledge.org/UsingFind
http://www.grymoire.com/Unix/Find.html


Quote:

I have a much better bash project that might be more interesting and useful if you would like to help with that?
Just write it up in its own thread and I'm sure I'll come across it. The other regulars will certainly help out too, if they can.

hyperdaz 05-08-2013 07:12 AM

Hi David,

Many thanks for your time and the detailed reply; the information looks very useful to me and to the many others who might glance at this post.

I have 40 for loops, so at some point I will rewrite them to see how much performance difference I might gain from using while loops instead. It's more of a habit than anything else.

I certainly will post the other project if I don't find what I am seeking (doing a little research before posting :)

Cheers
Hdaz

David the H. 05-10-2013 08:02 AM

The use of while vs for loops isn't really a performance issue, but comes from the fact that they have different functions.

A for loop iterates over a fixed list of individual word tokens, whereas a while loop runs for as long as some condition is true. When the while loop is combined with read it can be used to parse arbitrary input text, from both files and other commands.
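A quick sketch of that parsing ability, using a here-document as stand-in input (a file or command pipeline works the same way):

Code:

```shell
# while+read consumes line-based text one record at a time;
# IFS splits each line into fields.
result=''
while IFS=':' read -r key value; do
    result="$result$key=$value "
done <<'EOF'
colour:blue
size:large
EOF
printf '%s\n' "$result"
```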

The trouble usually comes from trying to use a for loop on command and variable substitutions, as shown in the link I provided. As long as the expansion results in a simple word list it's not a big problem, unless the list is very large, but trying to use it on things like filenames is very risky, due to the shell word-splitting and pathname expansion operations that follow the substitution.

for loops are, however, recommended for use on file names generated by direct globbing expansion. The risk only comes when the list is generated indirectly by another command.

It's therefore a good idea to simply remember to use while+read loops when the input comes from a text file or command, and for loops on globbing, arrays, brace expansion and other lists of simple elements.
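To put the two side by side in one small sketch:

Code:

```shell
# for: a fixed word list that the shell itself generates
count=0
for n in {1..3}; do
    count=$((count + n))
done

# while+read: line-based text coming from a file or command
lines=0
printf 'a\nb\n' > "/tmp/demo.$$"     # throwaway input file for the demo
while read -r line; do
    lines=$((lines + 1))
done < "/tmp/demo.$$"
rm -f "/tmp/demo.$$"
```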


The things you should really focus on if you want to reduce script overhead are to:

1) Eliminate as many external command calls as possible, usually by using built-in string manipulations instead. Also, learn to run a command once and save its output for future use, instead of calling it over and over every time you need it (date often tends to be abused that way).

As a rule of thumb, bulk operations that have to scan large amounts of text are often better handled by an external tool like sed or awk, but once a text string is stored in a variable, it's usually better to use built-in shell operations on it.
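Both habits in one sketch (the file name is made up for the demo):

Code:

```shell
# 1) Call date once and reuse the result instead of forking it repeatedly.
# 2) Use parameter expansion instead of basename/dirname/sed on a
#    string that's already in a variable.
today=$(date +%F)                 # one external call, reused below
file="/var/log/app-$today.log"    # hypothetical path

printf '%s\n' "${file##*/}"       # the basename, no external command
printf '%s\n' "${file%/*}"        # the dirname
printf '%s\n' "${file%.log}.bak"  # swap the extension
```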

2) Design your code flow to eliminate as many redundant operations as possible. e.g. use a single case statement instead of a series of if..elif..else tests, and a single printf instead of a loop of echo calls. If your script has 40+ loops, I imagine you can probably combine the operations of at least some of them.
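For instance, one case statement replacing a chain of string tests (the file name is just an example input):

Code:

```shell
# One case statement instead of if/elif/elif... string comparisons.
name='photo.png'                  # example input
case ${name##*.} in
    txt|md)   kind=text ;;
    jpg|png)  kind=image ;;
    *)        kind=unknown ;;
esac
printf '%s is a(n) %s file\n' "$name" "$kind"
```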


You should also consider creating functions for often-called operations. This may or may not save on redundancy, but it can at the very least make the code cleaner.
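A minimal sketch of that, with a hypothetical helper name:

Code:

```shell
# A small function wraps an often-repeated operation in one place,
# so a later change only has to be made once.
warn() {
    printf 'WARNING: %s\n' "$*"
}

warn "disk is getting full"
warn "backup skipped"
```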

