LinuxQuestions.org

hyperdaz 04-24-2013 09:22 AM

Bash - Testing directories and Files where one folder is unknown?
 
I want to run tests like the following, where I don't know the name of one of the folders:

[[ -f /folder/*/folder3/file.txt ]] && ehco "success"
[[ -d /folder/*/folder4/folder5 ]] && echo "success"

The above fails so I tried to escape the *

[[ -f /folder/\*/folder3/file.txt ]] && ehco "success"

but this also fails.

The reason for doing this is to speed up a script, where the echo "success" parts are replaced with find commands.

What's the best way to do this?

Cheers
Hdaz

unSpawn 04-24-2013 05:23 PM

Definitely not optimized, but why don't you run 'find /folder -type d -name folder3' and use that output to "anchor" your test, or use 'find /folder | grep -q "folder3/file.txt" && doSomething'?

hyperdaz 04-25-2013 04:25 AM

Hi unSpawn,

Thanks for your reply, sure I can do that but it seems a little OTT and painful to do this just to test the condition.

The script I am trying to speed up could have a lot of these in it, so it becomes painful working out each and every folder structure just to test for a file or folder.

This is 90% there hmmm
[[ -d `/folder/*/folder3/` ]] && ehco "success"
-bash: /folder/folder2/folder3/: is a directory


which would be the correct result as * is not a directory...
[[ -d `/folder/\*/folder3/` ]] && ehco "success"
-bash: /folder/*/folder3/: No such file or directory

If this works why does the 90% version above not hmm
[[ -n `ls -lrth /folder/*/folder3/` ]] && echo "success"
success


Just to answer my own questions hmm
cat folder.sh
#!/usr/bin/env bash
[[ -d `/folder//.*/folder3/` ]] && ehco "success"

bash -x folder.sh
++ '/folder//.*/folder3/'
folder.sh: line 2: /folder//.*/folder3/: No such file or directory
+ [[ -d '' ]]

hdaz

unSpawn 04-25-2013 03:24 PM

Quote:

Originally Posted by hyperdaz (Post 4938517)
it seems a little OTT and painful to do this just to test the condition.

Sure, I never said it was anything other than crappy ;-p


Quote:

Originally Posted by hyperdaz (Post 4938517)
The script i am trying to speed up could have a lot of these within so

Sounds more like restructuring is in order...


Quote:

Originally Posted by hyperdaz (Post 4938517)
This is 90% there hmmm
Code:

[[ -d `/folder/*/folder3/` ]] && ehco "success"
-bash: /folder/folder2/folder3/: is a directory


Don't confuse backticks with single quotes (which you don't need either) and mind your typos:
Code:

[ -d /folder/*/folder3 ] && echo "OK"
or maybe use
Code:

readlink -f /folder/*/folder3 && echo "OK"
or
Code:

[ -d $(readlink -f /folder/*/folder3) ] && echo "OK"

hyperdaz 04-25-2013 04:01 PM

Thanks for the reply unSpawn...

I was just testing and thinking out loud with all the different attempts..

The last one does it nicely :) I don't think I have ever used readlink before, interesting.. :)

wonders why does this fail in double [[ ?

[[ -d $(readlink -f /folder/*/folder3) ]] && echo "OK"
[ -d $(readlink -f /folder/*/folder3) ] && echo "OK"
OK

Have a good evening
Cheers
hdaz

unSpawn 04-26-2013 01:32 AM

Quote:

Originally Posted by hyperdaz (Post 4938885)
The last one does it nicely :) I don't think I have ever used readlink before, interesting.. :)

There are common things you should not need to do, for example:
cat /file|grep == grep /file
cat /file|sed > /somefile == sed -i /file
ps|grep something == pgrep something
etc, etc.

Script more and you'll find out more.


Quote:

Originally Posted by hyperdaz (Post 4938885)
wonders why does this fail in double [[ ?

Let's hope one of our resident Bash evangelists like David The H. will be here RSN to answer that. In the meanwhile maybe have a look at some Bash scripting guides:
http://www.tldp.org/HOWTO/Bash-Prog-Intro-HOWTO.html
http://www.tldp.org/LDP/Bash-Beginne...tml/index.html
http://www.gnu.org/software/bash/man...ode/index.html
http://www.grymoire.com/Unix/Sh.html
http://www.tldp.org/LDP/abs/html/
http://mywiki.wooledge.org/BashFAQ
http://mywiki.wooledge.org/BashPitfalls

hyperdaz 04-26-2013 03:37 AM

Hey unSpawn,

Unfortunately, after a little testing, [ -d $(readlink -f /folder/*/folder3) ] && echo "OK" does not work, and here is why...

ls -lrth /folder/folder2/folder3
total 0
-rw-r--r-- 1 root root 0 Apr 26 09:04 file

[ -f $(readlink -f /folder/*/folder3/file) ] && echo "OK"
OK
[ -f $(readlink -f /folder/*/folder3/fie) ] && echo "OK"
OK
[ -d $(readlink -f /folder/*/foler3) ] && echo "OK"
OK

bash -x foldertest.sh
++ readlink -f /folder/folder2/folder3
+ '[' -d /folder/folder2/folder3 ']'
+ echo OK
OK

[root@hostingtest ~]# bash -x foldertest.sh
++ readlink -f '/folder/*/foder3'
+ '[' -d ']'
+ echo OK
OK

[ -d $(readlink -ve /folder/*/foler3) ] && echo "OK"
readlink: /folder/*/foler3: No such file or directory
OK

[ -d $(readlink -vf /folder/*/foler3) ] && echo "OK"
readlink: /folder/*/foler3: No such file or directory
OK

Close but not quite...
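
The 'bash -x' trace above shows the test collapsing to '[ -d ]' once readlink prints nothing. With a single argument, '[' only checks that the argument is a non-empty string, so it succeeds. A minimal sketch of that behaviour and of the quoting fix, assuming bash and GNU readlink; the scratch directory and the variable names are made up for the demo:

```bash
#!/usr/bin/env bash
# Sketch: why an empty, unquoted substitution makes '[ -d ... ]' succeed.
tmp=$(mktemp -d)                      # scratch area; nothing under it matches
path=$(readlink -e "$tmp"/*/foler3 2>/dev/null)   # prints nothing: no such path

# Unquoted: $path expands to no words at all, so this runs '[ -d ]',
# a one-argument test that is true because "-d" is a non-empty string.
if [ -d $path ]; then unquoted=true; else unquoted=false; fi

# Quoted: the empty string stays as an argument, and -d "" is false.
if [ -d "$path" ]; then quoted=true; else quoted=false; fi

echo "unquoted=$unquoted quoted=$quoted"
rm -rf "$tmp"
```

This prints "unquoted=true quoted=false". Quoting the substitution is enough to stop the false positive, though it still assumes the glob matches at most one path.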

unSpawn 04-27-2013 03:45 AM

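Yeah, my bad. As your 'bash -x' trace shows, when 'readlink' prints nothing the unquoted substitution disappears and the test becomes '[ -d ]', which is true because a single non-empty argument always tests true.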
Code:

# The problem with this ITEM is that you DO NOT want to have the script expand it unless necessary.
# Using single quotes keeps BASH from expanding it:
ITEM='directory/*/directory3'

# ... at the expense of violating the rule to avoid 'eval'. The problem is the expanded ITEM can
# potentially hold MULTIPLE values (as in 'mkdir -p directory/{1,2,3}/directory{1,2,3}
#  ; mkdir -p directory/dir\ {1,2,3}/directory{1,2,3}')
# further hampered by IFS splitting.
ITEM=($(eval echo "${ITEM}"))
echo "${#ITEM[@]}"
9

# So the correct way IMHO still would be to anchor it
find directory -type d -name directory3 | while IFS= read -r ITEM; do
 doSomething "${ITEM}"
done

# Then again one of our true BASH evangelists may know a better solution...

I think the essential problem with your script is that it needs structural change and I suggest you post it in full.

hyperdaz 04-27-2013 06:01 AM

Hey unSpawn,

I just re-read my own posts and realised I answered my own question doh :)

Just use [[ -n $(ls -lrth /folder/*/folder3/) ]] && echo "success" which works correctly when the folder does not exist.

Maybe this should be an improvement within the handling of bash for [[ -d or -f ]] directives...
unless we are just doing something wrong..

I agree the script could probably do with restructuring, but there are reasons why I wrote it the way I did: I wanted it as concise and compact as possible, hence looking at commands that would, one, speed things up and, two, keep things compact.

Unfortunately the script contains a lot of sensitive data, so it's not really possible to upload it here.

hdaz

David the H. 04-29-2013 11:00 AM

Please use ***[code][/code]*** tags around your code and data, to preserve the original formatting and to improve readability. Do not use quote tags, bolding, colors, "start/end" lines, or other creative techniques.

I think we need to start by getting a little more info.

First, are you sure you're using bash? What does the shebang on the top line look like?

If it says #!/bin/sh, then you're using the system's POSIX-based shell, which may be something like dash instead. That could be why you aren't able to use things like [[..]]. Always use #!/bin/bash if you want support for bash-specific builtins.

See here for the main differences between bash and posix shells:
http://mywiki.wooledge.org/Bashism

Actually, I see you using things like "bash -x foldertest.sh". That does specify the script as bash, but it's such a sloppy way to do it. As long as the scripts have proper shebangs, just run them directly with "/path/to/foldertest.sh" and similar.


Second, what do the actual filenames/paths look like? Could you show us an example directory tree and explain more clearly what needs to be tested about it?


The first problem, as unSpawn mentioned above, is that a globbing pattern of "*/" will expand into a list of all subdirectories inside the one specified. So using "/folder/*/folder3/file" could end up giving you:

/folder/folder2a/folder3/file
/folder/folder2b/folder3/file
/folder/folder2c/folder3/file

You can't run a single test on output like that. Perhaps you could capture the results of the glob expansion into an array, then loop over that to test for possible values. But again, it's going to depend on the details of what you have and what you want.
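
A minimal sketch of the array approach mentioned above, assuming bash. The temporary tree stands in for /folder, and the directory names simply mirror the example listing; none of this comes from the original script:

```bash
#!/usr/bin/env bash
# Sketch: expand the glob once into an array, then test each match.
shopt -s nullglob                     # an unmatched glob expands to nothing

base=$(mktemp -d)                     # stand-in for /folder (assumption)
mkdir -p "$base"/folder2a/folder3 "$base"/folder2b/folder3 "$base"/folder2c
touch "$base"/folder2a/folder3/file

matches=( "$base"/*/folder3/file )    # one array element per matching path
found=0
for f in "${matches[@]}"; do
    if [[ -f $f ]]; then
        found=$((found + 1))
        echo "regular file: ${f#"$base"/}"
    fi
done
echo "matches=${#matches[@]} found=$found"
rm -rf "$base"
```

Only folder2a contains the file here, so the array holds one element. With nullglob set, an empty array means "nothing matched", which is easy to check with ${#matches[@]}.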


From what's been posted so far, I agree that the whole code flow probably needs re-writing, particularly since you say that it "contains a lot of sensitive data". A script that has any kind of data hard-coded into it at all (outside of perhaps a few default variable values at the top) is by definition a poorly written script.

I suggest you go through the script to redact (replace with dummy values) the sensitive parts and then post what you can. We need to see more of the code in context before we can really get down to fixing it.


PS: I see a lot of typos in the above posts; 'foder' instead of 'folder', etc. You'll have to tell us whether this is important to the script or just mistakes in explaining it here.

hyperdaz 04-29-2013 12:06 PM

Dave, many thanks for the reply. Sorry you find my output a little sloppy.

Quote:

PS: I see a lot of typos in the above posts; 'foder' instead of 'folder', etc. You'll have to tell us whether this is important to the script or just mistakes in explaining it here.
These are not typos; they show that if the "*/" is used in
Code:

[ -d $(readlink -f /folder/*/folder3) ] && echo "OK"
the code always gives a positive "OK" even if the folders have not been created. Maybe I could have made that clearer; I thought showing the full "bash -x" output was quite clear.


My original thought and question was why the following fails even if the files and folders exist:

Code:

[[ -f /folder/*/folder3/file.txt ]] && ehco "success"
[[ -d /folder/*/folder4/folder5 ]] && echo "success"

To me it seems logical that the file directive
Code:

  [[ -f
and the directory directive
Code:

[[ -d
test conditions should be able to expand and test a folder structure similar to "folder/*/folder3/".

As can be seen in this attempt, bash knows the file structure but still fails, i.e. the follow-on command, in this case 'ehco "success"', never gets run.

Code:

[[ -d `/folder/*/folder3/` ]] && ehco "success"
-bash: /folder/folder2/folder3/: is a directory

I can use the following, which gives the correct result whether or not the file/folder exists; that was the original objective with -f and -d:
Code:

[[ -n $(ls /folder/*/folder3/) ]] && echo "success"
To me it's a simple question; adding a detailed script to the question would probably make it more complicated.

Cheers
Hdaz

chrism01 04-29-2013 09:21 PM

Code:

[[ -d `/folder/*/folder3/` ]] && ehco "success"
1. you've used backquotes there, so it's trying to interpret that as a cmd(!). Use single quotes instead
2. there's no such cmd as 'ehco', so that'll fail always.
You must ensure exact correct spelling at all times.

hyperdaz 04-30-2013 05:23 AM

Hi Chris,

None of these makes any difference; the second command never gets run.

Code:

ls -lrth /folder/folder2/folder3/file
-rw-r--r-- 1 root root 0 Apr 26 09:04 /folder/folder2/folder3/file

[[ -d '/folder/*/folder3/' ]] && echo "success"

[[ -d `/folder/*/folder3/` ]] && echo "success"
-bash: /folder/folder2/folder3/: is a directory

[[ -d "/folder/*/folder3/" ]] && echo "success"

If it ever got to ehco then it would output "-bash: ehco: command not found" but thanks I did not spot that on the original test above.

Cheers
Hdaz

David the H. 05-05-2013 07:21 AM

Code:

[[ -d '/folder/*/folder3/' ]] && echo "success"
It doesn't really matter what you use here because globbing patterns do not expand inside [[..]] brackets. You're always going to get a negative result because it will try to test the literal string, and there is no directory named "*" in that position. You can use command substitutions, but that just means you have to be sure that it will expand into a string that the test can recognize anyway.

In short you must have an actual, fixed, directory path before you can test for a directory.
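
A small sketch of the difference, assuming bash; the temporary tree stands in for /folder and the result variables are made up for the demo. It contrasts the same glob inside '[[ ]]', inside '[ ]' with one match, and inside '[ ]' with two matches:

```bash
#!/usr/bin/env bash
# Sketch: the same glob inside '[[ ]]' and inside '[ ]'.
base=$(mktemp -d)                     # stand-in for /folder (assumption)
mkdir -p "$base"/folder2/folder3

# '[[ ]]' performs no pathname expansion: it tests a directory literally named '*'.
if [[ -d $base/*/folder3 ]]; then double=yes; else double=no; fi

# '[ ]' is an ordinary command, so the shell expands the glob first.
# This only works while the pattern matches exactly one path.
if [ -d "$base"/*/folder3 ]; then single=yes; else single=no; fi

# A second match turns the test into '[ -d path1 path2 ]', which is an error.
mkdir -p "$base"/other/folder3
if [ -d "$base"/*/folder3 ] 2>/dev/null; then two=yes; else two=no; fi

echo "double=$double single=$single two=$two"
rm -rf "$base"
```

This prints "double=no single=yes two=no". The single-bracket form only looks like it works, because it depends on there being exactly one match.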


In any case, you didn't address any of the important points from my last post.

1) Since you're using double-bracket tests, I'm assuming the availability of bash or ksh, is that correct? Or does this have to be a posix-compatible script, limiting the availability to its less-advanced features?

2) Can you guarantee that the globbing pattern will always expand into a single file or directory? If not, and I maintain that you can't, you'll have to decide what to do if you come across more than one entry. But whatever the case, to do it safely the globbing patterns should be run through a loop that tests each possible value individually.

3) As I mentioned, good coding practice says that code and data should be kept as separate as possible. File and directory names, and any text strings that could change, should be set in variables at the top of the script, or even imported into it from elsewhere. Keep the code clean.

This includes the paths you are trying to test for, so could you please explain clearly what should and shouldn't exist?


So I'm going to ask again for some more background on what you are trying to accomplish. It's still not completely clear to me exactly what you need to test for, and what those tests need to accomplish. Could you please explain the context of your needs in some more detail? What files and directories need to exist, what should not exist, and what should happen if conditions aren't what you expect them to be (e.g. there are extra entries)?

And again, if you could post at least the relevant sections of the actual code, with any sensitive data removed or altered, I'd be happy to help you re-write it to be cleaner and more robust.

hyperdaz 05-06-2013 03:36 PM

Hi David,

1, bash or ksh - keeping things portable might be useful but at the moment 99% of all vms/servers are CentOS.
Code:

cat folder.sh
#!/usr/bin/env bash

2, more than one entry
That's perfect, as I want to catch each and every entry
Code:

&& echo "
is replaced with
Code:

for in $(find -iname /DIR/path/xyzfile);do xyz;done
As I said, I wanted to reduce the time it took for find to do its work on TBs of data in a lot of directories.

3, that's a great point. I do like to try and keep things as separate as possible, but in this case I'd rather not reinvent things.

I have a much better bash project that might be more interesting and useful if you would like to help with that?

Cheers
Hdaz

David the H. 05-07-2013 11:51 AM

Ok, now we're getting somewhere. I'd still like to have a little more detail about the context of your script; the exact matching criteria you want to use, and what you intend to do with the matches, but at least we can work with this.

Since you do indeed expect to test multiple entries, yes, a loop is what you need. But do not use a for loop if you intend to use an external command like find.

Code:

while IFS='' read -d '' -r dname; do
    echo "$dname"
done < <( find . -type d -iname '*xyz*' -print0 )

This will print out all directories containing the string 'xyz' in or under the current directory. Of course the use of echo here is just for demo, since find could just do the printing itself. Replace it with whatever actions you want to take.

Using -print0 (null separators) and the corresponding settings in read, makes it possible to safely handle all file names.

Notice also the syntax of find. You need to give it one or more starting directories, followed by your matching options, and finally one or more actions to perform on them (-print is the default). The -name/-iname options use globbing patterns, so you have to specify something that will match the entire file name (and don't forget to quote it to protect it from shell expansion).

The input is fed into the loop with a bash/ksh style process substitution, so it's not POSIX portable. There are more portable ways to handle it if needed.
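
One such portable approach, sketched here as an assumption rather than as the method the poster had in mind, is to let find hand the names to a small inline sh script through -exec ... {} +. The scratch tree and the 'xyz' names are invented for the demo, and -name is used because -iname is not in POSIX:

```bash
#!/bin/sh
# Sketch: a portable alternative to 'while read ... < <(find -print0)'.
# find passes the names straight to an inline sh script as arguments,
# so find's output never has to be parsed.
top=$(mktemp -d); base=$top/tree      # scratch tree for the demo (assumption)
mkdir -p "$base/my xyz dir" "$base/plain" "$base/sub/xyz2"

count=$(
    find "$base" -type d -name '*xyz*' -exec sh -c '
        for dname do                  # iterates over "$@"
            printf "%s\n" "$dname"
        done
    ' sh {} + | wc -l
)
count=$((count))                      # strip any padding wc may add
echo "directories matched: $count"
rm -rf "$top"
```

Two directories match here ("my xyz dir" and "xyz2"). The body of the inner for loop is where the real per-directory action would go.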

On the other hand, a for loop is just fine if you use a simple globbing pattern. If you know what directory level to search and don't need to do recursive searching, then this is probably what you really want.

Code:

shopt -s dotglob nullglob

for name in * ; do

    if [[ -d $name ]]; then
        echo "$name is a directory"
    elif [[ -f $name ]]; then
        echo "$name is a regular file"
    else
        echo "$name is a special file of some kind"
    fi

done

dotglob turns on matching for hidden files, and nullglob keeps it from using the raw string if nothing is expanded.

Actually, you could use '*/' as the globbing pattern to expand directories only and skip the other file types.
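
A short sketch of the '*/' variant, assuming bash; the scratch directory and its contents are invented for the demo:

```bash
#!/usr/bin/env bash
# Sketch: '*/' matches only directories (and symlinks to directories).
shopt -s dotglob nullglob

top=$(mktemp -d)                      # scratch area for the demo (assumption)
mkdir "$top"/alpha "$top"/.hidden
touch "$top"/notes.txt

dirs=()
for d in "$top"/*/; do
    d=${d%/}                          # drop the trailing slash the glob adds
    dirs+=( "${d##*/}" )              # keep just the last path component
done
echo "${#dirs[@]} directories: ${dirs[*]}"
rm -rf "$top"
```

The regular file is skipped without any explicit -d test, and dotglob brings in the hidden directory. The order of the two names depends on the locale's collation.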

And again, you'll still have to decide exactly what to do with whatever it detects. You could add them to arrays, for example, for later use.

Code:

if [[ -d $name ]]; then
    dirarray+=( "$name" )
...

This may be useful if, as you seem to be saying, you want to use it to limit the search paths of a subsequent find command.

Code:

find "${dirarray[@]}" <searchoptions> <actions>
But if that's the case, I'm not convinced that this whole exercise is all that worthwhile. If you use find properly you'd probably find it just as efficient on its own. Check out the -prune action in particular to eliminate directory trees that don't need to be searched.

You can also use a globbing pattern directly in the startdir part of find, BTW, as long as it would expand into a list of directories.
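
A sketch that combines both ideas, assuming bash and a find that supports -print0 (GNU and BSD do). The tree, the 'cache' directory name and the file names are invented for the demo:

```bash
#!/usr/bin/env bash
# Sketch: let the shell narrow the start directories, and let find prune the rest.
shopt -s nullglob

top=$(mktemp -d); base=$top/folder    # stand-in for /folder (assumption)
mkdir -p "$base"/a/folder3 "$base"/b/folder3/cache "$base"/c/folder4
touch "$base"/a/folder3/file.txt "$base"/b/folder3/cache/file.txt "$base"/c/folder4/file.txt

starts=( "$base"/*/folder3 )          # only the folder3 trees become start points
hits=()
if (( ${#starts[@]} )); then          # with no start dir, GNU find would search '.'
    while IFS= read -r -d '' f; do
        hits+=( "${f#"$base"/}" )
    done < <( find "${starts[@]}" -name cache -prune -o -type f -name 'file.txt' -print0 )
fi
printf '%s\n' "${hits[@]}"
rm -rf "$top"
```

Only a/folder3/file.txt is reported: c/folder4 is never visited because it is not a start point, and b/folder3/cache is cut off by -prune.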

Here are a couple of good links about using find:
http://mywiki.wooledge.org/UsingFind
http://www.grymoire.com/Unix/Find.html


Quote:

I have a much better bash project that might be more interesting and useful if you would like to help with that?
Just write it up in its own thread and I'm sure I'll come across it. The other regulars will certainly help out too, if they can.

hyperdaz 05-08-2013 07:12 AM

Hi David,

Many thanks for your time and the detailed reply; the information looks very useful to me and to many others who might glance at this post.

I have 40 for loops, so at some point I will rewrite them to see how much performance difference I might gain from using while loops instead. It's more of a habit than anything else.

I certainly will post the other project if I don't find what I am seeking (doing a little research before posting :)

Cheers
Hdaz

David the H. 05-10-2013 08:02 AM

The use of while vs for loops isn't really a performance issue, but comes from the fact that they have different functions.

A for loop iterates over a fixed list of individual word tokens, whereas a while loop runs for as long as some condition is true. When the while loop is combined with read it can be used to parse arbitrary input text, from both files and other commands.

The trouble usually comes from trying to use a for loop on command and variable substitutions, as shown in the link I provided. As long as the expansion results in a simple word list it's not a big problem, unless the list is very large, but trying to use it on things like filenames is very risky, due to the shell word-splitting and pathname expansion operations that follow the substitution.

for loops are, however, recommended for use on file names generated by direct globbing expansion. The risk only comes when the list is generated indirectly by another command.

It's therefore a good idea to simply remember to use while+read loops when the input comes from a text file or command, and for loops on globbing, arrays, brace expansion and other lists of simple elements.
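
The risk is easy to reproduce. A minimal sketch, assuming bash and a find with -mindepth and -print0; the directory names are invented and deliberately contain spaces:

```bash
#!/usr/bin/env bash
# Sketch: the same find output consumed by a 'for' loop and by 'while read'.
top=$(mktemp -d); base=$top/tree      # scratch tree for the demo (assumption)
mkdir -p "$base/dir one" "$base/dir two"

# Risky: the substitution is word-split, so "dir one" becomes two tokens.
for_count=0
for d in $(find "$base" -mindepth 1 -type d); do
    for_count=$((for_count + 1))
done

# Safe: null-delimited names are read back one per iteration.
while_count=0
while IFS= read -r -d '' d; do
    while_count=$((while_count + 1))
done < <( find "$base" -mindepth 1 -type d -print0 )

echo "for saw $for_count items, while+read saw $while_count"
rm -rf "$top"
```

With the two directories created here, the for loop sees four tokens (assuming the temporary path itself contains no whitespace), while the while+read loop sees the two real names.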


The things you should really focus on if you want to reduce script overhead are to:

1) Eliminate as many external command calls as possible, usually by using built-in string manipulations, instead. Also, learn to run a command once and save its output for future use, instead of calling it over and over every time you need it (date often tends to be abused that way).

As a rule of thumb, bulk operations that have to scan large amounts of text are often better handled by an external tool like sed or awk, but once a text string is stored in a variable, it's usually better to use built-in shell operations on it.

2) Design your code flow to eliminate as many redundant operations as possible. e.g. use a single case statement instead of a series of if..elif..else tests, and printf instead of print loops. If your script has 40+ loops, I imagine you can probably combine the operations of at least some of them together.
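
A brief sketch of both points, assuming bash. The path is the example one from this thread, and the file-type categories are invented for illustration:

```bash
#!/usr/bin/env bash
# Sketch: built-ins in place of external commands, and 'case' in place of if/elif chains.
path=/folder/folder2/folder3/file.txt   # example path from this thread

dir=${path%/*}            # like dirname "$path", without starting a process
name=${path##*/}          # like basename "$path"
ext=${name##*.}           # text after the last dot

stamp=$(date +%Y-%m-%d)   # call date once and reuse "$stamp" everywhere

case $ext in              # one 'case' instead of a chain of string tests
    txt|log) kind=text ;;
    gz|bz2)  kind=compressed ;;
    *)       kind=other ;;
esac

# printf reuses its format for every argument, so no loop is needed.
printf '%s\n' "dir=$dir" "name=$name" "kind=$kind"
```

The parameter expansions avoid three process launches per path, which adds up when the same operation runs inside dozens of loops.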


You should also consider creating functions for often-called operations. This may or may not save on redundancy, but it can at the very least make the code cleaner.

