LinuxQuestions.org

jkv 02-15-2010 05:20 PM

[Bash] How to expand path variable that contains spaces and wildcards
 
Hi all

First post here. I'm quite new to bash and am having a bit of trouble with path expansion of strings that contain both whitespace and wildcards.

First, my script sources a configuration file that contains array assignments:

Code:

...
BACKUP_TARGET_FILES[2]=/boot/config-*                        #  no problems
BACKUP_TARGET_FILES[3]="/root/random dir with space/file*"  #  this is the problem
...

Then, later in the script, I want to expand the BACKUP_TARGET_FILES elements as below:

Code:

IFS_DEFAULT="$IFS"
shopt -s nullglob
shopt -s dotglob
IFS=

for pattern in "${BACKUP_TARGET_FILES[@]}" ; do
    for file in $pattern ; do
        [ -f "$file" ] && BACKUP_TARGETS+=( "$file" )
    done
done

IFS="$IFS_DEFAULT"
shopt -u nullglob
shopt -u dotglob

This code seems to work, but I'm not quite satisfied with it. I'd like to get rid of the IFS changes, but I haven't found a solution yet.

The problem with the default IFS is that neither $pattern nor "$pattern" works: unquoted, the pattern is split into multiple words at the spaces and so expands to the wrong paths; quoted, the * is not expanded at all.
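
The three behaviours described above can be reproduced with a small sketch. The scratch directory and file names are made up for illustration; only the quoting and IFS handling mirror the script in question.

```bash
#!/bin/bash
# Hypothetical scratch directory, for illustration only.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir "dir with space"
touch "dir with space/file1" "dir with space/file2"
pattern="dir with space/file*"

# Unquoted, default IFS: the value is split at the spaces first,
# then each piece is tried as a glob separately.
unquoted=( $pattern )

# Quoted: no splitting, but no pathname expansion either.
quoted=( "$pattern" )

# Unquoted with an empty IFS: no splitting, pathname expansion still happens.
old_ifs=$IFS
IFS=
empty_ifs=( $pattern )
IFS=$old_ifs

echo "unquoted:  ${#unquoted[@]} words"
echo "quoted:    ${#quoted[@]} word (${quoted[0]})"
echo "empty IFS: ${#empty_ifs[@]} files"

cd / && rm -rf "$tmp"
```

Only the last variant yields the two real files, which is why the script works once IFS is emptied.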

Any suggestions as to how I could do this better? Thanks!

allanf 02-15-2010 06:16 PM

When you use the
Code:

    "${BACKUP_TARGET_FILES[@]}"
You expanded all the array values into a new single string by enclosing the reference within the "quotes".

Can you use
Code:

BACKUP_TARGET_FILES=(
                      ...
                      /boot/config-*
                      /root/random\ dir\ with\ space/file*
                      ...
                    )

rather than assigning each index separately?

Code:

    for value in ${BACKUP_TARGET_FILES[@]}; do
      echo "value is:" ${value}
    done

or

Code:

 
    for index in ${!BACKUP_TARGET_FILES[@]}; do
        echo "value is:" "${BACKUP_TARGET_FILES[${index}]}"
    done


jkv 02-15-2010 07:29 PM

Quote:

When you use the
Code:
"${BACKUP_TARGET_FILES[@]}"

You expanded all the array values into a new single string by enclosing the reference within the "quotes".
Hm.. that contradicts a bit what I have learned earlier. Isn't ${BACKUP_TARGET_FILES[@]} handled a bit differently within double quotes?

From the bash man page, under "Special Parameters":
Quote:

@
Expands to the positional parameters, starting from one. When the expansion occurs within double quotes, each parameter expands to a separate word. That is, "$@" is equivalent to "$1" "$2" ... If the double-quoted expansion occurs within a word, the expansion of the first parameter is joined with the beginning part of the original word, and the expansion of the last parameter is joined with the last part of the original word. When there are no positional parameters, "$@" and $@ expand to nothing (i.e., they are removed).
and under "Arrays"
Quote:

Any element of an array may be referenced using ${name[subscript]}. The braces are required to avoid conflicts with pathname expansion. If subscript is @ or *, the word expands to all members of name. These subscripts differ only when the word appears within double quotes. If the word is double-quoted, ${name[*]} expands to a single word with the value of each array member separated by the first character of the IFS special variable, and ${name[@]} expands each element of name to a separate word. When there are no array members, ${name[@]} expands to nothing. If the double-quoted expansion occurs within a word, the expansion of the first parameter is joined with the beginning part of the original word, and the expansion of the last parameter is joined with the last part of the original word. This is analogous to the expansion of the special parameters * and @ (see Special Parameters above).
Am I reading it incorrectly, or are there some more special cases mentioned somewhere? It's a long document :(
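
The difference the man page describes can be checked by counting the words each form produces. This is a small sketch; count_words is a hypothetical helper name, and the two patterns are the ones from the configuration above.

```bash
#!/bin/bash
patterns=( "/boot/config-*" "/root/random dir with space/file*" )

# Hypothetical helper: prints how many arguments it received.
count_words() { echo "$#"; }

# "${patterns[@]}": one word per element, spaces inside elements preserved.
at_count=$(count_words "${patterns[@]}")

# "${patterns[*]}": all elements joined into a single word.
star_count=$(count_words "${patterns[*]}")

echo "[@] in double quotes: $at_count words"
echo "[*] in double quotes: $star_count word"
```

So the quoted [@] form in the outer for loop hands each pattern over intact, and the splitting problem only arises at the inner, unquoted $pattern.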

Quote:

Can you use
Code:

BACKUP_TARGET_FILES=(
...
/boot/config-*
/root/random\ dir\ with\ space/file*
...
)
The problem with this is that it does pathname expansion (*) too early. I source the config file at a point when not all the necessary mounts may have been done (/boot, for example); that happens a bit later in the script.
Code:

...
source /etc/gentoo-backup.conf
...
# -
# Do mounts (entries must exist in fstab)
# -
for dir in "${BACKUP_MOUNTS[@]}" ; do
    if ! grep -q "$dir" /etc/mtab ; then
        echo "Mounting $dir"
        mount "$dir" || exit 1
    fi
done

Also, I already tried using "\" earlier, with some code changes in the script, and thought it worked, though I could be wrong... I backtracked soon after that and tried something else.
But I'd prefer to let the user do simple double quoting in the config file and then do all the hard work in the script.
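
The timing issue can be illustrated with a sketch in which files appear after the configuration has been read. The scratch directory and the touch calls are stand-ins for the later mount, not part of the real script.

```bash
#!/bin/bash
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch config-old            # present when the "config file" is read

# Unquoted in the assignment: the glob is expanded immediately.
early=( config-* )

# Quoted in the assignment: the pattern itself is stored as text.
stored=( "config-*" )

touch config-new            # stand-in for files that appear after a mount

# Expanded only now. Leaving this unquoted is safe here only because
# this particular pattern contains no spaces.
late=( ${stored[0]} )

echo "expanded at assignment: ${early[*]}"
echo "expanded later:         ${late[*]}"

cd / && rm -rf "$tmp"
```

The early array is a snapshot taken at sourcing time, so it misses config-new; the quoted pattern picks it up because it is expanded after the files exist.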

allanf 02-15-2010 08:32 PM

Yes, you are correct about "${arrayvar[@]}", which behaves like "$@" does for the arguments. I did not read the '@' correctly.

Delayed expansion does make this harder. If the set of files is fixed at that point, you could try:

Code:

my_array=(
            ...
            $(cd /path/desired/ && ls wild-*)
            ...
        )

This runs ls in the other directory, so the resulting names do not include the path.
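
One caveat with the command substitution above is that its output is word-split, so names containing spaces end up as several array elements. The following sketch, using made-up file names in a scratch directory, compares it with a plain glob whose directory part is stripped afterwards.

```bash
#!/bin/bash
tmp=$(mktemp -d)
touch "$tmp/wild-one" "$tmp/wild-two words"

# Command substitution: the ls output is split at spaces and newlines.
from_ls=( $(cd "$tmp" && ls wild-*) )

# Glob: one element per file; ##*/ then removes the directory part.
full=( "$tmp"/wild-* )
names=( "${full[@]##*/}" )

echo "from ls:   ${#from_ls[@]} elements"
echo "from glob: ${#names[@]} elements"

rm -rf "$tmp"
```

The glob version still expands at assignment time, so it does not help with the delayed-mount requirement; it only avoids the splitting.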



The use of spaces in file names is really discouraged (even on Windows) by anyone who writes scripts (Perl, Python, batch, etc.). Windows does not allow a '"' (double quote) in a name, since names have to be quoted when working with them. Unix and Linux forbid only two characters in a name, but many of the allowed characters cause problems for people writing scripts. The forbidden characters are '/' and the character with a binary value of 0 (NUL). It is nevertheless unwise to use characters such as ! $ , ; ' " ( ) [ ] \ ...

Just because a character can be used is not a GOOD reason to use it.

catkin 02-16-2010 12:05 AM

Here's proof-of-concept code without IFS manipulation:
Code:

#!/bin/bash
#shopt -s extglob

# For testing
(
    dir='/tmp/dir with space in name'
    mkdir "$dir"
    touch "$dir/foo"
    touch "$dir/bar"
    touch "$dir/file with space in name"
)

# Configuration
BACKUP_TARGET_FILE_PATTERNS[0]="/tmp/d*/b*"
BACKUP_TARGET_FILE_PATTERNS[1]="/tmp/d*/f*"

# Initialisation
i=-1

# Gopher it!
for pattern in "${BACKUP_TARGET_FILE_PATTERNS[@]}" ; do
    echo "DEBUG: \$pattern is '$pattern'"
    for file in $pattern ; do
        echo "DEBUG: \$file is '$file'"
        if [[ -f "$file" ]]; then
            echo "DEBUG: file exists, adding to list of files to backup"
            let i=i+1
            BACKUP_TARGET_FILES[$i]="$file"
        fi
    done
done

Here's the output
Code:

DEBUG: $pattern is '/tmp/d*/b*'
DEBUG: $file is '/tmp/dir with space in name/bar'
DEBUG: file exists, adding to list of files to backup
DEBUG: $pattern is '/tmp/d*/f*'
DEBUG: $file is '/tmp/dir with space in name/file with space in name'
DEBUG: file exists, adding to list of files to backup
DEBUG: $file is '/tmp/dir with space in name/foo'
DEBUG: file exists, adding to list of files to backup


jkv 02-16-2010 07:32 AM

Quote:

Originally Posted by catkin (Post 3865146)
Here's proof-of-concept code without IFS manipulation:
Code:

#!/bin/bash
#shopt -s extglob

# Configuration
BACKUP_TARGET_FILE_PATTERNS[0]="/tmp/d*/b*"
BACKUP_TARGET_FILE_PATTERNS[1]="/tmp/d*/f*"
...
# Gopher it!
for pattern in "${BACKUP_TARGET_FILE_PATTERNS[@]}" ; do
    echo "DEBUG: \$pattern is '$pattern'"
    for file in $pattern ; do
        echo "DEBUG: \$file is '$file'"
        if [[ -f "$file" ]]; then
            echo "DEBUG: file exists, adding to list of files to backup"
            let i=i+1
            BACKUP_TARGET_FILES[$i]="$file"
        fi
    done
done


Any ideas how to get this code working when a pattern contains both spaces and wildcards? This is exactly the problem I'm having: space(s) + wildcard(s) together. For example, after adding
Code:

BACKUP_TARGET_FILE_PATTERNS[2]="/tmp/dir with space*/f*"  # space+wildcard pattern
the extra output is

Code:

DEBUG: $pattern is '/tmp/dir with space*/f*'
DEBUG: $file is '/tmp/dir'
DEBUG: $file is 'with'
DEBUG: $file is 'space*/f*'

If $pattern is quoted instead

Code:

...
    echo "DEBUG: \$pattern is '$pattern'"
    for file in "$pattern" ; do
        echo "DEBUG: \$file is '$file'"
...

it stops working because no pathname expansion is done.

Maybe IFS manipulation is quite a good solution after all and not worth fixing?

catkin 02-16-2010 08:26 AM

Quote:

Originally Posted by jkv (Post 3865469)
Any ideas how to get this code working in case there are both space and wildcard in patterns?

[snip]

Maybe IFS manipulation is quite a good solution after all and not worth fixing?

Sorry, I did not understand the problem :redface:

Having thought about the actual problem and researched some alternatives, I think your present solution is probably optimal. You could make it a little neater by not storing and restoring IFS; simply unsetting IFS afterwards is functionally equivalent to restoring it to the default value.
Code:

unset IFS
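
Another way to avoid the save/restore, offered here only as a sketch, is to do the expansion inside a function and declare IFS local to it, so the change reverts when the function returns. The function name expand_patterns and the scratch paths are hypothetical; BACKUP_TARGETS is the array from the original script.

```bash
#!/bin/bash
# expand_patterns is a hypothetical helper name, not part of the original script.
# It appends every regular file matching the given patterns to BACKUP_TARGETS.
expand_patterns() {
    local IFS=                  # empty IFS: no word splitting; reverts on return
    local saved pattern file
    saved=$(shopt -p nullglob dotglob) || true   # remember the caller's settings
    shopt -s nullglob dotglob
    for pattern in "$@"; do
        for file in $pattern; do    # intentionally unquoted so the glob expands
            [[ -f $file ]] && BACKUP_TARGETS+=( "$file" )
        done
    done
    eval "$saved"               # restore nullglob/dotglob to their previous state
}

# Demonstration in a scratch directory.
tmp=$(mktemp -d)
mkdir "$tmp/random dir with space"
touch "$tmp/random dir with space/file1" "$tmp/random dir with space/file2"

BACKUP_TARGETS=()
expand_patterns "$tmp/random dir with space/file*" "$tmp/no such dir/*"
printf '%s\n' "${BACKUP_TARGETS[@]}"

rm -rf "$tmp"
```

The shopt -p output is a list of shopt commands that recreate the previous settings, which is why it can be replayed with eval at the end.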

tuxdev 02-16-2010 10:26 AM

Code:

patterns=("/usr/*/foo*" "/var/r*/bar*" "/mnt/qux*")
for pattern in "${patterns[@]}" ; do
  while IFS="" read -r -d "" file ; do
      echo "$file"
  done < <(find / -path "$pattern" -print0)
done

Quote:

Just because a character can be used is not a GOOD reason to use them.
This is no justification to allow scripts to break. Really, ever since I've started over-quoting, I haven't had a spaces-in-filename or shell metacharacters problem. I still do IFS=$'\n' on the top of my scripts, but that's more insurance than anything else.

catkin 02-16-2010 10:46 AM

Quote:

Originally Posted by tuxdev (Post 3865666)
Code:

patterns=("/usr/*/foo*" "/var/r*/bar*" "/mnt/qux*")
for pattern in "${patterns[@]}" ; do
  while IFS="" read -r -d "" file ; do
      echo "$file"
  done < <(find / -path "$pattern" -print0)
done


Hello tuxdev :)

Nice, robust technique.

I tried the find / -path approach but discarded it for a couple of reasons. The most important was that find does not regard the "/" character as special in -path patterns so "/etc/*.cfg" would match "/etc/foo/my.cfg" which is counter-intuitive. The second was that it searches the whole file system hierarchy which may be resource intensive.

The technique could be modified if the search patterns did not have wildcarded directories, in which case the search pattern could be parsed into directory and filename_pattern:
Code:

find "$directory" -maxdepth 1 -type f -name "$filename_pattern" -print0
This would avoid both the problems described above.
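
Both points can be seen in a short sketch. It assumes the directory part of the pattern has no wildcards, splits the pattern at its last "/" with parameter expansion, and uses a made-up scratch tree in place of /etc.

```bash
#!/bin/bash
tmp=$(mktemp -d)
mkdir -p "$tmp/etc/foo"
touch "$tmp/etc/top.cfg" "$tmp/etc/foo/my.cfg"

pattern="$tmp/etc/*.cfg"

# -path: "*" also matches "/", so the nested file is found too.
by_path=$(( $(find "$tmp" -path "$pattern" | wc -l) ))

# Split at the last "/" (valid only when the directory part has no wildcards).
directory=${pattern%/*}
filename_pattern=${pattern##*/}
by_name=$(( $(find "$directory" -maxdepth 1 -type f -name "$filename_pattern" | wc -l) ))

echo "find -path matches: $by_path"
echo "find -name matches: $by_name"

rm -rf "$tmp"
```

The -name variant matches only the file directly inside the directory and never walks the rest of the tree.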

BTW, there's a typo in the done < <(find / -path "$pattern" -print0), which should be done < $(find / -path "$pattern" -print0)

tuxdev 02-16-2010 04:00 PM

Quote:

I tried the find / -path approach but discarded it for a couple of reasons. The most important was that find does not regard the "/" character as special in -path patterns so "/etc/*.cfg" would match "/etc/foo/my.cfg" which is counter-intuitive. The second was that it searches the whole file system hierarchy which may be resource intensive.
Yeah, that's definitely an issue if that's not what you want to do. I wasn't exactly sure what 'find' behaviour the OP wanted, so the actual command is more a placeholder than anything else. Same with searching / rather than something more specific.

Quote:

BTW, there's a typo in the done < <(find / -path "$pattern" -print0), which should be done < $(find / -path "$pattern" -print0)
No, this is actually correct. <() is similar to $(), but it expands to the name of a file from which the command's output can be read, rather than to the output itself. < <() behaves almost identically to <<< "$()", except that the latter breaks when the output contains null characters. Try
Code:

echo $(echo foo)
echo <(echo foo)

On my bash, this produced
Code:

foo
/dev/fd/63
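
A practical reason to prefer the < <() form over a pipe, sketched below with made-up file names, is that the while loop then runs in the current shell, so anything it stores is still there after the loop. With default settings, a loop on the right-hand side of a pipe runs in a subshell.

```bash
#!/bin/bash
tmp=$(mktemp -d)
touch "$tmp/a b" "$tmp/c"

# Pipe: the loop body runs in a subshell, so the additions are lost.
piped=()
find "$tmp" -type f -print0 | while IFS= read -r -d '' file; do
    piped+=( "$file" )
done

# Process substitution: the loop runs in the current shell.
substituted=()
while IFS= read -r -d '' file; do
    substituted+=( "$file" )
done < <(find "$tmp" -type f -print0)

echo "pipe:                 ${#piped[@]} files"
echo "process substitution: ${#substituted[@]} files"

rm -rf "$tmp"
```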


catkin 02-16-2010 11:30 PM

Quote:

Originally Posted by tuxdev (Post 3866010)
No, this is actually correct. <() is similar to $(), but it expands to the name of a file from which the command's output can be read, rather than to the output itself. < <() behaves almost identically to <<< "$()", except that the latter breaks when the output contains null characters. Try
Code:

echo $(echo foo)
echo <(echo foo)

On my bash, this produced
Code:

foo
/dev/fd/63


Thank you for something new :) Found the details here.

allanf 02-17-2010 01:19 AM

Quote:

Originally Posted by tuxdev (Post 3865666)
Code:

patterns=("/usr/*/foo*" "/var/r*/bar*" "/mnt/qux*")
for pattern in "${patterns[@]}" ; do
  while IFS="" read -r -d "" file ; do
      echo "$file"
  done < <(find / -path "$pattern" -print0)
done


This is no justification to allow scripts to break. Really, ever since I've started over-quoting, I haven't had a spaces-in-filename or shell metacharacters problem. I still do IFS=$'\n' on the top of my scripts, but that's more insurance than anything else.

Here is a bunch of files that can be used to test script handling of valid but hard to handle characters. Just think of the problems.
Code:

http://freeyourip.oiihob.org/76743c05daab6e517da22463deec3092/strange_names.tbz


All times are GMT -5.