LinuxQuestions.org


MMaddoxx 11-04-2011 07:42 AM

queries about [:digit:] and cp --verbose effect: GNU shell
 
Hi,

I'm new to shell scripting.

I have written the shell below to enable a set of otherwise identically-named .png files to share a single directory.
The script extracts two sample identifiers (numeric) from a parent directory name and adds them to the front of the file.png names which reside in a daughter 'Images' folder.

Code:

fldrs=*fastqc

for sdir in $fldrs
do
  echo "looking at $sdir"
  prepend=`expr $sdir : 'SA\([0123456789]*\)'`
  #prepend=`expr $sdir : 'SA\([:digit:]*\)'`
  echo $prepend
  append=`expr $sdir : '.*ETC\(_[0123456789]*\.\)'`
  #append=`expr $sdir : '.*ETC\(_[:digit:]*\.\)'`
  echo $append
 
  for f in `ls $sdir/Images`
  do
    echo "considering $f"
    `cp -v $sdir/Images/$f $sdir/Images/$prepend$append$f`
  done
 
done


It all works but ...

1. the commented-out lines for the [:digit:] pattern match do not work (at all). I expected them to be equivalent to the preceding lines with the less compact [0123456789] form. What have I missed?

2. despite the fact that the script does what I want, the -v flag gives me verbose messages suggesting otherwise. These occur just after each
Code:

echo "considering $f"

and are of the form
scriptname : line number : $f : not found
which seems odd, since the script does what I want, so the file must have been found. The line number is 24, which is the very last line of the script
Code:

done

What have I missed?

I'm running Ubuntu 10.04 LTS & Gnome terminal 2.30.2

I have had a good look around this site (and elsewhere) but have been unable to figure out what's wrong. I'd appreciate some help.

Thanks

m

berbae 11-04-2011 10:08 AM

Quote:

Originally Posted by MMaddoxx (Post 4515683)
1. the commented-out lines for the [:digit:] pattern match do not work (at all). I expected them to be equivalent to the preceding lines with the less compact [0123456789] form. What have I missed?

You have to use [[:digit:]]

Quote:

Originally Posted by MMaddoxx (Post 4515683)
2. despite the fact that the script does what I want, the -v flag gives me verbose messages suggesting otherwise. These occur just after each
Code:

echo "considering $f"

and are of the form
scriptname : line number : $f : not found
which seems odd, since the script does what I want, so the file must have been found. The line number is 24, which is the very last line of the script
Code:

done

What have I missed?

You don't need the backquotes around the cp command line. With them, the shell first runs cp (so the copy succeeds), then tries to execute the text that cp -v printed as a command of its own. That second step fails, which results in the errors you see.

So here is the modified code:
Code:

fldrs=*fastqc

for sdir in $fldrs
do
  echo "looking at $sdir"
  prepend=$(expr $sdir : 'SA\([[:digit:]]*\)')
  echo $prepend
  append=$(expr $sdir : '.*ETC\(_[[:digit:]]*\.\)')
  echo $append
 
  for f in "$sdir"/Images/*
  do
    echo "considering $f"
    # $f already holds the full path, so strip the directory part for the new name
    cp -v "$f" "$sdir/Images/$prepend$append${f##*/}"
  done
 
done

I replaced the backquotes with the preferable $(...) syntax.
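
To make the backquote problem concrete, here is a small sketch that reproduces it in a scratch directory. The file names a.png, b.png and c.png are made up for the demonstration, and the exact wording of the error depends on which shell runs it.

```shell
# Scratch directory and file names are made up for this demonstration.
tmp=$(mktemp -d)
echo "dummy" > "$tmp/a.png"

# Wrong: the backquotes run cp, then the shell tries to execute whatever
# cp -v printed (its "source -> destination" line) as a new command.
# "|| true" only keeps the sketch going after that expected failure.
err=$( { `cp -v "$tmp/a.png" "$tmp/b.png"`; } 2>&1 ) || true
echo "message from the shell: $err"

# Right: run cp directly; its verbose line is simply printed.
cp -v "$tmp/a.png" "$tmp/c.png"

# Both copies exist, which is why the original script "worked" despite the errors.
ls "$tmp"
```

Remove the scratch directory with rm -rf "$tmp" when you are done.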

grail 11-04-2011 10:30 AM

Well my first question back is ... which shell? You do not appear to define an interpreter at the beginning, and on Ubuntu /bin/sh is dash (so this may complicate some of the features you are trying to use).

As for the script:

1. How are you guaranteeing that all 'sdir' found are directories?

# We will assume number 1 is true, ie all dirs at top level

2. [0123456789], [0-9] and [[:digit:]] are all equivalent character lists

3. Parsing ls is generally a bad idea as pointed out here

4. The most probable reason for the error is spaces in file names, on which the for loop will perform word splitting.

5. As per point 1, there is no guarantee you are processing files in the inner for loop.

6. Not sure why you are using command substitution (``) for the copy line when you are not returning the data to anything

7. Assuming you do have unusual characters in file and / or directory names, you should quote the variables being used in the copy
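
As a rough sketch of points 1, 3, 5 and 7 taken together, the loop below tests for directories and regular files, uses a glob instead of ls, and quotes every expansion. The directory name, the file name containing spaces, and the copy_ prefix are all invented for illustration. They stand in for the real names, which the thread does not show.

```shell
# Hypothetical layout, built in a scratch directory so the sketch is runnable.
top=$(mktemp -d)
mkdir -p "$top/SA12_ETC_3.x_fastqc/Images"
echo "dummy" > "$top/SA12_ETC_3.x_fastqc/Images/per base quality.png"
touch "$top/notadir_fastqc"               # a plain file that also matches *fastqc

cd "$top" || exit 1
for sdir in *fastqc; do
  [ -d "$sdir/Images" ] || continue       # points 1 and 5: skip anything without an Images dir
  for f in "$sdir"/Images/*; do           # point 3: glob instead of parsing ls
    [ -f "$f" ] || continue               # point 5: regular files only
    name=${f##*/}                         # strip the directory part
    cp -v "$f" "$sdir/Images/copy_$name"  # point 7: quoted, so spaces survive
  done
done
```

The glob is expanded once before the inner loop starts, so the freshly created copies are not picked up and copied again.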

David the H. 11-05-2011 02:12 PM

Just to make it clear, the named character class as enclosed by [::] is equal to a predefined set of characters. This is separate from the [] bracket expression, and a named class is only recognised when it appears inside one.

So [:digit:] is equal to 0-9, and [[:digit:]] is equal to [0-9].

http://mywiki.wooledge.org/RegularExpression
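
A quick way to see the difference is with shell patterns, which follow the same bracket rule as the regex used by expr. In this sketch, matches is a hypothetical helper written only for the demonstration. A bare [:digit:] is read as an ordinary bracket expression containing the characters ':', 'd', 'i', 'g' and 't'.

```shell
# Hypothetical helper: succeed if string $1 matches shell pattern $2.
matches() {
  # $2 is left unquoted on purpose so that it is treated as a pattern.
  case $1 in
    $2) return 0 ;;
    *)  return 1 ;;
  esac
}

matches 7 '[[:digit:]]' && echo "7 matches [[:digit:]]"
matches 7 '[:digit:]'   || echo "7 does not match [:digit:]"
matches d '[:digit:]'   && echo "d matches [:digit:]"
matches : '[:digit:]'   && echo ": matches [:digit:]"
```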

By the way, the use of expr is generally unnecessary in most modern shells. Almost everything that it can do is now built-in in some form or other.

Here are a few ways to extract a string of digits using only shell built-ins (with a guess about your directory name format).

Code:

sdir='SA1234FOOBAR'

prepend="${sdir#SA}"
prepend="${prepend%%[^[:digit:]]*}"

echo "$prepend"

This uses standard parameter expansions supported by all posix shells (see link below).

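In fact, if there's only a single string of digits in the name, you could even use this (note that the ${var//pattern} substitution is a bash/ksh extension, not POSIX, so it will not work in dash):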
Code:

prepend="${sdir//[^[:digit:]]}"
If you have a fairly modern version of bash (v3+), you can also use this:
Code:

re='SA([[:digit:]]*)'
[[ $sdir =~ $re ]] && prepend="${BASH_REMATCH[1]}"

The [[ test can be used to apply a regex to the string, with the matched substring and any captures being held in the BASH_REMATCH array. Note that it's generally advisable to store the regex in a separate variable, to avoid having to backslash reserved characters on the right-hand-side of the expression (the regex as a whole needs to be unquoted in order to work).

They may take an extra line or two of code, but since everything is done internally, they should be more efficient than calling an external tool like expr.

parameter expansion
string manipulation
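
For completeness, here is a sketch of how both the prepend and append parts from the original script might be pulled out with parameter expansion alone. The directory name is an assumption (the real format is not shown in the thread), and unlike the expr version this does not check that a '.' actually follows the digits. It uses [!...] for negation, which is the form POSIX specifies. bash also accepts [^...].

```shell
# Assumed name format; the real directory names are not shown in the thread.
sdir='SA1234_runETC_56.7_fastqc'

# prepend: the digits directly after the leading "SA"
prepend=${sdir#SA}
prepend=${prepend%%[![:digit:]]*}

# append: "_", the digits after the last "ETC_", then "."
rest=${sdir##*ETC_}
append="_${rest%%[![:digit:]]*}."

echo "$prepend$append"
```

With the assumed name above, prepend holds 1234 and append holds _56. so a file called foo.png would become 1234_56.foo.png.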

MMaddoxx 11-09-2011 02:57 PM

thanks to all responders!
 
@David the H. Thank you; your explanations were very clear, and the links inserted very useful

