Using tr to replace strings in a group of filenames

Lemmings · 05-01-2008, 02:11 PM

Hello,

I have hundreds of filenames of the form CEU_NA21322_NSP.CEL. I am trying to use tr to first systematically remove the 'CEU_' prefix, then remove the '_NSP' string, and finally replace the suffix with ',CEU.CEL', so that in the end this filename becomes

NA21322,CEU.CEL

For the first step this is what I am trying:

mv *.CEL `echo *.CEL | tr -d [="CEU_"=]`
tr: CEU_: equivalence class operand must be a single character
mv: missing destination file operand after `CEU_NA3242_NSP.CEL'
Try `mv --help' for more information.

Can anyone help?

MensaWater · 05-01-2008, 02:31 PM

Not saying you couldn't do it with tr with some effort but this is really a job for sed:

If you did something like:

Code:

echo CEU_NA21322_NSP.CEL |sed -e s/^CEU_// -e s/_NSP/,CEU/

You'd see it converted the file name to what you wanted.
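
For what it's worth, the reason tr is awkward here is that it deletes or maps individual characters, not strings. A minimal sketch of the difference follows. The filename is made up for the demonstration (its ID part deliberately contains C, E and U), so treat it as an assumption rather than one of the real files:

```shell
#!/bin/sh
# Hypothetical filename, chosen so the ID itself contains C, E and U.
name='CEU_CEU1_NSP.CEL'

# tr -d deletes every occurrence of each listed character, wherever it appears.
tr_result=$(printf '%s\n' "$name" | tr -d 'CEU_')

# sed removes the literal string "CEU_" only at the start of the name.
sed_result=$(printf '%s\n' "$name" | sed -e 's/^CEU_//')

printf 'tr:  %s\n' "$tr_result"
printf 'sed: %s\n' "$sed_result"
```

With tr, the extension and the ID lose their C, E and U characters as well, which is why sed is the easier tool for this job.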

For a list of files you could do this with a "for" loop:

Code:

for FILE in `ls CEU_*NSP.CEL`
do NEWFILE=`echo $FILE | sed -e s/^CEU_// -e s/_NSP/,CEU/`
   mv $FILE $NEWFILE
done

The above assumes all your files are in the current directory. You'd need to modify it for different location.

I'd recommend you copy a couple of files to a completely different directory and test the above on that new directory to be sure the results are what you expect.

The above is provided "as is" and except for the original echo line has NOT been tested by me. Testing is very important.
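
One cheap way to do that testing is a dry run that prints the mv commands instead of executing them. The sketch below is an assumption about how you might set that up, not part of the tested code above; the function name preview_renames is invented for illustration, and it uses a plain glob instead of ls:

```shell
#!/bin/sh
# Hypothetical helper: prints the mv commands the loop would run,
# without renaming anything. Assumes the files are in the current directory.
preview_renames() {
   for FILE in CEU_*NSP.CEL
   do
      [ -e "$FILE" ] || continue   # the glob matched nothing
      NEWFILE=$(printf '%s\n' "$FILE" | sed -e 's/^CEU_//' -e 's/_NSP/,CEU/')
      echo "mv $FILE $NEWFILE"
   done
}

preview_renames
```

Once the printed commands look right, the echo can be swapped for a real mv.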

ararus · 05-01-2008, 02:47 PM

Quote:

for FILE in `ls CEU_*NSP.CEL`

What is the point of using ls here?

Code:

for FILE in CEU_*NSP.CEL
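
One practical difference, sketched under the assumption that a filename might contain a space (the name below is invented for the demonstration): the backtick form splits the output of ls on whitespace, while the glob hands each name over intact.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d) && cd "$dir" || exit 1
touch 'CEU_NA 1_NSP.CEL'          # invented name containing a space

ls_count=0
for FILE in `ls CEU_*NSP.CEL`; do ls_count=$((ls_count + 1)); done

glob_count=0
for FILE in CEU_*NSP.CEL; do glob_count=$((glob_count + 1)); done

echo "ls form looped $ls_count times, glob form looped $glob_count time(s)"
```

The ls form loops twice for the single file, once per word, so neither iteration sees a name that exists.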

In any case, for a large number of files, you're better off doing it with a single program (e.g. perl), rather than spawning mv/rename/whatever for each file. Something like:

Code:

#!/usr/bin/perl -w

for $ARGV (@ARGV) {
   $_ = $ARGV;
   s/^CEU_//;
   s/_NSP//;
   s/\.CEL$/,CEU\.CEL/;
   rename $ARGV, $_;
}

Caveats:
my perl knowledge is quite limited, the above is probably twice as long/inefficient as necessary.
It seems to work but test it first, as jlightner suggested.

Lemmings · 05-02-2008, 06:24 PM

Thanks for the help guys. I am more comfortable with bash shell scripts, so I will stick to using that for now. If I write a script though, how can I pass the contents of 'ls' to that script as an argument?

Thanks for your help!

ararus · 05-04-2008, 09:17 AM

You don't need to, and shouldn't, pass the output of ls. It's a bad idea to rely on ls's output since it's not portable, it varies according to environment variables, and it's changed over time.

If you just want to iterate over a list of files, you can pipe the output of find to the script, or just use shell globbing.
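
A rough sketch of the find approach, using a scratch directory with invented filenames so it can be run safely. Here the loop stands in for the script and only prints the basename of each match:

```shell
#!/bin/sh
# Scratch tree with made-up files, one of them in a subdirectory.
dir=$(mktemp -d) || exit 1
mkdir "$dir/sub"
touch "$dir/CEU_NA1_NSP.CEL" "$dir/sub/CEU_NA2_NSP.CEL" "$dir/notes.txt"

# find walks subdirectories too; read -r takes one path per line.
# (Paths containing newlines would need find -print0 and a matching reader.)
found=$(find "$dir" -type f -name 'CEU_*_NSP.CEL' | while IFS= read -r f; do
   basename -- "$f"
done | sort)

printf '%s\n' "$found"
```

Unlike a glob in the current directory, this also picks up the file in sub/.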

If you need to know the date/size/whatever, it's better to use stat than the output of ls (ls uses stat anyway).
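
For example, getting a file's size without parsing ls might look like the sketch below. It assumes GNU coreutils stat (the -c option); BSD and macOS stat use -f with different format codes, so adjust accordingly. The function name file_size is made up.

```shell
#!/bin/sh
# Assumes GNU coreutils stat; BSD/macOS stat uses -f with other format codes.
file_size() {
   stat -c '%s' -- "$1"    # %s = total size in bytes
}
```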

You might want to write the script so you can use it both ways (piping filenames in or passing on the command line)

E.g.

Code:

#!/bin/bash
# bash rather than sh, because [[ ]] is used below

handle_file()
{
   # do something with file
   :   # placeholder; bash rejects an empty function body
}

# do option handling here (getopt) if required
# getopt ...

# if no arguments, read from stdin
if [[ -z $1 ]]; then
  while IFS= read -r file; do
     handle_file "$file"
  done
else
   for file in "$@"; do
     handle_file "$file"
   done
fi
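
To make the skeleton concrete, here is one way handle_file might be filled in for the renaming task from the first post. It is only a sketch: the parameter expansions are my substitution for the sed call used earlier in the thread, and the script name rename_cel.sh in the usage lines is hypothetical.

```shell
#!/bin/bash
# Sketch only: rename CEU_<id>_NSP.CEL to <id>,CEU.CEL in the same directory.
handle_file()
{
   local dir base new
   dir=$(dirname -- "$1")
   base=$(basename -- "$1")
   case $base in
      CEU_*_NSP.CEL) ;;            # only touch names of the expected form
      *) return 0 ;;               # silently skip anything else
   esac
   new=${base#CEU_}                # strip the leading "CEU_"
   new=${new%_NSP.CEL},CEU.CEL     # swap trailing "_NSP.CEL" for ",CEU.CEL"
   mv -- "$1" "$dir/$new"
}
```

With that in place, either of these should work:

    find . -name 'CEU_*_NSP.CEL' | ./rename_cel.sh
    ./rename_cel.sh CEU_*_NSP.CEL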

Note though that the behaviour in each case is different, since passing files on the command line won't be recursive. But you can add a check for directories.

Code:

....

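handle_directory()
{
  local prev=$PWD f              # remember where we started; keep f per call
  cd "$1" || return
  for f in *; do
    [[ -e "$f" || -L "$f" ]] || continue   # empty directory: glob stays a literal '*'
    if [[ -d "$f" ]]; then
      handle_directory "$f"
    else
      handle_file "$f"
    fi
  done
  cd "$prev"                     # works even when $1 was a multi-level path
}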

if [[ -z $1 ]]; then
  while read file; do
     handle_file "$file"
  done
else
   for file in "$@"; do
     if [[ -d "$file" ]]; then
        handle_directory "$file"
     else
        handle_file "$file"
     fi
   done
fi

Alternatively, you might want to make recursion optional.

Code:

recursion=""

# use getopt to set recursion=1 if requested

  ...

  if [[ -d "$file" && -n "$recursion" ]]; then
    handle_directory "$file"
  else
    ...
  fi
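
A possible way to set that flag, sketched with the shell builtin getopts rather than the external getopt mentioned above. The -r option letter, the function name parse_options and the shift_count variable are all made up for illustration:

```shell
#!/bin/sh
# Hypothetical option: -r turns recursion on. Uses the builtin getopts.
parse_options() {
   recursion=""
   OPTIND=1                      # reset so the function can be called again
   while getopts r opt; do
      case $opt in
         r) recursion=1 ;;
         *) echo "usage: $0 [-r] [file ...]" >&2; return 2 ;;
      esac
   done
   shift_count=$((OPTIND - 1))   # number of option words consumed
}
```

In a script you would call parse_options "$@" and then shift "$shift_count" so that only the filenames remain in "$@".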

Lemmings · 05-20-2008, 04:12 PM

Quote:

Originally Posted by jlightner

Not saying you couldn't do it with tr with some effort but this is really a job for sed:

If you did something like:

Code:

echo CEU_NA21322_NSP.CEL |sed -e s/^CEU_// -e s/_NSP/,CEU/

You'd see it converted the file name to what you wanted.

For a list of files you could do this with a "for" loop:

Code:

for FILE in `ls CEU_*NSP.CEL`
do NEWFILE=`echo $FILE | sed -e s/^CEU_// -e s/_NSP/,CEU/`
   mv $FILE $NEWFILE
done

The above assumes all your files are in the current directory. You'd need to modify it for different location.

I'd recommend you copy a couple of files to a completely different directory and test the above on that new directory to be sure the results are what you expect.

The above is provided "as is" and except for the original echo line has NOT been tested by me. Testing is very important.

This code chunk you suggested does not work:

for ff in 'CEU_*NSP.CEL'; do NEWFILE='echo $ff |sed -e s/^CEU_// -e s/_NSP/,CEU/';
mv $ff $NEWFILE; done

I get the following error message when running it as a script:

mv: invalid option -- e
Try `mv --help' for more information.

MensaWater · 05-21-2008, 09:53 AM

The code I suggested DOES work because I tested it before posting.

It appears you combined advice of two posters:

In mine I had you do `ls CEU_*NSP.CEL` (back ticks included) to get the list of files.

A later poster said to drop the ls. He also meant for you to drop the back ticks. The back ticks say "execute this command before the rest of the command line". He basically was saying you don't need to and shouldn't do the ls syntax I'd provided. I personally don't think his objections were that valid but also don't think he was wrong in saying it would work his way.

You can do it the way I had it OR the way he had it. I haven't tested it his way but don't see any reason it wouldn't work.

There is a question why it passed the "-e" to your mv command since it is in the sed syntax rather than the mv syntax. I wouldn't investigate that until you correct as noted above. If you still see the same error it may indicate you accidentally created a file named "-e". That occurs on occasion when you type something incorrectly. If you have such a file you can delete it by typing:
rm ./"-e"
in the directory where the file exists.
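
As a side note, the single quotes in the quoted loop look like the likelier source of the "-e": inside single quotes nothing is executed, so NEWFILE holds the literal command text, and the unquoted $NEWFILE then splits into separate words when handed to mv. A small sketch of that effect, using the example filename from the first post (the variable names literal_words and fourth_word are just for the demonstration):

```shell
#!/bin/sh
ff='CEU_NA21322_NSP.CEL'

# Single quotes: no command runs; NEWFILE is just this text.
NEWFILE='echo $ff |sed -e s/^CEU_// -e s/_NSP/,CEU/'
set -- $NEWFILE               # unquoted, so it splits into words, as mv saw it
literal_words=$#
fourth_word=$4

# Command substitution: the pipeline runs and its output is captured.
NEWFILE=$(echo "$ff" | sed -e 's/^CEU_//' -e 's/_NSP/,CEU/')

echo "single-quoted text splits into $literal_words words; word 4 is '$fourth_word'"
echo "command substitution gives: $NEWFILE"
```

The fourth word is -e, which mv tries to interpret as an option.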

P.S. When asking for help it might be better to say "I couldn't get it working" than to say "This code chunk you suggested does not work". It allows for the idea that the mistake was yours rather than that of the person who was attempting to help you, and might encourage them to follow up. Not saying I can't make a mistake, but here the mistake was yours, and in a different mood I might just have ignored your post or replied with a simple flame.

Lemmings · 05-21-2008, 11:00 AM

Thanks a lot for your help. I've worked out the problems (I wasn't properly doing the command substitution previously) and, as you have suggested, the following works:

for ff in CEU_*NSP.CEL
do NEWFILE=$(echo $ff |sed -e s/^CEU_// -e s/_NSP/,CEU/)
mv $ff $NEWFILE
done