LinuxQuestions.org
Old 04-12-2010, 11:20 AM   #1
Yalla-One
Member
 
Registered: Oct 2004
Location: Norway
Distribution: Slackware, CentOS
Posts: 635

Rep: Reputation: 35
bash-code to rename files based on config file


Hi,

I'm writing a bash shell script that, among other things, will traverse a directory with hundreds of files and rename those that match a pattern found in a config file. Only about one in ten files is expected to match; the rest should simply be ignored.

The config file should be in the format of BEFORE DELIMITER AFTER, for instance
Code:
dBASE program file | Clipper source code
Turbo Pascal file | Pascal source code
C++ source | C source code
(Don't yet know if " | " is a sensible delimiter)

This should, for instance, cause the file "dBASE program file December 1987.prg" to be renamed "Clipper source code December 1987.prg", and likewise "C++ source August 1996.cpp" to be renamed "C source code August 1996.cpp", etc.
A sample file such as "Random Data File.dat" should not be renamed, since it's not mentioned in the config file.

What is the quickest, most elegant way to do this in bash?

I am thinking of using bash's built-in regex matching combined with the /bin/rename utility, but don't quite know how to get started.
I guess there are plenty of ways of doing this in perl and elsewhere as well, but since this has to integrate into a pre-existing bash script, bash is what I'm looking for.

Anyone out there with a spare moment to offer a hint in the right direction? Thanks much in advance!
 
Old 04-12-2010, 12:15 PM   #2
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,506

Rep: Reputation: 1957
Maybe not the quickest nor the most elegant, but I'd think about two nested while loops. The outer one parses the config file to retrieve the old and the new patterns; the inner one finds (recursively) the matching files and renames them accordingly. The delimiter can be managed by IFS. For example:
Code:
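#!/bin/bash
# Read "old | new" pairs; IFS is changed only for the duration of each read.
while IFS="|" read -r old new
do
  old=${old% }   # strip the space before the delimiter
  new=${new# }   # strip the space after the delimiter
  while IFS= read -r OLDNAME
  do
    # echo makes this a dry run; remove it to actually rename
    echo mv "$OLDNAME" "${OLDNAME/"$old"/$new}"
  done < <(find . -name "$old*")
done < config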
I'm thinking about a more elegant way, now...
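
For readers unfamiliar with the expansions used above, here is a minimal sketch of the three pieces in isolation: IFS="|" scoped to a single read, the ${old% } and ${new# } trimming, and the ${name/pattern/replacement} substitution. The function name split_rule and the sample file name are invented for illustration.

```shell
#!/bin/bash
# Hypothetical helper: split one "BEFORE | AFTER" rule into the globals
# old and new.
split_rule() {
  local rule=$1
  # IFS applies to this read only; the rest of the shell keeps its default.
  IFS="|" read -r old new <<< "$rule"
  old=${old% }   # drop one trailing space (the one before the delimiter)
  new=${new# }   # drop one leading space (the one after the delimiter)
}

split_rule "Turbo Pascal file | Pascal source code"
name="Turbo Pascal file May 1990.pas"
# Replace the first occurrence of $old in $name; quoting "$old" keeps it literal.
renamed=${name/"$old"/$new}
printf '%s\n' "$renamed"
```

Quoting "$old" inside the substitution matters when a pattern contains glob characters such as * or ?, which would otherwise be interpreted rather than matched literally.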
 
Old 04-12-2010, 08:52 PM   #3
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,541

Rep: Reputation: 1919
My first question is, are all the files in a single directory or in a directory structure?
Quote:
traverse through a directory with hundreds of files
If it is a single directory, simply run the first loop and send any error output from mv to /dev/null,
i.e. whether or not a given file exists is of no concern:
Code:
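#!/bin/bash
# IFS="|" applies to each read only, so no save/restore is needed.
while IFS="|" read -r old new
do
    # Trim the spaces around the delimiter and quote names containing spaces.
    mv "${old% }" "${new# }" 2>/dev/null
done < config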
Or as an alternative to the above you could use a single awk line:
Code:
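#!/bin/bash

# Split on "|" with optional surrounding spaces and quote both names for the shell.
# Names containing double quotes or $ would still break this.
awk 'BEGIN{FS=" *[|] *"}{system("mv \"" $1 "\" \"" $2 "\" 2>/dev/null")}' config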
 
Edit 2: So on top of all that it appears I didn't read this properly as I didn't get the extension <doh>. Sorry

Last edited by grail; 04-12-2010 at 08:56 PM.
 
Old 04-13-2010, 02:42 AM   #5
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,506

Rep: Reputation: 1957
Hi grail. In my example it is the inner loop, fed by the find command, that takes care of recursion (if any) and of non-existent files (in that case find's output is empty and the outer loop moves on to the next pair of patterns). I admit it isn't an efficient method: if the config file contains hundreds of patterns, find is executed hundreds of times, which could waste resources. That said, I cannot come up with a more elegant method right now.
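
One way to avoid running find once per pattern is to load all the rules first and then walk the tree a single time. The following is a rough sketch under that assumption, not a tested replacement for the loop above; the function name plan_renames is invented, and it only prints the planned renames (a dry run, like the echo in post #2).

```shell
#!/bin/bash
# Hypothetical sketch: load every rule once, then walk the tree once.
# plan_renames CONFIG DIR prints one "old path -> new path" line per match.
plan_renames() {
  local config=$1 dir=$2 old new path base i
  local -a olds=() news=()
  while IFS="|" read -r old new; do
    [[ -n $old ]] || continue          # skip blank lines
    olds+=("${old% }")
    news+=("${new# }")
  done < "$config"

  while IFS= read -r -d '' path; do
    base=${path##*/}
    for i in "${!olds[@]}"; do
      # Literal prefix test: the quoted part is not treated as a glob.
      if [[ $base == "${olds[i]}"* ]]; then
        printf '%s -> %s\n' "$path" "${path%/*}/${news[i]}${base#"${olds[i]}"}"
        break
      fi
    done
  done < <(find "$dir" -type f -print0)
}
```

Called as plan_renames config . it runs find once regardless of how many rules the config holds; the cost moves to the inner loop over the in-memory rule list.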
 
Old 04-13-2010, 02:46 AM   #6
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,541

Rep: Reputation: 1919
@colucix: elegant enough as you at least read the question ... my bad
 
Old 04-13-2010, 04:42 AM   #7
Yalla-One
Member
 
Registered: Oct 2004
Location: Norway
Distribution: Slackware, CentOS
Posts: 635

Original Poster
Rep: Reputation: 35
Thanks for all the comments - I think I found a solution after reading the bash manual a bit more closely. How about this: (I changed the separator between OLD and NEW from "|" to "=" to avoid pipe errors :-) )

Code:
#!/bin/sh
FILENAME="Old DOS filename.prg"
CHECKER=$(grep "$FILENAME" fileparser.rename)

if [ ! -n "$CHECKER" ]; then
        echo "No match"
        exit
fi
OLDNAME=${CHECKER%=*}
NEWNAME=${CHECKER#*=}
echo "Was: $OLDNAME"
echo "Is: $NEWNAME"
rename "$OLDNAME" "$NEWNAME" "$OLDNAME".*
This is of course only sample code, but it's quite short and neat, and if nothing matches, it doesn't do anything... I'll put this in the standard loop of course, but for each file, this should be as efficient as it gets.

Comments, thoughts on how to further optimize?
 
Old 04-13-2010, 07:32 AM   #8
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,541

Rep: Reputation: 1919
Have you thought what your code does if the very first check does not work??
Code:
if [ ! -n "$CHECKER" ]; then
        echo "No match"
        exit
fi
 
Old 04-13-2010, 09:06 AM   #9
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,541

Rep: Reputation: 1919
So my head has decided to come back to the party, and it thinks it has a competitor
Code:
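#!/bin/bash

# IFS is set for the read only, so the rest of the script keeps the default.
while IFS="|" read -r old new
do
    old=${old% } new=${new# }    # trim the spaces around the delimiter
    found=$(find . -name "$old*")

    # Only correct when exactly one file matches the pattern.
    [[ -n $found ]] && mv "$found" "${found/"$old"/$new}"
done < config_file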
 
Old 04-13-2010, 12:03 PM   #10
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,506

Rep: Reputation: 1957
@Yalla-One: given the requirement explained in post #1, I assume the config file doesn't contain the full names of files but only patterns (which may match file names). If this is still the case, the line
Code:
CHECKER=$(grep "$FILENAME" fileparser.rename)
should return an empty variable in any case. Please, can you post the actual content of "fileparser.rename" (or part of it)?
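
If the config does hold only prefixes, the lookup can be turned around: instead of grepping the config for the full file name, test each rule against the file name. A minimal sketch, assuming the OLD=NEW format from post #7; the function name find_rule is invented for illustration.

```shell
#!/bin/bash
# Hypothetical helper: find the first "OLD=NEW" rule whose OLD part is a
# prefix of the given file name. Prints "OLD=NEW" and returns 0 on a match;
# prints nothing and returns 1 otherwise.
find_rule() {
  local filename=$1 config=$2 old new
  while IFS="=" read -r old new; do
    [[ -n $old ]] || continue          # skip blank lines
    if [[ $filename == "$old"* ]]; then
      printf '%s=%s\n' "$old" "$new"
      return 0
    fi
  done < "$config"
  return 1
}
```

The output has the same shape as $CHECKER, so the existing ${CHECKER%=*} and ${CHECKER#*=} lines could be applied to it unchanged, e.g. CHECKER=$(find_rule "$FILENAME" fileparser.rename).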

Furthermore, if you want to do renaming only when $CHECKER is not empty, the code should be something like:
Code:
if [ ! -n "$CHECKER" ]; then
   echo "No match"
else
   OLDNAME=${CHECKER%=*}
   NEWNAME=${CHECKER#*=}
   echo "Was: $OLDNAME"
   echo "Is: $NEWNAME"
   rename "$OLDNAME" "$NEWNAME" "$OLDNAME".*
fi
A last note: keep in mind that there are two versions of the rename command out there: one from util-linux (the one used by your script), which takes FROM and TO patterns, and one that takes a Perl regular expression, typically a substitution command. The latter is usually provided by Debian-based systems. Therefore, if portability is an issue, the rename command should be used carefully (it may be better to stick with mv, especially to rename one file at a time).
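
A minimal sketch of the mv-only route, which sidesteps the question of which rename is installed. It assumes a single directory and a literal prefix; the function name rename_prefix is invented for illustration.

```shell
#!/bin/bash
# Hypothetical helper: rename the regular files in DIR whose names start with
# OLD so that they start with NEW instead, using only mv.
rename_prefix() {
  local dir=$1 old=$2 new=$3 path base
  for path in "$dir"/*; do
    [[ -f $path ]] || continue            # also skips the unexpanded glob
    base=${path##*/}
    [[ $base == "$old"* ]] || continue    # literal prefix test
    mv -- "$path" "$dir/$new${base#"$old"}"
  done
}
```

For example, rename_prefix . "Old DOS filename" "New name" would turn "Old DOS filename.prg" into "New name.prg" and leave non-matching files alone.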

@grail: nice code. Unfortunately it fails if more than one file matches the pattern. In that case the found variable contains two or more file names (one per line) and the resulting mv command is not what you would expect.

The quest continues...
 
Old 04-13-2010, 01:43 PM   #11
Yalla-One
Member
 
Registered: Oct 2004
Location: Norway
Distribution: Slackware, CentOS
Posts: 635

Original Poster
Rep: Reputation: 35
IFS nightmare

Thanks again for the replies and engaging discussions.

The original problem is now solved and works well, as shown below. However, there's a new small IFS-related problem.

Here's more complete code - first the config file for the partial conversions:
Code:
PASCAL=pascal
Sample Data File=Indata
Then the relevant parts of the code:
Code:
OLDIFS=$IFS
IFS='\n'
# Find all incoming files and put them into FILEARRAY
FILEARRAY=( $(find "${INCOMING}" \
        \( -iname "*.prg" -o -iname "*.cpp" -o -iname "*.dbi" \) -printf "%f\n") )
IFS=$OLDIFS
debug Files collected: ${#FILEARRAY[*]}
This means that I now have all the files collected in the INCOMING area in an array that is separated by "\n" (newline), so that I do not get in trouble with filenames containing spaces.

Then we continue:
Code:
RENAMECHECK=$(grep "${PROG_NAME}=" "${RENAMECONFIGFILE}")
if [ -n "${RENAMECHECK}" ]; then
  OLDNAME=${RENAMECHECK%=*}
  PROG_NAME=${RENAMECHECK#*=}
  debug "Will rename from $OLDNAME to $PROG_NAME"
fi
As you can see, the actual move (I went from rename to mv) is done later, using the PROG_NAME remembered from the above code snippet; PROG_CODE (program code ID), PROG_CONS (consultant ID) and PROG_COMNT (comment) are collected elsewhere (long code, but it works well and is thus not relevant here). However, as we proceed, the program needs to loop through FILEARRAY again, and that's when it all goes belly up:

Code:
IFS='\n'
for item in ${FILEARRAY[*]}
do
  FILE_OLD="${item}"
  IFS=$OLDIFS
  FINAL_NAME="${PROG_NAME} - ${PROG_CODE}${PROG_CONS} - ${PROG_COMNT}"
  FINAL_FILE="${FINAL_NAME}.${PROG_EXT}"
  debug Moving ${FILE_OLD} to ${DEST_DIR}/${FINAL_FILE}
  mv "${FILE_OLD}" "${DEST_DIR}"/"${FINAL_FILE}"
  chmod 0644 "${DEST_DIR}"/"${FINAL_FILE}"
  IFS='\n'
done
IFS=$OLDIFS
What happens now is that instead of moving the file, the IFS of '\n' messes up the formatting of FINAL_NAME so that every "n" is replaced with a space. And on that note I am completely stuck!

I really want to use find to locate the files due to the possibility of sub-directories etc. That is why I use find with -printf ending in \n, so that find puts a newline after everything it finds.

For some reason, using another IFS such as \777 does not work either, as that causes the final part of the code snippet to output the filename with the \777 character at the end, thus making the "mv" command fail (obviously).

I am clearly too new to the secrets of IFS, and setting it back and forth as I do within the loop _clearly_ confuses the script immensely.

So as you can see the first and original question is now solved and works well, but the latter part is broken in a way I did not expect initially...

Any takers?
 
Old 04-13-2010, 02:36 PM   #12
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Servers: Debian Squeeze and Wheezy. Desktop: Slackware64 14.0. Netbook: Slackware 13.37
Posts: 8,551
Blog Entries: 28

Rep: Reputation: 1176
Quote:
Originally Posted by Yalla-One
Then the relevant parts of the code:
Code:
OLDIFS=$IFS
IFS='\n'
# Find all incoming files and put them into FILEARRAY
FILEARRAY=( $(find "${INCOMING}" \
        \( -iname "*.prg" -o -iname "*.cpp" -o -iname "*.dbi" \) -printf "%f\n") )
IFS=$OLDIFS
debug Files collected: ${#FILEARRAY[*]}
This means that I now have all the files collected in the INCOMING area in an array that is separated by "\n" (newline), so that I do not get in trouble with filenames containing spaces.

Any takers?
How about
Code:
while IFS="" read -r -d "" file ; do
   FILEARRAY+=("$file")
done < <(find ... -print0)
Notes:
  1. IFS is set only during the read; no need for OLDIFS=$IFS ... IFS=$OLDIFS.
  2. There is a space between < and < after the done.
  3. This technique handles all possible file names including ones that include line ends.
  4. For more info on the technique (thanks tuxdev) see this LQ post and later in the thread.
BTW when IFS is unset bash works as if it had the default value so, if you do want to keep changing IFS, you can effectively put it back to default with unset IFS; that saves having to use OLDIFS=$IFS.
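
As to why the n's disappeared in the original script: a likely cause (an inference from the posted code, not something stated in the thread) is that inside single quotes '\n' is the two characters backslash and n, so every n in an unquoted expansion becomes a field separator. A real newline needs bash's $'\n' quoting. The sketch below counts the fields produced either way; count_fields and the sample string are invented for illustration.

```shell
#!/bin/bash
# Count the fields produced when $2 is word-split with IFS set to $1.
count_fields() {
  local IFS=$1
  set -f            # no globbing while the value is expanded unquoted
  set -- $2
  set +f
  echo "$#"
}

sample="banana pie"
count_fields '\n'  "$sample"   # IFS is the two characters \ and n
count_fields $'\n' "$sample"   # IFS is a real newline
```

With '\n' the sample splits at each n into "ba", "a" and "a pie" (3 fields); with $'\n' it stays in one piece.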
 
Old 04-13-2010, 09:32 PM   #13
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,541

Rep: Reputation: 1919
Quote:
@grail: nice code. Unfortunately it fails if more than one file matches the pattern. In that case the found variable contains two or more file names (one per line) and the resulting mv command is not what you would expect.
@colucix - See I told everyone you weren't just a pretty face (just jokes )

So here is something a bit different:
Code:
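#!/bin/bash

# Requires GNU awk (gensub).
awk 'BEGIN{FS="[/|]"}
     # first input (config): map old name -> new name
     ARGV[1]==FILENAME{_[$1]=$2}
     # second input (find output): the last field is the file name
     ARGV[2]==FILENAME{
         test=gensub(/\..*/,"","g",$NF);    # name without extension
         if(test in _){
              i=index($0,test)               # literal match, not a regex
              target=substr($0,1,i-1) _[test] substr($0,i+length(test))
              system("mv \"" $0 "\" \"" target "\"")
         }
     }' config <(find . -type f)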
Now please be kind with the criticism
 
Old 04-14-2010, 12:56 AM   #14
Yalla-One
Member
 
Registered: Oct 2004
Location: Norway
Distribution: Slackware, CentOS
Posts: 635

Original Poster
Rep: Reputation: 35
Quote:
Originally Posted by catkin
How about
Code:
while IFS="" read -r -d "" file ; do
   FILEARRAY+=("$file")
done < <(find ... -print0)
Notes:
  1. IFS is set only during the read; no need for OLDIFS=$IFS ... IFS=$OLDIFS.
  2. There is a space between < and < after the done.
  3. This technique handles all possible file names including ones that include line ends.
  4. For more info on the technique (thanks tuxdev) see this LQ post and later in the thread.
Great - thanks!
One question comes to mind though - do I need any special handling of IFS or anything else when traversing through the array at a later point to go through all the files and give them their special attention? (ie where I originally ended up with the script eating all my n's due to the IFS blunder) ?

-y1
 
Old 04-14-2010, 01:38 AM   #15
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Servers: Debian Squeeze and Wheezy. Desktop: Slackware64 14.0. Netbook: Slackware 13.37
Posts: 8,551
Blog Entries: 28

Rep: Reputation: 1176
Quote:
Originally Posted by Yalla-One
Great - thanks!
One question comes to mind though - do I need any special handling of IFS or anything else when traversing through the array at a later point to go through all the files and give them their special attention? (ie where I originally ended up with the script eating all my n's due to the IFS blunder) ?

-y1
No -- the array members are strings of the exact file names but try it to be sure.
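
A quick way to try it, as a sketch: the important detail is to expand the array as "${FILEARRAY[@]}" in double quotes, which yields one word per element regardless of IFS, whereas the unquoted ${FILEARRAY[*]} from the earlier loop is split again on IFS. The temporary directory and file names below are invented for the test.

```shell
#!/bin/bash
# Build an array from NUL-delimited find output, then walk it.
tmp=$(mktemp -d)
touch "$tmp/Sample Data File.prg" "$tmp/an n-heavy name.cpp"

FILEARRAY=()
while IFS= read -r -d '' file; do
  FILEARRAY+=("$file")
done < <(find "$tmp" -type f -print0)

# One loop iteration per file, with spaces and n's left intact.
count=0
for item in "${FILEARRAY[@]}"; do
  printf 'item: %s\n' "${item##*/}"
  count=$((count + 1))
done
rm -rf "$tmp"
```

No IFS changes are needed anywhere in this version, so later assignments such as FINAL_NAME are unaffected.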
 