LinuxQuestions.org


Paul_Lee 11-11-2013 09:24 PM

Recursive file operation in a directory
 
Hi,
I have a directory of jpg images and I'd like to run through them recursively so that the output file name is the same as the input, with only the file type suffix changed. A bit like doing
cuneiform 001.jpg -o 001.txt
cuneiform 002.jpg -o 002.txt
and so on (there are a lot of images).
I can't remember how to do this recursively in a shell script. Could someone help, please?
(Also, if I wanted to do this on certain files but miss out on others, how would I achieve this? For instance, omit processing all files that had "img" in the name, such as 002img.jpg, 010img.jpg and so on)

Many thanks!

Paul

evo2 11-11-2013 09:48 PM

Hi,

most people would use find for that. Something like:
Code:

find . -name '*.jpg' ! -name '*img*.jpg'
Then you need to pipe that output to your command. You could use a for loop, or you can do something with xargs. For example (using echo instead of cuneiform for testing purposes):
Code:

find . -name '*.jpg' ! -name '*img*.jpg' | xargs -I{} sh -c 'echo "$1 -o ${1%.*}.txt"' -- {}
If your filenames might contain whitespace then you might want to use the null separator.
E.g.:
Code:

find . -name '*.jpg' ! -name '*img*.jpg' -print0 | xargs -0 -I{} sh -c 'echo "$1 -o ${1%.*}.txt"' -- {}
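If you are happy with that, then replace the "echo" with the command you actually want to run, and move the double quotes so that each argument is quoted separately ("$1" -o "${1%.*}.txt"); otherwise the command receives the whole string as a single argument.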
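For what it's worth, find can also run the command itself through -exec, which avoids the pipe to xargs. A minimal sketch, again with echo standing in for cuneiform; the scratch directory and the sample file names are made up for the demonstration:

```shell
# Scratch directory with made-up sample files (one name contains a space).
demo=$(mktemp -d)
mkdir "$demo/sub"
touch "$demo/001.jpg" "$demo/002img.jpg" "$demo/sub/003 a.jpg"

# -exec ... {} + hands batches of file names to one small sh script.
# The word "sh" after the script fills $0, so the file names start at $1.
out=$(cd "$demo" && find . -name '*.jpg' ! -name '*img*.jpg' -exec sh -c '
    for f do
        # echo stands in for the real command; each argument is quoted separately
        echo cuneiform "$f" -o "${f%.*}.txt"
    done' sh {} + | sort)

printf '%s\n' "$out"
rm -rf "$demo"
```

Once the printed commands look right, dropping the echo runs them for real.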


HTH,

Evo2.

Paul_Lee 11-11-2013 09:57 PM

Hi,
Many thanks, I shall try it. FWIW, the filenames don't have whitespace; they are just of the variety
002.jpg 003a.jpg 005img.jpg 010.jpg 124img.jpg and so on. All I'm trying to veto are the ones that have "img" in the filename.

Incidentally, what does "-- {}" mean?

evo2 11-11-2013 11:58 PM

Hi,

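the "-- {}" is how the filename that xargs substitutes for {} ends up in the $1 shell variable: with sh -c, the first word after the script becomes $0 and the next one becomes $1, so the "--" just fills $0. There are other ways to achieve the same thing with xargs, but like find, it's a very powerful program, and once I've found one way that works and that I can remember, I'm less likely to explore the other options.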
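To see where the arguments end up, the sh -c part can be tried on its own. A small sketch that assumes nothing beyond a POSIX sh; 057.jpg is just a sample name:

```shell
# The first word after the script becomes $0, the next one $1.
# Here "--" is only a placeholder that fills $0.
result=$(sh -c 'echo "0=$0 1=$1"' -- 057.jpg)
echo "$result"

# Any other word works as the placeholder; "sh" is a common choice.
result2=$(sh -c 'echo "0=$0 1=$1"' sh 057.jpg)
echo "$result2"
```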

Cheers,

Evo2.

Firerat 11-12-2013 01:33 AM

an alternative

Code:

#!/bin/bash
Path=$1
Exclude=$2
while IFS= read -r -d '' FileName;do
    [[ "${FileName}" =~ "${Exclude}" ]] || \
        mv -v "${FileName}" "${FileName%.jpg}.txt"
done < <(find "${Path}" -name "*.jpg" -print0)


It takes two arguments.
Example: process files in the current directory (including subdirectories), excluding files whose path contains the string "img":
Code:

TheScript.sh . img


with comments
Code:

#!/bin/bash
Path=$1
Exclude=$2
# start a read loop, using 'null' as the record separator
# (IFS= and -r keep leading whitespace and backslashes in names intact)
while IFS= read -r -d '' FileName;do
    # list construct: http://www.tldp.org/LDP/abs/html/list-cons.html#LISTCONSREF
    # test $FileName; if it does not include the $Exclude string, then
    # mv -v ( verbose, it prints what it does, nice for feedback )
    # string substitution, explains what ${FileName%.jpg}.txt is doing:
    # http://www.tldp.org/LDP/abs/html/refcards.html#AEN22664
    [[ "${FileName}" =~ "${Exclude}" ]] || \
        mv -v "${FileName}" "${FileName%.jpg}.txt"
done < <(find "${Path}" -name "*.jpg" -print0)
# < <() is process substitution: it feeds the output of the find command into the loop
# http://mywiki.wooledge.org/BashGuide/InputAndOutput
# find is straightforward; -print0 ensures we use a 'null' instead of a newline to separate records

# my comments are awful ...
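The suffix-stripping expansion used in the script can be tried on its own. A small sketch; the path is a made-up sample:

```shell
f=./scans/057.jpg           # sample path, made up for the demonstration

stripped=${f%.jpg}.txt      # % removes the shortest matching suffix, here ".jpg"
generic=${f%.*}.txt         # same idea for any extension
base=${f##*/}               # ## removes the longest matching prefix, here "./scans/"

printf '%s\n' "$stripped" "$generic" "$base"
```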
more reading

http://www.tldp.org/LDP/Bash-Beginners-Guide/html/
http://www.tldp.org/LDP/abs/html/
http://mywiki.wooledge.org/BashGuide
http://www.gnu.org/software/bash/manual/bashref.html

The tldp material is great; however, it contains some nasty bad habits.
The mywiki.wooledge guide does a very good job of 'fixing' those habits.


a note on the use of process substitution ( < <() ) here

Code:

find "${Path}" -name "*.jpg" -print0 | while IFS= read -r -d '' FileName;do
  .. stuff ..
done

would also work in this case.
However, the downside is that any variables you set inside the loop are 'lost', because the | ( pipe ) runs the loop in a subshell.

The best example I can come up with here:

create a new directory and 'touch' some example files

Code:

touch 00{1..5}{,img}.jpg
Code:

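#!/bin/bash
# process substitution: the loop runs in the current shell, so 'skipped' survives
Path=$1
Exclude=$2
while IFS= read -r -d '' FileName;do
    # if/else rather than A || B && C, which would also run C after a successful mv
    if [[ "${FileName}" =~ "${Exclude}" ]]; then
        skipped+=("${FileName}")
    else
        mv -v "${FileName}" "${FileName%.jpg}.txt"
    fi
done < <(find "${Path}" -name "*.jpg" -print0)
echo "I skipped ${skipped[*]}"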

Code:

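#!/bin/bash
# pipe: the loop runs in a subshell, so 'skipped' is empty after the loop
Path=$1
Exclude=$2
find "${Path}" -name "*.jpg" -print0 | while IFS= read -r -d '' FileName;do
    # if/else rather than A || B && C, which would also run C after a successful mv
    if [[ "${FileName}" =~ "${Exclude}" ]]; then
        skipped+=("${FileName}")
    else
        mv -v "${FileName}" "${FileName%.jpg}.txt"
    fi
done
echo "I skipped ${skipped[*]}"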


evo2 11-12-2013 01:50 AM

Hi,

well, if we're doing loops and shell scripts... here's how I would actually do it myself. I use zsh, so I'd run the following command:
Code:

for f in **/*.jpg~*img.jpg ; do echo "$f" -o "${f%.*}.txt"; done
Again with echo holding the place of the real command until I was happy with the output.

The '**' in zsh will recurse into the subdirectories and the '~' in the glob excludes the *img.jpg files (the '~' operator needs the EXTENDED_GLOB option to be set).
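In a shell without zsh's extended globbing, a plain for loop with a case filter gives a similar result for a single directory (it does not recurse). A rough sketch with echo standing in for cuneiform; the list_commands function, the scratch directory and the file names are all made up for the demonstration:

```shell
# list_commands DIR: print one command per .jpg file in DIR (not recursive),
# skipping every file whose name contains "img".
list_commands() {
    for f in "$1"/*.jpg; do
        case ${f##*/} in            # look at the file name only, not the directory
            *img*) continue ;;
        esac
        echo cuneiform "$f" -o "${f%.*}.txt"
    done
}

demo=$(mktemp -d)
touch "$demo/001.jpg" "$demo/002img.jpg" "$demo/003a.jpg"

cmds=$(list_commands "$demo")
printf '%s\n' "$cmds"
rm -rf "$demo"
```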

Cheers,

Evo2.

Firerat 11-12-2013 02:17 AM

nice, I might have to try using zsh

Paul_Lee 11-12-2013 04:44 AM

I tried the one-line command of yours, evo2:

Code:

find . -name '*.jpg' ! -name '*img.jpg' | xargs -I{} sh -c 'cuneiform "$1 -o ${1%.*}.txt"' -- {}
But I got horrible errors, such as:
Magick: Unable to open file (./057.jpg -o ./057.txt) reported by magick/blob.c:2866 (OpenBlob)

But it works if I explicitly type into a shell:
cuneiform 057.jpg -o 057.txt

pan64 11-12-2013 04:58 AM

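Try removing the double quotes. With them, cuneiform receives the whole string as a single argument, which is why it tried to open a file called "./057.jpg -o ./057.txt":
find . -name '*.jpg' ! -name '*img.jpg' | xargs -I{} sh -c 'cuneiform $1 -o ${1%.*}.txt' -- {}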
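The unquoted form is fine for names without whitespace, like the ones in this thread. For other names, quoting each argument separately is safer. A small sketch that only counts the arguments the command would receive; the file names are samples:

```shell
# "set --" replaces the positional parameters; $# is how many there are.

# Whole string in one pair of quotes: the command sees a single argument.
one=$(sh -c 'set -- "$1 -o ${1%.*}.txt"; echo $#' -- ./057.jpg)

# Each argument quoted separately: input file, -o, output file.
three=$(sh -c 'set -- "$1" -o "${1%.*}.txt"; echo $#' -- ./057.jpg)

# No quotes at all: a name containing a space is split into extra words.
unquoted=$(sh -c 'set -- $1 -o ${1%.*}.txt; echo $#' -- './my scan.jpg')

echo "whole string quoted: $one, quoted separately: $three, unquoted with a space: $unquoted"
```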

Paul_Lee 11-12-2013 05:10 AM

Thanks, it worked like a champ! :)

pan64 11-12-2013 05:37 AM

Glad to help.
If you really want to say thanks just press YES. Also please mark the thread SOLVED if your problem has been solved.

Paul_Lee 11-12-2013 05:41 AM

Great, thanks!


All times are GMT -5.