LinuxQuestions.org
Old 02-21-2014, 03:27 PM   #1
jamtat
Member
 
Registered: Oct 2004
Distribution: ubuntu, arch
Posts: 60

Rep: Reputation: 4
bash: search out files by extension, remove spaces, copy elsewhere


Hi. I'm not much of a bash scripter and usually turn to this forum when I need help. Now is one of those occasions.

I have a large number of image files scattered in sub-directories on an external drive and having varying extensions. Since the image files came from a Windows user, most have spaces in the file names. There are at least 5 types of image files mixed in with a few other file types on this drive: tif (TIF), jpg (JPG), psd (PSD), gif (GIF), and bmp (BMP) are the ones I'm targeting. I need to automate finding all those image files by extension and copying them into a single, different directory--say the directory is called copied-images. It would be nice during that process to strip out the blank spaces in the file names and replace them with something like underscores. I'm also considering whether it might not be helpful to prepend to each name the name of the directory from which it is being copied, along with, perhaps, the name of the directory above that.

So a file named Good Image #1.tif, located in the directory images/CD1/files/ would end up looking like CD1-files-Good_Image_#1.tif and would get copied over to a directory called copied-images/.
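The naming scheme in this example can be sketched with plain bash parameter expansion. This is only a minimal sketch: the function name flat_name is made up for illustration, and it assumes every path has at least two directory levels above the file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn "images/CD1/files/Good Image #1.tif"
# into "CD1-files-Good_Image_#1.tif".
flat_name() {
    local path=$1
    local base=${path##*/}            # file name:        Good Image #1.tif
    local dir=${path%/*}              # its directory:    images/CD1/files
    local parent=${dir##*/}           # last component:   files
    local updir=${dir%/*}             # one level up:     images/CD1
    local grandparent=${updir##*/}    # last component:   CD1
    local name="${grandparent}-${parent}-${base}"
    printf '%s\n' "${name// /_}"      # replace every space with an underscore
}

flat_name 'images/CD1/files/Good Image #1.tif'   # prints CD1-files-Good_Image_#1.tif
```

Note that spaces in the two directory names get replaced as well, since the substitution runs on the combined name.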

Can anyone offer input on how to accomplish this task? Help will be much appreciated.

James

Last edited by jamtat; 02-21-2014 at 03:34 PM.
 
Old 02-21-2014, 03:35 PM   #2
schneidz
Senior Member
 
Registered: May 2005
Location: boston, usa
Distribution: fc-15/ fc-20-live-usb/ aix
Posts: 4,688

Rep: Reputation: 775
i would look into the find command and pipe the output into something like sed or tr. you can also use the file command to find files that have the wrong suffix.
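As a rough illustration of the find-plus-tr idea, the sketch below only previews what the names would look like and changes nothing on disk. The scratch directory and file are invented for the demo, and -printf assumes GNU find.

```bash
#!/usr/bin/env bash
# Throwaway demo directory with one file whose name contains spaces.
src=$(mktemp -d)
touch "$src/Good Image #1.tif"

# Print just the file name (%f) of each regular file, then swap spaces for underscores.
find "$src" -type f -printf '%f\n' | tr ' ' '_'
```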

Last edited by schneidz; 02-21-2014 at 03:36 PM.
 
1 members found this post helpful.
Old 02-21-2014, 06:32 PM   #3
jamtat
Member
 
Registered: Oct 2004
Distribution: ubuntu, arch
Posts: 60

Original Poster
Rep: Reputation: 4
Looking over the man page for find, I see it's possible to use regular expressions. Given the right formulation, all the extensions I'm after, in both lower and upper case, could be targeted. Or the boolean -o could be used and each extension entered in full as
Code:
find . \( -name '*.tif' -o -name '*.TIF' -o -name '*.jpg' -o -name '*.JPG' -o -name '*.psd' -o -name '*.PSD' -o -name '*.gif' -o -name '*.GIF' -o -name '*.bmp' -o -name '*.BMP' \) -exec cp {} /path/to/copied-images/ \;
Working with sed or tr to remove spaces and perhaps prepend to filenames presents more of a challenge for me. But I'll keep looking into that.
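For the regular-expression route mentioned above, a sketch along these lines might work. It assumes GNU find, whose default regex syntax writes grouping and alternation as \( \| \), and it runs in scratch directories created just for the demo.

```bash
#!/usr/bin/env bash
# Scratch source and destination directories (demo only).
src=$(mktemp -d)
dest=$(mktemp -d)
mkdir -p "$src/CD1/files"
touch "$src/CD1/files/Good Image #1.tif" "$src/CD1/photo.JPG" "$src/CD1/notes.txt"

# -iregex matches the whole path, ignoring case, so one pattern
# covers tif/TIF, jpg/JPG and so on.
find "$src" -type f -iregex '.*\.\(tif\|jpg\|psd\|gif\|bmp\)' \
    -exec cp -- {} "$dest"/ \;

ls "$dest"
```

Files with the same name in different directories would still overwrite each other in the destination.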

Last edited by jamtat; 02-21-2014 at 06:48 PM.
 
Old 02-21-2014, 09:52 PM   #4
jamtat
Member
 
Registered: Oct 2004
Distribution: ubuntu, arch
Posts: 60

Original Poster
Rep: Reputation: 4
As I continue to research this matter, my thinking on what I need to do changes. I'm now thinking that, rather than copying the image files themselves to a different directory, what I should actually be doing is creating symlinks to those files in that different directory. So, given the example above, the directory should be named symlink-images rather than copied-images, and it would contain, as mentioned, symlinks. For that reason both prepending something to the symlink names, as well as removing spaces from file names, become very secondary concerns.

To add a bit more detail, I will be performing a mass conversion on those image files, saving the converted image files to yet another, different directory. Tests I've so far conducted indicate that I can perform that conversion on symlinked files. And this approach somewhat simplifies the complex task I'm trying to accomplish here.

So, instead of doing something like -exec cp /path/to/directory on those image files, what I now plan to do is something like -exec ln -s /path-to/symlink-directory. Tips will be appreciated.
 
Old 02-22-2014, 07:38 AM   #5
jamtat
Member
 
Registered: Oct 2004
Distribution: ubuntu, arch
Posts: 60

Original Poster
Rep: Reputation: 4
It turned out to be quite a challenge, given my limited technical abilities, to automate the creation of the symlinks I needed. But I've finally found what appears to be the solution at http://ubuntuforums.org/archive/inde...t-1288584.html . I simply revised that command as follows:
Code:
find /path/to/dir/above/image-dirs \( -iname "*.jpg" -o -iname "*.JPG" \) -exec sh -c 'filename="${0##*/}"; ln -sf "$0" /path/to/symlinks-dir/"$filename"' {} \;
I didn't try entering all the extensions I was searching for, but only one extension at a time, in both lower and upper case. I now have a symlink-images directory full of valid symlinks that point, I believe, to all the image files I need to work with.

I assume a modified command such as
Code:
find /path/to/dir/above/image-dirs \( -iname "*.jpg" -o -iname "*.JPG" -o -iname "*.tif" -o -iname "*.TIF" -o -iname "*.psd" -o -iname "*.PSD" -o -iname "*.gif" -o -iname "*.GIF" -o -iname "*.bmp" -o -iname "*.BMP" \) -exec sh -c 'filename="${0##*/}"; ln -sf "$0" /path/to/symlinks-dir/"$filename"' {} \;
would have done the job in one iteration, and could have saved me a couple of minutes, but I did not try that. So I can't confirm.

In any case I should, now having all those symlinks in one distinct directory, be able to perform the necessary conversions on those image files.

Last edited by jamtat; 02-22-2014 at 07:44 AM.
 
1 members found this post helpful.
Old 02-22-2014, 11:20 AM   #6
jamtat
Member
 
Registered: Oct 2004
Distribution: ubuntu, arch
Posts: 60

Original Poster
Rep: Reputation: 4
I've discovered that issuing
Code:
for f in *\ *; do mv "$f" "${f// /_}"; done
at the command line replaces, within the current directory, spaces in file names with underscores. (lifted from the thread http://stackoverflow.com/questions/2...-in-file-names )
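To apply the same substitution to a whole directory tree rather than just the current directory, something like the following sketch could be used. The demo tree is invented. -depth makes find handle a directory's contents before the directory itself, so a path is never renamed out from under its children.

```bash
#!/usr/bin/env bash
# Scratch tree with spaces at several levels (demo only).
demo=$(mktemp -d)
mkdir -p "$demo/My Scans/Disc 1"
touch "$demo/My Scans/Disc 1/Good Image #1.tif"

# Rename only the last path component of each match, deepest entries first.
find "$demo" -depth -name '* *' -exec bash -c '
    for path; do
        dir=${path%/*}       # everything before the last slash
        base=${path##*/}     # the component being renamed
        mv -- "$path" "$dir/${base// /_}"
    done
' _ {} +

find "$demo"
```

If two names in one directory differ only by a space versus an underscore, the mv would clobber one of them, so a dry run with echo in place of mv is worth doing first.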

Last edited by jamtat; 02-22-2014 at 11:39 AM.
 
1 members found this post helpful.
Old 02-22-2014, 11:35 AM   #7
schneidz
Senior Member
 
Registered: May 2005
Location: boston, usa
Distribution: fc-15/ fc-20-live-usb/ aix
Posts: 4,688

Rep: Reputation: 775
Quote:
Originally Posted by jamtat View Post
Looking over the man page for find, I see it's possible to use regular expressions. Given the right formulation, all the extensions I'm after, in both lower and upper case, could be targeted. Or the boolean -o could be used and each extension entered in full as
Code:
find . \( -name '*.tif' -o -name '*.TIF' -o -name '*.jpg' -o -name '*.JPG' -o -name '*.psd' -o -name '*.PSD' -o -name '*.gif' -o -name '*.GIF' -o -name '*.bmp' -o -name '*.BMP' \) -exec cp {} /path/to/copied-images/ \;
Working with sed or tr to remove spaces and perhaps prepend to filenames presents more of a challenge for me. But I'll keep looking into that.
sometimes i see the suffixes *.jpeg and *.tiff (fyi: -iname ignores case).

i think this method is more comprehensive, since it will also include files that don't have a suffix or are mislabelled:
Code:
[schneidz@hyper Documents]$ find `pwd` -exec sh -c "file -i '{}' | grep image" \; -exec sh -c 'filename="${0##*/}"; ln -sf "$0" jamtat/`echo $filename | tr " " "-"`' {} \;
/home/schneidz/Documents/pa40print.asp_files/prnlogout.jpg: image/jpeg; charset=binary
/home/schneidz/Documents/pa40print.asp_files/prnlogout.psd: image/vnd.adobe.photoshop; charset=binary
/home/schneidz/Documents/pa40print.asp_files/prnlogout.bmp: image/x-ms-bmp; charset=binary
/home/schneidz/Documents/pa40print.asp_files/alicia keys.mp3: image/jpeg; charset=binary
/home/schneidz/Documents/pa40print.asp_files/prnlogout.gif: image/gif; charset=binary
/home/schneidz/Documents/pa40print.asp_files/prnlogout.tiff: image/tiff; charset=binary
/home/schneidz/Documents/AES: image/jpeg; charset=binary
/home/schneidz/Documents/Direct Loans(managed by Mohela): image/jpeg; charset=binary
/home/schneidz/Documents/concur/fax.bmp: image/tiff; charset=binary

Last edited by schneidz; 02-22-2014 at 11:50 AM.
 
Old 02-23-2014, 05:17 PM   #8
jamtat
Member
 
Registered: Oct 2004
Distribution: ubuntu, arch
Posts: 60

Original Poster
Rep: Reputation: 4
Interesting contribution, schneidz; thanks for offering it. A couple of observations based on some testing.

On the first trial run of the command, issued in my home directory, it somehow included a .txt file as a target (/home/user/textfile.txt: text/plain; charset=utf-8). Any idea why that happened? Next, in a trial run in the directory containing all the subdirectories with images, I got a few of these:
Code:
sh: -c: line 0: unexpected EOF while looking for matching `'' 
sh: -c: line 1: syntax error: unexpected end of file
Could that have happened because of all the spaces in the names of these image files?

Finally, I've discovered while trying to process this data using the other commands I'd run across, that what I'd feared might be the case is actually the case: some of these files, located in different directories, have the same name. Thus, when I try to create symlinks to those files in a single, separate directory, some of the file names are being overwritten by the names of different files that have the same name. So, now I've got to come up with a way of differentiating files that have the same names.

Using the ls -l command gives me one possible way of doing that: it seems I could, using some other utility, simply prepend the time stamp of each file to its name, thereby differentiating each file with the same name from any other files sharing the same name. Or even just prepending a number to every symlink name as it's created, incrementing that number each time, could do the trick. But it will take more studying before I can figure out how those solutions might be implemented.

PS Thanks for the clarification on -iname

Last edited by jamtat; 02-23-2014 at 07:12 PM.
 
Old 02-23-2014, 05:53 PM   #9
jamtat
Member
 
Registered: Oct 2004
Distribution: ubuntu, arch
Posts: 60

Original Poster
Rep: Reputation: 4
By the way, here's another, much simpler way of listing files by extension: ls *.{tif,TIF,jpg,JPG} (found at http://stackoverflow.com/questions/1...th-ls-and-grep). It relies on brace expansion and only looks in the current directory.
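A recursive variant of that glob is possible in bash 4 or later with a few shell options. This is only a sketch, run in a scratch directory made up for the demo. globstar lets ** descend into subdirectories, and nocaseglob makes tif also match TIF.

```bash
#!/usr/bin/env bash
# globstar:   ** matches zero or more directory levels (bash >= 4)
# nullglob:   a pattern with no matches expands to nothing
# nocaseglob: matching ignores upper/lower case
shopt -s globstar nullglob nocaseglob

# Scratch tree (demo only).
cd "$(mktemp -d)" || exit 1
mkdir -p a/b
touch top.TIF a/mid.jpg a/b/deep.Jpg a/notes.txt

# Brace expansion turns this into **/*.tif **/*.jpg before globbing.
images=( **/*.{tif,jpg} )
printf '%s\n' "${images[@]}"
```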
 
Old 02-23-2014, 07:04 PM   #10
jamtat
Member
 
Registered: Oct 2004
Distribution: ubuntu, arch
Posts: 60

Original Poster
Rep: Reputation: 4
Quote:
Originally Posted by jamtat View Post
Using the ls -l command gives me one possible way of doing that: it seems I could, using some other utility, simply prepend the time stamp of each file to its name, thereby differentiating each file with the same name from any other files sharing the same name. Or even just prepending a number to every symlink name as it's created, incrementing that number each time, could do the trick. But it will take more studying before I can figure out how those solutions might be implemented.
Prepending the output of echo $(date +%N) to each symlink could ensure the uniqueness I'm looking for, though a string of 3 or 4 numerals would be preferable to the 9 numerals that command outputs.
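A short, guaranteed-unique prefix can also come from a simple counter instead of the clock. The sketch below assumes bash (for process substitution and printf -v) and uses invented scratch directories. The four-digit width and the two extensions are arbitrary choices for the demo.

```bash
#!/usr/bin/env bash
# Scratch directories: two files with the same name in different places.
src=$(mktemp -d)
links=$(mktemp -d)
mkdir -p "$src/Disc1" "$src/Disc2"
touch "$src/Disc1/scan 1.jpg" "$src/Disc2/scan 1.jpg"

n=0
# -print0 with read -d '' keeps names containing spaces or newlines intact.
while IFS= read -r -d '' path; do
    n=$((n + 1))
    name=${path##*/}
    printf -v prefix '%04d' "$n"          # 0001, 0002, ...
    ln -s "$path" "$links/${prefix}-${name// /_}"
done < <(find "$src" -type f \( -iname '*.jpg' -o -iname '*.tif' \) -print0)

ls "$links"
```

Because the counter increases for every file, two links can never share a prefix, no matter how quickly the files are processed.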

Last edited by jamtat; 02-24-2014 at 10:20 AM.
 
Old 02-24-2014, 06:43 AM   #11
jamtat
Member
 
Registered: Oct 2004
Distribution: ubuntu, arch
Posts: 60

Original Poster
Rep: Reputation: 4
Quote:
Originally Posted by jamtat View Post
On the first trial run of the command, issued in my home directory, it somehow included a .txt file as a target (/home/user/textfile.txt: text/plain; charset=utf-8). Any idea why that happened?
Never mind. I've figured this one out. I anonymized that output; the original file name was not textfile.txt, but something like imagelist.txt. The grep command was catching the word "image" in the names of files, as well as where it was supposed to be catching it, in the file type description.
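The effect is easy to reproduce on a few sample lines in the format that file -i prints. In the sketch below the first line is the anonymized text file under its imagelist.txt name, the other two are taken from the output posted earlier, and the stricter pattern is just one possible way to tie the match to the type field.

```bash
#!/usr/bin/env bash
# Sample "file -i" style output lines.
sample='/home/user/imagelist.txt: text/plain; charset=utf-8
/home/schneidz/Documents/AES: image/jpeg; charset=binary
/home/schneidz/Documents/concur/fax.bmp: image/tiff; charset=binary'

# Plain "image" also matches the file NAME imagelist.txt: counts 3 lines.
printf '%s\n' "$sample" | grep -c 'image'

# Requiring ": image/" right before the type, anchored at end of line: counts 2.
printf '%s\n' "$sample" | grep -c ': image/[^;]*; charset=binary$'
```

A file name that itself contains ": image/" could still fool the second pattern. Asking file for the bare type (file -b --mime-type) avoids printing the name at all.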
 
1 members found this post helpful.
Old 02-24-2014, 07:46 AM   #12
schneidz
Senior Member
 
Registered: May 2005
Location: boston, usa
Distribution: fc-15/ fc-20-live-usb/ aix
Posts: 4,688

Rep: Reputation: 775
^ that still is undesired... i think i would have to revise my command in post #7:
Code:
find `pwd` -exec sh -c "file -i '{}' | grep image.*charset=binary$" \; -exec sh -c 'filename="${0##*/}"; ln -sf "$0" jamtat/`echo $filename | tr " " "-"`' {} \;
this should narrow down the results a bit...
i have yet to come across any error like in post #8. would you be able to post an example filename that it is tripping on so i can help debug ?

Last edited by schneidz; 02-24-2014 at 08:44 AM.
 
Old 02-24-2014, 09:59 AM   #13
jamtat
Member
 
Registered: Oct 2004
Distribution: ubuntu, arch
Posts: 60

Original Poster
Rep: Reputation: 4
Thanks for the improvement you've offered, schneidz. As to a sample of an instance where the error occurs, here's some additional context, showing the line that immediately precedes one of those error messages:
Quote:
/home/user/mnt/usb/artwork/CDs/Disc8/re-reforestation copy.tif: image/tiff; charset=binary
sh: -c: line 0: unexpected EOF while looking for matching `''
sh: -c: line 1: syntax error: unexpected end of file
In all, I get 5 such messages while processing the symlinks. There are nearly 500 image files being symlinked, if that number helps.

Last edited by jamtat; 02-24-2014 at 11:15 AM.
 
Old 02-24-2014, 10:09 AM   #14
jamtat
Member
 
Registered: Oct 2004
Distribution: ubuntu, arch
Posts: 60

Original Poster
Rep: Reputation: 4
By the way, I've settled on prepending the output of $(date +%N) to the symlink names as a way of ensuring that each is unique, even though the files to which they point might have the same exact name as other files residing in different directories among this collection. I tried $(date +%M%S)--more desirable because it prepends only four numerals (instead of nine) to each symlink name--but there were cases where two identically-named files might get processed within the same minute/second interval.

To show how that modification looks, using schneidz's latest revision, the command I would use follows:
Code:
find `pwd` -exec sh -c "file -i '{}' | grep image.*charset=binary$" \; -exec sh -c 'filename="${0##*/}"; ln -sf "$0" jamtat/`echo $(date +%N)-$filename | tr " " "-"`' {} \;

Last edited by jamtat; 02-24-2014 at 10:13 AM.
 
Old 02-24-2014, 11:13 AM   #15
jamtat
Member
 
Registered: Oct 2004
Distribution: ubuntu, arch
Posts: 60

Original Poster
Rep: Reputation: 4
I've manually confirmed that, indeed, no symlinks are being created in those 5 instances where I get the error message. Here is a bit of additional output, again slightly anonymized, preceding the other four instances in which I get the error:
Code:
/home/user/mnt/usb/artwork/CDs/Disc20/previous.gif: image/gif; charset=binary
sh: -c: line 0: unexpected EOF while looking for matching `''
sh: -c: line 1: syntax error: unexpected end of file
Code:
/home/user/mnt/usb/artwork/CDs/Disc3/Copy of Jpg thumbnail index/APR 28 2000 copy.psd altered may 3 copy.psd#3 copy.psd #2.JPG: image/jpeg; charset=binary
sh: -c: line 0: unexpected EOF while looking for matching `''
sh: -c: line 1: syntax error: unexpected end of file
Code:
/home/user/mnt/usb/artwork/CDs/Disc3/Jpg thumbnail index/APR 28 2000 copy.psd altered may 3 copy.psd#3 copy.psd #2.JPG: image/jpeg; charset=binary
sh: -c: line 0: unexpected EOF while looking for matching `''
sh: -c: line 1: syntax error: unexpected end of file
Code:
/home/user/mnt/usb/artwork/CDs/Disc3/Jpg thumbnail index/Paintings-actual paintings/1995-#108.jpg: image/jpeg; charset=binary
sh: -c: line 0: unexpected EOF while looking for matching `''
sh: -c: line 1: syntax error: unexpected end of file
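One plausible cause, not confirmed anywhere in this thread, is a file name containing an apostrophe. Because '{}' is pasted into the sh -c string, a single quote in the name would unbalance the quoting and produce exactly this "unexpected EOF" message. The message would then belong to the file processed after the line shown, not to the one printed before it. A sketch that sidesteps the problem passes the name as a positional argument instead. The scratch files, including the apostrophe name, are invented for the demo, and the bare-bones GIF header is assumed to be enough for file to report image/gif.

```bash
#!/usr/bin/env bash
# Scratch directories (demo only).
src=$(mktemp -d)
links=$(mktemp -d)
# GIF signature + screen descriptor + trailer; no real image data.
printf 'GIF89a\001\000\001\000\000\000\000;' > "$src/artist's copy.gif"
printf 'plain text\n' > "$src/imagelist.txt"

# The file name reaches sh as "$1", so quotes inside it are never parsed as shell syntax.
# "file -b --mime-type" prints only the type, so grep cannot match on the name.
find "$src" -type f -exec sh -c '
    file -b --mime-type "$1" | grep -q "^image/"
' _ {} \; -exec sh -c '
    name=${1##*/}
    ln -s "$1" "$2/$(printf "%s" "$name" | tr " " "-")"
' _ {} "$links" \;

ls "$links"
```

The second -exec only runs when the first one succeeds, so only files whose detected type starts with image/ get a symlink.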

Last edited by jamtat; 02-24-2014 at 11:17 AM.
 
  

