LinuxQuestions.org
Old 01-21-2010, 03:58 AM   #1
SilversleevesX
Member
 
Registered: May 2009
Posts: 181
Blog Entries: 9

Rep: Reputation: 15
Script to rename files according to filetype


I get a bit lazy using cURL, often letting downloads go without adding the proper extension to images and whatnot. Yes, I know about the -O flag, but I'm the kind who likes to give my files unique names right off the bat (until they puzzle out a way to "stamp" files with originating URLs across platforms, Classic MacOS Web-browser style, curl -o is the way to go).

So I was hoping someone could help me write a script (bash or python, doesn't matter which) that uses the file command to look inside files in a single directory and then renames the ones that don't have a 3- or 4-letter extension, appending the one that corresponds to the MIME type. This should only be done to files with no extension.

I know Linux and Unix don't bother much with extensions. I started home computing on a Mac IIcx -- that kind of intuition is beyond natural for me and very much appreciated. But I, and a few of my friends and family, also use Windows where file extensions are an idiot-proof way, if nothing else good can be said about the two, to get file X to open in application Y and not Q, W or Z (or, worse for some, no application at all).

I've seen a few scripts in different places, but they all seem to have the flaw of renaming files that already have an extension.

Much obliged for any help/pointers in the right direction.

BZT
 
Old 01-21-2010, 04:57 AM   #2
Agrouf
Senior Member
 
Registered: Sep 2005
Location: France
Distribution: LFS
Posts: 1,596

Rep: Reputation: 80
Code:
ls | grep -v "\....." | grep -v "\...." | while read f; do
   mv $f $f.$(file -i $f | sed "s/.*\///g")
done

Last edited by Agrouf; 01-21-2010 at 05:00 AM.
 
Old 01-21-2010, 05:33 AM   #3
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 682
Agrouf: Did you test your script? I always use "echo" before a command like mv to make sure it works ok.
Code:
ls | grep -v "\....." | grep -v "\...." | while read f; do echo   mv $f $f.$(file -i $f | sed "s/.*\///g"); done
mv 020209_christianbale 020209_christianbale.mpeg; charset=binary
mv 111-1112_IMG 111-1112_IMG.jpeg; charset=binary
mv 111-1116_IMG 111-1116_IMG.jpeg; charset=binary
mv tllts_330-12-02-09 tllts_330-12-02-09.mpeg; charset=binary
mv tllts_331-12-09-09 tllts_331-12-09-09.mpeg; charset=binary

Some items are close however. The $f variable needs double quoting.
The results for the tllts podcasts are mpeg instead of mp3 because file -i reports the formal MIME type for MP3s (audio/mpeg), not the common extension.
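
A minimal sketch of the two fixes mentioned above, assuming a file command whose -i output may carry a trailing "; charset=..." part. The helper names mime_subtype and preview_renames are made up for illustration. The loop only echoes the mv commands, and since it is pure string trimming, MP3s would still come out as "mpeg".

```shell
#!/bin/sh
# mime_subtype: hypothetical helper, not from the thread.
# Takes the MIME part of `file -i` output, e.g. "image/jpeg; charset=binary",
# and prints only the subtype ("jpeg").
mime_subtype() {
    mime=${1%%;*}                  # drop "; charset=..." if present
    printf '%s\n' "${mime##*/}"    # keep what follows the last slash
}

# preview_renames: dry run over the current directory. It prints the mv
# commands instead of executing them. Every use of $f is double-quoted so
# names with spaces survive.
preview_renames() {
    for f in *; do
        [ -f "$f" ] || continue
        case $f in *.*) continue ;; esac   # skip names that already contain a dot
        echo mv -- "$f" "$f.$(mime_subtype "$(file -b -i -- "$f")")"
    done
}
```

Calling preview_renames in a test directory and reading its output before removing the echo follows the same "echo first" habit described above.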

A multi-line sed or awk script would probably work out better.
file okvfi | sed '/JPEG image data/s/\(.*\):.*/mv "\1" "\1.jpg"/'
mv "okvfi" "okvfi.jpg"
file tllts_330-12-02-09 | sed '/MPEG ADTS, layer III/s/\(^[^:]*\):.*$/mv "\1" "\1.mp3"/'
mv "tllts_330-12-02-09" "tllts_330-12-02-09.mp3"

Here I found a mistake in my first line. The description could contain a ":", so I used "\(^[^:]*\):" to match the filename.
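
To illustrate the point about colons, here is a small sketch comparing the two patterns on a sample line. The filename "pic" and the description text are invented for the example; only sed is needed to run it.

```shell
#!/bin/sh
# Sample `file` output whose description itself contains colons.
line='pic: JPEG image data, comment: "CREATOR: gd-jpeg"'

# Greedy: \(.*\): runs up to the LAST colon, so \1 swallows part of the description.
greedy=$(printf '%s\n' "$line" | sed 's/\(.*\):.*/\1/')

# Anchored: ^\([^:]*\): stops at the FIRST colon, so \1 is just the filename.
first=$(printf '%s\n' "$line" | sed 's/^\([^:]*\):.*/\1/')

printf 'greedy: %s\nfirst:  %s\n' "$greedy" "$first"
```

The first-colon form still assumes the filename itself contains no colon.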

So the sed script would look like:
Code:
#n
/JPEG image data/s/\(.*\):.*$/mv "\1" "\1.jpg"/p
/ISO Media, MPEG v4/s/\(^[^:]*\):.*$/mv "\1" "\1.mp4"/p
/MPEG ADTS, layer III/s/\(^[^:]*\):.*$/mv "\1" "\1.mp3"/p

...

Create a line for each mime type and test it. Then add the sed command to a sed script.
Finally create a script and inspect it. I'd also recommend moving renamed files to another directory instead of simply renaming them.

/JPEG image data/s/\(.*\):.*/mv "\1" images\/"\1.jpg"/

Start with a list of files without extensions:

I would first generate a list of files and then run the sed script on it.

find . -type f -not -name "*.*" -print0 | xargs -0 file >descriptions
sed -f rename.sed descriptions >add_extensions.sh

This generates a script like this:
Code:
mv "./photo_2a" "./photo_2a.jpg"
mv "./okatx" "./okatx.jpg"
mv "./Toto - Africa" "./Toto - Africa.mp4"
mv "./okvfi" "./okvfi.jpg"
...
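
One case the generated lines above do not cover is a filename containing a double quote, a dollar sign or a backtick, since those are still interpreted inside double quotes. A hedged sketch of single-quote escaping that could be used when emitting the mv lines; sh_quote and emit_mv are hypothetical helpers, not part of the sed approach in this post.

```shell
#!/bin/sh
# sh_quote: hypothetical helper. Wraps its argument in single quotes and
# rewrites each embedded ' as '\'' so the result can be pasted into a script.
sh_quote() {
    printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# emit_mv NAME EXT: print one mv line for the generated script.
emit_mv() {
    printf 'mv -- %s %s\n' "$(sh_quote "$1")" "$(sh_quote "$1.$2")"
}

emit_mv './Toto - Africa' mp4
```

Filenames containing newlines are outside what this sketch tries to handle.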

Last edited by jschiwal; 01-21-2010 at 05:47 AM.
 
Old 01-21-2010, 06:50 AM   #4
Agrouf
Senior Member
 
Registered: Sep 2005
Location: France
Distribution: LFS
Posts: 1,596

Rep: Reputation: 80
Quote:
Originally Posted by jschiwal View Post
Agrouf: Did you test your script?
I tested on RHEL 5.4, with file 4.17. It does not output the charset.
I posted the script without thinking too much about whether it always works, though. I expect people to test it before using it. I should put that in my signature.
Quote:
I always use "echo" before a command like mv to make sure it works ok.
Very good idea.

Never ever trust a script you find on a forum on the internet. If you don't understand it, don't use it and if you do understand it, test it before using it for real. In any case, always use your brain when dealing with your data. The brain of anonymous people on the internet is not a substitute for your own, never.
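
The "echo before mv" habit can be built into a script as a switch. A minimal sketch, assuming a made-up wrapper called run and a made-up DRY_RUN variable; neither comes from the scripts in this thread.

```shell
#!/bin/sh
# run: hypothetical wrapper. With DRY_RUN unset or set to 1 it only prints
# the command; with DRY_RUN=0 it actually executes it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf 'would run:'
        printf ' %s' "$@"
        printf '\n'
    else
        "$@"
    fi
}

run mv -- "old name" "old name.jpg"
```

Defaulting to the dry run means a forgotten setting prints commands instead of renaming files.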

Last edited by Agrouf; 01-21-2010 at 06:54 AM.
 
Old 01-21-2010, 06:57 AM   #5
neonsignal
Senior Member
 
Registered: Jan 2005
Location: Melbourne, Australia
Distribution: Debian Bookworm (Fluxbox WM)
Posts: 1,391
Blog Entries: 54

Rep: Reputation: 360
Unholy one liner.

Code:
find . -type f -regex '.*/[^.]*' -exec sh -c \
'mv "{}" "{}"$(grep $(file -i -b "{}" | grep -o "^[^ ]*") /usr/share/mime/globs | tail -1 | grep -o "[^*]*$")' \;
For each file in the directory or subdirectories that doesn't have a period in the name (find . -type f -regex '.*/[^.]*'), this performs a move command (-exec sh -c 'mv "{}" "{}"ext' \;).

The script for the extension works by grabbing the mime type (file -i -b), stripping any trailing stuff after the space, and looking up the last entry in the globs database to find the extension.
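
The lookup step can also be sketched as a standalone function, assuming the globs file holds lines of the form "type/subtype:*.ext". The names ext_for_mime and mime_of are hypothetical. Unlike a plain grep, this compares the MIME type exactly and strips any "; charset=..." suffix first.

```shell
#!/bin/sh
# ext_for_mime MIME [GLOBS_FILE]: hypothetical helper.
# Prints the extension (with leading dot) from the last "type:*.ext" line
# whose type equals MIME exactly; prints nothing if there is no match.
ext_for_mime() {
    awk -F: -v want="$1" '
        $1 == want && index($2, "*.") == 1 { ext = substr($2, 2) }
        END { if (ext != "") print ext }
    ' "${2:-/usr/share/mime/globs}"
}

# mime_of FILE: MIME type only, without "; charset=...".
mime_of() {
    file -b -i -- "$1" | sed 's/;.*//'
}

# Example (prints the command rather than running it):
#   ext=$(ext_for_mime "$(mime_of "$f")") && echo mv -- "$f" "$f$ext"
```

Passing the filename in as an argument, rather than substituting it into the sh -c string, also keeps odd characters in names from being interpreted by the shell.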

Yes I know, it would all go better with Sed/Awk/Perl.

Last edited by neonsignal; 01-21-2010 at 07:09 AM.
 
Old 01-21-2010, 08:33 AM   #6
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 682
neonsignal:

Code:
mv ./photo_2a ./photo_2a
mv ./okatx ./okatx
mv ./Toto - Africa ./Toto - Africa
mv ./okvfi ./okvfi
mv ./hpr0426 ./hpr0426
mv ./111-1116_IMG ./111-1116_IMG
mv ./photo_7a ./photo_7a
...
I did look at the mime type output, but saw a lot of overlap. 451 out of 867 entries.
Perhaps using -i and separate awk or sed commands would work OK.

Code:
video/mpeg:*.m2t
video/mpeg:*.mp2
video/mpeg:*.mpe
video/mpeg:*.mpeg
video/mpeg:*.mpg
video/mpeg:*.vob
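
A quick way to see how much overlap a given globs file has is to count the MIME types that appear on more than one line. This is only a sketch with a made-up function name, and it is not necessarily how the figures quoted above were obtained.

```shell
#!/bin/sh
# count_shared_types [GLOBS_FILE]: hypothetical helper.
# Prints how many MIME types appear on more than one "type:*.ext" line.
count_shared_types() {
    grep -v '^#' "${1:-/usr/share/mime/globs}" |
        cut -d: -f1 | sort | uniq -d | wc -l | tr -d ' '
}
```

Dropping the final wc and tr stages lists the shared types themselves, which shows where a preferred extension has to be chosen by hand.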
 
Old 01-21-2010, 09:48 AM   #7
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 682
neonsignal inspired me to generate sed commands from the glob file.

Run it like:
find . -not -name "*.*" -printf "%f\n" | tr '\n' '\0' | xargs -0 file -i | sed -f addext.sed >add_extensions.sh

Comment out extensions you don't want (in addext.sed) such as "jpe" and "jpeg" if you want "jpg" instead.
Reading the generated shell script will clue you in on what you may want to comment out. Look for that extension. What you want to comment is probably right above it.
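
In a sed script, a line starting with # is a comment, so commenting out a rule means putting # in front of it. A small sketch with two invented rules; the real addext.sed may be laid out differently. The "#n" on the first line is the script-file equivalent of sed -n.

```shell
#!/bin/sh
# demo_sed_comments: hypothetical demo. Writes a two-rule sed script in which
# the first rule is commented out, then runs it on one sample line.
demo_sed_comments() {
    script=$(mktemp)
    cat > "$script" <<'EOF'
#n
#/image\/jpeg/s/^\([^:]*\):.*$/mv "\1" "\1.jpeg"/p
/image\/jpeg/s/^\([^:]*\):.*$/mv "\1" "\1.jpg"/p
EOF
    printf '%s\n' 'okvfi: image/jpeg; charset=binary' | sed -f "$script"
    rm -f "$script"
}

demo_sed_comments
```

Only the .jpg rule fires, so a single mv line is printed.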
Attached Files
File Type: txt addext.sed.txt (51.7 KB, 69 views)

Last edited by jschiwal; 01-21-2010 at 09:58 AM.
 
Old 01-23-2010, 02:23 AM   #8
SilversleevesX
Member
 
Registered: May 2009
Posts: 181

Original Poster
Blog Entries: 9

Rep: Reputation: 15
some code from early on the 21st worked, almost.

Thanks to you all for putting in the effort to get some kind of script puzzled out to help me with this.

Using jschiwal's short list of MIME-types for my rename.sed file, I executed the find command he suggested, thus:

find . -type f -not -name "*.*" -print0 | xargs -0 file >descriptions

...which gave me a descriptions file that included one line in the list that looked like this:

Code:
./ex----e-31-51407-0:  JPEG image data, JFIF standard 1.01, comment: "CREATOR: gd-jpeg v1.0 (using IJ"
...which my add_extensions.sh script could not act upon, as there was too much information to rename it with a file extension. The generated add_extensions.sh script (since corrected) had as its corresponding line one that ended with the extra "COMMENT:" info instead of a discrete name ending in ".jpg", so mv gave an error saying it had no destination (or new name) to apply to the file.

It looks as though, for a few mime-types, some more precise string isolation (sed? awk?) will be necessary. When I do file -i on the command-line, it returns data separated by a semicolon and one space:

>> file -i foo.jpg
foo.jpg: image/jpeg; charset=binary

...yet in the descriptions file, the format was different, but only in terms of the punctuation (the command line returned a semicolon; lines in the descriptions file used commas). Fair to call these delimiters? In any case, whichever is easier to parse, the failure-proof way to go about it would be to cut the file output off after the first entry (either JPEG image data or image/jpeg in my case) and pass only that much on to the add_extensions.sh script in the form of a mv command. Can a -print0 | xargs -0 pairing "cut it any finer" than in jschiwal's code, or would we have to go with something else?

Again, thanks for all the help.

BZT
 
Old 01-26-2010, 09:56 PM   #9
SilversleevesX
Member
 
Registered: May 2009
Posts: 181

Original Poster
Blog Entries: 9

Rep: Reputation: 15
Reading further (and downloading along with!)...

...I notice that the snag described in my last post is overcome with jschiwal's very latest code (post from 21 Jan @ 10:48AM) as well as his superbly complete addext.sed file. Many thanks for both.

But why "comment out" file extensions you don't want to use? Why not simply delete them from the sed file altogether? Of course, you'll want to cp a backup or "twin" the original into another directory. If all else fails, and you lose track of said backups, log on to LQ Forum and dl another copy.

But while I'm on the subject: would one use # marks as with most "commenting out" in shell scripts and other files (.bash_profile comes to mind unbidden), or something else?

BZT

Last edited by SilversleevesX; 01-26-2010 at 09:57 PM. Reason: Bad grammar in first paragraph.
 
Old 01-28-2010, 12:30 AM   #10
SilversleevesX
Member
 
Registered: May 2009
Posts: 181

Original Poster
Blog Entries: 9

Rep: Reputation: 15
Some JPEG2000 files may be a problem.

I observed that the file command returns two different descriptions for JPEG2000 files created by XnView.

file grut.jp2 returns
Code:
JPEG-2000 Code Stream Bitmap data
...while file -i grut.jp2 gives back
Code:
application/octet-stream; charset=binary
As the command from jschiwal's post (late on the 21st) includes the latter usage of file, and that "application/octet-stream," in addext.sed is tied to the extension .bin, this means that JPEG2000s will end up with that extension, instead of the correct one for that file type.

Again, to this point, I've only observed this with files saved as JPEG2000 in XnView. I'll probably come back to comment or edit this post when I've been able to see what other graphics apps' JP2s "look like" on the command line. If I don't, one can safely assume that JPEG2000s may be a problem, and should be double-checked by some means or method outside the scripts and commands we've been puzzling out in this thread.

BZT
 
Old 02-28-2010, 02:34 PM   #11
SilversleevesX
Member
 
Registered: May 2009
Posts: 181

Original Poster
Blog Entries: 9

Rep: Reputation: 15
Reviving the thread to ask a few questions.

I got something of a positive answer to the question implied (inferred? never got those right) in my previous post here.


Quote:
Originally Posted by SilversleevesX View Post
I observed that the file command returns two different descriptions for JPEG2000 files created by XnView.

file grut.jp2 returns
Code:
JPEG-2000 Code Stream Bitmap data
...while file -i grut.jp2 gives back
Code:
application/octet-stream; charset=binary
<snip>

BZT
To quote (and save you a jump off-forum):
Quote:
Originally Posted by MItaly, IrfanView Forums
The two different outputs for me are due to the fact that the -i switch makes file output a mime string, that should be compatible with most clients; since JPEG2000 isn't widely supported, it's passed simply as a generic octet stream.
Which puts one of the script versions in a somewhat awkward place, since when I look for an entry (in my slightly-trimmed copy of jschiwal's excellent addext.sed) for /application/octet-stream/ , I see that it's matched with the extension .bin. I remember some mention was made of the fact that several MIME types returned by file -i commands returned identical strings, and it's likely this was one of those times.

For myself it's no problem. I generally don't use cURL anywhere that I'm all too likely to find .bin files (or even .jp2s for that matter). However, I conceived of this script as one for a greater number of users than just myself, so I wanted to know:
  • How likely is it, or how often does it happen, that other L/Unix users run into ".bin" files that aren't already clearly marked as such in an archive or installer?
The less likely or less often, the better, to be blunt about it. This might leave the field open, so to say, to change that MIME entry in addext.sed permanently to ".jp2".

All right, one obstacle out of the way. Now to tackle the annoying libgd comment tag.

BZT
 
Old 02-28-2010, 03:18 PM   #12
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
Quote:
Originally Posted by SilversleevesX View Post
  • How likely is it, or how often does it happen, that other L/Unix users run into ".bin" files that aren't already clearly marked as such in an archive or installer?
The less likely or less often, the better, to be blunt about it. This might leave the field open, so to say, to change that MIME entry in addext.sed permanently to ".jp2".
For what it's worth: I don't have a single JPEG2000 file
among my downloads, but *.deb, *.exe, *.mdi and varied
other data files including passwordsafe data match the
octet-binary combo.


I think you should take a two-step approach: tackle
files that have clear matches using the 'file -i' method,
and fall back to 'file' where octet-binary is reported by
the first.
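
A minimal sketch of that two-step idea, assuming the caller has already collected both the 'file -i' output and the plain 'file' description for a file. The function name pick_ext and the handful of mappings are examples only, not a complete table.

```shell
#!/bin/sh
# pick_ext MIME DESCRIPTION: hypothetical helper.
# Step 1 trusts clear MIME types; step 2 falls back to the plain `file`
# description only when the MIME type is the generic octet-stream.
pick_ext() {
    case ${1%%;*} in
        image/jpeg) echo jpg ;;
        image/png)  echo png ;;
        audio/mpeg) echo mp3 ;;
        application/octet-stream)
            case $2 in
                'JPEG-2000 Code Stream'*) echo jp2 ;;
                *) return 1 ;;            # unknown binary: leave the file alone
            esac ;;
        *) return 1 ;;                    # no mapping: leave the file alone
    esac
}

# Example with a real file (prints the command rather than running it):
#   ext=$(pick_ext "$(file -b -i -- "$f")" "$(file -b -- "$f")") &&
#       echo mv -- "$f" "$f.$ext"
```

Returning a non-zero status for anything unrecognised means generic binaries such as installers are simply skipped instead of being given a guessed extension.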



Cheers,
Tink
 
Old 02-28-2010, 03:31 PM   #13
SilversleevesX
Member
 
Registered: May 2009
Posts: 181

Original Poster
Blog Entries: 9

Rep: Reputation: 15
Found: a way to 'ignore' all comment tags

Quote:
Originally Posted by SilversleevesX View Post
Thanks to you all for putting in the effort to get some kind of script puzzled out to help me with this.

Using jschiwal's short list of MIME-types for my rename.sed file, I executed the find command he suggested, thus:

find . -type f -not -name "*.*" -print0 | xargs -0 file >descriptions

...which gave me a descriptions file that included one line in the list that looked like this:

Code:
./ex----e-31-51407-0:  JPEG image data, JFIF standard 1.01, comment: "CREATOR: gd-jpeg v1.0 (using IJ"
...which my add_extensions.sh script could not act upon, as there was too much information to rename it with a file extension. The generated add_extensions.sh script (since corrected) had as its corresponding line one that ended with the extra "COMMENT:" info instead of a discrete name ending in ".jpg", so mv gave an error saying it had no destination (or new name) to apply to the file.
That was a puzzler, indeed.


Quote:
Originally Posted by SilversleevesX View Post
It looks as though, for a few mime-types, some more precise string isolation (sed? awk?) will be necessary.
Neither, as it turns out. file -b and cut -d, -f1 work just as well. And this also eliminates, to some degree, the problem of duplication of mime types turned out by file -i. Except for such things as previous "descriptions" files, so far file -b gives a nice description of every file type I've tested it on. Chops to the folks who keep magic.mgc current!

So anyway, I tried file -b and cut -d, -f1 on the script that went:
Code:
find . -type f -not -name "*.*" -print0 | xargs -0 file -b | cut -d, -f1>descriptions
sed -f rename.sed descriptions >add_extensions.sh
and ended up with a four-line descriptions file, but an empty add_extensions.sh file.

Maybe the rename.sed file, edited again to look like this...
Code:
#n
/JPEG image data/s/\(.*\):.*$/mv "\1" "\1.jpg"/p
/JPEG-2000 Code Stream Bitmap data/s/\(.*\):.*$/mv "\1" "\1.jp2"/p
/PNG image/s/\(.*\):.*$/mv "\1" "\1.png"/p
/TIFF image data/s/\(.*\):.*$/mv "\1" "\1.tif"/p
/ISO Media, MPEG v4/s/\(^[^:]*\):.*$/mv "\1" "\1.mp4"/p
/MPEG ADTS, layer III/s/\(^[^:]*\):.*$/mv "\1" "\1.mp3"/p
...needs a bit of tweaking?

BZT

Last edited by SilversleevesX; 03-01-2010 at 06:31 AM. Reason: Forgot to include recent sed file edit.
 
Old 03-01-2010, 07:42 AM   #14
SilversleevesX
Member
 
Registered: May 2009
Posts: 181

Original Poster
Blog Entries: 9

Rep: Reputation: 15
Still can't get anything in add_extensions.sh!

I thought it might have to do with the filename being missing. The sed file obviously needs an original name (w/o extension) for every file it finds, or else the 'mv' command prescribed in rename.sed doesn't know which file to work on.

I also wondered whether it had to do with the length of the strings in the descriptions file that the rename.sed rules had to process.

So I went back to my previous idea of "fine-tuning" a vanilla "file" command. The following is what I came up with to "trim down" its output to strictly something the script could use without getting tangled by misplaced (read: libgd and some websites') comment headers in JPEG files.
Code:
OLDIFS=$IFS
IFS=$'\ \t\n'
typon=$(file -F : hv7918-861)
nameit=$(echo -ne $typon | cut -d: -f1)
relevant=$(echo -ne $typon | cut -d: -f2)
crucial=$(echo -ne $relevant | cut -d, -f1)
forwardpass=$(echo -ne "$nameit: $crucial")
echo "$forwardpass is ${#forwardpass} characters long."
IFS=$OLDIFS
The variable forwardpass is so named because I think this is as much data as the sed script needs to create commands in add_extensions.sh. In other words, what to pass forward to make that shell script.
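
The same trimming can also be sketched with plain shell parameter expansion, without changing IFS or calling cut. The helper name trim_description is made up, and the sketch assumes the filename itself contains no ": " sequence. The sample line is modelled on the kind of file output quoted earlier.

```shell
#!/bin/sh
# trim_description LINE: hypothetical helper.
# Given one line of `file` output ("name: field1, field2, ..."), print
# "name: field1".
trim_description() {
    name=${1%%: *}        # text before the first ": "
    rest=${1#*: }         # text after the first ": "
    printf '%s: %s\n' "$name" "${rest%%,*}"   # keep only the first comma field
}

trim_description 'hv7918-861: JPEG image data, JFIF standard 1.01, comment: "CREATOR: gd-jpeg v1.0"'
```

Because the expansions are quoted, colons and quote marks later in the description never reach the part that is kept.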

Somewhere this code breaks down. I'm pretty sure I'm on the right track, but there's no doubt something's catching me up.

Advice, please?

BZT
 
Old 03-01-2010, 03:23 PM   #15
SilversleevesX
Member
 
Registered: May 2009
Posts: 181

Original Poster
Blog Entries: 9

Rep: Reputation: 15
Worked out the bugs, broke away from "find"

The working and final (I hope) script is a touch slower by reason of not using the find command, but by moving to a for-do-done loop, I was able to introduce the variables from that "play" script in my previous post. The immediate result was a descriptions file that contained the filename and the next two fields returned by find, formatted identically to what was produced by previous versions of the script. The other result was an add_extensions.sh file that actually had commands in it.
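
For readers without the attachment, a loop of this general shape might look like the sketch below. It is not the attached extension-add.sh; the function name describe_dir is invented, and it keeps only the first comma-separated field of each description.

```shell
#!/bin/sh
# describe_dir [DIR]: hypothetical sketch, not the attached script.
# For each regular file in DIR whose name has no dot, print
# "path: first field of its `file` description", one per line.
describe_dir() {
    for f in "${1:-.}"/*; do
        [ -f "$f" ] || continue
        case ${f##*/} in *.*) continue ;; esac   # already has an extension
        desc=$(file -b -- "$f")
        printf '%s: %s\n' "$f" "${desc%%,*}"
    done
}

# Typical use, mirroring the two-step workflow in the thread:
#   describe_dir . > descriptions
#   sed -f rename.sed descriptions > add_extensions.sh
```

As before, the generated add_extensions.sh is meant to be read before it is run.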

To close out this thread, I'm pasting in the contents of my edited rename.sed file and attaching the final script as a text file.

Code:
#n
/JPEG image data/s/\(.*\):.*$/mv "\1" "\1.jpg"/p
/JPEG-2000 Code Stream Bitmap data/s/\(.*\):.*$/mv "\1" "\1.jp2"/p
/PNG image/s/\(.*\):.*$/mv "\1" "\1.png"/p
/TIFF image data/s/\(.*\):.*$/mv "\1" "\1.tif"/p
/ISO Media, MPEG v4/s/\(^[^:]*\):.*$/mv "\1" "\1.mp4"/p
/MPEG ADTS, layer III/s/\(^[^:]*\):.*$/mv "\1" "\1.mp3"/p
Thanks to all the members who helped me get this to a point of (if not at then very near) completion.

BZT
Attached Files
File Type: txt extension-add.sh.txt (521 Bytes, 37 views)

Last edited by SilversleevesX; 03-01-2010 at 03:30 PM. Reason: not not = not
 
  


All times are GMT -5. The time now is 05:22 PM.
