Script to rename files according to filetype
I get a bit lazy with cURL and often let downloads go without adding the proper extension to images and whatnot. Yes, I know about the -O flag, but I like to give my files unique names right off the bat. Until someone works out a way to "stamp" files with their originating URLs across platforms, the way Classic Mac OS web browsers did, curl -o is the way to go.
So I was hoping someone could help me write a script (bash or Python, doesn't matter which) that uses the file command to look inside the files in a single directory. For each file it finds without a 3- or 4-letter extension, it should append the extension that corresponds to the file's MIME type. This should happen only to files with no extension. I know Linux and Unix don't bother much with extensions. I started home computing on a Mac IIcx, so that kind of intuition comes naturally to me and I appreciate it. But I, and a few of my friends and family, also use Windows. There, file extensions are an idiot-proof way (if nothing else good can be said about the two) to get file X to open in application Y and not Q, W or Z (or, worse for some, in no application at all). I've seen a few scripts in different places, but they all seem to share one flaw: they rename files that already have an extension. Much obliged for any help or pointers in the right direction.
BZT
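Below is a minimal dry-run sketch of the requested behaviour in POSIX sh. It assumes your file(1) accepts `-b --mime-type`, which GNU/Linux and recent macOS builds do. The MIME-to-extension table in `ext_for_mime` is a small hypothetical sample, not a complete mapping. `needs_ext`, `ext_for_mime`, `main` and `DRY_RUN` are all illustrative names. By default the script only echoes the `mv` commands, so you can inspect them before anything is renamed.

```shell
#!/bin/sh
# Sketch: add an extension to files in the current directory that lack one.
# Assumes file(1) accepts "-b --mime-type". The MIME table is a small,
# hypothetical sample. By default main only echoes the mv commands (dry run);
# call it as "DRY_RUN=0 main" to actually rename.

# needs_ext NAME: status 0 if NAME does not end in a 3- or 4-character extension
needs_ext() {
    case $1 in
        *.[[:alnum:]][[:alnum:]][[:alnum:]]) return 1 ;;
        *.[[:alnum:]][[:alnum:]][[:alnum:]][[:alnum:]]) return 1 ;;
        *) return 0 ;;
    esac
}

# ext_for_mime TYPE: print an extension for TYPE, or nothing if it is unknown
ext_for_mime() {
    case $1 in
        image/jpeg) echo jpg ;;
        image/png) echo png ;;
        image/gif) echo gif ;;
        audio/mpeg) echo mp3 ;;
        application/pdf) echo pdf ;;
    esac
}

main() {
    for f in *; do
        [ -f "$f" ] || continue      # skip directories (and "*" in an empty dir)
        needs_ext "$f" || continue   # leave files that already have an extension
        ext=$(ext_for_mime "$(file -b --mime-type -- "$f")")
        [ -n "$ext" ] || continue    # unknown type: leave the file alone
        if [ "${DRY_RUN-1}" = 1 ]; then
            echo mv -- "$f" "$f.$ext"
        else
            mv -i -- "$f" "$f.$ext"  # -i: ask before overwriting anything
        fi
    done
}
```

You might save it as, say, addext.sh (a hypothetical name), source it with `. ./addext.sh`, and run `main` to preview. Once the output looks right, run `DRY_RUN=0 main`. Because the loop quotes every use of `$f`, names with spaces survive. Printing the commands first follows the "echo before mv" habit recommended later in this thread.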
Code:
ls | grep -v "\....." | grep -v "\...." | while read f; do mv $f $f.$(file -i $f | sed "s/.*\///g"); done
Agrouf: Did you test your script? I always use "echo" before a command like mv to make sure it works ok.
Code:
ls | grep -v "\....." | grep -v "\...." | while read f; do echo mv $f $f.$(file -i $f | sed "s/.*\///g"); done
Some items are close, however. The $f variable needs double quoting. The tllts podcasts come out as mpeg instead of mp3, because the MIME type file -i reports for MP3s is the formal audio/mpeg. A multi-line sed or awk script would probably work out better.
file okvfi | sed '/JPEG image data/s/\(.*\):.*/mv "\1" "\1.jpg"/'
mv "okvfi" "okvfi.jpg"
file tllts_330-12-02-09 | sed '/MPEG ADTS, layer III/s/\(^[^:]*\):.*$/mv "\1" "\1.mp3"/'
mv "tllts_330-12-02-09" "tllts_330-12-02-09.mp3"
Here I found a mistake in my first line: the description could contain a ":", so I used "\(^[^:]*\):" to match the filename. So the sed script would look like:
Code:
#n
Create a line for each MIME type and test it, then add the sed command to a sed script. Finally, create a script and inspect it. I'd also recommend moving renamed files to another directory instead of simply renaming them:
/JPEG image data/s/\(^[^:]*\):.*$/mv "\1" images\/"\1.jpg"/p
I would first generate a list of files and then run the sed script on it. Start with a list of files without extensions:
find . -type f -not -name "*.*" -print0 | xargs -0 file >descriptions
sed -f rename.sed descriptions >add_extensions.sh
This generates a script like this:
Code:
mv "./photo_2a" "./photo_2a.jpg"
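Here is a sketch of how a complete rename.sed in this style might fit together. Only the "JPEG image data" and "MPEG ADTS, layer III" descriptions come from this thread; any other rules are yours to add. Because of the `#n` header, sed prints nothing by default, so each rule needs the `p` flag. The `^\([^:]*\):` capture stops at the first colon, so a comment inside the description cannot leak into the filename. Filenames that themselves contain a colon would still break it. `write_rename_sed` and `make_rename_script` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: a rename.sed in the style described above, applied to a descriptions
# file made by: find . -type f -not -name "*.*" -print0 | xargs -0 file >descriptions
# Only the two description strings quoted in this thread are included.

# write_rename_sed: print the sed script; the quoted 'EOF' keeps the shell
# from touching the backslashes
write_rename_sed() {
    cat <<'EOF'
#n
/: JPEG image data/s/^\([^:]*\):.*$/mv "\1" "\1.jpg"/p
/: MPEG ADTS, layer III/s/^\([^:]*\):.*$/mv "\1" "\1.mp3"/p
EOF
}

# make_rename_script DESCRIPTIONS: print the mv commands for review
make_rename_script() {
    sedfile=$(mktemp) || return 1
    write_rename_sed >"$sedfile"
    sed -f "$sedfile" "$1"
    rm -f "$sedfile"
}
```

The first rule that fires replaces the line with an `mv` command, so later rules no longer match that line. Rule order therefore decides which extension wins.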
I posted the script without thinking too much about whether it always works. I expect people to test it before using it; I should put that in my signature.
Quote:
Never, ever trust a script you find on an internet forum. If you don't understand it, don't use it, and if you do understand it, test it before using it for real. In any case, always use your brain when dealing with your data. The brain of an anonymous person on the internet is never a substitute for your own.
Unholy one-liner.
Code:
find . -type f -regex '.*/[^.]*' -exec sh -c \
The script for the extension works by grabbing the MIME type (file -i -b), stripping everything after the space, and looking up the last entry in the globs database to find the extension. Yes, I know, it would all go better with sed/awk/Perl.
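The one-liner above is cut off, so what follows is not neonsignal's code. It is only a sketch of the approach he describes: get the MIME type from `file -b -i`, drop the `; charset=...` part, and take the extension from the last matching entry in the globs file. It assumes the freedesktop.org shared-mime-info globs file sits at /usr/share/mime/globs with lines of the form `type/subtype:*.ext`; the path varies between systems. `GLOBS`, `glob_ext_for_mime` and `glob_ext_for_file` are hypothetical names.

```shell
#!/bin/sh
# Sketch of the globs-lookup idea (not the original one-liner).
# GLOBS defaults to the usual shared-mime-info location, an assumption;
# lines there look like "image/jpeg:*.jpg".
GLOBS=${GLOBS:-/usr/share/mime/globs}

# glob_ext_for_mime TYPE: print the extension from the LAST simple "*.ext"
# glob listed for TYPE, or nothing if there is none
glob_ext_for_mime() {
    awk -F: -v m="$1" '
        $1 == m && $2 ~ /^\*\.[A-Za-z0-9]+$/ { ext = substr($2, 3) }
        END { if (ext != "") print ext }
    ' "$GLOBS"
}

# glob_ext_for_file FILE: MIME type from "file -b -i", minus "; charset=..."
glob_ext_for_file() {
    mime=$(file -b -i -- "$1")
    mime=${mime%%;*}
    glob_ext_for_mime "$mime"
}
```

Taking the last entry is the weak spot. It is how a rarely used extension such as .m2t can win for video/mpeg. Taking the first entry instead only means changing the awk rule to keep the first match it sees.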
neonsignal:
Code:
mv ./photo_2a ./photo_2a
Perhaps using -i and separate awk or sed commands would work OK.
Code:
video/mpeg:*.m2t
neonsignal inspired me to generate sed commands from the glob file.
Run it like:
find . -maxdepth 1 -type f -not -name "*.*" -printf "%f\0" | xargs -0 file -i | sed -f addext.sed >add_extensions.sh
Comment out extensions you don't want (in addext.sed), such as "jpe" and "jpeg" if you want "jpg" instead. Reading the generated shell script will clue you in on what you may want to comment out: look for that extension, and what you want to comment out is probably right above it.
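The attached addext.sed is not reproduced here, so this is only a guess at how sed commands could be generated from a globs file. Each simple `type:*.ext` line becomes one sed substitution that rewrites a `file -i` line such as `photo_2a: image/jpeg; charset=binary` into an `mv` command. `globs_to_sed` is a hypothetical name. The globs format is assumed to be the shared-mime-info one used in the earlier sketch of neonsignal's idea.

```shell
#!/bin/sh
# Hypothetical generator: turn glob lines such as
#   image/jpeg:*.jpg
# into sed rules that rewrite "file -i" output lines such as
#   photo_2a: image/jpeg; charset=binary
# into mv commands. Usage: globs_to_sed /usr/share/mime/globs >addext.sed

globs_to_sed() {
    awk -F: '
        BEGIN { print "#n" }                 # suppress sed default output
        /^#/ { next }                        # skip comment lines
        $2 ~ /^\*\.[A-Za-z0-9]+$/ {          # only simple "*.ext" globs
            mime = $1
            gsub(/\./, "[.]", mime)          # make dots literal in the regex
            ext = substr($2, 3)
            printf "s|^\\([^:]*\\): %s\\(;.*\\)\\{0,1\\}$|mv \"\\1\" \"\\1.%s\"|p\n", mime, ext
        }
    ' "$1"
}
```

Each generated rule uses `|` as its delimiter, so the `/` in MIME types needs no escaping. After one rule fires, the line no longer looks like `file -i` output and later rules skip it. That is why commenting out the lines above the extension you prefer is enough to pick it.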
Some code from early on the 21st worked, almost.
Thanks to you all for putting in the effort to get some kind of script puzzled out to help me with this.
Using jschiwal's short list of MIME types for my rename.sed file, I ran the find command he suggested:
find . -type f -not -name "*.*" -print0 | xargs -0 file >descriptions
...which gave me a descriptions file that included one line that looked like this:
Code:
./ex----e-31-51407-0: JPEG image data, JFIF standard 1.01, comment: "CREATOR: gd-jpeg v1.0 (using IJ"
It looks as though, for a few MIME types, some more precise string isolation (sed? awk?) will be necessary. When I run file -i on the command line, it returns data separated by a semicolon and one space:
file -i foo.jpg
foo.jpg: image/jpeg; charset=binary
...yet in the descriptions file, the format was different, but only in terms of punctuation: the command line returned semicolons, while lines in the descriptions file used commas. Fair to call these delimiters? Whichever is easier to parse, the failure-proof approach would be to cut the file -i output off after the first entry (JPEG image data or image/jpeg in my case) and pass only that much on to the add_extensions.sh script in the form of a mv command. Can a -print0 | xargs -0 pairing "cut it any finer" than in jschiwal's code, or would we have to go with something else?
Again, thanks for all the help.
BZT
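On the "cut it any finer" question: -print0 and xargs -0 only change how filenames are handed to file, not what file prints. So any trimming has to happen on file's output. Below is a sketch using shell parameter expansion that keeps the filename (everything before the first colon) and the first field of the description. It cuts at the first comma or semicolon, so it handles both plain file and file -i output. It assumes filenames contain no colons. `split_description` and `trim_descriptions` are hypothetical names.

```shell
#!/bin/sh
# Sketch: shorten file(1) output lines to "NAME: FIRST-FIELD".
# Assumes filenames contain no colon; the comment inside a JPEG description
# may contain colons and commas, which is why only the first colon is used.

# split_description LINE: print the filename and the first description field
split_description() {
    name=${1%%:*}          # everything before the first colon
    rest=${1#*: }          # everything after the first ": "
    first=${rest%%[,;]*}   # cut at the first comma or semicolon
    printf '%s: %s\n' "$name" "$first"
}

# trim_descriptions FILE: apply split_description to every line of FILE
trim_descriptions() {
    while IFS= read -r line; do
        split_description "$line"
    done <"$1"
}
```

Run against the libgd line above, split_description keeps `./ex----e-31-51407-0: JPEG image data` and drops the comment entirely.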
Reading further (and downloading along with!)...
...I notice that the snag described in my last post is overcome with jschiwal's very latest code (post from 21 Jan @ 10:48AM) as well as his superbly complete addext.sed file. Many thanks for both.
But why "comment out" file extensions you don't want to use? Why not simply delete them from the sed file altogether? Of course, you'll want to cp a backup or "twin" the original into another directory. If all else fails and you lose track of said backups, log on to the LQ forum and download another copy. While I'm on the subject: would one use # marks, as with most "commenting out" in shell scripts and other files (.bash_profile comes to mind unbidden), or something else?
BZT
Some JPEG2000 files may be a problem.
I've observed that the file command returns two different descriptions for JPEG2000 files created by XnView.
file grut.jp2 returns
Code:
JPEG-2000 Code Stream Bitmap data
while file -i grut.jp2 returns
Code:
application/octet-stream; charset=binary
Again, so far I've only observed this with files saved as JPEG2000 in XnView. I'll probably come back to comment or edit this post once I've seen what other graphics apps' JP2s "look like" on the command line. If I don't, assume that JPEG2000 files may be a problem and should be double-checked by some means outside the scripts and commands we've been puzzling out in this thread.
BZT
Reviving the thread to ask a few questions.
I got something of a positive answer to the question implied (inferred? never got those right) in my previous post here.
For myself it's no problem. I generally don't use cURL anywhere I'm all too likely to find .bin files (or even .jp2s, for that matter). However, I conceived of this script for more users than just myself, so I wanted to know:
All right, one obstacle out of the way. Now to tackle the annoying libgd comment tag.
BZT
...among my downloads, but *.deb, *.exe, *.mdi and various other data files, including Password Safe data, match the application/octet-stream; charset=binary combination. I think you should take a two-step approach: handle files that have clear matches with the 'file -i' method, and fall back to plain 'file' where the first reports application/octet-stream.
Cheers,
Tink
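Here is a sketch of that two-step idea. The MIME type is trusted when it is specific, and the plain description is consulted only when `file` reports application/octet-stream. The mappings are a hypothetical sample. The only fallback pattern included is the XnView JP2 description quoted earlier in this thread. `choose_ext` and `guess_ext` are illustrative names, and `file -b --mime-type` is assumed to be available.

```shell
#!/bin/sh
# Sketch of the two-step idea: trust the MIME type when it is specific, and
# consult the plain description only for application/octet-stream.
# The mappings are a hypothetical sample; "JPEG-2000 Code Stream Bitmap data"
# is the XnView JP2 description reported earlier in this thread.

# choose_ext MIME DESCRIPTION: print an extension, or nothing if undecided
choose_ext() {
    case $1 in
        image/jpeg) echo jpg; return 0 ;;
        audio/mpeg) echo mp3; return 0 ;;
        application/octet-stream) ;;     # inconclusive: check the description
        *) return 1 ;;                   # type not in the table
    esac
    case $2 in
        JPEG-2000*) echo jp2 ;;
        *) return 1 ;;                   # still unknown: leave the file alone
    esac
}

# guess_ext FILE: run file(1) both ways and let choose_ext decide
guess_ext() {
    choose_ext "$(file -b --mime-type -- "$1")" "$(file -b -- "$1")"
}
```

Anything that stays inconclusive, such as the .deb, .exe and Password Safe files mentioned above, prints nothing and keeps its name.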
Found: a way to 'ignore' all comment tags
So anyway, I tried file -b and cut -d, -f1 in the script, which went:
Code:
find . -type f -not -name "*.*" -print0 | xargs -0 file -b | cut -d, -f1 >descriptions
Maybe with the rename.sed file edited again to look like this...
Code:
#n
BZT
Still can't get anything in addextensions.sh!
I thought it might have to do with the filename being missing. The sed file obviously needs an original name (without extension) for every file it finds, or else the mv command prescribed in rename.sed doesn't know which file to work on.
I also wondered whether it had to do with the length of the strings the rename.sed rules were processing in the descriptions file. So I went back to my earlier idea of "fine-tuning" a vanilla file command. Here is what I came up with to "trim down" its output to just what the script could use, without getting tangled in misplaced comment headers in JPEG files (read: those from libgd and some websites).
Code:
OLDIFS=$IFS
Somewhere this code breaks down. I'm pretty sure I'm on the right track, but something is clearly catching me up. Advice, please?
BZT
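BZT's script is cut off after its first line, so this is not a reconstruction of it. It is only a sketch of the same idea: split file's plain description on commas, keep the first two fields, and pair them with the filename. For the libgd JPEG, that keeps "JPEG image data, JFIF standard 1.01" and drops the comment. `first_two_fields` and `make_descriptions` are hypothetical names. Running the function body in a subshell `( ... )` keeps the IFS change from leaking out, so no OLDIFS save-and-restore is needed.

```shell
#!/bin/sh
# Sketch: build "NAME: FIELD1, FIELD2" description lines with a for loop.
# first_two_fields runs in a subshell, so its IFS and set -f changes
# disappear when it returns.

# first_two_fields DESCRIPTION: print the first two comma-separated fields
first_two_fields() (
    IFS=,          # split on commas only; spaces stay inside the fields
    set -f         # disable globbing for the unquoted expansion below
    set -- $1
    if [ $# -ge 2 ]; then
        printf '%s,%s\n' "$1" "$2"
    else
        printf '%s\n' "${1-}"
    fi
)

# make_descriptions: one line per extensionless regular file in the directory
make_descriptions() {
    for f in *; do
        [ -f "$f" ] || continue
        case $f in *.*) continue ;; esac   # skip names that already have a dot
        printf '%s: %s\n' "$f" "$(first_two_fields "$(file -b -- "$f")")"
    done
}
```

Because make_descriptions writes the filename itself, the output keeps the `name: description` shape that the rename.sed rules match on.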
Worked out the bugs, broke away from "find"
The working and final (I hope) script is a touch slower because it doesn't use the find command, but by moving to a for-do-done loop, I was able to introduce the variables from that "play" script in my previous post. The immediate result was a descriptions file that contained the filename and the next two fields returned by file, formatted identically to what previous versions of the script produced. The other result was an add_extensions.sh file that actually had commands in it.
To close out this thread, I'm pasting in the contents of my edited rename.sed file and attaching the final script as a text file.
Code:
#n
BZT