LinuxQuestions.org


Dweeb2010 01-09-2013 06:07 PM

looking for a method of stripping special characters from filenames
 
Hi. I've run into the age-old issue of special characters in filenames. My music collection includes a number of files that I ripped on older Windows computers, and they have things like quotes and question marks in their names (the files were auto-named by whatever ripping software I was using at the time, based on the CDDB lookups). This wreaks havoc when I try to copy albums or songs to a portable player or to other disks, or even just open the files in GNU/Linux players, because files or directories containing the special characters are simply skipped with an error. It is extremely annoying to load an entire album and then find out that one song in the middle wasn't added to the playlist.

Since I have a fairly large music collection, I'd like to find a semi-automated way to walk an entire directory tree and remove all special characters. I would write a script to do this, but the last time I tried (based on a one-liner a friend gave me), I made a mistake and renamed several files to nothing, losing them. Is there a fairly uncomplicated way to do this? I searched around a bit but didn't find anything describing this specifically. Thanks.

PTrenholme 01-09-2013 06:31 PM

Look at the tr command. (man tr for details.)

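Something like this find ./ -type f -exec sh -c 'mv "$1" "$(printf "%s" "$1" | tr <args>)"' sh '{}' ';' (where "<args>" are the tr specification you want) might work. The tr call has to sit inside a command substitution run by a shell; find -exec will not run it for you.

Note: the find command I suggest is from memory, and not checked. Because tr sees the whole path here, characters in directory names would be changed too, and the mv would then point at a directory that does not exist.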
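
To make the "<args>" part concrete, here is a small sketch of three tr specifications that fit this problem. The helper function names are hypothetical and only exist so each specification can be tried on a name before any file is touched; the character set is an assumption based on the quotes and question marks mentioned in the question.

```shell
#!/bin/sh
# Three candidate tr specifications, wrapped in hypothetical helpers.

# Delete ? " and the single quote (\047 is the octal code for ').
drop_quotes() { printf '%s\n' "$1" | tr -d '?"\047'; }

# Turn every space into an underscore.
spaces_to_underscores() { printf '%s\n' "$1" | tr ' ' '_'; }

# Squeeze each run of spaces down to a single space.
squeeze_spaces() { printf '%s\n' "$1" | tr -s ' '; }
```

Each helper only prints the converted name, so something like drop_quotes "some'name?" can be used to check a specification by eye first.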

suicidaleggroll 01-09-2013 06:55 PM

There's probably a better way to do it, but I would just go through each special character one by one and write a short script to find any files/dirs with that character and remove it. Something like:

Code:

find dir -iname "* *" -print0 | while IFS= read -r -d '' file; do echo mv "$file" "${file// /_}"; done
find dir -iname "*'*" -print0 | while IFS= read -r -d '' file; do echo mv "$file" "${file//\'/}"; done
find dir -iname '*"*' -print0 | while IFS= read -r -d '' file; do echo mv "$file" "${file//\"/}"; done
find dir -iname "*\?*" -print0 | while IFS= read -r -d '' file; do echo mv "$file" "${file//\?/}"; done

Note that I left "echo"s in front of the mv commands so that you can verify the mv is doing what you think it's doing before actually running it. If you verify that everything looks like it will be renamed properly, then remove the echo in front of the mv and run it again to actually perform the renaming. An example on my system:

Code:

$ ls -l dir/
total 0
-rw-rw-r-- 1 user user 0 Jan  9 16:14 evil"file1a
-rw-rw-r-- 1 user user 0 Jan  9 16:17 evil"file1b
-rw-rw-r-- 1 user user 0 Jan  9 16:14 evil'file2a
-rw-rw-r-- 1 user user 0 Jan  9 16:17 evil'file2b
-rw-rw-r-- 1 user user 0 Jan  9 16:14 evilfile3a?
-rw-rw-r-- 1 user user 0 Jan  9 16:17 evilfile3b?
-rw-rw-r-- 1 user user 0 Jan  9 16:14 evil file 4a
-rw-rw-r-- 1 user user 0 Jan  9 16:17 evil file 4b
$
$ cat fix
#!/bin/bash

find dir -iname "* *" -print0 | while IFS= read -r -d '' file; do echo mv "$file" "${file// /_}"; done
find dir -iname "*'*" -print0 | while IFS= read -r -d '' file; do echo mv "$file" "${file//\'/}"; done
find dir -iname '*"*' -print0 | while IFS= read -r -d '' file; do echo mv "$file" "${file//\"/}"; done
find dir -iname "*\?*" -print0 | while IFS= read -r -d '' file; do echo mv "$file" "${file//\?/}"; done
$
$ ./fix
mv dir/evil file 4a dir/evil_file_4a
mv dir/evil file 4b dir/evil_file_4b
mv dir/evil'file2b dir/evilfile2b
mv dir/evil'file2a dir/evilfile2a
mv dir/evil"file1a dir/evilfile1a
mv dir/evil"file1b dir/evilfile1b
mv dir/evilfile3b? dir/evilfile3b
mv dir/evilfile3a? dir/evilfile3a

Then after removing the echos and running it again:

Code:

$ ls -l dir/
total 0
-rw-rw-r-- 1 user user 0 Jan  9 16:14 evilfile1a
-rw-rw-r-- 1 user user 0 Jan  9 16:17 evilfile1b
-rw-rw-r-- 1 user user 0 Jan  9 16:14 evilfile2a
-rw-rw-r-- 1 user user 0 Jan  9 16:17 evilfile2b
-rw-rw-r-- 1 user user 0 Jan  9 16:14 evilfile3a
-rw-rw-r-- 1 user user 0 Jan  9 16:17 evilfile3b
-rw-rw-r-- 1 user user 0 Jan  9 16:14 evil_file_4a
-rw-rw-r-- 1 user user 0 Jan  9 16:17 evil_file_4b
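
The four passes above rewrite the whole path, so a special character in a directory name would make the mv target point into a directory that does not exist yet. A possible single-pass variant is sketched below, under some assumptions: fix_names and its "run" argument are hypothetical names, tr is swapped in for bash's ${var//x/} so the inner script runs under plain sh, and -mindepth is assumed to be available (it is in GNU and BSD find). It only touches the last path component, works depth-first, and skips a rename when the target already exists or would be empty.

```shell
#!/bin/sh
# fix_names DIR [run]: hypothetical helper, not from the post above.
# Without "run" it only prints what it would do.
fix_names() {
    # -depth visits a directory's contents before the directory itself,
    # so a renamed parent never invalidates a path still to be handled.
    find "$1" -mindepth 1 -depth -exec sh -c '
        mode=$1; shift
        for path in "$@"; do
            dir=${path%/*}
            base=${path##*/}
            # space -> underscore, then delete ? " and \047 (single quote)
            new=$(printf "%s" "$base" | tr " " "_" | tr -d "?\"\047")
            [ "$new" = "$base" ] && continue
            if [ -z "$new" ] || [ -e "$dir/$new" ]; then
                printf "skipped: %s\n" "$path" >&2
            elif [ "$mode" = run ]; then
                mv -- "$path" "$dir/$new"
            else
                printf "mv %s -> %s\n" "$path" "$dir/$new"
            fi
        done
    ' sh "${2:-preview}" {} +
}
```

Calling fix_names dir prints the planned renames, and fix_names dir run performs them, which mirrors the "echo first, then remove the echo" workflow described above.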


codergeek 01-10-2013 06:23 PM

You can do this with a for loop and sed. I created two files with special characters in their names for this example:

Code:

ls -1
a_ weird ^? filename.mp3
This is #$ a _ test file

Code:

for i in *; do echo mv "$i" "$(printf '%s\n' "$i" | sed 's/[!@#\$%^&*()?_]//g' | tr -s " ")"; done
mv a_ weird ^? filename.mp3 a weird filename.mp3
mv This is #$ a _ test file This is a test file

In each output line, the original filename comes first and the new filename follows it.

Whatever character(s) you want omitted, just insert them between the square brackets in the sed expression.

It is good to use echo to preview the results before committing the actual conversion. If satisfied with the preview, then remove the echo in front of mv.
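
Since a wrong character class can strip a name down to nothing, or turn two names into the same one, a guarded preview may be worth having. The sketch below is one way to do it, not a definitive script: clean_name and preview_dir are hypothetical names, the character class is adapted from the sed expression above, and trimming of leading and trailing spaces is an addition of mine.

```shell
#!/bin/sh
# clean_name NAME: hypothetical helper that prints the cleaned name.
clean_name() {
    new=$(printf '%s\n' "$1" |
        sed 's/[!@#$%^&*()?_]//g; s/^ *//; s/ *$//' | tr -s ' ')
    # If every character was stripped, keep the original name.
    [ -n "$new" ] || new=$1
    printf '%s\n' "$new"
}

# preview_dir DIR: print the planned renames for one directory
# (no recursion). The body runs in a subshell, so the cd is local.
preview_dir() (
    cd "$1" || exit 1
    for f in *; do
        [ -e "$f" ] || continue        # empty directory: glob unmatched
        new=$(clean_name "$f")
        [ "$new" = "$f" ] && continue
        if [ -e "$new" ]; then
            printf 'skip, target exists: %s\n' "$f"
        else
            printf 'mv %s -> %s\n' "$f" "$new"
        fi
    done
)
```

Running preview_dir . in an album directory lists what would change and flags any rename that would land on an existing file.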

This is optional. If you want to capitalize the first character in each word, then add an extra sed statement.

Code:

for i in *; do echo mv "$i" "$(printf '%s\n' "$i" | sed 's/[!@#\$%^&*()_?]//g;s/^.\| [a-z]/\U&/g' | tr -s " ")"; done
mv a_ weird ^? filename.mp3 A Weird Filename.mp3
mv This is #$ a _ test file This Is A Test File

The new filenames have each word capitalized and weird characters removed.
The new sed statement is the part after the semicolon, the s/.../\U&/g substitution.
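
The \U in the sed expression above is a GNU sed extension, so it may not work with other sed implementations. As a hedged alternative, the capitalization step could be done with awk instead; title_case is a hypothetical helper name, and the sketch assumes words are separated by spaces.

```shell
#!/bin/sh
# title_case NAME: hypothetical helper that upper-cases the first
# character of each space-separated word using awk.
title_case() {
    printf '%s\n' "$1" | awk '{
        for (i = 1; i <= NF; i++)
            $i = toupper(substr($i, 1, 1)) substr($i, 2)
        print
    }'
}
```

Because awk rebuilds the line from its fields, runs of spaces come out as a single space, so the result can be used without the extra tr -s step.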

Hope this helps. Remember to keep the echo in place to preview the results, then remove it to make the real changes.

