LinuxQuestions.org
Old 06-21-2012, 09:09 AM   #1
ne0shell
LQ Newbie
 
Registered: May 2010
Posts: 22

Rep: Reputation: 0
Need bash script to remove spaces and non alpha chars from folders/ files


I have a client who is uploading thousands of media files to be encoded for a media site but the files were provided with spaces and illegal / non alpha characters.

I need a bash script which will sub an underscore for spaces and remove the illegal characters from the files and folder names under a specific path. I can't find one that does both and my scripting ability is not up to this right now.

I'd be glad to pay for someone to help me out with this.
 
Old 06-21-2012, 09:56 AM   #2
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,577
Blog Entries: 31

Rep: Reputation: 1196
Does it have to be a bash script? bash is relatively slow at handling character strings and you have thousands of file names to process. That might make perl or awk a better choice.
 
Old 06-21-2012, 11:08 AM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,715

Rep: Reputation: 3034
Also, your scripting might not be great, but you should at least show what you have tried so far.

Here are some links if you have none of your own:

http://tldp.org/LDP/abs/html/
http://mywiki.wooledge.org/TitleIndex
 
Old 06-21-2012, 11:16 AM   #4
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 740
The problem with paying someone is that the payee (contractor) becomes liable for the outcome. I don't do this kind of thing, but if I did, I would need access to all of the files, and I would need enough compensation to do some careful testing.

The problem actually seems pretty simple. For example, to replace "white space" with "_":

Code:
newfilename = $(echo $filename | sed -r 's/[[:blank:]]+/_/g')
mv newfilename filename
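To delete the illegal characters (sed part only):
Code:
# Keep letters, digits, blanks, ".", "_" and "-"; delete everything else.
sed -r 's/[^[:alnum:][:blank:]._-]//g'
Blanks have to stay in the kept set here, otherwise they are deleted before they can be turned into underscores. Do this before replacing white space with _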
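
As a rough sketch, the two sed expressions can be chained in a small function. The name clean_name is made up for illustration, and treating ".", "_" and "-" as legal characters is an assumption; blanks are kept by the first expression only so that the second one can turn them into underscores. The -r option assumes GNU sed.

```shell
#!/bin/bash
# clean_name is a hypothetical helper, not part of any existing tool.
# It prints a cleaned-up version of the name given as its first argument.
clean_name() {
    # Step 1: delete everything except letters, digits, blanks, ".", "_" and "-".
    # Step 2: turn each run of blanks into a single underscore.
    printf '%s\n' "$1" | LC_ALL=C sed -r \
        -e 's/[^[:alnum:][:blank:]._-]//g' \
        -e 's/[[:blank:]]+/_/g'
}

clean_name 'My Holiday (2012) #1.mp4'    # prints: My_Holiday_2012_1.mp4
```

Because the function only prints the new name, it can be tried on sample names without touching any files.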

Last edited by pixellany; 06-21-2012 at 11:17 AM.
 
Old 06-21-2012, 01:05 PM   #5
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian + kde 4 / 5
Posts: 6,842

Rep: Reputation: 2003
There have been innumerable threads here and on the web concerning the batch processing of filenames. You might try taking some time to run some searches and read about them.

In at least a few of them you'll find mention of some of the ready-made renaming applications that are also available to you, such as detox.
 
Old 06-22-2012, 06:10 AM   #6
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,577
Blog Entries: 31

Rep: Reputation: 1196
Quote:
Originally Posted by pixellany View Post
Code:
newfilename = $(echo $filename | sed -r 's/[[:blank:]]+/_/g')
mv newfilename filename
There are a few mistakes in that. There must be no space on either side of the bash = assignment operator, the variables on the mv command need $ in front of them, file name variables need to be double quoted in case they contain word delimiters, and mv takes the existing name first and the new name second.

For safety you might like to see what the mv commands are before running them.

Putting that all together:
Code:
newfilename=$(echo "$filename" | sed -r 's/[[:blank:]]+/_/g')
echo mv "$filename" "$newfilename"
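
To preview the renames for a whole directory, the same idea can be wrapped in a loop. This is only a sketch: the function name preview_renames and its directory argument are made up for illustration, it looks at a single directory level only, and the printed lines are meant for reading rather than for pasting back into a shell.

```shell
#!/bin/bash
# preview_renames is a hypothetical helper: it prints the mv commands
# that would replace blanks with underscores in one directory,
# without renaming anything.
preview_renames() {
    local dir=$1
    local filename oldname newname
    for filename in "$dir"/*; do
        [ -e "$filename" ] || continue      # the glob matched nothing
        oldname=${filename##*/}             # strip the directory part
        newname=$(printf '%s\n' "$oldname" | sed -r 's/[[:blank:]]+/_/g')
        [ "$newname" = "$oldname" ] && continue
        printf 'mv "%s" "%s"\n' "$filename" "$dir/$newname"
    done
}

preview_renames .
```

Once the printed commands look right, the printf line could be swapped for a real mv -i "$filename" "$dir/$newname".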
 
Old 06-22-2012, 11:10 AM   #7
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 947
Here is a very simple way to do it.

First, create a small script that fixes the file names specified as parameter(s). For example, save the following as /usr/local/bin/fixname and make it executable (chmod a+x /usr/local/bin/fixname):
Code:
#!/bin/bash
for ARG in "$@" ; do

    # Separate the directory part into DIR.
    DIR="${ARG%/*}"
    [ "$DIR" = "$ARG" ] && DIR="./"

    # Separate the file name part into OLD.
    OLD="${ARG##*/}"

    # Copy OLD to NEW, replacing everything except
    # - . 0-9 A-Z a-z with a single _
    # Also make sure NEW starts with 0-9 A-Z a-z
    NEW="${OLD//[^-0-9A-Za-z.]/_}"
    TMP=""
    while [ "$TMP" != "$NEW" ]; do
        TMP="$NEW"
        NEW="${NEW//__/_}"
        NEW="${NEW#[^0-9A-Za-z]}"
        NEW="${NEW%[^0-9A-Za-z]}"
    done

    # No fix necessary?
    [ "$OLD" = "$NEW" ] && continue

    # Is the new string empty?
    if [ -z "$NEW" ]; then
        printf '%s: Cannot fix file name.\n' "$ARG" >&2
        continue
    fi

    # Rename file, but ask before overwriting.
    mv -i "$DIR/$OLD" "$DIR/$NEW" || exit $?
done
Now you can trivially fix an entire directory tree using that and find:
Code:
export LANG=C LC_ALL=C
find DIR(s)... -depth -print0 | xargs -r0 /usr/local/bin/fixname
The export sets the locale to POSIX. The script treats all non-ASCII characters as plain byte values anyway, so there is no harm. In a UTF-8 locale, a byte sequence in a file name that is not valid UTF-8 can make the pattern matching misbehave, and ranges such as A-Z may match more than the ASCII letters. Using the POSIX locale therefore is very useful here.

Note the -depth flag. It tells find to process all directory entries before processing the directory itself; depth first. That way we can rename any argument we get, without any future names being dependent on it. (In other words, /a/b/c will be processed before /a/b, and /a/b before /a.)
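
The effect of -depth is easy to see on a throwaway tree. The sketch below builds one under a temporary directory (the a/b/c layout is just an example) and lists it both ways.

```shell
#!/bin/bash
# Build a small throwaway tree and list it with and without -depth.
tmp=$(mktemp -d)
mkdir -p "$tmp/a/b/c"

parents_first=$(cd "$tmp" && find a)
children_first=$(cd "$tmp" && find a -depth)

echo "Default order:"
echo "$parents_first"
echo "With -depth:"
echo "$children_first"

rm -rf "$tmp"
```

The default listing gives a, a/b, a/b/c, while -depth gives a/b/c, a/b, a, so a directory is only renamed after everything inside it has been handled.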

Note the -print0 and -0: they tell find and xargs to use NUL separators. That way you can handle all possible file names correctly. Newlines or non-ASCII characters will pose no difficulty.
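
A quick way to convince yourself is to count entries both ways for a file whose name contains a newline. This is a sketch using a temporary directory; the file names are invented for the demonstration, and -print0 and -0 are assumed to be available (they are in GNU and BSD findutils).

```shell
#!/bin/bash
# Create a throwaway directory holding one file whose name contains a newline.
tmp=$(mktemp -d)
touch "$tmp/two
lines.txt" "$tmp/plain.txt"

# Newline-separated output makes the odd name look like two entries.
by_line=$(find "$tmp" -type f | wc -l | tr -d ' ')

# NUL-separated output keeps every name intact; the inner sh just
# reports how many arguments xargs handed to it.
by_nul=$(find "$tmp" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

echo "lines seen: $by_line, files seen: $by_nul"
rm -rf "$tmp"
```

With two files present, the line count comes out as 3 while the NUL-separated count is the correct 2.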

Finally, the -r argument to xargs tells it to only run the /usr/local/bin/fixname command if there are arguments to give to it. Without it, GNU xargs would run the command once even when find produces no output at all.
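
The difference shows up when xargs is fed empty input. In this small sketch echo stands in for the fixname script, and the behaviour without -r is that of GNU xargs; other implementations may differ.

```shell
#!/bin/bash
# Feed xargs empty input, with and without -r.
with_r=$(printf '' | xargs -r echo "command was run")
without_r=$(printf '' | xargs echo "command was run")

echo "with -r:    '$with_r'"
echo "without -r: '$without_r'"
```

With -r nothing is printed between the quotes, because the command is never started.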

The above script is untested, but it should work. If you find any issues, please report them here, and I'll try to fix the script. For testing, I recommend you first modify the script by adding echo before the mv command. That way it will not do any modifications, just output how it would modify the file names.
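
Another way to test without touching any files is to pull the renaming rules into a function that only prints the result. The sketch below mirrors the parameter expansions used in the script above; the function name fixed_name and the sample file names are made up for illustration.

```shell
#!/bin/bash
# Use the POSIX locale, as recommended for the real run.
export LANG=C LC_ALL=C

# fixed_name is a hypothetical helper: it prints the name the script
# would pick for the file name part of its first argument.
fixed_name() {
    local OLD="${1##*/}"
    local NEW="${OLD//[^-0-9A-Za-z.]/_}"
    local TMP=""
    while [ "$TMP" != "$NEW" ]; do
        TMP="$NEW"
        NEW="${NEW//__/_}"            # collapse double underscores
        NEW="${NEW#[^0-9A-Za-z]}"     # drop a leading non-alphanumeric
        NEW="${NEW%[^0-9A-Za-z]}"     # drop a trailing non-alphanumeric
    done
    printf '%s\n' "$NEW"
}

fixed_name 'My  Song (live).mp3'        # prints: My_Song_live_.mp3
fixed_name './music/01 - Intro!.ogg'    # prints: 01_-_Intro_.ogg
```

Note that a leading dot is stripped as well, so a hidden file such as .hidden would come out as hidden.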

Last edited by Nominal Animal; 06-22-2012 at 11:13 AM.
 
  

