LinuxQuestions.org
03-09-2005, 09:27 AM #1
CoolAJ86
Member
 
Registered: Jan 2004
Location: VT, USA
Distribution: Gentoo, Ubuntu - t3h 1337 & the easy, respectively
Posts: 125

Rep: Reputation: 15
bash and filenames with special characters


I am trying to write a script that organizes and sorts photos from my digital camera into folders by date - totally destroying the current random folder structure.

The problem is that for any folder that has a space or special character in the name, it chokes up and more or less skips over the files.

I think this is related to the way 'for' will use a space as a separator. I've tried doing some funky stuff with 'while', but it still screws up.

Can someone tell me how to change this so that each line of output from the `find` command is interpreted as one whole string and not broken up further by subsequent commands? Perhaps there is a way to escape the strings (I tried sed 's/\ /\\ /g' to no avail)? I've done my homework on this, but I'm not finding a solution.
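Escaping cannot fix this, because bash removes quotes and backslashes only when they are typed in the script itself. A backslash that comes out of a variable or a backquoted command is an ordinary character, and word splitting still breaks the text at the space. A minimal sketch, assuming bash; `split_words` is a hypothetical helper made up for the demonstration, and the sample name assumes the escaping had produced a backslash before every space:

```shell
#!/bin/bash
# split_words (hypothetical name): print each word that an unquoted
# expansion of $1 yields under the default IFS, one per line in [brackets].
# The ( ) body runs in a subshell, so 'set -f' (no globbing) stays local.
split_words() (
    set -f
    for word in $1; do      # $1 unquoted on purpose: word splitting happens
        printf '[%s]\n' "$word"
    done
)

# Assumed result of escaping: a backslash in front of each space
escaped='my\ pics/img\ 1.jpg'
split_words "$escaped"
# [my\]
# [pics/img\]
# [1.jpg]
```

The backslashes survive as literal characters and the name is still cut in three, which is why changing how the words are split, not escaping them, is the way out.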

Help, Please, and Thank you!


Also: Am I reinventing the wheel by renaming my pics this way? If not, is bash a rational choice for this task, or would I be better off taking some quick lessons in a more advanced language?

Code:
#!/bin/bash
# Copyright 2004 Alvin A ONeal Jr
# GPL'ed
# USAGE: picdir.sh /path/to/pics/ /new/dir/

# This script will move all jpgs from one folder 
# to another whilst sorting and renaming them by timestamp
# checksums determine whether dups are actually dups
# *** dups will be overwritten ***
# needs BASH FIND GREP CUT FILE JHEAD 

# FLAWS: Doesn't like relative paths
# Need to escape SPACES and other SPECIAL CHARACTERS!


COUNT=0
# need to translate relative paths to absolutes before this will work.
# also consider quoted path values... messy messy messy
PATH_OLD=${1}
# PATH_OLD="$HOME"
echo "line 21: Pictures will be gathered from '$PATH_OLD'"

#if [ -e "${2}" ]; then
#	PATH_NEW="${2}"
#else
	PATH_NOW="$HOME/Pictures/Life/"
#fi
echo "line 28: Pictures will be placed in '${PATH_NOW}'" ###


# http://www.issociate.de/board/post/1...t_in_find.html
# find $PATH_OLD -type f -print | while read -r PICTURE
for PICTURE in `find ${PATH_OLD} -type f` # MEMORY HOG!!
do
	PATH_NEW=${PATH_NOW}
	HAS_EXIF=`file ${PICTURE} | grep JPEG | grep EXIF`
	if [ -n "${HAS_EXIF}" ]; then
		echo "line 38: '${PICTURE}' has EXIF data" ###
		TIMESTAMP=`jhead $PICTURE 2>&1 | grep 'Date/Time' | cut -d':' -f2-6`
		# need something to skip a file if it causes jhead error...
		
		# Deciding path
		DATE=`echo ${TIMESTAMP} | cut -d' ' -f1`
		YEAR=`echo ${DATE} | cut -d':' -f1`
		MONTH=`echo ${DATE} | cut -d':' -f2`
		DAY=`echo ${DATE} | cut -d':' -f3`
		PATH_NEW="${PATH_NEW}${YEAR}/${MONTH}/${DAY}/"
		mkdir -p ${PATH_NEW}
		# Deciding filename
		TIME=`echo ${TIMESTAMP} | cut -d' ' -f2`
		HOUR=`echo ${TIME} | cut -d':' -f1`
		MINUTE=`echo ${TIME} | cut -d':' -f2`
		SECOND=`echo ${TIME} | cut -d':' -f3`
		PATH_NEW="${PATH_NEW}${HOUR}${MINUTE}${SECOND}.jpg"
		# Complete path
		echo "line 56: ${PATH_NEW}" ###
		if [ -f "${PATH_NEW}" ]; then
			echo "line 58: Name exists, checking... maybe dup?" ###
			if [ ! "${PATH_NEW}" = "${PICTURE}" ]; then
				# relative paths make this not work
				echo "line 61: It isn't itself" ###
				SUM_ORIG=`/usr/bin/md5sum ${PICTURE} | cut -d' ' -f1`
				SUM_NEW=`/usr/bin/md5sum ${PATH_NEW} | cut -d' ' -f1`
				if [ ! "${SUM_ORIG}" = "${SUM_NEW}" ]; then
					# These pictures are not the same
					i=0
					PATH_NEWER="d${i}-${PATH_NEW}"
					until [ ! -f ${PATH_NEWER} ]; do
						(( i++ ))
						PATH_NEWER="d${i}-${PATH_NEW}"
					done
					echo "${PATH_NEW} exists, appending 'd${i}-' to name."
					PATH_NEW="${PATH_NEWER}"
					echo "It's now ${PATH_NEW}?"
				fi
				# 3) dups deleted, non-dups renamed
			fi
			# 2) It wasn't the same file (might be duplicates)
		# 1) A file of that name existed (might be itself).
		# 0) All that settled, should be safe to move the file
		fi
		echo "Gonna move that pic..." ###
		mv -i ${PICTURE} ${PATH_NEW}
	fi
	# doesn't have EXIF, not bothering...
	(( COUNT++ ))
	echo $COUNT
done
echo "Mucked around with ${COUNT} files successfully!"

Last edited by CoolAJ86; 03-09-2005 at 09:36 AM.
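The "Deciding path" and "Deciding filename" sections split the timestamp with eight separate echo | cut pipelines. A single read with IFS set to colon and space can do the same job. A sketch, assuming bash and an EXIF timestamp in the standard YYYY:MM:DD HH:MM:SS form (the part of jhead's Date/Time line the script extracts with cut); `exif_to_path` is a hypothetical helper name:

```shell
#!/bin/bash
# exif_to_path BASE STAMP (hypothetical name). Assumes STAMP is an EXIF
# timestamp like "2004:07:15 12:34:56" and prints the destination the
# script builds: BASE/YYYY/MM/DD/HHMMSS.jpg
exif_to_path() {
    local base=$1 stamp=$2
    local year month day hour minute second
    # ':' and ' ' both separate fields; the IFS change applies to read only.
    # Leading blanks (left over from cut -f2-6) are ignored by read.
    IFS=': ' read -r year month day hour minute second <<< "$stamp"
    # Refuse to build a path from an incomplete timestamp.
    [[ -n $second ]] || return 1
    # ${base%/} drops one trailing slash so BASE works with or without it.
    printf '%s/%s/%s/%s/%s%s%s.jpg\n' "${base%/}" "$year" "$month" "$day" \
        "$hour" "$minute" "$second"
}

exif_to_path "$HOME/Pictures/Life/" "2004:07:15 12:34:56"
```

Returning non-zero on a short timestamp also gives the "skip a file if it causes jhead error" check the script's comment asks for: the caller can test the exit status before calling mkdir.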
 
03-09-2005, 10:25 AM #2
TheLinuxDuck
Member
 
Registered: Sep 2002
Location: Tulsa, OK
Distribution: Slack, baby!
Posts: 349

Rep: Reputation: 33
(=

I smile because I find this type of issue really annoying, and for a long time had no clue how to fix it, either. However, there is some really good news for you. (=

The secret is the bash variable IFS, the Internal Field Separator. It determines where bash splits the results of unquoted expansions, such as the output of `find` in backquotes, into separate words. Its default value is a space, a tab, and a newline.

You can easily change this variable before the for loop in question to only match tabs and newlines/carriage returns, and this will cause the filenames with spaces to remain intact.
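The effect of IFS can be checked without touching the disk. A sketch, assuming bash (for the $'...' quoting); the two sample names mirror the /tmp/IFS example, and `count_fields` is a hypothetical helper:

```shell
#!/bin/bash
# Two names, one per line, the way find prints them
list=$'/tmp/IFS/has onespace.txt\n/tmp/IFS/nospaces.txt'

# count_fields (hypothetical name): how many words an unquoted expansion
# of $1 yields under the current IFS. The ( ) body is a subshell, so
# 'set -f' (no globbing) and 'set --' do not leak out.
count_fields() (
    set -f
    set -- $1               # unquoted on purpose: word splitting happens
    echo "$#"
)

echo "default IFS:     $(count_fields "$list") words"   # 3 words

old_IFS=$IFS
IFS=$'\t\n'                 # tab and newline only, no space
echo "tab/newline IFS: $(count_fields "$list") words"   # 2 words
IFS=$old_IFS
```

With the default IFS the space in "has onespace.txt" produces an extra word; once the space is dropped from IFS, only the newline separates the two names.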

One word of caution is this:

If you're doing other field splitting, make sure to revert this variable to its original value afterwards; otherwise later splitting will no longer break on spaces.
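One way to sidestep the revert step entirely is to put the IFS assignment in front of the command that needs it: bash then applies it to that single command only. A sketch, assuming bash (for <<<); read -r stops backslashes from being treated as escapes, and the empty IFS keeps leading and trailing blanks in a name:

```shell
#!/bin/bash
# Names separated by newlines, as find prints them
list=$'/tmp/IFS/has onespace.txt\n/tmp/IFS/other file.rgf'

count=0
# 'IFS=' in front of read applies to that one read only, so the global
# IFS never changes and nothing has to be restored afterwards.
while IFS= read -r name; do
    printf '[%s]\n' "$name"
    count=$((count + 1))
done <<< "$list"

echo "$count names read"
```

Each name comes out whole, and any later for loop still splits on spaces as usual.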

Consider that the dir /tmp/IFS contains the following files:
Code:
-rw-r--r--  1 root root   0 2005-03-09 10:30 has\ onespace.txt
-rw-r--r--  1 root root   0 2005-03-09 10:30 nospaces.txt
-rw-r--r--  1 root root   0 2005-03-09 10:30 other\ file.rgf
-rw-r--r--  1 root root   0 2005-03-09 10:30 otherfile.rgf
The following code uses 'find' in a for loop to display them correctly, and then runs the same loop again with IFS reverted to its original value:
Code:
#!/bin/bash

#  store original value, and set to catch tab(9), newline(A), and CR(D)
#
IFScopy=$IFS
IFS=$'\x09'$'\x0A'$'\x0D'
echo "IFS FIX"
for i in `find /tmp/IFS/ -type f`; do
  echo "$i"
done

#  revert
#
IFS=$IFScopy
echo "IFS REVERT"
#  now other for loops will work as before
#
for i in `find /tmp/IFS/ -type f`; do
  echo $i
done
And the output:
Code:
/usr/sbin> q.sh
IFS FIX
/tmp/IFS/other file.rgf
/tmp/IFS/nospaces.txt
/tmp/IFS/otherfile.rgf
/tmp/IFS/has onespace.txt
IFS REVERT
/tmp/IFS/other
file.rgf
/tmp/IFS/nospaces.txt
/tmp/IFS/otherfile.rgf
/tmp/IFS/has
onespace.txt
Happy bashing!
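Setting IFS to tab and newline still leaves two gaps: a name containing a newline (legal in Unix filenames) is split in two, and because the backquoted find is unquoted, a name containing *, ? or [ can be glob-expanded. A common bash pattern avoids both by having find end each name with a NUL byte, the one character a path cannot contain. A sketch, assuming bash, a find that supports -print0 (GNU find and the BSD finds do) and mktemp -d for the demo; `walk_files` is a hypothetical name:

```shell
#!/bin/bash
# walk_files (hypothetical name): print every regular file under $1,
# whatever characters the names contain, then how many were found.
# Assumes find supports -print0.
walk_files() {
    local dir=$1 path count=0
    # -print0 ends each name with a NUL byte; read -d '' stops at NUL.
    # IFS= and -r keep the name exactly as find printed it.
    while IFS= read -r -d '' path; do
        printf 'found: %s\n' "$path"
        count=$((count + 1))
    done < <(find "$dir" -type f -print0)
    echo "$count files"
}

# Demo in a throwaway directory with awkward names
demo=$(mktemp -d)
touch "$demo/has onespace.txt" "$demo/star*name.txt" "$demo/line"$'\n'"break.txt"
walk_files "$demo"
rm -rf "$demo"
```

Because the loop reads from < <(...) (process substitution) instead of a pipe, it runs in the current shell, so a counter like the script's COUNT keeps its value after the loop. In the commented-out `find ... | while read -r PICTURE` form, bash runs the loop in a subshell and the count would be lost.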
 
03-09-2005, 02:50 PM #3
CoolAJ86
Member
 
Registered: Jan 2004
Location: VT, USA
Distribution: Gentoo, Ubuntu - t3h 1337 & the easy, respectively
Posts: 125

Original Poster
Rep: Reputation: 15
Thanks so much! This will most certainly be handy in the future!

However, while waiting for a reply, I found that I had been completely ignoring find's built-in -exec action. So I rewrote the code to use that... did a little recursing... tweaked a bit. Works flawlessly, AFAIK.
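Re-running the script once per file with -exec ... \; works, but it starts a new bash for every file in the tree. The form -exec ... {} + hands find's results to one command in large batches, and each name arrives as its own positional parameter, spaces and all. A sketch, assuming bash and mktemp -d for the demo; `process_dir` is a hypothetical name, the loop body is a placeholder for the real per-file work, and the word `picdir` only fills $0 inside bash -c:

```shell
#!/bin/bash
# process_dir (hypothetical name): let find pass file names in batches to
# one inline bash script. Inside bash -c, the word after the script text
# ("picdir") becomes $0 and the names become "$1", "$2", ...
process_dir() {
    find "$1" -type f -exec bash -c '
        for file in "$@"; do
            # Placeholder for the real per-file work (EXIF check, move, ...)
            printf "would process: %s\n" "$file"
        done
    ' picdir {} +
}

# Demo in a throwaway directory
demo=$(mktemp -d)
touch "$demo/img 0001.jpg" "$demo/img 0002.jpg"
process_dir "$demo"
rm -rf "$demo"
```

The names never pass through word splitting at all: find places each one directly into the argument list of the inline script.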

Code:
#!/bin/bash

# /usr/local/bin/picdir.sh
# Picdir v0.9 rc1

# Copyleft 2005 Alvin A ONeal Jr - This software is OpenSource
# 
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

# Guaranteed to be my personal best!
# ...not guaranteed to work...


# What will this do?
# Search for all JPEG type files within a directory containing EXIF data
# Use the timestamp in the EXIF data to relocate and rename the file
# Using checksums, determine and remove duplicate files
# Uniquely name files which have the same timestamp but different data

# needs BASH FIND GREP CUT FILE JHEAD MD5SUM MV

if [ -z "${1}" ]; then
	echo "USAGE: ${0} /my/pictures/unsorted/ [/my/pictures/sorted/]"
	echo "The trailing '/' is kinda important... btw..."
	exit 1
fi
if [ -z "${2}" ]; then
	MOVETO="${HOME}/Pictures/"
else
	MOVETO=${2}
fi
# Call self recursively
if [ ! "${3}" = "RECUR" ]; then
	echo "Finding and organizing all pictures with EXIF timestamps..."
	find "${1}" -type f -exec "${0}" {} "${2}" "RECUR" \;
	echo "Done!"
else
	FILE="${1}"
	HAS_EXIF=$(file "${FILE}" | grep 'JPEG' | grep 'EXIF') # Is it better this way?
	if [ -n "${HAS_EXIF}" ]; then
		TIMESTAMP=`jhead "${FILE}" 2>&1 | grep 'Date/Time' | cut -d':' -f2-6` # Or this way?
		if [ ! -n "${TIMESTAMP}" ]; then
			# '${FILE}' has EXIF but no timestamp!
			exit
		fi
		# Deciding path
		DATE=`echo ${TIMESTAMP} | cut -d' ' -f1`
		YEAR=`echo ${DATE} | cut -d':' -f1`
		MONTH=`echo ${DATE} | cut -d':' -f2`
		DAY=`echo ${DATE} | cut -d':' -f3`
		MOVETO="${MOVETO}${YEAR}/${MONTH}/${DAY}/"
		mkdir -p "${MOVETO}"
		# Deciding filename
		TIME=`echo ${TIMESTAMP} | cut -d' ' -f2`
		HOUR=`echo ${TIME} | cut -d':' -f1`
		MINUTE=`echo ${TIME} | cut -d':' -f2`
		SECOND=`echo ${TIME} | cut -d':' -f3`
		NEWFILE="${HOUR}${MINUTE}${SECOND}.jpg"
		# Complete path
		ABSPATH="${MOVETO}${NEWFILE}"
		# Complete path
		
		if [ -f "${ABSPATH}" ]; then
		# 1) File exists with that name

			if [ ! "${ABSPATH}" = "${FILE}" ]; then
			# 2) The file isn't itself ... yeah ... that makes sense

				SUM0=`/usr/bin/md5sum "${FILE}" | cut -d' ' -f1`
				SUM1=`/usr/bin/md5sum "${ABSPATH}" | cut -d' ' -f1`
				if [ ! "${SUM0}" = "${SUM1}" ]; then
				# 3) It isn't a duplicate of the same picture

					i=0
					ABSPATH_1="${MOVETO}d${i}-${NEWFILE}"
					until [ ! -f "${ABSPATH_1}" ]; do
						(( i++ ))
						ABSPATH_1="${MOVETO}d${i}-${NEWFILE}"
					done

					# Giving it a unique name 'd#-FILE'
					echo "Appending 'd${i}-' to ${ABSPATH}: file exists."
					ABSPATH="${ABSPATH_1}"

				# 3) non-duplicate of same name was renamed
				else
					echo "Checksum match: Duplicate file overwritten"
				fi

			fi
			# 2) if the file is itself, pass it along; the check below skips the move

		fi
		# 1) All that settled, should be safe to move the file
		if [ ! "${FILE}" = "${ABSPATH}" ]; then
			echo "Moving ${FILE} to ${ABSPATH}"
			mv "${FILE}" "${ABSPATH}"
		fi
	fi
	# 0) If that even was a picture, it certainly didn't have EXIF data.
fi

# I tested this baby on my precious photos and it worked for me.
# Sorted about 700 photos from about 34,500 files total in a few minutes. :-D

Last edited by CoolAJ86; 03-09-2005 at 04:26 PM.
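The duplicate handling in the script (compare checksums, otherwise try d0-, d1-, ... prefixes) can be packed into one function. A sketch, assuming bash and mktemp -d for the demo; `place_file` is a hypothetical name, `cmp -s` is swapped in for the two md5sum calls (it compares the bytes directly), and the [[ -ef ]] test checks whether two paths name the same file even when one of them is relative, which the string comparison in the script cannot do. As in the script, a byte-identical source counts as a duplicate; here it is deleted instead of being moved over the existing copy, which leaves the same content behind:

```shell
#!/bin/bash
# place_file SRC DEST (hypothetical name). DEST must include a directory.
# Uses cmp -s in place of the script's md5sum comparison.
#  - SRC and DEST are the same file -> nothing to do
#  - DEST does not exist            -> move SRC there
#  - DEST has identical bytes       -> SRC is a duplicate, remove it
#  - DEST differs                   -> use DIR/d0-NAME, DIR/d1-NAME, ...
place_file() {
    local src=$1 dest=$2
    local dir=${dest%/*} name=${dest##*/} i=0 candidate

    if [[ $src -ef $dest ]]; then
        echo "already in place: $src"
        return 0
    fi
    if [[ -e $dest ]]; then
        if cmp -s -- "$src" "$dest"; then
            echo "duplicate removed: $src"
            rm -- "$src"
            return 0
        fi
        candidate="$dir/d$i-$name"
        while [[ -e $candidate ]]; do
            i=$((i + 1))
            candidate="$dir/d$i-$name"
        done
        dest=$candidate
    fi
    mkdir -p -- "$dir"
    mv -- "$src" "$dest"
    echo "moved: $src -> $dest"
}

# Demo: two identical pictures and one different picture, same timestamp
demo=$(mktemp -d)
printf 'A' > "$demo/one.jpg"
printf 'A' > "$demo/two.jpg"
printf 'B' > "$demo/three.jpg"
place_file "$demo/one.jpg"   "$demo/sorted/123456.jpg"   # moved
place_file "$demo/two.jpg"   "$demo/sorted/123456.jpg"   # duplicate removed
place_file "$demo/three.jpg" "$demo/sorted/123456.jpg"   # moved to d0-123456.jpg
ls "$demo/sorted"
rm -rf "$demo"
```

Every path is quoted, so folder names with spaces pass through intact at each step.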
All times are GMT -5.