LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
05-10-2013, 04:06 PM   #1
kayasaman
Member
 
Registered: Sep 2008
Location: Under the bridge where proper engineers walkover
Distribution: Various Linux, Solaris, BSD, Cisco
Posts: 443

Rep: Reputation: 32
Using rm in Bash shell script on files/folders with spaces in for loop array


Hi,

I've written a shell script to compare two directories and remove the duplicates from the first directory (dir1).

Everything is working fine; however, I'm unable to get the rm command to work, because the files and folders in the directories have spaces and other funky characters in their names, including ( ) etc...

Since the rm statement is inside a for loop, it seems to behave differently than when I run it as a single command at the shell.

Here is the script:

Code:
#!/bin/bash

#Change to DIR1 and read list of files into fnames1.diff - sorting as column

cd /path/to/dir1
fnames1=( * )
echo " ${fnames1[@]/%/$' \n'}" > /tmp/fnames1.diff

#Change to DIR2 and read list of files into fnames2.diff - sorting as column

cd /path/to/dir2
fnames2=( * )
echo " ${fnames2[@]/%/$' \n'}" > /tmp/fnames2.diff

#Find the differences between DIR1 and DIR2

fnames_diff=`diff -y --suppress-common-lines /tmp/fnames1.diff /tmp/fnames2.diff`

echo ""

#Output the differences with formatting

echo "These entries are not on destination DIR"
echo "----------------------------------------"
echo "$fnames_diff" | cut -d '>' -f 1 | cut -d '|' -f -1 | sort | awk 'NF'

echo ""

#Find the duplicate entries between DIR1 and DIR2

fnames_same=`sort /tmp/fnames1.diff /tmp/fnames2.diff|uniq -d`

#Output the duplicate entries between DIR1 and DIR2 

echo "These entries are duplicates and will be removed"
echo "------------------------------------------------"
echo "$fnames_same"

sleep 2

#Convert entries to array

ifs=$IFS; IFS=$'\n'; fnames_array=($fnames_same)

#Change back to DIR1 and remove duplicate items

cd /path/to/dir1
for i in "${fnames_array[@]}"; do
#	rm -rfv -- "$i";
	echo "";
	echo "$i";
done

#Cleanup temporary files

sleep 2

rm /tmp/fnames1.diff
rm /tmp/fnames2.diff
The array itself is working fine: using the 'echo' command, each file/folder is printed on a separate line.
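A quick way to test whether the array entries really are clean is to wrap each line in brackets, so that leading and trailing blanks become visible. The sketch below reruns the list-building line from the script on two made-up names (build_listing is a hypothetical helper). It assumes bash with its default extquote setting, under which the $' \n' inside the double-quoted ${...} is expanded.

```shell
#!/usr/bin/env bash
# build_listing: hypothetical helper that rebuilds a listing with the same
# construct the script uses; the two names are made-up examples.
build_listing() {
    local names=( "a" "space in name" )
    # Append ' \n' to every element and echo the whole lot, as the script does.
    echo " ${names[@]/%/$' \n'}"
}

# Print every line wrapped in [ ] so leading/trailing blanks show up.
build_listing | while IFS= read -r line; do
    printf '[%s]\n' "$line"
done
```

With those two names this prints [ a ], [ space in name ] and a final [], i.e. every entry carries a leading and a trailing space. Since rm -f silently ignores names that do not exist, rm -rfv -- " a " has nothing to delete, which would match the symptom described.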

The 'rm' command, however, simply isn't working and I have no idea how to fix it. When I use rm manually, I usually just put a * where the spaces are, or use some other globbing, but with this I'm a bit puzzled!! I know I'm close though....

Thanks for any assistance :-)
 
05-10-2013, 06:16 PM   #2
Kustom42
Senior Member
 
Registered: Mar 2012
Distribution: Red Hat
Posts: 1,604

Rep: Reputation: 415
Have you tried changing your for loop to a while loop with read -r?

That would prevent backslash interpretation, although I'm not sure if that's what you are after.

You could also do a set -f in the for loop to turn off pathname expansion (globbing).
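A minimal sketch of the while/read -r suggestion, assuming a list file with one name per line and no extra padding; remove_listed and its two parameters are hypothetical. IFS= keeps leading and trailing blanks, -r keeps backslashes literal, and because "$name" is quoted no globbing happens, so set -f only matters where a variable is left unquoted.

```shell
#!/usr/bin/env bash
# remove_listed DIR LIST: delete DIR/<name> for every line of LIST.
# Hypothetical helper; assumes one name per line and no added padding.
remove_listed() {
    local dir=$1 list=$2 name
    while IFS= read -r name; do
        # Skip empty lines: "$dir/" would otherwise remove the whole directory.
        [ -n "$name" ] || continue
        # Quoted, so the name is neither split nor treated as a glob pattern.
        rm -rf -- "$dir/$name"
    done < "$list"
}
```

The empty-line check is there because rm -rf "$dir/" would delete the entire target directory.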
 
05-10-2013, 06:31 PM   #3
kayasaman
Member
 
Registered: Sep 2008
Location: Under the bridge where proper engineers walkover
Distribution: Various Linux, Solaris, BSD, Cisco
Posts: 443

Original Poster
Rep: Reputation: 32
Thanks for the response!

I was actually playing around with doing something like:

Code:
for i in "${fnames_array[@]}"; do
#       rm -rfv -- "$i";
#       echo "";
        test1=`echo "${i}" | sed -e 's/^[ \t]*//'`; 
#| xargs -0 --verbose -p rm -rfv " ";
        rm -rfv "$test1";
done
So my latest attempt was to put each name into another variable, strip the leading whitespace (which is generated earlier) from it, and then run 'rm' on the new variable. Unfortunately, it doesn't work??
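Stripping the padding does not need sed or backticks; bash parameter expansion can do it. A sketch, assuming bash; trim is a hypothetical function name. Unlike the sed above, it also removes trailing blanks.

```shell
#!/usr/bin/env bash
# trim STRING: print STRING without leading or trailing whitespace.
trim() {
    local s=$1
    s=${s#"${s%%[![:space:]]*}"}   # remove the leading run of whitespace
    s=${s%"${s##*[![:space:]]}"}   # remove the trailing run of whitespace
    printf '%s\n' "$s"
}

trim "  space in name  "
```

Keep in mind that a file whose name genuinely starts or ends with a space would no longer match after trimming; not adding the padding in the first place avoids that.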

Even running:

Code:
echo "${i}" | sed -e 's/^[ \t]*//' | xargs -0 --verbose -p rm -rfv " ";
Tells me that the system gets as far as:

Code:
rm -rfv <file name>
but then nothing happens?
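One likely reason nothing happens there: echo ends its output with a newline, not a NUL byte, so with -0 xargs hands rm the whole line including that trailing newline (and the extra " " becomes a separate file name consisting of a single space). The sketch compares this with printf '%s\0', which produces the NUL-terminated record that -0 expects. It assumes GNU xargs and uses printf in place of rm so nothing gets deleted.

```shell
#!/usr/bin/env bash
# echo ends its output with "\n", so with -0 xargs keeps that newline
# as part of the single argument it builds.
with_echo=$(echo "file name" | xargs -0 printf '<%s>')

# printf '%s\0' emits a NUL-terminated record, which is what -0 expects.
with_nul=$(printf '%s\0' "file name" | xargs -0 printf '<%s>')

printf '%s\n' "$with_echo" "$with_nul"
```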

I'm wondering if rm isn't operating on the correct dir?? Though I did set it before the for loop!

Perhaps running something like:

Code:
rm -rfv $dir_path" "
may work?

The really weird thing is that I tested the script beforehand with two test directories, test1 and test2, in which I had run a bunch of touch and mkdir commands for a through f.

All the common files and folders got deleted (OK, the names were just one letter), and that was without even using a for loop. I'm not sure what's going on, but I'm still tinkering.
 
05-10-2013, 07:44 PM   #4
kayasaman
Member
 
Registered: Sep 2008
Location: Under the bridge where proper engineers walkover
Distribution: Various Linux, Solaris, BSD, Cisco
Posts: 443

Original Poster
Rep: Reputation: 32
OK, well, I managed to figure out why the script wasn't working, or rather why rm wasn't doing anything: basically, when I output 'i' from the for loop, there is a space in front of it.

Adding the dir doesn't help as you get:

Code:
/path/to/dir/ file name
I've tried all sorts of text manipulation to remove the leading whitespace, but none of it seems to have any effect. I'm now looking at bouncing the information off to a file to see if that helps, but surely there must be an easier solution??
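One easier route is to not add the padding in the first place: print each name on its own line with printf '%s\n' and compare the two sorted lists with comm -12. A sketch under those assumptions; list_names and common_names are hypothetical helpers. Hidden files are skipped, just as with '*', and names containing newlines are still not handled.

```shell
#!/usr/bin/env bash
shopt -s nullglob   # an empty directory gives an empty list, not a literal "*"

# list_names DIR: print each top-level name in DIR on its own line, unpadded.
list_names() {
    local entries=( "$1"/* )
    (( ${#entries[@]} )) || return 0
    printf '%s\n' "${entries[@]##*/}"   # strip the leading "DIR/" part
}

# common_names DIR1 DIR2: names present in both directories.
common_names() {
    comm -12 <(list_names "$1" | sort) <(list_names "$2" | sort)
}
```

Since these lines carry no extra blanks, they should match the directory entries exactly when fed to a newline-delimited consumer such as a while IFS= read -r loop.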
 
05-10-2013, 10:02 PM   #5
kayasaman
Member
 
Registered: Sep 2008
Location: Under the bridge where proper engineers walkover
Distribution: Various Linux, Solaris, BSD, Cisco
Posts: 443

Original Poster
Rep: Reputation: 32
I've tried with both suggestions above now but with no luck:

set -f attempt:
Code:
IFS=$'\n'; for i in ${#fnames_array[@]}; do
#	rm -rfv -- "$i";
	set -f $i;
#	echo "$i";
	rm -rfv -- "$i";
done
and also using a while loop, based on an example from here:

http://www.cyberciti.biz/faq/bash-loop-over-file/

Code:
#!/bin/bash
        while IFS= read -r file
        do
                [ -f "$file" ] && rm -f "$file"
        done < "/tmp/data.txt"

which I adapted to this:

Code:
for i in "${fnames_array[@]}"; do
	echo "$i" | sed -e :a -e 's/^.\{1,78\}$/ &/;ta' | sed 's/^[ \t]*//' >> /tmp/fnames_dup.diff;
done

ifs=$IFS

while IFS= read -r file
do
#	cd /path/to/dir && echo "$file" | xargs -0 --verbose rm -rf --
	cd /path/to/dir && rm -rf "$file"
done < "/tmp/fnames_dup.diff"
But that doesn't work either???

For whatever reason, 'rm' just won't work.... though xargs is definitely passing the information to the rm command. Running rm with the -v (verbose) flag doesn't print anything at all, so I don't know what rm is actually doing or where/if it's getting stuck.
 
05-11-2013, 02:37 AM   #6
kayasaman
Member
 
Registered: Sep 2008
Location: Under the bridge where proper engineers walkover
Distribution: Various Linux, Solaris, BSD, Cisco
Posts: 443

Original Poster
Rep: Reputation: 32
Ok good news :-)

I managed to get things almost working. I got rid of the extra complexity (extra while loop I added) and came up with this:

Code:
for i in "${fnames_array[@]}"; do
	echo "$i" | sed -e :a -e 's/^.\{1,78\}$/ &/;ta' | sed 's/^[ \t]*//' >> /tmp/fnames_dup.diff;
	xargs -r --arg-file=/tmp/fnames_dup.diff rm -rfv --
done

#Cleanup temporary files

sleep 2

rm /tmp/fnames1.diff
rm /tmp/fnames2.diff
I just have to figure out how to handle files and folders with whacky characters in them now then everything will be fine :-)
 
05-11-2013, 07:10 AM   #7
GazL
LQ Veteran
 
Registered: May 2008
Posts: 7,071

Rep: Reputation: 5220
Whacky characters are definitely problematic when it comes to shell scripts.

How about this approach?
Code:
gazl@ws1:/tmp/wibble$ ls -Q dir1 dir2
"dir1":
"duplicate1"  "duplicate2"  "duplicate3"  "new\nline"  "original5"  "original6"  "space in name"

"dir2":
"duplicate1"  "duplicate2"  "duplicate3"  "new\nline"  "orginial4"  "space in name"
gazl@ws1:/tmp/wibble$ cat /tmp/duplicates.sh 
#!/bin/bash

IFS=$'\n'
for file in $( comm -12 <( ls -Q dir1 ) <( ls -Q dir2 ) )
do
   file="${file#\"}"
   file="${file%\"}" 
   file="$(echo -e "dir1/$file")"
   ls -l -- "$file" 
done
gazl@ws1:/tmp/wibble$ /tmp/duplicates.sh 
-rw-r--r-- 1 gazl users 0 May 11 12:02 dir1/duplicate1
-rw-r--r-- 1 gazl users 0 May 11 12:02 dir1/duplicate2
-rw-r--r-- 1 gazl users 0 May 11 12:02 dir1/duplicate3
-rw-r--r-- 1 gazl users 0 May 11 12:02 dir1/new?line
-rw-r--r-- 1 gazl users 0 May 11 12:02 dir1/space in name
gazl@ws1:/tmp/wibble$
Of course, if you don't need to do any fancy logic in your script you could just use xargs like so:
Code:
gazl@ws1:/tmp/wibble$ cd dir1
gazl@ws1:/tmp/wibble/dir1$ xargs -0r ls -l -- < <( sort -z <( find . -type f -print0 ) <( cd ../dir2 ; find . -type f -print0) | uniq -z -d )
-rw-r--r-- 1 gazl users 0 May 11 12:02 ./duplicate1
-rw-r--r-- 1 gazl users 0 May 11 12:02 ./duplicate2
-rw-r--r-- 1 gazl users 0 May 11 12:02 ./duplicate3
-rw-r--r-- 1 gazl users 0 May 11 12:02 ./new?line
-rw-r--r-- 1 gazl users 0 May 11 12:02 ./space in name
gazl@ws1:/tmp/wibble/dir1$
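GazL's find/sort -z/uniq -z pipeline compares regular files recursively. If only the top-level entries matter (as in the original script, which used '*'), a variant along the same lines might look like the sketch below. It assumes bash, GNU find (for -printf) and GNU sort/uniq (for -z); top_names and remove_dups are hypothetical names. Because the names stay NUL-separated end to end, even the "new\nline" case survives; unlike '*', find also picks up hidden entries.

```shell
#!/usr/bin/env bash
# top_names DIR: NUL-terminated list of DIR's top-level entry names.
top_names() {
    find "$1" -mindepth 1 -maxdepth 1 -printf '%f\0'
}

# remove_dups DIR1 DIR2: delete from DIR1 every name that DIR2 also has.
remove_dups() {
    local name
    # Names present in both lists appear twice after sorting; uniq -d keeps those.
    sort -z <(top_names "$1") <(top_names "$2") | uniq -z -d |
    while IFS= read -r -d '' name; do
        rm -rf -- "$1/$name"
    done
}
```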

Last edited by GazL; 05-11-2013 at 08:26 AM.
 
05-11-2013, 09:12 AM   #8
millgates
Member
 
Registered: Feb 2009
Location: 192.168.x.x
Distribution: Slackware
Posts: 852

Rep: Reputation: 389
How about just:

Code:
cd "$dir2"
files=(*)

cd "$dir1"
rm -vfr "${files[@]}"
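A sketch of the same idea with a few guards added, assuming bash; remove_common is a hypothetical name. It reads DIR2 with a glob instead of cd, uses nullglob so an empty DIR2 does not produce a literal '*', and runs the cd and rm in a subshell so that a failed cd stops there. With the snippet as written, a failed cd "$dir1" would leave rm running inside dir2.

```shell
#!/usr/bin/env bash
# remove_common DIR1 DIR2: delete from DIR1 every top-level name found in DIR2.
# Hypothetical wrapper around the files=(*) idea, with a few guards added.
remove_common() {
    local dir1=$1 dir2=$2 files
    shopt -s nullglob                 # empty DIR2 -> empty array, not "*"
    files=( "$dir2"/* )
    shopt -u nullglob
    (( ${#files[@]} )) || return 0    # nothing to remove
    files=( "${files[@]##*/}" )       # keep only the bare names
    # Subshell: if cd fails we exit instead of running rm in the wrong place.
    ( cd -- "$dir1" || exit 1
      rm -vfr -- "${files[@]}" )
}
```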
 
05-11-2013, 10:24 AM   #9
GazL
LQ Veteran
 
Registered: May 2008
Posts: 7,071

Rep: Reputation: 5220
I try to avoid constructs such as that just in case they might exceed the maximum number of arguments the shell or system can cope with, but yes, unless you're dealing with a very large number of files it ought to be safe (but be wary of subdirectories). I usually use either xargs or a while/read loop approach in my scripts to avoid these issues.
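To illustrate the point about xargs: it runs the command as many times as needed to keep each command line within the system limit, which 'getconf ARG_MAX' reports. In this sketch -n 2 simply forces tiny batches so the splitting is visible; a real rm invocation would not need it.

```shell
#!/usr/bin/env bash
# The system's limit on the combined size of arguments and environment.
getconf ARG_MAX

# xargs splits its input into as many command invocations as needed;
# -n 2 forces at most two arguments per invocation to make that visible.
batches=$(printf '%s\0' one two three four five | xargs -0 -n 2 echo)
printf '%s\n' "$batches"
```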

Last edited by GazL; 05-11-2013 at 10:42 AM.
 
05-11-2013, 01:02 PM   #10
millgates
Member
 
Registered: Feb 2009
Location: 192.168.x.x
Distribution: Slackware
Posts: 852

Rep: Reputation: 389
Quote:
Originally Posted by GazL
I try to avoid constructs such as that just in case they might exceed the maximum number of arguments the shell or system can cope with
Yes, of course. My point was why make the diff of the directories when you can just "remove everything". It should be easy to modify my example to use xargs:

Code:
cd "$dir2"
files=(*)

cd "$dir1"
printf "%s\0" "${files[@]}" | xargs -0 rm -vfr --
 
05-11-2013, 06:54 PM   #11
kayasaman
Member
 
Registered: Sep 2008
Location: Under the bridge where proper engineers walkover
Distribution: Various Linux, Solaris, BSD, Cisco
Posts: 443

Original Poster
Rep: Reputation: 32
Thanks for the responses :-)

I know I went totally overboard by dumping everything into temp files, but it was a good way to test stuff offline rather than on the 'live' directories.

I'm still learning scripting, but I thought it was not a good idea to parse the output of the 'ls' command in a script, and that it's better to use either 'find' or dump everything into an array as I did previously??

Quote:
cd "$dir1"
printf "%s\0" "${files[@]}" | xargs -0 rm -vfr --
Hence my last attempt towards the end of the script was very similar to this, though for some reason 'rm' wasn't handling files or folders with characters that are whacky (for UNIX) in them, as some had () and [] style brackets....

Manually running rm -rfv "whatever bad UNIX (style) [file]" works fine when typed at the command line; however, I tested using:

Code:
xargs -0 -I{} --arg-file=/tmp/fnames_dup.diff --verbose rm -rfv -- "{}"
which still wouldn't parse the information properly, even though when 'echoed' everything was fine line by line.

I was considering reading xargs into a variable such as:

Code:
var=`xargs -0 -I{} --verbose --arg-file=/tmp/fnames_dup.diff`
rm -rfv "$var"
Though for whatever reason, using " " quotes in my previous attempts still got interpreted wrongly by rm. [EDIT] Rather than wrongly, I should say not the way I expected things to be handled!! UNIX is never wrong, it's only its operators that are lol

It would be really nice to get the command-line version of rm -rvf -- "funky chars in name" into the script, which I'm sure would solve all my issues....

Last edited by kayasaman; 05-11-2013 at 06:56 PM.
 
05-12-2013, 03:10 AM   #12
millgates
Member
 
Registered: Feb 2009
Location: 192.168.x.x
Distribution: Slackware
Posts: 852

Rep: Reputation: 389
Quote:
Originally Posted by kayasaman
Manually running rm -rfv "whatever bad UNIX (style) [file]" works fine from stdinput (ie. typed on the cli) however, I tested using:

Code:
xargs -0 -I{} --arg-file=/tmp/fnames_dup.diff --verbose rm -rfv -- "{}"
I was considering reading xargs into a variable such as:

Code:
var=`xargs -0 -I{} --verbose --arg-file=/tmp/fnames_dup.diff`
rm -rfv "$var"
which still wouldn't parse the information properly though when 'echoed' all was fine line by line.
Though for whatever reason using " " quotes in my previous attempts still got interpreted wrongly by rm. [EDIT] rather then wrongly I should say not the way I expected things to be handled!! UNIX is never wrong it's only its operators that are lol

It would be really nice to get the stdinput version of: rm -rvf -- "funky chars in name" in to the script which I'm sure would solve all my issues....
The problem is not with rm, it's xargs. The -0 switch means xargs expects the arguments to be separated by \0 (NUL) characters. Without the switch, any whitespace is used as a delimiter (and quotes and backslashes are interpreted as well). In the file, the filenames are separated by \n's, so you may try:

Code:
[whatever] | xargs -d$'\n' rm -vfr --
However, please note that in unix, the file names can also contain newlines.
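A small sketch of the difference, assuming GNU xargs (for -d) and a temporary list file with one made-up name per line; printf stands in for rm so nothing is deleted.

```shell
#!/usr/bin/env bash
# A list file with one name per line; the names are made-up examples.
list=$(mktemp)
printf '%s\n' "My File (1) [x]" "plain" > "$list"

# Default splitting: blanks split "My File (1) [x]" into four arguments.
default_split=$(xargs printf '<%s>\n' < "$list")

# GNU xargs -d: only newlines separate items, so each line stays whole.
line_split=$(xargs -d '\n' printf '<%s>\n' < "$list")

printf '%s\n---\n%s\n' "$default_split" "$line_split"
rm -f -- "$list"
```

If the lines in the list file still carry extra leading or trailing blanks, -d alone will not help, because those blanks become part of each name.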
 
05-14-2013, 07:33 PM   #13
kayasaman
Member
 
Registered: Sep 2008
Location: Under the bridge where proper engineers walkover
Distribution: Various Linux, Solaris, BSD, Cisco
Posts: 443

Original Poster
Rep: Reputation: 32
Thanks for the response and apologies for the delay....

Unfortunately that isn't working either?

Is there any way I can see exactly which characters the line of text consists of??

Using plain "ls", only a readable output is shown with the usual

Code:
() [] {} etc...
characters in the name.

Perhaps using the '-d' (delimiter) option is correct, but I need to figure out what the delimiter needs to be first.
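A few standard ways to make hidden characters visible, assuming GNU coreutils (cat -A is a GNU option). The sample value is made up and has a leading space and a trailing tab.

```shell
#!/usr/bin/env bash
line=$' My File (1)\t'             # sample: leading space, trailing tab

printf '[%s]\n' "$line"             # brackets expose leading/trailing blanks
printf '%s\n' "$line" | cat -A      # tabs shown as ^I, end of line as $
printf '%s\n' "$line" | od -c       # one character (or escape) per column
```

Running the lines of the list file through cat -A the same way would show whether they carry extra blanks before or after the names.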
 
05-14-2013, 08:50 PM   #14
kayasaman
Member
 
Registered: Sep 2008
Location: Under the bridge where proper engineers walkover
Distribution: Various Linux, Solaris, BSD, Cisco
Posts: 443

Original Poster
Rep: Reputation: 32
I managed to fix this, at least on the directory I was having trouble with:

Code:
ifs=$IFS; IFS=$'\n'; fnames_array=($fnames_same)

#Change back to DIR1 and remove duplicate items

echo ""
cd /path/to/dir/1

for i in "${fnames_array[@]}"; do
	echo "$i" | sed -e :a -e 's/^.\{1,78\}$/ &/;ta' | sed 's/^[ \t]*//' >> /tmp/fnames_dup.diff;
	var=`xargs -t --arg-file=/tmp/fnames_dup.diff echo`
	rm -rfv -- $var
#	echo "$var"
done
Note the extra blank line inserted after the IFS statement, and that the output of 'xargs' now goes into a variable, on which the rm command is then run.

For some reason:

Code:
xargs -t --arg-file=/tmp/fnames_dup.diff rm -rfv --
doesn't work; however, when run as above, everything seems fine :-)
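One caveat with rm -rfv -- $var: leaving $var unquoted means that, besides splitting on the IFS=$'\n' set earlier, bash also treats each line as a glob pattern, so a name containing [ ] can match a different file. The sketch below (sample names made up, glob_demo is a hypothetical function) demonstrates that in a throwaway directory; set -f, as suggested earlier in the thread, turns pathname expansion off.

```shell
#!/usr/bin/env bash
# glob_demo: body runs in a subshell so cd, IFS and set -f don't leak out.
glob_demo() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    touch 'a1' 'a[1]'
    IFS=$'\n'
    var='a[1]'
    printf 'unquoted:    %s\n' $var   # a[1] is a pattern here and matches a1
    set -f                            # disable pathname expansion (globbing)
    printf 'with set -f: %s\n' $var   # now the name is used literally
    cd / && rm -rf -- "$dir"
)

glob_demo
```

With rm in place of printf, the unquoted form would have removed a1 instead of a[1].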
 
05-15-2013, 07:21 PM   #15
kayasaman
Member
 
Registered: Sep 2008
Location: Under the bridge where proper engineers walkover
Distribution: Various Linux, Solaris, BSD, Cisco
Posts: 443

Original Poster
Rep: Reputation: 32
I managed to fix things perfectly now and everything works the way it should.

Here is the full script if anyone is interested:

Code:
#!/bin/bash

#Initialize file

rm -f /tmp/fnames_dup.diff;

#Read DIR1 and DIR2 from stdin
echo "Please enter the path to DIR1: "
echo ""
read -r DIR1

echo ""

echo "Please enter the path to DIR2: "
echo ""
read -r DIR2

#Change to DIR1 and read list of files into fnames1.diff - sorting as column

cd "$DIR1" || exit 1
fnames1=( * )
echo " ${fnames1[@]/%/$' \n'}" > /tmp/fnames1.diff

#Change to DIR2 and read list of files into fnames2.diff - sorting as column

cd "$DIR2" || exit 1
fnames2=( * )
echo " ${fnames2[@]/%/$' \n'}" > /tmp/fnames2.diff

#Find the differences between DIR1 and DIR2

fnames_diff=`diff -y --suppress-common-lines /tmp/fnames1.diff /tmp/fnames2.diff`

echo ""

#Output the differences with formatting

echo "These entries are not on destination DIR"
echo "----------------------------------------"
echo "$fnames_diff" | cut -d '>' -f 1 | cut -d '|' -f -1 | sort | awk 'NF'

echo ""

#Find the duplicate entries between DIR1 and DIR2

fnames_same=`sort /tmp/fnames1.diff /tmp/fnames2.diff|uniq -d`

#Output the duplicate entries between DIR1 and DIR2 

echo "These entries are duplicates and will be removed"
echo "------------------------------------------------"
echo "$fnames_same"

sleep 2

#Convert entries to array

ifs=$IFS; IFS=$'\n'; fnames_array=($fnames_same)

#Change back to DIR1 and remove duplicate items

echo ""

cd "$DIR1" || exit 1

for i in "${fnames_array[@]}"; do
	echo "$i" | sed 's/^[ \t]*//;s/[ \t]*$//' >> /tmp/fnames_dup.diff;
done

xargs -d '\n' -I{} -t --arg-file=/tmp/fnames_dup.diff rm -rfv -- {}

#Synchronize DIR1 and DIR2 using Rsync

echo ""
echo "Synchronization between $DIR1 and $DIR2 will now take place: "
echo "------------------------------------------------------------ "
echo ""
echo "WARNING! Activating this option will remove all source files "
echo ""
echo "To continue press y|Y "
echo ""
echo "To abort    press n|N "

while : 
do
	read REPLY
	case $REPLY in
		y|Y)
			rsync -avvcr --inplace --progress --remove-source-files "$DIR1" "$DIR2";
			break
			;;
		n|N)
			break;;
	esac
done

#Cleanup temporary files

sleep 2

rm /tmp/fnames1.diff
rm /tmp/fnames2.diff

The issues I faced were basically whitespace at the beginning and end of the names, which wasn't accounted for. With the new 'sed' statement it is stripped off, so when 'xargs' reads the input file, the names line up with the entries in the directory and the 'rm' statement works without problems.

Thanks for everyone's help in the meantime :-)
 
All times are GMT -5.