05-10-2013, 04:06 PM
|
#1
|
Member
Registered: Sep 2008
Location: Under the bridge where proper engineers walkover
Distribution: Various Linux, Solaris, BSD, Cisco
Posts: 443
Rep:
|
Using rm in Bash shell script on files/folders with spaces in for loop array
Hi,
I've written a shell script to compare two directories and remove duplicates from the initial directory or dir1.
Everything is working fine; however, I'm unable to use the rm command, as the files and folders in the directories have spaces and other funky characters in their names, including ( ) etc.
Since the rm statement is inside a for loop, it seems to be behaving differently than if it were just a single command typed at the shell.
Here is the script:
Code:
#!/bin/bash
#Change to DIR1 and read list of files into fnames1.diff - sorting as column
cd /path/to/dir1
fnames1=( * )
echo " ${fnames1[@]/%/$' \n'}" > /tmp/fnames1.diff
#Change to DIR2 and read list of files into fnames2.diff - sorting as column
cd /path/to/dir2
fnames2=( * )
echo " ${fnames2[@]/%/$' \n'}" > /tmp/fnames2.diff
#Find the differences between DIR1 and DIR2
fnames_diff=`diff -y --suppress-common-lines /tmp/fnames1.diff /tmp/fnames2.diff`
echo ""
#Output the differences with formatting
echo "These entries are not on destination DIR"
echo "----------------------------------------"
echo "$fnames_diff" | cut -d '>' -f 1 | cut -d '|' -f -1 | sort | awk 'NF'
echo ""
#Find the duplicate entries between DIR1 and DIR2
fnames_same=`sort /tmp/fnames1.diff /tmp/fnames2.diff|uniq -d`
#Output the duplicate entries between DIR1 and DIR2
echo "These entries are duplicates and will be removed"
echo "------------------------------------------------"
echo "$fnames_same"
sleep 2
#Convert entries to array
ifs=$IFS; IFS=$'\n'; fnames_array=($fnames_same)
#Change back to DIR1 and remove duplicate items
cd /path/to/dir1
for i in "${fnames_array[@]}"; do
# rm -rfv -- "$i";
echo "";
echo "$i";
done
#Cleanup temporary files
sleep 2
rm /tmp/fnames1.diff
rm /tmp/fnames2.diff
The array itself is working fine: using the 'echo' command, each file/folder is printed on a separate line.
The 'rm' command, however, is simply not working and I have no idea how to fix it. If I use rm manually I usually just put a * where the spaces are, or use general globbing, but with this I'm a bit puzzled! I know I'm close though....
Thanks for any assistance :-)
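A small sketch of what the two `echo` lines in the script actually write, assuming a throwaway directory created with `mktemp` (the directory and file names here are made up for illustration). The expansion appends a space and a newline to every name, and `echo` then joins its arguments with spaces, so every line in the list file ends up with a leading and a trailing space and no longer matches a name on disk.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory, not part of the original script.
demo=$(mktemp -d)
touch "$demo/a b" "$demo/c (d)"
cd "$demo" || exit 1

fnames=( * )

# The form used in the script: each name gets " \n" appended and echo
# joins the pieces with a space, so each line is padded on both sides.
padded=$(echo " ${fnames[@]/%/$' \n'}")

# printf repeats its format once per argument, so nothing extra is added.
clean=$(printf '%s\n' "${fnames[@]}")

# Wrap every line in brackets to make the padding visible.
printf '%s\n' "$padded" | sed 's/.*/[&]/'
printf '%s\n' "$clean"  | sed 's/.*/[&]/'
```

Writing the list with `printf '%s\n' "${fnames1[@]}"` instead would keep the names byte-for-byte identical to what is on disk, as long as no name contains a newline.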
|
|
|
05-10-2013, 06:16 PM
|
#2
|
Senior Member
Registered: Mar 2012
Distribution: Red Hat
Posts: 1,604
|
Have you tried changing your for loop to a while loop with read -r?
That would prevent backslash interpretation, although I'm not sure if that's what you are after.
You could also do a set -f in the for loop to turn off pathname expansion (globbing).
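For reference, a sketch of both suggestions, assuming the names sit one per line in a list file (the function name `remove_listed` and the file layout are made up for illustration). `read -r` keeps backslashes literal and an empty `IFS` preserves leading and trailing blanks; `set -f` only matters where an unquoted expansion would otherwise be globbed.

```shell
#!/usr/bin/env bash
# remove_listed LIST DIR: delete every entry named in LIST (one name per
# line) from DIR. Hypothetical helper, not from the thread.
remove_listed() {
    local list=$1 dir=$2 name
    while IFS= read -r name; do
        [ -n "$name" ] || continue      # a blank line must never reach rm
        rm -rfv -- "$dir/$name"
    done < "$list"
}

# set -f switches pathname expansion off, set +f switches it back on.
pattern='*'
set -f
unglobbed=( $pattern )    # stays a literal '*' while globbing is off
set +f
```

Neither option changes the contents of the names, so a list that already contains padded names will still fail to match.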
|
|
|
05-10-2013, 06:31 PM
|
#3
|
Member
Registered: Sep 2008
Location: Under the bridge where proper engineers walkover
Distribution: Various Linux, Solaris, BSD, Cisco
Posts: 443
Original Poster
Rep:
|
Thanks for the response!
I was actually playing around with doing something like:
Code:
for i in "${fnames_array[@]}"; do
# rm -rfv -- "$i";
# echo "";
test1=`echo "${i}" | sed -e 's/^[ \t]*//'`;
#| xargs -0 --verbose -p rm -rfv " ";
rm -rfv "$test1";
done
So my latest attempt was to put each name into another variable, strip the leading whitespace (which is generated earlier in the script), and then run 'rm' on the new variable. Unfortunately it doesn't work.
Even running:
Code:
echo "${i}" | sed -e 's/^[ \t]*//' | xargs -0 --verbose -p rm -rfv " ";
Tells me that the system gets as far as:
Code:
rm -rfv <file name>
but then nothing happens?
I'm wondering if rm isn't operating on the correct dir?? Though I did set it before the for loop!
Perhaps running something like:
Code:
rm -rfv $dir_path" "
may work?
The really weird thing is that I tested the script beforehand with two test directories, test1 and test2, after running a bunch of touch and mkdir commands in them for the names a through f.
All the common files and folders got deleted (OK, the names were just one letter), and that was without even using a for loop. I'm not sure what's going on but am still tinkering.
|
|
|
05-10-2013, 07:44 PM
|
#4
|
Member
Registered: Sep 2008
Location: Under the bridge where proper engineers walkover
Distribution: Various Linux, Solaris, BSD, Cisco
Posts: 443
Original Poster
Rep:
|
OK, well I managed to figure out why the script wasn't working, or rather why rm wasn't doing anything: basically, when I output 'i' from the for loop there is a space in front of it.
Adding the dir doesn't help as you get:
Code:
/path/to/dir/ file name
I've tried all sorts of text manipulation to remove the leading whitespace but none seems to have any effect. I'm now looking at writing the names out to a file and seeing if that helps, but surely there must be an easier solution?
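One way to strip the padding without `sed`, sketched here with plain bash parameter expansion (the function name `trim` is made up for this example). It removes leading and trailing spaces and tabs only, so a name that really starts or ends with a blank would be altered too.

```shell
#!/usr/bin/env bash
# trim STRING: print STRING without leading/trailing spaces and tabs.
trim() {
    local s=$1
    # The inner expansion isolates the leading blanks; the outer removes them.
    s="${s#"${s%%[![:blank:]]*}"}"
    # The inner expansion isolates the trailing blanks; the outer removes them.
    s="${s%"${s##*[![:blank:]]}"}"
    printf '%s\n' "$s"
}

trim '  file (name) [1]  '
```

Inside the loop this could be used as `rm -rfv -- "$(trim "$i")"`, which keeps the name as a single quoted argument.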
|
|
|
05-10-2013, 10:02 PM
|
#5
|
Member
Registered: Sep 2008
Location: Under the bridge where proper engineers walkover
Distribution: Various Linux, Solaris, BSD, Cisco
Posts: 443
Original Poster
Rep:
|
I've tried with both suggestions above now but with no luck:
set -f attempt:
Code:
IFS=$'\n'; for i in ${#fnames_array[@]}; do
# rm -rfv -- "$i";
set -f $i;
# echo "$i";
rm -rfv -- "$i";
done
and also using a while loop which I got an example from here:
http://www.cyberciti.biz/faq/bash-loop-over-file/
Code:
#!/bin/bash
while IFS= read -r file
do
[ -f "$file" ] && rm -f "$file"
done < "/tmp/data.txt"
which I adapted to this:
Code:
for i in "${fnames_array[@]}"; do
echo "$i" | sed -e :a -e 's/^.\{1,78\}$/ &/;ta' | sed 's/^[ \t]*//' >> /tmp/fnames_dup.diff;
done
ifs=$IFS
while IFS= read -r file
do
# cd /path/to/dir && echo "$file" | xargs -0 --verbose rm -rf --
cd /path/to/dir && rm -rf "$file"
done < "/tmp/fnames_dup.diff"
But that doesn't work either???
For whatever reason 'rm' just won't function, though xargs is definitely passing the information to the rm command. Running rm with the -v flag for verbosity doesn't print anything at all, so I don't know what rm is actually doing or where/if it's getting stuck.
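A likely reason `rm` appears to do nothing: with `-f`, `rm` ignores operands that do not exist and still exits with status 0, and `-v` only reports entries that were actually removed. A sketch, assuming a scratch directory and a made-up file name, that contrasts a padded name with the real one:

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)            # hypothetical scratch directory
touch "$demo/file name"

# Padded name: nothing matches, -f hides the error, -v has nothing to report.
padded_out=$(cd "$demo" && rm -rfv -- ' file name ' 2>&1)
padded_status=$?

# Without -f the mismatch becomes visible as an error and a non-zero status.
rm -- "$demo/ file name " 2>/dev/null
strict_status=$?

# The exact name is removed and reported.
exact_out=$(cd "$demo" && rm -rfv -- 'file name')

printf 'padded: status=%s output=[%s]\n' "$padded_status" "$padded_out"
printf 'strict: status=%s\n' "$strict_status"
printf 'exact:  %s\n' "$exact_out"
```

Dropping `-f` while debugging is therefore a quick way to see which names fail to match. Separately, `${#fnames_array[@]}` in the first attempt expands to the number of elements rather than the elements themselves, so that loop runs once over a count, and `set -f $i` additionally overwrites the positional parameters.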
|
|
|
05-11-2013, 02:37 AM
|
#6
|
Member
Registered: Sep 2008
Location: Under the bridge where proper engineers walkover
Distribution: Various Linux, Solaris, BSD, Cisco
Posts: 443
Original Poster
Rep:
|
Ok good news :-)
I managed to get things almost working. I got rid of the extra complexity (extra while loop I added) and came up with this:
Code:
for i in "${fnames_array[@]}"; do
echo "$i" | sed -e :a -e 's/^.\{1,78\}$/ &/;ta' | sed 's/^[ \t]*//' >> /tmp/fnames_dup.diff;
xargs -r --arg-file=/tmp/fnames_dup.diff rm -rfv --
done
#Cleanup temporary files
sleep 2
rm /tmp/fnames1.diff
rm /tmp/fnames2.diff
I just have to figure out how to handle files and folders with whacky characters in them now then everything will be fine :-)
|
|
|
05-11-2013, 07:10 AM
|
#7
|
LQ Veteran
Registered: May 2008
Posts: 7,071
|
Whacky characters are definitely problematic when it comes to shell scripts..
How about this approach?
Code:
gazl@ws1:/tmp/wibble$ ls -Q dir1 dir2
"dir1":
"duplicate1" "duplicate2" "duplicate3" "new\nline" "original5" "original6" "space in name"
"dir2":
"duplicate1" "duplicate2" "duplicate3" "new\nline" "orginial4" "space in name"
gazl@ws1:/tmp/wibble$ cat /tmp/duplicates.sh
#!/bin/bash
IFS=$'\n'
for file in $( comm -12 <( ls -Q dir1 ) <( ls -Q dir2 ) )
do
file="${file#\"}"
file="${file%\"}"
file="$(echo -e "dir1/$file")"
ls -l -- "$file"
done
gazl@ws1:/tmp/wibble$ /tmp/duplicates.sh
-rw-r--r-- 1 gazl users 0 May 11 12:02 dir1/duplicate1
-rw-r--r-- 1 gazl users 0 May 11 12:02 dir1/duplicate2
-rw-r--r-- 1 gazl users 0 May 11 12:02 dir1/duplicate3
-rw-r--r-- 1 gazl users 0 May 11 12:02 dir1/new?line
-rw-r--r-- 1 gazl users 0 May 11 12:02 dir1/space in name
gazl@ws1:/tmp/wibble$
Of course, if you don't need to do any fancy logic in your script you could just use xargs like so:
Code:
gazl@ws1:/tmp/wibble$ cd dir1
gazl@ws1:/tmp/wibble/dir1$ xargs -0r ls -l -- < <( sort -z <( find . -type f -print0 ) <( cd ../dir2 ; find . -type f -print0) | uniq -z -d )
-rw-r--r-- 1 gazl users 0 May 11 12:02 ./duplicate1
-rw-r--r-- 1 gazl users 0 May 11 12:02 ./duplicate2
-rw-r--r-- 1 gazl users 0 May 11 12:02 ./duplicate3
-rw-r--r-- 1 gazl users 0 May 11 12:02 ./new?line
-rw-r--r-- 1 gazl users 0 May 11 12:02 ./space in name
gazl@ws1:/tmp/wibble/dir1$
Last edited by GazL; 05-11-2013 at 08:26 AM.
|
|
|
05-11-2013, 09:12 AM
|
#8
|
Member
Registered: Feb 2009
Location: 192.168.x.x
Distribution: Slackware
Posts: 852
|
how about just
Code:
cd "$dir2" || exit 1
files=(*)
cd "$dir1" || exit 1
rm -vfr -- "${files[@]}"
|
|
|
05-11-2013, 10:24 AM
|
#9
|
LQ Veteran
Registered: May 2008
Posts: 7,071
|
I try to avoid constructs such as that just in case they might exceed the maximum number of arguments the shell or system can cope with, but yes, unless you're dealing with a very large number of files it ought to be safe (but be wary of subdirectories). I usually use either xargs or a while/read loop approach in my scripts to avoid these issues.
Last edited by GazL; 05-11-2013 at 10:42 AM.
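A sketch of the while/read pattern mentioned in the post above, assuming bash and a `find` that supports `-print0`, `-mindepth` and `-maxdepth` (the function name `list_entries` is hypothetical). Names are passed NUL-terminated, so spaces, brackets and even newlines survive, and no single command line has to hold every name at once. The system limit itself can be queried with `getconf ARG_MAX`.

```shell
#!/usr/bin/env bash
# list_entries DIR: handle each top-level entry of DIR one at a time.
list_entries() {
    local dir=$1 entry
    while IFS= read -r -d '' entry; do
        # Replace this printf with the real action, e.g. rm -rfv -- "$entry"
        printf 'entry: %q\n' "$entry"
    done < <(find "$dir" -mindepth 1 -maxdepth 1 -print0)
}

getconf ARG_MAX    # upper bound, in bytes, on arguments plus environment
```

The loop costs one command invocation per entry, which is slower than `xargs` batching but leaves room for per-file logic.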
|
|
|
05-11-2013, 01:02 PM
|
#10
|
Member
Registered: Feb 2009
Location: 192.168.x.x
Distribution: Slackware
Posts: 852
|
Quote:
Originally Posted by GazL
I try to avoid constructs such as that just in case they might exceed the maximum number of arguments the shell or system can cope with
|
Yes, of course. My point was why make the diff of the directories when you can just "remove everything". It should be easy to modify my example to use xargs:
Code:
cd "$dir2" || exit 1
files=(*)
cd "$dir1" || exit 1
printf "%s\0" "${files[@]}" | xargs -0 rm -vfr --
|
|
|
05-11-2013, 06:54 PM
|
#11
|
Member
Registered: Sep 2008
Location: Under the bridge where proper engineers walkover
Distribution: Various Linux, Solaris, BSD, Cisco
Posts: 443
Original Poster
Rep:
|
Thanks for the responses :-)
I know I went totally overboard by dumping everything into temp files but it was a good way to test out stuff offline and not on the 'live' directories.
Though I'm still learning scripting, I thought it was not a good idea to run the 'ls' command in a script, and that it was better to use either 'find' or dump everything into an array as I did previously?
Quote:
cd "$dir1"
printf "%s\0" "${files[@]}" | xargs -0 rm -vfr --
|
hence my last attempt was very similar to this towards the end of the script, though for some reason 'rm' wasn't responding to files or folders with (whacky) characters in them (for UNIX) as some had () and [] style braces....
Manually running rm -rfv "whatever bad UNIX (style) [file]" works fine from stdinput (ie. typed on the cli) however, I tested using:
Code:
xargs -0 -I{} --arg-file=/tmp/fnames_dup.diff --verbose rm -rfv -- "{}"
which still wouldn't parse the information properly though when 'echoed' all was fine line by line.
I was considering reading xargs into a variable such as:
Code:
var=`xargs -0 -I{} --verbose --arg-file=/tmp/fnames_dup.diff`
rm -rfv "$var"
Though for whatever reason using " " quotes in my previous attempts still got interpreted wrongly by rm. [EDIT] Rather than wrongly, I should say not the way I expected things to be handled!! UNIX is never wrong, it's only its operators that are lol
It would be really nice to get the stdinput version of: rm -rvf -- "funky chars in name" in to the script which I'm sure would solve all my issues....
Last edited by kayasaman; 05-11-2013 at 06:56 PM.
|
|
|
05-12-2013, 03:10 AM
|
#12
|
Member
Registered: Feb 2009
Location: 192.168.x.x
Distribution: Slackware
Posts: 852
|
Quote:
Originally Posted by kayasaman
Manually running rm -rfv "whatever bad UNIX (style) [file]" works fine from stdinput (ie. typed on the cli) however, I tested using:
Code:
xargs -0 -I{} --arg-file=/tmp/fnames_dup.diff --verbose rm -rfv -- "{}"
which still wouldn't parse the information properly though when 'echoed' all was fine line by line.
I was considering reading xargs into a variable such as:
Code:
var=`xargs -0 -I{} --verbose --arg-file=/tmp/fnames_dup.diff`
rm -rfv "$var"
Though for whatever reason using " " quotes in my previous attempts still got interpreted wrongly by rm. [EDIT] Rather than wrongly, I should say not the way I expected things to be handled!! UNIX is never wrong, it's only its operators that are lol
It would be really nice to get the stdinput version of: rm -rvf -- "funky chars in name" in to the script which I'm sure would solve all my issues....
|
The problem is not with rm. It's the xargs. The -0 switch means xargs will want the arguments separated by a \0 character. Without the switch, any whitespace will be used as delimiter. In the file, the filenames are separated by \n's. So, you may try
Code:
[whatever] | xargs -d$'\n' rm -vfr --
However, please note that in unix, the file names can also contain newlines.
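To make the difference concrete, a sketch assuming GNU `xargs` (the `-d` option is a GNU extension) and a made-up list file with one name per line. `printf '[%s]\n'` stands in for `rm` so nothing is deleted.

```shell
#!/usr/bin/env bash
list=$(mktemp)                       # hypothetical list file
printf '%s\n' 'space in name' 'plain' > "$list"

# -0: there are no NUL bytes in the file, so the whole file is ONE argument.
with_nul=$(xargs -0 printf '[%s]\n' < "$list")

# Default: split on blanks and newlines, so 'space in name' becomes three.
with_default=$(xargs printf '[%s]\n' < "$list")

# -d '\n': one argument per line, blanks inside a line are kept as they are.
with_newline=$(xargs -d '\n' printf '[%s]\n' < "$list")

printf '%s\n---\n%s\n---\n%s\n' "$with_nul" "$with_default" "$with_newline"
rm -f -- "$list"
```

With `-d '\n'` each line is passed through unchanged, which also means any leading or trailing blanks on a line become part of the argument.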
|
|
|
05-14-2013, 07:33 PM
|
#13
|
Member
Registered: Sep 2008
Location: Under the bridge where proper engineers walkover
Distribution: Various Linux, Solaris, BSD, Cisco
Posts: 443
Original Poster
Rep:
|
Thanks for the response and apologies for the delay....
Unfortunately that isn't working either?
Is there any way that I can see which characters the line of text actually consists of?
Using plain "ls", only a readable output is shown with the usual
characters in the name.
Perhaps using the '-d' (delimiter) option is correct but I need to figure out what the delimiter needs to be first.
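A few standard ways to make invisible characters visible, sketched with a made-up padded name. The `%q` format is a feature of bash's `printf` builtin and `od -c` is a POSIX tool.

```shell
#!/usr/bin/env bash
name=' file (name) '          # hypothetical name with padding on both sides

# Shell-quoted form: blanks and other special characters are escaped.
quoted=$(printf '%q' "$name")
printf '%s\n' "$quoted"

# Byte-by-byte dump: every space, tab and newline is shown explicitly.
printf '%s\n' "$name" | od -c

# Brackets around the value make leading/trailing blanks easy to spot.
bracketed=$(printf '[%s]' "$name")
printf '%s\n' "$bracketed"
```

The same `od -c` dump can be applied to a whole list file, for example `od -c /tmp/fnames_dup.diff`, to check what separates the names and whether any line carries extra blanks.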
|
|
|
05-14-2013, 08:50 PM
|
#14
|
Member
Registered: Sep 2008
Location: Under the bridge where proper engineers walkover
Distribution: Various Linux, Solaris, BSD, Cisco
Posts: 443
Original Poster
Rep:
|
I managed to fix this, at least on the directory I was having trouble with:
Code:
ifs=$IFS; IFS=$'\n'; fnames_array=($fnames_same)
#Change back to DIR1 and remove duplicate items
echo ""
cd /path/to/dir/1
for i in "${fnames_array[@]}"; do
echo "$i" | sed -e :a -e 's/^.\{1,78\}$/ &/;ta' | sed 's/^[ \t]*//' >> /tmp/fnames_dup.diff;
var=`xargs -t --arg-file=/tmp/fnames_dup.diff echo`
rm -rfv -- $var
# echo "$var"
done
Note the extra blank line inserted after the IFS statement, and that the output of 'xargs' is now put into a variable before running the rm command on it.
For some reason:
Code:
xargs -t --arg-file=/tmp/fnames_dup.diff rm -rfv --
doesn't work; however, when run like above everything seems fine :-)
|
|
|
05-15-2013, 07:21 PM
|
#15
|
Member
Registered: Sep 2008
Location: Under the bridge where proper engineers walkover
Distribution: Various Linux, Solaris, BSD, Cisco
Posts: 443
Original Poster
Rep:
|
I managed to fix things perfectly now and everything works the way it should.
Here is the full script if anyone is interested:
Code:
#!/bin/bash
#Initialize file
rm -f /tmp/fnames_dup.diff;
#Read DIR1 and DIR2 from stdin
echo "Please enter the path to DIR1: "
echo ""
read -r DIR1
echo ""
echo "Please enter the path to DIR2: "
echo ""
read -r DIR2
#Change to DIR1 and read list of files into fnames1.diff - sorting as column
cd "$DIR1" || exit 1
fnames1=( * )
echo " ${fnames1[@]/%/$' \n'}" > /tmp/fnames1.diff
#Change to DIR2 and read list of files into fnames2.diff - sorting as column
cd "$DIR2" || exit 1
fnames2=( * )
echo " ${fnames2[@]/%/$' \n'}" > /tmp/fnames2.diff
#Find the differences between DIR1 and DIR2
fnames_diff=`diff -y --suppress-common-lines /tmp/fnames1.diff /tmp/fnames2.diff`
echo ""
#Output the differences with formatting
echo "These entries are not on destination DIR"
echo "----------------------------------------"
echo "$fnames_diff" | cut -d '>' -f 1 | cut -d '|' -f -1 | sort | awk 'NF'
echo ""
#Find the duplicate entries between DIR1 and DIR2
fnames_same=`sort /tmp/fnames1.diff /tmp/fnames2.diff|uniq -d`
#Output the duplicate entries between DIR1 and DIR2
echo "These entries are duplicates and will be removed"
echo "------------------------------------------------"
echo "$fnames_same"
sleep 2
#Convert entries to array
ifs=$IFS; IFS=$'\n'; fnames_array=($fnames_same)
#Change back to DIR1 and remove duplicate items
echo ""
cd "$DIR1" || exit 1
for i in "${fnames_array[@]}"; do
#Strip leading and trailing spaces/tabs left over from the list files
echo "$i" | sed 's/^[ \t]*//;s/[ \t]*$//' >> /tmp/fnames_dup.diff;
done
xargs -I{} -t --arg-file=/tmp/fnames_dup.diff rm -rfv -- "{}"
#Synchronize DIR1 and DIR2 using Rsync
echo ""
echo "Synchronization between $DIR1 and $DIR2 will now take place: "
echo "------------------------------------------------------------ "
echo ""
echo "WARNING! Activating this option will remove all source files "
echo ""
echo "To continue press y|Y "
echo ""
echo "To abort press n|N "
while :
do
read REPLY
case $REPLY in
y|Y)
rsync -avvcr --inplace --progress --remove-source-files "$DIR1" "$DIR2";
break
;;
n|N)
break;;
esac
done
#Cleanup temporary files
sleep 2
rm /tmp/fnames1.diff
rm /tmp/fnames2.diff
The issues I faced were basically whitespace at the beginning and end of the names, which wasn't accounted for. With the new 'sed' statement it is all removed, so that when 'xargs' reads the input file the 'rm' statement has no problems, because its input now matches the names in the directories exactly.
Thanks for everyone's help in the meantime :-)
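For comparison, a shorter sketch that avoids the temporary files altogether. It is not the script above but a different technique: loop over the entries of the first directory and remove each one whose name also exists in the second. The function name `remove_duplicates` and its `-n` dry-run switch are made up for this example, and like the original it compares names only, not contents.

```shell
#!/usr/bin/env bash
# remove_duplicates [-n] DIR1 DIR2
# Remove from DIR1 every top-level entry whose name also exists in DIR2.
# With -n, only print what would be removed.
remove_duplicates() {
    local dry_run=0 dir1 dir2 path name
    if [ "$1" = "-n" ]; then dry_run=1; shift; fi
    dir1=$1 dir2=$2
    if [ ! -d "$dir1" ] || [ ! -d "$dir2" ]; then
        echo "need two existing directories" >&2
        return 1
    fi

    for path in "$dir1"/*; do
        name=${path##*/}
        # An unmatched glob stays literal; skip it if nothing is there.
        [ -e "$path" ] || [ -L "$path" ] || continue
        if [ -e "$dir2/$name" ] || [ -L "$dir2/$name" ]; then
            if [ "$dry_run" -eq 1 ]; then
                printf 'would remove: %q\n' "$path"
            else
                rm -rfv -- "$path"
            fi
        fi
    done
}
```

A dry run such as `remove_duplicates -n /path/to/dir1 /path/to/dir2` shows the candidates first. As with `fnames1=( * )` in the script, the `*` glob skips hidden entries unless `dotglob` is enabled.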
|
|
|