LinuxQuestions.org
LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
05-12-2007, 12:03 AM   #1
silex_88
Member
 
Registered: May 2005
Posts: 49

Rep: Reputation: 15
shell script to recursively "compare" all files in a directory...


Hi, the command I'm using has the format:

arrow --query=FILE1 --compare=FILE2

I want FILE1 and FILE2 to go through all files in a certain directory recursively.

My attempt was

arrow --query=`find -type f DIR` --compare=`find -type f DIR`

However, I get a "too many arguments" error message.
 
05-12-2007, 12:40 AM   #2
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 682
I am not familiar with an arrow command.
Rather than cycling through all of the files and comparing each one to the remaining files in a list, I would run "md5sum" or "sum" on all of the files and then look for duplicate checksum values.

You can do this for files in subdirectories as well: drop the "-maxdepth 1" option and find will recurse into them.

Code:
find . -maxdepth 1 -type f -exec md5sum '{}' \; >md5sumlist
cut -d' ' -f1 md5sumlist | sort | uniq -d >acopylist
grep -f acopylist md5sumlist
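You can shorten this with GNU uniq's "-w32 -D" options: -w32 compares only the first 32 characters of each line (the MD5 hash), and -D prints every line that belongs to a group of duplicates. Then the grep command is no longer necessary.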

Code:
find . -maxdepth 1 -type f -exec md5sum '{}' \; | sort | uniq -w32 -D
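A recursive variant of the one-liner above, as a sketch assuming GNU coreutils and findutils (uniq's -w and --all-repeated options are GNU extensions). find_dupes is a hypothetical helper name; --all-repeated=separate behaves like -D but puts a blank line between groups of identical files.

```shell
#!/usr/bin/env bash
# find_dupes (hypothetical helper): print groups of identical files under a
# directory tree. Assumes GNU find, md5sum, sort and uniq.
find_dupes() {
    local dir=${1:-.}
    # "-exec ... +" passes many files to a single md5sum call instead of one
    # call per file; without -maxdepth, find recurses into subdirectories.
    # sort brings equal hashes together; uniq -w32 compares only the 32-char
    # MD5 prefix, and --all-repeated=separate prints every member of each
    # duplicate group with a blank line between groups.
    find "$dir" -type f -exec md5sum {} + | sort | uniq -w32 --all-repeated=separate
}
```

Usage: find_dupes /path/to/DIR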
----
Quote:
arrow --query=`find -type f DIR` --compare=`find -type f DIR`
1) The find command is in the wrong order: the directory to start the search from must come first, as in "find DIR -type f".
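2) --query=FILE1 --compare=FILE2 implies that each option takes a single file, not every file in the directory. The unquoted backticks expand to one word per file, so only the first file ends up attached to --query= and the rest become extra arguments, which is likely what triggers the error. Even if arrow accepted a list of files, such as
arrow --compare=FILE --query="FILE1 FILE2 ..."
there could still be a problem if the directory contains too many files, because the whole command line can exceed the system's argument-length limit (ARG_MAX).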
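A small bash sketch of the word-splitting problem in point 2. count_args is a hypothetical stand-in for arrow that only reports how many arguments it received; the scratch directory and file names are made up for the demo.

```shell
#!/usr/bin/env bash
# Hypothetical demo: a scratch directory holding three regular files.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
touch "$demo_dir/a" "$demo_dir/b" "$demo_dir/c"
cd "$demo_dir" || exit 1

# count_args (hypothetical stand-in for arrow): report the argument count.
count_args() { echo "$#"; }

# Directory first, then the test. The unquoted substitution is split into one
# word per file, so only the first path printed by find gets --query=.
count_args --query=$(find . -type f)    # prints 3

# One quoted file per option: exactly one argument reaches the command.
count_args --query="./a"                # prints 1
```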

Last edited by jschiwal; 05-12-2007 at 12:58 AM.
 
05-12-2007, 12:47 AM   #3
silex_88
Member
 
Registered: May 2005
Posts: 49

Original Poster
Rep: Reputation: 15
Hi! Thanks for the prompt reply. I'm afraid I wasn't very clear: the arrow command I'm executing is custom, and I'm just looking for a way to give it all the files in a directory as parameters. In pseudo code, I want to do


Code:
for each FILE1 in DIR #recursive
do
  for each FILE2 in DIR #recursive
  do
    if FILE1 != FILE2
    then
      arrow --query=FILE1 --compare=FILE2 >> output.txt
    end
  end
end
where DIR is the same directory for both loops. However, I don't know how to write that as a shell script. Maybe it's easier in Python? If so, please show me how.
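For reference, the pseudocode above can be sketched directly in bash, recursing into DIR with find. This is a sketch, not the poster's code: arrow is the poster's custom program, so the arrow function below is a hypothetical stub that just echoes its arguments (delete it to call the real program), compare_all is a hypothetical name, and mapfile -d '' assumes bash 4.4 or later.

```shell
#!/usr/bin/env bash
# Hypothetical stub standing in for the custom arrow program; remove it to
# call the real one.
arrow() { printf '%s\n' "$*"; }

# compare_all (hypothetical name): run arrow on every ordered pair of
# distinct regular files under the given directory, recursively.
compare_all() {
    local dir=$1 file1 file2
    local -a files
    # NUL-separated list so spaces and newlines in filenames stay intact.
    mapfile -d '' files < <(find "$dir" -type f -print0)
    for file1 in "${files[@]}"; do
        for file2 in "${files[@]}"; do
            # Skip comparing a file with itself, as in the pseudocode.
            if [[ $file1 != "$file2" ]]; then
                arrow --query="$file1" --compare="$file2"
            fi
        done
    done
}
```

Usage: compare_all DIR >> output.txt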

Last edited by silex_88; 05-12-2007 at 12:50 AM.
 
05-12-2007, 04:24 AM   #4
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 682
Code:
# Compare every regular file in $DIR (not recursive) with every other one.
# Quote the variables so filenames containing spaces stay intact.
for file1 in "$DIR"/*; do
   for file2 in "$DIR"/*; do
      if [ -f "$file1" ] && [ -f "$file2" ] && [ "$file1" != "$file2" ]; then
         arrow --query="$file1" --compare="$file2" >> output.txt
      fi
   done
done
The -f tests skip anything that is not a regular file; without them, a subdirectory of $DIR could be passed to arrow.

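This isn't an efficient way of doing it. When $file1 is the first file, it is compared against the 2nd through last files, and each of those files is later compared against it again, so every pair is run twice (useful only if arrow's result depends on which file is the query). The $file2 loop should start with the file after $file1. It would be better to have an array containing the filenames: in the outer loop, loop through 1 .. n; in the inner loop, loop through file1's position + 1 .. n.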

Your way loops n squared times and runs arrow n(n-1) times. This way runs it only n(n-1)/2 times.
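The two counts can be checked by summing the inner-loop lengths. With n files, the full double loop makes n * n passes, n of which are skipped by the file1 = file2 test; the triangular loop pairs the file at position i only with the files after it. A worked derivation, with n = 4 as an example:

```latex
\begin{aligned}
\text{full loop:} \quad & n \cdot n - n = n(n-1) \text{ calls to arrow} \\
\text{triangular loop:} \quad & \sum_{i=0}^{n-1} (n-1-i) = (n-1) + (n-2) + \dots + 0 = \frac{n(n-1)}{2} \\
n = 4: \quad & 4 \cdot 3 = 12 \text{ calls versus } \frac{4 \cdot 3}{2} = 6 \text{ calls}
\end{aligned}
```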

Look at this example for inspiration:
Code:
# Change IFS so that filenames with spaces don't get split up
ifs=$IFS
IFS='
'
# Fill an array with the files in the current directory
files=($(find . -maxdepth 1 -type f ))

# Restore old IFS value
IFS=$ifs

# Display the index of the last file
LAST=$((${#files[@]}-1)) 
echo $LAST

# Display the contents of the array
for (( i=0; i<=$LAST; i++ )); do echo $i : "${files[$i]}"; done
See the "Arrays" section of the bash info manual ("info bash") for more details.
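Putting the array idea together with the index-based triangular loop described above, a recursive version might look like the sketch below. Assumptions: arrow is stubbed out by a hypothetical echo function (remove it to use the real program), compare_pairs is a hypothetical name, the array is filled with mapfile -d '' and find -print0 (bash 4.4 or later) instead of the IFS trick, and arrow is assumed to give the same result whichever file is the query.

```shell
#!/usr/bin/env bash
# Hypothetical stub standing in for the custom arrow program.
arrow() { printf '%s\n' "$*"; }

# compare_pairs (hypothetical name): run arrow once per unordered pair of
# regular files under the given directory, recursively.
compare_pairs() {
    local dir=$1 i j last
    local -a files
    # NUL-separated list of every regular file under $dir.
    mapfile -d '' files < <(find "$dir" -type f -print0)
    last=$(( ${#files[@]} - 1 ))
    # The inner index starts just after the outer one, so each pair is
    # compared once and no file is compared with itself.
    for (( i = 0; i <= last; i++ )); do
        for (( j = i + 1; j <= last; j++ )); do
            arrow --query="${files[i]}" --compare="${files[j]}"
        done
    done
}
```

Usage: compare_pairs DIR > output.txt. With 4 files this calls arrow 6 times, matching n(n-1)/2.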

Last edited by jschiwal; 05-12-2007 at 04:26 AM.
 
  


All times are GMT -5.
