LinuxQuestions.org


Syqers 10-01-2008 08:40 AM

Script to find file differences in two directory trees (bash)
 
I'm new to bash scripting, and I'll try to describe my problem as succinctly as possible.

I have two directory trees, which have the same structure. These trees go 6 or 7 levels deep, with a mixture of files/folders at every level (until the final level obviously). Let's call these trees dir1 and dir2.

I want to write a script that greps each file in dir1 against its counterpart in dir2, and then puts the individual grep results into a third folder, say /grepped.

If a file exists in dir1, but not in dir2 (or vice versa), then a grepped file should exist in /grepped with just the line "/dir1/foo/fileName.txt does not exist in /dir2" - or something equivalent.

So how do I traverse the directory tree? I can use test -d to tell whether something is a file or a directory, but I'm not sure how to actually move around the tree.

Additionally, I searched google and this site for a solution to anything similar to this and was unable to find it. If someone has a good link, please send it my way.

Thanks!

slano 10-01-2008 09:26 AM

Hey, it's quite easy. Forget about the tree and treat it as text, line by line.
Just create dir1.txt and dir2.txt by running find ./ > dir1.txt with dir1 as the working directory, then do the same for dir2.

Then you can do something like:

# Append every path from dir1.txt that also appears in dir2.txt to grepped.txt
while IFS= read -r line
do
grep -Fx -- "$line" dir2.txt >> grepped.txt
done < dir1.txt

I guess you know what to do next. Let me know if you need additional help.
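
As a follow-up to the listing idea above, here is a minimal sketch that builds the two sorted listings and splits them with comm instead of looping over grep. The function names (list_tree, compare_listings) and the output file names are made up for illustration, and it assumes no file name contains a newline.

```shell
#!/bin/sh
# Print every regular file under directory $1, relative to it, in sorted order.
list_tree() {
  (cd "$1" && find . -type f | LC_ALL=C sort)
}

# compare_listings DIR1 DIR2 OUTDIR
# Writes the two listings plus three derived lists into OUTDIR.
compare_listings() {
  mkdir -p "$3" || return 1
  list_tree "$1" > "$3/dir1.txt"
  list_tree "$2" > "$3/dir2.txt"
  # comm needs sorted input. Its columns are: lines only in the first file,
  # lines only in the second file, lines in both. -23 keeps column 1, and so on.
  LC_ALL=C comm -23 "$3/dir1.txt" "$3/dir2.txt" > "$3/only_in_dir1.txt"
  LC_ALL=C comm -13 "$3/dir1.txt" "$3/dir2.txt" > "$3/only_in_dir2.txt"
  LC_ALL=C comm -12 "$3/dir1.txt" "$3/dir2.txt" > "$3/in_both.txt"
}
```

Calling compare_listings dir1 dir2 listings would leave the five text files in a listings directory; in_both.txt is then the list of files worth diffing.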

Syqers 10-01-2008 02:43 PM

Code:

#!/bin/bash

## INSTRUCTIONS
## This script takes two directory trees and creates three output types:
##  1. *.diffed which include the diff results if both dir1 and dir2 contain the file
##  2. lonely.dir1 which includes all the files present in dir1 but not dir2
##  3. lonely.dir2 which includes all the files present in dir2 but not dir1
## It deletes -> recreates a directory /diffed in your run location

echo "Comparing $1 with $2........"

homedir=$(pwd)

## Has to track the current directory
cd "$1" || exit 1
all_files=$(find * -type f)
cd "$homedir" || exit 1

## Could put some sort of warning here
if [ -d "diffed" ]; then
  rm -r diffed
fi
mkdir diffed

for f in $all_files; do

  ## Have to remove the /'s from $f for naming (the g flag removes every slash, not just the first)
  fslashesremoved=$(echo "$f" | sed 's_/__g')

  if [ -f "$1/$f" ]; then
      if [ -f "$2/$f" ]; then
        ## Have to check if there is a difference between files
        ## (diff -q only reports whether they differ; its exit status is non-zero if they do)
        if ! diff -q "$1/$f" "$2/$f" > /dev/null; then
            ## echo "Writing diff between $1/$f and $2/$f"
            diff "$1/$f" "$2/$f" > "diffed/$fslashesremoved.diffed"
        fi
      else
        echo "$f: present in $1, but not in $2" >> diffed/lonely.dir1
      fi
  fi
done

cd "$2" || exit 1
extra_files=$(find * -type f)
cd "$homedir" || exit 1

## Now have to do the reverse for $2 against $1, but only have to check if they are present or not

for f in $extra_files; do

  if [ -f "$2/$f" ]; then
      if [ -f "$1/$f" ]; then
        ## Have to figure out how not to do something here
        echo stuff > /dev/null
      else
        echo "$f: present in $2, but not in $1" >> diffed/lonely.dir2
      fi 
  fi
done

## Now we have a diffed directory that has lots of files
## What should we remove from them?
## Should also remove the *.*~ from this

Slano, it turns out I need a bit more functionality, since I want a list of the files that are present in only one of the directories. (But if I didn't need this, I would have done it your way.)

Here is what I actually settled on. I know it's very rough, but it works so far! I'm still trying to figure out how to say in code "if you don't find the file in the first directory, and it is present in the second directory". That's why you see the hack with echo stuff > /dev/null.

Also, I need to ignore all temporary files.
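
For the two open points above (testing for a missing file without the echo stuff > /dev/null placeholder, and ignoring temporary files), a rough sketch follows. It is not the script posted above but a variation on it: the function name diff_trees is made up, slashes in the output name are turned into underscores rather than removed, temporary files are assumed to be the ones ending in ~, and file names are assumed not to contain newlines. Reading find's output line by line also copes with spaces in file names.

```shell
#!/bin/sh
# diff_trees DIR1 DIR2 OUTDIR   (hypothetical helper, for illustration only)
diff_trees() {
  dir1=$1 dir2=$2 out=$3
  mkdir -p "$out" || return 1

  # The subshell cd keeps the paths relative without moving the caller;
  # ! -name '*~' drops editor backup files from the listing.
  (cd "$dir1" && find . -type f ! -name '*~') |
  while IFS= read -r f; do
    rel=${f#./}
    if [ -f "$dir2/$rel" ]; then
      flat=$(printf '%s' "$rel" | tr '/' '_')
      # diff exits 0 when the files are identical: discard the empty output then
      if diff "$dir1/$rel" "$dir2/$rel" > "$out/$flat.diffed"; then
        rm -f "$out/$flat.diffed"
      fi
    else
      echo "$rel: present in $dir1, but not in $dir2" >> "$out/lonely.dir1"
    fi
  done

  (cd "$dir2" && find . -type f ! -name '*~') |
  while IFS= read -r f; do
    rel=${f#./}
    # "A || B" runs B only when A fails, so no empty "then" branch is needed;
    # [ ! -f "$dir1/$rel" ] && echo ... would be the negated form of the same test
    [ -f "$dir1/$rel" ] || echo "$rel: present in $dir2, but not in $dir1" >> "$out/lonely.dir2"
  done
}
```

Something like diff_trees dir1 dir2 diffed would then fill the diffed directory; unlike the script above, it does not delete an existing output directory first.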

Mr. C. 10-01-2008 11:55 PM

Doesn't diff -r already provide what you need?

It tells you:

a) which files exist in only one or the other directory tree
b) the differences between two corresponding files.

e.g.:
Code:

$ diff -r level0*
Only in level0.mirror/a/print: file
Only in level0/b/print: file
diff -r level0/bak/d/dir/file1 level0.mirror/bak/d/dir/file1
0a1
> I'm different

You can parse the output, since diff always prints its lines in the format shown above (e.g. "Only in ...", "> ..."), or you can specify your own output format.


You can also use the -q option to give you easier parsing:

Code:

$ diff -qr level0*
Only in level0.mirror/a/print: file
Only in level0/b/print: file
Files level0/bak/d/dir/file1 and level0.mirror/bak/d/dir/file1 differ

You can then run your own diffs on the files named in the "Files ... differ" lines.

Diff has plenty of good options - be sure to review the man page.
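
As one possible way to parse the -q output shown above, here is a short sketch. The function name summarize_trees and the ONLY/DIFFERS labels are made up; -x '*~' is diff's exclude option, used here on the assumption that temporary files end in ~. The parsing is naive: it splits on the first ": " and the first " and ", so it would misread paths that contain those sequences. Note also that when a whole directory exists on one side only, diff -r reports the directory itself rather than the files inside it.

```shell
#!/bin/sh
# summarize_trees DIR1 DIR2   (hypothetical helper, for illustration only)
summarize_trees() {
  # LC_ALL=C keeps diff's messages in English so the patterns below match.
  # diff exits non-zero when the trees differ; that is expected here.
  LC_ALL=C diff -qr -x '*~' "$1" "$2" |
  while IFS= read -r line; do
    case $line in
      "Only in "*)
        rest=${line#Only in }          # "DIR: NAME"
        echo "ONLY ${rest%%: *}/${rest#*: }"
        ;;
      "Files "*" differ")
        pair=${line#Files }            # "PATH1 and PATH2 differ"
        pair=${pair% differ}
        echo "DIFFERS ${pair%% and *}" # print the first path of the pair
        ;;
    esac
  done
}
```

Running summarize_trees level0 level0.mirror prints one line per finding, which is easy to split into separate lists with grep.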


All times are GMT -5.