shell script: compare 2 files

anhtt · 08-28-2007, 11:09 PM

I have 2 files. I want to write a shell script that checks file1 row by row. If a row of file1 is found in file2, we do nothing. Otherwise, we save that row of file1 in a new file (file3).
I already wrote a shell script, but it is very slow because there are a lot of rows in file1 and file2.

cat file1 | while read line;
do
echo "$line" > /tmp/check
x=`cut -f1 -d" " /tmp/check`
y=`grep -c "$x" file2`
if [ "$y" -ge 1 ];then
continue
else
echo "$x" >> file3
fi
done

Help me!

gilead · 08-28-2007, 11:12 PM

It's probably easier to use one of the sdiff, diff or cmp commands. Have you tried them before?

chrism01 · 08-28-2007, 11:41 PM

Code:

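# Read t1.t line by line (a for loop over `cat` would split on every space)
while IFS= read -r line
do
        # -x whole line, -F literal text, -q quiet: only the exit status is used
        if ! grep -qxF -- "$line" t2.t
        then
                echo "$line" >>t3.t
        fi
done < t1.t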

ghostdog74 · 08-29-2007, 01:49 AM

Code:

awk 'FNR==NR{ arr[$0];next}
     {if ($0 in arr) print "found"
      else print "not found"   
     }' "file1" "file2"
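
For the original question (save the rows of file1 that are missing from file2), the same two-pass awk idea can be turned around. This is only a sketch under my own assumptions: the sample data and the scratch directory are made up, the array name seen is arbitrary, and whole lines are compared exactly.

```shell
# Hypothetical sample data in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' apple banana cherry date > "$dir/file1"
printf '%s\n' banana date elderberry > "$dir/file2"

# First file on the command line (file2): remember every line.
# Second file (file1): print only the lines that were never remembered.
awk 'FNR == NR { seen[$0]; next } !($0 in seen)' "$dir/file2" "$dir/file1" > "$dir/file3"

cat "$dir/file3"   # prints apple and cherry
```

Each file is read once, so this should scale much better than running grep once per row. One caveat: if file2 is empty, the FNR == NR test is also true for file1 and nothing is printed.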

farkus888 · 08-29-2007, 02:04 AM

Quote:

Originally Posted by gilead

It's probably easier to use one of the sdiff, diff or cmp commands. Have you tried them before?

I agree with this. sdiff marks every change with a "<", ">", or "|". Use the equivalent of a grep -v for those three markers, then cut out one of the columns, and you have exactly what you want. Unfortunately I am too tired right now to write code that is any better than slow and inefficient, so you'll need to figure out the details on your own.

theYinYeti · 08-29-2007, 02:33 AM

If the order does not matter, and all rows in each file are unique (no two identical rows in the same file), then the fastest is probably:

Code:

cat file1 file2 | sort | uniq -d | cat file1 - | sort | uniq -u

which displays only the lines of file1 that are not also in file2. If you want the reverse (lines in common), then it is simpler:

Code:

cat file1 file2 | sort | uniq -d
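
To see why the longer pipeline works, it may help to run it in two steps on a small example. The sample data, the scratch directory and the intermediate file name common are all made up for illustration, and the same conditions apply (no duplicate rows inside a file, order not important).

```shell
# Hypothetical sample data in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' apple banana cherry date > "$dir/file1"
printf '%s\n' banana date elderberry > "$dir/file2"

# Step 1: a line present in both files appears twice after concatenation,
# so uniq -d (print duplicated lines once) keeps exactly the common lines.
cat "$dir/file1" "$dir/file2" | sort | uniq -d > "$dir/common"

# Step 2: add file1 again. Common lines now appear twice, lines that are
# only in file1 appear once, so uniq -u (print non-repeated lines) keeps those.
cat "$dir/file1" "$dir/common" | sort | uniq -u > "$dir/file3"

cat "$dir/common"   # prints banana and date
cat "$dir/file3"    # prints apple and cherry
```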

If the above conditions do not hold (there are duplicate rows, or the order matters), then for files of up to a few thousand lines this should be a good solution:

Code:

grep -vxf <(sed 's/[]\.*^$[]/\\\0/g' file2) file1

which gives lines in file1 that are not in file2.
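
A possible variant, not the command above: grep's -F option treats every pattern as a literal string, so the sed escaping step is not needed. The sample data below is made up to show why literal matching matters, and the scratch directory is only for the demonstration.

```shell
# Hypothetical sample data containing regex metacharacters.
dir=$(mktemp -d)
printf '%s\n' 'a.b' 'axb' '1+1=2' > "$dir/file1"
printf '%s\n' 'a.b' > "$dir/file2"

# -v invert the match, -x match whole lines only,
# -F patterns are literal strings, -f read the patterns from file2.
grep -vxFf "$dir/file2" "$dir/file1" > "$dir/file3"

cat "$dir/file3"   # prints axb and 1+1=2, in file1 order
```

Without -F (and without escaping), the pattern a.b would also match axb and that row would be dropped by mistake. The order of file1 is preserved and duplicate rows are handled.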

To output the result to a "file3" file, just append

Code:

 >file3

to any of the above commands.

Yves.

[edit:]I had forgotten the 'f' option in grep. Now it is ok.[/edit]

gnashley · 08-29-2007, 02:39 AM

comm is the program you want; it does exactly this. Note that it expects both input files to be sorted.
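
A minimal sketch of how that could look, assuming the order of the output does not matter. The sample data, the scratch directory and the .sorted file names are made up for illustration.

```shell
# Hypothetical sample data, deliberately unsorted.
dir=$(mktemp -d)
printf '%s\n' cherry apple banana date > "$dir/file1"
printf '%s\n' date banana elderberry > "$dir/file2"

# comm needs sorted input, so sort both files first.
sort "$dir/file1" > "$dir/file1.sorted"
sort "$dir/file2" > "$dir/file2.sorted"

# Column 1 = lines only in the first file, 2 = only in the second, 3 = in both.
# -23 suppresses columns 2 and 3, leaving the lines unique to file1.
comm -23 "$dir/file1.sorted" "$dir/file2.sorted" > "$dir/file3"

cat "$dir/file3"   # prints apple and cherry
```

The result comes out in sorted order rather than in the original order of file1.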