Can diff display the differences/uniques for just one of the 2 files?

gruffy · 09-07-2006, 11:04 PM

I'm keeping a text file of line data that is about 1.2 GB in size; let's call it "FileA".

I generate a "FileB" daily (about 25 MB in size), and about 80% of its entries are duplicates of lines already in "FileA".

I need only the lines that are unique to "FileB" sent to the output. Is there a way to have diff do this?

gilead · 09-07-2006, 11:47 PM

Running the following produces a standard diff of the files. The grep limits the output to lines that appear only in file2 (marked with '> ' at the start of the line), and the sed trims the leading '> '. Does this do what you want?

Code:

diff file1 file2 | grep '^>' | sed -e 's/^> //'
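
As a quick illustration of what that pipeline prints, here is a minimal sketch on two tiny sample files. The file contents, the temporary directory, and the new_lines variable are made up for the example and are not from the original post. Note that diff compares the files in order, so this works best when the shared lines appear in the same sequence in both files.

```shell
# Hypothetical sample data standing in for the real files.
tmp=$(mktemp -d)
printf 'alpha\nbravo\ncharlie\n' > "$tmp/file1"
printf 'alpha\nbravo\ncharlie\ndelta\n' > "$tmp/file2"

# Keep only the lines diff marks as added in file2, then strip the "> " prefix.
# diff exits with status 1 when the files differ, so "|| true" keeps a strict
# shell (set -e with pipefail) from stopping here.
new_lines=$(diff "$tmp/file1" "$tmp/file2" | grep '^>' | sed -e 's/^> //') || true
printf '%s\n' "$new_lines"

rm -rf "$tmp"
```

With these sample files the only line present in file2 but not in file1 is "delta", so that is the single line printed.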

gruffy · 09-08-2006, 12:34 PM

Actually I get "diff: memory exhausted" before it does anything. The -H switch doesn't help either. I'm guessing diff tries to load the entire file into memory (1.2 GB into 1 GB of RAM). I think I'll have to go find some bigger DIMMs to put in this machine.
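
One possible way around the memory limit, sketched here as an assumption rather than something tried in this thread, is to sort both files and compare them with comm. GNU sort generally spills to temporary files when its input is larger than memory, and comm reads its two inputs line by line. The sample contents, the ".sorted" file names, and the only_in_b variable below are hypothetical.

```shell
# Hypothetical sample data standing in for FileA (large) and FileB (daily).
tmp=$(mktemp -d)
printf 'bravo\nalpha\ncharlie\n' > "$tmp/FileA"
printf 'delta\nalpha\necho\nbravo\n' > "$tmp/FileB"

# comm requires sorted input; use the same locale for sort and comm
# so both agree on the ordering.
LC_ALL=C sort "$tmp/FileA" > "$tmp/FileA.sorted"
LC_ALL=C sort "$tmp/FileB" > "$tmp/FileB.sorted"

# -1 suppresses lines found only in FileA, -3 suppresses lines common to both,
# which leaves the lines found only in FileB.
only_in_b=$(LC_ALL=C comm -13 "$tmp/FileA.sorted" "$tmp/FileB.sorted")
printf '%s\n' "$only_in_b"

rm -rf "$tmp"
```

Unlike the diff pipeline, this treats each file as a set of lines, so the result does not depend on line order, but the output comes back in sorted order rather than in FileB's original order.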

gruffy · 09-10-2006, 12:51 PM

I actually got this working, and it does what I need. Thanks a lot!

gilead · 09-10-2006, 02:17 PM

No problem - glad to hear it's working.