Comparing files and copying differences

gregmcc · 04-22-2009, 11:35 AM

I've got 2 files - File1 and File2

File1:
abc:123
def:456
tex:765

File2:
abc:567

What I would like to do is compare the first column (up to the : ) in the files and write the changes to File2

In other words, check abc in File1 against File2: if it exists in File2, skip it; otherwise append it to the file. Then check def, then tex, and so on.

So once the script has run File2 would contain:

abc:567
def:456
tex:765

Any ideas? I've played around with awk and for loops but don't seem to be getting anywhere.

kapilsingh · 04-22-2009, 01:04 PM

I don't have a complete solution for you, but here is a suggestion.

Using the uniq command (for your given example):
uniq file1 file2

After that, the content of file2 is:

abc:123
def:456
tex:765
I think you want 567 in place of 123.
The "comm" command may be helpful for you.

Thanks
Kapil Singh

gregmcc · 04-22-2009, 01:23 PM

You might be onto something, but the problem with uniq and comm is that they compare the whole line.

I only want to compare the first field (colon-delimited).

Update: After many hours of searching I came across this post which does the job!!!

http://www.unix.com/shell-programmin...n-2-files.html

jf.argentino · 04-22-2009, 01:57 PM

Maybe by using temporary files filled by the gawk command?

Quigi · 04-22-2009, 02:23 PM

Quote:

Originally Posted by gregmcc

In other words check abc in File1 against File2, if it exists in File2 then skip, otherwise append it to the file, then check def, then tex etc etc

You don't specify, so I'll infer from your example that the keys are in ascending order in both input files. Then this will write the desired result to stdout:

Code:

sort -t: -msuk1,1 File2 File1

man sort.

To update File2, don't directly redirect output, because that would truncate File2 before sort reads it. Rather,

Code:

sort -t: -msuk1,1 File2 File1 > tmp
mv tmp File2

More generally, I think your objective is to merge two "associative arrays" (AKA "dictionaries" in PostScript or "hashes" in Perl). Do you care to tell us why you want to do this?

More robustly and flexibly than the above "sort" call, you could use Perl to build up the hash %t, then write it all out at the end. Note the opposite order of arguments:

Code:

perl -we 'while(<>) {($k,$v)=split /:/, $_, 2; $t{$k}=$v;} while (@e = each %t) {print join ":", @e}' File1 File2

Or, if you want the output rows ordered (and with nicer names):

Code:

perl -we 'while(<>) {($key,$value)=split /:/, $_, 2; $hash{$key}=$value;} for (sort keys %hash) {print $_, ":", $hash{$_}};' File1 File2

Quote:

I've played around with awk and for loops but don't seem to be getting anywhere

I learned csh scripting, and awk, and sed, and (ba)sh. When I came across Perl, I realized that was the one tool I should have learned in the beginning. Give it a try!

Quote:

Originally Posted by kapilsingh

uniq file1 file2

That keeps the unique lines from file1 (only!) and overwrites file2. Not the solution. Also, as gregmcc points out, this looks at the whole line.

You can tell uniq to only heed the first 3 characters on each line. That could do in the example, but it's not really colon-delimited.

/Christian

gregmcc · 04-22-2009, 03:07 PM

Thanks for the reply.

I should have specified - The keys are in a random order

Quote:

Originally Posted by Quigi

You don't specify, so I'll infer from your example that the keys are in ascending order in both input files. Then this will write the desired result to stdout:

Code:

sort -t: -msuk1,1 File2 File1

man sort.

I tried this and it works great if the files are already sorted.

Quote:

More generally, I think your objective is to merge two "associative arrays" (AKA "dictionaries" in PostScript or "hashes" in Perl). Do you care to tell us why you want to do this?

File1 is on one server and File2 is on another server. I want to keep File2 up to date with new info that is added to File1. But I couldn't do a copy or rsync as the file content is not exactly the same.

I ended up using this:

Code:

awk -F ":" 'BEGIN { while ((getline < "/tmp/file1") > 0) a[$1] = 1 } a[$1] != 1 { print $0 }' /tmp/file2 > /tmp/file.diff

Still not 100% sure what it does, but it works.
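
For anyone else wondering what that awk command does: the BEGIN block reads one file line by line and stores each first field as a key of the array a; the main rule then prints every line of the other file whose first field is not a key of a. Below is a rough, spelled-out sketch of the same idea. The scratch directory, the variable name "known" and the roles of the two files (File2 is remembered, File1 is filtered, so the output is what would need appending to File2) are my assumptions for illustration, not taken from the command above.

```shell
# Sample data from the first post, in a scratch directory (illustrative paths).
dir=$(mktemp -d)
printf 'abc:123\ndef:456\ntex:765\n' > "$dir/File1"
printf 'abc:567\n' > "$dir/File2"

# BEGIN: read the "known" file and remember every key in array a.
# Main rule: print a line of the other file only if its key was not remembered.
awk -F: -v known="$dir/File2" '
    BEGIN { while ((getline line < known) > 0) { split(line, f, ":"); a[f[1]] = 1 } }
    !($1 in a)
' "$dir/File1" > "$dir/file.diff"

cat "$dir/file.diff"
```

With this sample data, file.diff holds the def and tex lines, which could then be appended with cat file.diff >> File2.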

Libu · 04-22-2009, 03:20 PM

How about

Code:

grep -v `cut -d":" -f1 File2` File1 >> File2
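
One caveat with the backquoted form: once File2 holds more than one key, the shell splits the cut output into several words, so grep takes the first as the pattern and treats the rest as file names. A possible variant, as a sketch only, feeds the keys to grep as a pattern list with -f. It assumes the keys contain no regular-expression metacharacters; the scratch directory and the helper file "keys" are made up for the example.

```shell
# Sample data in a scratch directory (illustrative; File2 has two keys here).
dir=$(mktemp -d)
printf 'abc:123\ndef:456\ntex:765\n' > "$dir/File1"
printf 'abc:567\ndef:999\n' > "$dir/File2"

# Turn each key in File2 into an anchored pattern such as "^abc:".
cut -d: -f1 "$dir/File2" | sed 's/.*/^&:/' > "$dir/keys"

# Append the File1 lines whose key matches none of the patterns.
grep -v -f "$dir/keys" "$dir/File1" >> "$dir/File2"

cat "$dir/File2"
```

Anchoring each pattern with ^ and the trailing colon keeps a key like "abc" from also matching "abcd:..." or a value that happens to contain "abc".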

Quigi · 04-23-2009, 12:54 PM

Quote:

Originally Posted by gregmcc

Thanks for the reply.

I should have specified - The keys are in a random order
I tried this and it works great if the files are already sorted.

As you probably saw in the man page, "-m" tells sort that the files are already sorted. If they aren't, simply drop the "m", and sort will order them. The keys will be in order in the output. I can't tell from your example if that's a problem.
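
As a minimal sketch of the command without "m", run on unsorted sample data that I made up for the example (the scratch directory is also just for illustration):

```shell
# Unsorted sample data in a scratch directory (illustrative).
dir=$(mktemp -d)
printf 'tex:765\nabc:123\ndef:456\n' > "$dir/File1"
printf 'def:999\nabc:567\n' > "$dir/File2"

# Without -m, sort orders the input itself. File2 is listed first so that,
# with -s and -u, its line is the one kept for a key present in both files
# (behaviour as observed with GNU sort).
sort -t: -suk1,1 "$dir/File2" "$dir/File1" > "$dir/tmp"
mv "$dir/tmp" "$dir/File2"

cat "$dir/File2"
```

File2 ends up sorted by key, with its own values kept for abc and def and the tex line taken from File1.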

Or use one of the Perl one-liners.
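
If the key order in the output does matter, one more option that leaves the lines in their original order is a short awk filter. This is only a sketch on made-up sample data; the array name "seen" and the scratch directory are arbitrary choices for the example.

```shell
# Unsorted sample data in a scratch directory (illustrative).
dir=$(mktemp -d)
printf 'tex:765\nabc:123\ndef:456\n' > "$dir/File1"
printf 'def:999\nabc:567\n' > "$dir/File2"

# seen[$1]++ is 0 (false) the first time a key appears, so the negated
# pattern prints only the first line for each key, in input order.
# File2 comes first, so its lines win; new keys from File1 follow.
awk -F: '!seen[$1]++' "$dir/File2" "$dir/File1" > "$dir/tmp"
mv "$dir/tmp" "$dir/File2"

cat "$dir/File2"
```

As with the sort version, the result goes to a temporary file first so that File2 is not truncated while it is still being read.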

Quote:

File1 is on one server and File2 is on another server. I want to keep File2 up to date with new info that is added to File1. But I couldn't do a copy or rsync as the file content is not exactly the same.

OK, makes sense.