LinuxQuestions.org
Old 04-22-2009, 11:35 AM   #1
gregmcc
Member
 
Registered: Mar 2007
Distribution: opensuse, ubuntu, debian
Posts: 43

Rep: Reputation: 1
Comparing files and copying differences


I've got 2 files - File1 and File2

File1:
abc:123
def:456
tex:765

File2:
abc:567

What I would like to do is compare the first column (up to the : ) in the files and write the changes to File2

In other words: check abc in File1 against File2. If it exists in File2, skip it; otherwise append it to File2. Then check def, then tex, and so on.

So once the script has run File2 would contain:

abc:567
def:456
tex:765

Any ideas? I've played around with awk and for loops but don't seem to be getting anywhere.

Last edited by gregmcc; 04-22-2009 at 12:41 PM.
 
Old 04-22-2009, 01:04 PM   #2
kapilsingh
LQ Newbie
 
Registered: Apr 2009
Location: Indore, India
Distribution: ubuntu, centOS,RHEL,Mandriva
Posts: 10

Rep: Reputation: 0
I don't have a complete solution for you, but here is a suggestion.

Using the uniq command on your example:
uniq file1 file2

After that, file2 contains:

abc:123
def:456
tex:765
I think you want 567 in place of 123, though.
The "comm" command may also be helpful.

Thanks
Kapil Singh
 
Old 04-22-2009, 01:23 PM   #3
gregmcc
Member
 
Registered: Mar 2007
Distribution: opensuse, ubuntu, debian
Posts: 43

Original Poster
Rep: Reputation: 1
You might be onto something, but the problem with uniq and comm is that they compare the whole line.

I only want to compare the first field (colon-delimited).

Update: After many hours of searching I came across this post which does the job!!!

http://www.unix.com/shell-programmin...n-2-files.html

Last edited by gregmcc; 04-22-2009 at 01:56 PM.
 
Old 04-22-2009, 01:57 PM   #4
jf.argentino
Member
 
Registered: Apr 2008
Location: Toulon (France)
Distribution: FEDORA CORE
Posts: 493

Rep: Reputation: 50
Maybe by using temporary files filled by gawk?
 
Old 04-22-2009, 02:23 PM   #5
Quigi
Member
 
Registered: Mar 2003
Location: Cambridge, MA, USA
Distribution: Ubuntu (Dapper and Heron)
Posts: 377

Rep: Reputation: 31
Quote:
Originally Posted by gregmcc View Post
In other words: check abc in File1 against File2. If it exists in File2, skip it; otherwise append it to File2. Then check def, then tex, and so on.
You don't specify, so I'll infer from your example that the keys are in ascending order in both input files. Then this will write the desired result to stdout:
Code:
sort -t: -msuk1,1 File2 File1
man sort.

To update File2, don't directly redirect output, because that would truncate File2 before sort reads it. Rather,
Code:
sort -t: -msuk1,1 File2 File1 > tmp
mv tmp File2
More generally, I think your objective is to merge two "associative arrays" (AKA "dictionaries" in PostScript or "hashes" in Perl). Do you care to tell us why you want to do this?

More robustly and flexibly than the above "sort" call, you could use Perl to build up the hash %t, then write it all out at the end. Note the opposite order of arguments:
Code:
perl -we 'while(<>) {($k,$v)=split /:/, $_, 2; $t{$k}=$v;} while (@e = each %t) {print join ":", @e}' File1 File2
Or, if you want the output rows ordered (and with nicer names):
Code:
perl -we 'while(<>) {($key,$value)=split /:/, $_, 2; $hash{$key}=$value;} for (sort keys %hash) {print $_, ":", $hash{$_}};' File1 File2
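If neither hash order nor sorted order is wanted, a minimal sketch along the same lines could keep File2's existing lines in their original order and append only the File1 lines whose key is missing. This one uses awk instead of Perl; the scratch directory and the File2.new name are made up for illustration, and the sample data is the one from the first post:

```shell
# Hypothetical sample data from the first post, placed in a scratch directory.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf 'abc:123\ndef:456\ntex:765\n' > File1
printf 'abc:567\n' > File2

# First file on the command line (File2): print every line and remember its key.
# Second file (File1): print only lines whose key has not been seen yet.
awk -F: '
    FILENAME == ARGV[1] { seen[$1] = 1; print; next }
    !($1 in seen)       { seen[$1] = 1; print }
' File2 File1 > File2.new && mv File2.new File2

cat File2
```

As with the sort call, the result goes to a temporary file first so that File2 is not truncated while awk is still reading it.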
Quote:
I've played around with awk and for loops but don't seem to be getting anywhere
I learned csh scripting, and awk, and sed, and (ba)sh. When I came across Perl, I realized that was the one tool I should have learned in the beginning. Give it a try!

Quote:
Originally Posted by kapilsingh
uniq file1 file2
That keeps the unique lines from file1 (only!) and overwrites file2. Not the solution. Also, as gregmcc points out, this looks at the whole line.

You can tell uniq to only heed the first 3 characters on each line. That could do in the example, but it's not really colon-delimited.
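For illustration only, here is a rough sketch of that idea. It assumes GNU uniq (the -w option is not in every uniq) and keys that are always exactly 3 characters long, which is true for the example data but probably not in general:

```shell
# Hypothetical sample data from the first post, placed in a scratch directory.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf 'abc:123\ndef:456\ntex:765\n' > File1
printf 'abc:567\n' > File2

# The stable sort on the first field keeps the File2 line ahead of the File1 line
# for equal keys. uniq -w 3 then compares only the first 3 characters of each
# line and keeps the first line of every group.
merged=$(sort -s -t: -k1,1 File2 File1 | uniq -w 3)
printf '%s\n' "$merged"
```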

/Christian

Last edited by Quigi; 04-22-2009 at 02:27 PM. Reason: Add "to", required by English grammar
 
Old 04-22-2009, 03:07 PM   #6
gregmcc
Member
 
Registered: Mar 2007
Distribution: opensuse, ubuntu, debian
Posts: 43

Original Poster
Rep: Reputation: 1
Thanks for the reply.

I should have specified - The keys are in a random order

Quote:
Originally Posted by Quigi View Post
You don't specify, so I'll infer from your example that the keys are in ascending order in both input files. Then this will write the desired result to stdout:
Code:
sort -t: -msuk1,1 File2 File1
man sort.
I tried this and it works great if the files are already sorted.

Quote:
More generally, I think your objective is to merge two "associative arrays" (AKA "dictionaries" in PostScript or "hashes" in Perl). Do you care to tell us why you want to do this?
File1 is on one server and File2 is on another server. I want to keep File2 up to date with new info that is added to File1. But I couldn't do a copy or rsync as the file content is not exactly the same.

I ended up using this:

Code:
awk -F ":" 'BEGIN { while ((getline < "/tmp/file1") > 0) a[$1] = 1 } a[$1] != 1 { print $0 }' /tmp/file2 > /tmp/file.diff
Still not 100% sure what it does but it works
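For anyone else wondering, here is an annotated sketch of what that command appears to do. The file names are relative and the sample contents are made up for illustration; the logic is the same: remember every key in file1, then print the lines of file2 whose key was not remembered.

```shell
# Hypothetical sample data in a scratch directory.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf 'abc:567\n' > file1                      # keys that are already known
printf 'abc:123\ndef:456\ntex:765\n' > file2    # candidate lines

awk -F ":" '
    # BEGIN runs before any input is read: go through file1 line by line
    # and mark the first colon-separated field of each line in array a.
    # The "> 0" test stops the loop at end of file and on a read error.
    BEGIN { while ((getline < "file1") > 0) a[$1] = 1 }
    # Main rule, run for each line of file2: print the line only if
    # its first field was not marked above.
    a[$1] != 1 { print $0 }
' file2 > file.diff

cat file.diff
```

With this sample data, file.diff ends up holding the def and tex lines, which can then be appended to the other file.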
 
Old 04-22-2009, 03:20 PM   #7
Libu
Member
 
Registered: Oct 2003
Location: Chennai
Distribution: Slackware 12.1
Posts: 165

Rep: Reputation: 36
How about
Quote:
grep -v `cut -d":" -f1 File2` File1 >> File2
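One caveat: once File2 holds more than one line, the backticks expand to several words, so grep takes the first as the pattern and the rest as file names. The pattern is also not anchored to the start of the line. A hedged variant, assuming the keys contain no regular-expression special characters (keys.pat is a made-up scratch file name):

```shell
# Hypothetical sample data in a scratch directory; File2 has two keys here.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf 'abc:123\ndef:456\ntex:765\n' > File1
printf 'abc:567\ndef:000\n' > File2

# Turn each key in File2 into an anchored pattern such as "^abc:".
cut -d: -f1 File2 | sed 's/.*/^&:/' > keys.pat

# -v drops the File1 lines that match any pattern in keys.pat;
# the remaining lines are appended to File2.
grep -v -f keys.pat File1 >> File2

cat File2
```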
 
Old 04-23-2009, 12:54 PM   #8
Quigi
Member
 
Registered: Mar 2003
Location: Cambridge, MA, USA
Distribution: Ubuntu (Dapper and Heron)
Posts: 377

Rep: Reputation: 31
Quote:
Originally Posted by gregmcc View Post
Thanks for the reply.

I should have specified - The keys are in a random order
I tried this and it works great if the files are already sorted.
As you probably saw in the man page, "-m" tells sort that the files are already sorted. If they aren't, simply drop the "m", and sort will order them. The keys will be in order in the output. I can't tell from your example if that's a problem.
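A small sketch of the call without "m", assuming GNU sort and using made-up unsorted sample data. LC_ALL=C is only there to make the ordering predictable:

```shell
# Hypothetical sample data in a scratch directory; File1 is not sorted.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf 'tex:765\nabc:123\ndef:456\n' > File1
printf 'abc:567\n' > File2

# No -m, so sort orders the input itself. -s keeps the File2 line first among
# lines with equal keys, and -u keeps only the first line of each such run.
LC_ALL=C sort -t: -suk1,1 File2 File1 > tmp && mv tmp File2

cat File2
```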

Or use one of the Perl one-liners.

Quote:
File1 is on one server and File2 is on another server. I want to keep File2 up to date with new info that is added to File1. But I couldn't do a copy or rsync as the file content is not exactly the same.
OK, makes sense.
 
  


All times are GMT -5.