LinuxQuestions.org
Old 08-16-2008, 10:35 AM   #1
haydar68
Member
 
Registered: Jul 2008
Posts: 35

Rep: Reputation: 15
compare two files


Hi,

I want to compare two files and produce output that contains only the differences. Each file contains 5 fields (matricule, first name, last name, age, profession).

file1 is the original file. file2 should be synchronized with file1: I want to look for any change in file1 and apply those changes to file2.

#cat file1
10000;john;Trad;40;teacher
10001;georges;Hold;34;physician
10002;Catherina;Rick;36;doctor
10003;marc;bob;46;techician

#cat file2
10000;john;Trad;40;teacher
10001;georges;Hold;40;physician
10003;marc;Robert;46;programmer
10004;Maria;Roch;39;nurse

I use this script:

awk 'NR==FNR {f1[$0]=$0}
NR!=FNR {f2[$0]=$0}
END {
for(i in f1) if(!(i in f2)) print "Only in f1: " f1[i]
for(i in f2) if(!(i in f1)) print "Only in f2: " f2[i]
}' file1 file2

I get this result:

==============
Only in f1: 10001;georges;Hold;34;physician
Only in f1: 10003;marc;bob;46;techician
Only in f1: 10002;Catherina;Rick;36;doctor
Only in f2: 10004;Maria;Roch;39;nurse
Only in f2: 10003;marc;Robert;46;programmer
Only in f2: 10001;georges;Hold;40;physician
==============

But this is not the result I am hoping for.
I want to get a result like this:

===========
matricule:10001
change: modified
age:40

matricule:10002
change: deleted

matricule:10003
change: modified
lastname: Robert
profession: programmer

matricule:10004
change: added
firstname:Maria
lastname:Roch
age:39
profession:nurse
==========

Can someone help me to get this result with awk?

Thanks,

Haydar

Last edited by haydar68; 08-16-2008 at 10:41 AM.
 
Old 08-16-2008, 10:44 AM   #2
CRC123
Member
 
Registered: Aug 2008
Distribution: opensuse, RHEL
Posts: 374
Blog Entries: 1

Rep: Reputation: 31
Try writing a script with the 'diff' command. It takes two files as input and then reports the differences between them, if there are any. Read the man page for it.
 
Old 08-16-2008, 10:46 AM   #3
haydar68
Member
 
Registered: Jul 2008
Posts: 35

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by CRC123 View Post
Try writing a script with the 'diff' command. It takes two files as input and then reports the differences between them, if there are any. Read the man page for it.
Thanks for your suggestion. I know how to use diff, but I need to use awk: it is a simple command, and it runs faster than diff/grep for producing the result that I need.
 
Old 08-16-2008, 10:51 AM   #4
jschiwal
Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 654
This seems very contrived and makes me think this is a homework question. The sample you posted doesn't look in individual records at all, so it doesn't seem that you even wrote it yourself. If the first file is being read then 'NR==FNR' will be true. The logic in the END section tests if the records saved in the array differ. You need to change what you do if they differ and test which fields differ in that case.
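To make the NR==FNR test concrete, here is a minimal sketch. The file names a.txt and b.txt and their contents are made up for the illustration and are not part of the original question.

```shell
# Two tiny throwaway input files (hypothetical names and contents).
dir=$(mktemp -d)
printf 'x\ny\n' > "$dir/a.txt"
printf 'z\n'    > "$dir/b.txt"

# NR counts records across all input files; FNR restarts at 1 for each file.
# NR == FNR therefore holds only while the first file is being read
# (assuming the first file is not empty).
trace=$(cd "$dir" && awk '{ print FILENAME, NR, FNR, (NR == FNR ? "first" : "later") }' a.txt b.txt)
printf '%s\n' "$trace"
rm -rf "$dir"
```

Each output line shows the file name, NR, FNR, and which branch the NR==FNR test would select for that record.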
 
Old 08-16-2008, 11:16 AM   #5
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 728
Quote:
file2 should be synchronized with file1:I want to look for any change on file1 and want to apply these changes on file2.
This suggests that you could just copy file1 to file2.
Perhaps what you meant to say is that, if file1 has data for a particular field which is different than the corresponding field in file2 (if it exists), then that field in file2 should be updated.

And, yes, why does it have to be AWK?
 
Old 08-16-2008, 11:34 AM   #6
haydar68
Member
 
Registered: Jul 2008
Posts: 35

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by pixellany View Post
This suggests that you could just copy file1 to file2.
Perhaps what you meant to say is that, if file1 has data for a particular field which is different than the corresponding field in file2 (if it exists), then that field in file2 should be updated.

And, yes, why does it have to be AWK?
Hi Pixellany,

I agree that I could simply copy file1 to file2.

But my goal is to track the changes that were made in file1. I have not found any page that clearly explains how to work with two files and their fields in awk.

Thanks for your comments,

Haydar
 
Old 08-16-2008, 11:57 AM   #7
jiml8
Senior Member
 
Registered: Sep 2003
Posts: 3,171

Rep: Reputation: 114
The comm command is exactly what is required here.

man comm
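As a rough sketch of what comm gives for the sample data in the first post (the scratch directory and the sorted1/sorted2 names are only for the illustration): comm compares whole lines, so a modified record shows up once in each column, and a field-by-field report still needs another step.

```shell
# Recreate the sample files from the first post in a scratch directory.
dir=$(mktemp -d)
cat > "$dir/file1" <<'EOF'
10000;john;Trad;40;teacher
10001;georges;Hold;34;physician
10002;Catherina;Rick;36;doctor
10003;marc;bob;46;techician
EOF
cat > "$dir/file2" <<'EOF'
10000;john;Trad;40;teacher
10001;georges;Hold;40;physician
10003;marc;Robert;46;programmer
10004;Maria;Roch;39;nurse
EOF

# comm needs both inputs sorted in the same collation order.
LC_ALL=C sort "$dir/file1" > "$dir/sorted1"
LC_ALL=C sort "$dir/file2" > "$dir/sorted2"

# -3 hides the lines common to both files; lines found only in the
# second file are indented by one tab.
only=$(LC_ALL=C comm -3 "$dir/sorted1" "$dir/sorted2")
printf '%s\n' "$only"
rm -rf "$dir"
```

Unindented lines exist only in file1 and tab-indented lines exist only in file2.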
 
Old 08-16-2008, 12:17 PM   #8
jschiwal
Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 654
Since a record in one file may be missing from the other, you may want to create two arrays as you are doing, but use the first field (the matricule) as the index instead of the whole record. Life might be easier if both files are sorted by the first field as well. The sort command can guarantee that if the files might not already be in order. Note that the ";" separator has to be quoted, or the shell treats it as the end of the command:
awk -f commands.awk <(sort -t';' -k1,1 file1) <(sort -t';' -k1,1 file2)

Since your report is only concerned with the differences, you could use the "comm" command to filter out the common lines. comm takes two sorted files as arguments:
comm -23 <(sort file1) <(sort file2) >temp1
comm -13 <(sort file1) <(sort file2) >temp2
awk -f commands.awk temp1 temp2 >report

Also, remember that awk arrays are one-dimensional. That means you can't have a true two-dimensional array of records and fields (a comma subscript such as rec[key, i] only simulates one by joining the parts into a single string index). You will either have to split each stored record into its fields yourself in the END section (for example with split()) instead of using $1, $2, etc., or assign a stored record from file1 to $0 so that awk re-splits it, copy its fields into a temporary array, and then assign the corresponding record from file2 to $0 and compare its fields against that array.

Awk arrays are associative, so the index can be a word instead of an integer. That may help. The index could be lastname or profession. That will make your awk program easier to read.

Often in Unix/Linux, your best approach is to use small tools like grep, sort and comm, each doing part of the job. Comm only works on sorted files, so that is a given. Working with only entries that differ means that the arrays can be smaller in awk as well.
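Putting these suggestions together, here is a minimal sketch rather than a definitive solution. It assumes the matricule is unique within each file, that no field contains "|" (used here as a temporary separator so that sort can order the report), and that the report should show file2's values, as in the desired output in the first post. The array names old, new and name are my own choices.

```shell
# Recreate the sample files from the first post in a scratch directory.
dir=$(mktemp -d)
cat > "$dir/file1" <<'EOF'
10000;john;Trad;40;teacher
10001;georges;Hold;34;physician
10002;Catherina;Rick;36;doctor
10003;marc;bob;46;techician
EOF
cat > "$dir/file2" <<'EOF'
10000;john;Trad;40;teacher
10001;georges;Hold;40;physician
10003;marc;Robert;46;programmer
10004;Maria;Roch;39;nurse
EOF

report=$(awk -F';' '
BEGIN { split("matricule;firstname;lastname;age;profession", name, ";") }
NR == FNR { old[$1] = $0; next }      # file1: store each record under its matricule
          { new[$1] = $0 }            # file2: likewise
END {
    # Emit one "|"-joined line per changed matricule so that sort can order them.
    for (k in old)
        if (!(k in new)) print "matricule:" k "|change: deleted|"
    for (k in new) {
        n = split(new[k], b, ";")
        if (!(k in old)) {
            line = "matricule:" k "|change: added"
            for (i = 2; i <= n; i++) line = line "|" name[i] ":" b[i]
            print line "|"
            continue
        }
        split(old[k], a, ";")
        diff = ""
        for (i = 2; i <= n; i++)
            if ((a[i] "") != (b[i] ""))      # force a string comparison
                diff = diff "|" name[i] ":" b[i]
        if (diff != "") print "matricule:" k "|change: modified" diff "|"
    }
}' "$dir/file1" "$dir/file2" | LC_ALL=C sort | tr '|' '\n')
printf '%s\n' "$report"
rm -rf "$dir"
```

The sort step is there because the order of "for (k in array)" is unspecified in awk. Turning each "|" into a newline afterwards yields one block per matricule, separated by blank lines.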

Last edited by jschiwal; 08-16-2008 at 01:36 PM.
 
Old 08-16-2008, 03:36 PM   #9
estabroo
Senior Member
 
Registered: Jun 2008
Distribution: debian, ubuntu, sidux
Posts: 1,094
Blog Entries: 2

Rep: Reputation: 111
cat file1 file2 | perl -e 'while (<>) { ($key,$val) = split(/;/,$_,2); $keep{$key} = $val; }; foreach $key (sort(keys(%keep))) { print "$key;$keep{$key}" }'

Just reverse the order of the files given to cat if you want the precedence the other way. With cat file1 file2, items in file2 replace items in file1 that have the same key, which from your example looks like what you wanted.
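For anyone who wants to stay with awk, the same keyed merge might look roughly like this (a sketch, assuming the first field is the key; the array name keep mirrors the Perl hash above). Note that a merge like this keeps 10002, which exists only in file1, so it does not report deletions.

```shell
# Recreate the sample files from the first post in a scratch directory.
dir=$(mktemp -d)
cat > "$dir/file1" <<'EOF'
10000;john;Trad;40;teacher
10001;georges;Hold;34;physician
10002;Catherina;Rick;36;doctor
10003;marc;bob;46;techician
EOF
cat > "$dir/file2" <<'EOF'
10000;john;Trad;40;teacher
10001;georges;Hold;40;physician
10003;marc;Robert;46;programmer
10004;Maria;Roch;39;nurse
EOF

# Later files overwrite earlier ones for the same key, so file2 wins here.
# The order of "for (k in keep)" is unspecified, hence the final sort.
merged=$(awk -F';' '{ keep[$1] = $0 } END { for (k in keep) print keep[k] }' \
    "$dir/file1" "$dir/file2" | LC_ALL=C sort)
printf '%s\n' "$merged"
rm -rf "$dir"
```

Swapping the two file arguments gives file1 precedence instead.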
 
  


All times are GMT -5. The time now is 07:16 PM.
