Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game. |
08-16-2008, 11:35 AM
|
#1
|
Member
Registered: Jul 2008
Posts: 35
Rep:
|
compare two files
Hi,
I want to compare 2 files and get a new output that contains the differences. Each file contains 5 fields (matricule, first name, last name, age, profession).
file1 is the original file. file2 should be synchronized with file1: I want to look for any change in file1 and apply those changes to file2.
#cat file1
10000;john;Trad;40;teacher
10001;georges;Hold;34;physician
10002;Catherina;Rick;36;doctor
10003;marc;bob;46;techician
#cat file2
10000;john;Trad;40;teacher
10001;georges;Hold;40;physician
10003;marc;Robert;46;programmer
10004;Maria;Roch;39;nurse
I use this script:
awk 'NR==FNR {f1[$0]=$0}
NR!=FNR {f2[$0]=$0}
END {
for(i in f1) if(!(i in f2)) print "Only in f1: " f1[i]
for(i in f2) if(!(i in f1)) print "Only in f2: " f2[i]
}' file1 file2
I get this result:
==============
Only in f1: 10001;georges;Hold;34;physician
Only in f1: 10003;marc;bob;46;techician
Only in f1: 10002;Catherina;Rick;36;doctor
Only in f2: 10004;Maria;Roch;39;nurse
Only in f2: 10003;marc;Robert;46;programmer
Only in f2: 10001;georges;Hold;40;physician
==============
But this is not the result I was hoping for.
I would like to get a result like this:
===========
matricule:10001
change: modified
age:40
matricule:10002
change: deleted
matricule:10003
change: modified
lastname: Robert
profession: programmer
matricule:10004
change: added
firstname:Maria
lastname:Roch
age:39
profession:nurse
==========
Can someone help me to get this result with awk?
Thanks,
Haydar
Last edited by haydar68; 08-16-2008 at 11:41 AM.
|
|
|
08-16-2008, 11:44 AM
|
#2
|
Member
Registered: Aug 2008
Distribution: opensuse, RHEL
Posts: 374
Rep:
|
Try writing a script with the 'diff' command. It takes two files as input and then reports the differences between them, if there are any. Read the man page for it.
|
|
|
08-16-2008, 11:46 AM
|
#3
|
Member
Registered: Jul 2008
Posts: 35
Original Poster
Rep:
|
Quote:
Originally Posted by CRC123
Try writing a script with the 'diff' command. It takes two files as input and then reports the differences between them, if there are any. Read the man page for it.
|
Thanks for your suggestion. I know how to use diff, but I need to use awk: it is a single command, and it should run faster than a diff/grep combination to produce the result that I need.
|
|
|
08-16-2008, 11:51 AM
|
#4
|
LQ Guru
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733
|
This seems very contrived and makes me think this is a homework question. The sample you posted doesn't look inside individual records at all, so it doesn't seem that you even wrote it yourself. While the first file is being read, 'NR==FNR' will be true. The logic in the END section tests whether a record saved in one array is missing from the other. You need to change what you do when records differ and test which fields differ in that case.
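The NR==FNR test can be seen in isolation in the small sketch below. The file names (one, two), their contents, and the variable names are made up for illustration and are not part of the original script.

```shell
#!/bin/sh
# Illustrative only: two tiny throwaway files in a temporary directory.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/one"
printf 'c\n'    > "$tmp/two"

# NR counts records across all input files, FNR restarts at 1 for each file,
# so NR == FNR holds only while the first file is being read.
out=$(awk 'NR == FNR { print "first: " $0; next }
                     { print "second: " $0 }' "$tmp/one" "$tmp/two")

printf '%s\n' "$out"
rm -rf "$tmp"
```

The `next` keeps records of the first file from falling through to the second rule.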
|
|
|
08-16-2008, 12:16 PM
|
#5
|
LQ Veteran
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809
|
Quote:
file2 should be synchronized with file1: I want to look for any change in file1 and apply those changes to file2.
|
This suggests that you could just copy file1 to file2.
Perhaps what you meant to say is that, if file1 has data for a particular field which is different than the corresponding field in file2 (if it exists), then that field in file2 should be updated.
And, yes, why does it have to be AWK?
|
|
|
08-16-2008, 12:34 PM
|
#6
|
Member
Registered: Jul 2008
Posts: 35
Original Poster
Rep:
|
Quote:
Originally Posted by pixellany
This suggests that you could just copy file1 to file2.
Perhaps what you meant to say is that, if file1 has data for a particular field which is different than the corresponding field in file2 (if it exists), then that field in file2 should be updated.
And, yes, why does it have to be AWK?
|
Hi Pixellany,
I agree that I could simply copy file1 to file2.
But my goal is to track the changes that were made in file1. I have not found any page that clearly explains how to manipulate 2 files and their fields using awk.
Thanks for your comments,
Haydar
|
|
|
08-16-2008, 12:57 PM
|
#7
|
Senior Member
Registered: Sep 2003
Posts: 3,171
Rep:
|
The comm command is exactly what is required here.
man comm
|
|
|
08-16-2008, 01:17 PM
|
#8
|
LQ Guru
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733
|
Since a record in one file may be missing in another file, you may want to create two arrays as you are doing, but use the first field as the index instead of the whole record. Life might be easier if both files are sorted by the first field as well. The sort command can guarantee that ordering if the files might not already be sorted.
awk -f commands.awk <(sort -t';' -k1,1 file1) <(sort -t';' -k1,1 file2)
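As a rough sketch of what indexing by the first field could look like, the script below builds the report requested in post #1. It is one possible approach, not the content of commands.awk, which was never posted. The sample data is copied from post #1; the temporary directory and the names tmp, report, old, seen, names and f are assumptions made for the example. It treats file1 as the old state and file2 as the new one, as the requested output does.

```shell
#!/bin/sh
# Sketch only: sample data from post #1, written to a temporary directory.
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
10000;john;Trad;40;teacher
10001;georges;Hold;34;physician
10002;Catherina;Rick;36;doctor
10003;marc;bob;46;techician
EOF
cat > "$tmp/file2" <<'EOF'
10000;john;Trad;40;teacher
10001;georges;Hold;40;physician
10003;marc;Robert;46;programmer
10004;Maria;Roch;39;nurse
EOF

report=$(awk -F';' '
BEGIN { split("matricule firstname lastname age profession", names, " ") }
NR == FNR { old[$1] = $0; next }       # first file: remember each record by key
{
    seen[$1] = 1
    if (!($1 in old)) {                # key exists only in the second file
        print "matricule:" $1; print "change: added"
        for (i = 2; i <= NF; i++) print names[i] ":" $i
        next
    }
    if ($0 == old[$1]) next            # identical record: nothing to report
    split(old[$1], f, ";")             # decompose the saved record into fields
    print "matricule:" $1; print "change: modified"
    for (i = 2; i <= NF; i++)
        if (($i "") != (f[i] "")) print names[i] ":" $i   # compare as strings
}
END {                                  # keys never seen in the second file
    for (k in old) if (!(k in seen)) { print "matricule:" k; print "change: deleted" }
}' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$report"
rm -rf "$tmp"
```

Deleted records are only known once the second file has been read completely, so in this sketch they are printed last rather than in key order, and their order among themselves is unspecified.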
Since your report is only concerned with the difference, you could use the "comm" command to filter out common lines:
comm -23 <(sort file1) <(sort file2) >temp1
comm -13 <(sort file1) <(sort file2) >temp2
awk -f commands.awk temp1 temp2 >report
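A small sketch of the comm step on the sample data from post #1 follows. It uses plain temporary files instead of process substitution so that it also runs under a plain sh; the names tmp, sorted1, sorted2, only1 and only2 are made up for the example, and LC_ALL=C is an assumption added to keep sort and comm on the same collation.

```shell
#!/bin/sh
# Sketch only: sample data from post #1, written to a temporary directory.
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
10000;john;Trad;40;teacher
10001;georges;Hold;34;physician
10002;Catherina;Rick;36;doctor
10003;marc;bob;46;techician
EOF
cat > "$tmp/file2" <<'EOF'
10000;john;Trad;40;teacher
10001;georges;Hold;40;physician
10003;marc;Robert;46;programmer
10004;Maria;Roch;39;nurse
EOF

# comm compares whole lines and expects both inputs sorted the same way.
LC_ALL=C sort "$tmp/file1" > "$tmp/sorted1"
LC_ALL=C sort "$tmp/file2" > "$tmp/sorted2"

# -23 suppresses columns 2 and 3, leaving lines found only in the first input;
# -13 suppresses columns 1 and 3, leaving lines found only in the second input.
only1=$(LC_ALL=C comm -23 "$tmp/sorted1" "$tmp/sorted2")
only2=$(LC_ALL=C comm -13 "$tmp/sorted1" "$tmp/sorted2")

printf 'Only in file1:\n%s\n' "$only1"
printf 'Only in file2:\n%s\n' "$only2"
rm -rf "$tmp"
```

The unchanged record for 10000 drops out of both lists, so only the records that differ remain for awk to process.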
Also, remember that awk arrays are one-dimensional. That means that you can't have a true two-dimensional array of records/fields. You will either have to decompose each saved record manually (in the END section logic) instead of using $1, $2, etc., or assign a saved record from file1 to $0, copy its fields into a temporary array, and then assign the corresponding record from file2 to $0 so that its fields can be compared against the temporary array.
Awk arrays are associative, so the index can be a word instead of an integer. That may help. The index could be lastname or profession. That will make your awk program easier to read.
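The sketch below illustrates word indexes on a single record taken from file2 in post #1. The field labels and the names name, rec and out are assumptions for the example. It also uses the comma subscript form rec[key, label], which POSIX awk joins into one string index, so the array stays one-dimensional while reading like a record/field table.

```shell
#!/bin/sh
# Sketch only: one sample record, fed to awk on standard input.
out=$(printf '%s\n' '10003;marc;Robert;46;programmer' | awk -F';' '
BEGIN { split("matricule firstname lastname age profession", name, " ") }
{
    # Store every field under a combined index of key and field label.
    for (i = 2; i <= NF; i++) rec[$1, name[i]] = $i
}
END { print rec["10003", "lastname"] ", " rec["10003", "profession"] }')

printf '%s\n' "$out"
```

Looking fields up by label instead of by position keeps the comparison logic readable if the field order ever changes.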
Often in Unix/Linux, your best approach is to use small tools like grep, sort and comm, each doing part of the job. Comm only works on sorted files, so that is a given. Working with only entries that differ means that the arrays can be smaller in awk as well.
Last edited by jschiwal; 08-16-2008 at 02:36 PM.
|
|
|
08-16-2008, 04:36 PM
|
#9
|
Senior Member
Registered: Jun 2008
Distribution: debian, ubuntu, sidux
Posts: 1,127
Rep:
|
cat file1 file2 | perl -e 'while (<>) { ($key,$val) = split(/;/,$_,2); $keep{$key} = $val; }; foreach $key (sort(keys(%keep))) { print "$key;$keep{$key}" }'
Just reverse the order of the files given to cat if you want the file precedence the other way. cat file1 file2 means that items in file2 will take the place of items in file1, which by your example looks like what you wanted.
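Since the thread asked for awk, here is a sketch of the same last-record-wins merge written with awk and sort instead of perl. It is an assumed equivalent, not something posted in the thread; the sample data comes from post #1 and the names tmp, keep and merged are made up for the example.

```shell
#!/bin/sh
# Sketch only: sample data from post #1, written to a temporary directory.
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
10000;john;Trad;40;teacher
10001;georges;Hold;34;physician
10002;Catherina;Rick;36;doctor
10003;marc;bob;46;techician
EOF
cat > "$tmp/file2" <<'EOF'
10000;john;Trad;40;teacher
10001;georges;Hold;40;physician
10003;marc;Robert;46;programmer
10004;Maria;Roch;39;nurse
EOF

# A later record with the same first field overwrites the earlier one, so the
# file named last on the command line takes precedence. awk walks the array in
# an unspecified order, hence the final sort on the key field.
merged=$(awk -F';' '
{ keep[$1] = $0 }
END { for (k in keep) print keep[k] }
' "$tmp/file1" "$tmp/file2" | LC_ALL=C sort -t';' -k1,1)

printf '%s\n' "$merged"
rm -rf "$tmp"
```

As with the perl version, a key present only in file1 (10002 here) is kept, so this merges the two files and does not report or apply deletions.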
|
|
|
All times are GMT -5. The time now is 03:28 PM.