LinuxQuestions.org


novice82 11-16-2009 10:45 PM

Compare semicolon-separated data in 2 files using shell script
 
Hello members,

I have some semicolon-separated data, close to 240 rows, in a text file temp1.txt.
temp2.txt stores 204 rows of semicolon-separated data.
I want to:
Sort the data in both files by field 1, i.e. the first data field in every row.
Compare the data in both files and print the rows that do not match into separate files.

I was trying to do this in Excel using VLOOKUP, without a great deal of success. Hence, I'm exploring the shell script option.

Code:

temp1.txt
1000xyz400100xyzA00680xyz0;19722.83;19565.7;157.13;11;2.74;11.00
1000xyz400100xyzA00682xyz0;7210.68;4111.53;3099.15;216.95;1.21;216.94
1000xyz430200xyzA00651xyz0;146.70;0.00;0.00;0.00;0.00;0.00

temp2.txt
1000xyz400100xyzA00680xyz0;19722.83;19565.7;157.13;11;2.74;11.00
1000xyz400100xyzA00682xyz0;7210.68;4111.53;3099.15;216.95;1.21;216.94

I'd appreciate it if you could get me started. I have a Solaris machine, where I intend to run the scripts.

regards,

kris

chrism01 11-16-2009 11:16 PM

You'll need 'sort' with the -t option to set the field separator, plus -k to pick the field (http://linux.die.net/man/1/sort), then 'comm' (http://linux.die.net/man/1/comm).
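
A minimal sketch of that combination, using the sample rows from the question. The *.sorted.txt file names are only illustrative, and LC_ALL=C is an assumption made here so that sort and comm agree on ordering:

```shell
# Sample rows from the question, deliberately out of order.
printf '%s\n' \
  '1000xyz430200xyzA00651xyz0;146.70;0.00;0.00;0.00;0.00;0.00' \
  '1000xyz400100xyzA00682xyz0;7210.68;4111.53;3099.15;216.95;1.21;216.94' \
  '1000xyz400100xyzA00680xyz0;19722.83;19565.7;157.13;11;2.74;11.00' > temp1.txt
printf '%s\n' \
  '1000xyz400100xyzA00682xyz0;7210.68;4111.53;3099.15;216.95;1.21;216.94' \
  '1000xyz400100xyzA00680xyz0;19722.83;19565.7;157.13;11;2.74;11.00' > temp2.txt

# -t ';' sets the field separator, -k 1,1 sorts on field 1 only.
# The keys here are fixed width; if one key could be a prefix of
# another, sort whole lines (plain sort) before running comm.
LC_ALL=C sort -t ';' -k 1,1 temp1.txt > temp1.sorted.txt
LC_ALL=C sort -t ';' -k 1,1 temp2.txt > temp2.sorted.txt

# -3 suppresses rows common to both files. Rows only in the first
# file print unindented; rows only in the second are tab-indented.
LC_ALL=C comm -3 temp1.sorted.txt temp2.sorted.txt
```

With this sample, the only row left is the 1000xyz430200xyzA00651xyz0 one, which exists only in temp1.txt.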

novice82 11-17-2009 02:10 AM

Thanks for the tip.

But how do I redirect the missing / unmatched rows to different files, so I can trace which file they belong to?

Guttorm 11-17-2009 03:50 AM

Hi

I don't have a real solution, but maybe some ideas you can work on.

First sort the files. You should not need any options if you want it sorted on the first column.

sort temp1.txt >temp1.sorted.txt
sort temp2.txt >temp2.sorted.txt

Then use diff to compare the files.

diff -y temp1.sorted.txt temp2.sorted.txt

I don't know whether the report will be easy enough to read. You can read "man diff" to see the options you can use. For example, if you are not interested in lines that are equal, you can add --suppress-common-lines.
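
If the goal is one file per side rather than a report, the default diff output (without -y) can be split on its "< " and "> " prefixes. This is only a sketch: diff_split and the .only suffix are made-up names, and it assumes both inputs are already sorted.

```shell
# diff_split SORTED1 SORTED2  (hypothetical helper)
# Writes rows found only in SORTED1 to SORTED1.only and rows found
# only in SORTED2 to SORTED2.only.
diff_split() {
  # In default diff output, "< " marks a line from the first file
  # and "> " marks a line from the second file.
  diff "$1" "$2" | sed -n 's/^< //p' > "$1.only"
  diff "$1" "$2" | sed -n 's/^> //p' > "$2.only"
}
```

Called as "diff_split temp1.sorted.txt temp2.sorted.txt", it would leave the unmatched rows in temp1.sorted.txt.only and temp2.sorted.txt.only. A row whose values differ between the two files shows up in both outputs, once per version.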

chrism01 11-17-2009 05:26 PM

If you look at the man page link I gave for comm, it shows which arguments to use to extract each set of lines.
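
As a rough sketch of what that looks like: comm numbers its output columns 1 (lines only in the first file), 2 (lines only in the second) and 3 (lines in both), and each digit in the option suppresses that column. The function name and the only_in_*.txt output names below are invented for illustration, and LC_ALL=C is an assumption to keep sort and comm on the same ordering.

```shell
# split_unmatched FILE1 FILE2  (hypothetical helper)
split_unmatched() {
  # comm compares whole lines, so sort whole lines first.
  LC_ALL=C sort "$1" > "$1.sorted"
  LC_ALL=C sort "$2" > "$2.sorted"
  # -23 hides columns 2 and 3: only rows unique to FILE1 remain.
  LC_ALL=C comm -23 "$1.sorted" "$2.sorted" > only_in_first.txt
  # -13 hides columns 1 and 3: only rows unique to FILE2 remain.
  LC_ALL=C comm -13 "$1.sorted" "$2.sorted" > only_in_second.txt
}
```

Running "split_unmatched temp1.txt temp2.txt" would then leave the unmatched rows of temp1.txt in only_in_first.txt and those of temp2.txt in only_in_second.txt, so each row can be traced back to its source file.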

