LinuxQuestions.org - Need to diff two files as described below.

Hi all,
I am new to Linux shell scripting.
What I want is to compare two files. The first file is my master file, i.e. the file I will use as the base, and the second file is the messed-up one, in which some entries are missing and some extra ones are present. I need to know which entries are missing and which are extra compared to file1. Please help me.

Welcome.

If they are two text files, the usual way is with diff.

Quote:

Originally Posted by Turbocapitalist (Post 5691589)

Welcome.

If they are two text files, the usual way is with diff.

No, the files contain not only text but also numbers and time stamps.
diff is not giving me what I need.

Numbers, including time stamps, are text as far as computers are concerned. What goes wrong when you try diff for your data?

Also, can you go into more detail about the data and what kind of differences you are looking for? Some (sanitized) sample data would help, with examples of what you expect to find.

Quote:

Originally Posted by Turbocapitalist (Post 5691595)

OK, let me describe it with an example.
Suppose the first file, i.e. the base file, is:

StartInstall, CDM_2.5B263, OK
EndInstall, CDM_2.5B263, SUCCESS
StartPatch, CDM_2.5.0.2B1, OK
StartPatch, CDM_2.5.0.3B1, OK
EndPatch, CDM_2.5.0.3B1, SUCCESS
StartPatch, CDM_2.5.0_SM-10866B2, OK
EndPatch, CDM_2.5.0_SM-10866B2, SUCCESS
StartPatch, CDM_2.5.0.REQUEST-6753B2, OK
StartPatch, CDM_2.5.0_SM-11515B2, OK
EndPatch, CDM_2.5.0_SM-11515B2, SUCCESS

and the second file is:

StartInstall, CDM_2.5B263, OK
EndInstall, CDM_2.5B263, SUCCESS
StartPatch, CDM_2.5.0_SM-11515B2, OK
EndPatch, CDM_2.5.0_SM-11515B2, SUCCESS

The third file should contain all the lines from file1 that are missing in file2, in their original sequence.
A Start/End pair should be treated as one entry.

I see that diff works fine on that sample, in part because it is sorted / grouped. I get the following:

Code:

diff file1 file2
3,8d2
< StartPatch, CDM_2.5.0.2B1, OK
< StartPatch, CDM_2.5.0.3B1, OK
< EndPatch, CDM_2.5.0.3B1, SUCCESS
< StartPatch, CDM_2.5.0_SM-10866B2, OK
< EndPatch, CDM_2.5.0_SM-10866B2, SUCCESS
< StartPatch, CDM_2.5.0.REQUEST-6753B2, OK

The < means that the line printed is present in the first file (file1) and missing in the second file (file2).
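If separate lists of missing and extra entries are wanted, the < and > markers can be used to filter the diff output. The following is only a rough sketch: it recreates the sample files from this thread in a temporary directory, and the output names missing.txt, extra.txt and missing_ids.txt are made up for illustration. The last step is one possible reading of "Start/End should be taken as one", namely listing each affected version string once; that interpretation is an assumption.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# file1 is the master file from the example in this thread.
cat > file1 <<'EOF'
StartInstall, CDM_2.5B263, OK
EndInstall, CDM_2.5B263, SUCCESS
StartPatch, CDM_2.5.0.2B1, OK
StartPatch, CDM_2.5.0.3B1, OK
EndPatch, CDM_2.5.0.3B1, SUCCESS
StartPatch, CDM_2.5.0_SM-10866B2, OK
EndPatch, CDM_2.5.0_SM-10866B2, SUCCESS
StartPatch, CDM_2.5.0.REQUEST-6753B2, OK
StartPatch, CDM_2.5.0_SM-11515B2, OK
EndPatch, CDM_2.5.0_SM-11515B2, SUCCESS
EOF

# file2 is the file with missing entries.
cat > file2 <<'EOF'
StartInstall, CDM_2.5B263, OK
EndInstall, CDM_2.5B263, SUCCESS
StartPatch, CDM_2.5.0_SM-11515B2, OK
EndPatch, CDM_2.5.0_SM-11515B2, SUCCESS
EOF

# Lines marked "<" exist only in file1: these are missing from file2.
# sed -n ... p prints only the lines where the marker was stripped.
diff file1 file2 | sed -n 's/^< //p' > missing.txt

# Lines marked ">" exist only in file2: these are the extra entries.
diff file1 file2 | sed -n 's/^> //p' > extra.txt

# Assumed reading of "Start/End taken as one": print each version
# string (second comma-separated field) once, in order of appearance.
awk -F', ' '!seen[$2]++ { print $2 }' missing.txt > missing_ids.txt

cat missing.txt
```

Both lists keep the order in which diff reports the lines, so the sequence from file1 is preserved in missing.txt.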

What is missing when you run it on a larger data set?

I usually find that the following arguments to diff produce nice output:

Code:

diff -Nuw origfile newfile >file.diff

Any decent editor, e.g. vim, will recognise the output syntax (from the .diff extension) and colour-code the lines in file.diff for ease of reading.
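For reference, -N treats a file that does not exist as empty, -u selects the unified output format (removed lines start with -, added lines with +), and -w ignores all white space when comparing lines. Here is a small sketch of what that looks like in practice. The contents of origfile and newfile are invented for the demonstration and are not from the original poster's data.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Invented sample data: line "b" is dropped, line "d" is added, and
# line "c" differs only in the amount of white space.
printf 'a, 1, OK\nb, 2, OK\nc, 3, OK\n' > origfile
printf 'a, 1, OK\nc,   3,   OK\nd, 4, OK\n' > newfile

# diff exits with status 1 when the files differ, so "|| true" keeps
# a script running under "set -e" from stopping here.
diff -Nuw origfile newfile > file.diff || true

# The first two lines are the ---/+++ file headers; skip them, then
# count removed (-) and added (+) lines. grep -c exits 1 on a zero
# count, hence the "|| true".
removed=$(tail -n +3 file.diff | grep -c '^-' || true)
added=$(tail -n +3 file.diff | grep -c '^+' || true)

echo "removed: $removed, added: $added"
cat file.diff
```

Because of -w, the line for "c" is reported as unchanged even though its spacing differs between the two files.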

HTH