Comparing two files

ab52 · 11-30-2010, 03:31 PM

I have two text files I want to compare, but I don't want all of the differences; there are only about 30 lines of relevant text I want to compare.

Any ideas? Either Perl or Bash.

thanks
Adam

GrapefruiTgirl · 11-30-2010, 03:37 PM

Hi there,

It's probably better to provide some more information so that folks can give some decent suggestions. I.e. do you want to compare 30 lines in one file to an entire second file? Or compare 30 lines in one file to 30 lines in the second file? Maybe you want to compare the first 30 lines of each file, then quit? Are the lines consecutive in each file? What kind of data is in each file - plain alphabet soup, or some sort of XML?

In cases like this, it's often a great idea to show us snippets of the data files, and show what sort of output you expect.

Do you know any Perl or Bash? I'd lean towards Perl so far for this, but who knows, it may be a relatively easy job, and bash might do it. Maybe neither of these tools will seem to be the right one, once the problem is better understood.

Cheers!

ab52 · 11-30-2010, 03:46 PM

OK, thanks. I will get you some examples to have a look at.

My programming skills are limited to none.

thanks
Adam

XavierP · 11-30-2010, 03:59 PM

Moved: This thread is more suitable in Programming and has been moved accordingly to help your thread/question get the exposure it deserves.

garyg007 · 11-30-2010, 04:51 PM

Are you looking for something like this excerpt from a Perl document I found?

Code:

Doing String Selections (Parsing)
If regular expressions' only benefit was looking for a (albeit complex)
string within a string, it wouldn't be worth learning. Regular expressions
(and Perl itself, for that matter) really start earning their keep by allowing
you to select and process substrings based on what they contain, and the
context in which they appear.
For instance, create a program whose input is a piped in directory
command and whose output is stdout, and whose output represents a batch
file which copies every file (not directory) older than 12/22/97 to a
directory called \oldie. This would be pretty nasty in C or C++. The
directory output would look something like this:
  Volume in drive D has no label
  Volume Serial Number is 4547-15E0
  Directory of D:\polo\marco
.              <DIR>        12-18-97 11:14a .
..             <DIR>        12-18-97 11:14a ..
INDEX    HTM         3,237  02-06-98  3:12p index.htm
APPDEV   HTM         6,388  12-24-97  5:13p appdev.htm
NORM     HTM         5,297  12-24-97  5:13p norm.htm
IMAGES         <DIR>        12-18-97 11:14a images
TCBK     GIF           532  06-02-97  3:14p tcbk.gif
LSQL     HTM         5,027  12-24-97  5:13p lsql.htm
CRASHPRF HTM        11,403  12-24-97  5:13p crashprf.htm
WS_FTP   LOG         5,416  12-24-97  5:24p WS_FTP.LOG
FIBB     HTM        10,234  12-24-97  5:13p fibb.htm
MEMLEAK  HTM        19,736  12-24-97  5:13p memleak.htm
LITTPERL       <DIR>        02-06-98  1:58p littperl
         9 file(s)              67,270 bytes
         4 dir(s)        132,464,640 bytes free
UUUUgly! I'd hate to do this in C or C++. But wait. It's 18 lines in Perl?
while(<STDIN>)
  {
  my($line) = $_;
  chomp($line);
  if($line !~ /<DIR>/)               #directories don't count
    {
    #** only lines with dates at position 28 and (long) filename at pos 44 **
    if ($line =~ /.{28}(\d\d)-(\d\d)-(\d\d).{8}(.+)$/)
      {
      my($filename) = $4;
      my($yymmdd) = "$3$1$2";
      if($yymmdd lt "971222")
        {
        print "copy $filename \\oldie\n";
        }
      }
    }
  }

The above snippet came from Troubleshooters.Com and Code Corner Present Steve Litt's Perls of Wisdom: Perl Regular Expressions (With Snippets)

Lsatenstein · 11-30-2010, 08:38 PM

Quote:

Originally Posted by ab52

I have two text files I want to compare, but I don't want all of the differences; there are only about 30 lines of relevant text I want to compare.

Any ideas? Either Perl or Bash.

thanks
Adam

I am not sure from your request whether you want this done interactively (in a viewer) or in batch. There are quite a few text editors that let you open two files and compare them. One editor I used in Windows (yes, that's where I found that tool) let me see, using colors, what was inserted and removed in each file.

Mark1986 · 12-01-2010, 06:13 AM

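If you are using Windows, you can use TextDiff. If you are using some Linux version, you might want to try sdiff. It is a command-line tool that shows the two files side by side, and options such as -s (suppress common lines) and -I (ignore changes whose lines all match a regular expression) help you focus on the lines you care about. The side-by-side output can get a bit nasty with long lines, although -w lets you set the output width.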
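A minimal sketch of the sdiff route, assuming you already know which line range matters and that it is the same range in both files. The helper name `side_by_side` is hypothetical; `sed -n 'A,Bp'` pulls out the lines, and sdiff's -s and -w options hide identical lines and set the output width. Temp files are used because sdiff reads its inputs more than once, so pipes are not a safe substitute here.

```shell
#!/usr/bin/env bash
# side_by_side FILE1 FILE2 FIRST LAST [WIDTH]
# Hypothetical helper: show lines FIRST..LAST of both files side by side.
# -s hides lines that are identical in both; -w sets the total output width.
side_by_side() {
    local tmp1 tmp2 status
    tmp1=$(mktemp) tmp2=$(mktemp)
    sed -n "${3},${4}p" "$1" > "$tmp1"
    sed -n "${3},${4}p" "$2" > "$tmp2"
    sdiff -s -w "${5:-80}" "$tmp1" "$tmp2"
    status=$?                      # 0 = same, 1 = different, 2 = trouble
    rm -f "$tmp1" "$tmp2"
    return "$status"
}
```

Usage: `side_by_side file1.txt file2.txt 5 34 120` compares lines 5 to 34 of both files in a 120-column display.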

frogweasel · 12-01-2010, 07:20 AM

I agree with others that an appropriate editor is the best choice, but you seem to want to script this.
If so, and if the files are of a predictable length and format, this may work for you:

For simplicity, assume two files of 10 lines each.
You want to compare lines 5-7 only.

head -n 7 file1 | tail -n 3 > /tmp/temp1.txt   (creates a file holding lines 5-7, the lines to be compared)

Do the same for the second file, writing to its own temp file (e.g. /tmp/temp2.txt), then use diff or sdiff to compare the two temp files.

If the file formats are not predictable, additional work will have to be done.
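For files that really are predictable, the head/tail idea can be wrapped in a small bash function. This is a sketch, not a finished tool: `extract_lines` and `compare_lines` are hypothetical names, both files are assumed to have at least LAST lines (otherwise tail would pick up earlier lines), and `mktemp` replaces fixed temp-file names so the two extracts cannot overwrite each other.

```shell
#!/usr/bin/env bash
# extract_lines FILE FIRST LAST
# Print lines FIRST..LAST (inclusive) of FILE using head and tail.
# Assumes 1 <= FIRST <= LAST and that FILE has at least LAST lines.
extract_lines() {
    local file=$1 first=$2 last=$3
    head -n "$last" "$file" | tail -n "$((last - first + 1))"
}

# compare_lines FILE1 FILE2 FIRST LAST
# Diff the same line range of two files; the temp files are removed afterwards
# and diff's exit status (0 = same, 1 = different) is passed back.
compare_lines() {
    local tmp1 tmp2 status
    tmp1=$(mktemp) tmp2=$(mktemp)
    extract_lines "$1" "$3" "$4" > "$tmp1"
    extract_lines "$2" "$3" "$4" > "$tmp2"
    diff "$tmp1" "$tmp2"
    status=$?
    rm -f "$tmp1" "$tmp2"
    return "$status"
}
```

Usage: `compare_lines file1.txt file2.txt 5 7`.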

dannybpng · 12-01-2010, 09:26 AM

sed (stream editor) would be a possible choice. Here is a way to extract a range of lines from each file and run diff on them.

Print lines 5 to 10 inclusive:
sed -n '5,10p' file1.txt > section1.txt

Print lines starting with the line beginning with "START" till a line beginning with "END":
sed -n '/^START/,/^END/p' file2.txt > section2.txt

diff section1.txt section2.txt
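A sketch of the START/END extraction applied to both files and diffed in one call. The function name `section_diff` is hypothetical; START and END are placeholder markers as in the example above, and the patterns are assumed not to contain a `/` (it would end the sed address early). Note that sed prints every START..END block in a file, not just the first one.

```shell
#!/usr/bin/env bash
# section_diff FILE1 FILE2 [START_RE] [END_RE]
# Extract every block from a line matching START_RE through the next line
# matching END_RE (inclusive) in each file, then diff the two extracts.
# Defaults are the placeholder markers ^START and ^END.
section_diff() {
    local start=${3:-^START} end=${4:-^END}
    local tmp1 tmp2 status
    tmp1=$(mktemp) tmp2=$(mktemp)
    sed -n "/$start/,/$end/p" "$1" > "$tmp1"
    sed -n "/$start/,/$end/p" "$2" > "$tmp2"
    diff "$tmp1" "$tmp2"
    status=$?
    rm -f "$tmp1" "$tmp2"
    return "$status"
}
```

Usage: `section_diff file1.txt file2.txt '^# BEGIN' '^# END'` uses custom markers instead of the defaults.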

Dan

archtoad6 · 12-01-2010, 10:20 AM

Edit: Dan posted while I was composing. My suggestion is now a bit redundant.

sed can do the line selection in one step:

Code:

sed -n '5,7p' file1 > temp1

IMNRHO, this problem is too simple to bother w/ Perl. -- I see it as a 3-liner in bash.
Generalizing the line #s to <w>,<x>,<y>,<z>:

Code:

sed -n '<w>,<x>p' file1 > temp1
sed -n '<y>,<z>p' file2 > temp2
diff temp1 temp2  | less -S#33

If the size of the files does not make the process too long, you could diff the files 1st & use sed or grep to disregard the irrelevant. This would avoid the creation of temp files.
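The "diff first, then filter" variant can be sketched as a diff of the whole files piped through grep. The name `relevant_changes` and its PATTERN argument are hypothetical, and the approach assumes the relevant lines can be recognised by their content (an extended regular expression) rather than by their line numbers. Only diff's `<` and `>` lines that match are kept, so no temp files are needed.

```shell
#!/usr/bin/env bash
# relevant_changes FILE1 FILE2 PATTERN
# Diff the complete files, then keep only the changed lines
# ("< ..." from FILE1, "> ..." from FILE2) whose text matches the
# extended regex PATTERN. Hunk headers and "---" separators are dropped.
# The exit status is grep's: 0 if at least one relevant change was found.
relevant_changes() {
    diff "$1" "$2" | grep -E "^[<>] .*($3)"
}
```

Usage: `relevant_changes old.conf new.conf 'port|host'`. PATTERN should not be anchored with `^`, because each kept line starts with the `<` or `>` marker.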

johannes121 · 12-01-2010, 11:08 AM

Quote:

Originally Posted by archtoad6

Code:

sed -n '<w>,<x>p' file1 > temp1
sed -n '<y>,<z>p' file2 > temp2
diff temp1 temp2  | less -S#33

If the size of the files does not make the process too long, you could diff the files 1st & use sed or grep to disregard the irrelevant. This would avoid the creation of temp files.

Or you could just do it as a one-liner, without temp files, using bash process substitution:

Code:

diff <(sed -n '<w>,<x>p' file1) <(sed -n '<y>,<z>p' file2)
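Wrapped as a bash function, the one-liner can take a separate range for each file. This is a sketch: `diff_ranges` is a hypothetical name, the line numbers are assumed to be positive integers, and process substitution (`<(...)`) means it must run under bash, not plain sh. The extra `q` command makes sed stop reading once it has passed the end of the range, which can save time on large files.

```shell
#!/usr/bin/env bash
# diff_ranges FILE1 FROM1 TO1 FILE2 FROM2 TO2
# Compare lines FROM1..TO1 of FILE1 with lines FROM2..TO2 of FILE2.
diff_ranges() {
    # "p" prints the range; "q" quits after the last wanted line
    # (with -n, q does not print that line a second time).
    diff <(sed -n "${2},${3}p;${3}q" "$1") \
         <(sed -n "${5},${6}p;${6}q" "$4")
}
```

Usage: `diff_ranges file1 5 7 file2 12 14 | less -S`.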