LinuxQuestions.org - Writing a script that compares two different files


Hi, I'm having trouble figuring out how to find matching entries in two different files by comparing timestamps. The fields I want to match up are in the format:

Jul 26 09:33:02

I have tried reading the file line by line and using
awk '{print $1,$2,$3}' which only gets and stores the timestamp in one of the files. I've been looking around and saw this example:

awk 'FNR==NR{!a[$3]++;next }{ b[$3]++ }
END{
for(i in a){
for(k in b){
if (a[i]==1 && i ~ k ) { print i }
}
}
}' $FILE $FILE2

Which sorta works, but it's way over my head at the moment. The two files are /var/log/syslog and /var/log/auth.log (using Ubuntu 11.04).

Thank You

Hi, welcome to LQ!

Quote:

I have tried reading the file line by line and using
awk '{print $1,$2,$3}' which only gets and stores the timestamp in one of the files. I've been looking around and saw this example:

Code:

awk '
FNR==NR{
  !a[$3]++;
  next
}
{
  b[$3]++
}
END{
  for(i in a){
    for(k in b){
      if (a[i]==1 && i ~ k ) {
        print i
      }
    }
  }
}' $FILE $FILE2

Which sorta works, but it's way over my head at the moment. The two files are /var/log/syslog and /var/log/auth.log (using Ubuntu 11.04).

So if this "sort of works" - what is your question?

Cheers,
Tink

Ah, thanks. You know, I reference this site a lot, but I haven't actually posted anything here =) Anyway, my goal is to take an entry from syslog, for instance:

Jul 26 11:35:44 bdouglas kernel: [70761.603498] usb 2-1.1.4: new high speed USB device using ehci_hcd and address 12

and an entry from auth.log:

Jul 26 13:17:01 bdouglas CRON[11888]: pam_unix(cron:session): session closed for user root

And compare the contents of both log files by their timestamps. If their timestamps match the exact hour:min:sec, I want both entries printed.

I don't know if this is less over your head, but... (I don't know awk, unfortunately, but I can understand bash):

Code:

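while IFS= read -r line; do
    ts=$(printf '%s\n' "$line" | grep -o '^[[:alpha:]]\{3\} [ [:digit:]][[:digit:]] [[:digit:]]\{2\}:[[:digit:]]\{2\}:[[:digit:]]\{2\}')
    # Skip lines without a leading timestamp; an empty pattern would match every line.
    [ -n "$ts" ] && grep "^$ts" /var/log/syslog
done < /var/log/auth.log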

It looks a bit scary, but that whole regular-expression just matches something in the format "Jul 26 11:35:44" at the beginning of the line. So all it's doing is looping through each line in auth.log, finding the bit that matches (the timestamp), and searching through syslog to find any lines which match this, then printing them.
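
A variation on the loop above, as a rough sketch rather than a drop-in replacement: gather the timestamps from auth.log once and hand them all to a single grep as fixed strings, instead of starting one grep per line. It assumes every log line starts with a 15-character timestamp like "Jul 26 11:35:44". The function name same_second_lines is made up for illustration.

```shell
# same_second_lines AUTH_LOG SYSLOG   (hypothetical helper, not from the thread)
# Prints the lines of SYSLOG containing a timestamp that also appears in AUTH_LOG.
same_second_lines() {
    patterns=$(mktemp) || return 1
    # Assumption: the timestamp is the first 15 characters of each line.
    # "grep ." drops empty results, since an empty pattern would match everything.
    cut -c1-15 "$1" | grep . | sort -u > "$patterns"
    # -F treats each pattern as a fixed string; -f reads one pattern per line.
    grep -F -f "$patterns" "$2"
    rm -f "$patterns"
}
```

It would be called as same_second_lines /var/log/auth.log /var/log/syslog. Note that -F matches the timestamp anywhere in the line, not only at the start, which should rarely matter for these logs.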

Hope this helps,

Hmm. That is pretty straightforward. I like what I see so far. I can do an awk $3, which grabs the 3rd space-separated field. I like how you feed a file into the loop; I was getting the impression you were overwriting the auth.log file, but the alligator is pointing the other way. I'll play with this and see what I can come up with. I would also like to see other variations of doing this, since it looks a bit long-winded. Thank you.
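
To make the field numbering concrete, here is a small sketch using the syslog entry quoted earlier in the thread. awk splits each line on runs of whitespace by default, so $3 is only the time, while $1, $2 and $3 together make up the full timestamp. The variable names are made up for illustration.

```shell
# Sample line: the syslog entry quoted earlier in the thread.
line='Jul 26 11:35:44 bdouglas kernel: [70761.603498] usb 2-1.1.4: new high speed USB device using ehci_hcd and address 12'

# $3 alone is the time of day.
time_only=$(printf '%s\n' "$line" | awk '{ print $3 }')

# $1, $2 and $3 together form the full timestamp (month, day, time).
timestamp=$(printf '%s\n' "$line" | awk '{ print $1, $2, $3 }')

echo "$time_only"    # prints 11:35:44
echo "$timestamp"    # prints Jul 26 11:35:44
```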

And an "awk" method ..

Code:

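# First file: remember each line under its "Mon DD HH:MM:SS" timestamp.
# (A later line with the same timestamp overwrites an earlier one.)
FNR==NR{
  a[$1" "$2" "$3]=$0
}
# Second file: same again, in a separate array.
FNR<NR{
  b[$1" "$2" "$3]=$0
}
# Print both entries for every timestamp present in both files.
END{
  for(i in a){
    #print "I: "i
    for(k in b){
      #print "K: "k
      if ( i == k ) {
        print a[i]
        print b[k]
      }
    }
  }
}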

Where do file 1 and file 2 go, exactly? I'm having a hard time visualizing it. Thanks

Here's a Perl version; the regexes are easy to read and it's very flexible.
It assumes no more than one match per second, and it re-opens the 2nd file for each record in the first file, much like the solution in post #4.
You could amend it to actually compare the dates in meaningful terms, i.e. so it knows when it's passed the date/time in the 2nd file and doesn't waste time checking further records; OTOH, this would mean checking all records until a match or EOF...
Perl is very quick, so you probably don't need to worry about date matching.

Code:

#!/usr/bin/perl -w
use strict;            # Enforce declarations

my (
    $syslog_file, $s_rec, $s_mth, $s_day, $s_time,
    $auth_file, $a_rec, $a_mth, $a_day, $a_time,
    );

$syslog_file='syslog_tmp';
$auth_file='auth_tmp';

open( my $s_fh, '<', $syslog_file ) or
            die "Can't open syslog_file: $syslog_file: $!\n";
while ( defined ( $s_rec = <$s_fh> ) )
{
    chomp($s_rec);
    ($s_mth, $s_day, $s_time) = (split(/\s+/, $s_rec))[0..2];

#DEBUG
#print "$s_mth, $s_day, $s_time\n";

    # Re-read the auth file from the top for every syslog record
    open( my $a_fh, '<', $auth_file ) or
                die "Can't open auth_file: $auth_file: $!\n";
    while ( defined ( $a_rec = <$a_fh> ) )
    {
        chomp($a_rec);
        ($a_mth, $a_day, $a_time) = (split(/\s+/, $a_rec))[0..2];

#DEBUG
#print "$a_mth, $a_day, $a_time\n";

        if( $s_mth eq $a_mth && $s_day eq $a_day && $s_time eq $a_time )
        {
            print "$s_rec\n$a_rec\n\n";
            last;      # Stop at the first match
        }
    }
    close($a_fh) or die "Can't close auth_file: $auth_file: $!\n";

}
close($s_fh) or die "Can't close syslog_file: $syslog_file: $!\n";
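
The scripts so far print at most one line per timestamp from each file (the Perl loop stops at the first match). If several entries can share the same second, a single-pass awk variant is one option. This is only a minimal sketch: it assumes the first three whitespace-separated fields of every line are the timestamp, and the function name match_timestamps is made up for illustration.

```shell
# match_timestamps FILE1 FILE2   (hypothetical helper, not from the thread)
# Prints all lines from both files whose timestamp occurs in both files.
match_timestamps() {
    awk '
        # Timestamp = first three whitespace-separated fields, e.g. "Jul 26 09:33:02".
        { key = $1 " " $2 " " $3 }

        # First file: append every line to the entry for its timestamp.
        FNR == NR {
            if (key in first) {
                first[key] = first[key] "\n" $0
            } else {
                first[key] = $0
            }
            next
        }

        # Second file: print the stored first-file lines once, then this line.
        key in first {
            if (!(key in shown)) { print first[key]; shown[key] = 1 }
            print
        }
    ' "$1" "$2"
}
```

It would be called as match_timestamps /var/log/syslog /var/log/auth.log. The first file is held in memory, so it makes sense to pass the smaller log first.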

Quote:

Originally Posted by random0munky (Post 4426090)

Where do file 1 and file 2 go, exactly? I'm having a hard time visualizing it. Thanks

If you save the above as, say, munky.awk:

Code:

awk -f munky.awk file1 file2

Ah, gotcha, gotcha. I'll take a look at it. Thank you for the reply.