Bash Time Comparison within a fast moving log

snozebtn · 06-01-2010, 05:48 PM

Hey all,

As a long time creeper on this site, it is finally time to post my own question.

I have a bash script that is tracking ERROR connections (running in a cron every 15 minutes), sometimes the errors are true and are not connecting. But most of the time, the disconnects/reconnects are between 30 seconds are sending out false positives. I am looking for a time comparison code to pick out the 30 second disconnects and know that this is and "ok" error. Here is a section of the log with timestamps, errors, etc. If you need any more information please let me know, thanks in advance.

Tue 2010-06-01 22:01:30 ERROR [DataBufferSendService] 172.31.0.1:8016 caught exception: [NetworkAPI] Exception: [SocketStream] socket error
Tue 2010-06-01 22:01:32 ERROR [DataBufferSendService] 172.31.0.2:8016 caught exception: [NetworkAPI] Exception: [SocketStream] socket error
Tue 2010-06-01 22:02:00 INFO [Client] 172.31.0.1:8016 connecting to 172.31.0.1 on port 8016
Tue 2010-06-01 22:02:02 INFO [Client] 172.31.0.2:8016 connecting to 172.31.0.2 on port 8016

tvynrylan · 06-01-2010, 07:07 PM

I'm not sure what the criteria are that distinguish an "ok" error from a "bad" error. Unless you can articulate what makes an error one or the other, it won't be possible to write a program to calculate that separation. Can you elaborate on how, given an arbitrary chunk of log, you as a human would make that distinction?

If possible, the ideal solution would be to use the context available to the program at the time that it generates the error to distinguish between the two. But that presumes that you have the ability to edit its source code.

syg00 · 06-01-2010, 08:46 PM

Perl would seem the tool for this ...
If you insist on bash, I'd be taking fields 2 and 3 and passing them (together) to the date command to get seconds since epoch. Simple math after that.

grail · 06-01-2010, 09:26 PM

I am with tvynrylan on this one, are all the items you listed "ok"?
If so, we would need to see what distinguishes a "not ok" error?
If "not ok" is already here, which ones?

snozebtn · 06-02-2010, 12:39 AM

I apologize, let me see if I can clear this up....

I don't have access to the source code the is generating this log, just access to the logs themselves. The script I made tracks only ERRORS in the log and them emails them to myself, so I tend to get a lot of spam, I want to see if I can tighten my code a bit to send me true errors...

Speaking of which, an "ok" error is where the time stamp different is between less than or equal to 30 seconds.....

Tue 2010-06-01 22:01:32 ERROR [DataBufferSendService] 172.31.0.2:8016
and
Tue 2010-06-01 22:02:02 INFO [Client] 172.31.0.2:8016 connecting to 172.31.109.16 on port 8016

And a true error is if the time between them is greater than 30 seconds, then email that out. Hopefully that helps.

grail · 06-02-2010, 01:13 AM

Thanks, I think that is clearer, I just have one more question, is there potentially an unknown number of lines between the ERROR and INFO lines?

I ask as, assuming the number is unknown, it would then be necessary to identify the ERROR line (again I assume) by the IP:PORT grouping.

snozebtn · 06-02-2010, 02:14 AM

Yes that is correct...for example you have two ips that disconnect both displaying that error message, then you have the connect statement for both ips, so it looks like this in the logs....

Tue 2010-06-01 22:01:30 ERROR [DataBufferSendService] 172.31.0.1:8016
Tue 2010-06-01 22:01:32 ERROR [DataBufferSendService] 172.31.0.2:8016
Tue 2010-06-01 22:02:00 INFO [Client] 172.31.0.1:8016 connecting to 172.31.0.1 on port 8016
Tue 2010-06-01 22:02:02 INFO [Client] 172.31.0.2:8016 connecting to 172.31.0.2 on port 8016
Tue 2010-06-01 22:02:15 ERROR [DataBufferSendService] 172.31.0.1:8016
Tue 2010-06-01 22:02:17 ERROR [DataBufferSendService] 172.31.0.2:8016
Tue 2010-06-01 22:02:45 INFO [Client] 172.31.0.1:8016 connecting to 172.31.0.1 on port 8016
Tue 2010-06-01 22:02:47 INFO [Client] 172.31.0.2:8016 connecting to 172.31.0.2 on port 8016

And the pattern between the logs is disconnect 1 then 2, and then connect 1 then 2. As it is one line between the two associated IP:PORT.

tvynrylan · 06-02-2010, 07:37 AM

At the point at which you have multiple different IPs and ports, I'd move to a language such as Perl or Python which supports something approaching decent data structures. Perhaps something like the following:

Code:

#!/usr/bin/python

import datetime
import re
import sys

activity = {}
delay = datetime.timedelta(0,30) # sensitivity threshold: 30 seconds
pattern = re.compile("[0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+")

for line in sys.stdin:
    # Get timestamp text
    tstamp = line[4:23]
    # Convert into usable data structure
    thisActivity = datetime.datetime.strptime(tstamp, '%Y-%m-%d %H:%M:%S')
    # Get activity IP address
    ip = pattern.search(line).group(0)
    # If we've seen it before, use the last known activity time; otherwise,
    # use the date 0001/01/01 to ensure that it gets seen.
    if ip in activity:
        lastActivity = activity[ip]
    else:
        lastActivity = datetime.datetime(datetime.MINYEAR, 1, 1)
    # Calculate time since last activity and print the line if more than our
    # threshold
    sinceActivity = thisActivity - lastActivity
    if sinceActivity > delay:
        sys.stdout.write(line)
    # Record this activity for the processing of future lines
    activity[ip] = thisActivity

This code does not handle the possibility of other lines between the relevant ones; I suggest you stream your log through egrep to match the lines you want before feeding it into this script. And for sake of documentation, I hereby release the above silly scrap code into the public domain. :-P

snozebtn · 06-04-2010, 11:59 AM

Unfortunately, I am not to familiar with python but thats not to much of a problem, give me a reason to learn it. But for the sake of this script, I think I will migrate it into perl since this does seem a super steep mountain to climb using BASH. Thanks for all the help, tvynrylan you code will be used to help me logic-out the perl.