LinuxQuestions.org

elinenbe 06-26-2008 06:52 AM

create an error table? finding strings, and counting... in bash
 
I wrote a script that searches an error log file for known errors, counts them, and then displays statistics at the end. However, it runs as slow as molasses. I use grep and two loops to go through everything.

Here is an example of the file:

Code:

04/02/08:20:16:57 - y:\logs: 04/02/08 20:16:57.300 - No valid sum
04/03/08:05:04:38 - y:\logs: 04/03/08 05:04:38.759 - ID does not match
04/03/08:05:15:16 - y:\logs: 04/03/08 05:15:16.695 - Wrong Batch
04/03/08:05:26:41 - y:\logs: 04/03/08 05:26:41.461 - Unknown Exception
04/03/08:05:30:41 - y:\logs: 04/03/08 05:30:41.289 - I Am A Bad Error
04/03/08:06:00:58 - y:\logs: 04/03/08 06:00:58.633 - Wrong Batch
04/03/08:06:00:58 - y:\logs: 04/03/08 06:00:58.633 - Wrong Error
04/03/08:06:00:58 - y:\logs: 04/03/08 06:00:58.633 - Unknown Exception
04/03/08:06:00:58 - y:\logs: 04/03/08 06:00:58.633 - I Am A Bad Error

Now what I have is a list of acceptable errors:
Code:

okerror=(
      "No valid sum"
      "ID does not match"
      "Wrong Batch"
      "Unknown Exception"
      )

When the script runs, I'd like an output file something like this:
Code:

OK Errors:
1 No valid sum
1 ID does not match
2 Wrong Batch
2 Unknown Exception
BAD Errors:
2 I Am A Bad Error
1 Wrong Error

The bad errors can be anything that is not in the okerror array. I suspect someone here can do better than what I have, since it takes almost a second per line. I was thinking of something along the lines of "grep -f", but I can't come up with anything elegant.

Thanks,
Eric
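
For reference, the "grep -f" idea mentioned above could be sketched roughly as follows. This is only a sketch under assumptions: split_errors is a hypothetical helper, ok_errors.txt is a hypothetical file holding one acceptable message per line, and the message is assumed to be everything after the last " - " on a log line. The -F flag treats the patterns as fixed strings and -x requires whole-line matches, so "Wrong Batch" would not accidentally match a longer message.

```sh
#!/bin/sh
# Hypothetical helper: split_errors LOGFILE PATTERNFILE
# PATTERNFILE holds one acceptable error message per line.
split_errors() {
    echo "OK Errors:"
    # Strip everything up to the last " - ", keep exact matches, then count.
    sed 's/^.* - //' "$1" | grep -Fx -f "$2" | sort | uniq -c

    echo "BAD Errors:"
    # Same extraction, but -v keeps only messages NOT in the pattern file.
    sed 's/^.* - //' "$1" | grep -Fxv -f "$2" | sort | uniq -c
}
```

It would be called as split_errors logfile ok_errors.txt. The log is read twice, which keeps the sketch free of temporary files at the cost of a second pass.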

ntubski 06-26-2008 08:59 AM

I think a Perl/Python hash-table-based solution would probably be faster, but this might be fast enough. It processed 35000 lines in 0.7 seconds (just your sample file duplicated). The only thing that annoys me is the need for a temp file, if only tee could send a copy to another process...

I assumed that "-" is the delimiter; if it also shows up in the error messages or the times/locations, this won't work.

Code:

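#!/bin/sh

# Acceptable errors, written as one extended-regex alternation.
okerror="No valid sum|ID does not match|Wrong Batch|Unknown Exception"

# Take the third "-"-delimited field (the message) and count each distinct message.
cut -d- -f3 logfile | sort | uniq -c > counts

echo OK Errors:
# grep -E is the current spelling of egrep.
grep -E "$okerror" counts

echo BAD Errors:
# -v inverts the match: every counted message that is not an acceptable error.
grep -E -v "$okerror" counts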
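
The hash table idea mentioned above can also be sketched without leaving the shell, using awk's associative arrays. This is only a sketch, not a tuned implementation: classify_errors is a hypothetical helper, the message is assumed to be everything after the last " - " on a line, and the okerror list is passed in as a "|"-separated string. It reads the log once and needs no temp file.

```sh
#!/bin/sh
# Hypothetical helper: classify_errors LOGFILE
classify_errors() {
    awk -v ok="No valid sum|ID does not match|Wrong Batch|Unknown Exception" '
    BEGIN {
        # Build a lookup table of acceptable messages.
        n = split(ok, list, "[|]")
        for (i = 1; i <= n; i++) known[list[i]] = 1
    }
    NF == 0 { next }                 # skip blank lines
    {
        msg = $0
        sub(/^.* - /, "", msg)       # drop everything up to the last " - "
        count[msg]++                 # one hash-table increment per log line
    }
    END {
        print "OK Errors:"
        for (m in count) if (m in known) print count[m], m
        print "BAD Errors:"
        for (m in count) if (!(m in known)) print count[m], m
    }' "$1"
}
```

It would be called as classify_errors logfile. The order in which awk walks an array is unspecified, so the lines within each section may come out in any order. Because the lookup is an exact match on the whole message, a message that merely contains "Wrong Batch" would be listed as a bad error here, whereas the regex version would count it as OK.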


elinenbe 06-26-2008 09:53 AM

WOW! This is super-slick.

Can you explain what this line does a little?

Code:

cut -d- -f3 logfile | sort | uniq -c > counts

I think I have it...

It cuts out the third dash-delimited field of each line of the logfile, then sorts the results, and counts the unique instances of each line. Very nice. It helps to know about these GNU utilities. So much for the crap I wrote.

Thanks!

ntubski 06-26-2008 03:49 PM

Quote:

I think I have it...
Yup, that's right.

You can always run each part of the pipeline separately to see what it does:
Code:

~/tmp$ cut -d- -f3 logfile
 No valid sum
 ID does not match
 Wrong Batch
 Unknown Exception
 I Am A Bad Error
 Wrong Batch
 Wrong Error
 Unknown Exception
 I Am A Bad Error
~/tmp$ cut -d- -f3 logfile | sort
 I Am A Bad Error
 I Am A Bad Error
 ID does not match
 No valid sum
 Unknown Exception
 Unknown Exception
 Wrong Batch
 Wrong Batch
 Wrong Error
~/tmp$ cut -d- -f3 logfile | sort | uniq -c
      2  I Am A Bad Error
      1  ID does not match
      1  No valid sum
      2  Unknown Exception
      2  Wrong Batch
      1  Wrong Error
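
The sort step in the middle is not optional: uniq only collapses identical lines that are adjacent. A small sketch of the difference, where sample is a hypothetical function that just prints three messages with the duplicates separated:

```sh
#!/bin/sh
# Hypothetical sample input: the two "Wrong Batch" lines are not adjacent.
sample() { printf '%s\n' 'Wrong Batch' 'Unknown Exception' 'Wrong Batch'; }

# Without sort: uniq sees no adjacent repeats, so each line gets a count of 1.
sample | uniq -c

# With sort: identical lines become adjacent and are counted together.
sample | sort | uniq -c
```

The first pipeline prints three lines, the second prints two, with "Wrong Batch" counted twice.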


