LinuxQuestions.org - create an error table? finding strings, and counting... in bash


I wrote a script that searches an error log file for known errors, counts them, and then displays statistics at the end. However, it runs slow as molasses. I use grep and two loops to go through everything.

Here is an example of the file:

Code:

04/02/08:20:16:57 - y:\logs: 04/02/08 20:16:57.300 - No valid sum
04/03/08:05:04:38 - y:\logs: 04/03/08 05:04:38.759 - ID does not match
04/03/08:05:15:16 - y:\logs: 04/03/08 05:15:16.695 - Wrong Batch
04/03/08:05:26:41 - y:\logs: 04/03/08 05:26:41.461 - Unknown Exception
04/03/08:05:30:41 - y:\logs: 04/03/08 05:30:41.289 - I Am A Bad Error
04/03/08:06:00:58 - y:\logs: 04/03/08 06:00:58.633 - Wrong Batch
04/03/08:06:00:58 - y:\logs: 04/03/08 06:00:58.633 - Wrong Error
04/03/08:06:00:58 - y:\logs: 04/03/08 06:00:58.633 - Unknown Exception
04/03/08:06:00:58 - y:\logs: 04/03/08 06:00:58.633 - I Am A Bad Error

Now what I have is a list of acceptable errors:

Code:

okerror=(
      "No valid sum"
      "ID does not match"
      "Wrong Batch"
      "Unknown Exception"
      )

When the script is run, I'd like an output file something like

Code:

OK Errors:
1 No valid sum
1 ID does not match
2 Wrong Batch
2 Unknown Exception

BAD Errors:
2 I Am A Bad Error
1 Wrong Error

The bad errors can be anything that is not in the okerror array. I think someone here could do something better than what I have, as it takes almost a second per line. I was thinking of something along the lines of "grep -f", but I just can't come up with anything very elegant.

Thanks,
Eric

I think a perl/python hash table based solution would probably be faster, but this might be fast enough. It got through 35000 lines in 0.7 seconds (just your sample file duplicated). The only thing that annoys me is the need for a temp file. If only tee could send a copy to another process...

I assumed that the "-" is a delimiter. If it shows up in the error messages or in the times/locations, this won't work.
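
To make the delimiter caveat concrete, here is a small sketch. The message "Re-try failed" is made up for illustration and does not come from the log above. With a dash inside the message, `-f3` cuts it short, while `-f3-` (field 3 through the end of the line) keeps it whole. That only helps when the extra dashes are in the message, not in the time or location part.

```shell
#!/bin/sh
# Hypothetical log line whose message contains a dash.
line='04/03/08:05:15:16 - y:\logs: 04/03/08 05:15:16.695 - Re-try failed'

# Field 3 only: the message is cut at its own dash.
first=$(printf '%s\n' "$line" | cut -d- -f3)

# Field 3 to end of line: the dash inside the message survives.
rest=$(printf '%s\n' "$line" | cut -d- -f3-)

echo "[$first]"   # prints "[ Re]"
echo "[$rest]"    # prints "[ Re-try failed]"
```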

Code:

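#!/bin/sh

# Acceptable errors, written as an extended regex alternation.
okerror="No valid sum|ID does not match|Wrong Batch|Unknown Exception"

# Field 3 (dash-delimited) is the message; count each distinct one.
cut -d- -f3 logfile | sort | uniq -c > counts

echo OK Errors:
grep -E "$okerror" counts

echo BAD Errors:
grep -Ev "$okerror" counts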
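
For what it's worth, the hash table idea can also be tried without leaving the shell, and without the temp file, by letting awk do the counting in one pass. This is only a sketch under a few assumptions: fields are separated by " - " (space, dash, space), the message is the third field, and `report_errors` is a made-up helper name. The order of lines inside each group is whatever awk's `for (x in array)` happens to give.

```shell
#!/bin/sh
# Acceptable errors, as an extended regex alternation.
okerror="No valid sum|ID does not match|Wrong Batch|Unknown Exception"

# report_errors [FILE...]: hypothetical helper; reads stdin when no file is given.
report_errors() {
    awk -F' - ' -v ok="$okerror" '
        { count[$3]++ }                  # tally each message in an associative array
        END {
            re = "^(" ok ")$"            # anchor so only whole messages match
            print "OK Errors:"
            for (msg in count) if (msg ~ re)  print count[msg], msg
            print "BAD Errors:"
            for (msg in count) if (msg !~ re) print count[msg], msg
        }' "$@"
}

# Demo on three lines from the sample log.
printf '%s\n' \
    '04/03/08:05:15:16 - y:\logs: 04/03/08 05:15:16.695 - Wrong Batch' \
    '04/03/08:06:00:58 - y:\logs: 04/03/08 06:00:58.633 - Wrong Error' \
    '04/03/08:06:00:58 - y:\logs: 04/03/08 06:00:58.633 - Wrong Batch' |
    report_errors
```

On those three lines it prints "2 Wrong Batch" under OK Errors and "1 Wrong Error" under BAD Errors. On a real file you would call it as `report_errors logfile`.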

WOW! This is super-slick.

Can you explain what this line does a little?

Code:

cut -d- -f3 logfile | sort | uniq -c > counts

I think I have it...

cut splits each line of the logfile on dashes and keeps the third field (the error message), then sort puts identical messages next to each other, and uniq -c counts the instances of each one. Very nice. It helps to know about these GNU utilities. So much for the crap I wrote.

Thanks!

Quote:

I think I have it...

Yup, that's right.

You can always run each part of the pipeline separately to see what it does:

Code:

~/tmp$ cut -d- -f3 logfile
 No valid sum
 ID does not match
 Wrong Batch
 Unknown Exception
 I Am A Bad Error
 Wrong Batch
 Wrong Error
 Unknown Exception
 I Am A Bad Error

~/tmp$ cut -d- -f3 logfile | sort
 I Am A Bad Error
 I Am A Bad Error
 ID does not match
 No valid sum
 Unknown Exception
 Unknown Exception
 Wrong Batch
 Wrong Batch
 Wrong Error

~/tmp$ cut -d- -f3 logfile | sort | uniq -c
      2  I Am A Bad Error
      1  ID does not match
      1  No valid sum
      2  Unknown Exception
      2  Wrong Batch
      1  Wrong Error
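
The "grep -f" idea from the original question also fits here: the acceptable errors can live in a pattern file, one per line, and be matched as fixed strings with -F. A rough sketch follows, where the file names okerrors.txt and counts and the scratch directory are assumptions for the demo, and the counts file is a two-line stand-in for the `uniq -c` output above. Note that -F matches substrings, so an acceptable error that appears inside a longer bad message would be counted as OK.

```shell
#!/bin/sh
# Scratch directory so the demo does not touch real files.
dir=$(mktemp -d)

# One acceptable error per line (no empty lines: an empty pattern matches everything).
printf '%s\n' 'No valid sum' 'ID does not match' 'Wrong Batch' 'Unknown Exception' \
    > "$dir/okerrors.txt"

# Stand-in for the output of: cut -d- -f3 logfile | sort | uniq -c
printf '%s\n' '      2  Wrong Batch' '      1  Wrong Error' > "$dir/counts"

echo "OK Errors:"
ok=$(grep -F -f "$dir/okerrors.txt" "$dir/counts")      # lines containing any pattern
printf '%s\n' "$ok"

echo "BAD Errors:"
bad=$(grep -v -F -f "$dir/okerrors.txt" "$dir/counts")  # lines containing none of them
printf '%s\n' "$bad"

rm -r "$dir"
```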