Filtering out duplicate lines from a find/grep output
Hi all,
I'm struggling a bit with this. I have some big log files containing errors printed by an app. Most of the time they're relevant, but most of them are also very similar. So I figured I could check what happened in a given time interval with a find. I'm using this one:
Code:
find */application/*/app.log -type f -print0 | xargs -0 grep -E " 15:|16:1|16:2"
which gives output like:
Code:
server1/application/log/app.log:2010-Mar-22 15:16:21,428 [ExecutorThread-18@Running_app] ERROR Exception while notifying event ...
Is there a way to filter out the near-duplicate lines, or do I need to write a program to do it? Right now I get thousands of similar lines, and when I'm scrolling through them I sometimes miss relevant information that I would otherwise have noticed if it weren't all so spammy. I hope my question is clear and you guys can help me. Thanks in advance and regards. |
I'm not 100% clear on what you want to match/see.
However, if you just want lines with 'ERROR', then 'grep ERROR filename' will do; if you want to see a few lines before and/or after such a line, use 'grep -B3 -A3 ERROR filename' (A = after, B = before; see http://linux.die.net/man/1/grep). If you want all lines in a time period, say 15:10 - 16:10, you could maybe try logwatch; otherwise I'd write Perl to do it. The problem (for time periods) is that although e.g. sed can pull out data based on a start-line match and an end-line match, if you can't guarantee the logfile will always(!) have a log record for both given timestamps, you'll need to write your own more intelligent/flexible program. Writing your own also means you can make it smart enough to only return the records you want to see in that time period. |
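For the time-period case, a small awk script can compare the timestamp field directly, which sidesteps the start/end-marker problem described above. A minimal sketch, assuming log lines start with a date and a time field as in the sample in the first post (the 15:10 - 16:10 window and the app.log file name are just placeholders):
Code:
# Print records whose time of day falls between 15:10 and 16:10 (inclusive).
# Assumes the second whitespace-separated field is the time, e.g. "15:16:21,428";
# continuation lines without a timestamp (stack traces etc.) are not handled here.
awk '{
    t = substr($2, 1, 5)                      # "HH:MM"
    if (t >= "15:10" && t <= "16:10") print
}' app.log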
Hi, perhaps the program "uniq" will do the job.
Or maybe not, just noticed the date... sorry. |
Maybe if we knew more about the errors you 'are' looking for we can help with a better regex?
|
Maybe he can just try to append |uniq --count --skip-fields=2 to the command line he gives in the first post. |
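One thing worth noting: uniq only collapses adjacent duplicate lines, so the grep output generally has to be sorted first, and the sort has to order on the same fields that uniq then compares. A hedged sketch of the whole pipeline, reusing the path pattern from the first post (the field count of 2 is only illustrative; it skips the filename/date field and the time field):
Code:
find */application/*/app.log -type f -print0 \
    | xargs -0 grep -E " 15:|16:1|16:2" \
    | sort -k 3 \
    | uniq --count --skip-fields=2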
First of all, thanks a lot for your answers, they enlightened me.
And it was a very close approach, @chrism01. I know that I will always have logging every minute, many lines per minute; that's why I use the
Code:
grep -E " 15:|16:1|16:2"
The problem with grep alone is that some files are so big they have to be in a tar, and grep can't read those (or I don't know how, but less does the job).
@grail: basically the errors are like the one I put in the OP, but here are some more lines of errors.
Edit: the errors are in app.log; the "server1/application/log/app.log:" prefix is output by my find, and the only part that is really log is what comes after it. Still, I used 72 as the skip value for uniq, because as I understood it, the grep is run on the results of the find, so those prefix fields are there.
Code:
server1/application/log/app.log:2010-Mar-22 15:16:21,428 [ExecutorThread-18@Running_app] ERROR Exception while notifying event ...
That uniq suggestion was pretty close. I changed the number of characters it should skip before comparing, but it didn't give me exactly what I wanted. I used this:
Code:
find */application/*/app.log -type f -print0 | xargs -0 grep -E " 15:|16:1|16:2" | uniq --count --skip-fields=72
and got output like:
Code:
59959 server1/application/log/app.log:2010-Mar-22 15:16:21,428 [ExecutorThread-18@Running_app] ERROR Exception while notifying event ...
59959 server1/application/log/app.log:2010-Mar-22 15:16:21,428 [ExecutorThread-18@Running_app] ERROR Exception while notifying event ...
Thanks in advance, Regards |
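On the side point about archived logs: grep cannot look inside an archive directly, but the contents can be streamed into it. A hedged sketch with made-up archive and member names:
Code:
# gzip-compressed log files: zgrep handles the decompression itself
zgrep -E " 15:|16:1|16:2" app.log.1.gz

# a log inside a tar archive: extract to stdout (-O) and pipe it into grep
tar -xOf old-logs.tar server1/application/log/app.log | grep -E " 15:|16:1|16:2"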
I think I understand the mistake on this line:
Code:
find */application/*/app.log -type f -print0 | xargs -0 grep -E " 15:|16:1|16:2" | uniq --count --skip-fields=72
--skip-fields counts whitespace-separated fields (words), not characters, so 72 is almost certainly not skipping what was intended; and since uniq only collapses adjacent duplicates, the grep output would need to be sorted first anyway. |
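To make the fields-versus-characters distinction concrete, here is a small illustration with made-up two-line inputs (uniq's character-based option is --skip-chars):
Code:
# --skip-fields counts blank-separated fields, --skip-chars counts characters
printf 'a b MSG\nc d MSG\n' | uniq --count --skip-fields=2       # prints: 2 a b MSG
printf 'xxxxxxxMSG\nyyyyyyyMSG\n' | uniq --count --skip-chars=7  # prints: 2 xxxxxxxMSG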
I got this one working at the moment. I'm sure it can still be improved, but it works fine for now:
Code:
find */application/*/app.log -type f -print0 | xargs -0 grep "ERROR" | grep " 15:5" | sed 's/ [^[:space:]]*//' | sed 's/ [^[:space:]]*//' | sort | uniq -c -w 100
What do you guys think of this solution?
Edit: I added a sort and it does exactly what I wanted; the only problem is it's slooooow. |
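If the sort on the full grep output is what makes it slow, a single awk pass can do the counting in memory instead, so only the short summary needs sorting afterwards. A hedged sketch that, like the two seds, drops the time and thread fields and keys on the rest (the field positions are assumed from the sample lines earlier in the thread):
Code:
find */application/*/app.log -type f -print0 \
    | xargs -0 grep "ERROR" | grep " 15:5" \
    | awk '{
          key = $1                        # filename:date prefix
          for (i = 4; i <= NF; i++)       # skip $2 (time) and $3 (thread name)
              key = key " " $i
          count[key]++
      }
      END { for (k in count) print count[k], k }' \
    | sort -rn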
Hey thundervolt
I was just wondering a few things:
1. Does this log only contain errors?
2. Based on the information you have entered, it appears that the first line required each time contains ",428" and the second line you would like to retrieve always has ",514". Is this the case, or is that just in the examples you have given?
3. Would it be possible for you to attach maybe 100 or so lines from the log in a file to this thread? (It might help us give you better answers.) |
Hi grail,
Well, no, it doesn't always contain those numbers; that was just an example I was giving. And no, it doesn't purely contain errors, but I was already grepping only the errors, so I had no problems there. I posted my final solution above your post (the actual implementation has a few more tweaks and seds to filter out info), but that's the solution I found. |
Based on you saying you wanted something more like the paired lines you described, maybe try:
Code:
find -name app.log -exec awk 'BEGIN{f=0;g=0}$0 ~ /15:2.*\[/{k=$0;f=0;g=1}f && $0 ~ /ERROR!/{print k"\n"$0;g=0;f=0}g{f=1}' {} \; |
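If I read that one-liner right, it remembers the most recent line in the 15:2x minute range that contains a '[' and prints it together with the next line that contains 'ERROR!' (note the literal exclamation mark; for the sample lines shown earlier, /ERROR/ may be what is wanted). A commented expansion of that reading, not a tested replacement:
Code:
find . -name app.log -exec awk '
    BEGIN { f = 0; g = 0 }
    # a line in the 15:2x minute that contains "[" becomes the remembered header
    $0 ~ /15:2.*\[/ { k = $0; f = 0; g = 1 }
    # once armed, the first line containing "ERROR!" is printed after its header
    f && $0 ~ /ERROR!/ { print k "\n" $0; g = 0; f = 0 }
    # arm the ERROR check whenever a header has been seen and not yet matched
    g { f = 1 }
' {} \;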