LinuxQuestions.org
03-22-2010, 05:07 PM   #1
thundervolt
LQ Newbie

Registered: Mar 2010
Posts: 5

Rep: Reputation: Disabled
Filtering out duplicate lines from a find/grep output


Hi all,
I'm struggling a bit with this.
I have some big log files that contain errors printed by an app.
They are usually relevant; however, most of them are similar.
So I figured I could check what happened within a time interval with find and grep.
I'm using this:

Code:
find */application/*/app.log -type f -print0 | xargs -0 grep -E " 15:|16:1|16:2"
And I get output similar to this:
Code:
server1/application/log/app.log:2010-Mar-22 15:16:21,428 [ExecutorThread-18@Running_app] ERROR Exception while notifying event ...
server1/application/log/app.log:2010-Mar-22 15:20:21,428 [ExecutorThread-18@Running_app] ERROR Exception while notifying event ...
server1/application/log/app.log:2010-Mar-22 16:25:21,428 [ExecutorThread-18@Running_app] ERROR Exception while notifying event ...
Is there a way to condense the output so that each block of similar lines is reduced to one or two lines, indicating its first and last occurrence?
Or do I need to write a program to do so?

Right now I get thousands of similar lines, and when I'm scrolling through them I sometimes miss relevant information that I would otherwise have noticed if the output weren't so spammy.

I hope my question is clear and you guys can help me,
Thanks in advance and regards.

Last edited by thundervolt; 03-22-2010 at 05:09 PM.
 
03-22-2010, 08:08 PM   #2
chrism01
Guru

Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.6, Centos 5.10
Posts: 16,324

Rep: Reputation: 2041
I'm not 100% clear on what you want to match/see.
However, if you just want lines with 'ERROR ..':

grep ERROR filename

If you want to see a few lines before and/or after such a line:

grep ERROR -A3 -B3 filename

http://linux.die.net/man/1/grep (A=after, B=before)

If you want all lines in a time period, say 15:10 to 16:10, you could maybe try logwatch; otherwise I'd write some Perl to do it. The problem with time periods is that although, e.g., sed can pull out data between a start-line match and an end-line match, if you can't guarantee that the log file will always(!) contain a record for both given timestamps, you'll need to write a more intelligent/flexible program of your own.
Writing your own also means you can make it smart enough to return only the records you want to see in that time period.
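For the time-period case, one way around the "both timestamps must exist" problem is to compare the HH:MM part of each timestamp as a string instead of matching exact start and end lines; zero-padded HH:MM strings sort in the same order as the times they represent. A minimal sketch using awk rather than Perl, assuming the time of day is the second blank-separated field, as in the sample lines in the first post (in_window is a hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical helper: print lines whose HH:MM time is in [start, end).
# Assumes field 2 looks like "15:16:21,428" (as in this thread's logs).
in_window() {
  start=$1
  end=$2
  shift 2
  awk -v s="$start" -v e="$end" '
    { t = substr($2, 1, 5) }   # "15:16:21,428" -> "15:16"
    t >= s && t < e            # plain string comparison of HH:MM
  ' "$@"                       # no file arguments: read stdin
}
```

Usage: `in_window 15:10 16:10 server1/application/log/app.log` prints the lines from 15:10 up to (but not including) 16:10, whether or not a record exists at exactly those minutes. Lines without a timestamp (stack-trace continuation lines, for example) are compared on whatever their second field happens to be, so they may be dropped.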
 
03-23-2010, 06:28 AM   #3
dsmyth
LQ Newbie
 
Registered: Mar 2010
Location: Glasgow, Scotland
Distribution: Fedora 12
Posts: 26
Blog Entries: 6

Rep: Reputation: 17
Hi, perhaps the program "uniq" will do the job.

Or maybe not; I just noticed the timestamps... sorry.
 
03-23-2010, 06:33 AM   #4
grail
Guru

Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,698

Rep: Reputation: 1988
Maybe if we knew more about the errors you 'are' looking for, we could help with a better regex?
 
03-23-2010, 10:12 AM   #5
berbae
Member
 
Registered: Jul 2005
Location: France
Distribution: Arch Linux
Posts: 540

Rep: Reputation: Disabled
Maybe he can just try appending:

|uniq --count --skip-fields=2

to the command line he gives in the first post.
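Two details matter with this suggestion: uniq only merges lines that are adjacent, and --skip-fields=N (-f N) skips N blank-separated fields. In the grep output above, field 1 is "path:date" and field 2 is the time, so -f 2 starts comparing at the thread name. A sketch that first sorts on the same key so that matching remainders end up next to each other (collapse_after_time is a hypothetical name; LC_ALL=C just keeps sort and uniq agreeing on byte-wise comparison):

```shell
#!/bin/sh
# Hypothetical helper: count lines that are equal after skipping 2 fields.
# sort -k3 orders by field 3 to end of line, the same text uniq -f 2 compares,
# so equal remainders become adjacent before uniq counts them.
collapse_after_time() {
  LC_ALL=C sort -k3 "$@" | LC_ALL=C uniq -c -f 2
}
```

Because the thread tag ([ExecutorThread-18@...] vs [ExecutorThread-50@...]) is still part of the compared text, otherwise identical errors from different threads are counted separately, and because the path is skipped, identical messages from server1 and server2 are counted together.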
 
03-24-2010, 02:39 PM   #6
thundervolt
LQ Newbie
 
Registered: Mar 2010
Posts: 5

Original Poster
Rep: Reputation: Disabled
First of all, thanks a lot for your answers; they enlightened me.
Your suggestions got me very close.

@chrism01 I know there will always be log entries every minute, many lines per minute. That's why the
Code:
 grep -E " 15:|16:1|16:2"
works for me.
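A side note on that pattern: in grep -E, alternation (|) binds loosest, so " 15:|16:1|16:2" means " 15:" or "16:1" or "16:2", and only the first branch is anchored by the leading space. A line logged at, say, 10:16:25 contains "16:2" and would match as well. A minimal sketch that groups the alternatives so the space applies to every branch (window_filter is a hypothetical name):

```shell
#!/bin/sh
# Hypothetical helper: keep lines whose time starts with 15: or 16:1x/16:2x.
# The group ( ... ) puts the leading space in front of every alternative.
window_filter() {
  grep -E ' (15:|16:[12])' "$@"
}
```

It can stand in for the original grep in the pipeline, e.g. `find */application/*/app.log -type f -print0 | xargs -0 grep -E ' (15:|16:[12])'`.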

The problem with using grep alone is that some files are so big that they have to be archived with tar, and grep can't read those (or I don't know how; less does the job, though).
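grep can search an archive's contents if tar writes them to standard output: -x extracts, -O (--to-stdout) sends member contents to stdout instead of to disk, and -f names the archive; add -z for a gzip-compressed .tar.gz. A minimal sketch (grep_tar is a hypothetical helper name); note that the output no longer says which member file each line came from:

```shell
#!/bin/sh
# Hypothetical helper: grep inside a tar archive without unpacking it to disk.
# -x: extract, -O: write member contents to stdout, -f: archive file.
# Use tar -xzOf instead for a gzip-compressed archive.
grep_tar() {
  pattern=$1
  archive=$2
  tar -xOf "$archive" | grep -E -- "$pattern"
}
```

Usage: `grep_tar 'ERROR' logs.tar`. For single files compressed with gzip (not tar archives), zgrep does the same job directly.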

@grail Basically the errors are like the ones I put in the original post, but here are some more lines of errors.
Edit: the errors are in app.log. The "server1/application/log/app.log:" part is the file-name prefix that grep adds to each match from the files my find passes to it; the actual log line is only what comes after it. Still, I used 72 for the skip in uniq because, as I understood it, grep works on the results of find, so that prefix is part of every line it outputs.

Code:
server1/application/log/app.log:2010-Mar-22 15:16:21,428 [ExecutorThread-18@Running_app] ERROR Exception while notifying event ...
server1/application/log/app.log:2010-Mar-22 15:20:21,428 [ExecutorThread-18@Running_app] ERROR Exception while notifying event ...
server1/application/log/app.log:2010-Mar-22 15:20:21,514 ERROR! unable to retrieve credentials
server1/application/log/app.log:2010-Mar-22 15:23:20,310 ERROR! unable to retrieve credentials
server1/application/log/app.log:2010-Mar-22 16:25:21,428 [ExecutorThread-50@Running_app] ERROR Exception while notifying event ...
server2/application/log/app.log:2010-Mar-22 15:16:21,428 [ExecutorThread-700@Running_app] ERROR Exception while notifying event ...
server2/application/log/app.log:2010-Mar-22 15:18:21,514 ERROR! unable to retrieve credentials
server2/application/log/app.log:2010-Mar-22 15:20:21,428 [ExecutorThread-50@Running_app] ERROR Exception while notifying event ...
server2/application/log/app.log:2010-Mar-22 15:20:21,514 ERROR! unable to retrieve credentials
server2/application/log/app.log:2010-Mar-22 16:25:21,428 [ExecutorThread-700@Running_app] ERROR Exception while notifying event ...
@dsmyth and berbae
That uniq was pretty close. I changed the number of characters it should skip before comparing, but it didn't give me exactly what I wanted.
I used this:
Code:
 find */application/*/app.log -type f -print0 | xargs -0 grep -E " 15:|16:1|16:2" | uniq --count --skip-fields=72
but got something like this as a result
Code:
59959 server1/application/log/app.log:2010-Mar-22 15:16:21,428 [ExecutorThread-18@Running_app] ERROR Exception while notifying event ...
What I would like as a result is something more like this:

Code:
59959 server1/application/log/app.log:2010-Mar-22 15:16:21,428 [ExecutorThread-18@Running_app] ERROR Exception while notifying event ...
500 server1/application/log/app.log:2010-Mar-22 15:20:21,514 ERROR! unable to retrieve credentials
56600 server2/application/log/app.log:2010-Mar-22 15:16:21,428 [ExecutorThread-700@Running_app] ERROR Exception while notifying event ...
20 server2/application/log/app.log:2010-Mar-22 15:20:21,514 ERROR! unable to retrieve credentials

Thanks in advance,
Regards

Last edited by thundervolt; 03-24-2010 at 02:59 PM.
 
03-24-2010, 03:03 PM   #7
thundervolt
LQ Newbie
 
Registered: Mar 2010
Posts: 5

Original Poster
Rep: Reputation: Disabled
I think I understand the mistake in this line:
Code:
find */application/*/app.log -type f -print0 | xargs -0 grep -E " 15:|16:1|16:2" | uniq --count --skip-fields=72
If I tell it to skip 72 characters, it won't compare the server and app names, and I need those to be compared. Basically, the only part I need it to ignore is the date/time, because I want the totals for that time frame but am not really interested in when each error happened.
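One detail of uniq is relevant here: --skip-fields=N (-f N) skips N blank-separated fields, while --skip-chars=N (-s N) skips N characters. The sample lines have far fewer than 72 fields, so with --skip-fields=72 every comparison key is empty, all adjacent lines compare equal, and uniq prints a single line carrying the total count, which would explain the single 59959 line above. Since both options can only skip a prefix of the line, ignoring a timestamp in the middle of the line means removing it first, for example with sed. A tiny demonstration (demo_skip_fields is a hypothetical name):

```shell
#!/bin/sh
# Two lines that differ in every field. With -f 72 nothing is left to
# compare, so uniq treats them as duplicates and reports a count of 2.
demo_skip_fields() {
  printf '%s\n' 'server1 15:16 ERROR A' 'server2 15:20 ERROR! B' | uniq -c -f 72
}
```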
 
03-24-2010, 06:45 PM   #8
thundervolt
LQ Newbie
 
Registered: Mar 2010
Posts: 5

Original Poster
Rep: Reputation: Disabled
I got this one working. I'm sure it can still be improved, but it works fine for now:
Code:
find */application/*/app.log -type f -print0 | xargs -0 grep "ERROR" | grep " 15:5" | sed 's/ [^[:space:]]*//' | sed 's/ [^[:space:]]*//' | sort | uniq -c -w 100
The two seds delete the time and the field right after it (the [ExecutorThread-...] tag, or the "ERROR!" marker), so uniq doesn't have trouble grouping similar log lines together.

What do you guys think of this solution?

Edit: I added a sort and now it does exactly what I wanted; the only problem is that it's slooooow.

Last edited by thundervolt; 03-24-2010 at 11:07 PM.
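On the speed: sort has to order every matching line before uniq can count them. One alternative is to count in a single awk pass with an associative array, so only the handful of distinct keys needs sorting. A sketch, assuming the grep output format shown in this thread (path:date, time, an optional "[Thread@...]" tag, then the message); count_similar is a hypothetical name. It drops the time and a bracketed thread tag and keeps everything else, including a leading "ERROR!", as the key:

```shell
#!/bin/sh
# Hypothetical helper: count similar log lines in one pass.
# Key = field 1 (path:date) + the message, without the time (field 2) and
# without field 3 when it is a "[Thread@...]" tag. Rebuilding the key
# collapses runs of blanks inside the message to single spaces.
count_similar() {
  awk '
    {
      key = $1
      for (i = 3; i <= NF; i++)
        if (!(i == 3 && $i ~ /^\[/))   # skip a leading thread tag
          key = key " " $i
      count[key]++
    }
    END { for (k in count) print count[k], k }
  ' "$@" | LC_ALL=C sort -k2
}
```

It would replace the seds, sort and uniq at the end of the pipeline, e.g. `... | grep " 15:5" | count_similar`.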
 
03-24-2010, 10:49 PM   #9
grail
Guru

Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,698

Rep: Reputation: 1988
Hey thundervolt

I was just wondering a few things:

1. Does this log contain only errors?
2. Based on the information you have posted, it appears that the first line you need each time contains ",428" and the second line you want always has ",514". Is that always the case, or just in the examples you have given?
3. Would it be possible for you to attach maybe 100 or so lines from the log in a file to this thread? (It might help us give you better answers.)
 
03-24-2010, 11:11 PM   #10
thundervolt
LQ Newbie
 
Registered: Mar 2010
Posts: 5

Original Poster
Rep: Reputation: Disabled
Hi grail,
Well, no, it doesn't always contain those numbers; that was just an example.
And no, it doesn't contain only errors, but I was already grepping for the errors, so that was no problem. I posted my final solution above your post; the actual implementation has a few more tweaks and seds to filter out information, but that's the solution I found.
 
03-25-2010, 04:32 AM   #11
grail
Guru

Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,698

Rep: Reputation: 1988
Based on you saying you wanted something more like:
Quote:
59959 server1/application/log/app.log:2010-Mar-22 15:16:21,428 [ExecutorThread-18@Running_app] ERROR Exception while notifying event ...
500 server1/application/log/app.log:2010-Mar-22 15:20:21,514 ERROR! unable to retrieve credentials
How about:

Code:
find . -name app.log -exec awk 'BEGIN{f=0;g=0}$0 ~ /15:2.*\[/{k=$0;f=0;g=1}f && $0 ~ /ERROR!/{print k"\n"$0;g=0;f=0}g{f=1}' {} \;
This yielded the results you requested above, based on the small amount of input data provided.
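For readability, here is the same logic as the awk one-liner above, spread out with comments. It is a restatement, not a different method, and pair_errors is a hypothetical wrapper name. It remembers the most recent line matching /15:2.*\[/ (a 15:2x time followed somewhere by a "[" tag) and prints it together with the next line containing "ERROR!", which need not be the immediately following line; it prints pairs of lines, not counts:

```shell
#!/bin/sh
# Hypothetical wrapper around the one-liner logic, with comments.
pair_errors() {
  awk '
    # Remember a bracketed 15:2x line; its partner must come on a later line.
    /15:2.*\[/ { last = $0; armed = 1; next }
    # First "ERROR!" line after it: print both, then wait for a new one.
    armed && /ERROR!/ { print last; print; armed = 0 }
  ' "$@"
}
```

Usage, equivalent to the original: `find . -name app.log -exec sh -c '...' ...` is not needed; simply run `pair_errors server1/application/log/app.log` on each file.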
 

Tags
duplicate, filter, find, grep, output, strings


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off


Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Delete Duplicate Lines in a file, leaving only the unique lines left | xmrkite | Linux - Software | 6 | 01-14-2010 07:18 PM
find / -type f -perm +6000 - 32 lines of output, should it be less ? | crispyleif | Linux - Security | 1 | 06-20-2009 03:40 AM
possible with grep? find string and then output that and the rest of the file | captain_cthulhu | Linux - Newbie | 4 | 05-18-2009 01:09 PM
Trying to understand pipes - Can't pipe output from tail -f to grep then grep again | lostjohnny | Linux - Newbie | 15 | 03-12-2009 11:31 PM
grep output on stdout and grep output to file don't match | xnomad | Linux - General | 3 | 01-13-2007 05:56 AM


All times are GMT -5.