[SOLVED] Filtering out duplicate lines from a find/grep output
Linux - NewbieThis Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-to's this is the place!
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
Filtering out duplicate lines from a find/grep output
I'm struggling a bit with this.
I have some big files of logs that contain errors printed by an app.
They are most of the time relevant, however most of them are similar.
So i figured i could check what happened between a time interval with a find.
I´m using this one
If you want all lines in a time period say 15:10 - 16:10, you could try logwatch maybe? Otherwise I'd write Perl to do it. The problem (for time periods) is that although eg sed can pull out data based on a start line match and an end line match, if you can't guarantee the logfile will always(!) have a log rec for both given timestamps, you'll need to write your own more intelligent/flexible program.
Writing your own also means you can make it smart enough to only rtn recs you want to see in that time period.
First of all, thanks a lot for your answers, they enlightened me.
And it was a very close approach
@chrism01 I know that I will always have logging every minute, many lines per minute. that's why the
grep -E " 15:|16:1|16:2"
work for me.
And the problem with the grep only is that some files are so big that the have to be in tar, and grep can't read those (or i don't know how, but less does the work)
@grail basically the errors are like the ones I put in the OC but here are some more lines of errors.
Edit: the errors are on app.log and
server1/application/log/app.log: is outputted by my find, the only part that is really log is what is after that, Still, i used the 72 on the skip for the uniq, because as I understood it, the grep is being done after the results of the find, and are therefore required
server1/application/log/app.log:2010-Mar-22 15:16:21,428 [ExecutorThread-18@Running_app] ERROR Exception while notifying event ...
server1/application/log/app.log:2010-Mar-22 15:20:21,428 [ExecutorThread-18@Running_app] ERROR Exception while notifying event ...
server1/application/log/app.log:2010-Mar-22 15:20:21,514 ERROR! unable to retrieve credentials
server1/application/log/app.log:2010-Mar-22 15:23:20,310 ERROR! unable to retrieve credentials
server1/application/log/app.log:2010-Mar-22 16:25:21,428 [ExecutorThread-50@Running_app] ERROR Exception while notifying event ...
server2/application/log/app.log:2010-Mar-22 15:16:21,428 [ExecutorThread-700@Running_app] ERROR Exception while notifying event ...
server2/application/log/app.log:2010-Mar-22 15:18:21,514 ERROR! unable to retrieve credentials
server2/application/log/app.log:2010-Mar-22 15:20:21,428 [ExecutorThread-50@Running_app] ERROR Exception while notifying event ...
server2/application/log/app.log:2010-Mar-22 15:20:21,514 ERROR! unable to retrieve credentials
server2/application/log/app.log:2010-Mar-22 16:25:21,428 [ExecutorThread-700@Running_app] ERROR Exception while notifying event ...
@dsmyth and berbae
That uniq was pretty close, I changed the number of characters that it should skip to do the comparison, but it didn't exactly gave me what I wanted.
I used this
If I tell it to skip 72 chars, it won't compare the servers and apps, and i need those to be compared, basically the only part i need to be avoided would be the date/hour because i want the net results for that timeframe, but aren't really interested in when it happened.
1. Does this log only contain errors?
2. Based on information you have entered it appears that the first line required each time contains ",428" and the second line you
would like to retrieve always has ",514". Is this the case or just in the examples you have given?
3. Would it be possible for you to attach maybe a 100 or so lines from the log in a file to this thread? (Maybe help to give you better answers)
Well, no it doesn't always contain those numbers it was just an example I was giving
And no, it doesn't purely contain errors, but i was already grepping only the errors, so i had no problems there, i posted my final solution above your post (the actual implementation has a little more tweaks and seds to filter out info) but thats the solution I found.