[SOLVED] Filtering out duplicate lines from a find/grep output
Hi all,
I'm struggling a bit with this.
I have some big log files that contain errors printed by an app.
They are relevant most of the time, but most of them are very similar.
So I figured I could check what happened within a time interval with a find.
I'm using this one
server1/application/log/app.log:2010-Mar-22 15:16:21,428 [ExecutorThread-18@Running_app] ERROR Exception while notifying event ...
server1/application/log/app.log:2010-Mar-22 15:20:21,428 [ExecutorThread-18@Running_app] ERROR Exception while notifying event ...
server1/application/log/app.log:2010-Mar-22 16:25:21,428 [ExecutorThread-18@Running_app] ERROR Exception while notifying event ...
Is there a way to condense the output lines down to only one or two, indicating the first and last occurrence of a block?
Or do I need to write a program to do so?
Right now I get thousands of similar lines, and when I'm scrolling through them I sometimes miss relevant information that I would otherwise have noticed if the output weren't so spammy.
I hope my question is clear and you guys can help me,
Thanks in advance and regards.
Last edited by thundervolt; 03-22-2010 at 04:09 PM.
If you want all lines in a time period, say 15:10 - 16:10, you could try logwatch maybe? Otherwise I'd write Perl to do it. The problem (for time periods) is that although e.g. sed can pull out data based on a start-line match and an end-line match, if you can't guarantee the logfile will always(!) have a log record for both given timestamps, you'll need to write your own more intelligent/flexible program.
Writing your own also means you can make it smart enough to return only the records you want to see in that time period.
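A minimal sketch of that idea in shell/awk rather than Perl, for illustration only. It compares the time field as a string, so it does not need a log record at exactly the start or end timestamp. The function name `in_window` is made up, and the sketch assumes the time is the second whitespace-separated field (as in the sample lines in this thread) and that the window stays within one day.

```shell
# Hypothetical helper: print lines whose HH:MM:SS falls within [start, end].
# Assumption: the time is the 2nd whitespace-separated field, e.g.
#   server1/application/log/app.log:2010-Mar-22 15:16:21,428 ...
in_window() {
    awk -v start="$1" -v end="$2" '
        { t = substr($2, 1, 8) }      # keep HH:MM:SS, drop the ",428" part
        t >= start && t <= end        # string comparison works for zero-padded times
    '
}

# The three sample lines from the first post.
sample='server1/application/log/app.log:2010-Mar-22 15:16:21,428 [ExecutorThread-18@Running_app] ERROR Exception while notifying event ...
server1/application/log/app.log:2010-Mar-22 15:20:21,428 [ExecutorThread-18@Running_app] ERROR Exception while notifying event ...
server1/application/log/app.log:2010-Mar-22 16:25:21,428 [ExecutorThread-18@Running_app] ERROR Exception while notifying event ...'

# Neither 15:10:00 nor 16:10:00 appears in the input, yet the window still works.
printf '%s\n' "$sample" | in_window 15:10:00 16:10:00
```

If the date matters too, the date part of the first field would have to be compared as well; this sketch ignores it.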
First of all, thanks a lot for your answers, they enlightened me.
And it was a very close approach.
@chrism01 I know that I will always have logging every minute, many lines per minute. That's why the
Code:
grep -E " 15:|16:1|16:2"
works for me.
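One thing to watch with that pattern: only the " 15:" alternative is anchored by the leading space, so "16:1" and "16:2" can also match the minutes:seconds part of a line from another hour. The sketch below shows the difference with a slightly tighter pattern. The input lines are made up for the demonstration (they are not from the real log), and the tighter pattern is only a suggestion.

```shell
# Made-up sample lines, in the same layout as the log lines in this thread.
sample='server1/application/log/app.log:2010-Mar-22 09:16:21,428 ERROR outside the window
server1/application/log/app.log:2010-Mar-22 15:16:21,428 ERROR inside the window
server1/application/log/app.log:2010-Mar-22 16:15:00,000 ERROR inside the window
server1/application/log/app.log:2010-Mar-22 16:35:00,000 ERROR outside the window'

# Original pattern: the 09:16:21 line is also printed, because "16:2" matches
# its minutes and seconds.
printf '%s\n' "$sample" | grep -E ' 15:|16:1|16:2'

echo '---'

# Tighter pattern: the leading space applies to every alternative, so only the
# hour position (right after the date) can match.
printf '%s\n' "$sample" | grep -E ' (15:|16:[12])'
```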
And the problem with using grep alone is that some files are so big that they have to be kept in tar, and grep can't read those (or I don't know how, but less does the job).
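For what it's worth, tar can write archive members to standard output with `-O`, so the contents can be piped into grep without unpacking to disk. A minimal sketch, assuming the logs are gzip-compressed tar archives (the thread does not say which format is used); the archive name `app.tar.gz` and the temporary directory are made up for the demonstration.

```shell
# Build a throwaway archive so the example is self-contained.
tmp=$(mktemp -d)
printf '%s\n' \
    '2010-Mar-22 15:16:21,428 [ExecutorThread-18@Running_app] ERROR Exception while notifying event ...' \
    '2010-Mar-22 15:20:21,428 [ExecutorThread-18@Running_app] ERROR Exception while notifying event ...' \
    '2010-Mar-22 16:25:21,428 [ExecutorThread-18@Running_app] ERROR Exception while notifying event ...' \
    > "$tmp/app.log"
tar -czf "$tmp/app.tar.gz" -C "$tmp" app.log

# -x extract, -z gunzip, -O send member contents to stdout, -f archive file.
# The grep pattern is the one from the post above.
matches=$(tar -xzOf "$tmp/app.tar.gz" | grep -E " 15:|16:1|16:2")
printf '%s\n' "$matches"

rm -rf "$tmp"
```

Note that the member names are lost this way, since all members are concatenated on stdout. For single files compressed with gzip (not tar), `zgrep` takes the same pattern directly.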
@grail basically the errors are like the ones I put in the original post, but here are some more lines of errors.
Edit: the errors are in app.log, and
server1/application/log/app.log: is output by my find; the only part that is really log is what comes after that. Still, I used 72 as the skip count for uniq because, as I understand it, the grep is applied to the results of the find, so those characters are part of each line and have to be counted.
Code:
server1/application/log/app.log:2010-Mar-22 15:16:21,428 [ExecutorThread-18@Running_app] ERROR Exception while notifying event ...
server1/application/log/app.log:2010-Mar-22 15:20:21,428 [ExecutorThread-18@Running_app] ERROR Exception while notifying event ...
server1/application/log/app.log:2010-Mar-22 15:20:21,514 ERROR! unable to retrieve credentials
server1/application/log/app.log:2010-Mar-22 15:23:20,310 ERROR! unable to retrieve credentials
server1/application/log/app.log:2010-Mar-22 16:25:21,428 [ExecutorThread-50@Running_app] ERROR Exception while notifying event ...
server2/application/log/app.log:2010-Mar-22 15:16:21,428 [ExecutorThread-700@Running_app] ERROR Exception while notifying event ...
server2/application/log/app.log:2010-Mar-22 15:18:21,514 ERROR! unable to retrieve credentials
server2/application/log/app.log:2010-Mar-22 15:20:21,428 [ExecutorThread-50@Running_app] ERROR Exception while notifying event ...
server2/application/log/app.log:2010-Mar-22 15:20:21,514 ERROR! unable to retrieve credentials
server2/application/log/app.log:2010-Mar-22 16:25:21,428 [ExecutorThread-700@Running_app] ERROR Exception while notifying event ...
@dsmyth and berbae
That uniq was pretty close. I changed the number of characters that it should skip for the comparison, but it didn't give me exactly what I wanted.
I used this
If I tell it to skip 72 chars, it won't compare the servers and apps, and I need those to be compared. Basically the only part I need to ignore is the date/hour, because I want the net results for that timeframe and am not really interested in when each one happened.
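Since uniq can only skip characters from the start of the line, one way to ignore just the timestamp is to cut it out of the middle and group on what remains. The following is a rough sketch, not the solution that was finally used: the function name `summarize` is made up, and the timestamp regex assumes the `2010-Mar-22 15:16:21,428` layout shown in the sample lines. For each distinct line (server, thread and message still compared) it prints the count plus the first and last timestamp seen, tab-separated.

```shell
# Hypothetical helper: collapse lines that differ only in their timestamp.
summarize() {
    awk '
    {
        # Find "YYYY-Mon-DD HH:MM:SS,mmm " (layout assumed from the samples).
        if (match($0, /[0-9]+-[A-Za-z]+-[0-9]+ [0-9:]+,[0-9]+ /)) {
            ts  = substr($0, RSTART, RLENGTH - 1)
            key = substr($0, 1, RSTART - 1) substr($0, RSTART + RLENGTH)
        } else {
            ts  = "?"
            key = $0
        }
        if (!(key in first)) { first[key] = ts; order[++n] = key }
        last[key] = ts
        count[key]++
    }
    END {
        for (i = 1; i <= n; i++) {
            k = order[i]
            printf "%d\t%s\t%s\t%s\n", count[k], first[k], last[k], k
        }
    }'
}

# The server1 lines from the sample above.
sample='server1/application/log/app.log:2010-Mar-22 15:16:21,428 [ExecutorThread-18@Running_app] ERROR Exception while notifying event ...
server1/application/log/app.log:2010-Mar-22 15:20:21,428 [ExecutorThread-18@Running_app] ERROR Exception while notifying event ...
server1/application/log/app.log:2010-Mar-22 15:20:21,514 ERROR! unable to retrieve credentials
server1/application/log/app.log:2010-Mar-22 15:23:20,310 ERROR! unable to retrieve credentials
server1/application/log/app.log:2010-Mar-22 16:25:21,428 [ExecutorThread-50@Running_app] ERROR Exception while notifying event ...'

printf '%s\n' "$sample" | summarize
```

Unlike uniq, this does not require the duplicates to be adjacent, so no sort is needed first. Lines that differ in the thread name (ExecutorThread-18 vs ExecutorThread-50) still count as different.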
1. Does this log only contain errors?
2. Based on the information you have entered, it appears that the first line required each time contains ",428" and the second line you would like to retrieve always has ",514". Is this the case, or just in the examples you have given?
3. Would it be possible for you to attach maybe 100 or so lines from the log in a file to this thread? (It may help us give you better answers.)
Hi grail,
Well, no, it doesn't always contain those numbers; it was just an example I was giving.
And no, it doesn't contain only errors, but I was already grepping only the errors, so I had no problems there. I posted my final solution above your post (the actual implementation has a few more tweaks and seds to filter out info), but that's the solution I found.