LinuxQuestions.org
Old 08-03-2011, 05:03 PM   #1
Kitherel
LQ Newbie
 
Registered: Aug 2011
Posts: 6

Rep: Reputation: Disabled
Log analysis using awk, grep, cut etc.


Greetings,

I am pretty new to Linux and would like to ask for assistance in extracting certain records from large (600 MB) log files. I am now fairly comfortable using basic grep and cut commands to view certain entries in our log files, but I am looking to get more ambitious with my extracts.

The log files being used are laid out as follows:

[<full date and timestamp>] <process number> <process name> <error message/general information>

The field positions are fixed, so knowing which characters the process ID starts and ends at, for example, isn't a problem.

I am looking for a way to search the file for a specific error code, 'Timeout Error', and then, using the process ID and timestamp, find a message within a certain threshold (i.e. in the previous 45 seconds): the log entry with the same process ID and the information detail 'Sending Order to Booking System'.

I would assume this will require some form of awk usage to filter records based on the timestamp calculation and process ID. As far as I have worked out, the fields defined in awk ($1, $2, etc.) do not match the layout, so I assume the fields will need to be identified by character position, although there are spaces between fields.
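
To illustrate what I mean, I am guessing extraction by character position would look vaguely like this (the column positions here are made up for illustration; the real start/length values would come from the fixed layout above):

Code:
awk '{
    ts  = substr($0,  2, 23)   # full date and timestamp (placeholder position)
    pid = substr($0, 28,  8)   # process number (placeholder position)
    msg = substr($0, 50)       # error message / general information
    if (msg ~ /Timeout Error/) print ts, pid
}' logfile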

Could someone kindly offer a few pointers or a suitable script for this requirement?

Thank you for any help you can provide.

Kitherel
 
Old 08-03-2011, 07:09 PM   #2
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.8, Centos 5.10
Posts: 17,240

Rep: Reputation: 2324
TBH, I'd use Perl for this, especially given you want to match one rec and then (effectively) backtrack up to 45 secs to find the preceding 'matching' rec.
It's extremely good at this sort of thing.
If you are interested, here are some good links:
http://perldoc.perl.org/
http://www.perlmonks.org/?node=Tutorials
 
Old 08-03-2011, 08:07 PM   #3
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,396
Blog Entries: 2

Rep: Reputation: 908
Completely concur with chrism01. Perl is the way to go for that. We love to help here, but we aren't so inclined to do the whole thing for you. For starters, try reading the file a line at a time ( while(<>){ } ) and breaking each line into its respective fields. Hint:
Code:
perldoc -f substr
--- rod.
 
Old 08-04-2011, 04:22 AM   #4
Kitherel
LQ Newbie
 
Registered: Aug 2011
Posts: 6

Original Poster
Rep: Reputation: Disabled

Thank you for your advice, chrism01 and theNbomr. I will need to see if I can run Perl on the servers where the logs are held, since they are fairly secure and restricted.

I guess that, alternatively, I could grep all the required occurrences and process the required conditions as a two-stage pass over a time-reversed log.
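
(Something along these lines is what I have in mind; the file name is a placeholder:)

Code:
# Stage 1: reverse the log and keep only the two interesting line types.
tac logfile | grep -e 'Timeout Error' -e 'Sending Order to Booking System' > candidates.txt
# Stage 2 would then walk candidates.txt, pairing each timeout with the
# next 'Sending Order' line carrying the same process ID.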

Thanks again for your suggestions and links.

Kith
 
Old 08-04-2011, 06:38 AM   #5
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,254

Rep: Reputation: 2686
Well, if you could provide some data, I think awk could also give it a whirl (perhaps). I just need to get a better picture of the data and how it changes.
 
Old 08-04-2011, 07:15 PM   #6
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.8, Centos 5.10
Posts: 17,240

Rep: Reputation: 2324
Perl is usually part of the std install on most versions of Linux, so it shouldn't be a problem.
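
(A quick way to check on the target box:)

Code:
perl -v       # prints the version of the installed Perl, if any
which perl    # shows where the interpreter lives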
 
Old 08-04-2011, 11:30 PM   #7
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,396
Blog Entries: 2

Rep: Reputation: 908
Quote:
Originally Posted by Kitherel
I guess that, alternatively, I could grep all the required occurrences and process the required conditions as a two-stage pass over a time-reversed log.
Yes, that is one alternative, but why use the wrong tool when the right one is available? With 600 MB of data, speed is probably an issue, too, and Perl is generally considered among the fastest tools for text processing.

--- rod.
 
Old 08-05-2011, 04:45 AM   #8
Kitherel
LQ Newbie
 
Registered: Aug 2011
Posts: 6

Original Poster
Rep: Reputation: Disabled
It seems that I am not going to be able to use Perl, for security reasons that are not under my control. I shall download a sample of the file and post it here.

Thanks, Grail, for your kind offer to give me a hand with awk; any pointers are appreciated.

I will post a sample on Monday night when I'm back in the madhouse.

Kith
 
Old 08-05-2011, 05:00 AM   #9
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 14,842

Rep: Reputation: 1823
I jumped from bash scripting to Perl, bypassing awk.
Thanks to some of the awk cognoscenti here on LQ (one of whom has appeared in this thread), I have managed to acquire some limited capability. I would be most surprised if awk couldn't handle this "window" requirement (almost?) as well as Perl.
Especially on such limited data: on current systems I'd expect that amount of data to remain in the page cache, so re-reading it would be "zero-cost" even for multiple grep invocations or similar.
 
Old 08-09-2011, 04:05 PM   #10
Kitherel
LQ Newbie
 
Registered: Aug 2011
Posts: 6

Original Poster
Rep: Reputation: Disabled
Sorry for the delay in responding (usual chaos going on).

I have posted a sample of my logs, grepped to pick out the relevant lines only: http://pastebin.com/pEw5Sb0k

Once grepped, the logs contain only four possible line types: a process start, a process end, a process elapsed timer, and the entry I am particularly interested in, the read timeout.

What I would like, if possible, is to produce a list of start lines matched to the read-timeout messages (eventually filtered by store ID). For every start msg there is either an end msg and an elapsed msg, or a read timeout.

The only way to match these is either to eliminate every other start/end/elapsed line along the way or, as I would prefer to do it, to match the start msg and the read-timeout msg that carry the same process ID (00000hhh) within a specified threshold, i.e. within one minute of each other.

It's been a long day; I hope I've explained it in an understandable manner.

Any help would be appreciated.

Thanks for the help and advice given thus far.

Kith.

Last edited by Kitherel; 08-09-2011 at 04:07 PM.
 
Old 08-09-2011, 07:58 PM   #11
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,254

Rep: Reputation: 2686
Based on the file you have attached, would you please provide an example of what the output would look like?
 
Old 08-10-2011, 04:27 PM   #12
Kitherel
LQ Newbie
 
Registered: Aug 2011
Posts: 6

Original Poster
Rep: Reputation: Disabled
Grail,

Thanks for bearing with me and my slow responses.

For the most part, I just need one output line per start-time line, showing either that it timed out or, if it completed, what the end time was (and, if possible, the duration of the process, but that would be a luxury). For example:

[08/08/11 16:11:23:245 GMT+01:00] 00000184 The start time of Galaxy invocation of ORDERS on Store 10693 Attempt Timed Out at 16:11:43:650 taking 00:00:20:405

[08/08/11 16:11:42:181 GMT+01:00] 0000016c The start time of Galaxy invocation of ORDERS on Store 10651 Ended at 16:11:43:250 taking 00:00:01:69

Any information or examples that help me understand how this might be done in awk would be appreciated.
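
From the little awk I have picked up so far, I imagine the pairing might look vaguely like the sketch below, though I have probably got it wrong (the match strings and field numbers are guesses based on my sample lines, where $2 is the time of day and $4 the process ID):

Code:
awk '
/start time of Galaxy invocation/ {   # remember each start line by process ID
    start[$4] = $0
    next
}
/Timed Out|Ended/ {                   # pair the outcome with its start line
    if ($4 in start) {
        print start[$4], "->", $2
        delete start[$4]
    }
}' grepped_log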

Thank you,

Kith
 
Old 08-22-2011, 04:23 PM   #13
Kitherel
LQ Newbie
 
Registered: Aug 2011
Posts: 6

Original Poster
Rep: Reputation: Disabled
Well, after the suggestion by chrism01 that Perl was the way to go...

I first finished a rather handy awk script I had started, which parses three different log files and presents the output nicely, with colour coding depending on information severity. Awk was pretty easy to get the hang of.

I then started on Perl and struggled a bit with the language (I'm still getting to grips with it), but using the File::Tail module I have managed to get a rather handy script going that does what I wanted, matching up the relevant lines and so on.

I have expanded upon my original requirement, since I am reading through the entire log anyway: I am now gathering all kinds of useful bits of information and producing summaries every 25,000 lines, a volume the log scarily generates every 30-60 seconds.

Thanks for the recommendations.

Kith
 
Old 08-22-2011, 07:22 PM   #14
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.8, Centos 5.10
Posts: 17,240

Rep: Reputation: 2324
Yeah, this sort of thing is one of Perl's strong areas, and the File::Tail module is very handy for real-time tracking of open log files.

There are loads of links off the pages I posted above that are very good (imho) compared to some of the stuff I've seen.
I really like that the official Perl docs contain lots of examples.
If you are going to get into Perl, I cannot recommend the Perl Cookbook highly enough; it's worth its weight in gold.
 
  

