LinuxQuestions.org - Linux - Newbie
Log analysis using awk, grep, cut etc. (https://www.linuxquestions.org/questions/linux-newbie-8/log-analysis-using-awk-grep-cut-etc-895381/)

Kitherel 08-03-2011 04:03 PM

Log analysis using awk, grep, cut etc.
 
Greetings,

I am pretty new to Linux and would like to ask for assistance in extracting certain records from large (600 MB) log files. I am now fairly comfortable using basic grep and cut commands to view certain entries in our log files, but now I am looking to get more elaborate with my extracts.

The log files being used are laid out as follows:

[<full date and timestamp>] <process number> <process name> <error message/general information>

The field positions are fixed, so knowing what characters the process ID will start and end at, for example, isn't a problem.

I am looking for a way of searching the file for a specific error code, 'Timeout Error', then using the process ID and the timestamp to find a message within a certain threshold, i.e. in the previous 45 seconds: the log entry with the same process ID and the information detail 'Sending Order to Booking System'.

I would assume that this will require some form of awk usage to filter records based on timestamp calculations and process ID. As far as I have worked out, the fields awk defines as $1, $2, etc. do not match the layout, so I assume the fields will need to be identified by character position, even though there are spaces between the fields.

Please could someone kindly offer a few pointers or a suitable script for this requirement?

Thank you for any help you can provide.

Kitherel

chrism01 08-03-2011 06:09 PM

TBH, I'd use Perl for this, especially given you want to match one rec, then (effectively) backtrack up to 45 secs to find the preceding 'matching' rec.
It's extremely good at this sort of thing.
If you are interested, here are some good links:
http://perldoc.perl.org/
http://www.perlmonks.org/?node=Tutorials

theNbomr 08-03-2011 07:07 PM

Completely concur with chrism01. Perl is the way to go for that. We love to help here, but we aren't so inclined to do the whole thing for you. For starters, try reading the file a line at a time ( while(<>){ } ), and breaking each one into its respective fields. Hint:
Code:

perldoc -f substr
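To give you an idea of the overall shape, here is a rough, untested sketch. The column offsets and the date format are guesses (your sample data isn't posted yet), so adjust those to your actual layout; the two message strings are the ones from your post.
Code:

#!/usr/bin/perl
use strict;
use warnings;
use Time::Local;

my %last_order;   # last 'Sending Order...' entry seen, keyed by process ID

while (<>) {
    # Guessed fixed-column positions -- adjust to your real layout.
    my $stamp = substr($_, 1, 21);    # "dd/mm/yy hh:mm:ss:mmm" (guessed)
    my $pid   = substr($_, 34, 8);    # the 8-character process ID (guessed)

    # Assumes "dd/mm/yy hh:mm:ss:mmm"; milliseconds are ignored here.
    my ($d, $mo, $y, $h, $mi, $s) =
        $stamp =~ m{(\d+)/(\d+)/(\d+) (\d+):(\d+):(\d+)};
    next unless defined $s;
    my $t = timelocal($s, $mi, $h, $d, $mo - 1, $y + 2000);

    if (/Sending Order to Booking System/) {
        # Remember when this process last sent an order.
        $last_order{$pid} = [ $t, $stamp ];
    }
    elsif (/Timeout Error/) {
        my $prev = $last_order{$pid};
        if ($prev && $t - $prev->[0] <= 45) {
            print "$pid: order sent at $prev->[1], timed out at $stamp\n";
        }
    }
}

You'd run it as perl scriptname.pl yourlogfile; the <> operator reads whichever file names you give it on the command line.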
--- rod.

Kitherel 08-04-2011 03:22 AM

Thank you for your advice chrism01 and theNbomr. I will need to see if I can run Perl on the servers where the logs are held, since they are fairly secure and restricted.

I guess alternatively I could grep all the occurrences required and process the required conditions as a two-stage process on a time-reversed log.

Thanks again for your suggestions and links.

Kith

grail 08-04-2011 05:38 AM

Well if you could provide some data I think awk could also give it a whirl (perhaps). Just need to get a better picture of the data and how it changes.

chrism01 08-04-2011 06:15 PM

Perl is usually part of the std install on most versions of Linux, so it shouldn't be a problem.

theNbomr 08-04-2011 10:30 PM

Quote:

Originally Posted by Kitherel (Post 4433345)
I guess alternatively I could grep all the occurrences required and process the required conditions as a two-stage process on a time-reversed log.

Yes, that is one alternative, but why use the wrong tool when the right one is available? With 600 MB of data, speed is probably an issue, too. Perl is generally considered the fastest tool/language for text processing.

--- rod.

Kitherel 08-05-2011 03:45 AM

It seems that I am not going to be able to use Perl, for security reasons that are not under my control. I shall download a sample of the file and post it here.

Thanks Grail for your kind offer to give me a hand with awk; any pointers are appreciated.

Will post a sample on Monday night when I'm back in the madhouse.

Kith

syg00 08-05-2011 04:00 AM

I jumped from bash scripting to Perl, bypassing awk.
Thanks to some of the awk cognoscenti here on LQ (one of whom has appeared in this thread), I have managed to acquire some limited capability. I would be most surprised if awk couldn't handle this "window" requirement (almost?) as well as Perl.
Especially on such a limited amount of data. On current systems I'd expect that amount of data to remain in the page cache, so re-reading it would be "zero-cost" even for multiple grep invocations or similar.

Kitherel 08-09-2011 03:05 PM

Sorry for the delay in responding (usual chaos going on),

I have posted a sample of my logs, grepped to pick out the relevant lines only: http://pastebin.com/pEw5Sb0k

Once grepped, the logs only have four possible line types: a process start, a process end, a process elapsed timer, and the entry I am particularly interested in, the read timeout.

What I would like to produce, if possible, is a list of start lines that match the read timeout messages (eventually filtered by store ID). For every start msg there's either an end msg and an elapsed msg, or a read timeout.

The only way to match these is either to eliminate every other start/end/elapsed line along the way or, as I would prefer to do it, to match the start msg and read timeout msg with the same process ID (00000hhh) within a specified threshold, i.e. within 1 minute of each other.

It's been a long day; I hope I've explained it in an understandable manner.

Any help would be appreciated.

Thanks for the help and advice given thus far.

Kith.

grail 08-09-2011 06:58 PM

Based on the file you have attached, would you please provide an example of what the output would look like?

Kitherel 08-10-2011 03:27 PM

Grail,

Thanks for bearing with me and my slow responses,

For the most part, I just need one output line per start-time line, showing either an indication that it timed out or, if it completed, what the end time was (and, if possible, the duration of the process, but that would be a luxury). For example:

[08/08/11 16:11:23:245 GMT+01:00] 00000184 The start time of Galaxy invocation of ORDERS on Store 10693 Attempt Timed Out at 16:11:43:650 taking 00:00:20:405

[08/08/11 16:11:42:181 GMT+01:00] 0000016c The start time of Galaxy invocation of ORDERS on Store 10651 Ended at 16:11:43:250 taking 00:00:01:69

Any information / examples that help me understand how this might be done in awk would be appreciated.

Thank you,

Kith

Kitherel 08-22-2011 03:23 PM

Well, after the suggestion by chrism01 that Perl was the way to go,

I finished a rather handy awk script I had started that parsed 3 different log files and presented them nicely with colour coding depending on information severity. Awk was pretty easy to get the hang of.

I started with the Perl language and struggled a bit, and I'm still getting to grips with it, but using the File::Tail module I have managed to get a rather handy script going that does what I wanted to do with matching up the relevant lines etc.

I have expanded upon my original requirement, as I am reading through the entire log anyway; I am now gathering all kinds of useful bits of information and providing summaries every 25,000 lines, which scarily is the volume of log created every 30-60 seconds.

Thanks for the recommendations.

Kith

chrism01 08-22-2011 06:22 PM

Yeah, this sort of thing is one of Perl's strong areas, and the File::Tail module is very handy for real-time tracking of open log files.
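The basic pattern from the module's docs is only a few lines; something like this (the log path is just a placeholder):
Code:

use strict;
use warnings;
use File::Tail;

# Follow the log as it grows; maxinterval is the longest we'll sleep
# between checks for new lines.
my $log = File::Tail->new(name => "/path/to/your.log", maxinterval => 10);

while (defined(my $line = $log->read)) {
    # $line arrives as each new line is written to the file; parse it
    # just as you would in a normal while(<>) loop.
    print $line;
}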

There's loads of links off the pages I linked to that are very good (imho) compared to some stuff I've seen.
I really like that the official Perl docs contain lots of examples.
If you are going to get into Perl I cannot recommend the Perl Cookbook highly enough; worth its weight in gold :)

