LinuxQuestions.org
11-24-2006, 04:42 AM   #1
GigerMalmensteen
LQ Newbie
Registered: Nov 2006
Posts: 5
Reputation: 0
AWK/SED Multiple pattern matching over multiple lines issue


I have to construct a maintenance program, and part of this program is the interrogation of log files.

Ordinarily a grep or sed would sort me right out; however, this problem has a few extra restrictions.

I have to get the current date from the system and then match it against entries in a log file. Not a problem, that's already done. However, once I have located a matching line, I then have to step over the following lines looking for another pattern and, if it is found, write these entries to a file. I can ONLY use grep, sed or awk to do this. I believe awk will do it with no problem; however, I am not familiar with all its aspects. An example of the data may help:

test.log:

2006 Nov 06 18:01:25:538 GMT +1 userQueue - Job-18494
s/QueryLog]: located user queue on line 654 of system 5432
2006 Nov 06 18:03:25:538 GMT +1 userQueue - Job-18494
s/QueryLog]: located user queue on line 654 of system 5432
2006 Nov 06 18:04:25:538 GMT +1 userQueue - Job-18494
s/QueryLog]: located user queue on line 654 of system 5432
2006 Nov 06 18:06:25:538 GMT +1 userQueue - Job-18494
s/QueryLog]: located user queue on line 654 of system 5432
2006 Nov 06 18:07:25:538 GMT +1 userQueue - Job-18494
s/QueryLog]: located user queue on line 654 of system 5432
2006 Nov 06 18:08:26:179 GMT +1 userQueue - [Unknown
] - Severity: 2; Category: ; ExceptionCode: ; Message: unable to create new nati
ve thread; Parameters: <n/a>; Stack Trace: Job-18507 Error in userQueue
java.lang.OutOfMemoryError: unable to create new native thread

I need to extract the line(s) relating to the OutOfMemoryError together with their date. For example, the output should look like:

(date) (filename) (error)

2006 Nov 06 userQueue java.lang.OutOfMemoryError: unable to create new native thread


Currently I'm using something like this:

#!/bin/bash

date=`date | awk '{print $6 " " $2 " " $3}'`

filename=`sed -n "/$date/p" *.log* | awk '{print $7}'`

echo "Date is: " $date
echo "Filename is: " $filename

search=`sed "/$date/p" *.log* | grep OutOfMemory`

echo "Search Results: " $search

totalString=$date" "$filename" "$search
echo "Final Result: "$totalString > errorFiles

This of course doesn't work: it returns every instance of either 2006 Nov 06 or OutOfMemory.

I have also played around with simple one-liners like:

sed -e '/2006 Nov 06/b' -e '/OutOfMemoryError/b' -e d test.log > output
awk '{ if($1 == "2006" && $2 == "Nov" && $3 == "21") print}' test.log

I believe awk is the way to go: once the dated line is found, I should only have to search forward for the next pattern and output it. But I'm unsure.

I hope some Linux crack can help with this. I'm sure someone with a more in-depth knowledge of awk or sed could solve this very simply.

Any help would be great. Thanks.

Last edited by GigerMalmensteen; 11-24-2006 at 07:10 AM.
 
11-24-2006, 06:33 AM   #2
Hko
Senior Member
Registered: Aug 2002
Location: Groningen, The Netherlands
Distribution: ubuntu
Posts: 2,530
Reputation: 108
It's not entirely clear to me exactly which lines you're trying to filter out of the file, so here are a few commands that I think/hope may help...
Code:
# Get all lines that start with $date and also
# contain "OutOfMemoryError". Just grep will do
# in that case.
#
grep "^$date.*OutOfMemory" log.txt

# Get all lines between (and including) the
# first that starts with $date until the first line
# after that which contains "OutOfMemoryError" 
# (possibly starts with a different date)
#
sed -n "/^$date/,/OutOfMemoryError/p" log.txt

# Get all lines between (and including) the
# first that starts with $date until the first line
# after that which starts with $date and
# also contains "OutOfMemoryError".
#
sed -n "/^$date/,/^$date.*OutOfMemoryError/p" log.txt
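Applied to test.log, the first two commands behave quite differently. Here is a small sketch that wraps them as shell functions with hypothetical names, with the date and file passed as arguments. The grep finds nothing, because the error text sits on a continuation line that does not start with the date. The sed range starts at the very first dated line and runs through the error line, so it also prints the entries that had no error.

```bash
# dated_errors_same_line DATE FILE (hypothetical name): Hko's grep.
# Only matches when the date and the error are on the same line.
dated_errors_same_line() {
    grep "^$1.*OutOfMemoryError" "$2"
}

# dated_range_to_error DATE FILE (hypothetical name): Hko's first sed range.
# Prints from the first line starting with DATE through the next error line.
dated_range_to_error() {
    sed -n "/^$1/,/OutOfMemoryError/p" "$2"
}
```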
Just a minor tip: getting today's date in that format is easier and faster this way:
Code:
date=`date +"%Y %b %d"`
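Here is a sketch of why the format matters. %d gives a zero-padded day ("06"), which is what the log uses. The original `date | awk` approach takes the day from the default date output, which is typically space-padded (e.g. "Nov  6"). awk drops that padding, so on single-digit days it would search for "2006 Nov 6" and miss. Forcing LC_ALL=C is an extra assumption here, so that %b stays an English month abbreviation. log_date is a hypothetical name.

```bash
# log_date: hypothetical helper. LC_ALL=C is an assumption added here so
# that %b always gives an English month abbreviation (e.g. Nov).
log_date() {
    LC_ALL=C date +"%Y %b %d"    # %d is zero-padded: "06", as in the log
}
date=`log_date`
echo "Date is: $date"

# For comparison: awk splits on runs of blanks, so a space-padded day
# taken from the default date output loses its padding.
echo "Mon Nov  6 18:01:25 GMT 2006" | awk '{ print $6 " " $2 " " $3 }'   # prints "2006 Nov 6"
```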
Hope this helps.
 
11-24-2006, 07:05 AM   #3
GigerMalmensteen
LQ Newbie
Registered: Nov 2006
Posts: 5
Original Poster
Reputation: 0
Hko,

Thanks for your time, response and advice regarding getting the date.

I am trying to filter the entire file down to a few lines. All I need is the line that has a date matching the current date and is followed by the 'OutOfMemory' error string, which in most cases will be 4 lines below the matched date line.

My primary problem is that when I make a search I get a list of all instances that have the date value.

The date field is not a unique identifier. The relationship between the date and error is the unique part!
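That relationship can be tracked with a few awk variables. This is a minimal sketch, assuming (as in test.log) that every entry starts with a "YYYY Mon DD" header whose 7th field is the "filename", and that the error string appears on a later line of the same entry. report_oom is a hypothetical name. It only remembers the current header's name and whether that header has the wanted date, so memory use should not grow with the size of the log.

```bash
# report_oom DATE FILE: hypothetical helper that prints
# "date filename error" for each error line belonging to a DATE header.
report_oom() {
    awk -v date="$1" -v err="OutOfMemoryError" '
        # Header for the wanted date: remember its 7th field (userQueue).
        index($0, date) == 1 { name = $7; inside = 1; next }
        # Header for any other date: we are no longer inside a wanted entry.
        /^[0-9][0-9][0-9][0-9] [A-Z][a-z][a-z] [0-9][0-9] / { inside = 0; next }
        # Continuation line holding the error, inside a wanted entry.
        inside && index($0, err) { print date, name, $0 }
    ' "$2"
}
```

Usage, with the date from `date +"%Y %b %d"`: `report_oom "$date" test.log > errorFiles`.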

Thanks
 
11-24-2006, 10:49 AM   #4
Hko
Senior Member
Registered: Aug 2002
Location: Groningen, The Netherlands
Distribution: ubuntu
Posts: 2,530
Reputation: 108
OK. If I understand correctly what you're trying to do, this would do the trick:
Code:
#!/bin/bash

date=`date +"%Y %b %d"`
string="OutOfMemoryError"
file="log.txt"

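# On a line starting with $date: copy it to the hold space (h)
# and read the next line (n).
# On a line containing $string: append it to the hold space (H) and
# copy the hold space back (g), so the pattern space now holds the
# dated line plus the lines that followed it.
# Print if that is a dated line followed by the error string, then
# append the pattern space to the hold space (H).
sed -n -e'/^'"$date"'/{' -eh -en -e\} \
    -e'/'"$string"'/{' -eH -eg -e\} \
    -e'/^'"$date"'.*\n.*'"$string"'/p' -eH \
    "$file"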

Last edited by Hko; 11-24-2006 at 10:52 AM.
 
11-25-2006, 01:14 PM   #5
Tinkster
Moderator
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,978
Blog Entries: 11
Reputation: 879
Have you mangled the log file's lines like that on purpose? Could you
make everything that belongs to one log entry reside on one line?
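If the log itself can't be changed, the entries can be put back on one line before searching. Here is a sketch, assuming every entry starts with a "YYYY Mon DD " header and every other line continues the entry above it. join_entries is a hypothetical name. Continuation lines are joined with a single space, so a word wrapped mid-word (like "nati"/"ve") keeps a stray space. After joining, a single-line grep such as `grep "^$date.*OutOfMemory"` applies.

```bash
# join_entries [FILE...]: hypothetical helper printing one log entry per line.
# With no arguments it reads standard input.
join_entries() {
    awk '
        # A header line starts a new entry: flush the previous one.
        /^[0-9][0-9][0-9][0-9] [A-Z][a-z][a-z] [0-9][0-9] / {
            if (entry != "") print entry
            entry = $0
            next
        }
        # Any other line is appended to the current entry.
        { entry = (entry == "") ? $0 : entry " " $0 }
        # Flush the last entry at end of input.
        END { if (entry != "") print entry }
    ' "$@"
}
```

Usage: `join_entries test.log | grep "^$date.*OutOfMemoryError"`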


Cheers,
Tink
 
11-26-2006, 11:56 PM   #6
firstfire
Member
Registered: Mar 2006
Location: Ekaterinburg, Russia
Distribution: Debian, Ubuntu
Posts: 628
Reputation: 367
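Hi!
One more tip: all your entries begin with 2006..., so you can use that as the record delimiter and get rid of the newlines altogether.
Code:
...|tr '\n' ' '|gawk 'BEGIN { RS = "200[0-9] " } /OutOfMemoryError/ { print }'|...
Note that -F only sets the field separator; splitting records on a regular expression like this needs gawk. You have to test this code, because I cannot do so at the moment.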
 
11-27-2006, 12:43 AM   #7
igorc
Member
Registered: May 2005
Location: Sydney, Australia
Distribution: Ubuntu 5.04, Debian 3.1
Posts: 74
Reputation: 15
Take a look at the getline command, which is part of awk/gawk.
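Here is a sketch of the getline idea, assuming the log layout shown in the first post (header lines "YYYY Mon DD ...", with the "filename" in field 7). scan_with_getline is a hypothetical name. Once a header for the wanted date is seen, the rule pulls in the following lines itself with getline until the next header appears.

```bash
# scan_with_getline DATE FILE: hypothetical helper, prints
# "date filename error" for error lines inside DATE entries.
scan_with_getline() {
    awk -v date="$1" -v err="OutOfMemoryError" '
        {
            # Handle one entry per pass, while the current line is a
            # header for the wanted date.
            while (index($0, date) == 1) {
                name = $7      # the "filename" field of the header
                # Plain getline reads the next line into $0 (updating NR) and
                # returns 1 on success, 0 at end of file and -1 on error.
                while ((more = getline) > 0 &&
                       $0 !~ /^[0-9][0-9][0-9][0-9] [A-Z][a-z][a-z] [0-9][0-9] /) {
                    if (index($0, err))
                        print date, name, $0
                }
                if (more <= 0)
                    exit       # no more input
                # $0 now holds the next header; the outer while decides
                # whether it also belongs to the wanted date.
            }
        }
    ' "$2"
}
```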
 
11-27-2006, 12:57 AM   #8
jschiwal
Guru
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733
Reputation: 654
Since you are looking for a pattern on a single line containing both the date and "Out of Memory", both could be contained in one regular expression pattern. Just put a ".*" pattern in between the two patterns.

Or you could use grep twice: "grep 'pattern1' logfile | grep 'pattern2'" to produce an intersection of the two patterns.

There are three other things you can use with sed. The -n option suppresses output unless you use the print command. The -e option allows you to enter more than a single command (as demonstrated by Hko above). And you can use braces to nest further /pattern/ commands under an address, to fine-tune the search further. This may allow you to first select the lines with the current date, and then create different files which filter different patterns.
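Here is a sketch of the braces idea. dated_matches is a hypothetical name, and it assumes neither DATE nor PATTERN contains a "/", since both are pasted into sed addresses. On test.log it finds "18:08" on a dated line, but finds nothing for OutOfMemoryError, because that text is never on a dated line. Replacing p with `w somefile` would write each pattern's matches to a different file.

```bash
# dated_matches DATE PATTERN FILE: hypothetical helper printing lines that
# start with DATE and also contain PATTERN.
dated_matches() {
    # The outer address selects lines starting with DATE; the braces group
    # a nested address so only lines also matching PATTERN are printed.
    # Newlines around the braced command keep the script POSIX-friendly.
    sed -n "/^$1/{
/$2/p
}" "$3"
}
```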

If you have a gawk-doc package, you might want to install it. It includes the book "Gawk: Effective AWK Programming."

Last edited by jschiwal; 11-27-2006 at 01:08 AM.
 
11-27-2006, 02:56 AM   #9
GigerMalmensteen
LQ Newbie
Registered: Nov 2006
Posts: 5
Original Poster
Reputation: 0
Thanks for all the feedback guys.

Hko, your solution worked great on a single-entry log file I tested; however, sed died with a "sed: Memory allocation failed." error when tested on a real 8MB file. Any suggestions?

Last edited by GigerMalmensteen; 11-27-2006 at 04:20 AM.
 
11-28-2006, 08:54 AM   #10
GigerMalmensteen
LQ Newbie
Registered: Nov 2006
Posts: 5
Original Poster
Reputation: 0
Just in case anyone is interested, an ugly solution I came up with is this:

#!/bin/bash
date=`date +"%Y %b %d"`
errorCode=$1
# Keep only the part of the log from the first line with today's date on
sed -n '/'"$date"'/,$p' ./data/5.log > tempfile
# Line numbers of the error lines and of the dated (header) lines
grep -n "$errorCode" tempfile | cut -d: -f 1 > lineValues
grep -n "$date" tempfile | cut -d: -f 1 > dateValues

for nOe in `cat lineValues`; do
    # The header for this error is the last dated line at or before it
    nOd=`awk -v e="$nOe" '$1 <= e { d = $1 } END { print d }' dateValues`
    error=`sed "$nOe"'q;d' tempfile`
    info=`sed "$nOd"'q;d' tempfile`
    output="$info"" ""$error"
    echo "$output"
done


Thanks for the help guys.
 
11-28-2006, 03:23 PM   #11
Hko
Senior Member
Registered: Aug 2002
Location: Groningen, The Netherlands
Distribution: ubuntu
Posts: 2,530
Reputation: 108
Quote:
Originally Posted by GigerMalmensteen
HKO your solution worked great on a single entry log file I tested, however sed died with a "sed: Memory allocation failed." error when tested on a real 8MB file. Any suggestions?
Here's a different, simpler approach. It will still read entire files into memory, but it's not sed that has to do that.
Code:
#!/bin/bash

date=`date +"%Y %b %d"`
string="OutOfMemoryError"
file="log.txt"

tac "$file" | sed -n '/'"$string"'/,/^'"$date"'/p' | tac
If the script above doesn't have the memory problem (I expect it doesn't, but I haven't tried it on large files), it's a much cleaner solution than your "ugly" one IMHO.
 
11-29-2006, 03:57 AM   #12
GigerMalmensteen
LQ Newbie
Registered: Nov 2006
Posts: 5
Original Poster
Reputation: 0
Hko,

Once again, thanks for your response. Just to let you know, 'tac' doesn't come as standard with the SunOS version I am using, so the elegant solution you proposed can't be used :?

I am working with limited resources.
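Without tac, the line reversal can be done in awk, which the restrictions allow. Here is a minimal sketch. reverse_lines is a hypothetical name. Like tac, it keeps the whole file in memory, here in an awk array, so memory use still grows with the file size.

```bash
# reverse_lines [FILE...]: hypothetical tac substitute using only awk.
# With no arguments it reads standard input.
reverse_lines() {
    awk '{ line[NR] = $0 } END { for (i = NR; i > 0; i--) print line[i] }' "$@"
}

# Hko's pipeline with the substitute (date, string and file set as in his script):
# reverse_lines "$file" | sed -n '/'"$string"'/,/^'"$date"'/p' | reverse_lines
```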
 
12-01-2006, 04:32 PM   #13
osvaldomarques
Member
Registered: Jul 2004
Location: Rio de Janeiro - Brazil
Distribution: Conectiva 10 - Conectiva 8 - Slackware 9 - starting with LFS
Posts: 519
Reputation: 34
Hi GigerMalmensteen,

As you have several steps to accomplish your task, I guess the best tool for your needs is awk: first, identify the messages of the day; second, concatenate all the physical lines that make up the logical one; then decide whether it is to be reported; and finally cut out the slices you want to display.

Below is a script which does the above steps:
Code:
#!/bin/sh

DATE=`date +"%Y %b %d"`
DATE="2006 Nov 06" # to test your test.log

cat *.log | \
awk 'BEGIN { date = "'"$DATE"'" }

function check_output()
{
  #
  # check for error report on the assembled line
  #
  if ((ind = match(line, /Error in /)) != 0)
  {
    # ind points to the string "Error in "
    ind += 9 # go to post string
    # get the portion of the line which
    # contains the file and error message
    tmp = substr(line, ind)
    # get the separator (a space) between file and error
    ind = index(tmp, " ")
    file = substr(tmp, 1, ind - 1)
    error = substr(tmp, ind + 1)
    # printing the 3 fields separated by [TAB]
    printf("%s\t%s\t%s\n", date, file, error)
  }
  line = ""
}

{ # main loop
  if (index($0, date) == 1)
  {
    # if the line starts with the date
    # check to see if there is one
    # already assembled
    if (length(line) != 0)
      check_output()
    # Initialize a new line
    line = $0
  }
  else
  {
    # if the line does not start with
    # the date, check to see if there
    # is already a line in process. If
    # positive, cat the input to the
    # line. Otherwise, discard it.
    if (length(line) != 0)
      line = line " " $0
  }
}

END {
  # End of file, we could have an
  # assembled line; go and check it
  if (length(line) != 0)
    check_output()
}'
 
12-01-2006, 06:16 PM   #14
matthewg42
Senior Member
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530
Reputation: 63
It would be very easy and elegant (not to mention pretty fast) to use Perl:
Code:
#!/usr/bin/perl -w

use strict;

my $last_date = "unknown";
while(<>) {
    if ( /^(\d\d\d\d \w\w\w \d\d \d\d:\d\d:\d\d:\d\d\d \w\w\w ([+\-]\d)?)/ ) {
        $last_date = $1;
    }

    if ( /OutOfMemoryError/ ) {
        print "Out of memory detected at line $. - date = $last_date\n";
        next;
    }
}
You would run this on the log files by saving it to a file, e.g. "mylogscan", and making it executable:
Code:
chmod 755 mylogscan
And then executing with the filename of the log (or multiple logfiles if you like) as arguments to the program:
Code:
./mylogscan logfile1 logfile2 logfile3
A little Perl demystification might help you see how it works:

use strict; just means "complain a lot about potentially risky code". It's generally a good idea to use it.

Code:
while(<>) { ... }
The mysterious object here for Perl virgins is the <>. <SOMETHING> is Perl's way to read one line from the file handle SOMETHING. If you don't specify a SOMETHING, Perl opens the files named as arguments to the script in turn (the names in the array @ARGV), reads lines from them, closes them, opens the next file, etc. If you don't specify any files as arguments to the script, Perl reads from standard input. Lines read in this manner are put in the variable $_. <> returns true until the end of the input, at which point the while loop terminates.

Code:
/^(\d\d\d\d \w\w\w \d\d \d\d:\d\d:\d\d:\d\d\d \w\w\w ([+\-]\d)?)/
This line is the most likely, in my opinion, to have Perl virgins running for the hills screaming. The bit between the slashes is a Perl-style regular expression. \d means "a digit" and \w means a "word" character (letters, digits and _). So the stuff between the slashes means "four digits, a space, three word characters, a space, two digits, a space, two digits, a colon" etc. [+\-] is a way of saying a + or a - character, and ? means "the previous bit is optional". Brackets group expressions together, and if there is a match, the matched values are assigned to $1 for the first set of brackets, $2 for the second set, etc. By default, regular expressions are matched against the $_ variable, which is set to the line read from <> as described above. The /expression/ returns true if a match is found. Phew! In short, all this means "look for something which looks like a date, and if you find it, put the matched value in $1", which we then save in the variable $last_date.

The rest is pretty self explanatory I think.

Perl's syntax is highly abbreviated for this sort of task because it's exactly the sort of thing that needs to be done a lot. It saves a lot of typing at the expense of scaring off newbies.

Perl eats gigabytes of log files for breakfast, and still has room left for more! Long live Perl!
 
12-03-2006, 04:59 PM   #15
chrism01
Guru
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,269
Reputation: 2028
matthewg42, you ought to be able to shorten the regex with these quantifiers, I believe:

{n} Match exactly n times
{n,} Match at least n times
{n,m} Match at least n but not more than m times

see http://perldoc.perl.org/perlre.html
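In Perl, the header pattern could then be written as ^(\d{4} \w{3} \d{2} \d{2}:\d{2}:\d{2}:\d{3} \w{3} ([+\-]\d)?). Since the original question is limited to grep, sed and awk, here is a sketch of the same idea with grep -E, whose extended regular expressions also support {n} intervals. [0-9] and [A-Za-z] stand in for \d and \w, which is an approximation because \w also matches digits and _. header_lines is a hypothetical name.

```bash
# header_lines [FILE...]: hypothetical helper printing the dated header lines.
# (:[0-9]{2}){2} matches the ":MM:SS" part of the timestamp.
header_lines() {
    grep -E '^[0-9]{4} [A-Za-z]{3} [0-9]{2} [0-9]{2}(:[0-9]{2}){2}:[0-9]{3} [A-Za-z]{3} ([+-][0-9])?' "$@"
}
```

Some older grep and awk implementations do not support interval expressions, so it is worth testing this on the target system.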
 
  


All times are GMT -5.
