11-24-2006, 04:42 AM
#1
LQ Newbie
Registered: Nov 2006
Posts: 5
AWK/SED Multiple pattern matching over multiple lines issue
I have to build a maintenance program, and part of it interrogates log files.
Ordinarily grep or sed would sort me right out; however, this problem has a few other restrictions.
I first have to get the current date from the system and then match it against entries in a log file. Not a problem, already done. However, once I have located a matching line, I then have to step over the following lines looking for another pattern and, if it is found, write these entries to a file. I can ONLY use grep, sed or awk to do this. I believe awk will do it with no problem; however, I am not familiar with all its aspects. An example of the data may help:
test.log:
2006 Nov 06 18:01:25:538 GMT +1 userQueue - Job-18494
s/QueryLog]: located user queue on line 654 of system 5432
2006 Nov 06 18:03:25:538 GMT +1 userQueue - Job-18494
s/QueryLog]: located user queue on line 654 of system 5432
2006 Nov 06 18:04:25:538 GMT +1 userQueue - Job-18494
s/QueryLog]: located user queue on line 654 of system 5432
2006 Nov 06 18:06:25:538 GMT +1 userQueue - Job-18494
s/QueryLog]: located user queue on line 654 of system 5432
2006 Nov 06 18:07:25:538 GMT +1 userQueue - Job-18494
s/QueryLog]: located user queue on line 654 of system 5432
2006 Nov 06 18:08:26:179 GMT +1 userQueue - [Unknown
] - Severity: 2; Category: ; ExceptionCode: ; Message: unable to create new nati
ve thread; Parameters: <n/a>; Stack Trace: Job-18507 Error in userQueue
java.lang.OutOfMemoryError: unable to create new native thread
I need to extract the line(s) corresponding to the OutOfMemoryError and its date. For example, the output should look like:
(date) (filename) (error)
2006 Nov 06 userQueue java.lang.OutOfMemoryError: unable to create new native thread
Currently I'm using something like this:
#!/bin/bash
date=`date | awk '{print $6 " " $2 " " $3}'`
filename=`sed -n "/$date/p" *.log* | awk '{print $7}'`
echo "Date is: " $date
echo "Filename is: " $filename
search=`sed "/$date/p" *.log* | grep OutOfMemory`
echo "Search Results: " $search
totalString=$date" "$filename" "$search
echo "Final Result: "$totalString > errorFiles
This, of course, doesn't work: it gets every instance of either 2006 Nov 06 OR OutOfMemory.
I have also played around with simple one-liners like:
sed -e '/2006 Nov 06/b' -e '/OutOfMemoryError/b' -e d test.log > output
awk '{ if($1 == "2006" && $2 == "Nov" && $3 == "21") print}' test.log
I believe awk is the way to go. Given the example above, once I have found the date line I should only have to search for the next pattern and output it. But I'm unsure.
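A minimal awk sketch of that "remember the date line, then look for the next pattern" idea, assuming each entry starts with a header line beginning with the date and that the name to report is the seventh whitespace-separated field (userQueue in the sample). The function name find_errors is hypothetical, and -v needs a POSIX awk (on Solaris that probably means nawk or /usr/xpg4/bin/awk rather than the old /usr/bin/awk).

```shell
#!/bin/sh
# find_errors LOGFILE [DATE]   (hypothetical helper name)
# Remembers the latest header that starts with DATE (today by default) and,
# when a later line of the same entry contains the error string, prints
# "date name error-line". Assumes the name is field 7 of the header.
find_errors() {
    day=$2
    if [ -z "$day" ]; then
        day=`date +"%Y %b %d"`
    fi
    awk -v day="$day" -v err="OutOfMemoryError" '
        # A header with the wanted date opens an entry: remember its name
        index($0, day) == 1 { inside = 1; name = $7; next }
        # A header with any other date closes the current entry
        /^[0-9][0-9][0-9][0-9] / { inside = 0; next }
        # The error line inside a wanted entry: report it once
        inside && index($0, err) { print day, name, $0; inside = 0 }
    ' "$1"
}
```

For the sample, find_errors test.log "2006 Nov 06" >> errorFiles should append the single line in the (date) (filename) (error) format shown above; without a second argument it uses today's date.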
I hope some Linux expert can help with this. I'm sure someone with more in-depth knowledge of awk or sed could solve this very simply.
Any help would be great. Thanks.
Last edited by GigerMalmensteen; 11-24-2006 at 07:10 AM.
11-24-2006, 06:33 AM
#2
Senior Member
Registered: Aug 2002
Location: Groningen, The Netherlands
Distribution: Debian
Posts: 2,536
It's not entirely clear to me exactly which lines you're trying to filter out of the file, so here are a few commands that I think/hope may help...
Code:
# Get all lines that start with $date and also
# contain "OutOfMemoryError". Just grep will do
# in that case.
#
grep "^$date.*OutOfMemory" log.txt
# Get all lines between (and including) the
# first that starts with $date until the first line
# after that which contains "OutOfMemoryError"
# (possibly starts with a different date)
#
sed -n "/^$date/,/OutOfMemoryError/p" log.txt
# Get all lines between (and including) the
# first that starts with $date until the first line
# after that which starts with $date and
# also contains "OutOfMemoryError".
#
sed -n "/^$date/,/^$date.*OutOfMemoryError/p" log.txt
Just a minor tip: getting today's date in that format is easier, and faster to execute, this way:
Code:
date=`date +"%Y %b %d"`
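One caveat with that format string: %b prints the month abbreviation of the current locale, so under a non-English locale it may not match "Nov" in the log. Forcing the C locale for that one command is a cheap safeguard; a sketch:

```shell
#!/bin/sh
# %b follows the locale (LC_TIME); LC_ALL=C pins it to Jan..Dec.
# %d is the zero-padded day of the month, matching "06" in the log.
date=`LC_ALL=C date +"%Y %b %d"`
echo "Searching for entries dated: $date"
```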
Hope this helps.
11-24-2006, 07:05 AM
#3
LQ Newbie
Registered: Nov 2006
Posts: 5
Original Poster
Hko,
Thanks for your time, response and advice regarding getting the date.
I am trying to filter the entire file. All I need is the line that has a date matching the current date and that is followed by the 'OutOfMemory' error string, which in most cases will be 4 lines below the matched date line.
My primary problem is that when I make a search I get a list of all instances that have the date value.
The date field is not a unique identifier. The relationship between the date and error is the unique part!
Thanks
11-24-2006, 10:49 AM
#4
Senior Member
Registered: Aug 2002
Location: Groningen, The Netherlands
Distribution: Debian
Posts: 2,536
OK. If I understand correctly what you're trying to do, this would do the trick:
Code:
#!/bin/bash
date=`date +"%Y %b %d"`
string="OutOfMemoryError"
file="log.txt"
sed -n -e'/^'"$date"'/{' -eh -en -e\} \
-e'/'"$string"'/{' -eH -eg -e\} \
-e'/^'"$date"'.*\n.*'"$string"'/p' -eH \
"$file"
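The same sed program, laid out one command per line with comments, may be easier to follow. This is a sketch assuming GNU sed, where . also matches the newlines that h/H/g put into the pattern space (the print address relies on that); hko_scan is a hypothetical wrapper name.

```shell
#!/bin/bash
# hko_scan DATE FILE   (hypothetical wrapper name; assumes GNU sed)
hko_scan() {
    date=$1
    string="OutOfMemoryError"
    sed -n "
# A line starting with the date: copy it into the hold space (h),
# then replace the pattern space with the next input line (n)
/^$date/{
h
n
}
# The error line: append it to the hold space (H), then copy the
# whole hold space back into the pattern space (g)
/$string/{
H
g
}
# Print when the pattern space is a date line followed, on a later
# line, by the error string
/^$date.*\n.*$string/p
# Append the current pattern space to the hold space
H
" "$2"
}
```

Note that the final H appends every line to the hold space, and only a line starting with the wanted date resets it (via h), so a long run of older entries keeps the hold space growing on big files.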
Last edited by Hko; 11-24-2006 at 10:52 AM.
11-25-2006, 01:14 PM
#5
Moderator
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Quote:
Originally Posted by GigerMalmensteen
I have to build a maintenance program, and part of it interrogates log files.
Ordinarily grep or sed would sort me right out; however, this problem has a few other restrictions.
I first have to get the current date from the system and then match it against entries in a log file. Not a problem, already done. However, once I have located a matching line, I then have to step over the following lines looking for another pattern and, if it is found, write these entries to a file. I can ONLY use grep, sed or awk to do this. I believe awk will do it with no problem; however, I am not familiar with all its aspects. An example of the data may help:
|
Have you mangled the log file's lines like that on purpose? Could you
make everything that belongs to one log entry reside on one line?
Cheers,
Tink
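If the log cannot be changed at the source, the entries can be glued back onto one line after the fact, which lets the single-line grep suggested earlier in the thread work. A sketch, assuming every entry starts with a four-digit year followed by a space; join_entries is a hypothetical name, and continuation lines are appended with a single space, which keeps words apart but splits words the log wrapped mid-word (nati/ve in the sample).

```shell
#!/bin/sh
# join_entries FILE   (hypothetical helper name)
# Prints each log entry on a single line.
join_entries() {
    awk '
        /^[0-9][0-9][0-9][0-9] / {
            # a new entry starts: flush the previous one
            if (entry != "") print entry
            entry = $0
            next
        }
        # a continuation line: append it to the current entry
        { entry = entry " " $0 }
        END { if (entry != "") print entry }
    ' "$1"
}
```

Usage: join_entries test.log | grep "^2006 Nov 06.*OutOfMemoryError"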
11-26-2006, 11:56 PM
#6
Member
Registered: Mar 2006
Location: Ekaterinburg, Russia
Distribution: Debian, Ubuntu
Posts: 709
Hi!
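One more tip: all your entries begin with 2006..., so you can use that as a record delimiter and delete the newlines altogether.
Code:
...|tr -d '\n'|awk 'BEGIN { RS = "200[0-9]" } /OutOfMemoryError/ {print}'|...
You will have to test this code, because I cannot do so at the moment. Each entry becomes one record because RS (the record separator, not the field separator set by -F) is the year pattern; a regular-expression RS works in gawk and mawk, but POSIX leaves a multi-character RS unspecified.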
11-27-2006, 12:43 AM
#7
Member
Registered: May 2005
Location: Sydney, Australia
Distribution: Ubuntu 5.04, Debian 3.1
Posts: 74
Take a look at the getline command, which is part of awk/gawk.
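A sketch of what that could look like, assuming the same entry layout as the sample (each entry starts with a line beginning "YYYY "). The whole scan is driven by getline from a BEGIN block, so the lines after a date header are stepped over by hand, and a line that turns out to start the next entry is examined by the outer loop instead of being lost. getline_scan is a hypothetical name.

```shell
#!/bin/sh
# getline_scan DATE FILE   (hypothetical helper name)
getline_scan() {
    awk -v day="$1" -v err="OutOfMemoryError" '
    BEGIN {
        # getline in BEGIN reads from the files named on the command line
        have = ((getline line) > 0)
        while (have) {
            if (index(line, day) != 1) {
                # not a header with the wanted date: just read on
                have = ((getline line) > 0)
                continue
            }
            hdr = line
            # Step over the rest of this entry by hand, stopping at the
            # next line that starts with a year (the next entry header)
            while ((have = ((getline line) > 0)) && line !~ /^[0-9][0-9][0-9][0-9] /)
                if (index(line, err))
                    print hdr "\n" line
            # line now holds the next header (or input is exhausted), so
            # the outer loop looks at it without reading another line
        }
    }' "$2"
}
```

For the sample, getline_scan "2006 Nov 06" test.log should print the 18:08 header line followed by the java.lang.OutOfMemoryError line.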
11-27-2006, 12:57 AM
#8
LQ Guru
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733
Since you are looking for a pattern on a single line containing both the date and "Out of Memory", these could both be contained in one regular expression pattern. Just put a ".*" pattern in between the two patterns.
Or you could use grep twice: "grep 'pattern1' logfile | grep 'pattern2'" to produce an intersection of the two patterns.
There are three other things you can use with sed. The -n option suppresses output unless you use the print command. The -e option allows you to enter more than a single command (as demonstrated by poster Hko above). And you can use braces to nest further /pattern/ addresses inside a // match to fine-tune the search. This may allow you to first select lines with the current date, and then create different files which filter different patterns.
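A small sketch of the nested-address idea: the outer /^DATE/ address picks today's lines, and the inner addresses inside the braces write different matches to different files. split_today and the output file names jobs.txt and unknown.txt are made up for the illustration; each w file is created (or truncated) before sed starts processing.

```shell
#!/bin/sh
# split_today DATE FILE   (hypothetical example)
split_today() {
    # -n: print nothing by default; each -e adds one line of sed script
    sed -n -e "/^$1/{" \
        -e '/Job-/w jobs.txt' \
        -e '/\[Unknown/w unknown.txt' \
        -e '}' "$2"
}
```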
If you have a gawk-doc package, you might want to install it. It includes the book "Gawk: Effective AWK Programming."
Last edited by jschiwal; 11-27-2006 at 01:08 AM.
11-27-2006, 02:56 AM
#9
LQ Newbie
Registered: Nov 2006
Posts: 5
Original Poster
Thanks for all the feedback, guys.
Hko, your solution worked great on a single-entry log file I tested; however, sed died with a "sed: Memory allocation failed." error when tested on a real 8MB file. Any suggestions?
Last edited by GigerMalmensteen; 11-27-2006 at 04:20 AM.
11-28-2006, 08:54 AM
#10
LQ Newbie
Registered: Nov 2006
Posts: 5
Original Poster
Just in case anyone was interested, an ugly solution I came up with is this:
#!/bin/bash
date=`date +"%Y %b %d"`
errorCode=$1
# Keep everything from the first line with today's date to the end of the log
sed -n '/'"$date"'/,$p' ./data/5.log > tempfile
# Line numbers of the error lines and of today's date lines
grep -n "$errorCode" tempfile | cut -d: -f 1 > lineValues
grep -n "$date" tempfile | cut -d: -f 1 > dateValues
count=`wc -l < lineValues`
for ((j=1; j<=count; j++)); do
    # Line number of the j-th error line
    nOe=`sed "$j"'q;d' lineValues`
    # Line number of the last date line at or before it
    nOd=`awk '$1 <= e { d = $1 } END { print d }' e="$nOe" dateValues`
    info=`sed "$nOd"'q;d' tempfile`
    error=`sed "$nOe"'q;d' tempfile`
    output="$info $error"
    echo "$output"
done
Thanks for the help guys.
11-28-2006, 03:23 PM
#11
Senior Member
Registered: Aug 2002
Location: Groningen, The Netherlands
Distribution: Debian
Posts: 2,536
Quote:
Originally Posted by GigerMalmensteen
Hko, your solution worked great on a single-entry log file I tested; however, sed died with a "sed: Memory allocation failed." error when tested on a real 8MB file. Any suggestions?
|
Here's a different, simpler approach. It still reads entire files into memory, but it's not sed that has to do that.
Code:
#!/bin/bash
date=`date +"%Y %b %d"`
string="OutOfMemoryError"
file="log.txt"
tac "$file" | sed -n '/'"$string"'/,/^'"$date"'/p' | tac
If the script above doesn't have the memory problem (I expect it doesn't, but I haven't tried it on large files), it's a much cleaner solution than your "ugly" one IMHO.
11-29-2006, 03:57 AM
#12
LQ Newbie
Registered: Nov 2006
Posts: 5
Original Poster
Hko,
Once again, thanks for your response. Just to let you know, 'tac' doesn't come as standard with the SunOS version I am using, so the elegant solution you proposed can't be used :?
I am working with limited resources.
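Since only grep, sed and awk are allowed, one possible stand-in for tac is a short awk program that stores the lines in an array and prints them in reverse. A sketch with hypothetical function names; it still holds the whole input in memory (in an awk array rather than in sed's hold space), so it may be heavy on very large logs.

```shell
#!/bin/sh
# rev_lines [FILE...]   (hypothetical tac replacement using only awk)
# With no file arguments it reads standard input.
rev_lines() {
    awk '{ line[NR] = $0 } END { for (i = NR; i >= 1; i--) print line[i] }' "$@"
}

# scan_backwards DATE FILE: Hko's tac pipeline with rev_lines instead of tac
scan_backwards() {
    rev_lines "$2" | sed -n '/OutOfMemoryError/,/^'"$1"'/p' | rev_lines
}
```

Usage: scan_backwards "`date +"%Y %b %d"`" log.txt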
12-01-2006, 04:32 PM
#13
Member
Registered: Jul 2004
Location: Rio de Janeiro - Brazil
Distribution: Conectiva 10 - Conectiva 8 - Slackware 9 - starting with LFS
Posts: 519
Hi GigerMalmensteen,
As you have several steps to accomplish your task, I guess the best tool for your needs is awk: first, identify the messages of the day; second, join all the physical lines that make up the logical one; then decide whether it is to be reported; and finally, cut out the slices you want to display.
Below is a script which does the above steps:
Code:
#!/bin/sh
DATE=`date +"%Y %b %d"`
DATE="2006 Nov 06" # to test your test.log
cat *.log | \
awk 'BEGIN { date = "'"$DATE"'" }
function check_output()
{
#
# check for error report on the assembled line
#
if ((ind = match(line, /Error in /)) != 0)
{
# ind points to the string "Error in "
ind += 9 # go to post string
# get the portion of the line which
# contains the file and error message
tmp = substr(line, ind)
# get the separator between file and error
ind = index(tmp, " ")
file = substr(tmp, 1, ind - 1)
error = substr(tmp, ind + 1)
# printing the 3 fields separated by [TAB]
printf("%s\t%s\t%s\n", date, file, error)
}
line = ""
}
{ # main loop
if (index($0, date) == 1)
{
# if the line starts with the date
# check to see if there is one
# already assembled
if (length(line) != 0)
check_output()
# Initialize a new line
line = $0
}
else
{
# if the line does not start with
# the date, check to see if there
# is already a line in process. If
# positive, cat the input to the
# line. Otherwise, discard it.
if (length(line) != 0)
line = line " " $0
}
}
END {
# End of file, we could have an
# assembled line; go and check it
if (length(line) != 0)
check_output()
}'
12-01-2006, 06:16 PM
#14
Senior Member
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530
It would all be very easy and elegant (not to mention pretty fast) in Perl:
Code:
#!/usr/bin/perl -w
use strict;
my $last_date = "unknown";
while(<>) {
if ( /^(\d\d\d\d \w\w\w \d\d \d\d:\d\d:\d\d:\d\d\d \w\w\w ([+\-]\d)?)/ ) {
$last_date = $1;
}
if ( /OutOfMemoryError/ ) {
print "Out of memory detected at line $. - date = $last_date\n";
next;
}
}
You would run this on the log files by saving it to a file, e.g. "mylogscan", and making it executable:
Code:
chmod 755 mylogscan
And then executing with the filename of the log (or multiple logfiles if you like) as arguments to the program:
Code:
./mylogscan logfile1 logfile2 logfile3
A little Perl de-mystification might help to know how it's working:
use strict; just means complain a lot about potentially risky code. It's generally a good idea to use this.
The mysterious object here for Perl virgins is the <>. <SOMETHING> is Perl's way to read one line from the file handle SOMETHING. If you don't specify a SOMETHING, Perl opens the files named as arguments to the script in turn (the names in the array @ARGV), reads lines from them, closes them, opens the next file, etc. If you don't specify any files as arguments to the script, Perl reads from standard input. Lines read in this manner are put in the variable $_. <> returns true until the end of input, at which point your while loop terminates.
Code:
/^(\d\d\d\d \w\w\w \d\d \d\d:\d\d:\d\d:\d\d\d \w\w\w ([+\-]\d)?)/
This line is the most likely, in my opinion, to have Perl virgins running for the hills screaming. The bit between the slashes is a Perl-style regular expression. \d means "a digit", \w means a "word" character (letters, digits and _). So the stuff between the slashes means "four digits, a space, three word characters, a space, two digits, a space, two digits, a colon" etc. The [+\-] is a way of saying a + or a - character, and the ? means "the previous bit is optional". Parentheses group expressions together, and if there is a match, the matched values are assigned to $1 for the first set of parentheses, $2 for the second set, etc. By default, regular expressions are matched against the $_ variable, which is set to the line read from <> as described above. The /expression/ returns true if a match is found. Phew! In short, all this means "look for something which looks like a date, and if you find it, put the matched value in $1, which we then save in the variable $last_date."
The rest is pretty self explanatory I think.
Perl's syntax is highly abbreviated for this sort of task because it's exactly the sort of thing that needs to be done a lot. It saves a lot of typing at the expense of scaring off newbies.
Perl eats gigabytes of log files for breakfast, and still has room left for more! Long live Perl!
12-03-2006, 04:59 PM
#15
LQ Guru
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.x
Posts: 18,442
Matthew42g, I believe you ought to be able to shorten the regex with these operators:
{n} Match exactly n times
{n,} Match at least n times
{n,m} Match at least n but not more than m times
see http://perldoc.perl.org/perlre.html
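With those operators the Perl pattern could become /^(\d{4} \w{3} \d{2} \d{2}:\d{2}:\d{2}:\d{3} \w{3} ([+\-]\d)?)/. The same interval operators exist in POSIX extended regular expressions, so a rough grep -E equivalent (using [A-Za-z] where Perl's \w would also accept digits and _) can pick out the entry headers; count_headers is a hypothetical name used for the sketch.

```shell
#!/bin/sh
# count_headers FILE   (hypothetical example)
# Counts lines that look like an entry header, e.g.
# "2006 Nov 06 18:01:25:538 GMT +1 ...", using ERE interval operators.
count_headers() {
    grep -cE '^[0-9]{4} [A-Za-z]{3} [0-9]{2} ([0-9]{2}:){3}[0-9]{3} [A-Za-z]{3} ([+-][0-9])?' "$1"
}
```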
All times are GMT -5.