LinuxQuestions.org
Old 10-24-2011, 09:12 AM   #1
d1verjim
LQ Newbie
 
Registered: Oct 2011
Posts: 3

Rep: Reputation: Disabled
Search Backwards from line number


Hi all,

This is my first post on here, though I've been tinkering with linux/unix scripts for a while.

I have written a script that searches a file for a string, and then searches backwards for the first date it comes across. The reason is that I need to search across multiple log files on multiple boxes and find the most recent occurrence of a string. However, the date is not on the same line as the string I'm searching for, because I'm searching for something in a multi-line XML input.

The script I've written is this:

###########
# Usage: <script> SEARCH_STRING LOGFILE
PREVLINENUM=1
grep -n -- "$1" "$2" | while read -r LINE; do
    LINENUM=$(echo "$LINE" | cut -d: -f1)
    # -f2- keeps the whole matched line, even if it contains colons
    STR=$(echo "$LINE" | cut -d: -f2-)
    sed -n "${PREVLINENUM},${LINENUM}p" "$2" | tac | grep -m1 "^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]"
    echo "$STR"
    PREVLINENUM=$LINENUM
done
#############

As you can see, it greps the file with line numbers and, for each match, searches backwards from that line number as far as the previous match. This works, but I can't help feeling there has to be a better way, as it's really inefficient.

Any ideas on optimising it?

Many thanks,
James.
 
Old 10-24-2011, 09:39 AM   #2
zootboy
Member
 
Registered: Nov 2008
Location: In a dumpster, with my laptop.
Distribution: Fedora
Posts: 124

Rep: Reputation: 25
Have you considered using some sort of XML parser to do it? It would seem like that would be much more efficient than regexes.
 
Old 10-24-2011, 10:47 AM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,425

Rep: Reputation: 2826
Without an idea of what the data looks like I would agree with zootboy.
 
Old 11-10-2011, 09:52 AM   #4
d1verjim
LQ Newbie
 
Registered: Oct 2011
Posts: 3

Original Poster
Rep: Reputation: Disabled
Thanks for your replies, guys (and sorry for the delay in responding).

The files I am parsing are not XML; they are application log files, so they look like:

2011-11-09 11:00:02.234214 aThreadId DEBUG A line of the log file that contains all sorts of stuff
2011-11-09 11:00:02.234314 aThreadId DEBUG Another line of the log file that contains all sorts of stuff
2011-11-09 11:00:02.234214 anotherThreadId DEBUG A line of the log file that contains all sorts of stuff about a different thread
2011-11-09 11:00:02.234214 aThreadId DEBUG XML Import received: <?xml version="1.0" encoding="UTF-8"?>
<RootNode>
<A node>blah</A>
<Another node>more blah</AnotherNode>
</RootNode>

and so on...

Because the XML in the logfile is multiline, if I want to find an input with "more blah", I then need to search up to find the timestamp from the log line at the start of that XML.
 
Old 11-10-2011, 10:01 AM   #5
zootboy
Member
 
Registered: Nov 2008
Location: In a dumpster, with my laptop.
Distribution: Fedora
Posts: 124

Rep: Reputation: 25
My first thought would be to try to get your application to output the data more directly. If it has any way to separate its logs, that would probably help.

But I'll assume that you can't. In that case, I would try using sed and regexes to split off the entire multi-line chunk. Parse the xml, then pull off the info you wanted.

But this still screams of hack, so I would definitely search for a better way of getting the data in the first place.
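
As a rough illustration of the "split off the chunk, then parse the XML" idea, here is a minimal Python sketch rather than sed. It assumes every log entry starts with a YYYY-MM-DD date and that the XML follows the "XML Import received:" marker from the sample above. The function names (split_entries, xml_imports) and the sample data are made up for the example, and the sample XML has been made well-formed so that a parser accepts it.

```python
import re
import xml.etree.ElementTree as ET

# Assumptions: entries start with a YYYY-MM-DD date, and the XML follows the
# "XML Import received:" marker seen in the sample log.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}")
MARKER = "XML Import received:"

def split_entries(lines):
    """Group lines into entries: a dated line plus its continuation lines."""
    entries = []
    for line in lines:
        if TIMESTAMP.match(line) or not entries:
            entries.append([line])
        else:
            entries[-1].append(line)
    return entries

def xml_imports(lines):
    """Yield (timestamp, root_element) for each entry carrying an XML import."""
    for entry in split_entries(lines):
        head = entry[0]
        if MARKER not in head:
            continue
        first, _, rest = head.partition(MARKER)
        text = "\n".join([rest.strip()] + entry[1:])
        try:
            root = ET.fromstring(text)
        except ET.ParseError:
            continue  # skip chunks that are not well-formed XML
        timestamp = " ".join(first.split()[:2])  # date and time fields
        yield timestamp, root

# Made-up, well-formed sample in the same shape as the log excerpt.
sample = [
    '2011-11-09 11:00:02.234214 aThreadId DEBUG XML Import received: <?xml version="1.0" encoding="UTF-8"?>',
    "<RootNode>",
    "<ANode>blah</ANode>",
    "<AnotherNode>more blah</AnotherNode>",
    "</RootNode>",
    "2011-11-09 11:00:03.000000 aThreadId DEBUG A line of the log file",
]

for timestamp, root in xml_imports(sample):
    if root.findtext("AnotherNode") == "more blah":
        print(timestamp)
```

One limitation of this sketch: if another thread writes a dated line in the middle of the XML, the chunk is cut short, fails to parse, and is skipped.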
 
Old 11-10-2011, 10:39 AM   #6
bhanuvrat
Member
 
Registered: Jan 2010
Posts: 47

Rep: Reputation: 15
Python can help

Assuming that the aim is to get the task done and not to learn shell scripting,
the task can be accomplished using a small python script like this:
Code:
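import datetime
import sys

searchFor = "more blah"
prevdate = None  # date from the most recent timestamped line seen so far

for x in sys.stdin:
    try:
        # Lines that start with YYYY-MM-DD update the remembered date
        d = [int(i) for i in x.split()[0].split('-')]
        prevdate = datetime.date(d[0], d[1], d[2])
    except (IndexError, ValueError):
        pass  # blank or continuation line (e.g. XML): keep the previous date

    if searchFor in x:
        print("\n found the search key after", prevdate)
        break
else:
    # The loop reached end of input without finding the string
    print("Searched string not found")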
Redirect the log file into this script as standard input and it will do the job.
What do you say?
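
Since the original question asks for the most recent occurrence, a variant that keeps going after the first hit may be closer to what is needed. The following is only a sketch: it assumes entries start with a YYYY-MM-DD date, the function name find_matches and the sample data are invented for the example, and it returns every match paired with the timestamped line above it, so the last pair is the most recent one.

```python
import re

# Assumption: a log entry starts with a YYYY-MM-DD date, as in the sample lines.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}")

def find_matches(lines, search_for):
    """Return (timestamp_line, matching_line) pairs in file order."""
    last_stamp = None  # most recent timestamped line seen so far
    matches = []
    for line in lines:
        line = line.rstrip("\n")
        if TIMESTAMP.match(line):
            last_stamp = line
        if search_for in line:
            matches.append((last_stamp, line))
    return matches

# Made-up sample data in the same shape as the log excerpt.
sample = [
    '2011-11-09 11:00:02.234214 aThreadId DEBUG XML Import received: <?xml version="1.0"?>',
    "<RootNode>",
    "<AnotherNode>more blah</AnotherNode>",
    "</RootNode>",
    "2011-11-09 11:05:00.000001 aThreadId DEBUG something else",
]

found = find_matches(sample, "more blah")
if found:
    stamp, line = found[-1]  # the last pair is the most recent occurrence
    print(stamp)
    print(line)
else:
    print("Searched string not found")
```

For a real log, an open file object can be passed in place of the list, since the function only iterates over lines. The file is read once from top to bottom, instead of once per match.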
 
Old 11-10-2011, 11:44 AM   #7
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,425

Rep: Reputation: 2826
Why not use tac (the reverse of cat)? Then, when you find what you're looking for, you can keep going until you find a date and time at the start of a line.
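
The same reverse scan can be sketched in Python for anyone who prefers it to a tac pipeline. This is an illustration under assumptions, not a drop-in tool: it assumes entries start with a YYYY-MM-DD date, it holds all the lines in memory (tac would stream them instead), and the function name latest_match_timestamp and the sample data are made up for the example.

```python
import re

# Assumption: entries start with a YYYY-MM-DD date, as in the sample log.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}")

def latest_match_timestamp(lines, search_for):
    """Scan from the end; return the dated line that owns the last match."""
    found = False
    for line in reversed(lines):
        if not found and search_for in line:
            found = True
        # Once the string has been seen, the next dated line (possibly this
        # same line) is the start of the entry that contains it.
        if found and TIMESTAMP.match(line):
            return line
    return None  # no match, or no dated line above the match

# Made-up sample with two imports; the second one is the most recent.
lines = [
    '2011-11-09 11:00:02.234214 aThreadId DEBUG XML Import received: <?xml version="1.0"?>',
    "<RootNode>",
    "<AnotherNode>more blah</AnotherNode>",
    "</RootNode>",
    '2011-11-09 11:07:30.000001 aThreadId DEBUG XML Import received: <?xml version="1.0"?>',
    "<RootNode>",
    "<AnotherNode>more blah</AnotherNode>",
    "</RootNode>",
]

print(latest_match_timestamp(lines, "more blah"))
```

Because the scan starts at the end of the file, it stops as soon as the most recent occurrence and its timestamp are found, which suits logs where the interesting entry is near the bottom.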
 
Old 11-11-2011, 09:45 AM   #8
d1verjim
LQ Newbie
 
Registered: Oct 2011
Posts: 3

Original Poster
Rep: Reputation: Disabled
Thanks guys.

I'll give these a try.

James.
 
  


All times are GMT -5.
