LinuxQuestions.org
Old 02-18-2010, 10:36 PM   #1
ykheong
LQ Newbie
 
Registered: Feb 2010
Posts: 4

Rep: Reputation: 0
Shell script to extract single report by pattern, then both backward and forward


Hi all,

Greetings, newbie here! Sorry for the confusing subject. I have to admit that I registered at LQ after failing to find a similar solution by searching.

OK, let me see whether I can explain my problem clearly. I need to extract a single report from a big file. The big file looks something like this:

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Report for yyyyyy

Your info 999-9999999

End of Report

Report for zzzzzz

Your info 000-0000000

End of Report
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

I need to search for a user-provided string, say 999-9999999, in the big file. Then I have to extract the single report that contains it. My logic is simple:
1) find 999-9999999
2) backward search for "Report for", note down the line number
3) forward search for "End of Report", note down the line number
4) extract the record by using info found from step 2) and 3).
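The four steps above can be sketched almost literally with grep and sed. This is only a rough sketch under assumptions: the function name extract_report and its two arguments (the key and the file name) are made up for illustration, the key is treated as a fixed string, and every report is assumed to start with a line beginning "Report for" and end with a line beginning "End of Report".

```shell
#!/bin/bash
# Hypothetical helper (name and arguments are illustrative): extract_report KEY FILE
extract_report() {
    local key=$1 file=$2 hit start end

    # Step 1: line number of the first line containing the key (fixed string).
    hit=$(grep -n -F -- "$key" "$file" | head -n 1 | cut -d: -f1)
    [ -n "$hit" ] || return 1

    # Step 2: backward search, i.e. the last "Report for" line at or above the hit.
    start=$(head -n "$hit" "$file" | grep -n '^Report for' | tail -n 1 | cut -d: -f1)

    # Step 3: forward search, i.e. the first "End of Report" line at or below the hit.
    # grep numbers lines relative to the hit here, so convert back afterwards.
    end=$(tail -n +"$hit" "$file" | grep -n '^End of Report' | head -n 1 | cut -d: -f1)
    [ -n "$start" ] && [ -n "$end" ] || return 1
    end=$((hit + end - 1))

    # Step 4: print the lines from start to end, inclusive.
    sed -n "${start},${end}p" "$file"
}
```

It would be called as `extract_report 999-9999999 bigFile`. It reads the file several times, which keeps the logic easy to follow but is not the fastest option for a really big file.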

I am trying to do this in bash, with awk and sed (I am new to both).

Is this possible, or should I just write a program to do this?

Thanks in advance!


rgds,
YKheong
 
Old 02-18-2010, 11:07 PM   #2
ashok.g
Member
 
Registered: Dec 2009
Location: Hyderabad,India
Distribution: RHEL AS 4
Posts: 215

Rep: Reputation: 32
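Here is what I did for you, ykheong, using Perl.
Code:
# Print the matching line with two lines of context on each side.
open(IN, "<", "a.txt") or die "Can't open a.txt: $!\n";
@a = <IN>;
close(IN);
for ($i = 0; $i <= $#a; $i++)
{
    if ($a[$i] =~ /999-9999999$/)
    {
        # Clamp the window so negative indexes don't wrap to the end of the array.
        $from = $i < 2 ? 0 : $i - 2;
        $to   = $i + 2 > $#a ? $#a : $i + 2;
        print @a[$from .. $to];
    }
}
Hope that helps.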
 
Old 02-18-2010, 11:27 PM   #3
ykheong
LQ Newbie
 
Registered: Feb 2010
Posts: 4

Original Poster
Rep: Reputation: 0
Thanks Ashok.

I think I oversimplified things a bit too much. The distance between the search string and each end of the report varies. It actually looks more like this:

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Report for yyyyyy
something trivia here
and there

Your info 999-9999999

another thing here
and elsewhere
End of Report

Report for zzzzzz
Simpler beginnin

Your info 000-0000000

lots of stuffs
under
this
line, bla bla
End of Report
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

So, once I have found 999-9999999, I need to search backward for "Report for", then forward for "End of Report", as these are the only indicators of where the report begins and ends. (Note: any indentation is for readability; the actual report has none.)

Thank you.


rgds,
YKheong
 
Old 02-18-2010, 11:38 PM   #4
okos
Member
 
Registered: May 2007
Location: California
Distribution: Slackware/Ubuntu
Posts: 609

Rep: Reputation: 38
Is this for a class?
Trying to find someone to do your homework for you?
 
Old 02-19-2010, 12:04 AM   #5
ykheong
LQ Newbie
 
Registered: Feb 2010
Posts: 4

Original Poster
Rep: Reputation: 0
Hi okos,

Unfortunately it is not homework; otherwise I'd have classmates I could ask for help :-) instead of searching the Net. (Lame joke, I know.)

I must admit that I am not good at sed or awk (or, for that matter, Perl). So far I have gotten away with grep. `grep -C` is the closest I can get, but it prints a fixed number of context lines, so it doesn't help much.

Cheers!


rgds,
YKheong
 
Old 02-19-2010, 01:00 AM   #6
ashok.g
Member
 
Registered: Dec 2009
Location: Hyderabad,India
Distribution: RHEL AS 4
Posts: 215

Rep: Reputation: 32
Hi ykheong,
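I'm back with the code you actually want.
Here is the code:
Code:
open(IN, "<", "a.txt") or die "Can't open a.txt: $!\n";
@a = <IN>;
close(IN);
for ($i = 0; $i <= $#a; $i++)
{
	if ($a[$i] =~ /999-9999999$/)
	{
		# Walk backward, collecting lines until the "Report for" header.
		for ($j = $i - 1; $j >= 0; $j--)
		{
			if ($a[$j] !~ /^Report for/)
			{
				push @final, $a[$j];
			}
			else
			{
				last;
			}
		}
		# Add the header itself, unless we ran off the top of the file.
		push @final, $a[$j] if $j >= 0;
		print reverse @final;
		print $a[$i];
		# Walk forward, printing lines until the "End of Report" footer.
		for ($j = $i + 1; $j <= $#a; $j++)
		{
			if ($a[$j] !~ /^End of Report/)
			{
				print $a[$j];
			}
			else
			{
				last;
			}
		}
		# Print the footer itself, unless we ran off the end of the file.
		print $a[$j] if $j <= $#a;
		exit;
	}
}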
 
Old 02-19-2010, 02:07 AM   #7
GrapefruiTgirl
LQ Guru
 
Registered: Dec 2006
Location: underground
Distribution: Slackware64
Posts: 7,594

Rep: Reputation: 556
And here's a Bash way (pretty low-tech though; I guess the AWKxperts are not around today)

Code:
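#!/bin/bash

# Pass 1: collect lines from the matching line down to "End of Report".
save='0'

while IFS= read -r line; do
   [[ $line == *999-9999999* ]] && save='1'
   if [ "$save" = '1' ]; then
     # Only add a separator once there is something to separate.
     output="${output:+${output}\n}${line}"
     if [[ $line == *"End of Report"* ]]; then
       save='0'
     fi
   fi
done < test

# Pass 2: read the file bottom-up and prepend the lines above the match,
# stopping at "Report for". save='2' skips the matching line itself,
# which pass 1 has already stored.
save='0'

while IFS= read -r line; do
   [[ $line == *999-9999999* ]] && save='2'
   if [ "$save" = '1' ]; then
      output="${line}\n${output}"
      if [[ $line == *"Report for"* ]]; then
        save='0'
      fi
   fi
   [ "$save" = '2' ] && save='1'
done < <(tac test)

# Squeeze runs of blank lines down to a single blank line.
echo -e "${output}" | sed '/^$/{
N
/^\n$/D
}'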
Have fun!
Sasha
 
Old 02-19-2010, 02:43 AM   #8
ykheong
LQ Newbie
 
Registered: Feb 2010
Posts: 4

Original Poster
Rep: Reputation: 0
Hi Ashok & Sasha,

Thanks for the samples. I have yet to test them out (I am quite dizzy after reading a lot of examples from the Net, and I am calling it a day). Nevertheless, I will get back to you with the results.

On the other hand, I have "half" of the solution: printing from the matching pattern to the end of the report:

sed -n '/999-9999999/,/End of Report/p' bigFile

Now, it would be great if I could find the "first half" of the solution...
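One possible way to get both halves with sed alone is to collect each report in the hold space and print it only if it contains the key. This is just a sketch: the wrapper name report_via_sed is made up for illustration, the key is hard-coded as in the command above, and it assumes each report starts with a line beginning "Report for" and ends with a line beginning "End of Report".

```shell
#!/bin/bash
# Hypothetical wrapper (name is illustrative): report_via_sed FILE
report_via_sed() {
    sed -n '/^Report for/,/^End of Report/{
        H
        /^Report for/h
        /^End of Report/{
            x
            /999-9999999/{
                p
                q
            }
        }
    }' "$1"
}
```

Inside a report, H appends every line to the hold space, and h on the "Report for" line throws away whatever was held before, so the hold space always contains the current report only. At "End of Report", x brings the collected report into the pattern space, where it is printed if the key occurs in it; q then stops sed from reading the rest of the file.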

See you guys tomorrow!


rgds,
YKheong
 
Old 02-22-2010, 01:20 AM   #9
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,352

Rep: Reputation: 2751
Assuming a big file, i.e. one too big to slurp into memory:
Code:
#!/usr/bin/perl -w
use strict;

my ($start, $data_rec);

open( FILE, "<", "t.t" ) or
       die "Can't open file: t.t: $!\n";

# Init start of first block/rpt; assumes no other file headers
$start = 0;
while ( defined ( $data_rec = <FILE> ) )
{

    # Match end of Rpt block & save position;
    # this will be start of next Rpt block
    if( $data_rec =~ /^End/ )
    {
        $start = tell(FILE);
    }

    # Match Rpt id and output Rpt lines
    if( $data_rec =~ /999-9999999/ )
    {
        # Go back to start of Rpt block
        seek(FILE, $start, 0);
        while ( defined ( $data_rec = <FILE> ) )
        {
            # Output all recs until end of block
            print "$data_rec";
            last if( $data_rec =~ /^End/ ) ;
        }

        # Note where we are; it's the start of the next block
        $start = tell(FILE);
    }
}

close(FILE) or
       die "Can't close file: t.t: $!\n";
You may want to make the key value a variable later, but this will do for now.
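For what it's worth, making the key a variable could look something like the sketch below, here done with awk rather than Perl. The function name report_for and its arguments are made up for illustration. It assumes each report starts with a line beginning "Report for" and ends with a line beginning "End of Report", and it only ever keeps one report in memory, not the whole file.

```shell
#!/bin/bash
# Hypothetical helper (name and arguments are illustrative): report_for KEY FILE
report_for() {
    awk -v key="$1" '
        # A new report starts: throw away whatever was buffered before.
        /^Report for/ { buf = ""; found = 0; inside = 1 }

        # Buffer every line of the current report and note whether the key occurs.
        inside {
            buf = buf $0 "\n"
            if (index($0, key)) found = 1   # fixed-string test, not a regex
        }

        # End of the report: print it if it held the key, then stop reading.
        /^End of Report/ {
            if (inside && found) { printf "%s", buf; exit }
            inside = 0
        }
    ' "$2"
}
```

Usage would be `report_for 999-9999999 t.t`. One caveat: awk processes backslash escapes in values passed with -v, so a key containing backslashes would need extra care.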
 
  


All times are GMT -5.
