Shell script to extract a single report by pattern, searching both backward and forward

ykheong · 02-18-2010, 10:36 PM

Hi all,

Greetings, newbie here! Sorry for the confusing subject. I have to admit that I registered on LQ only after failing to find a similar solution.

OK, let me see whether I can explain my problem clearly. I need to extract a single report from a big file. The big file looks something like this:

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Report for yyyyyy

Your info 999-9999999

End of Report

Report for zzzzzz

Your info 000-0000000

End of Report
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

I need to search the big file for a user-provided string, say 999-9999999, and then extract the single report that contains it. My logic is simple:
1) find 999-9999999
2) search backward for "Report for" and note the line number
3) search forward for "End of Report" and note the line number
4) extract the report using the line numbers found in steps 2) and 3).
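For reference, steps 1) to 4) can be done almost literally in bash with grep, head, tail and sed. This is a minimal sketch, assuming the key occurs in exactly one report (only the first hit is used), that "Report for" and "End of Report" always start a line, and that grep supports `-m` (GNU and BSD grep do). The function name `extract_report` is made up for this example.

```shell
# Hypothetical helper: extract_report KEY FILE
# Prints the report (from "Report for" to "End of Report") containing KEY.
extract_report() {
    local key=$1 file=$2 hit start end_off end

    # 1) line number of the first line containing KEY (plain string match)
    hit=$(grep -n -F -m 1 -- "$key" "$file" | cut -d: -f1)
    [ -n "$hit" ] || return 1

    # 2) backward: the last "Report for" on or above the hit
    start=$(head -n "$hit" "$file" | grep -n '^Report for' | tail -n 1 | cut -d: -f1)

    # 3) forward: the first "End of Report" on or below the hit;
    #    grep -n counts from the hit line here, so convert to a file line number
    end_off=$(tail -n +"$hit" "$file" | grep -n -m 1 '^End of Report' | cut -d: -f1)
    [ -n "$start" ] && [ -n "$end_off" ] || return 1
    end=$((hit + end_off - 1))

    # 4) print the report using the two line numbers
    sed -n "${start},${end}p" "$file"
}
```

Usage: `extract_report 999-9999999 bigFile`. It makes a few extra passes over the file, but each step maps directly onto one of the steps above and is easy to check by hand.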

I am trying to do this in bash, with awk and sed (I am new to both).

Is this possible, or should I just write a program to do this?

Thanks in advance!

rgds,
YKheong

ashok.g · 02-18-2010, 11:07 PM

Here is what I did for you, ykheong, using Perl:

Code:

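use strict;
use warnings;

open(my $in, '<', 'a.txt') or die "Can't open a.txt: $!\n";
my @a = <$in>;
close($in);
for my $i (0 .. $#a)
{
	if ($a[$i] =~ /999-9999999$/)
	{
		# Print two lines of context on each side, clamped to the file bounds
		my $from = $i < 2 ? 0 : $i - 2;
		my $to   = $i + 2 > $#a ? $#a : $i + 2;
		print @a[$from .. $to];
	}
}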

Hope that helps you

ykheong · 02-18-2010, 11:27 PM

Thanks Ashok.

I think I oversimplified things a bit too much. The distance between the search string and both ends of the report varies. It actually looks more like this:

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Report for yyyyyy
something trivia here
and there

Your info 999-9999999

another thing here
and elsewhere
End of Report

Report for zzzzzz
Simpler beginning

Your info 000-0000000

lots of stuff
under
this
line, bla bla
End of Report
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

So, once I find 999-9999999, I need to search backward for "Report for" and then forward for "End of Report", as these are the only indicators of where the report begins and ends. (Note: the indentation is only for readability; the actual report has no indentation.)
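Because the distances vary, one option is to skip the backward search altogether: read the file once with awk, buffer each report from "Report for" to "End of Report", and print the buffer only if the key turned up inside it. This sketch swaps in that buffering technique in place of the backward/forward search. It assumes "Report for" never appears inside a report body; the function name `extract_reports` is hypothetical. Only one report is held in memory at a time, and every matching report is printed. Note that awk processes backslash escapes in `-v` values, so keys containing `\` need extra care.

```shell
# Hypothetical helper: extract_reports KEY FILE
# Prints every report that contains KEY (matched as a plain substring).
extract_reports() {
    awk -v key="$1" '
        /^Report for/           { buf = ""; inrep = 1; found = 0 }  # a new report starts
        inrep                   { buf = buf $0 "\n" }               # buffer its lines
        inrep && index($0, key) { found = 1 }                       # key seen in it
        /^End of Report/ {
            if (inrep && found) printf "%s", buf                    # emit a matching report
            inrep = 0
        }
    ' "$2"
}
```

Usage: `extract_reports 999-9999999 bigFile`. Using `index()` instead of a regex means characters such as `.` or `-` in the key are taken literally.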

Thank you.

rgds,
YKheong

okos · 02-18-2010, 11:38 PM

Is this for a class?
Looking for someone to do your homework for you?

ykheong · 02-19-2010, 12:04 AM

Hi okos,

Unfortunately it is not homework; otherwise I would have classmates to ask for help :-) instead of searching the Net. (Lame joke, I know.)

I must admit that I am not good at sed or awk (and thus Perl). So far I have gotten away with grep. `grep -C` is the closest I can get, but it prints a fixed number of context lines, so it doesn't help much.

Cheers!

rgds,
YKheong

ashok.g · 02-19-2010, 01:00 AM

Hi ykheong,
I'm back with code that does what you actually want.
Here is the code:

Code:

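use strict;
use warnings;

open(my $in, '<', 'a.txt') or die "Can't open a.txt: $!\n";
my @a = <$in>;
close($in);

for my $i (0 .. $#a)
{
	next unless $a[$i] =~ /999-9999999$/;

	# Search backward from the matching line for the "Report for" header
	my $start = $i;
	$start-- while $start > 0 && $a[$start] !~ /^Report for/;

	# Search forward from the matching line for the "End of Report" trailer
	my $end = $i;
	$end++ while $end < $#a && $a[$end] !~ /^End of Report/;

	# Print the whole report, then stop after the first match
	print @a[$start .. $end];
	exit;
}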

GrapefruiTgirl · 02-19-2010, 02:07 AM

And here's a Bash way (pretty low-tech though; I guess the AWKxperts are not around today):

Code:

#!/bin/bash

key='999-9999999'
file='test'

# Forward pass: save from the key line down to "End of Report"
output=''
save=0
while IFS= read -r line; do
   [[ $line == *"$key"* ]] && save=1
   if (( save )); then
      output+="$line"$'\n'
      [[ $line == *'End of Report'* ]] && save=0
   fi
done < "$file"

# Backward pass over the reversed file: prepend the lines above the key
# until "Report for" is reached (the key line itself is not added twice)
save=0
while IFS= read -r line; do
   if (( save )); then
      output="$line"$'\n'"$output"
      [[ $line == *'Report for'* ]] && save=0
   fi
   [[ $line == *"$key"* ]] && save=1
done < <(tac "$file")

printf '%s' "$output"

Have fun!
Sasha

ykheong · 02-19-2010, 02:43 AM

Hi Ashok & Sasha,

Thanks for the samples. I have yet to test them (I am quite dizzy after reading lots of examples on the Net, and I am calling it a day). Nevertheless, I will get back to you with the results.

In the meantime, I have "half" of the solution, printing from the matching pattern to the end of the report:

sed -n '/999-9999999/,/End of Report/p' bigFile

Now it would be great if I could find the "first half" of the solution...
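One low-tech way to get the "first half" is to reverse the file with `tac` (GNU coreutils), so that "Report for" comes after the key, apply the same kind of sed range, and then reverse the result back. The sketch below combines that with the forward range, dropping the key line from the second half so it is not printed twice. It assumes the key occurs only once in the file and contains no `/` or sed regex metacharacters; `extract_report_tac` is a made-up name.

```shell
# Hypothetical helper: extract_report_tac KEY FILE
extract_report_tac() {
    local key=$1 file=$2
    # First half: in the reversed file, "Report for" follows the key,
    # so print key..."Report for" and reverse it back into file order
    tac "$file" | sed -n "/$key/,/^Report for/p" | tac
    # Second half: key..."End of Report", minus the key line already printed
    sed -n "/$key/,/^End of Report/p" "$file" | sed 1d
}
```

Usage: `extract_report_tac 999-9999999 bigFile`.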

See you guys tomorrow!

rgds,
YKheong

chrism01 · 02-22-2010, 01:20 AM

Assuming a big file, i.e. one too big to slurp into memory:

Code:

#!/usr/bin/perl -w
use strict;

my ($start, $data_rec);

open( FILE, "<", "t.t" ) or
       die "Can't open file: t.t: $!\n";

# Init start of first block/rpt; assumes no other file headers
$start = 0;
while ( defined ( $data_rec = <FILE> ) )
{

    # Match end of Rpt block & save position;
    # this will be the start of the next Rpt block
    if( $data_rec =~ /^End of Report/ )
    {
        $start = tell(FILE);
    }

    # Match Rpt id and output Rpt lines
    if( $data_rec =~ /999-9999999/ )
    {
        # Go back to start of Rpt block
        seek(FILE, $start, 0);
        while ( defined ( $data_rec = <FILE> ) )
        {
            # Output all recs until end of block
            print $data_rec;
            last if( $data_rec =~ /^End of Report/ ) ;
        }

        # Note where we are; it's the start of the next block
        $start = tell(FILE);
    }
}

close(FILE) or
       die "Can't close file: t.t: $!\n";

You may want to make the key value a variable later, but this will do for now.
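In the same spirit of remembering only where the block starts, a shell sketch can keep line numbers instead of the report text. One awk pass notes the line of the latest "Report for", marks when the key is seen, and stops at the next "End of Report", printing the start and end line numbers; sed then prints that range and quits. The key is passed in as a variable. Assumptions: only the first match is wanted, headers and trailers start a line, and bash is used (for `read <<<`); `extract_report_lines` is a hypothetical name.

```shell
# Hypothetical helper: extract_report_lines KEY FILE
extract_report_lines() {
    local key=$1 file=$2 range start end
    # Pass 1: find "start end" line numbers of the report containing KEY,
    # holding only two numbers in memory and exiting early
    range=$(awk -v key="$key" '
        /^Report for/ { start = NR }
        !hit && index($0, key) { hit = 1 }
        hit && /^End of Report/ { if (start) print start, NR; exit }
    ' "$file")
    [ -n "$range" ] || return 1
    read -r start end <<<"$range"
    # Pass 2: print that line range, quitting after its last line
    sed -n "${start},${end}p;${end}q" "$file"
}
```

Usage: `extract_report_lines 999-9999999 bigFile`.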