Shell script to extract single report by pattern, then both backward and forward
Hi all,
Greetings, newbie here! Sorry for the confusing subject. I have to admit that I registered on LQ only after failing to find a similar solution by searching.
OK, let me see whether I can explain my problem clearly. I need to extract a single report from a big file. The big file looks something like this:
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Report for yyyyyy
Your info 999-9999999
End of Report
Report for zzzzzz
Your info 000-0000000
End of Report
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
I need to search for a user-provided string, say 999-9999999, in the big file. Then I have to extract the single report that contains it. My logic is simple:
1) find 999-9999999
2) backward search for "Report for", note down the line number
3) forward search for "End of Report", note down the line number
4) extract the record by using info found from step 2) and 3).
I am trying to do this in bash, with awk and sed (I am new to both).
Is this possible, or should I just write a program to do it?
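The four steps above can be followed quite literally with grep, awk and sed. This is only a minimal sketch: the function name `extract_by_line_numbers` is made up for illustration, and it assumes the key is a plain string, that only the first match matters, and that "Report for" and "End of Report" start at the beginning of a line.

```shell
# Hypothetical helper: extract_by_line_numbers KEY FILE
extract_by_line_numbers() {
    local key=$1 file=$2 hit start end

    # Step 1: line number of the first line containing the key (fixed string).
    hit=$(grep -n -F -- "$key" "$file" | head -n 1 | cut -d: -f1)
    [ -n "$hit" ] || return 1

    # Step 2: the last "Report for" line at or before the hit.
    start=$(awk -v hit="$hit" 'NR > hit {exit} /^Report for/ {n = NR} END {print n}' "$file")

    # Step 3: the first "End of Report" line at or after the hit.
    end=$(awk -v hit="$hit" 'NR >= hit && /^End of Report/ {print NR; exit}' "$file")

    [ -n "$start" ] && [ -n "$end" ] || return 1

    # Step 4: print exactly that line range.
    sed -n "${start},${end}p" "$file"
}
```

It would be called as `extract_by_line_numbers 999-9999999 bigFile`. The file is read up to four times, which is simple but not the fastest option for a very large file.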
I think I oversimplified things a bit too much. The distance between the search string and each end of the report varies. It actually looks more like this:
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Report for yyyyyy
something trivia here
and there
Your info 999-9999999
another thing here
and elsewhere
End of Report
Report for zzzzzz
Simpler beginnin
Your info 000-0000000
lots of stuffs
under
this
line, bla bla
End of Report
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
So, once I find 999-9999999, I need to search backward for "Report for", then forward for "End of Report", as these are the only indicators of where the report begins and ends. (Note: the indentation is for readability; the actual report has no indentation.)
Unfortunately this is not homework, otherwise I would have classmates to ask for help :-) instead of searching the Net. (Lame joke, I know.)
I must admit that I am not good at sed or awk (and thus Perl). So far I have been able to get away with grep. `grep -C` is the closest I can get, but it prints a fixed number of context lines, so it doesn't help much.
And here's a Bash way (pretty low-tech though; I guess the awk experts are not around today):
Code:
#!/bin/bash
# Extract the report that contains the key from the file "test".
key='999-9999999'
output=''

# Pass 1: collect lines from the matching line down to "End of Report".
save=0
while IFS= read -r line; do
    [[ $line == *"$key"* ]] && save=1
    if [ "$save" = 1 ]; then
        output="${output}${line}"$'\n'
        [[ $line == *"End of Report"* ]] && save=0
    fi
done < test

# Pass 2: read the file backwards (tac) and prepend the lines above the
# match, up to and including "Report for".
save=0
while IFS= read -r line; do
    if [ "$save" = 1 ]; then
        output="${line}"$'\n'"${output}"
        [[ $line == *"Report for"* ]] && save=0
    fi
    # Start saving only after the matching line; pass 1 already has it.
    [[ $line == *"$key"* ]] && save=1
done < <(tac test)

printf '%s' "$output"
Thanks for the sample. I have yet to test it (I am quite dizzy after reading lots of examples on the Net, and I am calling it a day). I will get back to you with the results.
On the other hand, I have "half" of the solution: printing from the matching pattern to the end of the report,
sed -n '/999-9999999/,/End of Report/p' bigFile
Now, it would be great if I could find the "first half" of the solution...
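One possible way to get both halves from sed alone is to collect each report in the hold space and print it only if it contains the key. This is a sketch rather than a tested production command: the function name `extract_report_sed` is hypothetical, and because the key is pasted into a regular expression it is assumed to contain no slashes or regex metacharacters.

```shell
# Hypothetical helper: extract_report_sed KEY FILE
extract_report_sed() {
    local key=$1 file=$2
    # "Report for"    : start a fresh hold space with this line, skip to next line.
    # any other line  : append it to the hold space.
    # "End of Report" : swap the collected report into the pattern space and
    #                   print it only if it contains the key.
    sed -n \
        -e '/^Report for/{h;d;}' \
        -e 'H' \
        -e "/^End of Report/{x;/${key}/p;}" \
        "$file"
}
```

For example, `extract_report_sed 999-9999999 bigFile`. Only one report is held in memory at a time, so the size of the whole file should not matter much.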
Assuming a big file, i.e. one too big to swallow into memory:
Code:
#!/usr/bin/perl -w
use strict;

my ($start, $data_rec);

open( FILE, "<", "t.t" ) or
    die "Can't open file: t.t: $!\n";

# Init start of first block/rpt; assumes no other file headers
$start = 0;
while ( defined ( $data_rec = <FILE> ) )
{
    # Match end of Rpt block & save position;
    # this will be start of next Rpt block
    if( $data_rec =~ /^End/ )
    {
        $start = tell(FILE);
    }

    # Match Rpt id and output Rpt lines
    if( $data_rec =~ /99-999/ )
    {
        # Go back to start of Rpt block
        seek(FILE, $start, 0);
        while ( defined ( $data_rec = <FILE> ) )
        {
            # Output all recs until end of block
            print "$data_rec";
            last if( $data_rec =~ /^End/ ) ;
        }

        # Note where we are; its start of next block
        $start = tell(FILE);
    }
}

close(FILE) or
    die "Can't close file: t.t: $!\n";
You may want to make the key value a variable later, but this will do for now.
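As a rough illustration of passing the key in as a variable, here is an awk sketch wrapped in a shell function. It is not a translation of the Perl script: instead of seeking back in the file, it buffers the current report and prints it when the end marker is reached. The name `extract_report_awk` is made up, and the key is assumed to contain no backslashes, since `awk -v` interprets escape sequences.

```shell
# Hypothetical helper: extract_report_awk KEY FILE
extract_report_awk() {
    local key=$1 file=$2
    awk -v key="$key" '
        # A new report begins: reset the buffer and the "key seen" flag.
        /^Report for/            { buf = ""; found = 0; inside = 1 }
        # Buffer every line of the current report.
        inside                   { buf = buf $0 "\n" }
        # Plain substring test, so the key is not treated as a regex.
        inside && index($0, key) { found = 1 }
        # End of the report: print the buffer only if the key was seen.
        /^End of Report/ {
            if (inside && found) printf "%s", buf
            inside = 0
        }
    ' "$file"
}
```

A call such as `extract_report_awk "$1" bigFile` inside a script would take the key from the command line. The file is read once, and only one report is kept in memory at any time.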