How can I find a window of text in a file using Perl?
I'm trying to find a window of text within a file using Perl, but I'm not having much luck figuring it out. I've done it with the regular shell and grep, but in this situation I must use Perl.
The basic idea is to pass a search term to the script, along with a window size, and have the script return the matching line from within the file(s), along with a "window" of text surrounding it.
For example:
./windowSearch -w4 filename(s) searchterm
Results would be something like:
preceding line
preceding line
searchterm
subsequent line
subsequent line
This wasn't too hard with the shell, as shown below, but I need to do this with Perl.
This is my original script
Code:
#!/bin/bash
# Title: windowSearch
# Purpose: Finds a window of text in a file
# Requirements: none
# Arguments:
# -w specify a window size. I.E. -w3
# -f tell windowSearch that you are searching inside multiple files
# no argument, tells windowSearch to simply search the file for the string.
RETVAL=0
while test -n "$1"; do
case "$1" in
-w|w) # Case to search a single file using a window size
windowSize=$2; searchTerm=$3; filename=$4
a=$(grep -n -m1 -- "$searchTerm" "$filename" | cut -d: -f1) # line number of the first match
[ -n "$a" ] || exit 1 # no match found
((b=a+windowSize))
((c=a>windowSize ? a-windowSize : 1)) # do not start before line 1
echo; sed -n "${c},${b}p" "$filename"; echo
exit
;;
-f|f) # Case to search multiple files
searchTerm=$2; shift 2 # the remaining arguments are the file names
echo
echo "FILE NAME: SEARCH TERM"
echo "----------------------"
for var in "$@"
do
sed -n "/$searchTerm/s/^/$var: /p" "$var" 2> /dev/null
done; echo
exit
;;
*) # Default search in a file if no argument was specified
searchTerm=$1
filename=$2
sed -n "/$searchTerm/p" "$filename" # print only the matching lines
exit
;;
esac
done
exit $RETVAL
Any thoughts or pointers would help. I'm kind of lost when it comes to Perl.
First solve a simpler problem - open input file(s) and print its/their contents to STDOUT. For example, start from here:
Okay, I've made some progress. Here's my script to search a file for a term, but I'm at a loss as to how to search for a window of text around the matching line.
Code:
#############
# Main body #
#############
# Test to make sure both arguments were passed
if ( @ARGV < 2 ) {
die "\nYou need to enter a search term and a file name.\n";
} else { # Do the search
my $search = $ARGV[0]; # Search term
my $file = $ARGV[1]; # File to search
open(FILE, '<', $file) or die "Cannot open $file: $!\n"; # Open the file as "FILE"
my @array = <FILE>; # Fill the array
close(FILE); # Close the file handle
print "\nSearch Results\n";
foreach my $line (@array) {
if ($line =~ /$search/) {
print $line;
}
}
}
What is a "window"? How many lines before and after the line with the search item?
Have you read about the Perl 'push' and 'shift' functions? Using the two gives you the window in a pretty straightforward way, if I understand correctly what you mean.
By the way, your script is memory hungry, because it reads the whole file into an array.
Change your for loop to loop through a range from 0 to the last index of @array. As the loop goes through each element, search for your line, and if it is found, keep that index. Then you can easily get the window you want by subtracting from or adding to that index, understand?
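A rough sketch of that index-based idea, for illustration only. The helper name window_by_index and the sample data are made up here, and it still assumes the whole file is already in an array:

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines within $window lines of each match.
# @$lines holds the whole file, so memory use grows with the file size.
sub window_by_index {
    my ($lines, $pattern, $window) = @_;
    my @out;
    for my $i (0 .. $#$lines) {
        next unless $lines->[$i] =~ /$pattern/;
        my $start = $i - $window;
        $start = 0 if $start < 0;              # clamp at the first line
        my $end = $i + $window;
        $end = $#$lines if $end > $#$lines;    # clamp at the last line
        push @out, @{$lines}[$start .. $end];  # array slice for the window
    }
    return @out;
}

# Made-up sample data standing in for the contents of a file.
my @lines = map { "line $_\n" } 1 .. 10;
print window_by_index(\@lines, qr/line 5/, 1);
```

With this sample data it prints line 4, line 5 and line 6. If two matches are close together, their windows overlap and the shared lines are returned twice.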
ghostdog74, algorithmically your suggestion is correct, but holding the whole file in an array, as I wrote above, is not a good idea.
Furthermore, explicit numeric indexes are rarely needed in Perl programs, and more often than not a solution without them exists. Using numeric indexes is not considered good style in Perl.
That's why I'm pushing the OP towards a queue/FIFO, which for starters can be implemented with a Perl array holding a small number of lines, using, say, the 'push' and 'shift' operations.
I agree with you, but I don't think he understands how to do it. I am so tempted to show how, but in order not to spoil your good will, I will just guide him on how to get the next few lines...
@OP, you can use a counter to get the lines that follow your search pattern
Code:
my $count = 0;
while (<>) {
if (/search/) {
$count = 4; # e.g. print the matching line and the 3 lines after it
}
print if $count-- > 0;
}
You can incorporate this into your code together with the part that gets the previous lines. Good luck.
Thanks for the replies ghostdog74 and Sergei.
As for Sergei's comment about the array, can I just call the file handle directly? I.E.
Although I know conceptually what push and shift do (move data off a stack?) I've never used them. I don't understand what the code in the last post (by ghostdog74) is doing either - print if $count --> 0? What is it printing? You'll have to forgive my ignorance - I've barely used Perl at all, and most of my scripting has been basic things.
The "window" I refer to means "find the matching line, then print the nth line above and below the matching line as well."
I.E. if our match was found on line 10, and the window size was 1, the script would print lines 9, 10 and 11.
No, you can't work on a filehandle that way. (And you don't want to use foreach since it builds up a list before it iterates, and you are trying to avoid building up a list of the whole file at once (in case the file is very large and would gobble up all your available memory)).
Quote:
Originally Posted by hgate73
Although I know conceptually what push and shift do (move data off a stack?) I've never used them.
You should read up on them. They are very basic functions in handling arrays in Perl. (See perldoc -f push and perldoc -f shift.)
Quote:
Originally Posted by hgate73
I don't understand what the code in the last post (by ghostdog74) is doing either - print if $count --> 0? What is it printing?
If you call print without an explicit item you want to print, then Perl defaults to printing whatever is currently in the special variable $_. Since that variable is also the default target when you read through a file in a while loop (while (<FILE>) ...), you often see people code like this:
Code:
while (<$file_handle>) {
s/foo/bar/; # same as $_ =~ s/foo/bar/; substitutes in the current line
print; # = print $_ = print current line in the file
}
This sort of thing is hard to read initially for some people, but it saves you a lot of typing. It's idiomatic Perl, so you will have to get used to seeing it (if you continue to use Perl). Ghostdog is trying to suggest a way to print your n items after the find. (His solution there doesn't cover the n lines before the find.)
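In case it helps, here is roughly the same counter loop written out with an explicit line variable instead of $_. This is only a sketch: the helper name match_and_following and the in-memory sample file are made up for the demonstration, and it collects the lines instead of printing them directly.

```perl
use strict;
use warnings;

# Hypothetical helper: each matching line plus the ($n - 1) lines after it.
sub match_and_following {
    my ($fh, $pattern, $n) = @_;
    my @out;
    my $count = 0;
    while (my $line = <$fh>) {       # explicit variable instead of $_
        if ($line =~ /$pattern/) {
            $count = $n;             # restart the countdown on every match
        }
        if ($count > 0) {            # same effect as: print if $count-- > 0
            push @out, $line;
            $count--;
        }
    }
    return @out;
}

# Demo on an in-memory file handle (open on a reference to a scalar).
my $text = join '', map { "line $_\n" } 1 .. 8;
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
print match_and_following($fh, qr/line 3/, 4);
close($fh);
```

With a count of 4 this prints line 3 through line 6, that is, the matching line and the three lines after it.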
Quote:
Originally Posted by hgate73
You'll have to forgive my ignorance - I've barely used Perl at all, and most of my scripting has been basic things.
The "window" I refer to means "find the matching line, then print the nth line above and below the matching line as well."
I.E. if our match was found on line 10, and the window size was 1, the script would print lines 9, 10 and 11.
This is a bit trickier than it may seem, and it seems like a bad idea to do it using a language you don't know at all. In a nutshell, you will need to (1) work through the file line by line, but (2) keep a running array of the last n lines - where n = your window, and then (3) when you find a hit, (4) print the last n lines from the saved array plus the current line, plus the next n lines. If I understand Sergei correctly, he is suggesting push and shift for step 2. You push (add) items onto the end of the array and shift (remove) them from the front.
Code:
my @array = qw/one two three four/;
push @array, 'five'; # @array now = 'one', 'two', 'three', 'four', 'five'
shift @array; # @array now = 'two', 'three', 'four', 'five'
Since an entire string (ie, a line of a file) can easily be an item of a Perl array, you can store your last n lines using this technique.
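As a small sketch of step 2 only (no matching yet), this keeps the last n lines in an array with push and shift. The helper name remember and the sample lines are invented for the example:

```perl
use strict;
use warnings;

# Hypothetical helper: add $line to the buffer and keep at most $n lines.
sub remember {
    my ($buffer, $line, $n) = @_;
    push @$buffer, $line;                 # newest line goes on the end
    shift @$buffer while @$buffer > $n;   # oldest lines fall off the front
    return;
}

my @last;
for my $line (map { "line $_\n" } 1 .. 5) {
    remember(\@last, $line, 3);
    print "after reading $line";
    print "    $_" for @last;             # the buffer never exceeds 3 lines
}
```

After the fifth line has been read, the buffer holds line 3, line 4 and line 5. Memory use depends on the window size, not on the size of the file.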
In order to "feel" what 'push' and 'shift' do, temporarily forget about matching and implement the following: for a desired $n (say, 3), read lines from the file and, whenever possible, print the last $n lines read unconditionally, i.e. after reading line #3 print lines 1, 2, 3; after reading line #4 print lines 2, 3, 4; and so forth.
When you know how to do this, the solution to your complete problem will be obvious.
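For anyone who wants to compare notes after trying the exercise, one possible way to put the pieces together is sketched below. It is not the only way to do it; the helper name window_search and the in-memory sample file are assumptions made for this example, and a real script would read the files named on the command line and print the lines rather than return them.

```perl
use strict;
use warnings;

# Hypothetical helper: return each match with up to $n lines of context
# on both sides, reading the handle one line at a time.
sub window_search {
    my ($fh, $pattern, $n) = @_;
    my @before;        # the last $n lines that have not been output yet
    my $after = 0;     # lines still owed after the most recent match
    my @out;
    while (my $line = <$fh>) {
        if ($line =~ /$pattern/) {
            push @out, @before, $line;   # preceding context, then the match
            @before = ();                # already output, do not repeat them
            $after  = $n;
        }
        elsif ($after > 0) {
            push @out, $line;            # trailing context
            $after--;
        }
        else {
            push @before, $line;               # queue of recent lines
            shift @before while @before > $n;  # keep at most $n of them
        }
    }
    return @out;
}

# Demo on an in-memory file handle (open on a reference to a scalar).
my $text = join '', map { "line $_\n" } 1 .. 10;
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
print window_search($fh, qr/line 5/, 2);
close($fh);
```

With a window of 2 this prints line 3 through line 7. Clearing the queue after a match means that lines shared by two nearby matches are output only once.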