LinuxQuestions.org

hgate73 04-27-2009 11:23 PM

How can I find a window of text in a file using Perl?
 
I'm trying to find a window of text within a file using Perl, but I'm not having much luck figuring it out. I've done it with the regular shell and Grep, but in this situation I must use Perl.

The basic idea is to pass a search term to the script, along with a window size, and have the script return the search term from within the file(s), along with a "window" of text surrounding the data.

I.E.

./windowSearch -w4 filename(s) searchterm

Results would be something like:

preceding line
preceding line
searchterm
subsequent line
subsequent line

This wasn't too hard with the shell, as shown below, but I need to do this with Perl.

This is my original script

Code:

#!/bin/bash
# Title:          windowSearch
# Purpose:        Finds a window of text in a file
# Requirements:  none

# Arguments:
# -w    specify a window size, e.g. -w 3 searchterm filename
# -f    tell windowSearch that you are searching inside multiple files
#    no argument, tells windowSearch to simply search the file for the string.

RETVAL=0

while test -n "$1"; do
  case "$1" in

  -w|w) # Case to search a single file using a window size
    windowSize=$2; searchTerm=$3; filename=$4
    # Line number of the first match only; several matches would break the arithmetic below
    a=$(grep -n "$searchTerm" "$filename" | head -n 1 | cut -d":" -f 1)
    test -n "$a" || exit 1      # no match, nothing to print
    ((b=a+windowSize))
    ((c=a-windowSize))
    ((c<1)) && c=1              # sed line addresses start at 1
    echo; sed -n "${c},${b}p" "$filename"; echo
    exit
      ;;

  -f|f) # Case to search multiple files
    searchTerm=$2
    shift 2                     # the remaining arguments are the file names
    echo
    echo "FILE NAME: SEARCH TERM"
    echo "----------------------"
    for var in "$@"
    do
          sed -n "/$searchTerm/s/^/$var: /p" "$var" 2> /dev/null
    done; echo
    exit
      ;;
  *)  # Default search in a file if no argument was specified
    searchTerm=$1
    filename=$2
    sed -n "/$searchTerm/p" "$filename"   # print only the matching lines
    exit                        # without this the while loop never ends
      ;;
  esac
done

exit $RETVAL

Any thoughts or pointers would help. I'm kind of lost when it comes to Perl.

Sergei Steshenko 04-27-2009 11:38 PM

First solve a simpler problem: open the input file(s) and print their contents to STDOUT. For example, start from here:

http://www.perlfect.com/articles/perlfile.shtml

Then choosing lines according to some criteria will be simple.

...

And would 'grep' do what you need?
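
A minimal sketch of that simpler first step, assuming the file names arrive on the command line. The name print_files is made up for illustration and is not from the linked article:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Print every line of each named file to STDOUT.
# Returns how many lines were printed. print_files is a hypothetical name.
sub print_files {
    my @files = @_;
    my $printed = 0;
    for my $file (@files) {
        my $fh;
        unless (open($fh, '<', $file)) {
            warn "Cannot open $file: $!\n";   # report the problem and move on
            next;
        }
        while (my $line = <$fh>) {            # one line at a time
            print $line;
            $printed++;
        }
        close($fh);
    }
    return $printed;
}

print_files(@ARGV);
```

Run it as, for example, perl printfiles.pl file1 file2 (the script name is only an example). Once this works, choosing which lines to print is a matter of adding a test inside the while loop.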

chrism01 04-28-2009 02:39 AM

Well, here's the perldocs (with examples); specifically open().
http://perldoc.perl.org/functions/open.html

hgate73 04-30-2009 12:34 AM

Okay, I've made some progress. Here's my script to search a file for a term, but I'm at a loss as to how to search for a window of text around the matching line.

Code:

#############
# Main body #
#############

# Test to make sure an argument was passed
if ( not defined $ARGV[0] ) {
        die "\nYou need to enter a search term.\n";
} else { # Do the search
       
        $search=$ARGV[0];        # Search term
        $file=$ARGV[1];                # File to search

        open(FILE, "<", $file) or die "Cannot open $file: $!\n";        # Open the file as "FILE"
        @array=<FILE>;                # Fill the array
        close (FILE);                # Close the file handle

        print "\nSearch Results\n";

        foreach $line (@array){
                if ($line =~ /$search/){
                        print "$line";
                }
        }
}


Sergei Steshenko 04-30-2009 12:55 AM

What is a "window"? How many lines before and after the line with the search item?

Have you read about the Perl 'push' and 'shift' functions? Using the two makes it pretty straightforward to build the window, if I understand correctly what you mean.

By the way, your script is bad in the sense that it is memory hungry, because of

Code:

        @array=<FILE>;                # Fill the array

which reads the whole file into memory at once. Functionally you do not need that at all.

ghostdog74 04-30-2009 01:36 AM

Quote:

Originally Posted by hgate73 (Post 3525425)
Code:

        foreach $line (@array){
                if ($line =~ /$search/){
                        print "$line";
                }
        }
}


Change your for loop to loop through a range from 0 to the last index of @array ($#array). Then, as the loop goes through each element, search for your line; if it is found, remember that index. You can then easily get the window you want by subtracting from or adding to that index. Understand?
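
A rough sketch of the index approach described above. It still holds all the lines in an array, and it stops at the first match. The name window_by_index is made up for illustration, and using \Q...\E to treat the search term as literal text is an assumption about what is wanted:

```perl
use strict;
use warnings;

# Return the lines within $window lines of the first line matching $search.
# window_by_index is a hypothetical name.
sub window_by_index {
    my ($search, $window, @lines) = @_;
    for my $i (0 .. $#lines) {
        next unless $lines[$i] =~ /\Q$search\E/;
        my $first = $i - $window;
        $first = 0 if $first < 0;               # do not run off the start
        my $last = $i + $window;
        $last = $#lines if $last > $#lines;     # do not run off the end
        return @lines[$first .. $last];         # array slice: the window
    }
    return;                                     # no match
}

# Demo data: twenty lines reading "line 1" .. "line 20".
my @demo = map { "line $_\n" } 1 .. 20;
print window_by_index('line 10', 1, @demo);
```

With a window of 1 and the match on the tenth line, the slice covers the ninth, tenth and eleventh lines.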

Sergei Steshenko 04-30-2009 03:57 AM

ghostdog74, algorithmically your suggestion is correct, but holding the whole file in an array, as I wrote above, is not a good idea.

Furthermore, explicit numeric indexes are rarely needed in Perl programs, and more often than not a solution without them exists. Using numeric indexes is not considered good style in Perl.

That's why I'm pushing the OP towards a queue/FIFO, which for starters can be implemented through a Perl array holding a small number of lines, using, say, the 'push' and 'shift' operations.

ghostdog74 04-30-2009 04:55 AM

I agree with you, but I don't think he understands how to do it. I am so tempted to show how, but in order not to spoil your good will, I will just guide him on how to get the next few lines...

@OP, you can use a counter to get the lines that follow your search pattern:
Code:

my $count = 0;
while(<>){
 if(/search/){
  $count = 5; # e.g. the matching line itself plus the next 4 lines
 }
 print if $count-- > 0;
}

You can incorporate this into your code together with the part that gets the previous lines. Good luck.

hgate73 04-30-2009 12:13 PM

Thanks for the replies ghostdog74 and Sergei.

As for Sergei's comment about the array, can I just call the file handle directly? I.E.

Code:

        open(FILE, $file);       
        print "\nSearch Results\n";

        foreach $line (FILE){
                if ($line =~ /$search/){
                        print "Your search returned: $line";
                }
        close (NUMBERS);
        }
}


Secondly,

Although I know conceptually what push and shift do (move data off a stack?) I've never used them. I don't understand what the code in the last post (by ghostdog74) is doing either - print if $count --> 0? What is it printing? You'll have to forgive my ignorance - I've barely used Perl at all, and most of my scripting has been basic things.

The "window" I refer to means "find the matching line, then print the nth line above and below the matching line as well."

I.E. if our match was found on line 10, and the window size was 1, the script would print lines 9, 10 and 11.

Telemachos 04-30-2009 01:30 PM

Quote:

Originally Posted by hgate73
As for Sergei's comment about the array, can I just call the file handle directly? I.E.

Code:

        open(FILE, $file);       
        print "\nSearch Results\n";

        foreach $line (FILE){


No, you can't work on a filehandle that way. You also don't want to use foreach here: it builds up a list before it iterates, and you are trying to avoid building up a list of the whole file at once, in case the file is very large and would gobble up all your available memory.
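
For comparison, a small sketch of reading a filehandle line by line with while, so that only the current line is in memory. The name search_handle and the in-memory demo text are made up for illustration:

```perl
use strict;
use warnings;

# Read $in one line at a time and print the matching lines to $out.
# Returns the number of matching lines. search_handle is a hypothetical name.
sub search_handle {
    my ($in, $out, $search) = @_;
    my $hits = 0;
    while (my $line = <$in>) {       # fetches one line per iteration
        if ($line =~ /$search/) {
            print {$out} "Your search returned: $line";
            $hits++;
        }
    }
    return $hits;
}

# Demo on an in-memory filehandle instead of a real file.
my $text = "alpha\nbeta\ngamma\nbeta again\n";
open(my $in, '<', \$text) or die "Cannot open in-memory file: $!\n";
search_handle($in, \*STDOUT, 'beta');
close($in);
```

To search a real file, open it with open(my $in, '<', $file) and pass that handle instead.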

Quote:

Originally Posted by hgate73
Although I know conceptually what push and shift do (move data off a stack?) I've never used them.

You should read up on them. They are very basic functions in handling arrays in Perl. (See perldoc -f push and perldoc -f shift.)

Quote:

Originally Posted by hgate73
I don't understand what the code in the last post (by ghostdog74) is doing either - print if $count --> 0? What is it printing?

If you call print without an explicit item you want to print, then Perl defaults to printing whatever is currently in the special variable $_. Since that variable also is the default for reading through a file using the diamond operator (while <FILE>...), you often see people code like this:
Code:

while (<$file_handle>) {
  s/foo/bar/; # same as $_ =~ s/foo/bar/: substitute in the current line
  print;      # = print $_ = print current line in the file
}

This sort of thing is hard to read initially for some people, but it saves you a lot of typing. It's idiomatic Perl, so you will have to get used to seeing it (if you continue to use Perl). Ghostdog is trying to suggest a way to print your n items after the find. (His solution there doesn't cover the n lines before the find.)
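
To unpack the counter idiom, here is a sketch that writes "print if $count-- > 0" out in long form. The name lines_after is made up for illustration, and setting the counter to the number of following lines plus one (so that the matching line itself is included) is an assumption about the wanted behavior:

```perl
use strict;
use warnings;

# Return each matching line plus the $after lines that follow it.
# lines_after is a hypothetical name.
sub lines_after {
    my ($search, $after, @lines) = @_;
    my $count = 0;
    my @out;
    for (@lines) {                              # each line lands in $_
        $count = $after + 1 if /$search/;       # +1 for the matching line itself
        # Long form of: print if $count-- > 0;
        # first compare the current value with 0, then subtract 1.
        if ($count > 0) {
            push @out, $_;
        }
        $count--;
    }
    return @out;
}

print lines_after('three', 2, map { "$_\n" } qw(one two three four five six));
```

The "-->" in the question is really two operators: the postfix decrement "--" followed by the comparison ">".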

Quote:

Originally Posted by hgate73
You'll have to forgive my ignorance - I've barely used Perl at all, and most of my scripting has been basic things.

The "window" I refer to means "find the matching line, then print the nth line above and below the matching line as well."

I.E. if our match was found on line 10, and the window size was 1, the script would print lines 9, 10 and 11.

This is a bit trickier than it may seem, and it seems like a bad idea to do it using a language you don't know at all. In a nutshell, you will need to (1) work through the file line by line, but (2) keep a running array of the last n lines - where n = your window, and then (3) when you find a hit, (4) print the last n lines from the saved array plus the current line, plus the next n lines. If I understand Sergei correctly, he is suggesting push and shift for step 2. You push (add) items onto the end of the array and shift (remove) them from the front.
Code:

my @array = qw/one two three four/;
push @array, 'five'; # @array now = 'one', 'two', 'three', 'four', 'five'
shift @array;        # @array now = 'two', 'three', 'four', 'five'

Since an entire string (i.e., a line of a file) can easily be an item of a Perl array, you can store your last n lines using this technique.
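
The four steps above can be sketched roughly as follows. This is one possible arrangement, not anyone's posted solution. The name window_search and the in-memory demo text are made up for illustration, and lines that have already been printed are deliberately kept out of the buffer so that overlapping windows do not print a line twice:

```perl
use strict;
use warnings;

# Print every line of $in that lies within $n lines of a match of $search.
# window_search is a hypothetical name.
sub window_search {
    my ($in, $out, $search, $n) = @_;
    my @before;       # step 2: up to $n most recent lines not yet printed
    my $after = 0;    # lines after a hit that still need printing
    while (my $line = <$in>) {                 # step 1: line by line
        if ($line =~ /$search/) {              # step 3: a hit
            print {$out} @before, $line;       # step 4: saved lines plus the hit
            @before = ();
            $after  = $n;
        }
        elsif ($after > 0) {                   # step 4: the next $n lines
            print {$out} $line;
            $after--;
        }
        else {
            push @before, $line;               # add to the end
            shift @before if @before > $n;     # remove the oldest from the front
        }
    }
    return;
}

# Demo: twelve lines, window of 1 around the tenth line.
my $text = join '', map { "line $_\n" } 1 .. 12;
open(my $in, '<', \$text) or die "Cannot open in-memory file: $!\n";
window_search($in, \*STDOUT, 'line 10', 1);
close($in);
```

At no point does the sketch hold more than $n earlier lines plus the current one, which is the reason for using push and shift rather than reading the whole file.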

Sergei Steshenko 04-30-2009 02:13 PM

In order to "feel" what 'push' and 'shift' do, temporarily forget about matching and implement the following: for a desired $n (say, 3), read lines from the file and, whenever possible, print the last $n lines read unconditionally. That is, after reading line #3 print lines 1, 2, 3; after reading line #4 print lines 2, 3, 4; and so forth.

When you know how to do this, the solution to your complete problem will be obvious.
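
One way the exercise above might look, as a sketch only. The name print_last_n, the "--" separator between groups, and the in-memory demo text are made up for illustration:

```perl
use strict;
use warnings;

# After each line read, print the last $n lines seen (once $n lines are
# available), followed by a "--" separator. print_last_n is a hypothetical name.
sub print_last_n {
    my ($in, $out, $n) = @_;
    my @queue;
    while (my $line = <$in>) {
        push @queue, $line;                 # newest line goes on the end
        shift @queue if @queue > $n;        # oldest line comes off the front
        print {$out} @queue, "--\n" if @queue == $n;
    }
    return;
}

# Demo: five lines, groups of three.
my $text = join '', map { "line $_\n" } 1 .. 5;
open(my $in, '<', \$text) or die "Cannot open in-memory file: $!\n";
print_last_n($in, \*STDOUT, 3);
close($in);
```

The array never grows beyond $n items, so it behaves as the small queue described earlier in the thread.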

