grep till second occurrence of pattern

rattlesnakejoe · 11-22-2009, 05:54 AM

Hi...

I want the lines up to the second occurrence of a pattern to be printed. I
only know how to print up to the first occurrence.

i use awk '/pattern1/,/pattern2/' file

or sed -n '/pat1/,/pat2/p' file

I want all the lines from the start of the file till the second occurrence of a pattern called "Score"

Thanks

ghostdog74 · 11-22-2009, 06:01 AM

show input examples and output.

gnashley · 11-22-2009, 07:59 AM

grep -m2

ghostdog74 · 11-22-2009, 08:32 AM

Quote:

Originally Posted by gnashley

grep -m2

I think the OP wants to print all the lines up to the 2nd occurrence of the pattern, if I understand his English properly. grep -m2 prints only the two matching lines.
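To make the difference concrete, here is a small sketch. The sample data and the helper name upto_nth are made up for illustration, and it assumes a grep that supports -m (GNU grep does). On its own, -m2 stops after two matching lines and prints only those; combined with -n it gives the line number of the second match, which head can then use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a file from line 1 through the Nth matching line.
upto_nth() {
  local pattern=$1 n=$2 file=$3 last
  # -n prefixes each match with its line number; -m stops after N matching lines.
  # sed picks the Nth match (if there is one), cut keeps just the line number.
  last=$(grep -n -m "$n" -- "$pattern" "$file" | sed -n "${n}p" | cut -d: -f1)
  if [ -n "$last" ]; then
    head -n "$last" "$file"
  else
    cat "$file"        # fewer than N matches: print the whole file
  fi
}

sample=$(mktemp)
printf '%s\n' one 'two Score' three 'four Score' five 'six Score' > "$sample"

grep -m2 Score "$sample"      # only the two matching lines
echo "---"
upto_nth Score 2 "$sample"    # lines "one" through "four Score"
rm -f "$sample"
```

This reads the file twice, so the single-pass awk and sed answers further down the thread are probably the better choice for large files.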

ghostdog74 · 11-22-2009, 08:39 AM

Code:

awk -vf=0 '/pattern/{f=1;++d}d!=2{print}f&&d==2{print;exit}' file

pixellany · 11-22-2009, 09:11 AM

Code:

sed -n '/pattern/{:1;p;n;/pattern/{p;q};b1};p' filename

makyo · 11-22-2009, 12:12 PM

Hi.

Some members of the grep family can do this, for example, the non-standard cgrep:

Code:

#!/usr/bin/env bash

# @(#) s1	Demonstrate copying through the Nth occurrence of pattern.
# http://www.bell-labs.com/project/wwexptools/cgrep/

echo
set +o nounset
LC_ALL=C ; LANG=C ; export LC_ALL LANG
echo "Environment: LC_ALL = $LC_ALL, LANG = $LANG"
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version "=o" $(_eat $0 $1) cgrep
set -o nounset
echo

FILE=${1-data1}

echo " Data file $FILE:"
cat "$FILE"

echo
echo " Results:"
cgrep -N 2 -999999 "score" "$FILE"
# If only the matched lines are desired:
# cgrep -N 2 "score" $FILE

exit 0

Producing:

Code:

% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0 
GNU bash 3.2.39
cgrep - (local: ~/executable/cgrep May 29 10:59 )

 Data file data1:
one
two
three
four score
five
six score
seven
eight score

 Results:
one
two
three
four score
five
six score

Of the many options for cgrep, these mean: consider only the first 2 matched lines, and copy (almost) a million lines that precede them (that is, in the case of a reasonable-length file, all the lines from the beginning of the file).

The URL in the script shows where to get cgrep. You will need a C compiler. I have compiled it on both 32-bit and 64-bit systems without trouble. It comes with a standard man page.

Best wishes ... cheers, makyo

urban.yoga.journeys · 11-22-2009, 08:24 PM

the others already have cleaner solutions to this but here's a bash script that will do the job. to use it type the script name, followed by the pattern, then the file name

Code:

#!/bin/bash

if [ $# != 2 ]; then
        echo "USAGE: $0 pattern file"
        exit 1
fi


#counter for pattern matches
count="0"
#pattern is first argument
pattern="$1"
#file is second argument
file="$2"

#for every line read, grep will try to match the pattern
#if pattern is found, counter increments by 1, once it reaches two it will exit the loop
while read line; do
        printf '%s\n' "$line"
        if printf '%s\n' "$line" | grep -q -- "$pattern"; then
                count=$((count+1))
        fi
        if [ "$count" -eq 2 ]; then
                break
        fi
done <"$file"

sundialsvcs · 11-22-2009, 08:35 PM

Tools like awk, and of course the programming language perl, are well-designed for tasks like this.

For instance: awk programs generally consist of occurrences of blocks that look like this:

/pattern_to_match/ { what_to_do_when_you_find_it }

If you stop and think about that (as you surf to and read an online copy of man awk ...), "you can do a helluva lot of goodness with a tool like that." And since it uses the same kind of regular-expression matching that grep does, you won't lose any speed (nor any sleep) over it. Nor will you have to, really, "write anything."
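For the question in this thread, that pattern/action idea boils down to something like the sketch below. The wrapper name print_upto, the awk variables pat, n and seen, and the sample data are all assumptions for illustration, not anything taken from the posts above. It prints every line and exits once the pattern has been seen for the nth time.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print lines up to and including the Nth match of a regex.
# $1 = regex, $2 = N, $3 = file (optional; "-" means standard input)
print_upto() {
  awk -v pat="$1" -v n="$2" '
    { print }                          # no pattern: the action runs for every line
    $0 ~ pat && ++seen == n { exit }   # pattern + action: stop at the Nth match
  ' "${3:--}"
}

printf '%s\n' one 'four score' five 'six score' seven 'eight score' |
  print_upto score 2
# prints: one, four score, five, six score (one per line)
```

Passing the pattern with -v keeps the awk program itself fixed; note that awk processes backslash escapes in -v values, so a regex containing backslashes needs them doubled.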

ghostdog74 · 11-22-2009, 09:05 PM

Quote:

Originally Posted by urban.yoga.journeys

here's a bash script that will do the job.

well, you can do the "grep" with bash without grep or other external tools

Code:

count=0
while read -r line
do
  case "$line" in
    *PATTERN* ) count=$((count+1));;
  esac
  echo "$line"
  [ "$count" -eq 2 ] && break
done <"file"

urban.yoga.journeys · 11-22-2009, 09:39 PM

that's also true, i didn't think about matching word per word.

however, since you're reading from a file, there are no backslashes correct? don't those only occur from STDOUT of commands? i assume that you're using the -r option to escape the backslashes that are used to denote spaces, in this case spaces between the words?

as a side question, not to hijack the thread, but i've come across this bit of code that i don't really understand.

Code:

read -d $'\000'

i know that -d denotes the delimiter, but what about $'\000'?

ghostdog74 · 11-22-2009, 09:52 PM

Quote:

Originally Posted by urban.yoga.journeys

however, since you're reading from a file, there are no backslashes correct?

why not? anything can be in a file!

Quote:

Code:

read -d $'\000'

i know that -d denotes the delimiter, but what about $'\000'?

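if you look at the ascii table, that's the octal code for NUL (the null character). the $'...' form is bash's ANSI-C quoting, which expands backslash escapes like \000.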
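As a rough illustration of where that delimiter is useful (the temporary directory and file names below are made up): find -print0 separates file names with NUL bytes, and read -d $'\000' reads one NUL-terminated name at a time, so names containing spaces or backslashes survive intact. Since a bash string cannot hold a NUL, $'\000' ends up as an empty argument, which read -d takes to mean "delimit on NUL"; read -d '' is the more common spelling of the same thing.

```shell
#!/usr/bin/env bash
# Sketch: collect NUL-delimited file names from find into an array.
dir=$(mktemp -d)
touch "$dir/plain" "$dir/with space" "$dir/back\\slash"

names=()
# IFS= keeps leading/trailing blanks, -r keeps backslashes, -d sets the delimiter.
while IFS= read -r -d $'\000' name; do
  names+=("${name##*/}")      # keep only the base name
done < <(find "$dir" -type f -print0)

echo "${#names[@]} files found"   # prints: 3 files found
rm -rf "$dir"
```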

urban.yoga.journeys · 11-22-2009, 10:02 PM

i mean, backslashes in the shell are used to denote spaces between words/filenames. in a file those backslashes aren't present between spaces. it's the difference between:

Code:

for line in $(ls); do
   echo $line
done

and

Code:

file=$(ls)
for line in $file; do
   echo $line
done

isn't that why you're using the -r option, to make sure that read will read each word, instead of each line? or did i totally misunderstand something?

btw thanks for the help with that other thing.

ghostdog74 · 11-22-2009, 10:20 PM

Quote:

Originally Posted by urban.yoga.journeys

i mean, backslashes in the shell are used to denote spaces between words/filenames. in a file those backslashes aren't present between spaces. it's the difference between:

Code:

for line in $(ls); do
   echo $line
done

first, don't iterate a list of files using ls and for like that. ls is redundant.

Code:

for file in *
do
 echo "$file"
done

as for your read -r question. take a look at this example of input file

Code:

a b \ c

there's a backslash in the file. If there's no -r

Code:

$ while read line; do echo $line; done < "file"
a b c

But i want to keep the backslash, that's why i need -r, so that read does not treat the backslash as an escape character

Code:

$ while read -r line; do echo $line; done < "file"
a b \ c

urban.yoga.journeys · 11-22-2009, 10:42 PM

Quote:

Originally Posted by ghostdog74

first, don't iterate a list of files using ls and for like that. ls is redundant.

Code:

for file in *
do
 echo "$file"
done

yeah i use the globbing feature, that was just to demonstrate what i was trying to get across.

ok so you're using -r to keep literal backslashes in the line, i understand that now.

then with this piece of code:

Code:

while read -r line
do
  case "$line" in 
    *PATTERN* ) count=$((count+1));;

as i understand it, and i'm no expert so please bear with me, $line is the entire line correct? which would mean that *PATTERN* would also have to be the entire line to match, if it were only a word or part of a word, there would be no match.

or does the while-read-line construct read word per word? AFAIK it reads entire lines, but like i said i'm no expert
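For anyone following along, a small experiment like the one below can be used to check this (the sample lines and the function name count_matches are made up for the test). It suggests that read does hand case the whole line, and that the * on either side of the pattern matches any string, including an empty one, so *PATTERN* matches whenever PATTERN appears anywhere in the line.

```shell
#!/usr/bin/env bash
# Hypothetical check: count how many input lines contain the substring "score".
count_matches() {
  local count=0 line
  while IFS= read -r line; do       # one whole line per iteration
    case "$line" in
      *score*) count=$((count+1)) ;;   # substring match, not whole-line match
    esac
  done
  echo "$count"
}

printf '%s\n' 'four score' 'scoreboard' 'score' 'seven' | count_matches
# prints 3
```

Dropping the asterisks (a bare score) would match only a line that consists of exactly that word.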