LinuxQuestions.org - Sed/Awk: print lines between n'th and (n+1)'th match of "foo"

I have a textfile, which may for example look like this:

Code:

blaaat
foo bar
some text here
-o-
more text
more foo's
and even more bar
-o-
.<only a dot>
-o-
...
....
...
...
...
-o-

There's a "^-o-$" between every 'record'.
The user enters a number n (passed to the script as $1); my script should use sed and/or awk to print the n-th record in the file.

This is what I've got so far, it's only input-checking actually:

Code:

ubound=`cat $file | grep -c -e -o-`
test $1 -lt 1 -o $1 -gt $ubound 2>/dev/null
if [ $? = 0 ]; then
        echo "Warning: n should be in the range [1,$ubound]."
        exit 1
fi

Can anyone help me further on the sed/awk part please?

Code:

awk -v RS='-o-' 'NR == '$1' { print }' input_file

Quote:

Originally Posted by xaverius (Post 2862845)

Code:

ubound=`cat $file | grep -c -e -o-`
test $1 -lt 1 -o $1 -gt $ubound 2>/dev/null
if [ $? = 0 ]; then
        echo "Warning: n should be in the range [1,$ubound]."
        exit 1
fi

Incidentally, why not

Code:

ubound=$(grep -c '^-o-$' $file)
if [[ $1 -lt 1 || $1 -gt $ubound ]]; then
        echo "Warning: n should be in the range [1,$ubound]."
        exit 1
fi

Quote:

Originally Posted by digiot (Post 2862865)

Incidentally, why not

Code:

ubound=$(grep -c '^-o-$' $file)
if [[ $1 -lt 1 || $1 -gt $ubound ]]; then
        echo "Warning: n should be in the range [1,$ubound]."
        exit 1
fi

Because of the stderr redirection: users might not enter a real number, but rather something like "1q" or "foo"... doing it this way, the script will print a nice error message and just exit :)
edit2: I hadn't noticed the double [[/]] you used there... do they have the same effect?

edit: Why should I prefer your way of setting ubound above mine? (no offense ;) )
I agree it looks better, but are there other advantages? I never actually learned about the $( )-construction, so...
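
A quick way to see what the redirection approach does with non-numeric input is to look at the exit status of test. The sketch below assumes a shell such as bash or dash, where test returns 0 for true, 1 for false, and a status greater than 1 when an operand is not an integer; the function name classify is made up for illustration. If that assumption holds, "foo" or "1q" makes test fail with an error instead of returning 0, so a follow-up check of the form [ $? = 0 ] would not catch it.

```shell
# classify N UBOUND: report how the combined range test treats N.
# (Hypothetical helper, only for illustration.)
classify() {
    test "$1" -lt 1 -o "$1" -gt "$2" 2>/dev/null
    case $? in
        0) echo "out of range" ;;     # the test was true
        1) echo "in range" ;;         # the test was false
        *) echo "not an integer" ;;   # test itself reported an error
    esac
}

classify 0 4      # prints: out of range
classify 2 4      # prints: in range
classify foo 4    # prints: not an integer
classify 1q 4     # prints: not an integer
```

So the error status has to be handled explicitly (or the input validated beforehand) if non-numeric arguments should produce the warning too.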

Also, the solution you first advised does not seem to work correctly, here's a bash-session:

Code:

Macbook-2:~/ex_st xaverius$ cat getMessage
#!/bin/bash

file=~/.forumpje.txt
EOF="-o-"

if [ $# != 1 ]; then
        echo "Syntax: $0 <nr>"
        exit 1
fi

ubound=`cat $file | grep -c -e -o-`
test $1 -lt 1 -o $1 -gt $ubound 2>/dev/null
if [ $? = 0 ]; then
        echo "Warning: the value of nr must lie in the interval [1,$ubound]."
        exit 1
fi

awk -v RS='-o-' 'NR == '$1' { print }' $file
Macbook-2:~/ex_st xaverius$ cat ~/.forumpje.txt

me
niets
blaat nieuwe regel enzo done
Fri Aug 17 15:13:46 CEST 2007
-o-
ikke
veel over weinig
niets dus ;-)
Fri Aug 17 15:14:01 CEST 2007
-o-
Macbook-2:~/ex_st xaverius$ getMessage 1

me
niets
blaat nieuwe regel enzo done
Fri Aug 17 15:13:46 CEST 2007

Macbook-2:~/ex_st xaverius$ getMessage 2
o
Macbook-2:~/ex_st xaverius$

Check out the last command, doesn't seem correct imo...

Hi,

I just tried (copy->paste) the test session in your previous post (#5), and it works.

I did notice something else: Your prompt says: Macbook-2. Are you trying this on Apple's OS X?
If so, check which awk is actually being used and whether it is POSIX-compliant. Maybe you can use nawk instead.

Hope this helps.

I have no idea which version of awk OSX is using, but I had access to another machine:

Code:

$ uname -a
SunOS <hostname> 5.8 Generic_108528-07 sun4u sparc SUNW,Ultra-4

The code runs smoothly there, and since this is the machine it's supposed to run on later, it's ok now :)
Thx!

Btw: still interested in a way to solve it using sed though :p

Quote:

Originally Posted by xaverius (Post 2862936)

I have no idea which version of awk OSX is using

You can get gawk for Macs, but it may not be the default, and awks other than gawk probably don't accept multi-character regexes for RS.
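
For reference, POSIX leaves the behaviour of a multi-character RS unspecified, and some traditional awk implementations use only its first character. If the awk on that Mac does this, RS would effectively be "-", which would explain why record 2 came out as just "o". A sketch that avoids RS altogether and should work in any POSIX awk is to keep the default line-based records and count separator lines instead. The function name nth_record is made up for illustration, and n is assumed to be a validated positive integer.

```shell
# nth_record N FILE: print the N-th record of FILE, where every record
# is terminated by a line consisting solely of "-o-".
# (Hypothetical helper name; N is assumed to be a positive integer.)
nth_record() {
    awk -v n="$1" '
        # A separator line closes the current record; stop once the
        # requested record has been closed.
        $0 == "-o-" { if (++seen == n) exit; next }
        # Lines of record n are those read after n-1 separators.
        seen == n - 1 { print }
    ' "$2"
}
```

Called as `nth_record 2 ~/.forumpje.txt`, this prints only the lines of the second record, without the leading newline and trailing blank line that the RS-based version leaves around the record.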

Quote:

Originally Posted by xaverius (Post 2862874)

Because of the stderr redirection: users might not enter a real number, but rather something like "1q" or "foo"... doing it this way, the script will print a nice error message and just exit :)

Well, this is bash-specific, so maybe you're better off the way you had it. It's just that testing the return code with a test, when that's what test *does*, kind of bothers me.

Code:

if [[ ! $1 =~ ^[0-9]+$ || $1 -lt 1 || $1 -gt $ubound ]]; then
        echo "Warning: n should be in the range [1,$ubound]."
        exit 1
fi

This requires the argument to consist of one or more digits (and nothing but digits). || and -o both mean 'or', but they are not interchangeable: -o is an operator of the test/[ builtin, while || is what works inside the [[ command. And I just find || more readable than -o in general. The $(...) syntax is about the same as the `...` syntax, except that it nests better and is regarded as the more 'modern' way (though I still type `...` in interactive shells where possible). This is just me, though.

The only important things were the removal of a useless cat in the assignment to 'ubound' and combining the two tests into one. I also tightened up the regex for grep by anchoring it (and I should have made it tighter for the (g)awk as well).
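
Since [[ and =~ are bash-specific, here is a minimal sketch of the same validation using only POSIX sh constructs, for the case where the script has to run under a plain /bin/sh. The function name is_valid_index is made up for illustration; the digit check is done with a case pattern instead of a regex.

```shell
# is_valid_index N UBOUND: succeed only if N consists solely of digits
# and lies in the range [1, UBOUND]. (Hypothetical helper name.)
is_valid_index() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    esac
    [ "$1" -ge 1 ] && [ "$1" -le "$2" ]
}

# Example use inside a script:
#   if ! is_valid_index "$1" "$ubound"; then
#           echo "Warning: n should be in the range [1,$ubound]."
#           exit 1
#   fi
```

Two separate [ commands joined by && are used here rather than -o/-a inside a single test, which sidesteps the question of how test parses long expressions.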

Quote:

Originally Posted by xaverius (Post 2862874)

edit: Why should I prefer your way of setting ubound above mine? (no offense ;) )

Anyway, no offense taken at all. :) My attitude is that, as long as it works, there's no reason to prefer one way over another. It's just that some things work in certain corner cases and fail in others, some things are microscopically more efficient than others (which can add up in certain scenarios), some things just look or feel better than others, etc. It's largely just a matter of taste, and that's why I asked 'why not' rather than said 'you should have' ;) Your point about input validation is a good one, whereas I wasn't initially thinking of even testing for non-integer input.

Quote:

Originally Posted by xaverius (Post 2862936)

Btw: still interested in a way to solve it using sed though :p

It's probably doable, but (g)awk is much better designed for this sort of task and using sed to that extent makes my head hurt.
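
For what it's worth, one possible sed-only sketch is to let sed report the line numbers of the separator lines (its = command prints the current line number) and then print the line range between two of them. This is just one way to do it, not necessarily the neatest; the function name nth_record_sed is made up for illustration, and n is assumed to be an already validated positive integer.

```shell
# nth_record_sed N FILE: print the N-th record of FILE using only sed
# and shell arithmetic. Records are terminated by a line "-o-".
# (Hypothetical helper name; N is assumed to be a positive integer.)
nth_record_sed() {
    n=$1
    file=$2
    # Line numbers of all separator lines, one per output line.
    seps=$(sed -n '/^-o-$/=' "$file")
    # Line number of the separator that closes record n.
    end=$(printf '%s\n' "$seps" | sed -n "${n}p")
    [ -n "$end" ] || return 1        # the file has fewer than n records
    if [ "$n" -eq 1 ]; then
        start=1
    else
        prev=$(printf '%s\n' "$seps" | sed -n "$((n - 1))p")
        start=$((prev + 1))
    fi
    # An empty record has start == end, so there is nothing to print.
    if [ "$start" -lt "$end" ]; then
        sed -n "${start},$((end - 1))p" "$file"
    fi
    return 0
}
```

This reads the file twice, which is part of why awk is the more natural tool for this job.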

Quote:

Originally Posted by xaverius (Post 2862884)

Also, the solution you first advised does not seem to work correctly,

that's because the awk has a syntax error. The correct one should look something like this

Code:

# var=1
# awk -v input=$var 'BEGIN{RS="-o-"}NR==input{ print  }' file
blaaat
foo bar
some text here

Quote:

Originally Posted by ghostdog74 (Post 2863587)

that's because the awk has a syntax error. The correct one should look something like this

Code:

# var=1
# awk -v input=$var 'BEGIN{RS="-o-"}NR==input{ print  }' file
blaaat
foo bar
some text here

I've been completely brain-damaged lately but, still, if you're referring to this:

Code:

awk -v RS='-o-' 'NR == '$1' { print }' input_file

there is no syntax error and it works under both gawk and mawk (and surprisingly, to me, under original awk) and apparently under SunOS as well as Linux - dunno why it breaks on OSX, but it's OSX-specific (or 'whatever-OSX's-awk-is'-specific).

Code:

:cat foo.sh
ubound=$(grep -c '^-o-$' $2)
if [[ ! $1 =~ ^[0-9]+$ || $1 -lt 1 || $1 -gt $ubound ]]; then
        echo "Warning: n should be in the range [1,$ubound]."
        exit 1
fi
awk -v RS='-o-' 'NR == '$1' { print }' $2

:sh foo.sh 5q foo
Warning: n should be in the range [1,4].

:sh foo.sh 1 foo
blaaat
foo bar
some text here

Quote:

Originally Posted by digiot (Post 2863919)

I've been completely brain-damaged lately but, still, if you're referring to this:

Code:

awk -v RS='-o-' 'NR == '$1' { print }' input_file

there is no syntax error and it works under both gawk and mawk ..

Code:

# more file
blaaat
foo bar
some text here
-o-
more text
more foo's
and even more bar
-o-
.<only a dot>
-o-
# awk -v RS='-o-' 'NR == '$1' { print }' file
awk: NR ==  { print }
awk:        ^ syntax error
# awk --version
GNU Awk 3.1.5
Copyright (C) 1989, 1991-2005 Free Software Foundation.

So give $1 a value by passing in an argument.

Quote:

Originally Posted by digiot (Post 2864087)

So give $1 a value by passing in an argument.

my bad. thought it was awk's $1

Ah, yeah, that's easy to do. One of the most bothersome things about mixing shell and awk.

This is what the final script looks like:

Code:

#!/bin/bash

file=~/.forumpje.txt
EOF="-o-"

if [ $# != 1 ]; then
        echo "Syntax: $0 <nr>"
        exit 1
fi

ubound=`cat $file | grep -c -e -o-`
test $1 -lt 1 -o $1 -gt $ubound 2>/dev/null
if [ $? = 0 ]; then
        echo "Warning: the value of nr must lie in the interval [1,$ubound]."
        exit 1
fi

awk -v RS="$EOF" 'NR == '$1' { print }' $file

I think I don't really understand the '$1'-part, but it works, so that's okay :)

It's tested on GNU awk 3.0.4

Can someone explain the awk rule please?
If the current record number equals $1, then print that record - but what is the value of $1? Is it supposed to be the bash $1, or the awk $1? It's not really clear to me... The -v flag is used to set the record separator (RS), I guess...
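
On the '$1' part: the awk program is written as two single-quoted strings with an unquoted $1 between them, so it is the shell's $1. The shell substitutes the script's first argument before awk ever runs, and awk only sees the resulting text; awk's own $1 (the first field) never comes into it. The sketch below just prints what awk would receive. The function names show_expansion and print_record are made up for illustration, and print_record relies on an awk that accepts a multi-character RS, as in the thread.

```shell
# show_expansion N: print the awk program text the shell builds from
# 'NR == '$1' { print }' when the first argument is N.
# (Hypothetical helper; $1 is left unquoted to mirror the thread.)
show_expansion() {
    printf '%s\n' 'NR == '$1' { print }'
}

show_expansion 2     # prints: NR == 2 { print }

# An alternative that keeps the shell value out of the program text:
# pass it in as an awk variable with -v, just like RS.
print_record() {
    awk -v RS='-o-' -v n="$1" 'NR == n { print }' "$2"
}
```

With the -v form the awk program stays in one quoted string, and an empty or odd argument cannot turn into an awk syntax error such as "NR ==  { print }".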