Sed/Awk: print lines between n'th and (n+1)'th match of "foo"

xaverius · 08-18-2007, 03:51 AM

I have a textfile, which may for example look like this:

Code:

blaaat
foo bar
some text here
-o-
more text
more foo's
and even more bar
-o-
.<only a dot>
-o-
...
....
...
...
...
-o-

There's a "^-o-$" between every 'record'.
The user enters a number (call it $1), and my script should use sed and/or awk to print record number $1 from the file.

This is what I've got so far, it's only input-checking actually:

Code:

ubound=`cat $file | grep -c -e -o-`
test $1 -lt 1 -o $1 -gt $ubound 2>/dev/null
if [ $? = 0 ]; then
        echo "Warning: n should be in the range [1,$ubound]."
        exit 1
fi

Can anyone help me further on the sed/awk part please?

slakmagik · 08-18-2007, 04:13 AM

awk -v RS='-o-' 'NR == '$1' { print }' input_file

slakmagik · 08-18-2007, 04:44 AM

Quote:

Originally Posted by xaverius

Code:

ubound=`cat $file | grep -c -e -o-`
test $1 -lt 1 -o $1 -gt $ubound 2>/dev/null
if [ $? = 0 ]; then
        echo "Warning: n should be in the range [1,$ubound]."
        exit 1
fi

Incidentally, why not

Code:

ubound=$(grep -c '^-o-$' $file)
if [[ $1 -lt 1 || $1 -gt $ubound ]]; then
        echo "Warning: n should be in the range [1,$ubound]."
        exit 1
fi

xaverius · 08-18-2007, 05:07 AM

Quote:

Originally Posted by digiot

Incidentally, why not

Code:

ubound=$(grep -c '^-o-$' $file)
if [[ $1 -lt 1 || $1 -gt $ubound ]]; then
        echo "Warning: n should be in the range [1,$ubound]."
        exit 1
fi

Because of the stderr-redirection: users might not enter a real number, but rather something like "1q" or "foo"... doing it this way, the script will print a nice error message and just exit.

edit2: I hadn't noticed the double [[/]] you used there... do they have the same effect?

edit: Why should I prefer your way of setting ubound above mine? (no offense)
I agree it looks better, but are there other advantages? I never actually learned about the $( )-construction, so...
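
A quick way to see what the stderr-redirected test actually returns for each kind of input. This is only a sketch: `range_status` is a made-up helper, and the exact status for a non-integer is shell-dependent (POSIX only requires it to be greater than 1).

```shell
#!/bin/sh
# Hypothetical helper: print the exit status of the range test from the
# script above for argument $1 and upper bound $2.
range_status() {
    test "$1" -lt 1 -o "$1" -gt "$2" 2>/dev/null
    echo $?
}

range_status 0 4     # out of range
range_status 2 4     # in range
range_status 1q 4    # not an integer
```

If the status printed for "1q" is neither 0 nor 1, the `[ $? = 0 ]` check does not fire for non-numeric input, so the warning is skipped and the script carries on to awk.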

xaverius · 08-18-2007, 05:23 AM

Also, the solution you first advised does not seem to work correctly. Here's a bash session:

Code:

Macbook-2:~/ex_st xaverius$ cat getMessage 
#!/bin/bash

file=~/.forumpje.txt
EOF="-o-"

if [ $# != 1 ]; then
        echo "Syntax: $0 <nr>"
        exit 1
fi

ubound=`cat $file | grep -c -e -o-`
test $1 -lt 1 -o $1 -gt $ubound 2>/dev/null
if [ $? = 0 ]; then
        echo "Warning: the value of nr must lie in the interval [1,$ubound]."
        exit 1
fi

awk -v RS='-o-' 'NR == '$1' { print }' $file
Macbook-2:~/ex_st xaverius$ cat ~/.forumpje.txt 

me
niets
blaat nieuwe regel enzo done
Fri Aug 17 15:13:46 CEST 2007
-o-
ikke
veel over weinig
niets dus ;-)
Fri Aug 17 15:14:01 CEST 2007
-o-
Macbook-2:~/ex_st xaverius$ getMessage 1

me
niets
blaat nieuwe regel enzo done
Fri Aug 17 15:13:46 CEST 2007

Macbook-2:~/ex_st xaverius$ getMessage 2
o
Macbook-2:~/ex_st xaverius$

Check out the last command, doesn't seem correct imo...

druuna · 08-18-2007, 06:16 AM

Hi,

I just tried (copy->paste) the test session in your previous post (#5), and it works.

I did notice something else: Your prompt says: Macbook-2. Are you trying this on Apple's OS X?
If so, check to see which awk is actually used and if it is posix compliant. Maybe you can use nawk instead.

Hope this helps.

xaverius · 08-18-2007, 07:01 AM

I have no idea which version of awk OSX is using, but I had access to another machine:

Code:

$ uname -a
SunOS <hostname> 5.8 Generic_108528-07 sun4u sparc SUNW,Ultra-4

The code runs smoothly there, and since this is the machine it's supposed to run on later, it's ok now

Thx!

Btw: still interested in a way to solve it using sed though

slakmagik · 08-18-2007, 03:55 PM

Quote:

Originally Posted by xaverius

I have no idea which version of awk OSX is using

You can get gawk for Macs, but it may not be the default, and awks other than gawk probably don't accept multichar regexes for RS.
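
POSIX leaves the behaviour of a multi-character RS unspecified, so one way around the portability question is to keep the default record separator and count the separator lines instead. A minimal sketch, not something posted in the thread; `nth_record` and the temporary sample file are made up for illustration, and n is assumed to be a validated integer.

```shell
#!/bin/sh
# Hypothetical helper: print record number $1 of file $2.
# Records end with a line that is exactly "-o-"; RS stays at its default.
nth_record() {
    awk -v n="$1" '
        $0 == "-o-" { if (++seen >= n) exit; next }   # end of a record
        seen == n - 1 { print }                        # line inside record n
    ' "$2"
}

# Sample data taken from the first post.
sample=$(mktemp)
cat > "$sample" <<'EOF'
blaaat
foo bar
some text here
-o-
more text
more foo's
and even more bar
-o-
.<only a dot>
-o-
EOF

nth_record 2 "$sample"
rm -f "$sample"
```

Unlike the RS='-o-' version, this prints only the lines of the record itself, without the newline left over after the previous separator.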

Quote:

Originally Posted by xaverius

Because of the stderr-redirection: users might not enter a real number, but rather something like "1q" or "foo"... doing it this way, the script will print a nice error message and just exit.

Well, this is bash-specific, so maybe you're better off the way you had it. It's just that testing the return code with a test, when that's what test *does*, kind of bothers me.

Code:

if [[ ! $1 =~ ^[0-9]+$ || $1 -lt 1 || $1 -gt $ubound ]]; then
        echo "Warning: n should be in the range [1,$ubound]."
        exit 1
fi

This requires the argument to consist of one or more digits (and only digits). || and -o both mean 'or', but they are not interchangeable: || is shell syntax that works between commands and inside [[ ]], while -o is an operator of the test/[ builtin and is not accepted as 'or' inside [[ ]]. And I just find || more readable than -o in general.

The $(...) syntax is about the same as the `...` syntax, except that it nests better and is regarded as the more 'modern' way (though I still type `...` in interactive shells where possible). This is just me, though.

The only important things were the removal of a useless cat in the assignment to 'ubound' and the combining of two tests into one. I also tightened up the regex for grep by anchoring it (and I should have done the same for the (g)awk RS).
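
Since the [[ ... =~ ... ]] test is bash-specific, here is a minimal sketch of the same check in plain POSIX sh, using a case pattern in place of the regex. `valid_n` is a hypothetical helper name, not something from the thread.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if $1 is a whole number in [1, $2].
valid_n() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;    # empty, or contains a non-digit
    esac
    [ "$1" -ge 1 ] && [ "$1" -le "$2" ]
}

ubound=4
for n in 1 4 5 0 1q foo; do
    if valid_n "$n" "$ubound"; then
        echo "$n: ok"
    else
        echo "$n: n should be in the range [1,$ubound]."
    fi
done
```

The case statement rejects anything that is not purely digits before the numeric comparisons run, so [ never sees a non-integer operand.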

Quote:

Originally Posted by xaverius

edit: Why should I prefer your way of setting ubound above mine? (no offense)

Anyway, no offense taken at all.

My attitude is that, as long as it works, there's no reason to prefer one way over another. It's just that some things work in certain corner cases and fail in others, some things are microscopically more efficient than others (which can add up in certain scenarios), some things just look or feel better than others, etc. It's largely just a matter of taste, and that's why I asked 'why not' rather than said 'you should have'.

Your point about input validation is a good one; I wasn't initially even thinking of testing for non-integer input.

Quote:

Originally Posted by xaverius

Btw: still interested in a way to solve it using sed though

It's probably doable, but (g)awk is much better designed for this sort of task and using sed to that extent makes my head hurt.
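
One workable route for the sed request (a sketch under assumptions, not something posted in the thread) is to let grep -n locate the separator lines and have sed print the line range between two of them. `nth_record_sed` is a made-up name, and n is assumed to be an already validated positive integer.

```shell
#!/bin/sh
# Hypothetical helper: print record number $1 of file $2 with grep and sed.
nth_record_sed() {
    n=$1
    file=$2
    # Line numbers of all separator lines, one per line.
    seps=$(grep -n '^-o-$' "$file" | cut -d: -f1)
    # Separator that closes record n; empty if there are fewer than n records.
    end=$(printf '%s\n' "$seps" | sed -n "${n}p")
    [ -n "$end" ] || return 1
    # Separator that closes record n-1, or line 0 for the first record.
    start=0
    [ "$n" -gt 1 ] && start=$(printf '%s\n' "$seps" | sed -n "$((n - 1))p")
    first=$((start + 1))
    last=$((end - 1))
    # An empty record has first > last; print nothing then.
    if [ "$first" -le "$last" ]; then
        sed -n "${first},${last}p" "$file"
    fi
}
```

Called as `nth_record_sed 2 ~/.forumpje.txt`, it would print the lines strictly between the first and second "-o-" lines. The explicit first/last comparison is there because sed prints one line when the second address of a range is smaller than the first.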

ghostdog74 · 08-19-2007, 05:12 AM

Quote:

Originally Posted by xaverius

Also, the solution you first advised does not seem to work correctly,

that's because the awk has a syntax error. The correct one should look something like this

Code:

# var=1
# awk -v input=$var 'BEGIN{RS="-o-"}NR==input{ print  }' file
blaaat
foo bar
some text here

slakmagik · 08-19-2007, 01:56 PM

Quote:

Originally Posted by ghostdog74

that's because the awk has a syntax error. The correct one should look something like this

Code:

# var=1
# awk -v input=$var 'BEGIN{RS="-o-"}NR==input{ print  }' file
blaaat
foo bar
some text here

I've been completely brain-damaged lately but, still, if you're referring to this:

Code:

awk -v RS='-o-' 'NR == '$1' { print }' input_file

there is no syntax error and it works under both gawk and mawk (and surprisingly, to me, under original awk) and apparently under SunOS as well as Linux - dunno why it breaks on OSX, but it's OSX-specific (or 'whatever-OSX's-awk-is'-specific).

Code:

:cat foo.sh
ubound=$(grep -c '^-o-$' $2)
if [[ ! $1 =~ ^[0-9]+$ || $1 -lt 1 || $1 -gt $ubound ]]; then
        echo "Warning: n should be in the range [1,$ubound]."
        exit 1
fi
awk -v RS='-o-' 'NR == '$1' { print }' $2

:sh foo.sh 5q foo
Warning: n should be in the range [1,4].

:sh foo.sh 1 foo
blaaat
foo bar
some text here

ghostdog74 · 08-19-2007, 06:09 PM

Quote:

Originally Posted by digiot

I've been completely brain-damaged lately but, still, if you're referring to this:

Code:

awk -v RS='-o-' 'NR == '$1' { print }' input_file

there is no syntax error and it works under both gawk and mawk ..

Code:

# more file
blaaat
foo bar
some text here
-o-
more text
more foo's
and even more bar
-o-
.<only a dot>
-o-
# awk -v RS='-o-' 'NR == '$1' { print }' file
awk: NR ==  { print }
awk:        ^ syntax error
# awk --version
GNU Awk 3.1.5
Copyright (C) 1989, 1991-2005 Free Software Foundation.

slakmagik · 08-19-2007, 06:19 PM

So give $1 a value by passing in an argument.

ghostdog74 · 08-19-2007, 08:18 PM

Quote:

Originally Posted by digiot

So give $1 a value by passing in an argument.

my bad. thought it was awk's $1

slakmagik · 08-19-2007, 09:15 PM

Ah, yeah, that's easy to do. One of the most bothersome things about mixing shell and awk.

xaverius · 08-20-2007, 04:41 AM

This is what the final script looks like:

Code:

#!/bin/bash

file=~/.forumpje.txt
EOF="-o-"

if [ $# != 1 ]; then
        echo "Syntax: $0 <nr>"
        exit 1
fi

ubound=`cat $file | grep -c -e -o-`
test $1 -lt 1 -o $1 -gt $ubound 2>/dev/null
if [ $? = 0 ]; then
        echo "Warning: the value of nr must lie in the interval [1,$ubound]."
        exit 1
fi

awk -v RS="$EOF" 'NR == '$1' { print }' $file

I don't think I really understand the '$1' part, but it works, so that's okay.

It's tested on GNU awk 3.0.4

Can someone explain the awk rule please?
If the current record number equals $1, then print that record, but what is the value of $1? Is it supposed to be the bash $1, or the awk $1? It's not really clear to me... The -v flag is used to set the record separator (RS), I guess...
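
A small sketch of what the shell does with the '$1' part before awk ever runs. The `set -- 2` line merely simulates calling the script with 2 as its argument, and the three-line input is made up for illustration.

```shell
#!/bin/sh
# Simulate calling the script with "2" as its first argument.
set -- 2

# The awk program is built from three adjacent shell words:
#   'NR == '      single-quoted, passed literally
#   $1            outside the quotes, so the shell expands it
#   ' { print }'  single-quoted, passed literally
# echo shows the program text that awk actually receives.
echo 'NR == '$1' { print }'

# The spliced form, as used in the script.
printf 'a\nb\nc\n' | awk 'NR == '$1' { print }'

# Same effect without splicing: hand the value to awk as a variable.
printf 'a\nb\nc\n' | awk -v n="$1" 'NR == n { print }'
```

The echo prints `NR == 2 { print }`, so awk never sees a $1 at all, and both awk commands print `b`. With the -v form the program stays inside one pair of single quotes, which makes it harder to confuse the shell's $1 with awk's first field.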