LinuxQuestions.org

xaverius 08-18-2007 03:51 AM

Sed/Awk: print lines between n'th and (n+1)'th match of "foo"
 
I have a textfile, which may for example look like this:
Code:

blaaat
foo bar
some text here
-o-
more text
more foo's
and even more bar
-o-
.<only a dot>
-o-
...
....
...
...
...
-o-

A line matching "^-o-$" terminates every 'record'.
The user enters a number n (passed to the script as $1), and my script should use sed and/or awk to print the n-th record in the file.

This is what I've got so far; it's only input checking, actually:
Code:

ubound=`cat $file | grep -c -e -o-`
test $1 -lt 1 -o $1 -gt $ubound 2>/dev/null
if [ $? = 0 ]; then
        echo "Warning: n should be in the range [1,$ubound]."
        exit 1
fi

Can anyone help me further on the sed/awk part please?

slakmagik 08-18-2007 04:13 AM

awk -v RS='-o-' 'NR == '$1' { print }' input_file

slakmagik 08-18-2007 04:44 AM

Quote:

Originally Posted by xaverius (Post 2862845)
Code:

ubound=`cat $file | grep -c -e -o-`
test $1 -lt 1 -o $1 -gt $ubound 2>/dev/null
if [ $? = 0 ]; then
        echo "Warning: n should be in the range [1,$ubound]."
        exit 1
fi


Incidentally, why not
Code:

ubound=$(grep -c '^-o-$' $file)
if [[ $1 -lt 1 || $1 -gt $ubound ]]; then
        echo "Warning: n should be in the range [1,$ubound]."
        exit 1
fi


xaverius 08-18-2007 05:07 AM

Quote:

Originally Posted by digiot (Post 2862865)
Incidentally, why not
Code:

ubound=$(grep -c '^-o-$' $file)
if [[ $1 -lt 1 || $1 -gt $ubound ]]; then
        echo "Warning: n should be in the range [1,$ubound]."
        exit 1
fi


Because of the stderr redirection: users might not enter a real number, but rather something like "1q" or "foo"... Doing it this way, the script will print a nice error message and just exit :)
edit2: I hadn't noticed the double [[/]] you used there... do they have the same effect?

edit: Why should I prefer your way of setting ubound above mine? (no offense ;) )
I agree it looks better, but are there other advantages? I never actually learned about the $( )-construction, so...

xaverius 08-18-2007 05:23 AM

Also, the solution you first advised does not seem to work correctly. Here's a bash session:
Code:

Macbook-2:~/ex_st xaverius$ cat getMessage
#!/bin/bash

file=~/.forumpje.txt
EOF="-o-"

if [ $# != 1 ]; then
        echo "Syntax: $0 <nr>"
        exit 1
fi

ubound=`cat $file | grep -c -e -o-`
test $1 -lt 1 -o $1 -gt $ubound 2>/dev/null
if [ $? = 0 ]; then
        echo "Warning: the value of nr must lie in the interval [1,$ubound]."
        exit 1
fi

awk -v RS='-o-' 'NR == '$1' { print }' $file
Macbook-2:~/ex_st xaverius$ cat ~/.forumpje.txt

me
niets
blaat nieuwe regel enzo done
Fri Aug 17 15:13:46 CEST 2007
-o-
ikke
veel over weinig
niets dus ;-)
Fri Aug 17 15:14:01 CEST 2007
-o-
Macbook-2:~/ex_st xaverius$ getMessage 1

me
niets
blaat nieuwe regel enzo done
Fri Aug 17 15:13:46 CEST 2007

Macbook-2:~/ex_st xaverius$ getMessage 2
o
Macbook-2:~/ex_st xaverius$

Check out the last command: its output doesn't seem correct, imo...

druuna 08-18-2007 06:16 AM

Hi,

I just tried (copy->paste) the test session in your previous post (#5), and it works.

I did notice something else: Your prompt says: Macbook-2. Are you trying this on Apple's OS X?
If so, check which awk is actually being used and whether it is POSIX compliant. Maybe you can use nawk instead.

Hope this helps.

xaverius 08-18-2007 07:01 AM

I have no idea which version of awk OSX is using, but I had access to another machine:
Code:

$ uname -a
SunOS <hostname> 5.8 Generic_108528-07 sun4u sparc SUNW,Ultra-4

The code runs smoothly there, and since this is the machine it's supposed to run on later, it's ok now :)
Thx!

Btw: still interested in a way to solve it using sed though :p

slakmagik 08-18-2007 03:55 PM

Quote:

Originally Posted by xaverius (Post 2862936)
I have no idea which version of awk OSX is using

You can get gawk for Macs, but it may not be the default, and awks other than gawk probably don't accept multi-character regexes for RS.
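
For awks that don't treat a multi-character RS as a regex, a line-based approach avoids RS altogether. What follows is only a minimal sketch: the function name nth_record and the awk variables n and rec are made up for illustration, and it assumes that separators are lines consisting of exactly -o- and that n has already been validated. The "o" printed for record 2 in the OS X session would at least be consistent with an awk that uses only the first character of RS, though that is a guess.

```shell
# nth_record FILE N: print the N-th record of FILE.
# Assumption: every record is terminated by a line that is exactly "-o-".
nth_record() {
    awk -v n="$2" '
        # A separator line closes the current record; stop once record n is done.
        $0 == "-o-" { rec++; if (rec >= n) exit; next }
        # rec counts the records already closed, so record n is being read
        # while rec equals n - 1.
        rec == n - 1 { print }
    ' "$1"
}

# Possible usage: nth_record ~/.forumpje.txt 2
```

Because whole lines are compared, a "-o-" embedded inside a longer line is not mistaken for a separator.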

Quote:

Originally Posted by xaverius (Post 2862874)
Because of the stderr redirection: users might not enter a real number, but rather something like "1q" or "foo"... Doing it this way, the script will print a nice error message and just exit :)

Well, my version is bash-specific, so maybe you're better off the way you had it. It's just that testing the return code with a test, when that's what test *does*, kind of bothers me.
Code:

if [[ ! $1 =~ ^[0-9]+$ || $1 -lt 1 || $1 -gt $ubound ]]; then
        echo "Warning: n should be in the range [1,$ubound]."
        exit 1
fi

This requires the argument to consist of one or more digits (and only digits). || and -o both mean 'or', but they work differently (or fail to work) depending on whether you use the test/[ builtin or the [[ command: -o is an operator inside test/[, while || is the one to use inside [[. And I just find || more readable than -o in general.

The $(...) syntax is about the same as the `...` syntax, except that it nests better and is regarded as the more 'modern' way (though I still type `...` in interactive shells where possible). This is just me, though. The only important changes were removing the useless cat in the assignment to 'ubound' and combining two tests into one. I also tightened up the regex for grep by anchoring it (and I should have done the same for the (g)awk RS).
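
If the bash-only [[ ... =~ ... ]] is a concern, roughly the same check can be written with a case pattern and the plain [ builtin. This is just a sketch under assumptions: is_valid_index is a hypothetical helper name, and the upper bound is expected to be a valid number already (for example the grep -c result). One thing worth knowing about the original test ... 2>/dev/null approach: with non-numeric input, test fails with a status greater than 1, so a following [ $? = 0 ] check does not fire and no warning is printed.

```shell
# is_valid_index N UBOUND: succeed only if N is a decimal integer in [1, UBOUND].
# (Hypothetical helper name; uses only POSIX shell features.)
is_valid_index() {
    case $1 in
        '' | *[!0-9]*) return 1 ;;   # empty, or contains a non-digit such as "1q"
    esac
    # Both comparisons are safe now, because $1 is known to be all digits.
    [ "$1" -ge 1 ] && [ "$1" -le "$2" ]
}

# Possible usage inside the script:
#   if ! is_valid_index "$1" "$ubound"; then
#       echo "Warning: n should be in the range [1,$ubound]."
#       exit 1
#   fi
```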

Quote:

Originally Posted by xaverius (Post 2862874)
edit: Why should I prefer your way of setting ubound above mine? (no offense ;) )

Anyway, no offense taken at all. :) My attitude is that, as long as it works, there's no reason to prefer one way over another. It's just that some things work in certain corner cases and fail in others, some things are microscopically more efficient than others (which can add up in certain scenarios), some things just look or feel better than others, etc. It's largely just a matter of taste, and that's why I asked 'why not' rather than said 'you should have' ;) Your point about input validation is a good one, since I wasn't initially even thinking of testing for non-integer input.

Quote:

Originally Posted by xaverius (Post 2862936)
Btw: still interested in a way to solve it using sed though :p

It's probably doable, but (g)awk is much better designed for this sort of task and using sed to that extent makes my head hurt.
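
One way to keep the sed part simple is to let grep find the separator line numbers and have sed print only the range in between. This is not a pure-sed solution, just a sketch under assumptions: nth_record_sed and its variable names are invented for illustration, separators are lines consisting of exactly -o-, and n is assumed to be a validated positive integer.

```shell
# nth_record_sed FILE N: print the N-th record of FILE.
# grep -n locates the separator lines; sed -n 'A,Bp' prints the lines between them.
nth_record_sed() {
    rec_file=$1
    rec_n=$2
    # Line number of the separator that closes record N (empty if there is none).
    rec_end=$(grep -n '^-o-$' "$rec_file" | sed -n "${rec_n}p" | cut -d: -f1)
    [ -n "$rec_end" ] || return 1
    # Line number of the separator that closes record N-1, or 0 for the first record.
    if [ "$rec_n" -gt 1 ]; then
        rec_start=$(grep -n '^-o-$' "$rec_file" | sed -n "$((rec_n - 1))p" | cut -d: -f1)
    else
        rec_start=0
    fi
    rec_first=$((rec_start + 1))
    rec_last=$((rec_end - 1))
    # An empty record has no lines between its two separators; print nothing.
    [ "$rec_first" -le "$rec_last" ] || return 0
    sed -n "${rec_first},${rec_last}p" "$rec_file"
}

# Possible usage: nth_record_sed ~/.forumpje.txt 2
```

The explicit check for an empty record is there because sed still prints one line when the second address of a range is smaller than the first.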

ghostdog74 08-19-2007 05:12 AM

Quote:

Originally Posted by xaverius (Post 2862884)
Also, the solution you first advised does not seem to work correctly,

that's because the awk has a syntax error. The correct one should look something like this

Code:

# var=1
# awk -v input=$var 'BEGIN{RS="-o-"}NR==input{ print  }' file
blaaat
foo bar
some text here


slakmagik 08-19-2007 01:56 PM

Quote:

Originally Posted by ghostdog74 (Post 2863587)
that's because the awk has a syntax error. The correct one should look something like this

Code:

# var=1
# awk -v input=$var 'BEGIN{RS="-o-"}NR==input{ print  }' file
blaaat
foo bar
some text here


I've been completely brain-damaged lately but, still, if you're referring to this:
Code:

awk -v RS='-o-' 'NR == '$1' { print }' input_file

there is no syntax error and it works under both gawk and mawk (and surprisingly, to me, under original awk) and apparently under SunOS as well as Linux - dunno why it breaks on OSX, but it's OSX-specific (or 'whatever-OSX's-awk-is'-specific).
Code:

:cat foo.sh
ubound=$(grep -c '^-o-$' $2)
if [[ ! $1 =~ ^[0-9]+$ || $1 -lt 1 || $1 -gt $ubound ]]; then
        echo "Warning: n should be in the range [1,$ubound]."
        exit 1
fi
awk -v RS='-o-' 'NR == '$1' { print }' $2

:sh foo.sh 5q foo
Warning: n should be in the range [1,4].

:sh foo.sh 1 foo
blaaat
foo bar
some text here


ghostdog74 08-19-2007 06:09 PM

Quote:

Originally Posted by digiot (Post 2863919)
I've been completely brain-damaged lately but, still, if you're referring to this:
Code:

awk -v RS='-o-' 'NR == '$1' { print }' input_file

there is no syntax error and it works under both gawk and mawk ..

Code:

# more file
blaaat
foo bar
some text here
-o-
more text
more foo's
and even more bar
-o-
.<only a dot>
-o-
# awk -v RS='-o-' 'NR == '$1' { print }' file
awk: NR ==  { print }
awk:        ^ syntax error
# awk --version
GNU Awk 3.1.5
Copyright (C) 1989, 1991-2005 Free Software Foundation.


slakmagik 08-19-2007 06:19 PM

So give $1 a value by passing in an argument.

ghostdog74 08-19-2007 08:18 PM

Quote:

Originally Posted by digiot (Post 2864087)
So give $1 a value by passing in an argument.

my bad. thought it was awk's $1

slakmagik 08-19-2007 09:15 PM

Ah, yeah, that's easy to do. One of the most bothersome things about mixing shell and awk.

xaverius 08-20-2007 04:41 AM

This is what the final script looks like:
Code:

#!/bin/bash

file=~/.forumpje.txt
EOF="-o-"

if [ $# != 1 ]; then
        echo "Syntax: $0 <nr>"
        exit 1
fi

ubound=`cat $file | grep -c -e -o-`
test $1 -lt 1 -o $1 -gt $ubound 2>/dev/null
if [ $? = 0 ]; then
        echo "Warning: the value of nr must lie in the interval [1,$ubound]."
        exit 1
fi

awk -v RS="$EOF" 'NR == '$1' { print }' $file

I think I don't really understand the '$1'-part, but it works, so that's okay :)

It's tested on GNU awk 3.0.4

Can someone explain the awk rule please?
If the current record number equals $1, then print that record, but what is the value of $1? Is it supposed to be the bash $1 or the awk $1? It's not really clear to me... The -v flag is used to set the record separator (RS), I guess...
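
As the earlier exchange about passing in an argument suggests, the $1 in that command is the shell's: the first single-quoted piece ends right before $1 and a new one starts right after it, so the shell substitutes the value before awk sees the program. Below is a small sketch of that, with two hypothetical helpers (show_program and nth_line) that are not part of the script; nth_line uses awk's default line records to stay independent of RS.

```shell
# show_program ARG: print the awk program text exactly as awk would receive it.
# (Hypothetical helper, only for illustration.)
show_program() {
    # 'NR == ' and ' { print }' are single-quoted, so the shell leaves them alone.
    # $1 sits between the two quoted pieces, outside any quotes, so the shell
    # replaces it with the function's first argument before awk ever runs.
    printf '%s\n' 'NR == '$1' { print }'
}

# nth_line N: the same idea without splicing. The value is handed to awk as
# the awk variable n, so the program text itself never changes.
nth_line() {
    awk -v n="$1" 'NR == n { print }'
}

show_program 2      # prints: NR == 2 { print }
```

With an empty argument the spliced form collapses to "NR ==  { print }", which is the syntax error shown in ghostdog74's session. Passing the number with -v, as in nth_line, avoids that kind of confusion between the shell's $1 and awk's $1.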


All times are GMT -5.