Sed/Awk: print lines between n'th and (n+1)'th match of "foo"
I have a textfile, which may for example look like this:
Code:
blaaat
foo bar
some text here
-o-
more text
more foo's
and even more bar
-o-
.<only a dot>
-o-
...
....
...
...
...
-o-
There's a "^-o-$" between every 'record'.
The user enters a number (call it $1); my script should use sed and/or awk to print the $1-th record in the file.
This is what I've got so far, it's only input-checking actually:
Code:
ubound=`cat $file | grep -c -e -o-`
test $1 -lt 1 -o $1 -gt $ubound 2>/dev/null
if [ $? = 0 ]; then
echo "Warning: n should be in the range [1,$ubound]."
exit 1
fi
Can anyone help me further on the sed/awk part please?
Incidentally, why not
Code:
ubound=$(grep -c '^-o-$' $file)
if [[ $1 -lt 1 || $1 -gt $ubound ]]; then
echo "Warning: n should be in the range [1,$ubound]."
exit 1
fi
Because of the stderr redirection: users might not enter a real number, but rather something like "1q" or "foo"... Doing it this way, the script will print a nice error message and just exit.
edit2: I hadn't noticed the double [[/]] you used there... do they have the same effect?
edit: Why should I prefer your way of setting ubound above mine? (no offense)
I agree it looks better, but are there other advantages? I never actually learned about the $( )-construction, so...
Also, the solution you first advised does not seem to work correctly, here's a bash-session:
Code:
Macbook-2:~/ex_st xaverius$ cat getMessage
#!/bin/bash
file=~/.forumpje.txt
EOF="-o-"
if [ $# != 1 ]; then
echo "Syntax: $0 <nr>"
exit 1
fi
ubound=`cat $file | grep -c -e -o-`
test $1 -lt 1 -o $1 -gt $ubound 2>/dev/null
if [ $? = 0 ]; then
echo "Waarschuwing: de waarde van nr moet in het interval [1,$ubound] gelegen zijn."
exit 1
fi
awk -v RS='-o-' 'NR == '$1' { print }' $file
Macbook-2:~/ex_st xaverius$ cat ~/.forumpje.txt
me
niets
blaat nieuwe regel enzo done
Fri Aug 17 15:13:46 CEST 2007
-o-
ikke
veel over weinig
niets dus ;-)
Fri Aug 17 15:14:01 CEST 2007
-o-
Macbook-2:~/ex_st xaverius$ getMessage 1
me
niets
blaat nieuwe regel enzo done
Fri Aug 17 15:13:46 CEST 2007
Macbook-2:~/ex_st xaverius$ getMessage 2
o
Macbook-2:~/ex_st xaverius$
Check out the last command, doesn't seem correct imo...
I just tried (copy->paste) the test session in your previous post (#5), and it works.
I did notice something else: Your prompt says: Macbook-2. Are you trying this on Apple's OS X?
If so, check to see which awk is actually used and if it is posix compliant. Maybe you can use nawk instead.
You can get gawk for Macs but it may not be the default and awks other than gawk probably don't accept multichar regexes for RS.
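If the awk at hand only honours the first character of RS, one way around it is not to touch RS at all and simply count the separator lines. The following is a minimal sketch along those lines, assuming the separator is always a line containing exactly "-o-"; the function name print_record and the sample file are made up for illustration and are not from the posts above.

```shell
#!/bin/sh
# print_record N FILE: print the N-th record of FILE, where every record
# is terminated by a line consisting solely of "-o-".
# (print_record is a hypothetical helper name, not part of the thread.)
print_record() {
    awk -v n="$1" '
        # Separator line: one more record is complete; stop once record n is done.
        /^-o-$/ { if (++rec >= n) exit; next }
        # Any other line belongs to record n if exactly n-1 separators came before it.
        rec == n - 1 { print }
    ' "$2"
}

# Demo on a throw-away copy of the sample data from the first post.
sample=$(mktemp)
printf '%s\n' 'blaaat' 'foo bar' 'some text here' '-o-' \
    'more text' "more foo's" 'and even more bar' '-o-' \
    '.' '-o-' > "$sample"
print_record 2 "$sample"
# prints:
#   more text
#   more foo's
#   and even more bar
rm -f "$sample"
```

Because this works line by line with the default RS, it should behave the same on gawk, mawk, nawk and the awk shipped with OS X.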
Quote:
Originally Posted by xaverius
Because of the stderr redirection: users might not enter a real number, but rather something like "1q" or "foo"... Doing it this way, the script will print a nice error message and just exit.
Well, this is bash-specific, so maybe you're better off the way you had it. It's just that testing the return code with a test, when that's what test *does* kind of bothers me.
Code:
if [[ ! $1 =~ ^[0-9]+$ || $1 -lt 1 || $1 -gt $ubound ]]; then
echo "Warning: n should be in the range [1,$ubound]."
exit 1
fi
This requires the argument to consist of one or more digits (and nothing else). || and -o both mean 'or', but they are not interchangeable: -o belongs to the test/[ builtin and || to the [[ command, and each fails inside the other. I also just find || more readable than -o in general. The $(...) syntax does about the same as the `...` syntax, except that it nests better and is regarded as the more 'modern' way (though I still type `...` in interactive shells where possible). This is just me, though. The only important changes were removing a useless cat from the assignment to 'ubound' and combining the two tests into one. I also tightened up the regex for grep by anchoring it (and I should have done the same for the (g)awk one).
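For a script that has to run under plain sh as well, the same check can be sketched without [[ and =~. This is only one possible way of doing it; valid_index is a made-up helper name. As far as I can tell, test exits with status 2 (not 1) when an operand is not an integer, so checking for digits first avoids relying on what $? happens to be in that case.

```shell
#!/bin/sh
# valid_index N UBOUND: succeed only if N is a decimal integer in [1, UBOUND].
# (valid_index is a hypothetical helper name, not part of the thread.)
valid_index() {
    case $1 in
        # Reject the empty string and anything containing a non-digit
        # (e.g. "1q", "foo", "-3").
        ''|*[!0-9]*) return 1 ;;
    esac
    # Only digits are left, so the numeric comparisons cannot error out.
    [ "$1" -ge 1 ] && [ "$1" -le "$2" ]
}

# Usage: pretend the file holds 4 records.
ubound=4
for arg in 1 4 5 1q foo; do
    if valid_index "$arg" "$ubound"; then
        echo "$arg: ok"
    else
        echo "$arg: n should be in the range [1,$ubound]."
    fi
done
```

Here 1 and 4 are accepted, while 5, 1q and foo all get the warning.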
Quote:
Originally Posted by xaverius
edit: Why should I prefer your way of setting ubound above mine? (no offense)
Anyway, no offense taken at all. My attitude is that, as long as it works, there's no reason to prefer one way over another. It's just that some things work in certain corner cases and fail in others, some things are microscopically more efficient than others (which can add up in certain scenarios), some things just look or feel better than others, etc. It's largely just a matter of taste, and that's why I asked 'why not' rather than saying 'you should have'. Your point about input validation is a good one, since I wasn't initially thinking of even testing for non-integer input.
Quote:
Originally Posted by xaverius
Btw: still interested in a way to solve it using sed though
It's probably doable, but (g)awk is much better designed for this sort of task and using sed to that extent makes my head hurt.
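For what it's worth, here is a rough sketch of a sed-flavoured route. It is not a pure sed solution: grep -n finds the line numbers of the separators, and sed only prints the resulting line range. It assumes the same "-o-" separator lines as above; sed_record is a made-up name.

```shell
#!/bin/sh
# sed_record N FILE: print the N-th record of FILE using grep -n to locate
# the "-o-" separator lines and sed to print the lines in between.
# (sed_record is a hypothetical helper name, not part of the thread.)
sed_record() {
    n=$1 file=$2
    # Line number of the n-th separator, i.e. the line right after record n.
    end=$(grep -n '^-o-$' "$file" | sed -n "${n}p" | cut -d: -f1)
    [ -n "$end" ] || return 1          # the file has fewer than n records
    if [ "$n" -eq 1 ]; then
        start=1                        # record 1 starts at the top of the file
    else
        # Record n starts on the line after the (n-1)-th separator.
        prev=$(grep -n '^-o-$' "$file" | sed -n "$((n - 1))p" | cut -d: -f1)
        start=$((prev + 1))
    fi
    # Two separators in a row mean an empty record: nothing to print.
    [ "$start" -lt "$end" ] || return 0
    sed -n "${start},$((end - 1))p" "$file"
}
```

Called as sed_record 2 file on the sample from the first post, this should print the three lines of the second record. It reads the file up to three times, so the awk version is still the better tool.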
That's because the awk command has a syntax error. The correct one should look something like this:
Code:
# var=1
# awk -v input=$var 'BEGIN{RS="-o-"}NR==input{ print }' file
blaaat
foo bar
some text here
I've been completely brain-damaged lately but, still, if you're referring to this:
Code:
awk -v RS='-o-' 'NR == '$1' { print }' input_file
there is no syntax error and it works under both gawk and mawk (and surprisingly, to me, under original awk) and apparently under SunOS as well as Linux - dunno why it breaks on OSX, but it's OSX-specific (or 'whatever-OSX's-awk-is'-specific).
Code:
:cat foo.sh
ubound=$(grep -c '^-o-$' $2)
if [[ ! $1 =~ ^[0-9]+$ || $1 -lt 1 || $1 -gt $ubound ]]; then
echo "Warning: n should be in the range [1,$ubound]."
exit 1
fi
awk -v RS='-o-' 'NR == '$1' { print }' $2
:sh foo.sh 5q foo
Warning: n should be in the range [1,4].
:sh foo.sh 1 foo
blaaat
foo bar
some text here
Quote:
I've been completely brain-damaged lately but, still, if you're referring to this:
Code:
awk -v RS='-o-' 'NR == '$1' { print }' input_file
there is no syntax error and it works under both gawk and mawk ..
Code:
# more file
blaaat
foo bar
some text here
-o-
more text
more foo's
and even more bar
-o-
.<only a dot>
-o-
# awk -v RS='-o-' 'NR == '$1' { print }' file
awk: NR == { print }
awk: ^ syntax error
# awk --version
GNU Awk 3.1.5
Copyright (C) 1989, 1991-2005 Free Software Foundation.
#!/bin/bash
file=~/.forumpje.txt
EOF="-o-"
if [ $# != 1 ]; then
echo "Syntax: $0 <nr>"
exit 1
fi
ubound=`cat $file | grep -c -e -o-`
test $1 -lt 1 -o $1 -gt $ubound 2>/dev/null
if [ $? = 0 ]; then
echo "Waarschuwing: de waarde van nr moet in het interval [1,$ubound] gelegen zijn."
exit 1
fi
awk -v RS="$EOF" 'NR == '$1' { print }' $file
I think I don't really understand the '$1'-part, but it works, so that's okay
It's tested on GNU awk 3.0.4
Can someone explain the awk rule please?
If the current record number equals $1, then print that record - but what is the value of $1? Is it supposed to be the bash $1, or the awk $1? It's not really clear to me... The -v flag is used to set the record separator (RS), I guess...
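A small sketch may help with the '$1' question. The awk program is written as two single-quoted strings with an unquoted $1 between them, so it is the shell that fills in its own $1 before awk ever runs; awk only sees the finished text. The set -- line and the variable name program below are just there to make the demonstration self-contained.

```shell
#!/bin/sh
set -- 2    # simulate the script having been called with the argument 2

# Three adjacent pieces: 'NR == ' + the shell's $1 + ' { print }'.
# The shell glues them together into one word.
program='NR == '$1' { print }'
printf '%s\n' "$program"
# prints: NR == 2 { print }

# awk therefore compares NR with the literal number 2; it never sees "$1".
printf 'a\nb\nc\n' | awk "$program"
# prints: b

# Alternative: hand the value over as an awk variable with -v, which keeps
# the whole awk program inside a single pair of quotes.
printf 'a\nb\nc\n' | awk -v n="$1" 'NR == n { print }'
# prints: b
```

This would also account for the "NR == { print }" syntax error shown earlier in the thread: when the command is typed at an interactive prompt, the shell's $1 is normally empty, so nothing gets inserted between the two quoted pieces.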