[SOLVED] sed: alone [ /pattern/!d ] works; alone [ s ] works; together =don't work
I am trying to parse some HTML. The example source file:
Code:
root@bea # cat src
blah blah
<p><a href=xx onClick="myFunc('xx');">This is what I want</a></p>
surplus line
<p><a href=xx onClick="myFunc('xx');">Another wanted text</a></p>
some junk..
I want to get just the link text (marked brown in the original post, i.e. "This is what I want" and "Another wanted text") and remove everything else.
On its own, the "d" command with an inverted pattern address (delete everything except lines containing "myFunc(") works as expected. On its own, the "s" command also works as expected. But if I use them together (as two sed commands with the "-e" switch, or as a single sed script with the commands separated by a semicolon), it does not work:
Code:
root@bea # sed "/myFunc(/!d" src
<p><a href=xx onClick="myFunc('xx');">This is what I want</a></p>
<p><a href=xx onClick="myFunc('xx');">Another wanted text</a></p>
root@bea # sed "s#.*>\([^<]\+\)</a></p>$#\1#" src
blah blah
This is what I want
surplus line
Another wanted text
some junk..
root@bea # sed -e "/myFunc(/!d" -e "s#.*>\([^<]\+\)</a></p>$#\1#" src
<p><a href=xx onClick="myFunc('xx');">This is what I want</a></p>
<p><a href=xx onClick="myFunc('xx');">Another wanted text</a></p>
root@bea # sed -e "/myFunc(/!d;s#.*>\([^<]\+\)</a></p>$#\1#" src
<p><a href=xx onClick="myFunc('xx');">This is what I want</a></p>
<p><a href=xx onClick="myFunc('xx');">Another wanted text</a></p>
Well, I already know some solutions that work:
Code:
root@bea # grep "myFunc(" src | sed "s#.*>\([^<]\+\)</a></p>$#\1#"
This is what I want
Another wanted text
root@bea # sed "/myFunc(/!d" src | sed "s#.*>\([^<]\+\)</a></p>$#\1#"
This is what I want
Another wanted text
root@bea # sed "s#.*>\([^<]\+\)</a></p>$#\1#;t;d" src
This is what I want
Another wanted text
I am just curious why these sed commands work alone but not together.
Well. Right after posting my question, I got the idea to try a small change in the "s" pattern, and voila! It works. The changed part (colored magenta and bold in the original post) is the capture group: '[^<]\+' became '.\+'.
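For comparison, the filter and the substitution can also be combined in one sed call by using `-n` together with the `p` flag on `s`, which prints only the lines where the substitution succeeded. This is a minimal sketch, assuming GNU sed (for `\+`) and a POSIX shell with single quotes; the temporary file merely stands in for `src`:

```shell
#!/bin/sh
# Recreate the sample input from the post in a temporary file
# (the temp file is an assumption standing in for "src").
src=$(mktemp)
cat > "$src" <<'EOF'
blah blah
<p><a href=xx onClick="myFunc('xx');">This is what I want</a></p>
surplus line
<p><a href=xx onClick="myFunc('xx');">Another wanted text</a></p>
some junk..
EOF

# -n turns off automatic printing; the trailing p prints a line
# only if the substitution on it succeeded.
out=$(sed -n 's#.*>\([^<]\+\)</a></p>$#\1#p' "$src")
printf '%s\n' "$out"

rm -f "$src"
```

The effect is the same as the `t;d` variant: lines that do not match the pattern are never printed.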
Code:
root@bea # sed -e "/myFunc(/!d" -e "s#.*>\([^<]\+\)</a></p>$#\1#" src
<p><a href=xx onClick="myFunc('xx');">This is what I want</a></p>
<p><a href=xx onClick="myFunc('xx');">Another wanted text</a></p>
root@bea # sed -e "/myFunc(/!d" -e "s#.*>\(.\+\)</a></p>$#\1#" src
This is what I want
Another wanted text
root@bea # sed -e "/myFunc(/!d;s#.*>\([^<]\+\)</a></p>$#\1#" src
<p><a href=xx onClick="myFunc('xx');">This is what I want</a></p>
<p><a href=xx onClick="myFunc('xx');">Another wanted text</a></p>
root@bea # sed -e "/myFunc(/!d;s#.*>\(.\+\)</a></p>$#\1#" src
This is what I want
Another wanted text
Still, in my opinion the '[^<]\+' pattern should work too. Why doesn't it, while '.\+' does?
So it seems you have been caught by a trap for young players.
Most sed examples you will see use single quotes, and the reason is to keep the shell from interpreting special characters and messing things up.
The one I see giving you trouble would be:
Code:
sed -e "/myFunc(/!d" -e "s#.*>\([^<]\+\)</a></p>$#\1#" src
In here you have the following: $#
In bash this is a special parameter (the number of positional parameters), and I am surprised you got any output, as mine throws an error because of it.
After changing your double quotes to single quotes, your example worked straight away.
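The effect of the quoting can be seen without running sed at all. The sketch below assumes a POSIX shell such as bash; the variable names `dq` and `sq` are made up for illustration. Inside double quotes the shell replaces `$#` with the number of positional parameters, so sed receives a script with only two `#` delimiters:

```shell
set --   # clear the positional parameters so that $# is 0

# Double quotes: the shell expands $# before sed ever sees the script.
dq="s#.*>\([^<]\+\)</a></p>$#\1#"
# Single quotes: the script is passed through untouched.
sq='s#.*>\([^<]\+\)</a></p>$#\1#'

printf 'double: %s\n' "$dq"   # double: s#.*>\([^<]\+\)</a></p>0\1#
printf 'single: %s\n' "$sq"   # single: s#.*>\([^<]\+\)</a></p>$#\1#
```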
Here is a slight revision that helps get rid of the need for escaping too:
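One way such a revision could look, offered as a sketch and not necessarily the command from this reply: single quotes plus extended regular expressions. The `-E` option is an assumption here (older GNU sed versions spell it `-r`); with it, the group and the `+` need no backslashes, but the literal `(` in the address must then be escaped. The temporary file stands in for `src`:

```shell
#!/bin/sh
# Sample input from the first post, written to a temporary file.
src=$(mktemp)
cat > "$src" <<'EOF'
blah blah
<p><a href=xx onClick="myFunc('xx');">This is what I want</a></p>
surplus line
<p><a href=xx onClick="myFunc('xx');">Another wanted text</a></p>
some junk..
EOF

# Single quotes keep the shell away from $# and !.
# With -E, ( ) and + are operators, so the literal "(" after
# myFunc has to be written as \(.
out=$(sed -E -e '/myFunc\(/!d' -e 's#.*>([^<]+)</a></p>$#\1#' "$src")
printf '%s\n' "$out"

rm -f "$src"
```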
Yeah, I was caught by those nice toys called GNU tools ... yeah .. for Windows!
Well, I admit, I ran those tests in Windows (with GNU sed for win32) running in cmd, not in bash/Linux. Currently I have no access to my Linux rig.
Aside from the different expansion mechanisms of Windows cmd and Linux bash, GNU sed itself should work the same way in both Windows and Linux, shouldn't it? After all, it is compiled from the same source, so the parsing algorithm should be the same. The special parameter "$#" is not expanded by Windows cmd, which is why I didn't get any strange error.
Now I tried the default separator character, the slash "/", for the "s" command (escaping the slashes in the closing HTML tags </a></p> with backslashes, of course), and all forms give exactly the same results as with the pound "#" separator. I even tried another separator, the at-sign "@", but again nothing changed. "d" alone works, "s" alone works (regardless of the separator character), together they do not work. Only with the simpler pattern for "s" do they work together, regardless of the separator character used for the "s" command.
OK, this evening I will try it on my OpenSuSE 11.3 setup.
For ghostdog74:
Thanks for the suggestion, but currently I do not know awk at all. Maybe I will give it a try sometime in the future, if I have some spare time (i.e. I will probably never manage it).
OK. Sorry, guys, for holding you back with such silly, freaky ideas as scripting under Windows (even with GNU/win32).
After finishing my previous post, I got the idea to play with the part of the pattern that causes the trouble. Shortly afterwards I found the cause. It seems to be completely unrelated to sed. It is a matter of the REALLY DUMB expansion mechanism of the Windows cmd shell.
Here are the defective and the working variants. The only difference is a single extra "^" character (marked in magenta in the original post):
Code:
root@bea # sed -e "/myFunc(/!d" -e "s#.*>\([^<]\+\)</a></p>$#\1#" src
<p><a href=xx onClick="myFunc('xx');">This is what I want</a></p>
<p><a href=xx onClick="myFunc('xx');">Another wanted text</a></p>
root@bea # sed -e "/myFunc(/!d" -e "s#.*>\([^^<]\+\)</a></p>$#\1#" src
This is what I want
Another wanted text
Forget about it. Sorry for holding you back.
For site admin:
Feel free to remove that thread completely, if you want. It is not related to Linux at all.
There is one other issue that you might encounter if you ever want to execute this in bash/Linux.
Code:
root@bea # sed -e "/myFunc(/!d" ...
                            ^^ this will give you trouble
The exclamation mark has a special meaning in an interactive bash session: it refers to the command history. So when the line is parsed, bash will try to expand it to a previously issued command that starts with 'd'.
In this case the solution is simple. Either use single quotes, as already suggested by grail, or simply add a space after the '!':
Code:
root@bea # sed -e "/myFunc(/!   d" ...
                            ^^ I exaggerated the space; one whitespace is sufficient
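That sed itself accepts a blank between `!` and the command can be checked in a script, where bash does not perform history expansion by default. This is a small sketch with single quotes; the variable names and the temporary file are assumptions for illustration:

```shell
#!/bin/sh
# Sample input from the first post, written to a temporary file.
src=$(mktemp)
cat > "$src" <<'EOF'
blah blah
<p><a href=xx onClick="myFunc('xx');">This is what I want</a></p>
surplus line
<p><a href=xx onClick="myFunc('xx');">Another wanted text</a></p>
some junk..
EOF

# Both spellings delete every line that does NOT contain "myFunc(".
without_space=$(sed -e '/myFunc(/!d' "$src")
with_space=$(sed -e '/myFunc(/! d' "$src")

if [ "$without_space" = "$with_space" ]; then
    echo "same output with and without the space"
fi

rm -f "$src"
```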
For site admin:
Feel free to remove that thread completely, if you want. It is not related to Linux at all.
This is the perfect place for this thread, in the Programming forum. The Programming forum is for all sorts of programming-related questions, not necessarily Linux-related.