LinuxQuestions.org
LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
Programming: This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

10-23-2010, 03:36 AM   #1
klenot
LQ Newbie
 
Registered: Sep 2010
Location: Prague, Czech Republic
Posts: 13

Rep: Reputation: 0
sed: [ /pattern/!d ] alone works; [ s ] alone works; together they don't work


I am trying to parse some HTML code. The example source file:

Code:

root@bea # cat  src
blah blah
<p><a href=xx onClick="myFunc('xx');">This is what I want</a></p>
surplus line
<p><a href=xx onClick="myFunc('xx');">Another wanted text</a></p>
some junk..
I want to extract just the link texts, "This is what I want" and "Another wanted text", and remove everything else.

The "d" command alone, with an inverted pattern address (delete every line except those containing "myFunc("), works as expected. The "s" command alone also works as expected. But if I use them together (either as two sed commands with the "-e" switch, or as a single sed command with the sub-commands joined by a semicolon), it does not work:

Code:

root@bea # sed  "/myFunc(/!d"  src
<p><a href=xx onClick="myFunc('xx');">This is what I want</a></p>
<p><a href=xx onClick="myFunc('xx');">Another wanted text</a></p>

root@bea # sed  "s#.*>\([^<]\+\)</a></p>$#\1#"  src
blah blah
This is what I want
surplus line
Another wanted text
some junk..

root@bea # sed  -e "/myFunc(/!d"  -e "s#.*>\([^<]\+\)</a></p>$#\1#"  src
<p><a href=xx onClick="myFunc('xx');">This is what I want</a></p>
<p><a href=xx onClick="myFunc('xx');">Another wanted text</a></p>

root@bea # sed  -e "/myFunc(/!d;s#.*>\([^<]\+\)</a></p>$#\1#"  src
<p><a href=xx onClick="myFunc('xx');">This is what I want</a></p>
<p><a href=xx onClick="myFunc('xx');">Another wanted text</a></p>
Well, I already know some working solutions:

Code:

root@bea # grep  "myFunc("  src     |  sed  "s#.*>\([^<]\+\)</a></p>$#\1#"
This is what I want
Another wanted text

root@bea # sed  "/myFunc(/!d"  src  |  sed  "s#.*>\([^<]\+\)</a></p>$#\1#"
This is what I want
Another wanted text

root@bea # sed  "s#.*>\([^<]\+\)</a></p>$#\1#;t;d"  src
This is what I want
Another wanted text
I am just curious why these sed commands work alone but not together.
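The last workaround relies on sed's "t" command. Spelled out as a commented script it might look like the sketch below. It assumes GNU sed (for "\+" in a basic regular expression), recreates the sample input in a temporary file, and uses a made-up function name, extract_links, purely for illustration.

```shell
#!/bin/bash
# Recreate the sample input from the first post in a temporary file.
src=$(mktemp)
trap 'rm -f "$src"' EXIT
cat > "$src" <<'EOF'
blah blah
<p><a href=xx onClick="myFunc('xx');">This is what I want</a></p>
surplus line
<p><a href=xx onClick="myFunc('xx');">Another wanted text</a></p>
some junk..
EOF

# Hypothetical helper: the "s...;t;d" one-liner written out line by line.
# Single quotes keep the shell away from $, !, ^ and backslashes.
extract_links() {
  sed '
    # Replace the whole line by the link text if it ends in </a></p>.
    s#.*>\([^<]\+\)</a></p>$#\1#
    # t without a label: if the substitution succeeded, jump to the end
    # of the script, where the pattern space is printed as usual.
    t
    # Reached only when nothing was substituted: drop the line.
    d
  ' "$1"
}

extract_links "$src"
```

Only lines the "s" command actually rewrote survive, so no separate "/myFunc(/!d" address is needed.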

Last edited by klenot; 10-23-2010 at 03:58 AM.
 
10-23-2010, 03:54 AM   #2
klenot
LQ Newbie
 
Registered: Sep 2010
Location: Prague, Czech Republic
Posts: 13

Original Poster
Rep: Reputation: 0
More curious findings...

Well, right after posting my question, I had the idea of making a small change to the "s#pattern..." part. I replaced '[^<]\+' with '.\+', and voila! It works:

Code:

root@bea # sed  -e "/myFunc(/!d"  -e "s#.*>\([^<]\+\)</a></p>$#\1#"  src
<p><a href=xx onClick="myFunc('xx');">This is what I want</a></p>
<p><a href=xx onClick="myFunc('xx');">Another wanted text</a></p>

root@bea # sed  -e "/myFunc(/!d"  -e "s#.*>\(.\+\)</a></p>$#\1#"  src
This is what I want
Another wanted text

root@bea # sed  -e "/myFunc(/!d;s#.*>\([^<]\+\)</a></p>$#\1#"  src
<p><a href=xx onClick="myFunc('xx');">This is what I want</a></p>
<p><a href=xx onClick="myFunc('xx');">Another wanted text</a></p>

root@bea # sed  -e "/myFunc(/!d;s#.*>\(.\+\)</a></p>$#\1#"  src
This is what I want
Another wanted text
Still, in my opinion, the '[^<]\+' pattern should work too. Why doesn't it, when '.\+' does?

Last edited by klenot; 10-23-2010 at 03:56 AM.
 
10-23-2010, 04:01 AM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
So it seems you have been caught by a trap for young players.

You will see that most sed examples use single quotes; the reason is to stop the shell's special characters from messing things up.
The one I see giving you trouble would be:
Code:
sed  -e "/myFunc(/!d"  -e "s#.*>\([^<]\+\)</a></p>$#\1#"  src
In here you have the combination $#.

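In bash, $# is a special parameter (it expands to the number of positional parameters), and it is expanded inside double quotes. I am surprised you got any output at all, as mine throws an error because of it.
When I changed your double quotes to single quotes, your example worked straight away.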
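To see what grail describes, it helps to print the sed script exactly as the shell hands it over. In this sketch sed is not even run: the double-quoted string has "$#" replaced by the number of positional parameters (0 here), so the "$" anchor and the "#" between pattern and replacement are gone, and sed would reject what is left as an unterminated "s" command.

```shell
#!/bin/bash
set --   # no positional parameters, so $# expands to 0

# What sed would receive if the script is written in double quotes:
dq="s#.*>\([^<]\+\)</a></p>$#\1#"
# What sed would receive if the script is written in single quotes:
sq='s#.*>\([^<]\+\)</a></p>$#\1#'

printf 'double quotes: %s\n' "$dq"   # ...</a></p>0\1#
printf 'single quotes: %s\n' "$sq"   # ...</a></p>$#\1#
```

The backslashes survive in both cases, because inside double quotes bash only treats a backslash specially before $, `, ", \ and newline.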

Here is a slight revision that helps get rid of the need for escaping too:
Code:
sed -r '/myFunc/!d;s/.*>([^<]+).*/\1/' src
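Here is grail's revision run against the sample input, next to the original two commands with nothing changed except the quoting. This is a sketch assuming GNU sed ("-r" and "\+" are GNU extensions; newer GNU and BSD versions also accept "-E" for extended regexes), and the function names are hypothetical. Note that the revised address also drops the "(": in an extended regex, "myFunc(" would open a group that is never closed.

```shell
#!/bin/bash
src=$(mktemp)
trap 'rm -f "$src"' EXIT
cat > "$src" <<'EOF'
blah blah
<p><a href=xx onClick="myFunc('xx');">This is what I want</a></p>
surplus line
<p><a href=xx onClick="myFunc('xx');">Another wanted text</a></p>
some junk..
EOF

# The poster's original commands, now single-quoted (basic regex, GNU \+).
original() { sed -e '/myFunc(/!d' -e 's#.*>\([^<]\+\)</a></p>$#\1#' "$1"; }

# grail's revision: -r enables extended regexes, so ( ) and + need no
# backslashes, and the trailing .* replaces the explicit </a></p>$ anchor.
revised() { sed -r '/myFunc/!d;s/.*>([^<]+).*/\1/' "$1"; }

original "$src"
revised "$src"
```

Both functions print the two link texts; the greedy ".*>" backtracks to the last ">" that is still followed by at least one non-"<" character.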
 
1 member found this post helpful.
10-23-2010, 05:07 AM   #4
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
And with awk:

Code:
awk -vRS="</a>" -F">" '/href/{print $NF}' file
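Run against the sample "src" file from the first post, ghostdog74's one-liner can be wrapped in a small script like the one below. The comments are a reading of how it appears to work, and the function name is hypothetical. Note that a multi-character RS is supported by gawk and mawk, while POSIX leaves it unspecified, so other awk implementations may behave differently.

```shell
#!/bin/bash
src=$(mktemp)
trap 'rm -f "$src"' EXIT
cat > "$src" <<'EOF'
blah blah
<p><a href=xx onClick="myFunc('xx');">This is what I want</a></p>
surplus line
<p><a href=xx onClick="myFunc('xx');">Another wanted text</a></p>
some junk..
EOF

# RS='</a>': each record ends where a link closes, so the link text is
#            always at the very end of a record.
# -F'>':     split fields on '>', so $NF is the text after the last '>'.
# /href/:    print only records that actually contain a link.
links_awk() { awk -v RS='</a>' -F'>' '/href/ { print $NF }' "$1"; }

links_awk "$src"
```

The trailing record ("</p>", "some junk..") contains no "href", so it is skipped.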
 
10-23-2010, 05:53 AM   #5
klenot
LQ Newbie
 
Registered: Sep 2010
Location: Prague, Czech Republic
Posts: 13

Original Poster
Rep: Reputation: 0
OK, thanks for your input, guys.

For grail:

Yeah, I was caught by those nice toys called GNU tools... for Windows!

Well, I admit I ran those tests on Windows (with GNU sed for Win32) in cmd, not in bash on Linux. Currently I have no access to my Linux rig.

Aside from the different expansion mechanisms of Windows cmd and Linux bash, GNU sed itself should work the same way on both Windows and Linux, shouldn't it? After all, it is compiled from the same source, so the parsing algorithm should be the same. The special variable "$#" is not expanded by Windows cmd; that's why I didn't get any strange error.

Now I tried the default separator character, the slash "/", for the "s" command (escaping the slashes in the closing HTML tags </a></p> with backslashes, of course), and all forms give exactly the same results as with the pound "#" separator. I even tried another separator, the at-sign "@", but again nothing changed: "d" alone works, "s" alone works (regardless of the separator character), but together they do not. Only with the simpler pattern for "s" do they work together, whichever separator character I use.

OK, this evening I will try it on my OpenSuSE 11.3 setup.

For ghostdog74:

Thanks for the suggestion, but currently I do not know awk at all. Maybe I will give it a try sometime in the future, if I have some spare time (i.e. probably never).

Last edited by klenot; 10-23-2010 at 06:15 AM.
 
10-23-2010, 06:12 AM   #6
klenot
LQ Newbie
 
Registered: Sep 2010
Location: Prague, Czech Republic
Posts: 13

Original Poster
Rep: Reputation: 0
Solved

Ehh.. argh.. cough .. uhh.

OK. Sorry, guys, for holding you back with such silly, freaky ideas as scripting under Windows (even with GNU/Win32).

After finishing my previous post, I had the idea of playing with the part of the pattern that caused the trouble. Shortly afterwards I found the cause. It seems to be completely unrelated to sed; it is a matter of the REALLY DUMB expansion mechanism of the Windows cmd shell.

Here are the defective and the working variants; the only difference is a single extra "^" character, which turns "[^<]" into "[^^<]":

Code:

root@bea # sed  -e "/myFunc(/!d"  -e "s#.*>\([^<]\+\)</a></p>$#\1#"  src
<p><a href=xx onClick="myFunc('xx');">This is what I want</a></p>
<p><a href=xx onClick="myFunc('xx');">Another wanted text</a></p>

root@bea # sed  -e "/myFunc(/!d"  -e "s#.*>\([^^<]\+\)</a></p>$#\1#"  src
This is what I want
Another wanted text
Forget about it. Sorry for holding you back.
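Under bash, the effect of a lost caret can be reproduced by hand. In the sketch below (GNU sed assumed, function names hypothetical), "[^<]" is rewritten as "[<]", a class that matches only "<", so the "s" command never matches and "/myFunc(/!d" leaves the two HTML lines untouched, which is exactly the failing output above. Why cmd dropped the caret only when the "d" command was present is not explained in the thread. One plausible cause, offered here as an assumption, is cmd's delayed expansion: when it is enabled, cmd treats "^" as an escape character in a command's arguments, even inside double quotes, as soon as those arguments contain a "!". That pass would turn "[^^<]" back into "[^<]".

```shell
#!/bin/bash
src=$(mktemp)
trap 'rm -f "$src"' EXIT
cat > "$src" <<'EOF'
blah blah
<p><a href=xx onClick="myFunc('xx');">This is what I want</a></p>
surplus line
<p><a href=xx onClick="myFunc('xx');">Another wanted text</a></p>
some junk..
EOF

# Simulated failing run: the caret is gone, so [<] matches only '<' and
# the s command never fires; the kept lines are printed unchanged.
caret_lost() { sed -e '/myFunc(/!d' -e 's#.*>\([<]\+\)</a></p>$#\1#' "$1"; }

# The poster's fix as bash passes it: [^^<] excludes both '^' and '<',
# which still matches the link text.
double_caret() { sed -e '/myFunc(/!d' -e 's#.*>\([^^<]\+\)</a></p>$#\1#' "$1"; }

caret_lost "$src"
double_caret "$src"
```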

For site admin:
Feel free to remove this thread completely if you want. It is not related to Linux at all.

Last edited by klenot; 10-23-2010 at 06:16 AM.
 
10-23-2010, 09:11 AM   #7
crts
Senior Member
 
Registered: Jan 2010
Posts: 2,020

Rep: Reputation: 757
FYI

Hi,

there is one other issue that you might encounter if you ever want to execute this in bash on Linux.
Code:
root@bea # sed  -e "/myFunc(/!d"  ...
                             ^^ this will give you trouble
The exclamation mark has a special meaning in bash: it triggers history expansion. So when parsing the line, an interactive bash will try to replace !d with the most recent command you issued that starts with 'd'.
In this case the solution is simple: either use single quotes, as grail already suggested, or simply add a space after the '!':
Code:
root@bea # sed  -e "/myFunc(/!   d"  ...
                              ^^ I exaggerated the space; One whitespace is sufficient
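Both of crts's fixes can be checked side by side in a sketch like this one (GNU sed assumed; function names are hypothetical). Inside a script bash performs no history expansion anyway, since it is enabled by default only in interactive shells, so this mainly confirms that sed accepts the space after "!". The "s" script stays in single quotes in both cases to sidestep the "$#" problem.

```shell
#!/bin/bash
src=$(mktemp)
trap 'rm -f "$src"' EXIT
cat > "$src" <<'EOF'
blah blah
<p><a href=xx onClick="myFunc('xx');">This is what I want</a></p>
surplus line
<p><a href=xx onClick="myFunc('xx');">Another wanted text</a></p>
some junk..
EOF

# Fix 1: single quotes, so bash never looks at the '!'.
quoted() { sed -e '/myFunc(/!d' -e 's#.*>\([^<]\+\)</a></p>$#\1#' "$1"; }

# Fix 2: double quotes for the address, with a space after '!'.
# In an interactive bash "! " does not start a history expansion,
# and sed skips the whitespace between '!' and the command.
spaced() { sed -e "/myFunc(/! d" -e 's#.*>\([^<]\+\)</a></p>$#\1#' "$1"; }

quoted "$src"
spaced "$src"
```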

Last edited by crts; 10-23-2010 at 09:13 AM.
 
1 member found this post helpful.
10-23-2010, 09:45 AM   #8
GrapefruiTgirl
LQ Guru
 
Registered: Dec 2006
Location: underground
Distribution: Slackware64
Posts: 7,594

Rep: Reputation: 556
Quote:
Originally Posted by klenot
For site admin:
Feel free to remove that thread completely, if you want. It is not related to Linux at all.
This is the perfect place for this thread, in the Programming forum. The Programming forum is for all sorts of programming-related questions, not necessarily Linux-related.
 
  


All times are GMT -5.
