LinuxQuestions.org


hansschmucker 10-23-2007 06:58 PM

Solved: BASH get marked RegEx result: "foo s bar" | /foo(.*)bar/ 1 -> " s "
 
Hi everybody,

I did my best to describe what I need in the title, and hopefully somebody who already knows the answer can help.

A more lengthy description of my problem:

I have an input string which I get from a cURLed website:

Quote:

"One Two One Two One Two <div>Thread Title:Hello World</div> One Two One Two One Two"
Now, I want to get "Hello World" from this.

I can get it, including the "<div>Thread Title: ... </div>" part, using pcregrep:

Code:

echo "$content" | pcregrep -o -e "<div>Thread Title:.*?<\/div>"
-> <div>Thread Title:Hello World</div>

But how can I only get Hello World? In Javascript I'd do
Code:

(/<div>Thread Title:(.*?)<\/div>/).exec("One Two One Two One Two <div>Thread Title:Hello World</div> One Two One Two One Two")[1]
-> Hello World

But is there a tool that lets me do that in Bash? sed doesn't seem to be able to output anything but whole lines...

something like
Code:

echo $data|regexec "/<div>Thread Title:(.*?)<\/div>/" "1"
would be great!

Thank you in advance
Hans Schmucker
Mannheim
Germany

Tinkster 10-23-2007 07:04 PM

Just use the match as a replacement string ...

Code:

echo $data|sed -r "/<div>Thread Title:(.*?)<\/div>/\1/g"
I'm not 100% certain whether sed knows the ? quantifier, if it
doesn't, try

Code:

echo $data|sed -r "/<div>Thread Title:([^<]+)<\/div>/\1/g"

Cheers,
Tink

hansschmucker 10-23-2007 07:10 PM

Hmmm... that doesn't work. sed complains about an unknown character "\", probably because there's no command. Did you mean

Code:

echo $data|sed -r "s/<div>Thread Title:([^<]+)<\/div>/\1/g"
because that works, however it still prints the full line...

Tinkster 10-23-2007 07:26 PM

Errrh ... that was what I meant, and I'm having a blonde day :D
Code:

echo $data|sed -r "s/.*<div>Thread Title:([^<]+)<\/div>.*/\1/g"
Try that

hansschmucker 10-23-2007 07:29 PM

Ah you're matching against the whole line and then replacing .... clever, I didn't think of that...

Thank you very much and a special thanx for your patience :)
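
The whole-line substitution above can be sketched as a small function. This is only an illustration, assuming the sample input from the first post. It adds two things the thread did not use: `-n` together with the `p` flag, so that lines without a match print nothing instead of passing through unchanged, and `printf` with a quoted variable instead of an unquoted `echo $data`. The name `extract_title` is made up for this example, and `-E` is the spelling of `-r` that both GNU and BSD sed accept.

```shell
# Sample input from the first post.
data='One Two One Two One Two <div>Thread Title:Hello World</div> One Two One Two One Two'

# extract_title (hypothetical name): print only the captured group.
# -n suppresses sed's default output; the trailing p prints a line
# only when the substitution actually matched.
extract_title() {
    printf '%s\n' "$1" |
        sed -n -E 's/.*<div>Thread Title:([^<]+)<\/div>.*/\1/p'
}

extract_title "$data"
```

With the sample input this should print only the title, and a line that contains no title produces no output at all.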

angrybanana 10-23-2007 08:33 PM

the expr command would also do this:
Code:

$ expr "$data" : ".*<div>Thread Title:\(.*\)<\/div>.*"
Hello World

My favorite way for something like that is Perl (if it's an option)

Code:

echo $data|perl -lne 'print $1 if /<div>Thread Title:(.*?)<\/div>/'
Hello World

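
One caveat with the `expr` form above: `\(.*\)` is greedy, so on a line holding more than one `</div>` it would capture everything up to the last one, whereas the Perl `(.*?)` stops at the first. A minimal sketch of one way around that, assuming titles never contain a `<` character; the two-div sample line is invented for illustration:

```shell
# Invented sample line with two closing </div> tags.
data='x <div>Thread Title:Hello World</div> y <div>other</div> z'

# [^<]* cannot cross a tag boundary, so the capture ends at the first </div>.
expr "$data" : '.*<div>Thread Title:\([^<]*\)</div>.*'
```

The same `[^<]` idea is what the sed version earlier in the thread relies on, since neither tool supports the non-greedy `*?` quantifier.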

hansschmucker 10-23-2007 10:34 PM

I found another interesting option, which is my favourite so far: an archived EXE build of SpiderMonkey (that's Mozilla's JavaScript engine). Yeah, I'm under Windows right now, and while I'm running bash, mencoder and hundreds of other Linux applications, building applications is still a pain, so I have to resort to builds created by somebody else. It's only 500k and has virtually no dependencies.
http://209.85.135.104/search?q=cache...e%3DJavaScript

All I need to do is something like this:
js -e "print((/Hello(.*?)World/).exec('Hello You World')[1]);"

Not quite as fast as SED, but a lot more comfortable for me...


All times are GMT -5.