How do I search for text between difficult strings, like "yellow"/></w:rP..."
I am trying to capture and change text between two keyword strings in Perl, but the keyword strings are fragments of XML, made up mostly of non-word characters. Thanks to replies to earlier posts, I managed to do it using real words, like "start" and "end". For example, this works:
while ($text =~ s/\bstart\b (.*?) \bend\b//) {
    print $1, "\n";
    $term = "term_$1_term";
}

However, my actual start keyword string is "yellow"/></w:rPr><w:t>VAR</w:t></w:r>" and the closing string is "<". Using lots of escape characters, I can find the two keywords like this:

$text =~ m/yellow\"\/\>\<\/w:rPr\>\<w:t\>/;
$text =~ m/\</;

But substituting these strings for "start" and "end" in the while statement (and removing the \b anchors) doesn't work. In other words, this doesn't work:

while ($text =~ s/yellow\"\/\>\<\/w:rPr\>\<w:t\>(.*?) \<//) {
    print $1, "\n";
    $term = "term_$1_term";
}

Can anyone tell me how to put these resistant strings into the while statement?
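A likely explanation, assuming the text you want to capture is followed directly by "</w:t>": the pattern in the question has a literal space between (.*?) and \<. In the start/end version that space matched a real space in the text, but in the XML there is probably no space before the "<", so the match fails. Here is a minimal sketch on an invented sample string (the w:highlight element and the word "Smith" are made up for illustration, not taken from your file):

```perl
use strict;
use warnings;

# Invented WordprocessingML-like sample, for illustration only.
my $text = '<w:r><w:rPr><w:highlight w:val="yellow"/></w:rPr><w:t>Smith</w:t></w:r>';

# The pattern from the question, with a space before \<, finds nothing here,
# because the sample has no space between "Smith" and "</w:t>".
my $old_matches = $text =~ /yellow\"\/\>\<\/w:rPr\>\<w:t\>(.*?) \</ ? 1 : 0;

my @terms;
# Same escaped start string, but with the space removed before \<.
# s/// deletes each match, so the loop ends when no start marker is left.
while ($text =~ s/yellow\"\/\>\<\/w:rPr\>\<w:t\>(.*?)\<//) {
    print $1, "\n";                 # prints "Smith"
    push @terms, "term_$1_term";    # "term_Smith_term"
}
```

Note that, as in the original loop, the substitution deletes the matched markup from $text, so work on a copy if you still need the XML afterwards.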
The number of metacharacters to escape is large, and the result is hard to read. Consider something like this (untested):
Code:
my $regexStart = quotemeta '"yellow"/></w:rPr><w:t>VAR</w:t></w:r>"'; # not sure which double-quotes you actually need here

--- rod
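Filling in the untested suggestion above, a minimal sketch that builds both markers with quotemeta and interpolates them into the while loop. Two assumptions: the start marker here stops at "<w:t>" so that the captured text is whatever sits inside <w:t>...</w:t> (adjust it to what your file really contains), and $regexEnd is a name made up for this sketch. The sample text is invented.

```perl
use strict;
use warnings;

# Invented two-run sample, for illustration only.
my $text = '<w:rPr><w:highlight w:val="yellow"/></w:rPr><w:t>alpha</w:t>'
         . '<w:rPr><w:highlight w:val="yellow"/></w:rPr><w:t>beta</w:t>';

# quotemeta backslash-escapes every non-word character, so the markers
# can be written exactly as they appear in the XML, with no manual escaping.
my $regexStart = quotemeta 'yellow"/></w:rPr><w:t>';
my $regexEnd   = quotemeta '<';

my @terms;
# Each successful s/// removes one start...end pair, so the loop stops
# once no start marker remains.
while ($text =~ s/$regexStart(.*?)$regexEnd//) {
    print $1, "\n";                 # "alpha", then "beta"
    push @terms, "term_$1_term";
}
```

Inside a pattern, \Q...\E does the same escaping inline, so with plain (unescaped) $start and $end strings, s/\Q$start\E(.*?)\Q$end\E// is equivalent.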
Anyway, if you are parsing HTML, just don't; use an existing parser instead: http://search.cpan.org/search?query=...arser&mode=all . And I think I've already pointed you to http://search.cpan.org/~adamk/Text-B...xt/Balanced.pm: this module is a generic solution for arbitrary BEGIN .. END markers, not just plain words.
Quote:
while ($text=~ s/\bstart\b (.*?) \bend\b//)

I don't have much insight into why this works, but it does.
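As for why it works: \b asserts a word boundary, and the spaces in the pattern are literal, so it only matches "start", a space, the shortest possible text (.*? is non-greedy), a space, then "end". Because s/// deletes each match, the while loop never sees the same pair twice and stops once none are left. The same literal spaces are what stop the pattern from matching XML that has no spaces around the captured text. A small demonstration on an invented string:

```perl
use strict;
use warnings;

my $text = 'start one end, start two end, startthree end';

my @found;
# "startthree" is skipped: there is no word boundary between "start" and "three".
while ($text =~ s/\bstart\b (.*?) \bend\b//) {
    push @found, $1;
}
print join(',', @found), "\n";    # one,two
print "$text\n";                  # what is left after both deletions
```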