LinuxQuestions.org


ted_chou12 10-05-2012 06:30 AM

sed regex match
 
I don't understand why this matches the text in the upper example but not in the lower one:
Code:

torrentlink=($(sed -rn '/target=/s/.*href="([^"]+)".*>.*\.torrent.*/\1/p' "$html"))
torrentname=($(sed -rn "/target=/s/.*>(.*)\.torrent.*/\1/p" "$html"))

Text to be matched:
Code:

<a href="forum.php?mod=attachment&amp;aid=NzkxNzl8ZGZhMjViZjB8MTM0OTQzMjkyOXw5MDM3MnwxODkxNTQ%3D" target="_blank">[DMG][Dog Days`][04][1280x720][BIG5].mp4.torrent</a>
<a href="forum.php?mod=attachment&amp;aid=ODIzNDV8Nzc2OTFhZmR8MTM0OTQzMjk2N3w5MDM3MnwxNzM5ODI%3D" target="_blank">[dmfans][Shirokuma_Cafe][27][848480][BIG5].rmvb.torrent</a>

As far as the pattern is concerned, the two lines are identical, yet it matches only the first line and not the second. I wonder if this is some kind of bug.
Thanks,
Ted

henrycoffin 10-05-2012 10:19 AM

I might be wrong, but I think that by default sed only replaces the first match. Try adding a g at the end of the s command to make it a global substitution.
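Here is a quick check of what g actually changes (the sample input is made up for illustration). sed reads every input line either way. The g flag only decides whether s/// replaces the first match on each line or every match on that line:

```shell
#!/usr/bin/env bash
# Made-up input: two lines, each containing "a" more than once.
input=$'a a\nb a a'

# Without g: every line is still processed, but only the first
# match on each line is replaced.
without_g=$(printf '%s\n' "$input" | sed 's/a/X/')

# With g: every match on each line is replaced.
with_g=$(printf '%s\n' "$input" | sed 's/a/X/g')

printf '%s\n' "$without_g"   # X a / b X a
echo
printf '%s\n' "$with_g"      # X X / b X X
```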

David the H. 10-07-2012 03:45 PM

You've marked this solved but didn't explain why. Did you figure it out? It probably would have helped if you had posted the output you got, as well as the output you wanted.

As far as I can see, the regex isn't really the problem. The way you're setting the arrays is. An unquoted command substitution is split on whitespace (unless you reset IFS to avoid it), so a name containing a space ends up as two array elements. That would affect the first line, though, not the second.
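Below is a small demonstration of that splitting. It writes the two sample lines from the question to a temporary file, which is only test scaffolding. The torrentname expression from the question prints both names. The unquoted array assignment is what breaks the first name apart:

```shell
#!/usr/bin/env bash
# Scaffolding: write the two sample lines from the question to a temp file.
html=$(mktemp)
cat > "$html" <<'EOF'
<a href="forum.php?mod=attachment&amp;aid=NzkxNzl8ZGZhMjViZjB8MTM0OTQzMjkyOXw5MDM3MnwxODkxNTQ%3D" target="_blank">[DMG][Dog Days`][04][1280x720][BIG5].mp4.torrent</a>
<a href="forum.php?mod=attachment&amp;aid=ODIzNDV8Nzc2OTFhZmR8MTM0OTQzMjk2N3w5MDM3MnwxNzM5ODI%3D" target="_blank">[dmfans][Shirokuma_Cafe][27][848480][BIG5].rmvb.torrent</a>
EOF

# The torrentname expression from the question (single-quoted here).
# Captured as one string, it holds both names, one per line.
names=$(sed -rn '/target=/s/.*>(.*)\.torrent.*/\1/p' "$html")
printf '%s\n' "$names"

# Unquoted, the same text is word-split on spaces as well as newlines
# (and is also subject to globbing), so "Dog Days" becomes two elements.
split_array=($names)
printf '(%s)\n' "${split_array[@]}"   # three elements, not two

rm -f "$html"
```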

I suggest using mapfile instead in any case (assuming bash). It's much safer and cleaner.
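For comparison, here is roughly what the IFS route would look like next to mapfile. The names contain [ and ], and an unquoted expansion also treats those as glob patterns. The IFS version therefore has to switch off pathname expansion with set -f as well. The names_cmd function is a hypothetical stand-in for the sed command:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the sed command: prints the two sample names.
names_cmd() {
    printf '%s\n' '[DMG][Dog Days`][04][1280x720][BIG5].mp4' \
                  '[dmfans][Shirokuma_Cafe][27][848480][BIG5].rmvb'
}

# IFS route: split only on newlines and disable globbing, then restore
# both (set +f assumes globbing was enabled beforehand).
old_ifs=$IFS
set -f
IFS=$'\n'
ifs_array=($(names_cmd))
IFS=$old_ifs
set +f

# mapfile route (bash 4+): one element per line, no splitting or globbing.
mapfile -t map_array < <(names_cmd)

printf '(%s)\n' "${ifs_array[@]}"
echo
printf '(%s)\n' "${map_array[@]}"
```

mapfile needs bash 4 or newer. In exchange, there is no IFS or shell-option state to save and restore.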

We can probably also simplify the sed expressions a bit.

Code:

mapfile -t torrentlink < <( sed -rn '/target=/ { s/.*href="// ; s/".*//p }' <"$html" )
mapfile -t torrentname < <( sed -rn '/target=/ { s/[^>]+>// ; s/[.]torrent<.*//p }' <"$html" )

# Test: print the array elements
printf '(%s)\n' "${torrentlink[@]}"
echo
printf '(%s)\n' "${torrentname[@]}"

This is the output I get for the above:
Code:

(forum.php?mod=attachment&amp;aid=NzkxNzl8ZGZhMjViZjB8MTM0OTQzMjkyOXw5MDM3MnwxODkxNTQ%3D)
(forum.php?mod=attachment&amp;aid=ODIzNDV8Nzc2OTFhZmR8MTM0OTQzMjk2N3w5MDM3MnwxNzM5ODI%3D)

([DMG][Dog Days`][04][1280x720][BIG5].mp4)
([dmfans][Shirokuma_Cafe][27][848480][BIG5].rmvb)


