LinuxQuestions.org


onepostonly 09-23-2008 04:34 PM

looking for html to text renderer that keeps links
 
What I am looking for is an html2text application that can do something like this:

example 1 (newsbeuter)
Code:

I’ve moved this post to be a page, but I didn’t want to delete the comments associated with this older post.

You can find the page [1]here.

Links:
[1]: http://www.psychocats.net/ubuntucat/livecdaward/ (link)

example 2 (mutt)
Code:

    If this message is not displaying properly, please view the [1]online
                                    version.
  [2][IMG]  [3][IMG]                                              [4][IMG]
                                                          [6]Adobe
                              [7]23-09-08
                YOU ARE ONE DAY AWAY FROM THE BIG DAY.
  [5][IMG]    Come back tomorrow at 11:00 CET (10:00 BST)          [10][IMG]
                              to join us.
                      It's going to be brilliant.
                  [8]www.adobe.com/go/brilliantevent
                                                          [9]Adobe
  [11][IMG] [12][IMG]                                              [13][IMG]
  This is an advertising message from Adobe Systems UK, its affiliates and
  agents ("Adobe"), 3 Roundwood Ave, Stockley Park, Uxbridge, UB11 1AY
  United Kingdom. If you'd prefer not to receive e-mail like this from Adobe
  in the future, please [14]unsubscribe or send an e-mail to
  [15]unsubscribe@adobe-direct.com. Alternatively, you may mail your
  unsubscribe request to:

  UNSUBSCRIBE
  Adobe Direct
  Postbus 20622
  1001 NP Amsterdam
  The Netherlands

  Your privacy is important to us. Please review Adobe's online Privacy
  Policy by clicking here:
  [16]http://www.adobe.com/uk/misc/privacy.html.

  Adobe and the Adobe logo are either registered trademarks or trademarks of
  Adobe Systems Incorporated, in the United States and/or other countries.

  [17][IMG]

References

  Visible links
  1. http://mail.adobe-direct.com/v?xJvJvlnTn
  8. http://mail.adobe-direct.com/r?xJvJvEcvlncn
  14. http://mail.adobe-direct.com/p2?xPPvPEcPlnTP
  15. mailto:unsubscribe@adobe-direct.com
  16. http://mail.adobe-direct.com/r?xJvJcPHJvlncT

I apologise for the length of the second example, even after I removed some text, but I think it is a good one. Basically, I would like the HTML-to-text renderer to keep the links in this organized manner: it does not interrupt the flow of reading, and it stays useful by keeping the links I might want to look at later. Please note, however, that the newsbeuter approach is slightly better, because it also keeps links for images (even though that is not obvious in my example). Thanks in advance for your help.
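To make the requested behaviour concrete, here is a minimal sketch in Python of a renderer that numbers links inline and lists them at the end, in the style of example 1. It is not how newsbeuter or mutt do it; the names LinkKeepingRenderer and html_to_text are made up for illustration, only the standard-library html.parser module is used, and it ignores layout, tables and script/style content.

```python
from html.parser import HTMLParser


class LinkKeepingRenderer(HTMLParser):
    """Hypothetical helper: flattens HTML to text and numbers every link."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []  # pieces of visible text, in document order
        self.links = []   # collected URLs; position + 1 is the reference number

    def _add_ref(self, url):
        # Remember the URL and emit an inline marker such as "[1]".
        self.links.append(url)
        self.chunks.append("[%d]" % len(self.links))

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self._add_ref(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            # Images get a reference too, followed by an [IMG] placeholder.
            self._add_ref(attrs["src"])
            self.chunks.append("[IMG]")
        elif tag in ("p", "br", "div"):
            self.chunks.append("\n")

    def handle_data(self, data):
        self.chunks.append(data)

    def render(self):
        # Collapse runs of whitespace and drop empty lines.
        lines = [" ".join(line.split()) for line in "".join(self.chunks).splitlines()]
        body = "\n".join(line for line in lines if line)
        if not self.links:
            return body
        refs = "\n".join("[%d]: %s" % (n, url) for n, url in enumerate(self.links, 1))
        return body + "\n\nLinks:\n" + refs


def html_to_text(html):
    renderer = LinkKeepingRenderer()
    renderer.feed(html)
    renderer.close()
    return renderer.render()
```

Calling html_to_text() on a paragraph containing one link gives the text with a "[1]" marker in front of the link text, followed by a "Links:" section listing the URL.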

unSpawn 09-23-2008 05:14 PM

'links -dump proto://some/page'?

onepostonly 09-24-2008 02:28 PM

Actually, even before I posted that question, I was wondering why mutt was rendering the HTML like that when (I thought) it wasn't supposed to. Now that you gave that answer, I got it: it's because of the way I have my .mailcap file set up, "text/html; elinks -default-mime-type text/html -dump -dump-charset %{charset} %s; copiousoutput". It was elinks that was doing that. 'elinks -dump' works (almost) as I wanted (no links to images). 'links -dump' isn't doing it for me, for some reason (it's not keeping the links); neither is w3m (it does the same as links), and neither is lynx (it just outputs the HTML source code). I tried all of them simply with the '-dump' option and nothing more, so maybe I am missing something there. But at least elinks does it almost the way I want, and that is the one I usually use (because of its tabbed browsing capabilities). Thank you for your help.

EDIT - I searched a bit through the links options and only had to set '-html-numbered-links 1' for it to work. This time it is even better than elinks, since it keeps all links, even those for images, and it also tells me which links are images. Once again, thanks for your help.
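For anyone who wants to script this, the two invocations reported to work above could be wrapped roughly as follows. This is only a sketch: dump_command and dump_page are hypothetical names, the option order is an assumption, and it assumes the installed links build accepts '-html-numbered-links' as described in the EDIT above.

```python
import shutil
import subprocess


def dump_command(browser, url):
    """Build the argument list for the invocations mentioned in this thread."""
    if browser == "links":
        # '-html-numbered-links 1' is the option from the EDIT above.
        return ["links", "-html-numbered-links", "1", "-dump", url]
    if browser == "elinks":
        # Plain '-dump'; reported above to keep links, but not image links.
        return ["elinks", "-dump", url]
    raise ValueError("unsupported browser: %r" % browser)


def dump_page(browser, url):
    """Run the dump and return its text, or None if the browser is missing."""
    argv = dump_command(browser, url)
    if shutil.which(argv[0]) is None:
        return None
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout
```

dump_page("links", url) returns the rendered text when links is installed, and None otherwise, so a caller can fall back to elinks.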


All times are GMT -5.