looking for html to text renderer that keeps links
What I am looking for is an html2text application that can produce output like this:
example 1 (newsbeuter)
Code:
I’ve moved this post to be a page, but I didn’t want to delete the comments associated with this older post.
You can find the page [1]here.
Links:
[1]: http://www.psychocats.net/ubuntucat/livecdaward/ (link)
example 2 (mutt)
Code:
If this message is not displaying properly, please view the [1]online
version.
[2][IMG] [3][IMG] [4][IMG]
[6]Adobe
[7]23-09-08
YOU ARE ONE DAY AWAY FROM THE BIG DAY.
[5][IMG] Come back tomorrow at 11:00 CET (10:00 BST) [10][IMG]
to join us.
It's going to be brilliant.
[8]www.adobe.com/go/brilliantevent
[9]Adobe
[11][IMG] [12][IMG] [13][IMG]
This is an advertising message from Adobe Systems UK, its affiliates and
agents ("Adobe"), 3 Roundwood Ave, Stockley Park, Uxbridge, UB11 1AY
United Kingdom. If you'd prefer not to receive e-mail like this from Adobe
in the future, please [14]unsubscribe or send an e-mail to
[15]unsubscribe@adobe-direct.com. Alternatively, you may mail your
unsubscribe request to:
UNSUBSCRIBE
Adobe Direct
Postbus 20622
1001 NP Amsterdam
The Netherlands
Your privacy is important to us. Please review Adobe's online Privacy
Policy by clicking here:
[16]http://www.adobe.com/uk/misc/privacy.html.
Adobe and the Adobe logo are either registered trademarks or trademarks of
Adobe Systems Incorporated, in the United States and/or other countries.
[17][IMG]
References
Visible links
1. http://mail.adobe-direct.com/v?xJvJvlnTn
8. http://mail.adobe-direct.com/r?xJvJvEcvlncn
14. http://mail.adobe-direct.com/p2?xPPvPEcPlnTP
15. mailto:unsubscribe@adobe-direct.com
16. http://mail.adobe-direct.com/r?xJvJcPHJvlncT
I apologise for the length of the second example, even after I removed some text, but I think it is a good one. Basically, I would like the HTML-to-text renderer to keep the links in this organized manner: numbered markers in the text and the URLs listed at the end. That does not interrupt the flow of reading, and the output stays useful because it keeps links I might want to follow later. Please note, however, that the newsbeuter approach is slightly better, because it also keeps links for images (even though that is not obvious in my example). Thanks in advance for your help.
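For anyone who wants to see the idea in code, the format described above (numbered markers inline, URLs collected in a list at the end) can be sketched with Python's standard `html.parser` module. This is only a rough illustration, not how newsbeuter, mutt, or any text browser actually does it; the names `LinkKeepingRenderer` and `html_to_text` are made up for this example, and real pages would need far more careful handling of whitespace, scripts, and block elements.

```python
from html.parser import HTMLParser


class LinkKeepingRenderer(HTMLParser):
    """Hypothetical helper: collect text, numbering every link and image."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []  # pieces of the rendered body text
        self.links = []   # (url, kind) pairs in order of appearance

    def _mark(self, url, kind):
        # Remember the URL and return its inline marker, e.g. "[1]".
        self.links.append((url, kind))
        return "[%d]" % len(self.links)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.chunks.append(self._mark(attrs["href"], "link"))
        elif tag == "img" and attrs.get("src"):
            self.chunks.append(self._mark(attrs["src"], "image") + "[IMG]")
        elif tag in ("p", "br", "div"):
            self.chunks.append("\n")

    def handle_data(self, data):
        self.chunks.append(data)

    def render(self):
        body = "".join(self.chunks).strip()
        if not self.links:
            return body
        refs = ["[%d]: %s (%s)" % (i, url, kind)
                for i, (url, kind) in enumerate(self.links, 1)]
        return body + "\n\nLinks:\n" + "\n".join(refs)


def html_to_text(html):
    """Render an HTML string as text with a numbered link list at the end."""
    parser = LinkKeepingRenderer()
    parser.feed(html)
    parser.close()
    return parser.render()


if __name__ == "__main__":
    sample = ('<p>You can find the page '
              '<a href="http://www.psychocats.net/ubuntucat/livecdaward/">'
              'here</a>.</p>')
    print(html_to_text(sample))
```

Images are numbered the same way as ordinary links and tagged as `(image)` in the list, which is the behaviour asked for above.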
Actually, even before I posted that question, I was wondering why mutt was rendering the HTML like that when (I thought) it wasn't supposed to. Now that you gave that answer, I got it: it's because of the way I have my .mailcap file set up, "text/html; elinks -default-mime-type text/html -dump -dump-charset %{charset} %s; copiousoutput". It was elinks that was doing that. 'elinks -dump' works (almost) as I wanted; the only thing missing is links to images. 'links -dump' isn't doing it for me, for some reason (it's not keeping the links), neither is w3m (it's doing the same as links), and neither is lynx (it's just outputting the HTML source code). I tried all of them with just the '-dump' option and nothing more, so maybe I am missing something there. At least elinks is doing it almost the way I want, and that is the one I usually use (because of its tabbed browsing capabilities). Thank you for your help.
EDIT: I searched a bit through the links options and only had to set '-html-numbered-links 1' for it to work. This time it is even better than elinks, since it keeps all links, even images, and it also tells me which links are images. Once again, thanks for your help.
Last edited by onepostonly; 09-24-2008 at 02:45 PM.
Reason: making a correction