LinuxQuestions.org
09-23-2008, 04:34 PM   #1
onepostonly
LQ Newbie
 
Registered: Apr 2008
Posts: 6

Rep: Reputation: 0
looking for html to text renderer that keeps links


What I am looking for is an html2text application that can do something like this:

example 1 (newsbeuter)
Code:
I’ve moved this post to be a page, but I didn’t want to delete the comments associated with this older post.

You can find the page [1]here.

Links:
[1]: http://www.psychocats.net/ubuntucat/livecdaward/ (link)
example 2 (mutt)
Code:
     If this message is not displaying properly, please view the [1]online
                                    version.
   [2][IMG]  [3][IMG]                                               [4][IMG]
                                                           [6]Adobe
                               [7]23-09-08
                 YOU ARE ONE DAY AWAY FROM THE BIG DAY.
   [5][IMG]    Come back tomorrow at 11:00 CET (10:00 BST)          [10][IMG]
                               to join us.
                       It's going to be brilliant.
                   [8]www.adobe.com/go/brilliantevent
                                                           [9]Adobe
   [11][IMG] [12][IMG]                                              [13][IMG]
   This is an advertising message from Adobe Systems UK, its affiliates and
   agents ("Adobe"), 3 Roundwood Ave, Stockley Park, Uxbridge, UB11 1AY
   United Kingdom. If you'd prefer not to receive e-mail like this from Adobe
   in the future, please [14]unsubscribe or send an e-mail to
   [15]unsubscribe@adobe-direct.com. Alternatively, you may mail your
   unsubscribe request to:

   UNSUBSCRIBE
   Adobe Direct
   Postbus 20622
   1001 NP Amsterdam
   The Netherlands

   Your privacy is important to us. Please review Adobe's online Privacy
   Policy by clicking here:
   [16]http://www.adobe.com/uk/misc/privacy.html.

   Adobe and the Adobe logo are either registered trademarks or trademarks of
   Adobe Systems Incorporated, in the United States and/or other countries.

   [17][IMG]

References

   Visible links
   1. http://mail.adobe-direct.com/v?xJvJvlnTn
   8. http://mail.adobe-direct.com/r?xJvJvEcvlncn
  14. http://mail.adobe-direct.com/p2?xPPvPEcPlnTP
  15. mailto:unsubscribe@adobe-direct.com
  16. http://mail.adobe-direct.com/r?xJvJcPHJvlncT
I apologise for the length of the second example, even after I removed some text, but I think it is a good one. Basically, I would like an HTML-to-text renderer that keeps the links in this organized manner: it does not interrupt the flow of reading, and it stays useful because it keeps the links I might want to follow later. Please note, however, that the newsbeuter approach is slightly better, because it also keeps links for images (even though that is not obvious in my example). Thanks in advance for your help.
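
For reference, the newsbeuter-style layout shown in example 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how newsbeuter, mutt, or any text browser actually produces its output: the names LinkKeepingRenderer and html_to_text are hypothetical, only the standard-library html.parser module is used, and tables, scripts, and styling are ignored.

```python
from html.parser import HTMLParser


class LinkKeepingRenderer(HTMLParser):
    """Hypothetical sketch: render HTML as text and number every link."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []  # pieces of body text, in document order
        self.links = []   # (url, kind) pairs; kind is "link" or "image"

    def _add_reference(self, url, kind):
        # Store the target and return its 1-based reference number.
        self.links.append((url, kind))
        return len(self.links)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            n = self._add_reference(attrs["href"], "link")
            self.chunks.append(f"[{n}]")
        elif tag == "img" and attrs.get("src"):
            n = self._add_reference(attrs["src"], "image")
            self.chunks.append(f"[{n}][IMG]")
        elif tag in ("p", "br", "div"):
            # Treat block-level tags as line breaks.
            self.chunks.append("\n")

    def handle_data(self, data):
        self.chunks.append(data)

    def render(self):
        body = "".join(self.chunks)
        # Collapse runs of whitespace on each line and drop empty lines.
        lines = [" ".join(line.split()) for line in body.splitlines()]
        text = "\n".join(line for line in lines if line)
        if not self.links:
            return text
        refs = [
            f"[{n}]: {url} ({kind})"
            for n, (url, kind) in enumerate(self.links, start=1)
        ]
        return text + "\n\nLinks:\n" + "\n".join(refs)


def html_to_text(html):
    """Convert an HTML string to text followed by a numbered link list."""
    parser = LinkKeepingRenderer()
    parser.feed(html)
    parser.close()
    return parser.render()
```

Calling html_to_text() on a paragraph that contains one anchor yields the body text with a [1] marker in front of the link text, followed by a "Links:" section. Image sources get their own numbered entries marked "(image)", which is the behaviour asked for above.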
 
09-23-2008, 05:14 PM   #2
unSpawn
Moderator
 
Registered: May 2001
Posts: 27,279
Blog Entries: 54

Rep: Reputation: 2852
'links -dump proto://some/page'?
 
09-24-2008, 02:28 PM   #3
onepostonly
LQ Newbie
 
Registered: Apr 2008
Posts: 6

Original Poster
Rep: Reputation: 0
Actually, even before I posted that question, I was wondering why mutt was rendering the HTML like that when (I thought) it wasn't supposed to. Now that you gave that answer, I get it: it's because of the way I have my .mailcap file set up, "text/html; elinks -default-mime-type text/html -dump -dump-charset %{charset} %s; copiousoutput". It was elinks that was doing that.

'elinks -dump' works (almost) as I wanted (no links to images). 'links -dump' isn't doing it for me for some reason (it's not keeping the links), neither is w3m (it does the same as links), and neither is lynx (it just outputs the HTML source code). I tried all of them with only the '-dump' option and nothing more, so maybe I am missing something there. At least elinks does it almost the way I want, and it is the one I usually use (because of its tabbed browsing capabilities). Thank you for your help.

EDIT: I searched a bit through the links options and only had to set '-html-numbered-links 1' for it to work. This time it is even better than elinks, since it keeps all links, even images, and it also tells me which links are images. Once again, thanks for your help.
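
The two invocations that worked in this thread can be wrapped in a small Python helper for scripting. This is a rough sketch, not a tested recipe: the function names build_dump_command and dump_page are hypothetical, the only options used are the ones quoted above ('-dump' and '-html-numbered-links 1'), and the order in which they are passed to links is an assumption.

```python
import shutil
import subprocess

# Options quoted in this thread; nothing else is assumed about either browser.
# The order of the links options is an assumption, not something the thread states.
DUMP_OPTIONS = {
    "links": ["-html-numbered-links", "1", "-dump"],
    "elinks": ["-dump"],
}


def build_dump_command(browser, target):
    """Return the argument list for dumping `target` (a URL or file) as text."""
    if browser not in DUMP_OPTIONS:
        raise ValueError(f"no dump options recorded for {browser!r}")
    return [browser, *DUMP_OPTIONS[browser], target]


def dump_page(browser, target):
    """Run the browser and return its text output; needs the browser on PATH."""
    if shutil.which(browser) is None:
        raise FileNotFoundError(f"{browser} is not installed or not on PATH")
    result = subprocess.run(
        build_dump_command(browser, target),
        capture_output=True,
        text=True,   # decode output with the locale's default encoding
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

Keeping the command construction separate from the subprocess call means the argument list can be inspected or logged without either browser being installed.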

Last edited by onepostonly; 09-24-2008 at 02:45 PM. Reason: making a correction
 
  

