LinuxQuestions.org
05-24-2009, 01:26 PM   #1
ENRIQUESTEFANINI
Member
 
Registered: Apr 2007
Location: Buenos Aires.
Distribution: Slackware 12.0
Posts: 114
Thanked: 0
Converting html to ascii?


Hi:
GNU, Slack 12.
Is there a GNU utility to convert HTML to plain ASCII?
Any suggestion would be gladly received. Thanks for reading.
05-24-2009, 02:17 PM   #2
bathory
Guru
 
Registered: Jun 2004
Location: Piraeus
Distribution: Slackware
Posts: 5,040
Thanked: 222
You can use lynx:
Code:
lynx -dump file.html > file.txt
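If the numbered link list that lynx appends to the dump gets in the way, something like the following may help. This is only a sketch: `html_to_text` is a made-up helper name, the 72-column width is an arbitrary choice, and it assumes lynx is on the PATH.

```shell
#!/bin/sh
# html_to_text is a hypothetical helper name, not part of lynx.
# Usage: html_to_text input.html output.txt
html_to_text() {
    if ! command -v lynx >/dev/null 2>&1; then
        echo "lynx is not installed" >&2
        return 127
    fi
    # -dump   : render the page and write it to stdout instead of browsing it
    # -nolist : leave out the numbered list of links at the end of the dump
    # -width  : wrap the output at 72 columns (an arbitrary choice here)
    lynx -dump -nolist -width=72 "$1" > "$2"
}
```

Called as `html_to_text file.html file.txt`, it should produce the same kind of output as the one-liner above, minus the link references.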
05-24-2009, 03:25 PM   #3
knudfl
Senior Member
 
Registered: Jan 2008
Location: Copenhagen, Denmark
Distribution: pclos2009.2, slack13, Debian Lenny (+30 others, for test only)
Posts: 2,914
Thanked: 268
I prefer this script

http://comp.eonworks.com/scripts/
http://comp.eonworks.com/scripts/html2txt
http://comp.eonworks.com/scripts/html2txt.gz

Simpler command: 'html2txt <file.html>'
will save the text as <file.txt>
.....
Attached Files
File Type: txt html2txt.txt (3.0 KB, 4 views)
05-24-2009, 05:32 PM   #4
ENRIQUESTEFANINI
Member
 
Registered: Apr 2007
Location: Buenos Aires.
Distribution: Slackware 12.0
Posts: 114
Thanked: 0

Original Poster
Thank you so much. I ran 'slocate -i html2' and got no results, but I didn't do a complete installation. Your script ends up doing the same thing as bathory's command, but it is both very didactic and more powerful. So I'd very much like to have a means of running it line by line (debugging). Is that possible? Regards.

E.S.
05-25-2009, 07:04 AM   #5
H_TeXMeX_H
Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 6,739
Blog Entries: 2
Thanked: 216
Why not just use a text editor search and replace ... that does it line-by-line ... takes forever, but it's line-by-line.
05-25-2009, 09:13 AM   #6
AwesomeMachine
Senior Member
 
Registered: Jan 2005
Location: USA
Distribution: Debian Squeeze, SuSE 11.0, F8, F10, F11 x86_64
Posts: 1,130
Thanked: 16
There are a gazillion Java jar files for stripping the formatting out of HTML and leaving the text. Seriously, HTML is text, albeit not plain English prose. It's the web browser that puts the spin on HTML. And I agree: when I want the HTML stripped out, I use search and replace in a text editor. That leaves the text part roughly the way it was.
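The same search and replace can be scripted with sed. A rough sketch, with `strip_tags` being a made-up name: it only removes tags that open and close on the same line and decodes three common entities, so the contents of script or style blocks and any multi-line tags will survive.

```shell
#!/bin/sh
# strip_tags is a hypothetical helper name. It reads HTML on stdin and
# writes roughly tag-free text on stdout.
strip_tags() {
    # 1st expression: delete anything that looks like <...> within one line.
    # Then decode &lt; and &gt;, and decode &amp; last so that an escaped
    # entity such as &amp;lt; is not decoded twice.
    sed -e 's/<[^>]*>//g' \
        -e 's/&lt;/</g' \
        -e 's/&gt;/>/g' \
        -e 's/&amp;/\&/g'
}

printf '%s\n' '<p>Hello <b>world</b> &amp; more</p>' | strip_tags
# prints: Hello world & more
```

For a whole file, `strip_tags < file.html > file.txt` would do; for anything beyond simple pages, a real renderer is likely the safer choice.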
05-25-2009, 07:07 PM   #7
ENRIQUESTEFANINI
Member
 
Registered: Apr 2007
Location: Buenos Aires.
Distribution: Slackware 12.0
Posts: 114
Thanked: 0

Original Poster
Very helpful of you. Having solved the HTML conversion problem, I'm now interested in script writing itself. Taking knudfl's script as a starting point, I would like to study it, and being able to debug it would be great fun.
There are many tutorials out there, and also the shell man page, which is huge. But I'm a little lazy, or rather impatient, so what I'm aiming for is to start somewhere in the middle. And the question: how do I debug a script? Oh, and I know little more than nothing about shell scripts.

Regards.

Last edited by ENRIQUESTEFANINI; 05-25-2009 at 07:21 PM.. Reason: courtesy
05-26-2009, 01:06 PM   #8
H_TeXMeX_H
Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 6,739
Blog Entries: 2
Thanked: 216
Well, there are the usual places to learn bash:

http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO.html
http://tldp.org/LDP/Bash-Beginners-Guide/html/
http://tldp.org/LDP/abs/html/
http://www.grymoire.com/Unix/

I learned mostly from 'Sams Teach Yourself Shell Programming in 24 Hours'. I don't really know much about its status (free or otherwise), but I got it for free online.
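On the debugging question itself, bash can check and trace a script without any extra tools. A minimal sketch follows; the three-line demo script is a made-up stand-in for html2txt, and the pausing trick in the comments is bash-specific.

```shell
#!/bin/bash
# Hypothetical three-line script standing in for the one being studied.
demo=$(mktemp)
cat > "$demo" <<'EOF'
name="world"
greeting="hello $name"
echo "$greeting"
EOF

# Step 1: parse the script without running it; syntax errors show up here.
bash -n "$demo"

# Step 2: run it with tracing. Each command is printed to stderr, prefixed
# with "+", after variable expansion and before it executes.
trace=$(bash -x "$demo" 2>&1)
printf '%s\n' "$trace"

# To trace only part of a script, put "set -x" before the interesting lines
# and "set +x" after them.
#
# To pause before every command (bash only), add this line near the top of a
# copy of the script and press Enter to advance:
#   trap 'read -r -p "next: $BASH_COMMAND > " _' DEBUG

rm -f "$demo"
```

For the script from this thread, that would presumably be `bash -x html2txt file.html`, assuming the script is in the current directory and is written for bash.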
05-26-2009, 01:09 PM   #9
senseproof
Member
 
Registered: May 2009
Distribution: Fedora 10
Posts: 31
Blog Entries: 5
Thanked: 2
w3m -dump does the same as lynx but, IMHO, better.
05-26-2009, 01:51 PM   #10
ENRIQUESTEFANINI
Member
 
Registered: Apr 2007
Location: Buenos Aires.
Distribution: Slackware 12.0
Posts: 114
Thanked: 0

Original Poster
Thanks a lot. E.S.
05-26-2009, 08:14 PM   #11
chrism01
Guru
 
Registered: Aug 2004
Location: Brisbane
Distribution: Centos 5.4
Posts: 7,428
Thanked: 325
This is also very good: http://rute.2038bug.com/index.html.gz
05-28-2009, 12:09 PM   #12
ENRIQUESTEFANINI
Member
 
Registered: Apr 2007
Location: Buenos Aires.
Distribution: Slackware 12.0
Posts: 114
Thanked: 0

Original Poster
Thanks, chrism01.