LinuxQuestions.org
Old 05-24-2009, 12:26 PM   #1
stf92
Senior Member
 
Registered: Apr 2007
Location: Buenos Aires.
Distribution: Slackware
Posts: 4,442

Rep: Reputation: 76
Converting html to ascii?


Hi:
GNU, Slack 12.
Is there a GNU utility to convert HTML to plain ASCII?
Any suggestion would be gladly received. Thanks for reading.
 
Old 05-24-2009, 01:17 PM   #2
bathory
LQ Guru
 
Registered: Jun 2004
Location: Piraeus
Distribution: Slackware
Posts: 13,074

Rep: Reputation: 1971
You can use lynx:
Code:
lynx -dump file.html > file.txt
 
Old 05-24-2009, 02:25 PM   #3
knudfl
LQ 5k Club
 
Registered: Jan 2008
Location: Copenhagen DK
Distribution: PCLinuxOS2023 CentOS7.9 + 50+ other Linux OS, for test only.
Posts: 17,486

Rep: Reputation: 3635
I prefer this script

http://comp.eonworks.com/scripts/
http://comp.eonworks.com/scripts/html2txt
http://comp.eonworks.com/scripts/html2txt.gz

Simpler command: 'html2txt <file.html>'
will save the text as <file.txt>
.....
Attached Files
File Type: txt html2txt.txt (3.0 KB, 17 views)
 
Old 05-24-2009, 04:32 PM   #4
stf92
Senior Member
 
Registered: Apr 2007
Location: Buenos Aires.
Distribution: Slackware
Posts: 4,442

Original Poster
Rep: Reputation: 76
Thank you so much. I ran 'slocate -i html2' and got no results, but I didn't do a complete installation. Your script ends up doing the same thing as bathory's command, but it is not only very didactic but also more powerful. So I'd very much like to have a means of running it line by line (debugging). Is that possible? Regards.

E.S.
 
Old 05-25-2009, 06:04 AM   #5
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1301
Why not just use a text editor search and replace ... that does it line-by-line ... takes forever, but it's line-by-line.
 
Old 05-25-2009, 08:13 AM   #6
AwesomeMachine
LQ Guru
 
Registered: Jan 2005
Location: USA and Italy
Distribution: Debian testing/sid; OpenSuSE; Fedora; Mint
Posts: 5,521

Rep: Reputation: 1015
There are a gazillion Java JAR files for stripping formatting out of HTML and leaving the text. Seriously, HTML is text, albeit not plain English prose. It's the web browser that puts the spin on HTML. And I agree: when I want stripped-out HTML, I use search and replace in a text editor. That leaves the text part roughly the way it was.
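The search-and-replace idea can also be done non-interactively. What follows is only a rough sketch using sed; the function name strip_tags is made up for illustration. It removes anything that looks like a tag and decodes a handful of common entities. It does not handle tags that span several lines, comments, or the contents of script and style elements, so for real pages a proper renderer will do better.

```shell
# strip_tags: crude HTML-to-text filter (hypothetical helper name).
# Reads HTML on stdin and writes the remaining text to stdout.
strip_tags() {
    # 1st expression: delete everything between '<' and the next '>'.
    # Remaining expressions: decode a few common entities.
    # '&amp;' is decoded last so that '&amp;lt;' ends up as '&lt;', not '<'.
    sed -e 's/<[^>]*>//g' \
        -e 's/&lt;/</g' \
        -e 's/&gt;/>/g' \
        -e 's/&quot;/"/g' \
        -e 's/&nbsp;/ /g' \
        -e 's/&amp;/\&/g'
}
```

Usage would be something like 'strip_tags < file.html > file.txt'.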
 
Old 05-25-2009, 06:07 PM   #7
stf92
Senior Member
 
Registered: Apr 2007
Location: Buenos Aires.
Distribution: Slackware
Posts: 4,442

Original Poster
Rep: Reputation: 76
Very helpful of you. Having solved the HTML conversion problem, I'm now interested in script writing itself. Taking knudfl's script as a starting point, I would like to study it. For that purpose, being able to debug it would be great fun.
There are many tutorials out there, plus the shell man page, which is huge. And I'm a little lazy, or rather impatient, so what I'm aiming at is to begin somewhere in the middle. And the question? How to debug a script. Oh, yes: I know little more than nothing about shell scripts.

Regards.

Last edited by stf92; 05-25-2009 at 06:21 PM. Reason: courtesy
 
Old 05-26-2009, 12:06 PM   #8
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1301
Well, there are the usual places to learn bash:

http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO.html
http://tldp.org/LDP/Bash-Beginners-Guide/html/
http://tldp.org/LDP/abs/html/
http://www.grymoire.com/Unix/

I learned mostly from 'Sams Teach Yourself Shell Programming in 24 Hours'. I don't really know much about its status (free or otherwise), but I got it for free online.
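On the actual question of how to debug a script: assuming the script is a bash script (I have not checked what html2txt is written in), bash itself has two options that help. 'bash -n' parses a script without running it, and 'bash -x' prints every command, after expansion, just before it runs. The two function names below are made up; they are only thin wrappers so the commands are easy to reuse.

```shell
# check_syntax FILE: parse the script and report syntax errors,
# without executing any of it. (Hypothetical helper name.)
check_syntax() {
    bash -n "$1"
}

# trace_script FILE [ARGS...]: run the script with tracing on.
# Each command is printed to stderr, prefixed with '+', before it runs;
# the script's normal output still goes to stdout.
# (Hypothetical helper name.)
trace_script() {
    bash -x "$@"
}
```

For example, 'trace_script ./html2txt file.html 2> trace.log' would keep the trace in a separate file. To trace only part of a script, put 'set -x' before the region and 'set +x' after it. For something closer to single-stepping, a line such as trap 'read -r -p "next: $BASH_COMMAND " _' DEBUG near the top of a copy of the script should pause before each command until you press Enter, provided the script does not itself read from standard input.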
 
Old 05-26-2009, 12:09 PM   #9
senseproof
Member
 
Registered: May 2009
Distribution: Fedora 10
Posts: 31
Blog Entries: 5

Rep: Reputation: 16
w3m -dump does the same as lynx but, IMHO, better.
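A small sketch that uses whichever of the two browsers is installed. The function name html2ascii is made up; the options are the dump options of each tool, plus '-T text/html' (w3m) and '-force_html' (lynx) so the input is treated as HTML even if the file name does not end in .html, and '-nolist' so lynx leaves out its list of links.

```shell
# html2ascii FILE: render an HTML file as plain text on stdout.
# (Hypothetical wrapper; prefers w3m, falls back to lynx.)
html2ascii() {
    if command -v w3m >/dev/null 2>&1; then
        # -dump: write the rendered page to stdout and exit
        # -T text/html: treat the input as HTML
        w3m -dump -T text/html "$1"
    elif command -v lynx >/dev/null 2>&1; then
        # -nolist: omit the numbered list of links at the end
        # -force_html: treat the input as HTML
        lynx -dump -nolist -force_html "$1"
    else
        echo "html2ascii: neither w3m nor lynx found" >&2
        return 1
    fi
}
```

Usage: 'html2ascii file.html > file.txt'. The two tools lay out tables and indentation differently, so the output will not be identical between them.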
 
Old 05-26-2009, 12:51 PM   #10
stf92
Senior Member
 
Registered: Apr 2007
Location: Buenos Aires.
Distribution: Slackware
Posts: 4,442

Original Poster
Rep: Reputation: 76
Thanks a lot. E.S.
 
Old 05-26-2009, 07:14 PM   #11
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 7.7 (?), Centos 8.1
Posts: 18,239

Rep: Reputation: 2712
This is also very good: http://rute.2038bug.com/index.html.gz
 
Old 05-28-2009, 11:09 AM   #12
stf92
Senior Member
 
Registered: Apr 2007
Location: Buenos Aires.
Distribution: Slackware
Posts: 4,442

Original Poster
Rep: Reputation: 76
Thanks, chrism01.
 
  



Tags
ascii, html, scripts


All times are GMT -5. The time now is 05:03 PM.
