Need help extracting text from .htm files
I used wget to download almost 3000 .htm files from a dictionary website. Now I want to write a script that extracts the text from them. I'm a total newbie with awk/sed/perl/grep. Any suggestions?
There's a utility called html2text. It's probably available in the repos of whatever distro you're using, most likely in a package called html2text. You might even have it installed already; to check:
Code:
$ which html2text
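With about 3000 files, it is easiest to run html2text in a loop. Here is a minimal sketch. It assumes html2text is on your PATH and, like the common versions, prints the converted text to standard output when you give it a file name. `convert_dir` is just a hypothetical helper name. If you end up using a different converter, swap it in for the html2text call, but check that tool's own usage first.

```shell
#!/bin/sh
# Minimal sketch: turn every DIR/*.htm into DIR/*.txt with html2text.
# Assumes html2text is on the PATH and prints plain text to stdout.
# convert_dir is a hypothetical helper name, not part of html2text.
convert_dir() {
    for f in "$1"/*.htm; do
        # If no .htm files exist, the glob stays literal; skip it.
        [ -e "$f" ] || continue
        # foo.htm -> foo.txt (strip the .htm suffix, add .txt)
        html2text "$f" > "${f%.htm}.txt"
    done
}

# Directory from the first argument, or the current directory.
convert_dir "${1:-.}"
```

Save it as, say, convert.sh (any name works) and run `sh convert.sh /path/to/downloads`. Each foo.htm gets a foo.txt next to it. The quoting keeps file names with spaces intact.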
OK, I will try html2txt. Thanks!
This version of html2txt works perfectly (html2text doesn't):
http://www.linuxquestions.org/questi...5&d=1269459223