LinuxQuestions.org - [SOLVED] Need help extracting text from .htm files

roBuntu1967

02-27-2011 07:55 AM

Need help extracting text from .htm files

I downloaded (using wget) almost 3000 .htm files from a dictionary web site. Now I want to write a script that will extract the text from these .htm files. I'm a total newbie with awk/sed/perl/grep. Any suggestions?

arizonagroovejet

02-27-2011 08:00 AM

There's a utility called html2text. It's probably available in the repos of whatever distro you're using, most likely in a package that is also called html2text. You might even have it installed already. You can check with:

Code:

$ which html2text
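
Since there are almost 3000 files, a loop saves running the tool by hand on each one. This is only a rough sketch, not a tested recipe for any particular html2text build: it assumes the converter takes an input file as its argument and writes plain text to standard output. The function name convert_all and its arguments are made up for illustration.

```shell
#!/bin/sh
# Convert every .htm file in a directory to a .txt file next to it.
# ASSUMPTION: the converter (html2text by default) accepts an input file
# as its argument and writes the plain text to standard output.
# convert_all is a hypothetical helper name, not part of html2text.
convert_all() {
    dir=$1
    converter=${2:-html2text}
    for f in "$dir"/*.htm; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$f" ] || continue
        # foo.htm -> foo.txt (the original .htm file is left untouched)
        "$converter" "$f" > "${f%.htm}.txt"
    done
}
```

After loading that function into your shell, something like `convert_all ~/dictionary` (a placeholder directory name) would leave a .txt file beside each .htm file. The second argument lets you swap in a different converter if html2text doesn't handle your pages well.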

roBuntu1967

03-07-2011 05:51 AM

Thanks

OK, I will try html2txt. Thanks!

knudfl

03-07-2011 06:24 AM

This version of html2txt works perfectly. (html2text doesn't.)

http://www.linuxquestions.org/questi...5&d=1269459223
