LinuxQuestions.org


roBuntu1967 02-27-2011 07:55 AM

Need help extracting text from .htm files
 
I downloaded (using wget) almost 3000 .htm files from a dictionary web site. Now I want to write a script that will extract the text from these .htm files. I'm a total newbie with awk/sed/perl/grep. Any suggestions?

arizonagroovejet 02-27-2011 08:00 AM

There's a utility called html2text. It's probably available in the repos of whatever distro you're using, most likely as a package also named html2text. You might even have it installed already:


Code:

$ which html2text
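
Once html2text is installed, a loop along these lines could convert a whole directory of .htm files in one go. This is only a rough sketch: it assumes html2text takes an input file name as its argument and writes plain text to stdout, and the function name convert_dir and the HTML2TEXT variable are made up for illustration. Check the man page for your version before running it on all 3000 files.

```shell
#!/bin/sh
# Sketch: write a .txt file next to every .htm file in a directory.
# Assumption: the converter reads the file named on its command line
# and prints plain text to stdout. Override HTML2TEXT to use another tool.
HTML2TEXT="${HTML2TEXT:-html2text}"

# convert_dir DIR: convert each DIR/*.htm into DIR/*.txt (hypothetical helper)
convert_dir() {
    dir="$1"
    for f in "$dir"/*.htm; do
        # If the glob matched nothing, $f is the literal pattern; skip it.
        [ -e "$f" ] || continue
        # ${f%.htm} strips the .htm suffix so foo.htm becomes foo.txt.
        "$HTML2TEXT" "$f" > "${f%.htm}.txt"
    done
}
```

To use it, save the function in a script and call it with the directory that wget filled, for example convert_dir . for the current directory. The original .htm files are left untouched, so a bad run can simply be repeated.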

roBuntu1967 03-07-2011 05:51 AM

Thanks
 
OK, I will try html2txt. Thanks!

knudfl 03-07-2011 06:24 AM

This version of html2txt works perfectly. (html2text doesn't.)

http://www.linuxquestions.org/questi...5&d=1269459223


All times are GMT -5.