shell script to remove tags
Can somebody give me a shell script that removes all tags and scripts (JavaScript/VBScript code) from an HTML file?
This might look like a homework question, but I am a C++ programmer and I need to bypass these things. |
There are several programs available that convert HTML to plain text; html2text and unhtml are two that I know of. There are doubtless many more on the web.
If this doesn't satisfy your requirements then you'll have to be more specific about what you want to do. |
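If none of the ready-made converters fit, the same idea can be sketched in a few lines of Python using only the standard library's html.parser module. This is a rough sketch, not what html2text or unhtml actually do: the names TextExtractor and strip_html are made up for illustration, and it assumes you only want the visible text, with everything inside script and style elements dropped and whitespace collapsed.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping anything inside <script> or <style>.

    Hypothetical helper for illustration; not part of any existing tool.
    """

    SKIPPED = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <SCRIPT> is matched as well.
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self.skip_depth == 0:
            self.parts.append(data)


def strip_html(html):
    """Return the text of an HTML string with tags and scripts removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    # Collapse runs of whitespace (including newlines) into single spaces.
    return " ".join("".join(parser.parts).split())
```

For example, strip_html('<p>Hello <b>world</b></p>') returns 'Hello world'. A real parser is used here rather than a sed or grep regular expression because script bodies often contain "<" and ">" characters that confuse simple pattern matching.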
Otherwise, provide sample input and output data, what you've written so far, and where you're stuck, and we can try to help. If you're a C++ programmer, you should know these things are needed to get any help with writing code. Also, based on your posting history, you've done a lot of bash scripting in the past, so this should be trivial for you. |
Regarding 'html2text': I would usually prefer "the other one", html2txt:
http://www.linuxquestions.org/questi...6&d=1292610942 |