How to extract text from RTF files (or even DOC)
I regularly receive files in badly broken RTF that need a quick and dirty conversion to HTML. My current process is:
1) Open the file in OpenOffice. Select all, copy to clipboard.
2) Open a new, blank file in vi. Paste the text into it.
3) Run a sed script against the file to do some quick and dirty formatting.
Step three runs great and I am quite happy with it. I would like to eliminate or simplify steps one and two.
Is there a way to extract all the text from a document on the command line, giving roughly the same result as the copy-and-paste steps described above?
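
To make concrete what I mean, here is a minimal sketch of the kind of tool that could stand in for steps one and two. It is an illustration only, not a tested solution for my files. The function name `rtf_to_text` and the list of skipped destinations are my own assumptions. It also assumes reasonably well-formed RTF with cp1252 hex escapes, ignores `\u` Unicode escapes, and does nothing for DOC files.

```python
import re
import sys

# Groups whose contents are metadata rather than body text.
# This list is an assumption and is not exhaustive.
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict",
                     "header", "footer"}

# Control words that map directly onto a plain-text character.
SPECIAL = {"par": "\n", "line": "\n", "tab": "\t"}

TOKEN = re.compile(
    r"\\([A-Za-z]+)(-?\d+)? ?"   # control word, optional number, optional delimiter space
    r"|\\'([0-9a-fA-F]{2})"      # hex-escaped byte such as \'e9
    r"|\\([^A-Za-z])"            # control symbol such as \{ or \*
    r"|([{}])"                   # group open or close
    r"|[\r\n]+"                  # raw line breaks carry no meaning in RTF
    r"|(.)"                      # any other character is literal text
)


def rtf_to_text(rtf):
    """Return the visible text of an RTF string, dropping all formatting."""
    out = []
    stack = []        # skip state saved at each opening brace
    skipping = False  # True while inside a metadata group
    for match in TOKEN.finditer(rtf):
        word, _arg, hexbyte, symbol, brace, literal = match.groups()
        if brace == "{":
            stack.append(skipping)
        elif brace == "}":
            if stack:
                skipping = stack.pop()
        elif word:
            if word in SKIP_DESTINATIONS:
                skipping = True
            elif not skipping and word in SPECIAL:
                out.append(SPECIAL[word])
        elif symbol:
            if symbol == "*":
                skipping = True       # \* marks an ignorable destination
            elif not skipping and symbol in "\\{}":
                out.append(symbol)    # escaped backslash or brace
        elif hexbyte:
            if not skipping:
                out.append(bytes.fromhex(hexbyte).decode("cp1252", errors="replace"))
        elif literal:
            if not skipping:
                out.append(literal)
    return "".join(out)


if __name__ == "__main__" and len(sys.argv) > 1:
    # latin-1 never fails to decode, which helps with damaged input.
    with open(sys.argv[1], encoding="latin-1") as handle:
        sys.stdout.write(rtf_to_text(handle.read()))
```

Saved under a hypothetical name such as rtf2txt.py, it would be run with the RTF file as its only argument, and its standard output could be piped straight into the sed script from step three.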