That may be tough... especially given the proprietary nature of .doc files, and how few programs are actually capable of reading them. RTF might be a little easier though (there's rtf-converter which I found with a quick search, which converts to HTML).
I've used "antiword" to convert from Word (.doc) to text files. It is available from <http://www.winfield.demon.nl/>. This has worked very well for me in the past.
Don't know of any similar utility for .RTF files - please post something if you find one!
If you have OpenOffice Writer (http://www.openoffice.nl) installed, you can open the doc/rtf file in it and save it as HTML or a variety of other formats. As of now, I do not think there's a way to do it from the command line. But I strongly recommend it, since the conversion is perfect. OpenOffice saves all the embedded image files and formats everything correctly.
In fact, I converted some 150+ doc/rtf files in this way. It was time consuming, but the result was super.
I had previously tried AbiWord (http://www.abisource.com), but for some reason it was very slow at opening a doc/rtf file on my machine. However, AbiWord has a plugin that permits it to be used from the command line (www.abisource.com/download/plugins.phtml).
Success! Use Abiword to convert .rtf, .doc on command-line
I have had success using Abiword for a command-line converter! You can even use it in a batch file.
This was surprising, since I've always thought of this flagship word processor of the GNOME world as a GUI program. But I found that all I needed to do was run, for example, abiword --to=MyFile.txt MyFile.doc
Abiword will automatically pick the file format depending on what extension your output filename will have. It never enters a graphical mode; no window pops up or anything, and basically it behaves just like a command-line utility. Cool!
So a simple way to convert all *.doc files to *.txt might be:
Code:
#!/bin/sh
for Filename in *.doc
do
    # Skip the unexpanded pattern when no .doc files are present
    [ -e "$Filename" ] || continue
    BaseFilename=${Filename%.*}
    # The above removes from $Filename the last dot and everything after it,
    # so "MyFile.doc" becomes "MyFile"
    abiword "--to=$BaseFilename.txt" "$Filename"
done
[The above script file was edited 2011-04-11; the previous version worked but had poor programming habits.]
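Since the thread asks about both .doc and .rtf files, the same loop can be stretched to cover both extensions. What follows is only a sketch under the same assumption as the script above, namely that abiword is on the PATH and accepts --to=<output filename>; the helper names txt_name and convert_all are made up for illustration.

```shell
#!/bin/sh
# Sketch: convert every .doc and .rtf file in the current directory to text.
# txt_name and convert_all are hypothetical helper names, not part of abiword.

# Print the output name for an input file: drop the last extension, add .txt
txt_name() {
    printf '%s.txt\n' "${1%.*}"
}

convert_all() {
    for f in *.doc *.rtf
    do
        # A pattern that matched nothing is left as the literal "*.doc",
        # which does not exist as a file, so skip it
        [ -e "$f" ] || continue
        # Note: MyFile.doc and MyFile.rtf would both write MyFile.txt
        abiword "--to=$(txt_name "$f")" "$f"
    done
}

# Only attempt the conversion when abiword is actually installed
if command -v abiword >/dev/null 2>&1
then
    convert_all
fi
```

Keeping the name calculation in its own function makes it easy to check without running abiword at all: txt_name MyFile.doc prints MyFile.txt.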
Since Abiword is also available for MS Windows, presumably you can do something similar on MS Windows.
Last edited by KWTm; 04-11-2011 at 01:37 PM.
Reason: corrected sloppy script programming!
Quote:
Originally Posted by chrism01
and if you wanted to write your own I'd guess the src to OO-Writer would have what you need.
No, you can implement a macro in OOWriter and call it from the command line in almost the same way as using AbiWord.
Only with OOWriter the call to the macro on the command line is at least 120 characters long.
There are a zillion examples in the OOforums on how to paste this macro into OOWriter. Like this one. (Note that this macro is for exporting to PDF; you have to choose a different filter for text, but you get the idea.) They differ in some minor and incomprehensible details, and show different levels of failure with each minor OOWriter version change.
In other words, OOwriter has the ability without recompiling, but given the impossible macro language you might be better off with AbiWord.
What I meant was that OO can read/write .doc (usually) and .rtf and .txt files, so you could actually take that source code and use it to write your own converter.
As in, copy the relevant C (C++ ?) routines and write your own converter program. I wasn't implying calling OO for anything.
Sorry if that wasn't clear.
I read the previous posts, but I am working on a project to convert RTF (with UTF-8 characters) to TXT without intermediate tools. I checked RTF files generated by WordPad, MS Word 2003, MS Word 2007, AbiWord and OpenOffice Writer, and obtained the following results:
1- AbiWord is fine, because inside the RTF file the UTF-8 character is represented by a valid UTF-8 code (example \'53396 )
2- WordPad, MS Word 2003 and MS Word 2007 are fine, because the Russian character is represented by its code in Windows code page 1251 (example \'c4)
The problem is with RTF files generated by OpenOffice Writer: there the UTF-8 character is represented by two pairs of hexadecimal digits (example \'84\'7e), and I don't know how to interpret them.
I also need to convert the Chinese characters from the Windows RTF editors.
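For the code page 1251 case in point 2, the decoding step can be tried out from the shell. This is a rough sketch, not a full RTF parser: it assumes iconv is installed and knows the name CP1251, and decode_rtf_bytes is a made-up helper name. It takes a code page and the hex digits from one or more \'hh escapes and prints the result as UTF-8.

```shell
#!/bin/sh
# Sketch: decode the hex bytes of RTF \'hh escapes with a given code page.
# decode_rtf_bytes is a hypothetical helper name; iconv is assumed to be installed.
decode_rtf_bytes() {
    codepage=$1
    shift
    for hex in "$@"
    do
        # POSIX printf guarantees \ooo octal escapes but not \xhh,
        # so turn each hex pair into octal before emitting the raw byte
        printf "\\$(printf '%03o' "0x$hex")"
    done | iconv -f "$codepage" -t UTF-8
}

# The \'c4 example from point 2, read as Windows code page 1251
decode_rtf_bytes CP1251 c4
echo
```

A two-byte sequence such as \'84\'7e would be passed as two arguments (84 7e), but the right code page has to come from the RTF file itself, for example its \ansicpg setting or the font's character set, which the post does not state.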