LinuxQuestions.org


debdas 05-01-2003 11:57 PM

a .doc , .rtf converter
 
Can anyone suggest a converter from MS Office .doc and .rtf files to plain text that I can run from my C program?

I have searched for one but have not found a satisfactory one.

wapcaplet 05-02-2003 09:11 AM

That may be tough... especially given the proprietary nature of .doc files, and how few programs are actually capable of reading them. RTF might be a little easier though (there's rtf-converter which I found with a quick search, which converts to HTML).

debdas 05-05-2003 06:13 AM

re:.doc converter
 
Thanks for the help. I have obtained a converter that turns .doc into a text file, but I have not come across an RTF-to-text converter yet.

I need .txt files for further processing in my project.

I hope I will come across a converter.

yrraja 05-05-2003 06:36 AM

Where did you get this .doc to .txt converter? Can you post the link?

gregory76 05-05-2003 06:53 AM

Hi all.

I've used "antiword" to convert from Word (.doc) to text files. It is available from <http://www.winfield.demon.nl/>. This has worked very well for me in the past.

Don't know of any similar utility for .RTF files - please post something if you find one!

yrraja 05-05-2003 07:23 AM

Is anyone aware of the reverse? I mean, is there some utility that converts a text file into a .doc file?

Microsoft's COM interface for Word provides interfaces to write to a .doc file, but it is pretty tedious.

DoubleLetter 05-06-2003 03:32 AM

If you have OpenOffice Writer (http://www.openoffice.nl) installed, you can open the doc/rtf file in it and save it as HTML or a variety of other formats. As of now, I do not think there's a way to do it from the command line. But I strongly recommend it, since the conversion is perfect. OpenOffice saves all the embedded image files, and formats everything correctly.

In fact, I converted some 150+ doc/rtf files in this way. It was time consuming, but the result was super. :)

I had previously tried AbiWord (http://www.abisource.com), but for some reason it was very slow at opening a doc/rtf file on my machine. However, AbiWord has a plugin that permits AbiWord to be used from the command line (www.abisource.com/download/plugins.phtml).

Regards, Ahsan

winner83 11-27-2006 12:52 AM

Actually, I’m using RTF TO XML Converter by Novosoft LLC. It is a powerful converter and really easy to use.

firstfire 11-28-2006 02:02 AM

Hi!

Try `catdoc'. It can convert *.doc to plain text (without any formatting except tables). Maybe you can adapt the source code of catdoc for your purposes.
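
As a rough sketch of how antiword (mentioned earlier in the thread) or catdoc could be scripted: both are assumed here to be on the PATH and to print the extracted text to standard output when given a .doc file. The function names find_doc_converter and doc_to_txt are made up for this example.

```shell
#!/bin/sh
# Sketch: convert a Word .doc file to plain text with whichever of
# antiword or catdoc is installed (assumption: either may be missing).

# Print the name of the first available converter, or fail if none is found.
find_doc_converter() {
  for tool in antiword catdoc; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      return 0
    fi
  done
  return 1
}

# Convert $1 (a .doc file) to $2 (a text file) by redirecting stdout.
doc_to_txt() {
  tool=$(find_doc_converter) || {
    echo "no .doc converter (antiword or catdoc) found" >&2
    return 1
  }
  "$tool" "$1" > "$2"
}

# Run only when called with two arguments: input.doc output.txt
if [ "$#" -eq 2 ]; then
  doc_to_txt "$1" "$2"
fi
```

A C program could run the same command line through system() or popen() and then read the resulting text.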

gnashley 11-28-2006 03:36 AM

You might also look at the sources for the word processor 'ted', which I believe includes a script for rtf-to-text conversion.

KWTm 10-10-2008 01:33 PM

Success! Use Abiword to convert .rtf, .doc on command-line
 
I have had success using Abiword for a command-line converter! You can even use it in a batch file.

This was surprising, since I've always thought of this flagship word processor of the GNOME world as a GUI program. But I found that all I needed to do was, for example,

abiword --to=NameOfFileToBeCreated.html NameOfOriginalFile.rtf

or

abiword --to=NameOfFileToBeCreated.txt NameOfOriginalFile.doc

Abiword will automatically pick the file format depending on what extension your output filename will have. It never enters a graphical mode; no window pops up or anything, and basically it behaves just like a command-line utility. Cool!

So a simple way to convert all *.doc files to *.txt might be:

Code:

#!/bin/sh
for Filename in *.doc
do
  BaseFilename=${Filename%.*}
  # The above removes from $Filename everything after the last dot,
  # so "MyFile.doc" becomes "MyFile"
  # $Filename already ends in .doc, so pass it unchanged
  abiword "--to=$BaseFilename.txt" "$Filename"
done

[The above script file was edited 2011-04-11; the previous version worked but had poor programming habits.]

Since Abiword is also available for MS Windows, presumably you can do something similar on MS Windows.
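
A slightly more defensive variant of the loop above, as a sketch only: it handles both .doc and .rtf, skips patterns that match nothing, and refuses to overwrite an existing .txt file. It assumes the abiword --to=FILE form shown in this post. The names txt_name and convert_all and the --run switch are invented for the example.

```shell
#!/bin/sh
# Sketch: batch-convert .doc and .rtf files in the current directory to .txt.

# Derive the output name: strip the last extension and append .txt.
txt_name() {
  printf '%s.txt\n' "${1%.*}"
}

convert_all() {
  for f in *.doc *.rtf; do
    # An unmatched pattern stays literal (e.g. "*.rtf"); skip it.
    [ -e "$f" ] || continue
    out=$(txt_name "$f")
    # report.doc and report.rtf would both map to report.txt,
    # so never overwrite an existing text file.
    if [ -e "$out" ]; then
      echo "skipping $f: $out already exists" >&2
      continue
    fi
    abiword "--to=$out" "$f" || echo "conversion failed: $f" >&2
  done
}

# Run only when explicitly asked, so the functions can also be sourced.
if [ "${1:-}" = "--run" ]; then
  convert_all
fi
```

Saved under a hypothetical name such as doc2txt.sh, it would be started with: sh doc2txt.sh --run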

chrism01 10-10-2008 07:38 PM

and if you wanted to write your own I'd guess the src to OO-Writer would have what you need.

jlinkels 10-11-2008 09:06 AM

Quote:

Originally Posted by chrism01 (Post 3306529)
and if you wanted to write your own I'd guess the src to OO-Writer would have what you need.

No, you can implement a macro in OOWriter and call it from the command line in almost the same way as using AbiWord.

Only with OOWriter the call to the macro on the command line is at least 120 characters long.

There are a zillion examples in the OOforums on how to paste this macro into OOWriter, like this one. (Note that this macro is for exporting to PDF. You have to choose a different filter for text, but you get the idea anyway.) They differ in some minor and incomprehensible details, and show different levels of failure depending on each minor OOWriter version change.

In other words, OOwriter has the ability without recompiling, but given the impossible macro language you might be better off with AbiWord.

jlinkels

chrism01 10-11-2008 10:05 PM

What I meant was that OO can read/write .doc (usually) and .rtf and .txt files, so you could actually take that source code and use it to write your own converter.
As in, copy the relevant C (C++ ?) routines and write your own converter program. I wasn't implying calling OO for anything.
Sorry if that wasn't clear.
:)

ubunTUX 03-04-2010 09:07 AM

converter of RTF to TXT
 
I read the previous posts, but I am working on a project to convert RTF (with UTF-8 characters) to TXT without intermediate tools. I checked RTF files generated by WordPad, MS Word 2003, MS Word 2007, AbiWord and OpenOffice Writer, and I obtained the following results:

1- In AbiWord all is good, because inside the RTF file the UTF-8 character is represented by a valid UTF-8 code (example: \'53396).
2- In WordPad, MS Word 2003 and MS Word 2007 all is good, because the Russian character is represented by its code in Windows code page 1251 (example: \'c4).

The problem is in RTF files generated by OpenOffice Writer: the UTF-8 character is represented by two pairs of hexadecimal characters (example: \'84\'7e), and I don't know how to interpret it.

I also need to convert the Chinese characters from the Windows RTF editors.

Excuse my bad English, please.

I need help very soon.
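
A possible starting point, offered as a sketch under assumptions rather than a full RTF parser: in RTF, each \'hh escape stands for one byte in the character set currently in effect (the document's \ansicpg or the active font's \fcharset), so two consecutive escapes such as \'84\'7e may together be one character of a double-byte code page. Which code page applies to the OpenOffice files is not stated in the post, so the example takes it as a parameter. The function names are invented here, and iconv is assumed to be installed.

```shell
#!/bin/sh
# Sketch: turn RTF \'hh escapes into raw bytes, then let iconv interpret
# those bytes in a code page supplied by the caller (an assumption; it
# would normally come from \ansicpg or \fcharset in the RTF file).

# Emit the raw bytes named by every \'hh escape found in $1.
rtf_hex_to_bytes() {
  printf '%s\n' "$1" |
    grep -o "\\\\'[0-9a-fA-F][0-9a-fA-F]" |
    while IFS= read -r esc; do
      hex=${esc#??}    # drop the leading \'
      # printf handles octal escapes portably, so go hex -> octal -> byte.
      printf "\\$(printf '%03o' "0x$hex")"
    done
}

# Decode the escapes in $1 as text in code page $2, printed as UTF-8.
rtf_hex_decode() {
  rtf_hex_to_bytes "$1" | iconv -f "$2" -t UTF-8
}

# Run only when called with two arguments: ESCAPES CODEPAGE
if [ "$#" -eq 2 ]; then
  rtf_hex_decode "$1" "$2"
  echo
fi
```

For the Word example, rtf_hex_decode "\\'c4" CP1251 should print the Cyrillic letter Д. For the OpenOffice pair, the same call with the right double-byte code page name would apply, once that code page is identified from the file's header or font table.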


All times are GMT -5.