LinuxQuestions.org
Old 05-01-2003, 11:57 PM   #1
debdas
LQ Newbie
 
Registered: May 2003
Posts: 19

Rep: Reputation: 0
a .doc , .rtf converter


Can anyone suggest a converter from MS Office .doc and .rtf files to plain text that I can execute from my C program?

I have searched for one but have not found a satisfactory one.
 
Old 05-02-2003, 09:11 AM   #2
wapcaplet
LQ Guru
 
Registered: Feb 2003
Location: Colorado Springs, CO
Distribution: Gentoo
Posts: 2,018

Rep: Reputation: 48
That may be tough... especially given the proprietary nature of .doc files, and how few programs are actually capable of reading them. RTF might be a little easier though (there's rtf-converter which I found with a quick search, which converts to HTML).
 
Old 05-05-2003, 06:13 AM   #3
debdas
LQ Newbie
 
Registered: May 2003
Posts: 19

Original Poster
Rep: Reputation: 0
re:.doc converter

Thanks for the help. I have obtained a .doc converter that converts .doc to a text file, but I have not come across an RTF-to-text converter yet.

I need .txt files for further processing in my project.

I hope I will come across a converter.
 
Old 05-05-2003, 06:36 AM   #4
yrraja
Member
 
Registered: Sep 2002
Distribution: RH, FC, Ubuntu, Solaris, AIX
Posts: 114

Rep: Reputation: 15
Where did you get this .doc-to-.txt converter? Can you post the link?
 
Old 05-05-2003, 06:53 AM   #5
gregory76
LQ Newbie
 
Registered: May 2003
Location: Aus
Distribution: Redhat 8.0
Posts: 10

Rep: Reputation: 0
Hi all.

I've used "antiword" to convert from Word (.doc) to text files. It is available from <http://www.winfield.demon.nl/>. This has worked very well for me in the past.

Don't know of any similar utility for .RTF files - please post something if you find one!
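
For anyone who wants to script the antiword call described above, here is a minimal sketch. It assumes antiword is installed and on the PATH, and relies on antiword writing the extracted text to standard output. The function names txt_name_for and doc_to_txt are made up for this example.

```shell
#!/bin/sh
# txt_name_for FILE: print FILE with its last extension replaced by .txt
# (hypothetical helper, e.g. "report.doc" -> "report.txt").
txt_name_for() {
    printf '%s.txt\n' "${1%.*}"
}

# doc_to_txt FILE: convert one Word .doc file to a .txt file next to it.
# Assumes the antiword binary is available; returns 127 if it is not.
doc_to_txt() {
    src=$1
    out=$(txt_name_for "$src")
    if ! command -v antiword >/dev/null 2>&1; then
        echo "antiword not found on PATH" >&2
        return 127
    fi
    # antiword prints the document text on stdout, so redirect it to the file.
    antiword "$src" > "$out"
}
```

Calling doc_to_txt report.doc would leave the text in report.txt. From a C program, the same command line could be handed to system() or popen().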
 
Old 05-05-2003, 07:23 AM   #6
yrraja
Member
 
Registered: Sep 2002
Distribution: RH, FC, Ubuntu, Solaris, AIX
Posts: 114

Rep: Reputation: 15
Is anyone aware of the reverse? I mean, is there some utility that converts a text file into a .doc file?

Microsoft's COM interface for Word provides interfaces to write to a .doc file, but it is pretty tedious.
 
Old 05-06-2003, 03:32 AM   #7
DoubleLetter
Member
 
Registered: Jun 2001
Location: Sharjah, United Arab Emirates
Distribution: Mandrake Linux 10.1
Posts: 132

Rep: Reputation: 15
If you have OpenOffice Writer (http://www.openoffice.nl) installed, you can open the doc/rtf file in it and save it as HTML or a variety of other formats. As of now, I do not think there's a way to do it from the command line. But I strongly recommend it, since the conversion is perfect. OpenOffice saves all the embedded image files, and formats everything correctly.

In fact, I converted some 150+ doc/rtf files in this way. It was time consuming, but the result was super.

I had previously tried AbiWord (http://www.abisource.com), but for some reason it was very, very slow on my machine when opening a doc/rtf file. However, AbiWord has a plugin that permits AbiWord to be used from the command line (www.abisource.com/download/plugins.phtml)

Regards, Ahsan
 
Old 11-27-2006, 12:52 AM   #8
winner83
LQ Newbie
 
Registered: Nov 2006
Posts: 1

Rep: Reputation: 0
Actually, I’m using RTF TO XML Converter by Novosoft LLC. It is a powerful converter and really easy to use.
 
Old 11-28-2006, 02:02 AM   #9
firstfire
Member
 
Registered: Mar 2006
Location: Ekaterinburg, Russia
Distribution: Debian, Ubuntu
Posts: 709

Rep: Reputation: 428
Hi!

Try `catdoc'. It can convert *.doc to plain text (without any formatting except tables). Maybe you can adapt the source code of catdoc for your purposes.
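
Since both antiword and catdoc have come up in this thread, a wrapper can simply use whichever one is installed. The sketch below is only an illustration: the function name pick_converter is invented, and it assumes that both tools print the extracted text on standard output when given a .doc file.

```shell
#!/bin/sh
# pick_converter NAME...: print the first listed command that exists on PATH.
# Returns non-zero if none of the candidates is installed.
pick_converter() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    return 1
}

# Possible use (left as a comment so nothing runs on load):
#   tool=$(pick_converter antiword catdoc) && "$tool" letter.doc > letter.txt
```

The order of the arguments sets the preference, so putting catdoc first would favour it when both are present.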
 
Old 11-28-2006, 03:36 AM   #10
gnashley
Amigo developer
 
Registered: Dec 2003
Location: Germany
Distribution: Slackware
Posts: 4,928

Rep: Reputation: 612
You might also look at the sources for the word processor 'ted', which I believe includes a script for rtf-to-text conversion.
 
Old 10-10-2008, 01:33 PM   #11
KWTm
Member
 
Registered: Jan 2004
Distribution: Kubuntu 14.04 (Dell Linux-preinstalled laptop + 2 other laptops)
Posts: 117

Rep: Reputation: 21
Success! Use Abiword to convert .rtf, .doc on command-line

I have had success using Abiword for a command-line converter! You can even use it in a batch file.

This was surprising, since I've always thought of this flagship word processor of the GNOME world as a GUI-only program. But I found that all I needed to do was, for example,

abiword --to=NameOfFileToBeCreated.html NameOfOriginalFile.rtf

or

abiword --to=NameOfFileToBeCreated.txt NameOfOriginalFile.doc

Abiword will automatically pick the file format depending on what extension your output filename will have. It never enters a graphical mode; no window pops up or anything, and basically it behaves just like a command-line utility. Cool!

So a simple way to convert all *.doc files to *.txt might be:

Code:
#!/bin/sh
for Filename in *.doc
do
  BaseFilename=${Filename%.*}
  # The above removes from $Filename everything after the last dot, 
  # so "MyFile.doc" becomes "MyFile"
  # $Filename already ends in ".doc", so it is passed unchanged as the input file
  abiword "--to=$BaseFilename.txt" "$Filename"
done
[The above script file was edited 2011-04-11; the previous version worked but had poor programming habits.]

Since Abiword is also available for MS Windows, presumably you can do something similar on MS Windows.

Last edited by KWTm; 04-11-2011 at 01:37 PM. Reason: corrected sloppy script programming!
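
A variation on the loop above that covers both .doc and .rtf files and does nothing when no file matches might look like this. It is a sketch under assumptions: the abiword --to=OutputFile usage is taken from this post as is, and the function name convert_all and the CONVERTER variable are invented here (CONVERTER exists only so that a dry run can be done with echo).

```shell
#!/bin/sh
# convert_all [DIR]: convert every .doc and .rtf file in DIR (default: .)
# to a .txt file with the same base name.
# CONVERTER defaults to abiword; set CONVERTER=echo to only print the calls.
convert_all() {
    dir=${1:-.}
    for f in "$dir"/*.doc "$dir"/*.rtf; do
        # An unmatched pattern stays literal, so skip names that do not exist.
        [ -e "$f" ] || continue
        "${CONVERTER:-abiword}" "--to=${f%.*}.txt" "$f"
    done
}
```

Note that a.doc and a.rtf in the same directory would both be written to a.txt, so the later one wins.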
 
Old 10-10-2008, 07:38 PM   #12
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,355

Rep: Reputation: 2751
and if you wanted to write your own I'd guess the src to OO-Writer would have what you need.
 
Old 10-11-2008, 09:06 AM   #13
jlinkels
LQ Guru
 
Registered: Oct 2003
Location: Bonaire, Leeuwarden
Distribution: Debian /Jessie/Stretch/Sid, Linux Mint DE
Posts: 5,195

Rep: Reputation: 1043
Quote:
Originally Posted by chrism01 View Post
and if you wanted to write your own I'd guess the src to OO-Writer would have what you need.
No, you can implement a macro in OOWriter and call it from the command line in almost the same way as using AbiWord.

Only with OOWriter the call to the macro on the command line is at least 120 characters long.

There are a zillion examples in the OOforums on how to paste this macro into OOWriter, like this one. (Note that this macro is for exporting to PDF. You have to choose a different filter for text, but you get the idea.) They differ in some minor and incomprehensible details, and show different levels of failure with each minor OOWriter version change.

In other words, OOwriter has the ability without recompiling, but given the impossible macro language you might be better off with AbiWord.

jlinkels
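
As an aside that goes beyond what was posted here: LibreOffice, the later fork of OpenOffice, accepts a --convert-to option in headless mode, which avoids the macro entirely. The sketch below assumes such a build is installed. The function name office_to_txt and the SOFFICE variable are invented for illustration; the variable is there because the binary is called soffice on some systems and libreoffice on others.

```shell
#!/bin/sh
# office_to_txt FILE [OUTDIR]: convert FILE (.doc, .rtf, ...) to plain text.
# The result is written to OUTDIR (default: current directory) under the
# same base name with a .txt extension.
# Assumes a LibreOffice build that supports --headless and --convert-to.
office_to_txt() {
    src=$1
    outdir=${2:-.}
    "${SOFFICE:-soffice}" --headless --convert-to txt:Text \
        --outdir "$outdir" "$src"
}
```

For example, office_to_txt letter.rtf /tmp/out would be expected to produce /tmp/out/letter.txt. Whether this works on a given OpenOffice version from the time of this thread is not something the thread confirms.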
 
Old 10-11-2008, 10:05 PM   #14
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,355

Rep: Reputation: 2751
What I meant was that OO can read/write .doc (usually) and .rtf and .txt files, so you could actually take that source code and use it to write your own converter.
As in, copy the relevant C (C++ ?) routines and write your own converter program. I wasn't implying calling OO for anything.
Sorry if that wasn't clear.
 
Old 03-04-2010, 09:07 AM   #15
ubunTUX
LQ Newbie
 
Registered: Mar 2010
Posts: 1

Rep: Reputation: 0
converter of RTF to TXT

I read the previous posts, but I am working on a project to convert RTF (with UTF-8 characters) to TXT without intermediate tools. I checked RTF files generated by WordPad, MS Word 2003, MS Word 2007, AbiWord and OpenOffice Writer, and obtained the following results:

1- In AbiWord all is good, because inside the RTF file the UTF-8 character is represented by a valid UTF-8 code (example \'53396 )
2- In WordPad, MS Word 2003 and MS Word 2007 all is good, because the Russian character is represented by its code in Windows codepage 1251 (example \'c4)

The problem is in RTF files generated by OpenOffice Writer: the UTF-8 character is represented by two pairs of hexadecimal characters (example \'84\'7e), and I don't know how to interpret it.

I also need to convert the Chinese characters from the Windows RTF editors.

Excuse my bad English, please.

I need help very soon.
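
One possible way to investigate the \'hh escapes is to turn them back into raw bytes and try decoding them with different character sets. The sketch below does that with printf and iconv. It assumes the escapes are code page bytes (as in the \'c4 example for Windows codepage 1251) and that iconv is installed; the function names are invented here. Which charset applies to the OpenOffice output is a guess. In RTF the code page normally comes from the \ansicpg control word or from the \fcharset of the current font, so those are worth checking in the file header. For instance, 0x84 is the lead byte of the Cyrillic row in Shift-JIS, so a Japanese charset setting is one candidate, but nothing in this thread confirms that.

```shell
#!/bin/sh
# hex_to_bytes ESCAPES: turn a run of RTF \'hh escapes into the raw bytes
# they stand for. ESCAPES must contain only \'hh items, e.g. \'84\'7e
hex_to_bytes() {
    # Replace each \' prefix with a space, leaving the bare hex pairs.
    for h in $(printf '%s' "$1" | sed "s/\\\\'/ /g"); do
        # Inner printf: hex value -> 3-digit octal. Outer printf %b: \0ooo -> byte.
        printf '%b' "\\0$(printf '%03o' "0x$h")"
    done
}

# decode_rtf_hex ESCAPES CHARSET: decode the bytes as CHARSET (an iconv name
# such as CP1251 or SHIFT_JIS) and print the result as UTF-8.
decode_rtf_hex() {
    hex_to_bytes "$1" | iconv -f "$2" -t UTF-8
}
```

With the example from the post, decode_rtf_hex "\\'c4" CP1251 should print the Cyrillic letter Д. Trying decode_rtf_hex "\\'84\\'7e" with a few candidate charsets shows quickly which of them yields the character that was typed.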
 
  

