LinuxQuestions.org
09-01-2004, 11:44 AM #1
ljqu_happy
LQ Newbie
Registered: Aug 2004
Posts: 18
Problem reading a Microsoft Word document as a binary file


I need to open a Microsoft Word document as a binary file and read it, but I can't find the relationship between the bytes of this binary file and the contents of the Word document. For example, I don't know where the end of the document is.
I'm puzzled that I can't find this on Microsoft's website. Can anyone help me with documentation on the Microsoft Word file format, or tell me how to find it? Thanks!
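For reference: Word 97 and later .doc files are not a flat stream of text but an OLE2 "compound file" (a small FAT-style file system packed into one file), which is why the raw bytes don't map directly onto the document. A minimal Python sketch of recognising that container and reading a few header fields, with offsets taken from the compound file header layout as I understand it; `read_cfb_header` is an illustrative name, not part of any library.

```python
import struct

# Every OLE2 / compound file container begins with these 8 bytes.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")

def read_cfb_header(data: bytes) -> dict:
    """Parse a few fields of the 512-byte compound file header."""
    if len(data) < 512 or data[:8] != CFB_SIGNATURE:
        raise ValueError("not an OLE2 compound file")
    # Five little-endian 16-bit fields starting at 0x18:
    # minor version, major version, byte-order mark, sector shift, mini sector shift.
    minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from("<5H", data, 0x18)
    if byte_order != 0xFFFE:
        raise ValueError("unexpected byte-order mark")
    # 0x2C: number of FAT sectors, 0x30: first directory sector.
    num_fat_sectors, first_dir_sector = struct.unpack_from("<2I", data, 0x2C)
    return {
        "major_version": major,
        "sector_size": 1 << sector_shift,      # usually 512 (shift 9)
        "mini_sector_size": 1 << mini_shift,   # usually 64 (shift 6)
        "num_fat_sectors": num_fat_sectors,
        "first_dir_sector": first_dir_sector,
    }
```

Usage, with a hypothetical file name: `read_cfb_header(open("report.doc", "rb").read(512))`. The document text itself lives in a stream called WordDocument inside this container, so locating it means following the directory and FAT rather than searching for a fixed offset.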
 
09-01-2004, 11:47 AM #2
ljqu_happy
LQ Newbie
Registered: Aug 2004
Posts: 18
Original Poster
Suggestions of other good forums would also help!
 
09-01-2004, 01:43 PM #3
jhorvath
Member
Registered: Sep 2002
Location: OH, USA
Distribution: 2.6.16-1.2096_FC5 #1
Posts: 245
Perhaps the OpenOffice.org source code could help? Or maybe they have some docs?
 
09-01-2004, 04:50 PM #4
itsme86
Senior Member
Registered: Jan 2004
Location: Oregon, USA
Distribution: Slackware
Posts: 1,246
Try this page and look at the files for DOC:
http://www.wotsit.org/search.asp?s=binary
 
09-02-2004, 06:35 AM #5
mhearn
Guru
Registered: Nov 2002
Location: Durham, England
Distribution: Fedora Core 4
Posts: 1,565
I'm afraid you cannot read binary Word DOC files precisely; this is a task that still eludes highly professional developers who have worked on it for years. The format is *extremely* complex. If you have to ask, you probably can't read it.

There is libwv, but it is GPL'd. It's also fairly basic. It may help, though.
 
09-02-2004, 09:48 AM #6
ljqu_happy
LQ Newbie
Registered: Aug 2004
Posts: 18
Original Poster
to itsme86:
Thank you for your help. I downloaded "Microsoft Word 6.0 Binary File Format" and have been reading it for a few hours, but it's a bit difficult for me because I'm a beginner. I will keep reading it, and I may still need your help.
 
09-02-2004, 09:57 AM #7
ljqu_happy
LQ Newbie
Registered: Aug 2004
Posts: 18
Original Poster
to mhearn:
I think I understand what you mean. Yes, I know the Word DOC format is *extremely* complex, so I don't need to know every bit of the file. All I want to know is where the document begins and where it ends, and, if I insert a picture into the document, what changes in the file.
 
09-02-2004, 10:12 AM #8
ljqu_happy
LQ Newbie
Registered: Aug 2004
Posts: 18
Original Poster
I'm using Microsoft Office Word 2003. I opened the DOC file in UltraEdit and, after many tests, found that the document begins at 0x00000a00. I guess (only a guess; I don't know where it ends) that the document is stored sequentially in the binary file. But if I insert a picture, a control button, or a macro, I don't know what happens: where the data for these things is stored and what changes in the document file. What I want is to find the beginning and the end of the document and the data for the picture; the other data is not important to me.
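A hex editor such as UltraEdit is a reasonable way to explore this. The same kind of view can be produced in a few lines of Python, which makes it easy to compare the region around an offset across several test files (for example, before and after inserting a picture). `hexdump` is a hypothetical helper written for this sketch, and 0xA00 is only the offset observed in this one Word 2003 file.

```python
def hexdump(data: bytes, start: int = 0, length: int = 64, width: int = 16) -> str:
    """Format bytes like a hex editor: offset, hex bytes, printable ASCII."""
    lines = []
    end = min(len(data), start + length)
    for off in range(start, end, width):
        chunk = data[off:min(off + width, end)]
        # Pad the hex column so the ASCII column lines up on short final rows.
        hex_part = " ".join(f"{b:02x}" for b in chunk).ljust(width * 3 - 1)
        # Show printable ASCII as-is and everything else as '.'.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{off:08x}  {hex_part}  {text_part}")
    return "\n".join(lines)
```

Usage, with a hypothetical file name: `print(hexdump(open("test.doc", "rb").read(), 0xA00, 128))`. Diffing the output for two saved copies of the same document shows which regions moved, which is more reliable than assuming a fixed start offset.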
 
09-03-2004, 07:36 AM #9
ljqu_happy
LQ Newbie
Registered: Aug 2004
Posts: 18
Original Poster
After reading "Microsoft Word 6.0 Binary File Format", I think I now know much more about the Microsoft Word binary file format than before, and I have found the answers to my questions. What I need to do now is find as many documents as I can about all versions of Microsoft Word. Anyway, thank you all for your help!
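For readers following along: in the Word 6 / Word 97-era format notes, the WordDocument stream starts with a File Information Block (FIB), whose fcMin and fcMac fields give the start of the text and the end of the text plus one, and whose flags word includes an fHasPic bit recording that the document contains pictures. The sketch below reads those fields from an already-extracted WordDocument stream. It assumes a simple, non-fast-saved file; as far as I know, files written by Word 97 and later are normally read through the piece table in the table stream instead, and may store text as 16-bit Unicode, so treat fcMin/fcMac only as a hint there. Function names are hypothetical.

```python
import struct

def read_fib_basics(word_stream: bytes) -> dict:
    """Read a few FIB fields from offset 0 of the WordDocument stream.

    Offsets assumed from the Word 6/97-era notes:
    0x00 wIdent, 0x02 nFib, 0x0A flags, 0x18 fcMin, 0x1C fcMac.
    """
    w_ident, n_fib = struct.unpack_from("<HH", word_stream, 0)
    if w_ident != 0xA5EC:  # magic number of Word 6 and later documents
        raise ValueError(f"unexpected wIdent 0x{w_ident:04X}")
    (flags,) = struct.unpack_from("<H", word_stream, 0x0A)
    fc_min, fc_mac = struct.unpack_from("<II", word_stream, 0x18)
    return {
        "nFib": n_fib,                        # format version number
        "fComplex": bool(flags & 0x0004),     # fast-saved: text may not be contiguous
        "fHasPic": bool(flags & 0x0008),      # document contains pictures
        "fcMin": fc_min,                      # stream offset of the first text byte
        "fcMac": fc_mac,                      # stream offset just past the last text byte
    }

def simple_text_bytes(word_stream: bytes) -> bytes:
    """Return the raw text bytes of a simple (non-complex) document."""
    fib = read_fib_basics(word_stream)
    if fib["fComplex"]:
        raise ValueError("complex (fast-saved) file: use the piece table instead")
    return word_stream[fib["fcMin"]:fib["fcMac"]]
```

The returned bytes are undecoded; for an 8-bit Word 6 file they would typically be in the document's ANSI code page.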
 
09-03-2004, 09:46 AM #10
ljqu_happy
LQ Newbie
Registered: Aug 2004
Posts: 18
Original Poster
Hello, everyone! I'm back for help. I can't find the binary file format for Microsoft Word 2002 or 2003. I have searched Google and also looked on OpenOffice.org, but couldn't find anything useful. Does anybody know whether Microsoft Corporation has released it? Can anyone point me to these files? Thanks!
 
09-04-2004, 03:49 AM #11
ljqu_happy
LQ Newbie
Registered: Aug 2004
Posts: 18
Original Poster
I'm crying. Why is nobody answering me? Can somebody tell me whether Microsoft Corporation has released these documents?
 
09-04-2004, 06:27 AM #12
chrism01
Guru
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5
Posts: 16,086
You might find the source of this useful: http://search.cpan.org/~janpaz/Docse...n/docclient.pl
See also the Perl module Win32::OLE, http://www.adp-gmbh.ch/perl/word.html, and http://www.znark.com/tech/resumeword.html.
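The Win32::OLE approach drives an installed copy of Word through its COM automation interface instead of parsing the file, so Word itself does the decoding. The same idea in Python, swapping Perl's Win32::OLE for the pywin32 package (an assumption: it needs Windows, Word and pywin32 installed), might look like this. `document_text` and `main` are illustrative names; Documents.Open, Content.Text, Close and Quit come from Word's object model.

```python
import os

def document_text(word_app, path: str) -> str:
    """Open a document through Word's COM object model and return its text.

    `word_app` is a Word.Application automation object (e.g. from pywin32).
    """
    doc = word_app.Documents.Open(path)
    try:
        return doc.Content.Text  # whole-document Range as plain text
    finally:
        doc.Close(False)  # close without saving changes

def main(path: str) -> None:
    # Imported here so the module loads on machines without pywin32.
    import win32com.client

    app = win32com.client.Dispatch("Word.Application")
    try:
        # Word resolves relative paths against its own working directory.
        print(document_text(app, os.path.abspath(path)))
    finally:
        app.Quit()
```

Usage, with a hypothetical path: `main(r"C:\docs\report.doc")`. This only works where Word is installed, which is the trade-off against parsing the binary format directly.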
 
09-06-2004, 04:40 AM #13
mhearn
Guru
Registered: Nov 2002
Location: Durham, England
Distribution: Fedora Core 4
Posts: 1,565
The last publicly available specs for the Office file formats were for Office 97.

I'm afraid the address you found for the start of the document will change from file to file. Your best bet is libwv.
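To make the "changes from file to file" point concrete: inside the compound file, the WordDocument stream starts at whatever sector its directory entry says, and sector N begins at file offset (N + 1) × sector size because the header fills the first sector-sized slot. With 512-byte sectors, 0xA00 (2560) is where sector 4 begins, since (4 + 1) × 512 = 2560. A sketch of decoding the 128-byte directory entries to find that start sector, assuming the directory sectors have already been read into one buffer. A real reader must follow the FAT chain to do that, streams smaller than the mini stream cutoff (normally 4096 bytes) live in the mini stream, and a stream may be fragmented, so this gives only its first sector. All function names are hypothetical.

```python
import struct

def sector_offset(sector: int, sector_size: int = 512) -> int:
    """File offset of a regular sector; the header fills the first sector-sized slot."""
    return (sector + 1) * sector_size

def parse_dir_entry(entry: bytes, major_version: int = 3) -> dict:
    """Decode the fields used here from one 128-byte directory entry."""
    name_len = struct.unpack_from("<H", entry, 0x40)[0]  # bytes, incl. UTF-16 NUL
    name = entry[:max(name_len - 2, 0)].decode("utf-16-le")
    obj_type = entry[0x42]  # 1 = storage, 2 = stream, 5 = root storage
    start_sector, size = struct.unpack_from("<IQ", entry, 0x74)
    if major_version == 3:
        size &= 0xFFFFFFFF  # version-3 files use only the low 32 bits of the size
    return {"name": name, "type": obj_type, "start_sector": start_sector, "size": size}

def find_stream(dir_bytes: bytes, wanted: str, major_version: int = 3):
    """Return the directory entry of the stream called `wanted`, or None."""
    for i in range(0, len(dir_bytes) - 127, 128):
        entry = parse_dir_entry(dir_bytes[i:i + 128], major_version)
        if entry["type"] == 2 and entry["name"] == wanted:
            return entry
    return None
```

Because the start sector comes from the directory, two documents with identical text can place WordDocument at different offsets, which matches mhearn's warning.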
 
09-07-2004, 05:34 AM #14
ljqu_happy
LQ Newbie
Registered: Aug 2004
Posts: 18
Original Poster
Thanks for your help, mhearn, but what's libwv, and where can I get it?
As you say, I haven't found any documentation for Office versions after 97.
 
09-07-2004, 08:00 AM #15
mhearn
Guru
Registered: Nov 2002
Location: Durham, England
Distribution: Fedora Core 4
Posts: 1,565
http://wvware.sourceforge.net/
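Besides the library, the wv package ships command-line converters such as wvText, which writes a document's text to a file. A minimal Python wrapper, assuming wvText is on the PATH and is invoked as `wvText input.doc output.txt` (check its man page on your system); `doc_to_text` is a hypothetical helper.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

def doc_to_text(doc_path: str, tool: str = "wvText") -> str:
    """Convert a .doc to plain text by calling wv's wvText converter."""
    exe = shutil.which(tool)
    if exe is None:
        raise FileNotFoundError(f"{tool} not found on PATH (is wv installed?)")
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "out.txt"
        # Assumed argument order: input document, then output text file.
        subprocess.run([exe, doc_path, str(out)], check=True)
        return out.read_text(errors="replace")
```

Letting wv do the parsing avoids re-implementing the compound file and FIB handling, at the cost of depending on an external GPL'd tool.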
 
  

