Is there any way to "READ" OpenOffice .doc document?
I'm running the LFS LiveCD to build LFS. It runs Xfce as its desktop and has no word processing software built in. My problem is how to read .doc documents stored on the HD. After mounting the corresponding partition on the HD I can get at the .doc documents but cannot open or read them.
I'm not prepared to remaster the LiveCD. Is there any solution?
I would suggest burning a Knoppix or other LiveCD that has OpenOffice.
I have other LiveCDs with OO included. I prefer the LFS LiveCD because it boots faster, thanks to the lightweight Xfce desktop. Besides, it has the same fs as the LFS 6.1 system I'm going to build. I have also found conflicting information about using Knoppix to build LFS. For these reasons I'm trying to solve the problem using only the packages already available on the host.
You mention "OpenOffice .doc document". Technically, .doc is an M$ format that OpenOffice.org supports. The native OpenOffice.org formats are all compressed XML (".sx?"). If by any chance you misspoke & actually need to read a .sxw file, then all you should have to do is unpack it before using your favorite/available text editor.
For everyone's (esp. mine) future reference, here is a table I copied & pasted from the OOo 1.1.3 help:
XML file format names
OpenOffice.org uses the following XML file formats:
Application                          File extension
OpenOffice.org Writer                *.sxw
OpenOffice.org Writer templates      *.stw
OpenOffice.org Calc                  *.sxc
OpenOffice.org Calc templates        *.stc
OpenOffice.org Impress               *.sxi
OpenOffice.org Impress templates     *.sti
OpenOffice.org Draw                  *.sxd
OpenOffice.org Draw templates        *.std
OpenOffice.org Math                  *.sxm
Master documents                     *.sxg
From a little further down in the "XML File Formats" section of OOo help:
XML file structure
The OpenOffice.org XML file formats are compressed according to the ZIP method. Use an unpacking program of your choice to unpack the content of an XML file with its subdirectories. You see a structure similar to the following illustration.
<could not paste image>
The text content of the document is located in content.xml.
By default, content.xml is stored without formatting elements like indentation or line breaks to minimize the time for saving and opening the document. On the Tools - Options - Load/Save - General tab page you can activate the use of indentations and line breaks by clearing the check box Size optimization for XML format (no pretty printing).
The file meta.xml contains the meta information of the document, which you can enter under File - Properties.
If you save a document with a password, all XML files except meta.xml will be encrypted.
The file settings.xml contains further information about the settings for this document.
In styles.xml, you find the Styles applied to the document that can be seen in the Stylist.
The meta-inf/manifest.xml file describes the structure of the XML file.
Additional files can be contained in the packed file format. For example, illustrations can be contained in a Pictures subdirectory, Basic code in a Basic subdirectory, and linked Basic libraries in further subdirectories of Basic.
unzip works, I tried it.
Warning: the actual text is in content.xml, which is 1 line of XML declaration & 1 loooong line of "content". The actual words of the document are almost at the end, in a series of "<text:p ... >" tags. Good luck reading it. less is probably just as good as vi.
$ unzip -d <extract_dir> <target_file>.sxw
$ cd <extract_dir>
$ less content.xml
If you need to work w/ the text, you probably should load Knoppix once & save your target files in .txt format. Otherwise I see vi macros, or sed or awk scripts with hairy regexes, in your future.
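For what it's worth, the sed route does not have to be very hairy if rough plain text is enough. The following is only a minimal sketch, assuming a POSIX sed and Info-ZIP unzip are available on the LiveCD. The function names strip_xml and sxw_to_text are made up for illustration. It turns each closing </text:p> tag into a line break, deletes every other tag, and decodes the five standard XML entities. Tabs, repeated spaces, and anything else stored as tags rather than text are simply lost.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of any package.

# strip_xml: read content.xml on stdin, write rough plain text on stdout.
strip_xml() {
    # 1st expression: replace each </text:p> with a real newline
    #                 (backslash + literal newline is the portable form).
    # 2nd expression: delete every remaining tag.
    # The rest:       decode the standard XML entities, &amp; last so that
    #                 an escaped "&amp;lt;" is not decoded twice.
    sed -e 's/<\/text:p>/\
/g' \
        -e 's/<[^>]*>//g' \
        -e 's/&lt;/</g' -e 's/&gt;/>/g' \
        -e 's/&quot;/"/g' -e "s/&apos;/'/g" \
        -e 's/&amp;/\&/g'
}

# sxw_to_text FILE: unzip -p sends content.xml to stdout, so nothing
# is extracted to disk.
sxw_to_text() {
    unzip -p "$1" content.xml | strip_xml
}
```

After sourcing the functions (or pasting them into the shell), usage would be something like:

$ sxw_to_text <target_file>.sxw | less

This only helps with real .sxw files. A genuine M$ .doc is a binary format, not a zip archive, so unzip will refuse it.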