LinuxQuestions.org
Old 06-27-2002, 07:54 AM   #1
saravanan1979
Member
 
Registered: Jan 2002
Posts: 163

Rep: Reputation: 30
PHP Script to parse Word/RTF Documents


Hello
Does anybody know of any PHP scripts that I could use to convert Word/RTF documents to HTML? I run PHP on Red Hat Linux 6.2. For business reasons I am not in a position to install any additional libraries for this purpose, so could anyone kindly point me to PHP scripts that do this?
 
Old 06-27-2002, 10:11 AM   #2
Mik
Senior Member
 
Registered: Dec 2001
Location: The Netherlands
Distribution: Ubuntu
Posts: 1,316

Rep: Reputation: 46
You are looking for a PHP script to display RTF documents in HTML format? Does that mean you want to convert them every time a user requests the document?
I don't know of any PHP scripts for that, but there are RTF-to-HTML converters: http://www.geocities.com/tuorfa/unrtf.html
You could run that on all your RTF docs so they can be read as HTML. If, however, your documents keep changing and you want to convert them on each request (a great resource hog), you could probably write a PHP script which runs the converter and then redirects to the generated HTML file. I think in that case you would be better off running a cron job at night which converts all the documents that have changed. That will take up double the disk space but will be a lot faster than converting the complete document each time a user requests it.

But those are just my ideas, maybe someone has already implemented a nicer solution for that.
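
A minimal sketch of the "only convert what changed" idea, assuming a modern PHP (7 or later) and that the RTF sources and the generated HTML live in two directories. The function names and the `$convert` callback are hypothetical; the callback stands in for whatever converter is actually used, so nothing here depends on a particular tool.

```php
<?php
// Sketch: regenerate an HTML copy only when the RTF source is newer.
// The converter is passed in as a callable, so no specific tool is assumed.

/** True if $htmlPath is missing or older than $rtfPath. */
function needs_conversion(string $rtfPath, string $htmlPath): bool
{
    if (!file_exists($htmlPath)) {
        return true;
    }
    return filemtime($rtfPath) > filemtime($htmlPath);
}

/**
 * Convert every stale *.rtf in $sourceDir into $cacheDir.
 * $convert receives (source path, target path) and returns true on success.
 * Returns the list of source files that were converted.
 */
function refresh_cache(string $sourceDir, string $cacheDir, callable $convert): array
{
    $converted = [];
    foreach (glob(rtrim($sourceDir, '/') . '/*.rtf') ?: [] as $rtfPath) {
        $htmlPath = rtrim($cacheDir, '/') . '/' . basename($rtfPath, '.rtf') . '.html';
        if (needs_conversion($rtfPath, $htmlPath) && $convert($rtfPath, $htmlPath)) {
            $converted[] = $rtfPath;
        }
    }
    return $converted;
}
```

A script like this could be run from cron at night, or `needs_conversion()` could be checked on each request so that a document is converted at most once per change.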
 
Old 06-27-2002, 10:24 AM   #3
saravanan1979
Member
 
Registered: Jan 2002
Posts: 163

Original Poster
Rep: Reputation: 30
Dear Mik,
Thanks a lot for your reply, but that is not quite my requirement. It is almost the same: I want to read the Word/Excel/RTF documents uploaded by users with PHP scripts and display them to the user as HTML. Because I run PHP on Linux I cannot do that; on Windows I could do it by calling the MS Word or Excel object. Can you please help me solve this problem?
 
Old 06-30-2002, 09:57 AM   #4
Enforcer
LQ Newbie
 
Registered: Apr 2002
Location: Sweden
Distribution: FBSD 4.10, FBSD 5.2.1 , OBSD 3.4
Posts: 27

Rep: Reputation: 15
Hi!

Check out this script..

http://px.sklar.com/code.html?code_id=413

..or do a search at..

http://www.phpbuilder.com

..for something like "rtf html"; you will find some entries you may find interesting.
 
Old 07-02-2002, 03:39 PM   #5
amp2000
Member
 
Registered: Oct 2001
Location: Dublin, Ireland
Distribution: Mandrake 9.0 mostly!
Posts: 303

Rep: Reputation: 30
Quote:
My requirement is almost the same: I want to read the Word/Excel/RTF documents uploaded by users with PHP scripts and display them to the user as HTML. Because I run PHP on Linux I cannot do that; on Windows I could do it by calling the MS Word or Excel object. Can you please help me solve this problem?
I have the exact same problem. Check out http://www.wvware.com/

I haven't tried it yet, but it claims to do what we want.
Can you let me know if it works for you? I won't get a chance to try it for a couple of days.

Cheers
 
Old 07-03-2002, 01:19 AM   #6
saravanan1979
Member
 
Registered: Jan 2002
Posts: 163

Original Poster
Rep: Reputation: 30
wvWare is a Linux utility that you need to install and then call through PHP's program execution functions. That does not solve my problem: I am looking for something native in PHP, since my application runs on 3-4 servers and the library would have to be installed on all of them, which is not possible for me. I am still not successful. Check out this:
http://www.phpbuilder.com/columns/nair20020523.php3
It does give you some guidance. I was able to partially trap the bold, italic, and underline formatting in the Word document. If you come across any source, please tell me; I will do the same.
Saravanan
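
To illustrate what "trapping" bold, italic, and underline natively in PHP can look like, here is a rough sketch for RTF only. It is not the method from the linked article, and it does not touch binary .doc files. The function name is hypothetical, and it assumes a simple body fragment: header groups such as font tables, `\'hh` hex escapes, and nested group scoping are not handled.

```php
<?php
// Sketch: map a handful of RTF control words to HTML tags.
// Assumes a simple RTF body fragment; everything else is dropped or passed through.

function rtf_fragment_to_html(string $rtf): string
{
    // Control words this sketch understands, mapped to the HTML they emit.
    $tags = [
        'b'  => '<b>', 'b0'     => '</b>',
        'i'  => '<i>', 'i0'     => '</i>',
        'ul' => '<u>', 'ulnone' => '</u>', 'ul0' => '</u>',
        'par' => "<br>\n",
    ];

    // Tokens: escaped literal (\\ \{ \}), control word with optional numeric
    // parameter and one optional trailing space, group brace, or plain text.
    $pattern = '/\\\\([\\\\{}])|\\\\([a-z]+)(-?\d+)? ?|([{}])|([^\\\\{}]+)/';
    preg_match_all($pattern, $rtf, $tokens, PREG_SET_ORDER);

    $html = '';
    foreach ($tokens as $t) {
        if (($t[1] ?? '') !== '') {
            // Escaped backslash or brace becomes literal text.
            $html .= htmlspecialchars($t[1]);
        } elseif (($t[2] ?? '') !== '') {
            // Known control words emit a tag; unknown ones are skipped.
            $html .= $tags[$t[2] . ($t[3] ?? '')] ?? '';
        } elseif (($t[5] ?? '') !== '') {
            // Raw line breaks carry no meaning in RTF, so strip them.
            $html .= htmlspecialchars(str_replace(["\r", "\n"], '', $t[5]));
        }
        // Group braces ($t[4]) are ignored in this sketch.
    }
    return $html;
}
```

For example, `rtf_fragment_to_html('{\rtf1 Plain \b bold\b0  and \i italic\i0  and \ul under\ulnone .}')` returns `Plain <b>bold</b> and <i>italic</i> and <u>under</u>.`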
 
Old 07-08-2002, 07:11 PM   #7
amp2000
Member
 
Registered: Oct 2001
Location: Dublin, Ireland
Distribution: Mandrake 9.0 mostly!
Posts: 303

Rep: Reputation: 30
I don't know if this will be of any help, but here's part of an e-mail that might be worth looking at. I'm still looking for a solution, so post back here with anything you have, saravanan1979.
Cheers

-----------------------------------------------------------------------------------
>However, all you have to do is make sure you set the content type
>properly:
>
>header('Content-Type: application/msword');
>readfile('file.doc');
>
>XXXXXXX
>
>ps. If application/msword doesn't work, try application/octet-stream.
>
>pps. For a full list of MIME types, see the mime.types file in your Apache
>configuration.
>
>> Hi,
>>
>> I am trying to view Word Docs on our intranet.
>>
>> All I can get is a load of code.
>>
>> Someone suggested that I use wvHtml - www.wvware.com - to convert
>> the files
>> but I am not sure how to tell PHP to use it??
>>
>> I have used readfile but still junk!
-----------------------------------------------------------------------------------
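
The approach in the quoted e-mail can be sketched a little more fully as below, assuming PHP 7 or later. The function names are hypothetical. This only hands the file to the browser with a matching content type, so the user's own Word or Excel opens it; it does not convert anything to HTML.

```php
<?php
// Sketch: send a stored document with a suitable Content-Type header.

/** Pick a MIME type from the file extension, with a generic fallback. */
function mime_type_for(string $filename): string
{
    $types = [
        'doc' => 'application/msword',
        'rtf' => 'application/rtf',
        'xls' => 'application/vnd.ms-excel',
    ];
    $ext = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    return $types[$ext] ?? 'application/octet-stream';
}

/** Stream the file to the client, or answer 404 if it does not exist. */
function send_document(string $path): void
{
    if (!is_file($path)) {
        http_response_code(404);
        return;
    }
    header('Content-Type: ' . mime_type_for($path));
    header('Content-Length: ' . filesize($path));
    readfile($path);
}
```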
 
Old 07-09-2002, 01:19 AM   #8
saravanan1979
Member
 
Registered: Jan 2002
Posts: 163

Original Poster
Rep: Reputation: 30
Dear Amp2000,
Your code above will not work for this, since PHP on Linux cannot parse Word or RTF files by itself. Of course header() can be used to transfer Word documents from one place to another, but it does not help to read the Word document.
But I have handled both formats using two utilities:
1. unrtf: RTF --> HTML
2. wvHtml: MS Word --> HTML

Both utilities have to be installed on Linux and called from PHP using PHP's program execution functions. Both are very good.
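
Calling the two utilities from PHP might look roughly like the sketch below. It assumes PHP 7 or later, that `unrtf` and `wvHtml` are installed and on the PATH, and that they accept the options shown (`unrtf --html` writing to standard output, `wvHtml --targetdir=...`); check the man pages for the installed versions. The function names are hypothetical.

```php
<?php
// Sketch: choose a converter by file extension and run it through exec().
// Assumption: unrtf and wvHtml are installed and accept the options used here.

/** Build the shell command for a source file, or null if unsupported. */
function build_convert_command(string $source, string $target): ?string
{
    $ext = strtolower(pathinfo($source, PATHINFO_EXTENSION));
    switch ($ext) {
        case 'rtf':
            // unrtf prints the HTML to stdout, so redirect it into the target.
            return 'unrtf --html ' . escapeshellarg($source)
                . ' > ' . escapeshellarg($target);
        case 'doc':
            // wvHtml takes the output directory and the output file name separately.
            return 'wvHtml --targetdir=' . escapeshellarg(dirname($target))
                . ' ' . escapeshellarg($source)
                . ' ' . escapeshellarg(basename($target));
        default:
            return null;
    }
}

/** Run the converter; true if it exited cleanly and produced the target. */
function convert_document(string $source, string $target): bool
{
    $command = build_convert_command($source, $target);
    if ($command === null) {
        return false;
    }
    exec($command, $output, $status);
    return $status === 0 && is_file($target);
}
```

Passing every path through `escapeshellarg()` matters here because the file names come from user uploads.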
 
Old 02-17-2010, 04:45 PM   #9
javier.camacho
LQ Newbie
 
Registered: Feb 2010
Posts: 1

Rep: Reputation: 0
PHP script

You can try PHPWordLib:

PHPWordLib is a piece of PHP software intended to convert MS Word (.DOC) and Rich Text Format (.RTF or .DOC) files to plain text. The library is self-contained and does not require anything external in order to run. It has two simple functions, LoadFile and GetPlainText. The library does all necessary checking internally, and its functions will always return FALSE if something seems to be wrong with the input file.

It works on Linux, Windows, and Solaris; it just needs PHP. Google it.
 
Old 02-17-2010, 08:35 PM   #10
kabirkhanna
LQ Newbie
 
Registered: Feb 2010
Posts: 1

Rep: Reputation: 0
Wordlib

Quote:
Originally Posted by javier.camacho View Post
You can try PHPWordLib:

PHPWordLib is a piece of PHP software intended to convert MS Word (.DOC) and Rich Text Format (.RTF or .DOC) files to plain text. The library is self-contained and does not require anything external in order to run. It has two simple functions, LoadFile and GetPlainText. The library does all necessary checking internally, and its functions will always return FALSE if something seems to be wrong with the input file.

It works on Linux, Windows, and Solaris; it just needs PHP. Google it.
I looked at the Wordlib site and tried their demo, but it did not work. Have you used Wordlib successfully? If not, are there any other alternatives? Thanks.
 
Old 02-18-2010, 08:25 AM   #11
saravanan1979
Member
 
Registered: Jan 2002
Posts: 163

Original Poster
Rep: Reputation: 30
Quote:
Originally Posted by kabirkhanna View Post
I looked at the Wordlib site and tried their demo, but it did not work. Have you used Wordlib successfully? If not, are there any other alternatives? Thanks.
I tried the ones I mentioned earlier; they work fine as well.
 
  

