LinuxQuestions.org
Linux - General This Linux forum is for general Linux questions and discussion.
If it is Linux Related and doesn't seem to fit in any other forum then this is the place.

Old 11-21-2003, 04:22 AM   #1
J_Szucs
Senior Member
 
Registered: Nov 2001
Location: Budapest, Hungary
Distribution: SuSE 6.4-11.3, Dsl linux, FreeBSD 4.3-6.2, Mandrake 8.2, Redhat, UHU, Debian Etch
Posts: 1,126

Rep: Reputation: 58
PHP: text search in StarOffice and OpenOffice documents, how to do it fast?


I added a text search feature, built on Apache and PHP, to one of our Unix servers. It searches for text in StarOffice (sdw) and OpenOffice (sxw) documents.
We need it because Windows XP does not seem to search for text inside those files (Win98 did it for sdw!).

Searching for text in sdw files under Unix turned out to be as easy as calling grep and building the result page in PHP. It rocks: it is way faster than WinXP's own search utility is with .doc files.
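
A minimal sketch of that grep-based sdw search, for illustration only. It assumes the search words are matched as literal strings and that every word must occur in the file; the function names `file_has_all_words` and `sdw_search` are hypothetical, not taken from the real script.

```sh
# Succeed if FILE contains every WORD given after it.
# -q: no output, exit status only; -F: treat the word as a literal string.
file_has_all_words() {
    local file="$1" word
    shift
    for word in "$@"; do
        grep -q -F -- "$word" "$file" || return 1
    done
}

# Print every *.sdw file under DIR that contains all of the given words.
sdw_search() {
    local dir="$1" file
    shift
    find "$dir" -type f -name '*.sdw' | while IFS= read -r file; do
        if file_has_all_words "$file" "$@"; then
            printf '%s\n' "$file"
        fi
    done
}
```

Called as `sdw_search /some/dir budget 2003`, it prints one matching filename per line, which PHP could then turn into the result page.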

I have a problem with OpenOffice documents, which are zipped XML files. My present search method is very slow.
Here is how I do it now:
1. Run find from PHP to apply some search conditions to the filename (this step is fast and needs no tweaking).
2. For each file found, PHP passes the filename and the search pattern ('words') to a shell script (via shell_exec) that
- calls unzip to extract 'content.xml' from the sxw file,
- pipes it to sed to remove the XML tags from the text,
- calls grep once for each search word (this is fast), and
- returns the filename if all search words were found, and returns nothing otherwise.
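
The per-file script described in step 2 can be sketched roughly as follows. This is a reconstruction under assumptions, not the actual script: the function names (`strip_tags`, `has_all_words`, `sxw_search`) are hypothetical, the sed expression is just one simple way to drop tags, and the words are matched as literal strings.

```sh
# Replace every XML tag on stdin with a space, leaving only the text.
strip_tags() {
    sed 's/<[^>]*>/ /g'
}

# Read text from stdin once, then succeed only if it contains every WORD.
has_all_words() {
    local text word
    text=$(cat)
    for word in "$@"; do
        printf '%s\n' "$text" | grep -q -F -- "$word" || return 1
    done
}

# Print FILE if its content.xml contains all WORDs; print nothing otherwise.
# unzip -p writes the named archive member to stdout without creating files.
sxw_search() {
    local file="$1"
    shift
    if unzip -p "$file" content.xml 2>/dev/null | strip_tags | has_all_words "$@"; then
        printf '%s\n' "$file"
    fi
}
```

With this shape, PHP would call something like `sxw_search report.sxw word1 word2` once per file found by find, so each file still costs one unzip, one sed, and one grep per word.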

Could you give me an idea of how to tweak the above method to make it fast, or tell me whether there is a command-line tool that can 'grep' OpenOffice documents in one step? (zgrep does not seem to do it.)

Last edited by J_Szucs; 11-22-2003 at 06:35 AM.
 
Old 11-22-2003, 06:37 AM   #2
J_Szucs
Original Poster
This thread is about to sink, but I thought I would give it one last try.

So, does anyone have an idea of how to do it fast?
 
  


All times are GMT -5. The time now is 09:49 AM.
