Search engine for Linux?
Hello,
Looking for a complete search engine for Linux to crawl and index a company intranet...
Company is mostly microsquish products, so it would need to parse MS Word, Excel, PDF...
Any recommendations?
Has Google Desktop Search wiped the Linux world of search projects?
So far I've tried:
htdig (looks like a zombie'd project)
easy to setup
searching in minutes
LFS was a pain to figure out
project is dead
Database is wiped out and fully re-indexed; no partial indexing
fuzzy search algorithms are, er, special...
spider takes forever
parsers lock up on Excel files; memory leak somewhere (ate up 10 gigs of RAM on a 2 megabyte file)
if the spider / parser locks up you lose an entire night's work of indexing...
Nutch
Appears to have support for parsing Word / Excel documents; cannot get it to work
meant to be more of an API?
ugh, Tomcat and Sun Java
hard to get working; still cannot get any search results back
documentation is a bit hard to dig up