htdig problem



12-02-2004, 05:09 AM	#1
merlin23 (Member, Registered: Dec 2004, Location: Vienna, Posts: 46)

Hi everyone,

I don't know whether this thread would be better placed in the networking section, but never mind; here's my problem.

I have Apache installed with htdig to index a large number of HTML, DOC, PDF and other files (a few tens of thousands of files). Every night the whole index is updated via crontab, but every morning I find hundreds of errors in my log files. Mostly these are PDF files, which are scanned with an external parser (pdftotext), or DOC and PPT files.

What puzzles me is that when I run pdftotext by hand on the PDFs that were not merged, there sometimes does not seem to be any problem. Why did htdig have problems with them? Could this be due to a lack of memory? Is htdig overloaded by so much data?

What also puzzles me is that files that were not parsed at the last merge very often do get merged now, while some that were parsed last time are now skipped.

Maybe one solution would be to keep two databases: after every merge, copy the problem files somewhere else, rescan them into a second database, and then merge the two databases together.

Does anybody have experience with this, or a hint on how to solve the problem in another way?

Last edited by merlin23; 12-02-2004 at 05:13 AM.
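
One way to narrow down whether the failures come from the external converter or from htdig itself is to re-run the converter on the problem files outside of the nightly job and record which ones fail reproducibly. The following is only a minimal sketch under stated assumptions: it assumes pdftotext is on the PATH, that the list of problem files has already been pulled out of the log by hand, and that a 60-second limit per file is reasonable. The names check_files and pdftotext_cmd are made up for this example and are not part of htdig.

```python
import subprocess
import sys


def pdftotext_cmd(path):
    # "pdftotext FILE -" writes the extracted text to stdout.
    return ["pdftotext", path, "-"]


def check_files(paths, build_cmd, timeout=60):
    """Run a converter on each file and split the files into ok / failed.

    build_cmd is a function that turns a file path into an argument list.
    Returns (ok, failed), where failed holds (path, reason) tuples.
    """
    ok, failed = [], []
    for path in paths:
        try:
            result = subprocess.run(
                build_cmd(path),
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                timeout=timeout,  # assumed limit, adjust as needed
            )
        except subprocess.TimeoutExpired:
            failed.append((path, "timeout"))
            continue
        except OSError as exc:
            # Converter missing or not executable.
            failed.append((path, f"cannot run converter: {exc}"))
            continue

        if result.returncode != 0:
            failed.append((path, f"exit code {result.returncode}"))
        elif not result.stdout.strip():
            failed.append((path, "no text extracted"))
        else:
            ok.append(path)
    return ok, failed


if __name__ == "__main__":
    # Usage: python check_files.py file1.pdf file2.pdf ...
    ok, failed = check_files(sys.argv[1:], pdftotext_cmd)
    for path, reason in failed:
        print(f"{path}\t{reason}")
```

If the same files fail every time when run this way, the converter or the files themselves are the likely cause. If they convert cleanly here but fail at night, resource limits or timing during the nightly run become the more plausible suspects.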

All times are GMT -5.
