Some time back (mid-October) we noticed a dramatic increase in CPU load during a weekend. On researching this we were able to trace it back to a file on an ext3 filesystem that caused any process that attempted to access it to hang (thereby adding to the runq). Since the slocate cron job ran every day, as did an incremental or full backup, they too would hang and add to the runq. This was the first time such an issue had been noted in over a year of running this RH AS 3 system. The issue was solved by rebooting the server. The file in question was easily accessible after the reboot.
Overnight our incremental backup failed, and I see that slocate/updatedb has once again hung on a file. It is not the same file, nor even the same filesystem, as the prior one, though it is again an ext3 filesystem.
We did a reboot to clear the problem, but I'm wondering whether this is being caused by slocate/updatedb or whether that is just the first thing that finds it. If the latter, what is causing the initial file lock?
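One way to see which processes have a given file open, without touching the file itself, is to read the symlinks under /proc/<pid>/fd. The following is only a rough sketch: the function name holders and the example path are made up for illustration, it must run as root to see other users' processes, and it only shows who has the file open, not why access to it hangs.

```python
import os

def holders(target, proc_root="/proc"):
    """Return sorted PIDs that have `target` open, per /proc/<pid>/fd.

    `holders` and `proc_root` are hypothetical names for this sketch.
    """
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                # readlink returns the link text; it does not open the target
                link = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            if link == target:
                pids.append(int(entry))
                break
    return sorted(pids)

if __name__ == "__main__":
    # Hypothetical path; substitute the file that processes hang on.
    print(holders("/path/to/suspect/file"))
```

The earliest-started PID in the result is a reasonable first suspect for "the first thing that finds it", though that is a guess and not something the output proves.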
P.S. Before anyone suggests it: of course we tried progressively stronger kills on all processes referencing the file. The kills, including kill -9, do not work.
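The post does not say what state the stuck processes were in, but processes that ignore kill -9 are typically in uninterruptible sleep (state D in ps), waiting inside the kernel on I/O. On Linux such processes also count toward the load average, which would fit the load increase described. As a sketch under that assumption, the following lists every process in state D by reading /proc/<pid>/stat; the function names are made up for illustration.

```python
import os

def parse_stat(text):
    """Split a /proc/<pid>/stat line into (pid, command, state)."""
    # The command name sits in parentheses and may itself contain spaces
    # or parentheses, so split on the first "(" and the last ")".
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left])
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return pid, comm, state

def uninterruptible(proc_root="/proc"):
    """Return (pid, command) for every process currently in state D."""
    hung = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited while we were scanning
        if state == "D":
            hung.append((pid, comm))
    return hung

if __name__ == "__main__":
    for pid, comm in uninterruptible():
        print(pid, comm)
```

Run from cron every few minutes and logged, something like this could show when the first process gets stuck, and whether it is updatedb or something else.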
I had similar problems. I appeared to be having increasing problems with disk accesses. I wasted a lot of time exercising the disks in numerous different ways. I changed the disks around, changed the jumpers, etc. When I put the apparently defective disks on a different computer they worked okay. It turned out that my motherboard was malfunctioning. New motherboard -> problem fixed.
Last edited by stress_junkie; 01-10-2007 at 05:39 PM.
This is on a Dell PowerEdge rather than a build-my-own system. That doesn't mean the motherboard can't be the problem, but since it has a SCSI (PERC) adapter for the drives and they're mirrored, it doesn't seem likely. Maybe it's glitches in the PERC. I just wanted to see if there were any known issues with slocate/updatedb causing this kind of thing on occasion.
What I eventually figured out was that this was a heat issue. We saw it on other systems as well. All of them were in the same rack, which was fairly full and in the center of the data center. Often we would see battery issues for the PERC cards reported on the system LED.
Although the motherboards had heat monitors, the PERC card didn't. Dell wouldn't admit the PERC was more heat sensitive than the motherboard, but we were able to prove it to ourselves by observation. Simply by opening the door of the rack we were able to make the battery message go away, and by closing the door we were able to make it come back. As time has gone on, every time there has been a heat event in the data center we have seen the same file locking. There was never a time I saw it that I wasn't able to trace it back to increased heat. (In one event I found someone had turned off the fan in the top of the rack, for no apparent reason.) We have mostly mitigated this by adding fans to the back of the rack door itself.