LinuxQuestions.org


KendersPlace 04-25-2004 11:13 PM

Total loss of power on HD seek ???
 
A customer's machine is running RH 8.0. It is basically a database server running MySQL. It is a very stripped-down installation - database, security, and web.

The drive failed, so I sent it in for warranty replacement - a new drive (I checked the mfr date; it is indeed new, not the repaired old drive) came back.

Hard drive: Western Digital 250 GB, 7200 rpm, 8 MB buffer, "SE" edition.


SINCE REPLACING THE DRIVE....

Loaded up a fresh install of RH on the new drive from the RH install CDs. About the 3rd time the box booted up, it hit the partition the MySQL tables are on during the boot process (listing out on the screen) and suddenly dropped all power when it hit this one partition - as if someone had pulled the power cable out of the wall.

Pressing the power button or reset button again had no effect - the box was totally dead. I flipped the switch on the POWER SUPPLY on the back of the box off and on, unplugged and re-plugged the power cable, then hit the front power button, and it started right back up with a fresh boot.

Out of the next 4 or 5 boots - each time it would hit this partition, the same thing happened. Finally, on the 6th or so boot (I was trying to see what was on the screen just before it died each time) it started up just fine. Worked perfectly.

Now, about every week or so, seemingly randomly - the box will just drop power again - same symptom, but happens after it's been running for a while. Have to cycle the power supply again, and it starts back up - a week later - another loss of power. IT ALWAYS SEEMS TO HAPPEN RIGHT AFTER KICKING OFF A LARGE SQL QUERY.

I replaced the power supply (was a 300w, replaced with brand new 400w). The exact same thing is still happening.


The question:

I've never seen anything like this, and have built 3 IDENTICAL machines for 3 different customers running IDENTICAL databases. All have been fine for over a year; now this problem with this one box - SINCE replacing the hard drive.

Because this seems to happen during access only to one SQL partition - is it possible that the hard drive is somehow shorting out and tripping the power supply? This doesn't seem like a motherboard problem - seems to be directly related to this one partition on the hard drive.

Is there anything else that could cause this? Should I send this drive also back to Western Digital for replacement??


Box Specs:
Athlon XP 2000+
128 MB RAM
Single hard drive (noted above)
FIC motherboard w/ onboard video and NIC (onboard sound and other peripherals disabled via BIOS).

That's the only thing in this box. A processor, ram, motherboard, and a hard drive. No floppy, no CD, no anything - strictly a network accessible database server.


Thanks much in advance, sorry for long post - wanted to include all the details!

-K

kilgoretrout 04-26-2004 08:03 AM

Download the diagnostic utility from the WD website and thoroughly check the drive. You'll need to do that to rma the drive anyway. If the drive checks out OK, I'd suspect an overheating problem. Check to make sure your fans are working properly and the heatsink is properly mounted. Also try swapping out the drive cable.

J.W. 04-26-2004 03:02 PM

The only variable seems to be that one HD, so yes, I'd agree with kilgoretrout that you need to return it. Based on your description, it definitely sounds defective.

The only other possibility that I could think of would be if you were OC'ing the Athlon. Since the behavior only seems to manifest itself when executing CPU-intensive and RAM-intensive queries, then OC'ing could introduce some instability leading to a hung state. Along these lines, if you are using multiple sticks of RAM, are they all the same speed, and is the RAM speed matched to the CPU? -- J.W.

KendersPlace 04-27-2004 01:12 AM

Thanks for the replies.

1. The CPU is not overclocked.
2. The RAM is a single stick, PC2100.
3. RAM and CPU speeds are matched as far as I know. I always clear the CMOS with the jumper when building a new box, and I didn't change anything in the BIOS as far as RAM speed, so it should be default.

(The box is still at customer location - picking it up in a couple days).

The diagnostic utility from W.D. only runs under Windows - the drive can be swapped to a Windows box, but I was hoping they put out something basic and low-level I could just run under Linux. Guess not.
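
For what it's worth, a crude read-only pass over the partition can be done from Linux itself. The following is only a minimal sketch, not a substitute for the vendor diagnostic: the function name scan and the 1 MiB chunk size are made up for illustration, and it assumes Python is available on the box and that it is run as root. It reads a device or file from start to end and records any offsets where the read fails. If the box drops power partway through, that at least points at drive access rather than at the database.

```python
import os

CHUNK = 1024 * 1024  # read 1 MiB at a time (arbitrary choice)


def scan(path, chunk=CHUNK):
    """Read `path` from start to end without writing to it.

    Returns (bytes_read, bad_offsets), where bad_offsets lists the
    starting offsets of chunks the kernel refused to read.
    """
    bad_offsets = []
    bytes_read = 0
    offset = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            try:
                os.lseek(fd, offset, os.SEEK_SET)
                data = os.read(fd, chunk)
            except OSError:
                # Unreadable region: note it and skip ahead one chunk.
                bad_offsets.append(offset)
                offset += chunk
                continue
            if not data:
                break  # end of device or file
            bytes_read += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return bytes_read, bad_offsets
```

It would be called with the partition's device node, for example scan("/dev/hda3"), where /dev/hda3 is a hypothetical name standing in for whichever partition holds the MySQL tables.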

I hadn't considered the ribbon cable. That is possible as I had to fight with the drive mounting bracket and may well have damaged the cable. I'll try swapping that out w/ another EIDE cable first and if it happens again I guess it goes back.


Thanks again.
-K

J.W. 04-27-2004 02:03 AM

There may be some underlying issue with the RAM. It may be useful to run the diagnostics from here: http://memtest86.com/ -- J.W.

