Strange (but fatal) recurring rackmount problems
I'd like some input... here's the situation.
I had a normal desktop box in a normal tower case, on which I set up CentOS 6 running DHCP, DNS and a Samba PDC.
We have a rack mount setup at work which already contains 12 other servers. All are built into rack mount trays.
I had the board, hdd and power supply built into a standard rack mount tray by an external provider, and they installed the tray into the rack for me. The tray is the exact same as the trays that contain our 12 other servers.
So I started the machine, and all was fine - it was doing DHCP, DNS and PDC duties. I tested it for several hours, then went home at end of business.
Came in the next morning and it was dead. Pulled the tray, had it taken apart, and it had melted - you can actually see where the traces on the motherboard melted and flowed together. The power supply is fine, had it measured and it is outputting as it should. So...
Replaced the motherboard, and put in another power supply, another CPU, and a different model of HDD. Reinstalled CentOS and set up the DHCP, DNS and PDC servers again. Had it all installed in the same physical tray.
Came back and it had melted a SECOND time. Same result: motherboard totally destroyed, CPU gone, and HDD dead.
Other 12 machines are fine and running 100%. All the network switches and routers mounted in the same rack also fine.
The only factors left are the tray itself and the rack. I went over the tray with a fine-tooth comb and there are no projections or irregularities. The board is properly spaced from the tray, so it does not appear to be a short-to-case or something similar.
The other thing is that it WORKS fine for about 12 or 14 hours, but leave it for anything longer than 24 hours and the hardware in that tray is promptly destroyed.
It is getting quite expensive... any ideas what I can try / do? All that is left to change is the tray itself, but if the tray is the culprit, why does it fail after an indeterminate amount of time rather than immediately, as you would expect from a short or something similar?
It is a properly climate-controlled server room with an ambient temp of about 16 deg C, and a stabilized, protected power supply with auto-start generator backup. There have been no electrical events anywhere nearby, no need to fall back to the generator, and no other salient events. All the other machines (even the one in the adjacent tray, about 20 centimeters lower, vertically) are fine and running 100%.
Any ideas or comments? What the flaming h...l could be going on that keeps smoking the hardware I try to add to the rack?
Last edited by rylan76; 11-20-2012 at 05:05 AM.