I had an ordinary desktop box in a standard tower case, on which I set up CentOS 6 as a DHCP server, DNS server and Samba PDC.
We have a rack mount setup at work which already contains 12 other servers. All are built into rack mount trays.
I had the board, HDD and power supply built into a standard rack mount tray by an external provider, and they installed the tray into the rack for me. The tray is exactly the same as the trays that contain our 12 other servers.
So I started the machine, and all was fine - it was doing DHCP, DNS and PDC duties. I tested it for several hours, then went home at end of business.
Came in the next morning and it was dead. Pulled the tray, had it taken apart, and it had melted - you can actually see where the traces on the motherboard melted and flowed together. The power supply is fine, had it measured and it is outputting as it should. So...
Replaced the motherboard, put in another power supply, CPU, and HDD with a different model. Reinstalled CentOS and set up the DHCP, DNS and PDC servers again. Had it all installed in the same physical tray.
Came back, melted a SECOND time. Same parameters, motherboard totally destroyed, CPU gone, and HDD dead.
Other 12 machines are fine and running 100%. All the network switches and routers mounted in the same rack also fine.
The only remaining factor is the tray itself, and the rack - I went over it with a fine-tooth comb, there are no projections or irregularities - it is properly spaced, so it appears not to be a short-to-case or something similar.
The other thing is, it WORKS fine for about 12 to 14 hours, but leave it running for longer than 24 hours and the hardware in that tray is promptly destroyed.
It is getting quite expensive... any ideas what I can try / do? All that is left to change is the tray itself, but if the tray is the culprit, why would it fail after an indeterminate amount of time rather than immediately, as a short or something similar would?
It is a properly stabilized server room with an ambient temp of about 16 deg C and a stabilized, protected power supply with auto-start generator backup. There have been no electrical events anywhere nearby, no need to fall back to generator, or any other salient events. All the other machines (even the one in the adjacent tray, about 20 centimeters lower, vertically) are fine and running 100%.
Any ideas or comments? What the flaming h...l could be going on that keeps smoking the hardware I try to add to the rack?
It sounds like a simple problem of rather massive overheating.
Have you checked all the heatsink and case fans, as well as the ventilation path in the rack itself?
If this is the top unit in the rack, you need to ensure it's not being unduly heated by the units below it. Improper air circulation can turn a rack cabinet into a small blast furnace and the top unit gets the brunt of the hot air flow.
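One way to test the overheating theory on the next run is to log temperatures until the box dies, so there is a record of whether heat built up gradually or the failure was sudden. Below is a rough sketch only, assuming Python 3 is installed and that the kernel exposes sensors under /sys/class/thermal (not every board or kernel does; lm_sensors is the alternative). The function names, the log path and the 60 second interval are made up for illustration.

```python
import glob
import os
import time


def read_zone_temps(base="/sys/class/thermal"):
    """Return {zone_name: degrees_C} for every readable thermal zone under base."""
    temps = {}
    for zone in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "temp")) as f:
                raw = f.read().strip()
            # The kernel reports millidegrees Celsius, so 45000 means 45.0 C.
            temps[os.path.basename(zone)] = int(raw) / 1000.0
        except (OSError, ValueError):
            # Zone unreadable or not reporting a number: skip it.
            continue
    return temps


def log_temps(logfile, interval_s=60, iterations=None, base="/sys/class/thermal"):
    """Append one timestamped line per sample.

    Each line is flushed and fsync'ed so the last readings are on disk
    even if the machine loses power right afterwards.
    iterations=None means run until the process is killed.
    """
    count = 0
    with open(logfile, "a") as out:
        while iterations is None or count < iterations:
            temps = read_zone_temps(base)
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            fields = " ".join("%s=%.1fC" % (z, t) for z, t in sorted(temps.items()))
            out.write("%s %s\n" % (stamp, fields))
            out.flush()
            os.fsync(out.fileno())
            count += 1
            if iterations is None or count < iterations:
                time.sleep(interval_s)
```

Calling log_temps("/some/path/rack_temps.log") would sample once a minute until the machine stops. Since the HDD died along with the board last time, it probably makes sense to point the log file at a network share on one of the other servers rather than the local disk.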
Whatever's going on, it's certainly impressive. Could you replace the motherboard and other components with a frozen pizza? If it cooks, you'll have ruled out any electrical short in the case, as well as finding a good use for the heat.
It sounds unlikely. What kind of CPUs were installed? Most are able to throttle down or at least cut power before melting down. Can you identify the source of the meltdown? Maybe it wasn't the CPU...