LinuxQuestions.org
07-15-2021, 03:29 PM   #1
ccj4467
Member
 
Registered: Jan 2009
Posts: 34

Rep: Reputation: 5
Memory issues with Dell PowerEdge servers


Hi all, I need a little advice.

The project I am working on has a bunch of Dell PowerEdge servers. We mostly administer these servers remotely. One of the servers, an R330, consistently fails to reboot. Someone has to go to the server room and either power the system down manually or re-seat the DIMMs to get the server back up and running. Today another server, an R720, exhibited the same problem.

My question is, would taking the DIMMs out and cleaning all the contacts (DIMM and slot) correct this problem?

Has anyone else ever come across problems like this?

If it matters, we are running Ubuntu 20.04 Server.

Last edited by ccj4467; 07-15-2021 at 03:31 PM.
 
07-15-2021, 07:30 PM   #2
michaelk
Moderator

Registered: Aug 2002
Posts: 25,759

Rep: Reputation: 5930
As someone who has worked with electronics and avionics for a long time, I've seen some strange things happen and also some illogical fixes...

Typically, memory modules and their slots are gold plated, so oxidation should not build up if they were manufactured properly. If the server room is air conditioned and kept at the proper humidity level, something like this should not happen repeatedly, if at all.

I have purchased many Dell computers over the years and have never had to clean the contacts. At least for me, the PSU dies or, more typically, the capacitors on the motherboard go bad. By that time the machines have outlived their usefulness and it is time to buy the next used/refurbished unit.
 
07-15-2021, 07:44 PM   #3
computersavvy
Senior Member

Registered: Aug 2016
Posts: 3,345

Rep: Reputation: 1484
Power supplies that are marginal can certainly show weird symptoms. Depending upon age it may be worth simply installing a new replacement power supply in one of the systems that is having the issue. If that fixes the issue then you can run for a lot longer on that system. If it does not fix it then you are not out a large investment and can try something else.

It is also worth inspecting the caps the next time the system is down just in case. Certain brands and styles are known to have failures that take down the system in strange ways. I don't remember the details, but in the late 90s and early 2000s there were some brands of motherboards known for cap failures that would take out the system.
 
07-15-2021, 07:54 PM   #4
jefro
Moderator

Registered: Mar 2008
Posts: 22,005

Rep: Reputation: 3629
The company I used to work for had a two-fail rule. Fail once, get it going. Fail twice, replace.

Needing to replace the memory usually points to ESD damage, but it could be any issue. I assume that the system is not going into a full power-down. Instead of subjecting the system to abuse by playing with the RAM, try this next time: unplug the power supply (or supplies) and then press the power button a few times. Restore power and see if it powers up correctly.
 
07-15-2021, 10:57 PM   #5
obobskivich
Member
 
Registered: Jun 2020
Posts: 596

Rep: Reputation: Disabled
Quote:
Originally Posted by computersavvy
It is also worth inspecting the caps the next time the system is down just in case. Certain brands and styles are known to have failures that take down the system in strange ways. I don't remember the details, but in the late 90s and early 2000s there were some brands of motherboards known for cap failures that would take out the system.
This was/is known as the 'cap plague' and was an upstream manufacturing issue at some large Taiwanese suppliers. I forget exactly what 'went wrong', but it had to do with bad batches of electrolyte (it only affects electrolytic capacitors); I don't remember whether that was just a chemistry mix-up or something environmental (e.g. being produced somewhere with different weather/humidity/etc). It affected a massive range of hardware more or less indiscriminately, as long as it used caps from any of these suppliers (I forget which suppliers were on the list, though I believe Teapo was one of them). This is also why 'Japanese capacitors' have become a marketing point (because those factories were largely unaffected by it), despite the issue being largely resolved in the mid-2000s. Wikipedia has an article about it: https://en.wikipedia.org/wiki/Capacitor_plague

Depending on the age of these servers, this is absolutely something to consider, but newer systems tend to have much less to worry about in terms of 'the plague.'

As far as cleaning the contacts on the RAM - I've seen PCIe graphics cards that refuse to engage/negotiate at the full x16 (and instead settle for x2 or x4 in an x16 slot), and after cleaning the card's contacts and blowing dust out of the motherboard slot, everything worked again. FWIW, I'd give it a try if it isn't too tedious to do (I know some servers can have silly numbers of individual DIMMs to deal with). Something else to consider if these are really 'big' servers: if the RAM and/or CPUs are on risers, those can come unseated or (presumably) need dust/debris blown out of their contacts too - I've seen a handful of Compaq ProLiant machines brought down just by risers being slightly unseated after being moved.
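If anyone wants to check for the PCIe width symptom described above without eyeballing the output, here is a rough sketch, not a tested diagnostic tool. It assumes the usual `lspci -vv` layout, where each device starts on a non-indented line and its link details appear on indented `LnkCap:` (maximum) and `LnkSta:` (currently negotiated) lines. The function name `find_downgraded_links` and the sample text are made up for illustration.

```python
import re

# Matches the link width field in lspci -vv link lines, e.g. "Width x16".
WIDTH_RE = re.compile(r"Width x(\d+)")

# Hypothetical sample in the usual lspci -vv layout (device names are invented).
SAMPLE = """\
01:00.0 VGA compatible controller: Example GPU
\tLnkCap:\tPort #0, Speed 8GT/s, Width x16, ASPM L0s L1
\tLnkSta:\tSpeed 8GT/s (ok), Width x4 (downgraded)
02:00.0 Ethernet controller: Example NIC
\tLnkCap:\tPort #0, Speed 5GT/s, Width x4, ASPM L0s L1
\tLnkSta:\tSpeed 5GT/s (ok), Width x4 (ok)
"""


def find_downgraded_links(lspci_text):
    """Return (device, max_width, current_width) for links below their maximum width."""
    results = []
    device = None
    max_width = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            # A non-indented line starts a new device record.
            device = line.split(" ", 1)[0]
            max_width = None
            continue
        stripped = line.strip()
        match = WIDTH_RE.search(stripped)
        if not match:
            continue
        if stripped.startswith("LnkCap:"):
            # Width the device is capable of.
            max_width = int(match.group(1))
        elif stripped.startswith("LnkSta:") and max_width is not None:
            # Width the link actually negotiated.
            current = int(match.group(1))
            if current < max_width:
                results.append((device, max_width, current))
    return results


if __name__ == "__main__":
    for dev, cap, cur in find_downgraded_links(SAMPLE):
        print(f"{dev}: capable of x{cap}, running at x{cur}")
```

To use it on a real machine, feed it the text from `lspci -vv` (run as root, since the link capability lines are usually hidden from unprivileged users). Note that some devices drop to a lower link speed when idle to save power, which is why this only compares width.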

I also agree with jefro's suggestion and would add that I don't envy having to troubleshoot this.
 
07-16-2021, 05:18 AM   #6
ccj4467
Member
 
Registered: Jan 2009
Posts: 34

Original Poster
Rep: Reputation: 5
Thank you all. It's been a while since I have really dealt with a lot of electronic stuff. All of your suggestions are very helpful and give me at least a path for troubleshooting the problems. As for the capacitors, that is a good one. I remember I had some Dell PowerEdge 1950 servers where a cap on the HBA for the internal drives would periodically go bad. I had a great electronics supplier close by, so replacing the caps was a breeze.

Thanks again. I am marking this solved.
 
  


All times are GMT -5.