rjo98 09-18-2013 02:36 AM

soft lockup CPU stuck for seconds, server won't restart or start up
Just had someone do a yum update on one of our RHEL servers, and when he went to restart it, he got a "soft lockup - CPU#0 stuck for 67s! [migration/0:5]"
He tried a Ctrl+Alt+Delete warm reboot and it came up with a similar error.
Trying a hard power down and reboot now.
Can anyone shed some light on that message? The server is a Dell R520 running RHEL 6.something. I tried googling it, but all I could find were some bug posts from 2009.
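
For anyone trying to read that message, here is a minimal sketch that splits it into its fields. It assumes the layout is exactly the one quoted above (`CPU#<n> stuck for <s>s! [<task>:<pid>]`); other kernel versions may word it differently, and the function name `parse_soft_lockup` is made up for illustration.

```python
import re

# Pattern derived only from the message quoted in this thread (an assumption,
# not a complete description of every kernel's soft lockup output).
SOFT_LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<seconds>\d+)s! "
    r"\[(?P<task>[^\]]+):(?P<pid>\d+)\]"
)


def parse_soft_lockup(line):
    """Return the fields of a soft lockup line as a dict, or None if no match."""
    match = SOFT_LOCKUP_RE.search(line)
    if match is None:
        return None
    return {
        "cpu": int(match.group("cpu")),          # CPU the watchdog reported on
        "seconds": int(match.group("seconds")),  # how long it was reported stuck
        "task": match.group("task"),             # name of the task on that CPU
        "pid": int(match.group("pid")),          # PID of that task
    }


if __name__ == "__main__":
    print(parse_soft_lockup(
        'soft lockup - CPU#0 stuck for 67s! [migration/0:5]'
    ))
```

Read this way, the quoted line says CPU 0 was reported stuck for 67 seconds while the task `migration/0` (PID 5) was on it. That describes what was running when the watchdog fired; it does not by itself say what caused the stall.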

business_kid 09-18-2013 10:52 AM

/shot in the dark.

Could be a hardware reset problem.

/history of that guess.
A long time ago, I had a CPU (a PIC) on a small SMD board. I went with a big reset capacitor, but the chip was running ultra-slow (32.768 kHz), so it did not get the requisite number of clock cycles during reset.

The effects were: it apparently came up, but any writes to RAM did not stick. If I wrote 0xf3 to a byte and then read it back, the result was never 0xf3. My solution was to write zeroes to the RAM I was using, and then it worked.

rjo98 09-19-2013 09:40 AM

Thanks. After hard powering it down and back up, it seemed to come up OK. Just wondering if it was something we did to cause it, or how we can avoid that in the future (in case we have to restart it remotely).

business_kid 09-19-2013 02:31 PM

You haven't given much detail on your system, so I'm just going on symptoms.

/Another shot in the dark
If it doesn't repeat, it may just have been a brown-out effect on one of the power supply lines if there was a sudden power surge, as caused by a reboot. I'm presuming the UPS is in good working order and not one of the cheapo ones.

rjo98 09-24-2013 09:34 AM

Not sure what details are relevant. It's a Dell R520 running RHEL 6.something.

No power events happened while this was going on; it happened on the reboot.

Thanks for the guesses though. Guess I'll just have to keep my eye on it next time we reboot.

business_kid 09-24-2013 12:17 PM

It's a symptom of a UPS not kicking in on time.

rjo98 09-24-2013 12:47 PM

OK. I wonder if it has something to do with the advanced power features of the R520 then, as there were no power events that morning.

