[SOLVED] BUG: soft lockup - CPU#1 stuck for 10s! [swapper:0]
I have a Dell PE2950 running Asterisk 18.104.22.168 on RHEL 5, which has had no problems since December last year. We use a Digium TE412P to connect to an E1 ISDN line. Since December we have not added or removed any software or hardware, and we have not run "yum update".
The Linux kernel is 2.6.18-92.1.22.el5.
Last week, users reported that outside callers could not dial in, although internal users could still dial out. We rebooted the box and everything was fine.
Then, starting this week, the box froze several times a day with a "BUG: soft lockup - CPU#1 stuck for 10s! [swapper:0]" error message on the console. Before it freezes, I can see a continuous stream of "timing source auto card 0!" error messages on the console.
Each reboot made it okay for a few hours, and then we had to reboot again to clear the problem.
Q1. Strangely, I could not find this error message in /var/log/messages or dmesg. The soft lockup error message appears only on the machine's console.
Q2. Could it be a kernel incompatibility problem? However, we have not changed anything since it was installed.
Q3. From the error message, how do I know whether it is a software (kernel?) or hardware problem?
I would appreciate it if someone could give me any suggestions.
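For anyone triaging a similar message, here is a minimal Python sketch that pulls the fields out of soft lockup lines in captured console text. It assumes the bracketed part is the task name followed by its PID (swapper with PID 0 is the kernel's idle task); the function name and dictionary keys are made up for illustration and do not come from any tool.

```python
import re

# Matches lines like the one quoted above:
#   BUG: soft lockup - CPU#1 stuck for 10s! [swapper:0]
# Assumption: the bracketed field is "<task name>:<pid>".
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<seconds>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def parse_soft_lockups(text):
    """Return one dict per soft lockup line found in the given text."""
    events = []
    for match in LOCKUP_RE.finditer(text):
        events.append({
            "cpu": int(match.group("cpu")),
            "seconds": int(match.group("seconds")),
            "comm": match.group("comm"),
            "pid": int(match.group("pid")),
        })
    return events
```

Feeding it text copied from the console (or from a serial console log, if one is set up) gives a quick count of how often each CPU and task shows up. That does not by itself tell software from hardware, but it does show whether the lockups always hit the same place.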
In general, something must have changed to cause a problem. Two possibilities spring to mind: either a piece of hardware has failed, or someone has broken into your unpatched system.
I'm not sure if this is what you're seeing, but there is a kernel bug (http://bugzilla.kernel.org/show_bug.cgi?id=10753) which was fixed in 2.6.27 that caused similar error messages. It may be a good idea to update your system and see if the problem goes away.
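As a rough way to check whether a given kernel predates that fix, here is a small Python sketch that compares the leading version numbers of a release string against 2.6.27. The helper names are hypothetical, and the comparison only looks at the upstream version: distribution kernels such as RHEL's often carry backported fixes, so a lower number does not prove the bug is present.

```python
import re


def kernel_triplet(release):
    """Extract the leading major.minor.patch numbers from a release string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) for part in match.groups())


def predates(release, fixed_in=(2, 6, 27)):
    """True if the upstream version in `release` is older than `fixed_in`."""
    return kernel_triplet(release) < fixed_in


# The kernel from the original post is older than 2.6.27 by upstream numbering.
print(predates("2.6.18-92.1.22.el5"))
```

On a live system the release string could come from `platform.release()` in the standard library, which reports the same value as `uname -r`.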
Something I did not mention in the post: the Asterisk server was connected to a Rhino Channel Bank via a port on the Digium TE412P card. We found that the system would not freeze if we disconnected the Rhino Channel Bank.
We opened a call with Rhino, but they were very confident that the problem was not caused by the Rhino.
Anyway, as we were running out of ideas, we quickly ordered a new Rhino Channel Bank and connected it to the Asterisk server.
Guess what: the problem disappeared!