LinuxQuestions.org


Enteleki 01-16-2013 12:50 PM

Server unresponsive--high load ave, minimal CPU usage
 
Version: Red Hat Enterprise Linux ES release 3 (Taroon Update 9)
Kernel: Linux 2.4.21-47.0.1.ELsmp i686

Following the new year, users have complained that they have been unable to make a remote connection via SSH. The connection is refused immediately, PuTTY closes, and they don't even make it to the login prompt.

Using VNC, I am able to connect to the desktop (GNOME), start a terminal, and run top. It typically shows a load average of 23 or higher (I have seen 52+) with very little CPU usage. There are four processors, and they all show 97% idle or higher.

Looking at the memory, I see nothing unusual. No swap memory is being used.

To get the server functioning again I have to force a restart. This fixes the problem for 24 to 48 hours, after which it recurs.

This server runs a Progress database and hosts QAD's MFG/Pro. Last year we migrated to a different ERP, so this one is being kept alive for reference. It gets very little use.

How do I go about troubleshooting this problem? So far, I have shut down all MFG/Pro batch queues, thinking that a running job might have been thrown off by the change in year. No change.

Any assistance would be appreciated.

unSpawn 01-16-2013 01:36 PM

List the actual processes that consume resources, check your system and daemon log files for clues, and run sar so you know resource usage over time. Check, re-check and check service access restrictions again (it's for reference, so only a select few should be accessing it to begin with). And FCOL please decommission this box RSN (like yesterday).
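
For the "list the actual processes" step, here is a rough sketch rather than a definitive tool. On Linux the load average counts tasks in uninterruptible sleep (state D, usually waiting on I/O) as well as runnable ones, so a high load with idle CPUs often points at processes stuck in D state. The sketch reads /proc/loadavg and each /proc/<pid>/stat to find them. The function names (parse_loadavg, parse_stat, blocked_processes, report) are made up for illustration, and it assumes a reasonably modern Python 3, which a RHEL 3 box will not have out of the box; there, looking at the STAT column of ps output gives the same information.

```python
import os


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    fields = text.split()
    return tuple(float(value) for value in fields[:3])


def parse_stat(text):
    """Return (pid, comm, state) from the text of a /proc/<pid>/stat file."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    open_idx = text.index("(")
    close_idx = text.rindex(")")
    pid = int(text[:open_idx])
    comm = text[open_idx + 1:close_idx]
    state = text[close_idx + 1:].split()[0]
    return pid, comm, state


def blocked_processes(proc_root="/proc"):
    """List (pid, comm) for every process in uninterruptible sleep (state D)."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "net" or "sys"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the entry is unreadable
        if state == "D":
            blocked.append((pid, comm))
    return sorted(blocked)


def report(proc_root="/proc"):
    """Print the load averages followed by any D-state processes."""
    with open(os.path.join(proc_root, "loadavg")) as handle:
        print("load averages:", parse_loadavg(handle.read()))
    for pid, comm in blocked_processes(proc_root):
        print(pid, comm)
```

Calling report() while the load is high and seeing the same few PIDs on every run would suggest those processes are blocked on something (a disk, an NFS mount, a lock) rather than competing for CPU.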

Enteleki 03-20-2013 12:31 PM

The problem finally made itself known.

The SAR auditing, which had been running for years, was interfering with password authentication. Turned it off and everything is up and running without a hitch.

chrism01 03-20-2013 08:43 PM

Can you expand on that, please? I've no idea how sar could interfere with authentication; they're not remotely related (afaik)...


All times are GMT -5.