server load high
Hello
This morning my server load was high, over 10. I was monitoring processes with ps aux and top, and I stopped apache and mysql. The load dropped to 3, but it would not go any lower. Then I noticed that top was reporting user CPU usage of 60%-70%. However, I could not find any user process eating all that CPU, either with top or with ps aux | grep -v "root".
Could there be some hidden process (not visible in top or ps) that explains the 60% used by user? Nothing in the top process list or in the ps output accounted for that 60% us.
Thank you! |
There are a few things that might cause this:
1) A poorly designed program or script that doesn't allow adequate time between runs of a command. For example:
Code:
while true
It is easy to solve this by adding a sleep statement long enough for the first command to complete before the next run starts. Sometimes a simple "sleep 1" for a 1-second wait is sufficient. For something like the example I'd probably do a "sleep 120" to give it 2 minutes to list all the files.
2) Similar to the above is the case where multiple users are running the same command and it eats up the same resources. For example, if you had 100 users all running "top" at the same time you'd see an issue, because they'd all be competing for the same information.
3) Less common: on a couple of Dell servers with PERC cards I've seen that the cards are very sensitive to heat. When they heat up they might cause a file/directory to lock. From that point on, every attempt to access that file/directory hangs while trying to complete, and this will increase the CPU load ad infinitum as new processes are kicked off (e.g. the locate command from cron). The only solution for this is to reboot the server to clear the lock. Of course, you then also have to address the heat issue to prevent a recurrence.
Note that Dell doesn't admit there is any heat sensitivity in the PERC cards, so my comments are based on observation. I've seen this even when the system's diagnostics don't show any heat threshold being exceeded, but the PERC card itself has no heat sensor. The PERC is made by LSI, so presumably this could be an issue in any other system with an LSI-based controller, like the Dell PowerEdge machines. |
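To illustrate point 1), here is a minimal sketch of a loop that pauses between passes. The original example is cut off in this thread, so the directory listing below is only a stand-in; DIR, INTERVAL, list_files_once and run_throttled are hypothetical names chosen for this sketch.

```shell
#!/bin/sh
# Hypothetical settings, not taken from the thread.
DIR="${DIR:-/tmp}"            # directory to list on each pass
INTERVAL="${INTERVAL:-120}"   # seconds to wait between passes

list_files_once() {
    # One pass of the work: print the number of entries in DIR.
    ls "$DIR" 2>/dev/null | wc -l
}

run_throttled() {
    # $1 = number of passes to run. A long-running script would use
    # "while true" instead of a counter; the sleep is the important part.
    passes="$1"
    i=0
    while [ "$i" -lt "$passes" ]; do
        list_files_once
        i=$((i + 1))
        sleep "$INTERVAL"     # give the CPU back between passes
    done
}
```

Calling run_throttled 3 would run three passes with INTERVAL seconds of sleep after each one. Without the sleep, the loop restarts the command as fast as the machine allows, which is what drives the load up.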
In case 1), why can't I see this script running in ps aux?
In case 2), no, I can't see all these processes when investigating with ps. 3) is not my case. |
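One thing that may be worth checking here: the "us" figure in top is time spent in user-space code by any process, including ones owned by root, so filtering with grep -v "root" can hide the culprit. Also, on Linux the %CPU column in ps is averaged over the lifetime of the process, so a busy but short-lived process can look small or be gone before you look. A rough sketch for ranking everything by CPU without filtering out any user (the function names rank_by_cpu and top_cpu are made up for this example):

```shell
#!/bin/sh
rank_by_cpu() {
    # Reads rows of "PID USER %CPU ..." on stdin, sorts them by the
    # third column (numeric, descending) and keeps the first N rows.
    # $1 = N (default 10)
    LC_ALL=C sort -k3,3nr | head -n "${1:-10}"
}

top_cpu() {
    # -A lists every process, root-owned ones included.
    # The trailing "=" on each column name suppresses the header line.
    ps -A -o pid= -o user= -o pcpu= -o etime= -o args= | rank_by_cpu "$1"
}
```

Running top_cpu 15 a few times in a row and comparing the PIDs can hint at short-lived processes that are being respawned by a script.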
What's your wa level showing in top? My load averages were doing something similar a while ago, and it turned out the hard drive was failing. Also, did you take a look at incoming connections, or open files? There could be all kinds of things happening inside or outside of your box. I find iftop is a good tool for monitoring connections and incoming data. lsof, of course, lists open files. Track back from either of those and look for a rogue process.
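If you want the wa figure outside of top, here is a small sketch that derives it from two samples of the aggregate "cpu" line in /proc/stat (Linux only). The helper names read_cpu_line and iowait_pct are made up for this example.

```shell
#!/bin/sh
read_cpu_line() {
    # The aggregate line looks like:
    # cpu user nice system idle iowait irq softirq steal guest guest_nice
    grep '^cpu ' /proc/stat
}

iowait_pct() {
    # $1 and $2 = two "cpu ..." lines sampled some time apart.
    # Prints the share of CPU time spent in iowait between the samples.
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            # Sum user..steal (fields 2-9); guest time is already
            # counted inside user, so it is left out.
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
            t[NR] = total
            w[NR] = $6          # iowait is the 5th counter
        }
        END {
            d = t[2] - t[1]
            if (d > 0) printf "%.1f\n", 100 * (w[2] - w[1]) / d
            else print "0.0"
        }'
}
```

Usage would be something like: a=$(read_cpu_line); sleep 5; b=$(read_cpu_line); iowait_pct "$a" "$b". A persistently high value points at disk or other I/O rather than a CPU-bound process, and lsof -p followed by a PID then shows which files a suspect process has open.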
|
I also had a hard drive failure some months ago where wa was always at the top; that's not the case this time (luckily!). Thanks for iftop, I will look into it! |