LinuxQuestions.org


graziano1968 03-12-2009 05:40 AM

server load high
 
Hello

This morning my server load was high, over 10.

I was monitoring procs with
ps aux
and
top

I stopped Apache and MySQL. The server load decreased to 3 but did not go any lower. Then I noticed that top was reporting a user (us) CPU usage of 60%-70%.

However, I was not able to see any user process eating all the CPU, either with top or with

ps aux | grep -v "root"

Could there be some hidden process (not visible in top or ps) that explains the 60% used by user?
Looking at the process list in top or ps, there was nothing to explain that 60% us.


Thank you!

MensaWater 03-12-2009 07:30 AM

There are a few things that might cause this:

1) A poorly designed program or script that isn't allowing adequate time between runs of commands.
For example:
Code:

while true
do ls -lR /
done

The above loop runs each ls -lR to completion and then immediately starts the next one, so the CPU and disk never get a break. (If the command were put in the background with & or launched from cron, runs could also pile up on top of each other.)
It is easy to solve this by simply adding a sleep statement between runs. Sometimes a simple "sleep 1" for a 1 second wait is sufficient. For something as heavy as the example I'd probably do a sleep 120 to give the system 2 minutes between listings.
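As a rough sketch of the sleep approach, the loop could be wrapped like this. The function name run_throttled and its arguments are made up for illustration, and the loop is bounded here so that it stops on its own; a real script might loop forever instead.

```shell
#!/bin/sh
# Hypothetical helper: run a command a fixed number of times,
# sleeping between runs so the system gets a break.
run_throttled() {
    runs=$1      # how many times to run the command
    pause=$2     # seconds to sleep between runs
    shift 2
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@"                 # runs to completion before the loop continues
        i=$((i + 1))
        # Sleep between runs, but not after the last one.
        if [ "$i" -lt "$runs" ]; then
            sleep "$pause"
        fi
    done
}

# Example: print the date three times with a 1 second pause between runs.
run_throttled 3 1 date
```

For the ls -lR / example, something like "run_throttled 10 120 ls -lR /" would give the 2 minute gap mentioned above.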

2) Similar to the above is the case where multiple users are running the same command and it eats up the same resources. For example, if you had 100 users all running "top" at the same time, you'd see an issue because they'd all be competing for the same resources.
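One way to check for case 2 is to count running processes by command name and see whether one name dominates. This is only a sketch; the helper name count_by_command is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count identical lines on stdin and
# print the most frequent ones first.
count_by_command() {
    sort | uniq -c | sort -rn
}

# "comm=" prints only the command name, with no header line.
ps -e -o comm= | count_by_command | head -n 10
```

If a hundred copies of the same command are running, that command shows up at the top of the list with its count in front of it.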

3) Less common: I've seen on a couple of Dell servers with PERC cards that the PERC cards are very sensitive to heat. When they heat up they might cause a file/directory to lock. From that point on, every attempt to access that file/directory hangs waiting on I/O that never completes, and the load average keeps climbing as new processes are kicked off (e.g. locate command from cron). The only solution for this is to reboot the server to clear the lock. Of course then you also have to address the heat issue to prevent recurrence.
Note that Dell doesn't admit there is any heat sensitivity to the PERC cards, so my comments are based on observation. I've seen this even when the system's diagnostics don't show any heat threshold being exceeded, but then the PERC card itself has no heat sensor. The PERC is made by LSI, so presumably this could be an issue in any other LSI controller based system like the Dell PowerEdge machines.
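Processes that hang on I/O like this normally sit in state D (uninterruptible sleep) in ps, and on Linux they count toward the load average even though they use no CPU. A minimal sketch for listing them follows; the helper name only_uninterruptible is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: keep only lines whose first field
# (the process state) starts with D.
only_uninterruptible() {
    awk '$1 ~ /^D/'
}

# The empty "=" headers suppress the header line.
ps -e -o state= -o pid= -o args= | only_uninterruptible
```

A list that keeps growing between runs would point at hung I/O rather than at a CPU hog.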

graziano1968 03-12-2009 10:02 AM

In case 1), why can't I see this script running in ps aux?

In case 2), no, I can't see all these procs when investigating with ps.

3) is not my case.

jamescondron 03-12-2009 11:40 AM

What's the wa level in top showing? My load averages were doing something similar a while ago; it turned out the hard drive was failing. Of course, did you take a look at incoming connections, or open files? There could be all kinds of things happening inside or outside of your box. I find iftop is a good tool for monitoring connections and incoming data. lsof, of course, lists open files. Track back from either of those and look for a rogue process.
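If you'd rather not eyeball top, the wa figure can be approximated from two samples of the first line of /proc/stat. This is a sketch under the assumption that the line has the usual Linux layout (user, nice, system, idle, iowait, irq, softirq, steal); the helper name iowait_percent is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: given two "cpu ..." lines from /proc/stat on stdin,
# print the percentage of CPU time spent in iowait between them.
iowait_percent() {
    awk '/^cpu / {
             total = 0
             # Fields 2..9: user nice system idle iowait irq softirq steal.
             for (i = 2; i <= NF && i <= 9; i++) total += $i
             n++
             tot[n] = total
             iow[n] = $6       # field 6 = iowait
         }
         END {
             if (n >= 2 && tot[2] > tot[1])
                 printf "%.1f\n", 100 * (iow[2] - iow[1]) / (tot[2] - tot[1])
         }'
}

# Take two samples one second apart (skipped where /proc/stat is missing).
if [ -r /proc/stat ]; then
    { head -n 1 /proc/stat; sleep 1; head -n 1 /proc/stat; } | iowait_percent
fi
```

The counters in /proc/stat are cumulative since boot, which is why the sketch takes the difference between two samples instead of reading one line.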

graziano1968 03-12-2009 11:46 AM

Quote:

Originally Posted by jamescondron (Post 3473348)
What's the wa level in top showing? My load averages were doing something similar a while ago; it turned out the hard drive was failing. Of course, did you take a look at incoming connections, or open files? There could be all kinds of things happening inside or outside of your box. I find iftop is a good tool for monitoring connections and incoming data. lsof, of course, lists open files. Track back from either of those and look for a rogue process.

That's not the case either; wa was under 5%, I'm sure of this.
I too experienced a hard drive failure some months ago where wa was always at the top; luckily that's not the case here.

Thanks for iftop, I will look into it!

MensaWater 03-12-2009 01:32 PM

Quote:

Originally Posted by graziano1968 (Post 3473249)
In case 1), why can't I see this script running in ps aux?

In case 2), no, I can't see all these procs when investigating with ps.

3) is not my case.

If it were a script, the script itself would at most show up as an idle shell; what you'd see using CPU are the commands within the script at the time they were running. That is to say, for my example you'd see the "ls -lR /" rather than whatever you'd named the script. It is a little difficult to track this kind of thing down because usually you'll have multiple commands running in a script, so at any given moment the commands in the process tree may be different.
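Since the commands come and go, a single ps can easily miss them. One option is to take several snapshots in a row and look at the parent PID (ppid) of whatever keeps appearing, then trace that back to the script. This is only a sketch; the helper name sample_busiest is made up for illustration, and --sort is a procps (Linux) option.

```shell
#!/bin/sh
# Hypothetical helper: take several snapshots of the busiest processes.
sample_busiest() {
    samples=$1   # number of snapshots
    pause=$2     # seconds between snapshots
    i=0
    while [ "$i" -lt "$samples" ]; do
        echo "--- sample $((i + 1)) ---"
        # Highest CPU first; ppid shows which parent started each command.
        ps -eo pid,ppid,user,pcpu,args --sort=-pcpu | head -n 6
        i=$((i + 1))
        # Sleep between snapshots, but not after the last one.
        if [ "$i" -lt "$samples" ]; then
            sleep "$pause"
        fi
    done
}

# Example: three snapshots, one second apart.
sample_busiest 3 1
```

Keep in mind that pcpu in ps is averaged over the lifetime of each process, so very short-lived commands can still look small here even when many of them together keep the CPU busy.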


All times are GMT -5.