LinuxQuestions.org
Old 09-17-2010, 06:00 AM   #1
enid
High load and high cpu kernel usage


Hello to all,

On a Debian GNU/Linux 4.0 server running several services (DNS/BIND, sendmail, Apache, etc.), I'm seeing high load. The top command shows nothing abnormal, but htop shows kernel CPU usage at around 100% on all cores (the bars are red), and the server's total load average climbs above 100.

The number of processes and the RAM usage seem OK.

Where should I look for the cause of this?

Thanks,
Enid
 
Old 09-17-2010, 10:13 AM   #2
adamwonski
So top doesn't show high load and htop does? Maybe your version of htop is broken? Maybe look here to tell which one is right:
% watch -n1 cat /proc/stat
- what's the maximum value of procs_running after some observation?
- same for procs_blocked
- which columns of the cpu* lines are growing at the fastest pace?

And what's the number of processes?
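
Watching raw counters with watch makes it hard to tell which column grows fastest, so here is a minimal sketch that subtracts two saved snapshots of /proc/stat instead. The function name cpu_field_deltas and the snapshot file names are assumptions made for illustration; the columns follow the usual cpu-line order (user, nice, system, idle, iowait, irq, softirq).

```shell
# Print per-CPU jiffy deltas between two snapshots of /proc/stat.
# Output columns after the cpu label: user nice system idle iowait irq softirq
# (any further columns, such as steal, are ignored in this sketch).
cpu_field_deltas() {
    # $1 = earlier snapshot file, $2 = later snapshot file
    awk '
        # First file: remember the counters of every cpu* line.
        NR == FNR {
            if ($1 ~ /^cpu/) for (i = 2; i <= 8; i++) prev[$1, i] = $i
            next
        }
        # Second file: print the difference for every cpu* line.
        $1 ~ /^cpu/ {
            printf "%s", $1
            for (i = 2; i <= 8; i++) printf " %d", $i - prev[$1, i]
            printf "\n"
        }
    ' "$1" "$2"
}
```

Possible usage: cat /proc/stat > /tmp/stat.a; sleep 5; cat /proc/stat > /tmp/stat.b; cpu_field_deltas /tmp/stat.a /tmp/stat.b. A large third column (system) points at kernel time, a large fifth column (iowait) at waiting for disks.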
 
Old 09-20-2010, 03:35 AM   #3
enid
Quote:
Originally Posted by adamwonski View Post
So top doesn't show high load and htop does? Maybe your version of htop is broken? Maybe look here to tell which one is right:
% watch -n1 cat /proc/stat
- what's the maximum value of procs_running after some observation?
- same for procs_blocked
- which columns of the cpu* lines are growing at the fastest pace?

And what's the number of processes?
Hi adamwonski,

The command watch -n1 cat /proc/stat shows that cpu2 and cpu3 are growing faster than cpu0 and cpu1.

What I meant about htop and top is that they show exactly the same load average, but the kernel CPU usage (shown in red in htop) isn't shown by the top command.

The maximum value of procs_running is below 10; procs_blocked is also below 10, and lower than procs_running.

The server went into a kernel panic, so I rebooted it. I suspect the high load is caused by I/O operations (HDDs configured as RAID 5). The number of procs is now around 800000.

Thanks,
Enid

Last edited by enid; 09-20-2010 at 03:37 AM.
 
Old 09-21-2010, 01:18 AM   #4
adamwonski
800.000 processes?

If you can use sar, you can run this to observe how busy the devices are with I/O requests:
Code:
% sar -d 1 0
%util is the percentage of elapsed time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100%.
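
If sar is not installed, roughly the same figure can be derived by hand. The sketch below assumes two saved copies of /proc/diskstats taken a known number of seconds apart, and relies on column 13 of a whole-disk line being the cumulative time in milliseconds that the device had I/O in flight. The function name disk_util_pct and the snapshot file names are made up for illustration.

```shell
# Approximate %util for one whole-disk device from two /proc/diskstats snapshots.
# Column 13 is the cumulative time (ms) the device spent with I/O in progress.
disk_util_pct() {
    # $1 = device name (e.g. sda), $2 = earlier snapshot, $3 = later snapshot,
    # $4 = seconds elapsed between the two snapshots
    awk -v dev="$1" -v secs="$4" '
        $3 == dev && NR == FNR { before = $13; next }   # first file
        $3 == dev              { after  = $13 }         # second file
        END { printf "%.1f\n", (after - before) / (secs * 1000) * 100 }
    ' "$2" "$3"
}
```

Possible usage: cat /proc/diskstats > /tmp/ds.a; sleep 5; cat /proc/diskstats > /tmp/ds.b; disk_util_pct sda /tmp/ds.a /tmp/ds.b 5. A device that is missing from both snapshots yields 0.0 in this sketch, so check the name first.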
 
Old 09-21-2010, 04:58 AM   #5
enid
Hi adamwonski,

Indeed the %util is very close to 100 most of the time.
What exactly does this mean and how can it be improved?

Thanks again
Enid
 
Old 09-22-2010, 11:58 AM   #6
adamwonski
That means your devices are saturated / overwhelmed with requests.

Quote:
Originally Posted by enid View Post
hdd's configured as Raid5
Is that software RAID? Do you see any broken drive, or is the RAID out of sync or resyncing?

I think a constant load of 10 procs for 4 CPUs is not too bad. But if most of them are also blocked all the time (as I understand from your previous post), then either you have a problem with the disks, or your applications (or one of them) use the disks heavily. Is your disk space shrinking fast? You can run something like this to observe:
Code:
watch -n1 -dc df
Maybe it's swapping?
Code:
vmstat 1
What do the swap si/so columns look like?
What does the 'free' command show in the Swap line?

Add the -p parameter to get device names that are easier to read:
Code:
sar -dp 1 0
Which drives/partitions belong to the RAID? Which are the most loaded? Does reading or writing prevail? What other interesting numbers can you observe?

Do you see anything particular in the logs?

When exactly did the problems begin? Did you change anything prior to that time? ANYTHING? Even something completely unrelated in your opinion?
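
Since blocked processes are the main suspect here, it may help to see which ones they are. On Linux, processes in uninterruptible sleep (state D, usually waiting on I/O) count toward the load average. The sketch below simply filters ps-style output for that state; the function name blocked_procs is an assumption, and only the first word of each command name is printed.

```shell
# List processes in uninterruptible sleep (state D), which are typically
# waiting on disk I/O and are counted in the load average.
blocked_procs() {
    # Reads "STATE PID COMMAND" lines on stdin; prints PID and command
    # for every line whose state starts with D.
    awk '$1 ~ /^D/ { print $2, $3 }'
}
```

Possible usage: ps -eo state=,pid=,comm= | blocked_procs, repeated a few times while the load is high. If the same daemon keeps showing up, that is probably the one hammering the disks.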
 
Old 09-23-2010, 09:03 AM   #7
enid
Quote:
Originally Posted by adamwonski View Post
That means your devices are saturated / overwhelmed with requests.


Is that software RAID? Do you see any broken drive, or is the RAID out of sync or resyncing?
No, it is hardware RAID. All the drives show as OK, and the RAID seems to be working fine.

Quote:
Originally Posted by adamwonski
I think a constant load of 10 procs for 4 CPUs is not too bad. But if most of them are also blocked all the time (as I understand from your previous post), then either you have a problem with the disks, or your applications (or one of them) use the disks heavily. Is your disk space shrinking fast? You can run something like this to observe:
Code:
watch -n1 -dc df
The /var partition in particular is growing faster than the others, but not at a very high rate.

Quote:
Originally Posted by adamwonski
Maybe it's swapping?
Code:
vmstat 1
What do the swap si/so columns look like?
What does the 'free' command show in the Swap line?
Most of the time si/so show zero, and free shows about 160 MB of swap used out of 3800 MB:

             total       used       free     shared    buffers     cached
Mem:       2060388    2030080      30308          0      29260     753480
-/+ buffers/cache:    1247340     813048
Swap:      3895720     167896    3727824
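
To put those numbers in proportion, here is a small sketch that turns free output into percentages. It assumes the older procps layout shown above, with a "-/+ buffers/cache" line, and the function name free_summary is made up for illustration.

```shell
# Summarise `free` output that uses the old layout with a "-/+ buffers/cache" line.
free_summary() {
    # Reads `free` output on stdin (values in kB by default).
    awk '
        $1 == "Mem:"  { total = $2 }
        $1 == "-/+"   { app_used = $3 }          # used minus buffers and cache
        $1 == "Swap:" { swap_total = $2; swap_used = $3 }
        END {
            if (total > 0)
                printf "apps use %.1f%% of RAM\n", app_used / total * 100
            printf "swap use %.1f%%\n", (swap_total > 0 ? swap_used / swap_total * 100 : 0)
        }
    '
}
```

Possible usage: free | free_summary. Fed the figures posted above, it reports that applications use about 60.5% of RAM and that about 4.3% of swap is in use, which fits the si/so columns being mostly zero.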


Quote:
Originally Posted by adamwonski
Add the -p parameter to get device names that are easier to read:
Code:
sar -dp 1 0
Which drives/partitions belong to the RAID? Which are the most loaded? Does reading or writing prevail? What other interesting numbers can you observe?

Do you see anything particular in the logs?

When exactly did the problems begin? Did you change anything prior to that time? ANYTHING? Even something completely unrelated in your opinion?
As I said, the RAID is hardware, and all the hard drives (5 HDDs) appear as one big disk of ~1.3 TB, split into several partitions.
I think writing prevails most of the time.

I should mention that I made some changes to /etc/fstab (added noatime and nodiratime to the /var and /home partitions).
This improved performance significantly, but the problem has not gone away completely: the load still climbs to 100, though at a lower rate.
I also upgraded the POP3/IMAP server (dovecot), suspecting that it was causing the problem. It had been logging errors such as segfaults, and those are now gone.
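
An edit to /etc/fstab only takes effect after a remount or reboot, so it can be worth confirming that the option is really active. The sketch below checks a /proc/mounts-style table for noatime on a given mount point; the function name has_noatime is an assumption. As far as I know, noatime on Linux already covers directories, so nodiratime is redundant next to it, though harmless.

```shell
# Report whether a mount point currently has noatime set, given a
# /proc/mounts-style table ("device mountpoint fstype options dump pass").
has_noatime() {
    # $1 = mount point; the mount table is read from stdin
    awk -v mp="$1" '
        $2 == mp {
            found = 1
            n = split($4, opt, ",")
            for (i = 1; i <= n; i++) if (opt[i] == "noatime") ok = 1
        }
        END {
            if (!found)   { print "not mounted" }
            else if (ok)  { print "noatime set" }
            else          { print "noatime not set" }
        }
    '
}
```

Possible usage: has_noatime /var < /proc/mounts. If it reports that noatime is not set, something like mount -o remount,noatime /var (as root) should apply the option without a reboot.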

The problem began about two weeks ago. I'm sure no change was made to the server, in the configuration or anything else. The only thing I noticed was the /var and /home partitions growing (not by much, though) while the load kept increasing, but it always stayed below 20-30, not 100.

Thanks,
Enid

Last edited by enid; 09-23-2010 at 09:04 AM.
 
Old 09-27-2010, 02:53 PM   #8
adamwonski
If you have an ext2/ext3 file system and can install blktrace on the server, you can try to gather more information with it. The manual has examples; the simplest use is:
Code:
btrace /dev/sda
If btrace fails because debugfs is not mounted, mount debugfs first:
Code:
mount -t debugfs debugfs /sys/kernel/debug
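
To avoid mounting debugfs twice, the mount can be made conditional. This is only a sketch: the helper name debugfs_mounted is hypothetical, and it just looks for a debugfs entry in a /proc/mounts-style table.

```shell
# Succeed (exit status 0) if a debugfs file system appears in the mount table.
debugfs_mounted() {
    # Reads a /proc/mounts-style table on stdin; field 3 is the file system type.
    awk '$3 == "debugfs" { found = 1 } END { exit !found }'
}
```

Possible usage, as root: debugfs_mounted < /proc/mounts || mount -t debugfs debugfs /sys/kernel/debug, followed by btrace /dev/sda as above.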

Last edited by adamwonski; 09-27-2010 at 02:55 PM.
 
Old 09-30-2010, 03:33 AM   #9
enid
I upgraded the kernel, building it from vanilla kernel 2.6.35.5 (compile/make/make install), because I suspected a kernel bug or a malfunctioning RAID driver.

Now the load average is lower, but when memory usage increases, the CPU I/O wait percentage and the load average increase as well (at lower rates than before).
I also plan to add more RAM and see how it goes.
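
To track the I/O wait share before and after the RAM upgrade, the figure can be computed from two saved /proc/stat snapshots: the growth of the iowait counter divided by the growth of all counters on the aggregate cpu line. This is a rough sketch; the function name iowait_pct and the file names are assumptions, and only the user through steal columns are summed.

```shell
# Percentage of total CPU time spent in iowait between two /proc/stat snapshots.
iowait_pct() {
    # $1 = earlier snapshot file, $2 = later snapshot file
    awk '
        $1 == "cpu" {
            # Sum user..steal (fields 2-9); field 6 is iowait.
            total = 0
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
            if (NR == FNR) { t0 = total; w0 = $6 }
            else           { t1 = total; w1 = $6 }
        }
        END { if (t1 > t0) printf "%.1f\n", (w1 - w0) / (t1 - t0) * 100 }
    ' "$1" "$2"
}
```

Possible usage: cat /proc/stat > /tmp/stat.a; sleep 10; cat /proc/stat > /tmp/stat.b; iowait_pct /tmp/stat.a /tmp/stat.b. Logging this next to the load average should show whether the two really rise together when memory fills up.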

Regards,
Enid
 
  


All times are GMT -5.
