Very high CPU load, but nothing significant in top
Linux - Server: This forum is for the discussion of Linux software used in a server-related context.
I'm running Ubuntu Linux 12.04.1, with VirtualMin 4.08.gpl GPL and 2 CPU cores.
Pretty much all the time for the last few weeks, it's been running well above a load average of 5, usually closer to 10, sometimes reaching 20.
Right now, CPU load averages: 9.20 (1 min) 8.20 (5 mins) 7.81 (15 mins)
At the same time, VirtualMin returns:
Virtual Memory: 996 MB total, 15.44 MB used
Real Memory: 3.80 GB total, 972.43 MB used
Local disk space: 915.94 GB total, 116.03 GB used
I've restarted the machine (shutdown -rf now) a few times, and sure enough, sooner or later we're back at high CPU load.
Running top (or htop) shows nothing significant running at high CPU - in fact, watching it for a few minutes, the highest item might hit maybe 3% CPU.
The load avg tells you about the jobs in a runnable state, not whether they are CPU-bound (a different question).
A high %wa means waiting, probably for disk and/or DB access - e.g. long-running SQL queries are typical.
Check the top command and look for processes in 'S' or (worse) 'D' state.
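A quick way to see those states without staring at top - a sketch assuming stock procps tools, not anything the poster actually ran:

```shell
# Tasks in uninterruptible sleep ("D") count toward the load average
# even though they burn no CPU -- list them:
ps -eo state=,pid=,user=,comm= | awk '$1 ~ /^D/'

# /proc/loadavg: the 1/5/15-minute averages, then runnable/total tasks
cat /proc/loadavg
```

A load average near 10 with an empty 'D' list points away from disk; a screenful of 'D' tasks points straight at it.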
Is this a virtual instance? What kernel level are you running?
Have a look at your primary "disk" - probably /dev/sda - with sar or similar. The flush (kernel) tasks are just that: they flush pending I/O. They are started as needed, hence the PID changing. Your disk isn't responding, by the looks of it.
Sorry forgive my ignorance, I'm a bit lost here...
Is this a virtual instance?
Hmmz, it's a physical machine, running VirtualMin for a heap of VirtualHosts.
What kernel level are you running?
This help...?
Kernel and CPU: Linux 3.2.0-63-generic-pae on i686
Have a look at your primary "disk" - probably /dev/sda - with sar or similar.
What do I need to look at?
The flush (kernel) tasks are just that, they flush pending I/O - they are started as needed hence the PID changing. Your disk isn't responding by the looks of it.
The disk is responding OK - we can (and do) access it all the time, as we have PCs mapping the home directory as network drives; we use it as a development server, i.e. we work directly on the files on the server/HDD. Sometimes it hangs a bit when accessing files, hence me starting to look into the high load issues.
Your disk isn't responding by the looks of it.
Sorry, poorly worded by me. I meant the disk isn't responding appropriately (in computer metrics, not human terms), not that it isn't responding at all.
The sysstat package has iostat as a component - look at the manpage(s) for help, but you want to know the average read/write rates and response times for each. There are other, more finely sampled tools available - collectl, for instance. The mere mention of it will likely prod the author to appear with helpful hints. Always good to get knowledgeable input.
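For the "what do I need to look at?" question above, a sketch of the numbers worth watching, assuming the sysstat package is installed (/dev/sda is an assumption carried over from the earlier post - substitute your disk):

```shell
# Extended per-device stats, 5-second samples, 3 reports.  The columns
# to watch are await (average ms per request) and %util -- sustained
# await in the tens or hundreds of ms, or %util pinned near 100,
# means the disk can't keep up with the workload.
iostat -dx 5 3

# The raw counters iostat reads live in /proc/diskstats; e.g. field 4
# is reads completed and field 7 is total time spent reading (ms):
awk '$3 == "sda" {print "reads:", $4, " read_ms:", $7}' /proc/diskstats
```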
Some thoughts (without a lot of hard data to back them up):
- all those status "D" tasks are probably waiting on disk I/O - and count directly toward loadavg, as well as %wa.
- it looks like you only have one (active) physical disk. That's a bottleneck - spread your I/O load over more disks.
- check SMART data for the disk to ensure it isn't starting to fail. As well as software like sar/collectl/whatever.
- don't run updatedb when anything else is hitting the disk if possible. 02:00 is usually ok for non-worldwide access.
- 32-bit PAE kernels are so last century. Get onto 64-bit hardware (you may be already) and current 64-bit kernel if possible.
Basically, from here it's a matter of checking all the data.
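For the SMART check in the list above, a sketch using smartmontools (needs root; /dev/sda is again an assumption - substitute your disk):

```shell
# One-line overall health verdict from the drive's own self-assessment
smartctl -H /dev/sda

# Full attribute table -- rising Reallocated_Sector_Ct or
# Current_Pending_Sector counts are classic early-failure warnings
smartctl -A /dev/sda
```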
I ran the short test with the SMART tools and it seems to get stuck at 10% remaining.
While it doesn't report progress, it indicates a 2-minute run time; after 10 minutes it hadn't reported any results.
Then I ran the short test again, and the original test appeared in the log as 'Aborted' (presumably because I started a new one), with 10% remaining.
I've done this three times, and all seem to hang at 10% remaining:
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Aborted by host 10% 7259 -
# 2 Short offline Aborted by host 10% 7259 -
# 3 Short offline Aborted by host 10% 7259 -
Is this a bad sign?!
I could run a long test overnight...
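For reference, kicking off that long test and reading the result back would look something like this - assuming smartmontools again, with /dev/sda a guess at the device:

```shell
# Start the long (extended) offline self-test; it runs inside the
# drive's firmware, so the shell prompt returns immediately
smartctl -t long /dev/sda

# Check back after the estimated duration it prints:
smartctl -l selftest /dev/sda
```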
And I'm currently shopping to potentially replace it with HP ProLiant MicroServer Gen8 as a result of all this...
I think the CPU iowait - or just wa in top terms - is one of the most confusing metrics there is. In short, all it tells you is that there is some I/O going on somewhere and the CPU isn't busy; it's spending most of its idle time waiting for I/O.
Another way to look at this: on a completely idle system, iowait should be at or close to zero. Now fire up a process that creates or copies a large file while watching it with collectl - had to get that in for syg00. Since this is almost exclusively I/O-bound, you know it won't use much CPU time, yet iowait goes to a very high number, at least on the CPU doing the I/O.
If you were to look at a busy NFS server, it typically has a high load average because so many processes are active, though waiting on I/O, and it also shows high iowait.
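The "fire up a process that copies a large file" experiment above can be sketched like this (file path and size are arbitrary choices):

```shell
# Generate some disk-bound load in the background; conv=fsync forces
# the writes to actually hit the disk rather than sit in page cache
dd if=/dev/zero of=/tmp/iowait-demo bs=1M count=256 conv=fsync 2>/dev/null &

# Meanwhile, sample the cumulative iowait counter: it is the 5th value
# after "cpu" in /proc/stat (in jiffies); watch it climb while dd runs
awk '/^cpu /{print "iowait jiffies:", $6}' /proc/stat

wait
rm -f /tmp/iowait-demo
```

vmstat 1 (the wa column) or collectl show the same counter already turned into a percentage.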