-   Linux - Newbie (
-   -   High CPU load, but low CPU usage (high idle CPU) (

baffy 09-12-2011 01:17 PM

High CPU load, but low CPU usage (high idle CPU)

I'm relatively new to Linux. I moved from a DBA role to a role where I now combine DBA duties with Linux admin work and therefore will need help with the following Linux question:
I'm in a situation where my developers run a particular SQL about 200 millions times a day (very unnecessary), this SQL is run between 10pm and 2 am everyday. At the time when this SQL is being run, load average is very high and everything becomes slower on the database. I'm trying to get the developers to tune this process but they say it's going to take too much effort, but most importantly, they keep pointing to 'idle time' on the CPU which could be as high as 72% (idle) even when the load is 50 (on a sixteen CPU box).

My question: Since the DB becomes much slower when the load average is high, how do I explain that the 72% idle time is a little misleading and the load average is very important in this instance (rem that all other quesries/SQL run slowly during the period explained above)

PTrenholme 09-12-2011 01:40 PM

Have you looked at the I/O usage whilst the SQL task is running? Perhaps you could tune your system to use more "idle" RAM to cache the tables being accessed, or use a larger RAID array for the data base so the access could be distributed over many different disk drives. Have you considered high efficiency disk drive replacements? (e.g., RAM disks, bubble memory drives, etc.)

Have you asked the developers why they're running 200,000,000 queries a day in a production data base instead of a development copy of the data base? It seems like a very strange organizational structure that mixes development and production. (And where's QA in the mix? QA should, at a minimum, have tested any development work before it impacts a production system.)

chrism01 09-12-2011 06:28 PM

As with above (and your statement) 200M sounds definitely wrong. I'd also be surprised if its that difficult to fix; sounds more like laziness to me (& I've worked with RDBMSes for many yrs).
In any case, the DB is likely IO bound as mentioned above. check the process statuses.

You can use top to get page fault cnts or use iostat for more IO specific details.
Given the scenario described, I'd also have a look at the SQL code; they may not be using the correct indexes or there may not be an index that matches their needs.
How much RAM do you have; a large SGA may help to cache the data the they keep asking for.
If its a lot of large queries over many new rows, a re-write sounds good as you won't be able to take advantage of caching.

syg00 09-12-2011 06:47 PM

In Linux (high) loadavg has usually nothing to do with CPU%. (Almost) all the articles you see merely carry forward an old Unix mantra about loadavg. Linux defines it differently.

Has to be I/O waits - caused most probably by poor application design, but I'll leave that to the guys above as I don't "do" databases.
Run the following to determine what is probably contributing to the loadavg count - your mate might recognise the tasks listed. Post the output file here if you like.

top -b -n 1 | awk '{if (NR <=7) print; else if ($8 == "D") {print; count++} } END {print "Total status D (I/O wait probably): "count}' > topsave.txt

baffy 09-13-2011 11:41 AM


Fantastic replies. Thanks to you all. Load averages vs CPU usage is now clearer to me. (I'll run the top script during the period this SQL is being run and see what comes up. Note that these SQLs are not as a result of developers testing scripts, but these have been put into production and come in via the app severs). Thanks all.

LittleMaster 03-13-2013 09:24 AM

First of all sorry for high jacking the thread.

I read the post of syg00 .In below post .I gone through few article all the things says increase of loadavg leads to high cpu utilization.

But above post of sygoo really suprise me .Sygoo can u share your knowledge, how could load does not reflect cpu utilization.

@syg00 .In Linux (high) loadavg has usually nothing to do with CPU%. (Almost) all the articles you see merely carry forward an old Unix mantra about loadavg. Linux defines it differently.

All times are GMT -5. The time now is 03:51 PM.