LinuxQuestions.org
Old 07-14-2010, 07:34 AM   #1
ordaolmayanadam
LQ Newbie
 
Registered: Jul 2010
Posts: 6

Rep: Reputation: 0
High Load Average, Low CPU, Low IO Wait


Hi,
I need help interpreting some strange output from the top command
on my Debian server. We see a high load average with relatively
low CPU usage, and iowait looks normal.
I suspect this has something to do with Java socket threading, but
I don't know how to find out and fix exactly what is causing
the issue.

We are running a Java socket server on this Debian machine.
It has a Dual-Core AMD Opteron(tm) Processor 1210
with 8GB RAM.

and java -version output:
java version "1.5.0_16"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_16-b02)
Java HotSpot(TM) Server VM (build 1.5.0_16-b02, mixed mode)

and the top output:
top - 15:18:06 up 125 days, 2:41, 4 users, load average: 17.94, 15.81, 16.38
Tasks: 554 total, 2 running, 551 sleeping, 0 stopped, 1 zombie
Cpu(s): 8.9%us, 2.6%sy, 0.0%ni, 87.1%id, 0.0%wa, 0.7%hi, 0.7%si, 0.0%st
Mem: 8315176k total, 6809904k used, 1505272k free, 484384k buffers
Swap: 1879596k total, 0k used, 1879596k free, 3920524k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16654 root 16 0 1600m 287m 11m S 9 3.5 9:52.00 java
19461 www-data 15 0 36496 7464 3372 S 1 0.1 0:00.10 apache2
14302 root 16 0 2824 1448 864 S 1 0.0 0:57.34 top
4386 www-data 15 0 36752 6832 2744 S 1 0.1 0:00.19 apache2
16244 www-data 15 0 36752 6832 2744 S 1 0.1 0:00.14 apache2
31830 www-data 20 0 36752 6848 2760 S 1 0.1 0:00.17 apache2
21110 www-data 18 0 36752 6688 2736 S 1 0.1 0:00.05 apache2
21451 www-data 15 0 36516 6488 2736 S 1 0.1 0:00.03 apache2
23991 root 15 0 2696 1420 860 R 1 0.0 0:00.12 top

Any help will be appreciated. Thanks in advance.
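For scale, here is a rough sketch of the arithmetic behind "high load": dividing the 1-minute load average by the number of cores gives the average number of tasks per core that were running or waiting. The helper name load_per_cpu is made up for illustration, and it assumes awk is available.

```shell
# Hypothetical helper: divide the 1-minute load average by the CPU count.
#   $1 = a line whose first field is the 1-minute load average
#        (the first field of /proc/loadavg has this shape)
#   $2 = number of CPUs/cores
load_per_cpu() {
    printf '%s\n' "$1" | awk -v n="$2" '{ printf "%.2f\n", $1 / n }'
}

# Load averages from the top output above, on the dual-core machine described.
load_per_cpu "17.94 15.81 16.38" 2
```

On a live system this could be called as load_per_cpu "$(cat /proc/loadavg)" "$(getconf _NPROCESSORS_ONLN)". A value well above 1 means more tasks are queued or blocked than there are cores to serve them.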
 
Old 07-14-2010, 07:51 AM   #2
kbp
Senior Member
 
Registered: Aug 2009
Posts: 3,790

Rep: Reputation: 653
I don't think there's too much to worry about. There seem to be quite a few idle processes, so you could possibly tune things a little. Maybe httpd is configured to spawn lots of children?

Cheers
 
Old 07-14-2010, 07:52 AM   #3
yooy
Senior Member
 
Registered: Dec 2009
Posts: 1,387

Rep: Reputation: 174
High bandwidth usage alone won't produce much processor load.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Update: this means that just downloading/uploading without using the hard drive won't use much CPU.
Example: a router can handle a lot of traffic on its slow processor.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Last edited by yooy; 07-15-2010 at 11:52 AM.
 
Old 07-14-2010, 07:55 AM   #4
ordaolmayanadam
LQ Newbie
 
Registered: Jul 2010
Posts: 6

Original Poster
Rep: Reputation: 0
When we shut down the Java socket server, the load average drops to 2.0-3.0,
so I thought it was related to Java. We also have another server (with higher traffic)
with the same Apache configuration, and it seems fine.
 
Old 07-14-2010, 07:56 AM   #5
ordaolmayanadam
LQ Newbie
 
Registered: Jul 2010
Posts: 6

Original Poster
Rep: Reputation: 0
yooy, so is it about network latency? Is there a way to measure it?
 
Old 07-14-2010, 07:58 AM   #6
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,126

Rep: Reputation: 4120
yooy, what the hell is that supposed to mean?
OP, try this and post the output:
Code:
top -b -n 1 | awk '{if (NR <=7) print; else if ($8 == "D") {print; count++} } END {print "Total status D: "count}'
Use code tags when posting output ...
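A slightly more defensive variant of the same filter, offered as a sketch rather than a replacement. It finds the S column from the header row instead of assuming it is field 8, and prints the total as 0 instead of a blank when nothing matches (an awk variable that is never incremented prints as an empty string). The function name filter_d_state is made up for illustration.

```shell
# Reads `top -b -n 1` output on stdin. Prints the summary lines, the column
# header, every task whose S (state) column is D, and a final count.
filter_d_state() {
    awk '
        # Header row: remember which column holds the state.
        $1 == "PID" {
            for (i = 1; i <= NF; i++) if ($i == "S") col = i
            print; seen_header = 1; next
        }
        # Everything above the header is the summary block: pass it through.
        !seen_header { print; next }
        # Task rows: keep only uninterruptible-sleep (D) entries.
        col && $col == "D" { print; count++ }
        # count + 0 forces a numeric 0 when no row matched.
        END { print "Total status D: " (count + 0) }
    '
}
```

Usage would be top -b -n 1 | filter_d_state.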

Last edited by syg00; 07-14-2010 at 08:00 AM. Reason: code tags comment
 
Old 07-14-2010, 08:02 AM   #7
ordaolmayanadam
LQ Newbie
 
Registered: Jul 2010
Posts: 6

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by syg00
yooy, what the hell is that supposed to mean?
OP, try this and post the output
Code:
top -b -n 1 | awk '{if (NR <=7) print; else if ($8 == "D") {print; count++} } END {print "Total status D: "count}'
Use code tags when posting output ...
This is the output:

Code:
top -b -n 1 | awk '{if (NR <=7) print; else if ($8 == "D") {print; count++} } END {print "Total status D: "count}'
top - 16:00:38 up 125 days,  3:23,  4 users,  load average: 20.40, 27.84, 26.75
Tasks: 328 total,   3 running, 325 sleeping,   0 stopped,   0 zombie
Cpu(s): 11.0%us,  0.6%sy,  0.0%ni, 88.1%id,  0.0%wa,  0.2%hi,  0.2%si,  0.0%st
Mem:   8315176k total,  6428408k used,  1886768k free,   484464k buffers
Swap:  1879596k total,        0k used,  1879596k free,  3980404k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
Total status D:
 
Old 07-14-2010, 08:10 AM   #8
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,126

Rep: Reputation: 4120
O.K., I'm confused - that should list all uninterruptible sleep tasks (which contribute to loadavg). I had a look at how loadavg is accumulated a while back - seemed straightforward. What kernel are you on ("uname -a")?
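One thing that may be worth ruling out, as a possibility and not a diagnosis: on Linux the load average counts individual tasks (threads) that are runnable (R) or in uninterruptible sleep (D), while top by default shows one line per process. A multi-threaded program can therefore add to the load through threads that the per-process view does not list separately. The sketch below tallies R and D threads per command. The function name count_load_threads is made up, and the input shape assumes procps ps.

```shell
# Reads lines of "STATE PID TID COMMAND" on stdin, the shape produced by
#   ps -eLo state=,pid=,lwp=,comm=
# and prints "COUNT COMMAND STATE" for threads in R or D state, busiest first.
# Note: only the first word of a command name containing spaces is used.
count_load_threads() {
    awk '
        $1 == "R" || $1 == "D" { n[$4 " " $1]++ }
        END { for (k in n) print n[k], k }
    ' | sort -rn
}
```

Usage would be ps -eLo state=,pid=,lwp=,comm= | count_load_threads. Since this is a single snapshot, running it several times gives a better picture than one sample.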
 
Old 07-14-2010, 08:14 AM   #9
ordaolmayanadam
LQ Newbie
 
Registered: Jul 2010
Posts: 6

Original Poster
Rep: Reputation: 0
Kernel version: Linux 2.6.18-5-686-bigmem #1 SMP Sat Dec 1 23:58:00 UTC 2007 i686 GNU/Linux
 
Old 07-14-2010, 08:18 AM   #10
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,126

Rep: Reputation: 4120
Keep trying that - in a loop maybe, redirecting to a file. You might have a lot of short-lived processes. Dunno at this point.
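A minimal sketch of such a loop, assuming a POSIX shell. The function name sample_to_file, the log path, and the sample count and interval in the usage line are all placeholders.

```shell
# Run a command repeatedly and append each timestamped result to a file.
#   $1 = output file
#   $2 = number of samples
#   $3 = seconds to wait between samples
#   remaining arguments = the command to run for each sample
sample_to_file() {
    stf_out=$1; stf_n=$2; stf_delay=$3
    shift 3
    stf_i=0
    while [ "$stf_i" -lt "$stf_n" ]; do
        # Timestamp each sample so it can be lined up with load spikes later.
        printf '=== %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')" >> "$stf_out"
        "$@" >> "$stf_out" 2>&1
        stf_i=$((stf_i + 1))
        # Sleep between samples, but not after the last one.
        if [ "$stf_i" -lt "$stf_n" ]; then
            sleep "$stf_delay"
        fi
    done
}
```

For example, sample_to_file /tmp/dstate.log 60 1 sh -c 'top -b -n 1 | awk "\$8 == \"D\""' would take one sample per second for a minute and keep only the rows in state D.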
 
Old 07-14-2010, 08:33 AM   #11
ordaolmayanadam
LQ Newbie
 
Registered: Jul 2010
Posts: 6

Original Poster
Rep: Reputation: 0
I tried a few times and once caught something like this:

Code:
top - 16:31:07 up 125 days,  3:54,  4 users,  load average: 27.03, 17.73, 18.92
Tasks: 328 total,   1 running, 327 sleeping,   0 stopped,   0 zombie
Cpu(s): 11.0%us,  0.6%sy,  0.0%ni, 88.1%id,  0.0%wa,  0.2%hi,  0.2%si,  0.0%st
Mem:   8315176k total,  6171800k used,  2143376k free,   484916k buffers
Swap:  1879596k total,        0k used,  1879596k free,  3721952k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
31763 www-data  15   0 36736 6836 2756 D    2  0.1   0:00.10 apache2
 1105 root      10  -5     0    0    0 D    0  0.0   2:39.10 kjournald
20325 www-data  15   0 36516 6592 2760 D    0  0.1   0:00.10 apache2
Total status D: 3
 
Old 07-15-2010, 06:06 AM   #12
markseger
Member
 
Registered: Jul 2003
Posts: 244

Rep: Reputation: 26
That's one of the major problems with a tool like top - no history. You see what you see and that's it. PLUS you only see what top wants you to see.

You could always try out collectl, either interactively or as a daemon. By default, in daemon mode it samples everything except processes every 10 seconds, and processes every 60 seconds because of the extra overhead.

BUT if you really want to see what's happening over a relatively short period of time, edit /etc/collectl.conf and add "-i1:1" to the 'DaemonCommands' line, and that will monitor everything once a second.

Run "service collectl start", let it run for a few minutes, and then run "service collectl stop". Now play back the data it collected. There are too many options to list, but if you run:

collectl -p /var/log/collectl/filename -sxxx -oT

you'll see data for the subsystems specified with 'xxx' along with time stamps. 'c' will show CPU, 'd' disk, etc. Run "collectl --showsubsys" for a complete listing.

If you want to look at your top processes over time, which is what got me started, you can run:

collectl -p filename --top

and you'll see the top 10 processes for every second!!! If you want to see more or fewer of them, run "collectl -x" and see the options for --top.

If you run "collectl -p filename -sc --verbose -oT" you'll see the load averages along with the number of running processes AND the number of process creations/sec, if that is a concern.

For more, just go to SourceForge and look at the documentation: http://collectl.sourceforge.net

have fun...

-mark
 
Old 07-15-2010, 06:20 AM   #13
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,126

Rep: Reputation: 4120
Who is this guy ???.
Seems to want to push collectl pretty hard.
.
.
.
.
Hey Mark, back again ...

I too like his little toy - unfortunately not everyone seems to want to use it.
 
Old 07-15-2010, 06:55 AM   #14
markseger
Member
 
Registered: Jul 2003
Posts: 244

Rep: Reputation: 26
Hey back - yes, I know you're a fan. I've seen previous posts by you recommending it. I do realize not everyone is on board with it, but I also realize not everybody is convinced monitoring is important. I was talking to someone the other day who was a sar user. Nothing wrong with sar, just that people use a monitoring interval that's much too long. I suggested that they at least drop the monitoring interval down to 10 seconds, as 10 minutes is pretty worthless. They said their vendor told them not to go below a minute, and I told them their vendor is wrong! If collectl generates less than 0.1% CPU load running at 10-second monitoring and it's written in Perl, sar has got to have a lighter footprint. But some people just don't get it.

I would wonder why people don't use collectl:
- they don't believe in proactive monitoring
- they're happy with what they have
- they're scared of it

If the first, they're flat out wrong. If the second, that's fine as long as they monitor frequently. If the third I can help if they ask.

I believe EVERYONE should continuously monitor their systems at 5-15 second intervals. There are very few situations where I've seen monitoring have an impact on performance: applications that run at 100% CPU load and are fine-grained parallel jobs running on 1000 cores or more. If you don't know what a fine-grained parallel job is, you don't have to worry about collectl! First of all, not many people run parallel jobs, let alone fine-grained ones, and even fewer run on 1K cores or more. Even those who do run on that many cores still find a slight performance hit is worth it to be able to have the data available if something goes wrong.

-mark
 
  

