LinuxQuestions.org
Old 10-20-2008, 10:35 AM   #1
DotHQ
Member
 
Registered: Mar 2006
Location: Ohio, USA
Distribution: Red Hat, Fedora, Knoppix,
Posts: 548

Rep: Reputation: 33
Server with high load average and no obvious reason.


I'm running a DB server (actually a number of DB servers). One of the servers has a load average of 12.00. Yesterday the load average was 11. Looking back to July, the load average has constantly inched upward for no obvious reason.

I've suggested rebooting but it is a production server so that is not an easy alternative at this juncture.

Has anyone else seen the load average go high while the CPUs are 98% idle, with no other indicator of what might be causing the load?

sar -q shows 1, 2 or 5 processes in the run queue but reports a load average of 12.00. Crazy.
Any ideas on where to look to solve this one?
 
Old 10-20-2008, 12:14 PM   #2
trickykid
LQ Guru
 
Registered: Jan 2001
Posts: 24,149

Rep: Reputation: 269
Are you seeing this load with any command you run to view the load averages or with sar only?
 
Old 10-20-2008, 01:09 PM   #3
DotHQ
Member
 
Registered: Mar 2006
Location: Ohio, USA
Distribution: Red Hat, Fedora, Knoppix,
Posts: 548

Original Poster
Rep: Reputation: 33
Yes, with any command: top, sar and the Cacti monitor all show a high load average and low CPU, memory and disk stats.
 
Old 10-20-2008, 01:30 PM   #4
trickykid
LQ Guru
 
Registered: Jan 2001
Posts: 24,149

Rep: Reputation: 269
So what does cat /proc/loadavg tell you? I'd imagine the proc file isn't getting updated properly. How long has this system been running without a reboot? And is the /proc/loadavg file getting updated at all, judging by its latest timestamp?
 
Old 10-20-2008, 01:46 PM   #5
DotHQ
Member
 
Registered: Mar 2006
Location: Ohio, USA
Distribution: Red Hat, Fedora, Knoppix,
Posts: 548

Original Poster
Rep: Reputation: 33
Good questions TK!!

Here are the answers:

[root@xxxxxxx sa]# cat /proc/loadavg
12.83 12.56 12.53 4/868 6932
[root@xxxxxxx sa]# ls -l /proc/loadavg
-r--r--r-- 1 root root 0 Oct 20 14:44 /proc/loadavg
[root@xxxxxxx sa]# uptime
14:45:02 up 95 days, 1:14, 2 users, load average: 12.95, 12.61, 12.55
[root@xxxxxxxx sa]#
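For anyone reading the /proc/loadavg line above: the first three numbers are the 1, 5 and 15 minute load averages, the fourth is runnable/total scheduling entities, and the fifth is the most recently assigned PID. Here is a rough sketch that labels the fields; the function name describe_loadavg is made up for illustration and is not a standard tool.

```shell
# Label the five fields of a /proc/loadavg line read from stdin.
# describe_loadavg is a hypothetical helper name, not an existing command.
describe_loadavg() {
    awk '{
        # Field 4 looks like "4/868": currently runnable / total entities.
        split($4, tasks, "/")
        printf "1-min load: %s\n", $1
        printf "5-min load: %s\n", $2
        printf "15-min load: %s\n", $3
        printf "runnable now: %s of %s scheduling entities\n", tasks[1], tasks[2]
        printf "most recent PID: %s\n", $5
    }'
}

# Usage on a live Linux system:
#   describe_loadavg < /proc/loadavg
```

Note that the "runnable now" count covers only tasks on the run queue, so it can be much lower than the load average when tasks are stuck in uninterruptible sleep.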
 
Old 10-20-2008, 03:20 PM   #6
trickykid
LQ Guru
 
Registered: Jan 2001
Posts: 24,149

Rep: Reputation: 269
Try restarting the services that you know are safe to restart. The only other option would probably be to reboot and see if the problem comes back. Schedule some downtime, since it's a production machine. I've seen this myself: a rather large load average that wasn't accurate. A reboot fixed it and I never saw it come back.
 
Old 10-20-2008, 04:41 PM   #7
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,120

Rep: Reputation: 4120
Quote:
Originally Posted by trickykid
I've seen this myself, a rather large load average that wasn't accurate, reboot fixed and I never saw it come back.
What makes you think it wasn't accurate?
Loadavg (in Linux) is not just the runq - it also includes tasks in uninterruptible sleep. This is usually disk wait, but not necessarily. Poorly designed code will place threads in uninterruptible sleep and "forget" about them.
I use the following to track down anything like this - stick it in a loop if need be.
Code:
top -b -n 1 | awk '{if (NR <=7) print; else if ($8 == "D") {print; count++} } END {print "Total status D: "count}'
 
Old 10-21-2008, 05:05 AM   #8
DotHQ
Member
 
Registered: Mar 2006
Location: Ohio, USA
Distribution: Red Hat, Fedora, Knoppix,
Posts: 548

Original Poster
Rep: Reputation: 33
Good input guys, Thanks!

I have seen a bogus load average before and rebooted and it cleared. So, I understand where TK is coming from.

TK, what services would you restart?

I can schedule downtime, but management thinks of that as sweeping it under the rug. If we have an issue we would like to find it rather than hide it, only to have it rear its head again after the reboot.
Since no reason has been found for the load, I'm leaning toward the reboot camp, but I have agreed to look further and ask folks like you whether you've seen anything like this. We run 20+ DB servers running Oracle databases. These servers are in what Oracle calls a RAC environment (much like a cluster). In this particular RAC I have three database servers running exactly the same code, but only one of them shows the high load average.

I ran the command you supplied Syg00. Thanks!!!!

Here is the output of that command:

[root@xxxxxxxx sa]# top -b -n 1 | awk '{if (NR <=7) print; else if ($8 == "D") {print; count++} } END {print "Total status D: "count}'
top - 05:56:09 up 95 days, 16:25, 2 users, load average: 12.05, 12.16, 12.20
Tasks: 588 total, 2 running, 584 sleeping, 0 stopped, 2 zombie
Cpu(s): 3.9% us, 0.8% sy, 0.0% ni, 92.8% id, 2.0% wa, 0.0% hi, 0.4% si
Mem: 16479084k total, 13289944k used, 3189140k free, 497220k buffers
Swap: 14647288k total, 416k used, 14646872k free, 11171240k cached

PID USER PR NI %CPU TIME+ %MEM VIRT RES SHR S COMMAND
Total status D:

Thoughts?
 
Old 10-21-2008, 05:15 AM   #9
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,120

Rep: Reputation: 4120
You've messed with your columns in "top" (or I have ... ).
Change the script to ($11 == "D")
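Since the column number depends on how top is configured, one way around hardcoding it is to find the "S" column from the header row. This is only a sketch: count_d_by_header is a made-up name, and it assumes batch-mode output where the header line starts with PID and every column before S is a single non-empty field.

```shell
# Print top's summary lines plus every task in state D, locating the
# "S" column from the header instead of hardcoding $8 or $11.
# count_d_by_header is a hypothetical helper name.
count_d_by_header() {
    awk '
        $1 == "PID" {
            # Header row: remember which field is labelled "S".
            for (i = 1; i <= NF; i++) if ($i == "S") col = i
            print; next
        }
        !col { print; next }            # summary lines above the header
        $col == "D" { print; count++ }  # tasks in uninterruptible sleep
        END { print "Total status D: " (count + 0) }'
}

# Usage on a live system:
#   top -b -n 1 | count_d_by_header
```

The (count + 0) forces a numeric 0 when nothing matches, so the last line never comes out with an empty total.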
 
Old 10-22-2008, 08:00 AM   #10
DotHQ
Member
 
Registered: Mar 2006
Location: Ohio, USA
Distribution: Red Hat, Fedora, Knoppix,
Posts: 548

Original Poster
Rep: Reputation: 33
Okay, I tried it with the correct column plugged in.
Sorry for the delay in response ....I'm in a class ...

Here is the output:

]# top -b -n 1 | awk '{if (NR <=7) print; else if ($11 == "D") {print; count++} } END {print "Total status D: "count}'
top - 08:56:50 up 96 days, 19:25, 3 users, load average: 13.56, 13.87, 13.47
Tasks: 674 total, 4 running, 668 sleeping, 0 stopped, 2 zombie
Cpu(s): 3.9% us, 0.8% sy, 0.0% ni, 92.9% id, 2.0% wa, 0.0% hi, 0.4% si
Mem: 16479084k total, 13892488k used, 2586596k free, 500092k buffers
Swap: 14647288k total, 416k used, 14646872k free, 11190876k cached

PID USER PR NI %CPU TIME+ %MEM VIRT RES SHR S COMMAND
606 root 15 0 0 0:00.03 0.0 0 0 0 D scsi_eh_0
607 root 15 0 0 0:20.91 0.0 0 0 0 D usb-storage
772 root 18 0 0 0:00.00 0.0 2684 1160 1004 D IbmDup
3698 root 16 0 0 6:24.83 0.0 17424 3992 1544 D hald
4025 root 18 0 0 0:00.00 0.0 2684 1160 1004 D IbmDup
9462 root 18 0 0 0:00.00 0.0 2684 1160 1004 D IbmDup
22121 root 18 0 0 0:00.00 0.0 2684 1160 1004 D IbmDup
23496 root 18 0 0 0:00.00 0.0 2684 1160 1004 D IbmDup
25507 root 18 0 0 0:00.00 0.0 2684 1160 1004 D IbmDup
29496 root 18 0 0 0:00.00 0.0 2684 1160 1004 D IbmDup
29859 root 18 0 0 0:00.00 0.0 2684 1160 1004 D IbmDup
31556 root 18 0 0 0:00.00 0.0 2684 1160 1004 D IbmDup
Total status D: 12


I do not believe this explains the high load average ... notice the load average is now up to 13.
 
Old 10-22-2008, 04:23 PM   #11
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,120

Rep: Reputation: 4120
On the contrary, it directly explains the load average. That status of "D" is uninterruptible sleep; loadavg = (runq + uninterruptible).
If you constantly have, say, 12 "D" tasks, how can the loadavg ever drop below around 13 or 14?

Better check where those IbmDup processes are being generated - there must be a hell of a lot of them, look at the PIDs.

Edit: as this illustrates, an unusual loadavg isn't necessarily an indicator of a (performance) problem - at least under Linux.
Sure, there's a problem, but it likely isn't directly impacting your ability to service your users. However, if it's a symptom of something else (like a flaky disk, say), you'd do well to pay it some attention.
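To see at a glance which commands account for the "D" tasks and how the running and uninterruptible counts add up, something like the following may help. It is a sketch only: summarise_states is a made-up name, it expects "STATE PID COMMAND" lines on stdin, and a command name containing spaces would be cut at the first word.

```shell
# Summarise task states from "STATE PID COMMAND" lines on stdin:
# running (R) vs uninterruptible (D), with D tasks grouped by command.
# summarise_states is a hypothetical helper name.
summarise_states() {
    awk '
        $1 ~ /^R/ { running++ }
        $1 ~ /^D/ { blocked++; by_cmd[$3]++ }
        END {
            printf "running (R): %d\n", running
            printf "uninterruptible (D): %d\n", blocked
            # Iteration order of by_cmd is not defined in awk.
            for (cmd in by_cmd) printf "  %s: %d\n", cmd, by_cmd[cmd]
        }'
}

# Usage on a live system (assumes procps ps; separate -o options
# with an empty header suppress the heading line):
#   ps -e -o state= -o pid= -o comm= | summarise_states
```

The sum of the two counts is an instantaneous figure, while loadavg is a damped average, so expect them to be close rather than identical.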

Last edited by syg00; 10-22-2008 at 04:49 PM. Reason: Musings
 
Old 10-24-2008, 08:45 AM   #12
DotHQ
Member
 
Registered: Mar 2006
Location: Ohio, USA
Distribution: Red Hat, Fedora, Knoppix,
Posts: 548

Original Poster
Rep: Reputation: 33
Thank you very much Syg00! You are correct in saying that it does not affect overall system performance, but we were concerned and wondering if it was bogus.
Looks as if we do indeed have an issue. I sure appreciate your help!!!!
 
Old 10-24-2008, 09:11 AM   #13
DotHQ
Member
 
Registered: Mar 2006
Location: Ohio, USA
Distribution: Red Hat, Fedora, Knoppix,
Posts: 548

Original Poster
Rep: Reputation: 33
I have since found out that those processes are part of Dell OpenManage. Duh!!!!!! At first I thought they were part of the Oracle DB we have running on that server.
 
Old 03-06-2009, 02:13 AM   #14
permalac
Member
 
Registered: Jul 2007
Location: Barcelona
Posts: 115

Rep: Reputation: 16
Sorry. This post was for another thread.

Last edited by permalac; 03-06-2009 at 03:54 AM.
 
Old 03-06-2009, 03:03 AM   #15
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,120

Rep: Reputation: 4120
@permalac, post this in your thread - I referenced this thread, but it makes no sense (in either thread) to post here.
 
  


All times are GMT -5.
