Old 12-07-2009, 08:23 PM   #1
beardo265
LQ Newbie
 
Registered: Dec 2009
Posts: 1

Rep: Reputation: 0
Please help: unexplained spike in load average & drive usage


Hi, I'm looking for some help. I'm working with a friend on a Fedora 10 server that primarily runs several fairly high-traffic Apache/PHP/MySQL sites.

Recently a second hard drive was added for some additional storage (this problem may or may not be related to that addition).

Currently, we keep seeing big jumps in load average without any obvious reason (i.e., no CPU-heavy process or anything like that). What we have noticed, using iostat, is that right before the jump in load average, both drives' %util jumps to 100% and await/svctm shoot way up for several seconds. I haven't been able to track down what could cause this. Lately it has been happening every couple of minutes, usually with enough time in between for the load average to settle back down (it runs around 1-2 if left alone), but when it happens several times in a row, the server can almost grind to a halt. There's plenty of memory and CPU available, and we're not swapping.

Is there anything else I can look at, or other possible causes? I've been trying to track this down for several days now with no luck.

Below are the load averages and iostat output for several seconds where this took place.
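
For reference, output in the format shown below can be collected with something along these lines (just a sketch of one way to do it; the one-second interval is an assumption):

Code:
# print the 1/5/15-minute load averages with a timestamp, once a second
while sleep 1; do
    echo "load average: $(cut -d' ' -f1-3 /proc/loadavg)  $(date +%H:%M:%S)"
done
Code:
# extended per-device statistics (sysstat package), timestamped, 1-second interval
iostat -xt 1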

Now, I'm no expert at this type of thing, so be nice. :-)

Thanks very much for your time! Any thoughts/comments/help would be greatly appreciated.


Code:
load average: 6.65, 6.63, 6.66 08:26:50
load average: 6.65, 6.63, 6.66 08:26:51
load average: 6.65, 6.63, 6.66 08:26:52
load average: 6.65, 6.63, 6.66 08:26:53
load average: 6.65, 6.63, 6.66 08:26:57
load average: 13.24, 7.99, 7.10 08:26:58
load average: 13.24, 7.99, 7.10 08:26:59

Code:
Time: 08:26:50 PM
Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda              23.00     0.00   26.00    0.00   912.00     0.00    35.08     0.17    6.65   4.19  10.90
sda1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda3             23.00     0.00   26.00    0.00   912.00     0.00    35.08     0.17    6.65   4.19  10.90
sdb               0.00    60.00    1.00    2.00     8.00   496.00   168.00     0.33  110.33 110.33  33.10
sdb1              0.00    60.00    1.00    2.00     8.00   496.00   168.00     0.33  110.33 110.33  33.10

Time: 08:26:51 PM
Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda              23.00     0.00   12.00    0.00   480.00     0.00    40.00     5.42   45.83  53.50  64.20
sda1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda3             23.00     0.00   12.00    0.00   480.00     0.00    40.00     5.42   45.83  53.50  64.20
sdb              24.00     0.00    2.00    0.00   112.00     0.00    56.00     2.22  242.00 326.50  65.30
sdb1             24.00     0.00    2.00    0.00   112.00     0.00    56.00     2.22  242.00 326.50  65.30

Time: 08:26:52 PM
Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda              26.00     0.00    1.00    0.00   136.00     0.00   136.00    18.73  912.00 1000.00 100.00
sda1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda3             26.00     0.00    1.00    0.00   136.00     0.00   136.00    18.73  912.00 1000.00 100.00
sdb               0.00     0.00    1.00    0.00    88.00     0.00    88.00     4.59  686.00 1000.00 100.00
sdb1              0.00     0.00    1.00    0.00    88.00     0.00    88.00     4.59  686.00 1000.00 100.00

Time: 08:26:53 PM
Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    2.00    0.00   112.00     0.00    56.00    29.34 2084.00 500.00 100.00
sda1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda3              0.00     0.00    2.00    0.00   112.00     0.00    56.00    29.34 2084.00 500.00 100.00
sdb               0.00     0.00    2.00    0.00    56.00     0.00    28.00     4.06 1927.00 500.00 100.00
sdb1              0.00     0.00    2.00    0.00    56.00     0.00    28.00     4.06 1927.00 500.00 100.00

Time: 08:26:54 PM
Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    1.00    0.00     8.00     0.00     8.00    32.12 2719.00 1001.00 100.10
sda1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda3              0.00     0.00    1.00    0.00     8.00     0.00     8.00    32.12 2719.00 1001.00 100.10
sdb              12.00     0.00    1.00    0.00     8.00     0.00     8.00     2.61 2633.00 1000.00 100.00
sdb1             12.00     0.00    1.00    0.00     8.00     0.00     8.00     2.61 2633.00 1000.00 100.00

Time: 08:26:58 PM
Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               9.18   113.61   74.37   22.78  1746.84  1091.14    29.21    24.62  505.05  10.17  98.77
sda1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda3              9.18   113.61   74.37   22.78  1746.84  1091.14    29.21    24.62  505.05  10.17  98.77
sdb               7.59     6.33    6.65    0.63   326.58    55.70    52.52     2.58  608.30  94.65  68.89
sdb1              7.59     6.33    6.65    0.63   326.58    55.70    52.52     2.58  608.30  94.65  68.89

Time: 08:26:59 PM
Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00   30.00    4.00   792.00    32.00    24.24     0.39   11.53   6.12  20.80
sda1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda3              0.00     0.00   30.00    4.00   792.00    32.00    24.24     0.39   11.53   6.12  20.80
sdb              12.00     0.00    5.00    0.00   168.00     0.00    33.60     0.04    8.80   8.80   4.40
sdb1             12.00     0.00    5.00    0.00   168.00     0.00    33.60     0.04    8.80   8.80   4.40
 
Old 12-07-2009, 10:05 PM   #2
flakblas
Member
 
Registered: Jun 2009
Location: Maryland
Distribution: Fedora, CentOS, RHEL, Ubuntu
Posts: 41

Rep: Reputation: 3
I think I just posted a response to this over on LinuxForums, lol. Anyway, for the sake of this thread's completeness, here's my reply:

Quote:
Just curious: when's the last time you took an outage and fsck'd all your partitions? Also, check out the output of smartctl on your drives (yum install smartmontools).

Code:
yum install smartmontools
Code:
smartctl -A /dev/sdx
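When you read the -A output, the raw values for Reallocated_Sector_Ct, Current_Pending_Sector, and Offline_Uncorrectable are the ones to eyeball first; nonzero raw values there usually mean the drive is on its way out. For a quick overall pass/fail verdict, something along these lines works too:
Code:
# overall SMART health self-assessment; replace sdx with your actual device
smartctl -H /dev/sdx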
Then from a recovery shell (no partitions mounted):
Code:
for i in /dev/sd*; do fsck -fy "$i"; done
That last one is a little ugly and will throw a few errors (it hits the whole-disk nodes as well as the partitions), but it's quick and easy and it will fsck every sd* device. Please post the output from these commands. I know an outage sucks, but if your partitions are healthy and not too big it shouldn't take long. smartctl doesn't need an outage, though.
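If you want to avoid that noise, narrowing the glob to the numbered partitions should do it; an untested sketch (a swap partition will still throw an error, which is harmless):
Code:
# fsck only the numbered partitions, skipping whole-disk nodes like /dev/sda
for i in /dev/sd[a-z][0-9]*; do fsck -fy "$i"; done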
 
  


