LinuxQuestions.org - Disk performance causing high Load Avg?


Hi All,
I have a mail/DNS/web server that has been having some performance issues. Little things make the load average spike, the system becomes slow, and my mail server starts erroring out and won't recover on its own.

The latest thing that seems to be causing trouble is the nightly updatedb. CPU utilization seems fine and memory seems fine, so I'm assuming it has something to do with disk I/O. Here are some outputs. The normal load average is about 2. Usually when updatedb is running I get a load of about 15; for some reason it only went up to about 11 this time.

Code:

top - 12:21:43 up 28 days,  5:31,  2 users,  load average: 10.43, 6.02, 3.59

Tasks: 275 total,  1 running, 274 sleeping,  0 stopped,  0 zombie

Cpu(s):  1.2% user,  1.6% system,  0.0% nice,  97.1% idle

Mem:  2068212k total,  2043964k used,    24248k free,  133152k buffers

Swap:  2097136k total,    46748k used,  2050388k free,  991780k cached



  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                

20519 root      19  0  1048 1048  704 R  9.2  0.1  0:00.11 top                                                                    

  14 root      10  0    0    0    0 D  1.8  0.0  99:30.67 kjournald                                                              

 3006 root      10  0 54672  52m 2832 S  1.8  2.6 752:43.93 named                                                                  

    1 root      8  0    72  64  44 S  0.0  0.0  1:18.72 init                                                                  

    2 root      9  0    0    0    0 S  0.0  0.0  0:00.22 keventd                                                                

    3 root      19  19    0    0    0 S  0.0  0.0  0:02.98 ksoftirqd_CPU0                                                        

    4 root      19  19    0    0    0 S  0.0  0.0  0:00.29 ksoftirqd_CPU1                                                        

    5 root      19  19    0    0    0 S  0.0  0.0  0:00.41 ksoftirqd_CPU2                                                        

    6 root      19  19    0    0    0 S  0.0  0.0  0:00.30 ksoftirqd_CPU3                                                        

    7 root      9  0    0    0    0 S  0.0  0.0  34:49.12 kswapd                                                                

    8 root      9  0    0    0    0 S  0.0  0.0  0:06.06 bdflush                                                                

    9 root      9  0    0    0    0 S  0.0  0.0  2:08.59 kupdated                                                              

  13 root      -1 -20    0    0    0 S  0.0  0.0  0:00.00 mdrecoveryd                                                            

  46 root      9  0    0    0    0 S  0.0  0.0  0:00.00 ahc_dv_0                                                              

  47 root      9  0    0    0    0 S  0.0  0.0  0:00.00 scsi_eh_1                                                              

  71 root      9  0  512  508  432 D  0.0  0.0 103:38.27 syslogd                                                                

  74 root      9  0  376  364  364 S  0.0  0.0  0:00.03 klogd                                                                  

  184 root      7  -4  360  304  304 S  0.0  0.0  0:00.03 udevd                                                                  

 2970 root      9  0    0    0    0 S  0.0  0.0  0:00.00 khubd                                                                  

 2992 bin        9  0  468  376  376 S  0.0  0.0  0:00.00 rpc.portmap                                                            

 2998 root      9  0  404  344  344 S  0.0  0.0  0:00.00 inetd                                                                  

 3002 root      9  0  1240 1104 1016 S  0.0  0.1  0:05.92 sshd                                                                  

 3017 root      8  0  544  536  496 S  0.0  0.0  0:09.54 crond                                                                  

 3019 daemon    9  0  580  552  516 S  0.0  0.0  0:00.71 atd                                                                    

 3022 root      9  0  656  616  584 S  0.0  0.0  0:01.02 saslauthd                                                              

 3023 root      9  0  652  612  580 S  0.0  0.0  0:01.02 saslauthd                                                              

 3024 root      9  0  652  612  580 S  0.0  0.0  0:01.21 saslauthd                                                              

 3025 root      9  0  652  612  580 S  0.0  0.0  0:01.07 saslauthd                                                              

 3026 root      9  0  652  612  580 S  0.0  0.0  0:01.05 saslauthd                                                              

 3088 nobody    9  0  740  624  512 S  0.0  0.0  0:09.69 in.identd                                                              

 3100 nobody    6  0  740  624  512 S  0.0  0.0  0:38.11 in.identd                                                              

 3101 nobody    9  0  740  624  512 S  0.0  0.0 292:47.71 in.identd                                                              

 3102 nobody    9  0  740  624  512 S  0.0  0.0 296:59.63 in.identd                                                              

 3103 nobody    9  0  740  624  512 S  0.0  0.0 296:49.60 in.identd                                                              

 3104 nobody    9  0  740  624  512 S  0.0  0.0 294:40.15 in.identd                                                              

 3105 nobody    9  0  740  624  512 S  0.0  0.0  0:08.33 in.identd                                                              

 3117 root      9  0  8836  880  804 S  0.0  0.0  1:32.99 httpd                                                                  

 3145 root      8  0  1324  704  704 S  0.0  0.0  0:00.42 smbd                                                                  

 3149 root      9  0  1420  808  808 S  0.0  0.0  0:00.00 smbd                                                                  

 3152 root      9  0  812  592  424 S  0.0  0.0  5:10.57 nmbd                                                                  

 3154 root      9  0    96  64  44 S  0.0  0.0  0:58.12 gpm                                                                    

 3163 root      9  0  684  512  512 S  0.0  0.0  0:00.02 mysqld_safe                                                            

 3189 mysql      9  0 22132 9468 6716 S  0.0  0.5  0:35.13 mysqld                                                                

 3208 mysql      9  0 22132 9468 6716 S  0.0  0.5  0:53.51 mysqld                                                                

 3209 mysql      9  0 22132 9468 6716 S  0.0  0.5  0:00.00 mysqld                                                                

 3210 mysql      9  0 22132 9468 6716 S  0.0  0.5  0:00.00 mysqld

Running kernel 2.4.31 on Slackware, with RAID5 on a Mylex AcceleRAID 352.

Code:

  

***** DAC960 RAID Driver Version 2.4.11 of 11 October 2001 *****

Copyright 1998-2001 by Leonard N. Zubkoff <lnz@dandelion.com>

Configuring Mylex AcceleRAID 352 PCI RAID Controller

  Firmware Version: 6.00-15, Channels: 2, Memory Size: 32MB

  PCI Bus: 2, Device: 1, Function: 0, I/O Address: Unassigned

  PCI Address: 0xFC000000 mapped at 0xF898E000, IRQ Channel: 24

  Controller Queue Depth: 512, Maximum Blocks per Command: 2048

  Driver Queue Depth: 511, Scatter/Gather Limit: 128 of 257 Segments

  Physical Devices:

    0:0  Vendor: IBM      Model: DDYS-T36950N      Revision: S80D

        Asynchronous

        Serial Number:        4FY0X646

    0:4  Vendor: IBM      Model: IC35L036UWD210-0  Revision: S5BS

        Wide Synchronous at 40 MB/sec

        Serial Number:        KQZX2556

        Disk Status: Online, 71651328 blocks

    0:7  Vendor: MYLEX    Model: AcceleRAID 352    Revision: 0600

        Wide Synchronous at 160 MB/sec

        Serial Number:  

    0:8  Vendor: IBM      Model: IC35L036UWD210-0  Revision: S5BS

        Wide Synchronous at 40 MB/sec

        Serial Number:        52Y0T615

        Disk Status: Online, 71651328 blocks

    0:12 Vendor: IBM      Model: IC35L036UWD210-0  Revision: S5BS

        Wide Synchronous at 40 MB/sec

        Serial Number:        52Y0Y219

        Disk Status: Online, 71651328 blocks

    1:4  Vendor: IBM      Model: IC35L036UWD210-0  Revision: S5BS

        Wide Synchronous at 40 MB/sec

        Serial Number:        KQZX2544

        Disk Status: Standby, 71651328 blocks

    1:7  Vendor: MYLEX    Model: AcceleRAID 352    Revision: 0600

        Wide Synchronous at 160 MB/sec

        Serial Number:  

    1:8  Vendor: IBM      Model: IC35L036UWD210-0  Revision: S5BS

        Wide Synchronous at 40 MB/sec

        Serial Number:        KQZX2484

        Disk Status: Online, 71651328 blocks

    1:12 Vendor: IBM      Model: DDYS-T36950N      Revision: S80D

        Wide Synchronous at 40 MB/sec

        Serial Number:        4FY0Z227

        Disk Status: Online, 71651328 blocks

  Logical Drives:

    /dev/rd/c0d0: RAID-5, Online, 286605312 blocks

                  Logical Device Initialized, BIOS Geometry: 255/63

                  Stripe Size: 64KB, Segment Size: 8KB

                  Read Cache Disabled, Write Cache Disabled

  No Rebuild or Consistency Check in Progress

Truthfully, I'm not even sure what utilities to use to check I/O performance. A lot of what I found through Google isn't available on this Slackware machine.

Got any advice on where I can start to troubleshoot the performance issues?

Thanks.

The first thing I see is memory usage being topped out. What is running that is using 2 GB of RAM? You are entering swap, which is typically a system killer.

I thought I was doing OK memory-wise, since a lot seems to be cached. Cached memory is still available for system use, correct?

Code:

            total      used      free    shared    buffers    cached

Mem:      2068212    1963852    104360          0    178788    918656

-/+ buffers/cache:    866408    1201804

Swap:      2097136      46748    2050388
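
The "-/+ buffers/cache" row is what answers that question: it subtracts buffers and page cache from "used" and adds them to "free", since the kernel can reclaim that memory when applications need it. A minimal Python sketch of the arithmetic, using the numbers from the free output above (the function name is only for illustration):

```python
def adjust_for_cache(used_kb, free_kb, buffers_kb, cached_kb):
    """Return (used, free) with buffers and page cache counted as reclaimable."""
    reclaimable = buffers_kb + cached_kb
    return used_kb - reclaimable, free_kb + reclaimable


# Values in kB taken from the "Mem:" row of the free output above.
used, free = adjust_for_cache(used_kb=1963852, free_kb=104360,
                              buffers_kb=178788, cached_kb=918656)
print(used, free)  # 866408 1201804, the same as the "-/+ buffers/cache" row
```

So roughly 1.2 GB is still available to applications, and only about 46 MB of swap is in use.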

Quote:

What is running that is using 2gb of RAM?

I'm not sure how to answer that other than with this top output, sorted by % of memory.

Code:

top - 13:58:08 up 28 days,  7:07,  2 users,  load average: 2.52, 2.26, 2.57

Tasks: 210 total,  2 running, 207 sleeping,  0 stopped,  1 zombie

Cpu(s):  0.4% user,  8.5% system,  0.0% nice,  91.1% idle

Mem:  2068212k total,  1974640k used,    93572k free,  178852k buffers

Swap:  2097136k total,    46748k used,  2050388k free,  916188k cached



  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                

 3006 root      9  0 54672  52m 2848 S  1.7  2.6 754:39.38 named                                                                  

11342 root      9  0 22416  21m  21m S  0.0  1.1  0:00.01 sendmail                                                              

 9109 root      8  0 22348  21m  21m S  0.0  1.1  36:40.58 sendmail                                                              

16272 web        9  0 22360  19m 7540 S  0.0  1.0  0:15.75 httpd                                                                  

27107 web        9  0 22356  19m 7404 S  0.0  1.0  0:22.51 httpd                                                                  

16258 web        9  0 22288  19m 7524 S  0.0  1.0  0:16.30 httpd                                                                  

15261 web        9  0 22168  19m 7432 S  0.0  1.0  0:21.29 httpd                                                                  

15411 web        9  0 22100  19m 7388 S  0.0  0.9  0:19.79 httpd                                                                  

26578 web        9  0 22116  19m 7344 S  0.0  0.9  0:10.69 httpd                                                                  

15408 web        9  0 21900  18m 7384 S  0.0  0.9  0:16.77 httpd                                                                  

 9752 web        9  0 19452  16m 6728 S  0.0  0.8  0:00.26 httpd                                                                  

 1542 root      9  0 12296  12m  10m R  0.0  0.6  0:10.10 kdm_greet                                                              

 9751 web        9  0 17708  11m 4248 S  0.0  0.6  0:00.54 httpd                                                                  

 3270 root      9  0 23344  11m 4880 S  0.0  0.5  54:50.14 X                                                                      

 3189 mysql      9  0 22132 9600 6848 S  0.0  0.5  0:35.20 mysqld                                                                

 3208 mysql      9  0 22132 9600 6848 S  0.0  0.5  45:56.85 mysqld                                                                

 3209 mysql      9  0 22132 9600 6848 S  0.0  0.5  0:00.00 mysqld                                                                

 3210 mysql      9  0 22132 9600 6848 S  0.0  0.5  0:00.00 mysqld                                                                

 3211 mysql      9  0 22132 9600 6848 S  0.0  0.5  0:00.00 mysqld                                                                

 3212 mysql      9  0 22132 9600 6848 S  0.0  0.5  0:00.00 mysqld                                                                

 3219 mysql      9  0 22132 9600 6848 S  0.0  0.5  9:19.45 mysqld                                                                

 3220 mysql      9  0 22132 9600 6848 S  0.0  0.5  7:06.52 mysqld                                                                

 3221 mysql      9  0 22132 9600 6848 S  0.0  0.5  0:00.00 mysqld                                                                

 3222 mysql      9  0 22132 9600 6848 S  0.0  0.5  0:46.13 mysqld                                                                

16384 mysql      9  0 22132 9600 6848 S  0.0  0.5  0:06.66 mysqld                                                                

29303 root      9  0  3832 3824 2872 S  0.0  0.2  0:00.72 sendmail                                                              

32600 root      9  0  3784 3776 2892 S  0.0  0.2  0:00.55 sendmail                                                              

26244 root      9  0  3760 3752 2884 S  0.0  0.2  0:00.45 sendmail                                                              

27370 root      9  0  3736 3728 2856 S  0.0  0.2  0:00.54 sendmail                                                              

 3406 root      9  0  3724 3716 2860 S  0.0  0.2  0:00.30 sendmail                                                              

13685 root      9  0  3700 3692 2868 S  0.0  0.2  0:00.30 sendmail                                                              

22872 root      9  0  3696 3688 2864 S  0.0  0.2  0:00.24 sendmail                                                              

25809 root      9  0  3644 3636 2864 S  0.0  0.2  0:00.12 sendmail                                                              

28986 root      9  0  3612 3604 2860 S  0.0  0.2  0:00.19 sendmail                                                              

11361 root      9  0  3604 3596 2768 S  0.0  0.2  0:00.54 sendmail                                                              

 1395 root      9  0  3600 3592 2852 S  0.0  0.2  0:00.16 sendmail                                                              

14989 root      9  0  3592 3584 2768 S  0.0  0.2  0:00.66 sendmail                                                              

22083 root      9  0  3568 3560 2776 S  0.0  0.2  0:00.65 sendmail                                                              

10786 root      9  0  3568 3560 2808 S  0.0  0.2  0:00.02 sendmail                                                              

18639 root      9  0  3564 3556 2764 S  0.0  0.2  0:00.52 sendmail                                                              

10237 root      9  0  3552 3544 2768 S  0.0  0.2  0:00.36 sendmail                                                              

 4306 root      9  0  3528 3520 2756 S  0.0  0.2  0:00.46 sendmail                                                              

19547 root      9  0  3524 3516 2768 S  0.0  0.2  0:00.23 sendmail                                                              

11349 root      9  0  3516 3512 2784 S  0.0  0.2  0:00.00 sendmail                                                              

16402 root      9  0  3516 3508 2760 S  0.0  0.2  0:00.39 sendmail                                                              

 3265 root      9  0  3460 3460 2596 S  0.0  0.2  1:39.78 ntpd                                                                  

 8494 root      9  0  3444 3436 2776 S  0.0  0.2  0:00.05 sendmail                                                              

 6817 root      9  0  3436 3428 2752 S  0.0  0.2  0:00.32 sendmail                                                              

10955 root      9  0  3432 3424 2796 S  0.0  0.2  0:00.00 sendmail                                                              

 9750 web        9  0 10540 3408 3068 S  0.0  0.2  0:00.04 httpd                                                                  

 4852 root      9  0  3388 3380 2752 S  0.0  0.2  0:00.15 sendmail                                                              

11358 root      9  0  3364 3356 2808 D  0.0  0.2  0:00.00 sendmail                                                              

11359 root      9  0  3364 3356 2808 D  0.0  0.2  0:00.00 sendmail                                                              

11354 root      9  0  3324 3316 2816 D  0.0  0.2  0:00.00 sendmail                                                              

23727 root      9  0  3260 3256 2660 D  0.0  0.2  0:01.82 sendmail                                                              

11266 root      9  0  3240 3236 2712 S  0.0  0.2  0:00.00 sendmail

Also, I'm not too sure what the difference between SHR, RES, and VIRT is; that's why I sorted by %MEM.
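
For reference, VIRT is the total virtual address space of a process, RES is the part of it currently resident in physical RAM, and SHR is the portion of that resident memory shared with other processes. The kernel reports the same three quantities, in pages, as the first three fields of /proc/<pid>/statm. A small Python sketch of the conversion follows; the sample line is hypothetical, chosen so that it reproduces the mysqld row above under the assumption of 4 kB pages:

```python
def parse_statm(text, page_kb=4):
    """Convert the first three /proc/<pid>/statm fields (in pages) to kB."""
    size, resident, shared = (int(field) for field in text.split()[:3])
    return {
        "VIRT": size * page_kb,      # total virtual address space
        "RES": resident * page_kb,   # pages currently in physical RAM
        "SHR": shared * page_kb,     # resident pages shared with other processes
    }


# Hypothetical statm contents; only the first three fields are used here.
sample = "5533 2400 1712 0 0 0 0"
print(parse_statm(sample))
```

Because shared pages are counted once per process, adding up RES or %MEM across all the httpd, sendmail, and mysqld rows overstates how much RAM they use together.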

Thanks,
Craig

Have you checked your system logs for disk I/O error messages? Hard errors on a disk can degrade system performance with no other apparent cause.

cheers,

Go get sysstat, and use iostat. Quick search didn't find a Slack package for it, but just download and install it.

Gotta ask tho', why the updatedb every night ???. What do you use the data for, and how often ???.
I know it's the "done thing" with a lot of distros.

Hi All. Thanks for the replies.

Didn't see anything related to I/O in the logs, and especially nothing that looked like an error.

Quote:

why the updatedb every night ???.

No reason really; like you said, it's just the done thing. I do use the locate command quite a bit, as I'm still learning. And really, it's not just updatedb that's causing me problems, it's general performance issues. We sometimes see the same thing while running a backup to tape.

I installed the sysstat package. Unfortunately, I don't think iostat is showing me much. Maybe that's because of the kernel version; I'm not sure. Here is some output though:

Code:

 

root@mailgate:/dev# df

Filesystem          1K-blocks      Used Available Use% Mounted on

/dev/rd/c0d0p1      130633552  51940180  71725180  43% /



Normal utilization IOSTAT:



root@mailgate:/dev# iostat -x c0d0p1 5

Linux 2.4.31 (mailgate)        03/09/06



avg-cpu:  %user  %nice %system %iowait  %steal  %idle

          1.22    0.00    1.67    0.00    0.00  97.11



Device:    rrqm/s wrqm/s  r/s  w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz  await  svctm  %util



(notice all device stats are blank, I've also tried /dev/rd/c0d0p1 and rd/c0d0p1 as devices for iostat)



IOSTAT while running updatedb. Load AVG about 14.  Not much different:



root@mailgate:/dev# iostat -x c0d0p1 5

Linux 2.4.31 (mailgate)        03/09/06



avg-cpu:  %user  %nice %system %iowait  %steal  %idle

          0.45    0.00    0.30    0.00    0.00  99.25



Device:    rrqm/s wrqm/s  r/s  w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz  await  svctm  %util



Also tried this, and gave me no data:



root@mailgate:/dev# iostat -d 5

Linux 2.4.31 (mailgate)        03/09/06



Device:            tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn

Any ideas?
Thanks.

Never really looked at a 2.4 kernel from a performance aspect - even on Slack the first thing I did was go 2.6.
If you got the kernel sources with Slack (I don't have any Slack now so I can't check), have a look at iostats.txt; otherwise go look for it online. It has a good discussion of the I/O statistics fields in /proc, and there are differences between 2.4 and 2.6. Maybe you can knock up a script to pull the numbers directly from there and write them to a file to look at later.
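
A rough Python sketch of such a script is below. It assumes the 11-counter layout that iostats.txt describes: on 2.4 the counters follow the device name in /proc/partitions, on 2.6 they are in /proc/diskstats. Whether this particular 2.4.31 build exposes them for the DAC960 device is not confirmed in this thread, and the key names in FIELDS, the sample numbers, and the device line are made up for illustration.

```python
FIELDS = ("reads", "reads_merged", "sectors_read", "ms_reading",
          "writes", "writes_merged", "sectors_written", "ms_writing",
          "in_progress", "ms_io", "weighted_ms_io")


def parse_stats(text):
    """Map device name -> dict of the 11 counters; skip lines without them."""
    stats = {}
    for line in text.splitlines():
        tok = line.split()
        if len(tok) == 15:    # 2.4 /proc/partitions: major minor #blocks name + 11
            name, nums = tok[3], tok[4:]
        elif len(tok) == 14:  # 2.6 /proc/diskstats: major minor name + 11
            name, nums = tok[2], tok[3:]
        else:
            continue
        if not all(n.isdigit() for n in nums):
            continue          # header row or an unexpected layout
        stats[name] = dict(zip(FIELDS, map(int, nums)))
    return stats


def rates(before, after, seconds):
    """Reads/s, writes/s and %util for one device between two samples."""
    return {
        "r/s": (after["reads"] - before["reads"]) / seconds,
        "w/s": (after["writes"] - before["writes"]) / seconds,
        "%util": 100.0 * (after["ms_io"] - before["ms_io"]) / (seconds * 1000.0),
    }


# Two hypothetical samples taken 5 seconds apart, in the 2.4 layout.
HEADER = ("major minor  #blocks  name     rio rmerge rsect ruse "
          "wio wmerge wsect wuse running use aveq\n")
t0 = HEADER + "  48     1  130633552 rd/c0d0p1 1000 0 8000 5000 2000 0 16000 9000 0 4000 14000\n"
t1 = HEADER + "  48     1  130633552 rd/c0d0p1 1500 0 12000 9000 2250 0 18000 12000 3 8500 30000\n"

dev = "rd/c0d0p1"
print(rates(parse_stats(t0)[dev], parse_stats(t1)[dev], seconds=5))
```

On the real machine, the two samples would come from reading /proc/partitions a few seconds apart and appending the result to a file. If parse_stats returns an empty dict there, the kernel is not exporting the counters, which would also explain the empty iostat output.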

One would think your problem has to be I/O. I had skipped the fact you are running RAID5; I wonder if that's getting in the way. From Linux's point of view you only have one disk, and it holds swap as well, which generally isn't a good idea. So Linux will be trying to manage the I/O based on that: merging I/O and calculating swap slot locality.
Your raid card will then be tearing it all apart and spraying it all over the disks. Hardly working together for optimal performance - not that that was ever a promise of RAID5.

Maybe in the bad periods you could look for tasks in I/O wait: (reverse) sort "top" on the "S" field and look for status "D".
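
Linux counts tasks in uninterruptible sleep (state "D") toward the load average, which is how the load can sit above 10 while the CPUs are 97% idle. In the top output earlier in the thread, kjournald, syslogd, and several sendmail processes show up in that state. As an alternative to sorting top by hand, here is a minimal Python sketch that lists D-state processes straight from /proc; the function names are illustrative, and it assumes the usual /proc/<pid>/stat layout of "pid (command) state ...".

```python
import os


def parse_stat(text):
    """Return (pid, command, state) from the contents of /proc/<pid>/stat."""
    # The command is wrapped in parentheses and may itself contain spaces or
    # parentheses, so split on the last ')' instead of on whitespace.
    head, tail = text.rsplit(")", 1)
    pid, comm = head.split("(", 1)
    return int(pid), comm, tail.split()[0]


def blocked_tasks(proc="/proc"):
    """List (pid, command) for every process currently in state D."""
    if not os.path.isdir(proc):
        return []
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited while we were scanning
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

Running it every few seconds during the updatedb window and logging the output would show which processes are piling up behind the disk.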