Disk I/O Bottleneck. Need tuning advice
I wrote a web app that writes a lot of data to disk. I recently observed the system intermittently hiccupping, blocking all Apache processes for a short (though noticeable) period of time.
I'm running Debian sarge on a 2.4 kernel. I have virtually no experience tuning disk I/O on Linux, and I'm not sure whether elvtune or tweaking the /proc/sys options can help. The other two obvious fixes are 1. better hardware and 2. looking for code optimizations. Any advice would be *greatly* appreciated. Below is vmstat output showing the hiccup occurring every 3-4 seconds (when bo is 1000+). Code:
vmstat 1
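The vmstat output itself didn't survive the paste, but the bursts are easy to spot by filtering on the "bo" column. A minimal sketch; the field number below assumes the classic vmstat layout where "bo" is field 10, so check it against the header line your vmstat actually prints:

```shell
# Flag vmstat samples whose "bo" (blocks written out) column exceeds
# a threshold. "bo" is field 10 in the classic layout; verify against
# the header line printed by your vmstat version.
vmstat 1 | awk -v t=1000 '
    NR <= 2 { next }              # skip the two header lines
    $10 > t { print "burst:", $0 }
'
```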
My observations:
Each occurrence of bo greater than 1000 is followed immediately by another one. In each pair, the first sample also shows a large number of blocked processes (second column, heading "b"). Whenever many processes are blocked, the run queue (first column, heading "r") is zero, except for one sample showing twenty processes in the run queue, which is a lot. In every case the I/O wait is zero (rightmost column, heading "wa"), no swapping is occurring, and the CPU is idle much of the time.

Having more than one or two processes blocked is very unusual, and so is having more than one or two in the run queue. So I am wondering whether your web application spawns many child processes. If it spawned a lot of children that were all contending for access to a single resource, that would explain why so many processes are being blocked. That may be enough information to figure out the cause: you either have to reduce the number of processes accessing the disk or file simultaneously, or spread the I/O over more disks. The fact that I/O wait is zero makes me think more disks won't help. Maybe the resource is a log file or a data file that many processes are trying to read or write simultaneously. Often when many processes have to share a single file, a controller process manages that access; you may have to redesign your application, for example by putting the shared file behind a database server.

If I were you I would install the sar utility and run the sar data collector (sadc) every ten minutes. It produces binary files of resource usage which you can then read, and there is a wonderful application called KSar that makes graphs of sar data files.
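If the contended resource does turn out to be a single log or data file, one stopgap short of a full database server is to serialize the writers explicitly. A hedged sketch using flock(1) from util-linux; the file names here are made up:

```shell
# Serialize appends to a shared log file across many processes.
# shared.log and shared.log.lock are hypothetical names; flock(1)
# makes each writer block until it holds an exclusive lock on fd 9.
(
    flock -x 9                        # wait for the exclusive lock
    echo "one atomic log entry" >> shared.log
) 9>> shared.log.lock
```

Note this trades blocked-on-I/O for blocked-on-lock: it smooths the bursts and keeps entries intact, but it doesn't reduce the total I/O.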
You would then be able to see which resources are depleted when bo is greater than 1000 and more than two processes are blocked. The sar utility comes in the sysstat package. KSar can be found at http://sourceforge.net/search/?type_...oft&words=ksar and more information about sadc is available as a man page once the sysstat package is installed. You run the sadc utility via cron.
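To make that concrete, the collection setup might look like the following. The paths assume Debian's sysstat package (sa1 is the stock wrapper around sadc); verify them on your install:

```shell
# A sketch of the suggested sar setup, assuming Debian's sysstat package.
# Collection: add to /etc/cron.d/sysstat, one sample every ten minutes:
#   */10 * * * * root /usr/lib/sysstat/sa1 1 1
# Reading back today's binary data file:
sar -b      # I/O and transfer-rate statistics
sar -q      # run-queue length and load averages
# KSar can graph the same binary files (under /var/log/sysstat/ on Debian).
```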
I don't know 2.4 at all, but that I/O profile looks like the standard five-second write-back cycle. You just have too much I/O to get done in a second.
In 2.6 you could perhaps pick another I/O scheduler, and/or reduce the five-second lag (have a look at /proc/sys/vm/dirty_expire_centisecs). Less I/O would be the best objective. Next would be a better spread of I/O, which probably means more disks, on more, separate paths. You need to get those I/Os completed faster and more consistently.

That "b" column isn't really traditional blocked processes; it's processes in uninterruptible sleep, waiting on (physical) I/O in this case. That qualifies as "blocked", but not in the usual semaphore/mutex programming sense. Your processes stall at the five-second boundary because the physical I/O hasn't signalled completion. Have a look at top, reverse-sorted on process status, and I'll bet you'll see all those guys go to status "D" at the stall point(s).
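For reference, here is where those 2.6 knobs live. The device name "hda" is an assumption, and note that the five-second wakeup itself is dirty_writeback_centisecs; dirty_expire_centisecs controls how old dirty data may get before it must be flushed:

```shell
# Where the 2.6 tunables mentioned above live ("hda" is an assumption).
cat /proc/sys/vm/dirty_expire_centisecs        # max age of dirty data before flush
cat /proc/sys/vm/dirty_writeback_centisecs     # writeback wakeup interval (500 = 5 s)
echo 1000 > /proc/sys/vm/dirty_expire_centisecs    # flush dirty pages sooner (needs root)
cat /sys/block/hda/queue/scheduler             # current scheduler shown in brackets
echo deadline > /sys/block/hda/queue/scheduler
# Confirm the stall: list processes in uninterruptible sleep ("D").
ps -eo state,pid,comm | awk '$1 ~ /^D/ { print }'
```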