Old 03-15-2007, 02:57 PM   #1
MiniMoses
LQ Newbie
 
Registered: Feb 2007
Posts: 8

Rep: Reputation: 0
Disk I/O Bottleneck. Need tuning advice


I wrote a web app that writes a lot of data to disk. I recently observed the system intermittently hiccupping, blocking all Apache processes for a short (though noticeable) period of time.

I'm running Debian sarge on a 2.4 kernel. I have virtually no experience tuning disk I/O in Linux.

I'm not sure whether elvtune or tweaking options under /proc/sys can help or fix it.

The other two obvious fixes are (1) better hardware and (2) code optimizations.

Any advice would be *greatly* appreciated.

Below is the output of vmstat showing the hiccup occurring every 3-4 seconds (when bo is 1000+).

Code:
vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0   3968 151512 149608 1347020    0    0     0     1    2     2 10  4 87  0
 1  0   3968 151360 149608 1347088    0    0     0     0 2276  4630 21 12 67  0
 0 52   3968 151076 149608 1347120    0    0     0  1784 2211  3318 15  8 76  0
 2  0   3968 151404 149608 1347220    0    0     4  1112 2230  6203 34 13 54  0
 2  0   3968 151180 149608 1347280    0    0     0     0 2191  4026 18 13 69  0
 0  0   3968 151128 149608 1347328    0    0     0     0 2070  4078 16 10 73  0
 1  0   3968 151064 149608 1347388    0    0     0     0 2339  4517 17 12 70  0
 0 53   3968 151208 149608 1347412    0    0     0  1480 2133  3023 11  6 83  0
 0  0   3968 150800 149608 1347492    0    0     0  1012 2171  4536 25 15 60  0
 0  0   3968 151060 149608 1347564    0    0     0     0 2083  4098 14 16 70  0
 0  0   3968 150804 149608 1347640    0    0     0     0 1967  3858 17 11 73  0
 1  0   3968 150908 149608 1347708    0    0     0     0 2237  4391 18 12 71  0
 0 70   3968 150848 149608 1347744    0    0     0  1700 2220  3111 13  8 79  0
 0  0   3968 150776 149608 1347832    0    0     0  1104 2339  4870 29 14 57  0
 2  0   3968 150724 149608 1347876    0    0     0     0 2019  3659 17  8 75  0
 0  0   3968 150668 149608 1347952    0    0     0     0 2486  4344 21 10 69  0
 1  0   3968 150384 149608 1348020    0    0     0     0 2463  4717 22 12 66  0
 0 76   3968 150520 149608 1348052    0    0     0  1760 2130  2717 11  7 82  0
 1  0   3968 150416 149608 1348168    0    0     0  1032 2618  6397 43 10 48  0
 2  0   3968 150228 149608 1348216    0    0     0     0 2173  4238 24 11 65  0
 2  0   3968 150104 149608 1348256    0    0     0     0 2186  3836 17  9 75  0
 1  0   3968 150092 149608 1348332    0    0     0     0 2120  4310 20 12 67  0
 0 87   3968 150204 149608 1348360    0    0     0  1660 1940  2056  5  6 89  0
 2  0   3968 149712 149608 1348456    0    0     0  1028 2509  6046 39 14 46  0
 0  0   3968 150112 149608 1348508    0    0     0     0 2362  4663 24 11 66  0
 2  0   3968 149412 149608 1348588    0    0     0     0 2260  4165 21 13 66  0
 1  0   3968 149852 149608 1348632    0    0     0     0 2235  4122 15 12 73  0
20  0   3968 132980 149608 1348660    0    0     0  2580 2325  2647 17  7 76  0
 2  0   3968 149784 149608 1348768    0    0     0     0 2972  8382 30 20 50  0
 3  0   3968 149592 149608 1348832    0    0     4     0 2385  4751 23 11 66  0
 0  0   3968 149700 149608 1348884    0    0     0     0 2304  4764 22 11 66  0
 0 11   3968 149176 149608 1348940    0    0     0  1688 2292  3899 18  8 74  0
 1  0   3968 149060 149608 1349008    0    0     0  1020 2559  4950 27 11 61  0
 0  0   3968 149308 149608 1349068    0    0     0     0 2417  4686 23 11 66  0
 0  0   3968 149424 149608 1349120    0    0     0     0 2305  4694 22 12 66  0
 
Old 03-16-2007, 08:24 PM   #2
stress_junkie
Senior Member
 
Registered: Dec 2005
Location: Massachusetts, USA
Distribution: Ubuntu 10.04 and CentOS 5.5
Posts: 3,873

Rep: Reputation: 335
My observations:

Each occurrence of bo greater than 1000 is followed immediately by another one. In each pair, the first occurrence of bo greater than 1000 also has a large number of processes being blocked (second column with the heading "b"). In all cases when there is a large number of processes being blocked there are zero processes that are in the run queue (first column with the heading "r").

There is one occurrence where there are twenty processes in the run queue. That's a lot.

In all cases the I/O wait is zero (rightmost column with the heading "wa").

No swapping is occurring.

The CPU is idle a lot of the time.

Having more than one or two processes being blocked is very unusual. Having more than one or two processes in the run queue is also very unusual.
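The observations above can be checked mechanically. Below is a minimal Python sketch, not part of the original post, that parses `vmstat 1` output and lists the samples where bo exceeds 1000 together with the "b" count in the same sample. The function names `parse_vmstat` and `write_bursts` are made up for illustration, and the sample rows are copied from the output in post #1.

```python
def parse_vmstat(text):
    """Parse `vmstat 1` output into a list of dicts keyed by column name."""
    rows = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            # The column-name line: r b swpd free ... wa
            header = fields
            continue
        if header is None or len(fields) != len(header):
            # Skips the "procs ---memory---" banner and anything malformed.
            continue
        try:
            rows.append(dict(zip(header, map(int, fields))))
        except ValueError:
            continue
    return rows


def write_bursts(rows, threshold=1000):
    """Return (sample index, bo, b) for every sample with bo above threshold."""
    return [(i, row["bo"], row["b"])
            for i, row in enumerate(rows)
            if row["bo"] > threshold]


SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0   3968 151360 149608 1347088    0    0     0     0 2276  4630 21 12 67  0
 0 52   3968 151076 149608 1347120    0    0     0  1784 2211  3318 15  8 76  0
 2  0   3968 151404 149608 1347220    0    0     4  1112 2230  6203 34 13 54  0
 2  0   3968 151180 149608 1347280    0    0     0     0 2191  4026 18 13 69  0
"""

if __name__ == "__main__":
    for index, bo, blocked in write_bursts(parse_vmstat(SAMPLE)):
        print("sample %d: bo=%d b=%d" % (index, bo, blocked))
```

One caveat when reading the "wa" column on this machine: according to the vmstat man page, kernels before 2.5.41 count I/O wait time as idle, so a 2.4 kernel is expected to show wa as 0 even while processes are waiting on disk.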

So I am wondering if your web application spawns many child processes. If your web application spawned a lot of child processes that were all contending for access to a single resource then that might explain why you have so many processes being blocked.

That might be enough information for you to figure out the cause of the problem. It sounds like you either have to reduce the number of processes accessing the disk or file simultaneously or spread out the i/o over more disks. The fact that the i/o wait is zero makes me think that more disks won't help. Maybe the resource is a log file or a data file that many processes are trying to read or write simultaneously. Often when many processes have to read or write a single file there is a controller process that manages that access. You may have to redesign your application to include using a database server in order to accomplish this shared access to a single file.
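To illustrate the "controller" idea, here is a minimal Python sketch of a single-writer pattern: many workers hand records to a queue, and one thread that owns the file does all the writing. This is only an assumption about how such a design could look. The class name `SingleWriter` is hypothetical, and it uses threads inside one process, whereas separate Apache processes would need a real controller process (or a database server, as suggested above) reached over a socket or pipe.

```python
import queue
import threading


class SingleWriter:
    """Funnel writes from many workers through one thread that owns the file."""

    def __init__(self, path):
        self._path = path
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def write(self, record):
        """Called by any worker; returns as soon as the record is queued."""
        self._queue.put(record)

    def close(self):
        """Flush everything queued so far and stop the writer thread."""
        self._queue.put(None)  # sentinel telling the writer to finish
        self._thread.join()

    def _run(self):
        # Only this thread ever touches the file, so workers never
        # contend with each other for it.
        with open(self._path, "a") as fh:
            while True:
                record = self._queue.get()
                if record is None:
                    break
                fh.write(record + "\n")
```

Workers call `write()` and move on; `close()` is called once at shutdown. The file still receives the same amount of data, so this reduces contention between workers rather than the total I/O.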

If I were you I would install the sar utility and run the sar data collector (sadc) every ten minutes. It will make binary data files of resource usage which you can then read. There is a wonderful application called KSar that makes graphs of sar data files. You would be able to see what resources are depleted when the bo is greater than 1000 and there are more than 2 processes being blocked.

The sar utility comes in the sysstat package. KSar can be found at

http://sourceforge.net/search/?type_...oft&words=ksar

More information about sadc is available as a man page once you have installed the sysstat package. You run the sadc utility via cron.

Last edited by stress_junkie; 03-16-2007 at 08:59 PM.
 
Old 03-17-2007, 12:19 AM   #3
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,126

Rep: Reputation: 4120
I don't know 2.4 at all, but that I/O profile looks like the standard 5 second write cycle. You just have too much to get done in a second.
In 2.6 you could perhaps pick another I/O scheduler, and/or reduce the 5 second lag (have a look at /proc/sys/vm/dirty_expire_centisecs).
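Before changing anything it helps to see the current values. The following is a small Python sketch, offered as an illustration only, that reads a few writeback-related files under /proc/sys/vm on a 2.6 kernel. The function name `read_vm_tunables` is made up. As far as the 2.6 kernel documentation goes, dirty_writeback_centisecs is the interval between periodic flushes (commonly 500, i.e. 5 seconds) and dirty_expire_centisecs is how old dirty data must be before it is written out, so both are worth a look. On a 2.4 kernel these files are not expected to exist, in which case the sketch reports None.

```python
import os

# File names as found under /proc/sys/vm on 2.6 kernels.
TUNABLES = (
    "dirty_writeback_centisecs",
    "dirty_expire_centisecs",
    "dirty_background_ratio",
    "dirty_ratio",
)


def read_vm_tunables(base="/proc/sys/vm", names=TUNABLES):
    """Return {name: int value, or None if the file is missing or unreadable}."""
    values = {}
    for name in names:
        try:
            with open(os.path.join(base, name)) as fh:
                values[name] = int(fh.read().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_vm_tunables().items():
        print("%-28s %s" % (name, value))
```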

Less I/O would be the best objective. Next would be better spread of I/O - that probably means more disks, on more/separate paths.
You need to get those I/Os completed faster and more consistently. That "b" column isn't really traditional blocked processes - it's processes in uninterruptible sleep, waiting on (physical) I/O in this case.
That qualifies as "blocked", but not in the usual semaphore/mutex programming sense.
Your processes stall at the 5 second boundary because the physical I/O hasn't signalled completion. Have a look at top - reverse sort on process status, and I'll bet you'll see all those guys go to status "D" at the stall point(s).
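As an alternative to watching top, the same check can be scripted. This is a minimal Python sketch, with hypothetical function names, that scans /proc/<pid>/stat and lists the PIDs currently in a given state ("D" is uninterruptible sleep). Running it in a loop during a stall should show whether the Apache processes pile up in "D".

```python
import os


def parse_state(stat_line):
    """Return the one-letter process state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' rather than on whitespace.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def processes_in_state(state="D", proc_root="/proc"):
    """Return a sorted list of PIDs whose current state matches `state`."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                line = fh.read()
            if parse_state(line) == state:
                found.append(int(entry))
        except (OSError, IndexError):
            continue  # process exited mid-scan, or the line was unparseable
    return sorted(found)


if __name__ == "__main__":
    pids = processes_in_state("D")
    print("%d process(es) in state D: %s" % (len(pids), pids))
```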
 
  

