Hi,
I'm running a home server on Debian stable with DHCP, DNS, mail, VDR, file sharing and my weather station as the main services. The file sharing is used to mount the home directories on the clients.
The machine features an Athlon BE-2300, 3GB RAM, gigabit LAN, a 1TB and a 1.5TB SATA HDD plus additional HDDs for backups. The mainboard has an NVIDIA chipset with
Code:
nVidia Corporation MCP65 SATA Controller (rev a3)
The primary disks are running in RAID1 + LVM.
My problem is that heavy I/O, especially to the HDDs, blocks the whole system: an SSH login becomes nearly impossible and clients using the shared homes are practically unusable. Typical causes of high I/O load in my case are the daily backup or copying large files (VDR recordings) over the network.
I googled a lot, checked the output of tools like (a)top and dstat, and found the following:
- One of the disks in the RAID1 (the 1.5TB one, a WD EARS) uses 4kB sectors, which I had somewhat ignored. Accordingly, atop shows one disk at nearly 100% busy (/dev/sda) while the other (/dev/sdc) is mostly idle on average while copying files.
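In case someone wants to verify the same thing: as far as I understand, on a 512B-emulating 4kB drive a partition is aligned if its start sector is divisible by 8. A quick sketch of that check (the device name is of course specific to my setup):

```shell
#!/bin/sh
# Print the start sector of every partition on the suspected 4kB drive
# (-u makes fdisk report sectors) and flag starts that are not a
# multiple of 8, i.e. not on a 4kB boundary of the physical medium.
DISK=/dev/sda   # the 1.5TB WD EARS in my setup

fdisk -lu "$DISK" | awk '$1 ~ /^\/dev\// {
    # the boot flag "*" shifts the start-sector column by one
    start = ($2 == "*") ? $3 : $2;
    aligned = (start % 8 == 0) ? "aligned" : "MISALIGNED";
    printf "%s starts at sector %s: %s\n", $1, start, aligned;
}'
```

The classic msdos layout starting at sector 63 would be misaligned by this criterion, while a start at 2048 (or any multiple of 8) would be fine.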
Performance of my drives looks as follows (the values vary by about +/-5MB/s between runs):
1.5TB (4kB blocks)
Code:
/dev/sda:
Timing buffered disk reads: 258 MB in 3.01 seconds = 85.65 MB/sec
1TB
Code:
/dev/sdc:
Timing buffered disk reads: 276 MB in 3.01 seconds = 91.58 MB/sec
So the comparison is not too bad for the 4kB drive...
RAID
Code:
/dev/md1:
Timing buffered disk reads: 248 MB in 3.01 seconds = 82.29 MB/sec
LVM with and w/o RAID
Code:
/dev/mapper/home-home:
Timing buffered disk reads: 270 MB in 3.01 seconds = 89.71 MB/sec
/dev/mapper/unsafe_data-unsafe_data:
Timing buffered disk reads: 132 MB in 3.01 seconds = 43.84 MB/sec
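For completeness: the numbers above come from hdparm's buffered read test, essentially run like this (several runs each, averaged by eye):

```shell
# Buffered sequential read benchmark; -t reads from the device without
# previously cached data, so repeated runs give comparable figures.
for dev in /dev/sda /dev/sdc /dev/md1 \
           /dev/mapper/home-home /dev/mapper/unsafe_data-unsafe_data; do
    hdparm -t "$dev"
done
```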
Looking at atop while copying a 150MB file to the server via Samba:
Code:
ATOP - red 2010/04/11 14:41:26 10 seconds elapsed
PRC | sys 1.72s | user 0.67s | #proc 162 | #zombie 0 | #exit 0 |
CPU | sys 7% | user 3% | irq 6% | idle 112% | wait 72% |
cpu | sys 6% | user 2% | irq 5% | idle 16% | cpu001 w 71% |
cpu | sys 1% | user 1% | irq 0% | idle 98% | cpu000 w 0% |
CPL | avg1 3.40 | avg5 1.18 | avg15 0.54 | csw 8050 | intr 55853 |
MEM | tot 3.0G | free 659.1M | cache 1.8G | buff 39.9M | slab 208.4M |
SWP | tot 2.8G | free 2.8G | | vmcom 736.2M | vmlim 4.3G |
DSK | sda | busy 97% | read 0 | write 246 | avio 40 ms |
DSK | sdc | busy 3% | read 0 | write 67 | avio 4 ms |
NET | transport | tcpi 39141 | tcpo 13703 | udpi 2 | udpo 0 |
NET | network | ipi 39143 | ipo 13703 | ipfrw 0 | deliv 39143 |
NET | lan 4% | pcki 39139 | pcko 13699 | si 46 Mbps | so 814 Kbps |
NET | lo ---- | pcki 4 | pcko 4 | si 0 Kbps | so 0 Kbps |
PID SYSCPU USRCPU VGROW RGROW RDDSK WRDSK ST EXC S CPU CMD 1/1
3433 0.62s 0.61s -24K -20K 0K 0K -- - S 12% vdr-kbd
25398 0.96s 0.06s 0K 0K 0K 54196K -- - D 10% smbd
27078 0.06s 0.00s 0K 0K 0K 0K -- - R 1% atop
1169 0.03s 0.00s 0K 0K 0K 0K -- - D 0% md4_raid1
2544 0.03s 0.00s 0K 0K 0K 152K -- - S 0% kjournald
2536 0.02s 0.00s 0K 0K 0K 0K -- - S 0% xfsdatad/1
2546 0.00s 0.00s 0K 0K 0K 0K -- - S 0% kjournald
26972 0.00s 0.00s 0K 0K 0K 0K -- - D 0% pdflush
27077 0.00s 0.00s 0K 0K 0K 4080K -- - D 0% pdflush
Note the difference between sda and sdc, which are members of the same RAID1.
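To rule out that the two members simply receive different amounts of work, one can compare their cumulative write counters from sysfs; in a healthy RAID1 both should grow at roughly the same rate, so a busy% gap like the one above would point at per-disk latency rather than unequal load (a sketch, device names from my setup):

```shell
#!/bin/sh
# Field 7 of /sys/block/<dev>/stat is the number of sectors written
# since boot; in a RAID1 both members should show similar totals.
for d in sda sdc; do
    wr=$(awk '{print $7}' /sys/block/$d/stat)
    echo "$d: $wr sectors written"
done
```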
- The dstat output shows what I would call hiccups. For example, copying a 150MB file to the server via Samba looks like this:
Code:
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read writ| recv send| in out | int csw
2 1 95 2 0 0|4056k 866k| 0 0 | 2.1B 2.1B| 728 844
1 0 99 0 0 0| 0 0 | 60B 302B| 0 0 | 605 496
1 0 93 5 0 0| 0 2937k| 60B 522B| 0 0 | 663 537
1 7 85 0 3 4| 0 552k| 13M 255k| 0 0 | 13k 1078
2 28 42 15 5 8| 0 18M| 38M 653k| 0 0 | 35k 2027
1 0 48 50 0 0| 0 6144k| 60B 476B| 0 0 | 676 643
1 1 49 50 0 0| 0 5936k| 125k 3720B| 0 0 | 760 546
1 1 48 50 0 0| 0 15M| 376k 38k| 0 0 | 994 613
1 2 48 49 0 0| 0 14M| 126k 3152B| 0 0 | 782 684
0 2 46 51 0 0| 0 6856k| 627k 14k| 0 0 |1206 591
1 0 0 99 0 0| 0 21M| 63k 624B| 0 0 | 693 589
1 0 0 99 0 0| 0 1416k| 60B 318B| 0 0 | 609 500
0 1 0 99 0 0| 0 0 | 60B 318B| 0 0 | 609 489
1 1 5 92 0 0| 0 24M| 60B 318B| 0 0 | 665 527
0 1 48 50 0 0| 0 32M| 244B 318B| 0 0 | 673 593
0 1 49 50 0 0| 0 8624k| 336B 318B| 0 0 | 628 491
1 1 49 49 0 0| 0 21M| 244B 428B| 0 0 | 640 495
0 0 49 50 0 0| 0 0 | 336B 412B| 0 0 | 519 836
5 4 41 50 0 0| 0 0 | 244B 302B| 0 0 | 510 659
3 14 27 45 4 6| 0 43M| 24M 420k| 0 0 | 22k 1682
2 19 45 21 5 8| 0 89M| 27M 472k| 0 0 | 25k 1705
0 1 48 50 0 0| 0 8632k| 60B 302B| 0 0 | 612 460
1 10 49 30 2 7| 0 51M| 22M 379k| 0 0 | 20k 1465
3 16 32 40 3 7| 0 74M| 24M 462k| 0 0 | 22k 1470
0 0 0 99 0 0| 0 0 | 60B 318B| 0 0 | 612 428
0 1 0 99 0 0| 0 9880k| 60B 318B| 0 0 | 629 453
1 1 49 48 0 0| 0 35M| 60B 318B| 0 0 | 655 475
0 0 77 22 0 0| 0 3888k| 60B 318B| 0 0 | 714 490
0 0 100 0 0 0| 0 8192B| 60B 318B| 0 0 | 609 421
0 4 78 18 0 0| 0 32M| 60B 318B| 0 0 | 668 498
1 0 69 29 0 0| 0 76M| 60B 318B| 0 0 | 689 439
0 1 99 0 0 0| 0 1728k| 60B 318B| 0 0 | 628 470
Why are there such gaps? Is this normal?
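My own guess is that these gaps are the page cache filling up with dirty data and then being flushed in bursts (pdflush does show up in the atop output above). If that's right, the relevant knobs should be the kernel's writeback thresholds; a sketch of inspecting and experimenting with them (the example values are guesses for experimenting, not recommendations):

```shell
#!/bin/sh
# Current writeback thresholds: above dirty_background_ratio (% of RAM)
# pdflush starts writing dirty pages back in the background; above
# dirty_ratio the writing process itself is forced to block.
cat /proc/sys/vm/dirty_background_ratio /proc/sys/vm/dirty_ratio

# Example: start background writeback earlier so flushes come in
# smaller, smoother batches (commented out on purpose):
# sysctl -w vm.dirty_background_ratio=2
# sysctl -w vm.dirty_ratio=10
```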
- Google brought me to some discussions, e.g. http://www.linuxquestions.org/questi...y-high-794777/ or http://forum.ubuntuusers.de/topic/fe.../#post-2360287 (in German, sorry), but I found nothing on how to deal with high I/O load blocking the system.
So I guess my problem is twofold:
- Can I tell the kernel to balance better between different I/O tasks, so that e.g. the backup does not render my clients completely unusable? This would at least mitigate the symptoms.
- Any suggestions regarding the general performance, or regarding the (in my opinion) strange values above? To be honest, I would like to avoid setting up the system from scratch...
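Regarding the first point, what I had in mind is something along these lines, assuming the disks use the CFQ scheduler (as far as I know ionice priorities only take effect with CFQ); /usr/local/bin/backup.sh is just a placeholder for my actual backup script:

```shell
#!/bin/sh
# Which I/O scheduler is active on the disk? The bracketed entry in
# the scheduler file is the one currently in use.
active=$(sed 's/.*\[\(.*\)\].*/\1/' /sys/block/sda/queue/scheduler)
echo "active scheduler on sda: $active"

# Run the nightly backup with idle I/O class and low CPU priority so
# that interactive clients are served first:
ionice -c3 nice -n 19 /usr/local/bin/backup.sh
```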
Thanks in advance,
joko