Hello,
We recently bought a server fitted with 2 x Samsung 840 Pro SSDs assembled into a Linux software RAID-1 array, and I've run into several issues with it.
(1) First, I've noticed a severe write speed problem when creating or copying larger files (hundreds of MB or several GB), accompanied by serious load spikes:
Code:
root [~]# w
 01:02:14 up 55 days, 57 min,  2 users,  load average: 0.48, 1.07, 1.84
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
root [~]# time dd if=backup-10.5.2013_23-22-11_xxxxxxxx.tar.gz of=test3 oflag=sync bs=1G
0+1 records in
0+1 records out
307191761 bytes (307 MB) copied, 43.0388 s, 7.1 MB/s

real    0m43.060s
user    0m0.000s
sys     0m1.228s
root [~]# w
 01:03:07 up 55 days, 58 min,  2 users,  load average: 17.97, 5.22, 3.18
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
As you can see, a 307 MB file was copied in 43 seconds at an average of 7 MB/s. These SSDs should be able to do this in about a second, at hundreds of MB/s.
Also, this time the load spiked only moderately (to ~18); with 500 MB files the load spikes to 30-40, and with 1 GB files it can spike to 100.
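To narrow down whether one member is intrinsically slower, I'm thinking of comparing the two drives with plain sequential reads (read-only, so it shouldn't disturb the array or the data); a rough sketch of what I'd run:
Code:
# buffered sequential read timing on each member
hdparm -t /dev/sda
hdparm -t /dev/sdb

# same idea with dd: read 1 GiB with direct I/O (bypasses the page cache)
dd if=/dev/sda of=/dev/null iflag=direct bs=1M count=1024
dd if=/dev/sdb of=/dev/null iflag=direct bs=1M count=1024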
(2) During the same kind of operations the iostat output looks odd. A few samples taken during the copy:
Right at the start:
Code:
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 1.50 6298.50 311.50 613.50 64616.00 55270.50 129.61 3.18 3.44 0.27 25.10
sdb 2.50 6603.50 284.50 230.50 61576.00 12054.50 142.97 23.50 8.54 1.58 81.40
md1 0.00 0.00 0.00 393.00 0.00 3144.00 8.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 599.00 6814.00 125936.00 54504.00 24.34 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
After 2 seconds:
Code:
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 1589.50 0.00 54.00 0.00 13148.00 243.48 0.60 11.17 0.46 2.50
sdb 0.00 1627.50 0.00 16.50 0.00 9524.00 577.21 144.25 1439.33 60.61 100.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 1602.00 0.00 12816.00 8.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
.......... (similar output for roughly 40 seconds)
About 42 seconds in:
Code:
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 14.50 0.00 11788.00 812.97 143.62 7448.45 68.97 100.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
As you can see, for the first few seconds /dev/sda spiked to about 25% %util, but /dev/sdb went to 100% and stayed there for the remaining ~40 seconds (while /dev/sda barely broke a sweat). It took 43 seconds for a 300 MB file, and with larger files it obviously takes far longer.
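For reference, the samples above were taken in a second terminal with something along the lines of iostat -dx 1. If it helps, I can also watch the writeback backlog while a copy runs; a sketch:
Code:
# extended per-device stats, 1-second interval
iostat -dx 1

# how much dirty / writeback data is queued while sdb drains
watch -n1 'grep -E "Dirty|Writeback" /proc/meminfo'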
Also, in the first iteration of iostat (which reports averages since the last reboot), %util is quite different between the two members of the same RAID-1 array:
Code:
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 10.44 51.06 790.39 125.41 8803.98 1633.11 11.40 0.33 0.37 0.06 5.64
sdb 9.53 58.35 322.37 118.11 4835.59 1633.11 14.69 0.33 0.76 0.29 12.97
md1 0.00 0.00 1.88 1.33 15.07 10.68 8.00 0.00 0.00 0.00 0.00
md2 0.00 0.00 1109.02 173.12 10881.59 1620.39 9.75 0.00 0.00 0.00 0.00
md0 0.00 0.00 0.41 0.01 3.10 0.02 7.42 0.00 0.00 0.00 0.00
(3) The wear of the 2 SSDs is very different:
Code:
root [~]# smartctl --attributes /dev/sda | grep -i wear
177 Wear_Leveling_Count 0x0013 095% 095 000 Pre-fail Always - 180
root [~]# smartctl --attributes /dev/sdb | grep -i wear
177 Wear_Leveling_Count 0x0013 072% 072 000 Pre-fail Always - 1005
/dev/sda: 5% wear
/dev/sdb: 28% wear
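In case the wear gap points at a sick drive rather than uneven writes, I can also dump the other error-related SMART counters on both members; a sketch (attribute names as smartctl reports them for these Samsungs, so the exact spelling may differ slightly):
Code:
smartctl --attributes /dev/sda | egrep -i 'Reallocat|Program_Fail|Erase_Fail|CRC|Used_Rsvd'
smartctl --attributes /dev/sdb | egrep -i 'Reallocat|Program_Fail|Erase_Fail|CRC|Used_Rsvd'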
Also, the total number of LBAs written differs between the two members, but not by nearly as much as the wear figures above:
Code:
root [~]# smartctl --attributes /dev/sda | grep -i LBA
241 Total_LBAs_Written 0x0032 099 099 000 Old_age Always - 21912041841
root [~]# smartctl --attributes /dev/sdb | grep -i LBA
241 Total_LBAs_Written 0x0032 099 099 000 Old_age Always - 23720836220
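Doing the arithmetic on those raw values (512-byte LBAs), the actual amount written is close on the two drives, nothing like the wear gap:
Code:
root [~]# awk 'BEGIN { printf "%.1f TB\n", 21912041841*512/1e12 }'
11.2 TB
root [~]# awk 'BEGIN { printf "%.1f TB\n", 23720836220*512/1e12 }'
12.1 TB
So roughly 8% more data written to sdb, versus a 5% vs 28% wear difference.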
(4) Also, following the I/O load with iotop, I've noticed moments when jbd2 (the ext4 journal thread) sits at close to 100% I/O without writing or reading much (or anything, for that matter).
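For reference, I was watching with something like this (iotop from the standard repos):
Code:
iotop -o -d 2   # only show processes/threads actually doing I/O, refresh every 2 s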
Some background info:
- this is a shared cPanel server (web, mail, MySQL, etc.)
- the SSDs have been in use for exactly the same amount of time and I know of no resyncs during this time
- initially they had the DXM04B0Q firmware, but I have updated both to DXM05B0Q
- I have looked for "hard resetting link" in dmesg to check for cable/port issues, but found nothing
- I believe the partitions are aligned correctly (listing below)
- the OS is CentOS 6.4, kernel 2.6.32-358.11.1.el6.x86_64
- the write-intent bitmap was removed right at the start
- the system was installed from a minimal installation DVD and packages were added as needed (it didn't even have the man pages)
- tested with and without discard and noatime
- tested with all I/O schedulers (the commands I use to re-check the scheduler and TRIM state are right after this list)
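In case it's useful, these are the checks I re-run to confirm the scheduler and TRIM state (paths as on this kernel; just a sketch):
Code:
# active I/O scheduler per member (the one shown in brackets)
cat /sys/block/sda/queue/scheduler
cat /sys/block/sdb/queue/scheduler

# confirm both drives advertise TRIM support
hdparm -I /dev/sda | grep -i trim
hdparm -I /dev/sdb | grep -i trim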
root [~]# fdisk -ul /dev/sda
Disk /dev/sda: 512.1 GB, 512110190592 bytes
255 heads, 63 sectors/track, 62260 cylinders, total 1000215216 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00026d59
Device Boot Start End Blocks Id System
/dev/sda1 2048 4196351 2097152 fd Linux raid autodetect
Partition 1 does not end on cylinder boundary.
/dev/sda2 * 4196352 4605951 204800 fd Linux raid autodetect
Partition 2 does not end on cylinder boundary.
/dev/sda3 4605952 814106623 404750336 fd Linux raid autodetect
root [~]# fdisk -ul /dev/sdb
Disk /dev/sdb: 512.1 GB, 512110190592 bytes
255 heads, 63 sectors/track, 62260 cylinders, total 1000215216 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0003dede
Device Boot Start End Blocks Id System
/dev/sdb1 2048 4196351 2097152 fd Linux raid autodetect
Partition 1 does not end on cylinder boundary.
/dev/sdb2 * 4196352 4605951 204800 fd Linux raid autodetect
Partition 2 does not end on cylinder boundary.
/dev/sdb3 4605952 814106623 404750336 fd Linux raid autodetect
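My alignment reasoning, for what it's worth: all the partition start sectors are multiples of 2048 (i.e. 1 MiB boundaries), which should be fine for these SSDs:
Code:
root [~]# awk 'BEGIN { print 2048%2048, 4196352%2048, 4605952%2048 }'
0 0 0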
MOUNT
root [/var/log]# mount
/dev/md2 on / type ext4 (rw,noatime,usrjquota=quota.user,jqfmt=vfsv0)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/md0 on /boot type ext4 (rw,noatime)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
/usr/tmpDSK on /tmp type ext3 (rw,noexec,nosuid,loop=/dev/loop0)
/tmp on /var/tmp type none (rw,noexec,nosuid,bind)
/etc/fstab
root # cat /etc/fstab
#
# /etc/fstab
# Created by anaconda on Wed Apr 3 17:22:52 2013
#
UUID=8fedde2c-f5b7-4edf-975f-d8d087d79ebf / ext4 noatime,usrjquota=quota.user,jqfmt=vfsv0 1 1
UUID=bfc50d02-6d4d-4510-93ea-27941cd49cf4 /boot ext4 noatime,defaults 1 2
UUID=cef1d19d-2578-43db-9ffc-b6b70e227bfa swap swap defaults 0 0
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
/usr/tmpDSK /tmp ext3 noatime,defaults,noauto 0 0
/proc/mdstat
root # cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb2[1] sda2[0]
204736 blocks super 1.0 [2/2] [UU]
md2 : active raid1 sdb3[1] sda3[0]
404750144 blocks super 1.0 [2/2] [UU]
md1 : active raid1 sdb1[1] sda1[0]
2096064 blocks super 1.1 [2/2] [UU]
unused devices: <none>
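If it helps with an answer, I can also post the full array details or kick off a consistency check on the big array, roughly like this:
Code:
mdadm --detail /dev/md0 /dev/md1 /dev/md2

# md consistency check on the root array, then see how many mismatches it found
echo check > /sys/block/md2/md/sync_action
cat /proc/mdstat
cat /sys/block/md2/md/mismatch_cnt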
What is the problem with this array?