CPU: i7-2600K @ 4300MHz - supports aesni.
RAM: 32GB
Kernel: 3.16-3-amd64
HDDs: WD Passport Ultra 2TB USB 3.0 x 3
LVM2: 2.02.111 --
pvcreate -M2 --dataalignment 512K /dev/sd[ijk]1
vgcreate wdultra /dev/sd[ijk]1
lvcreate -n wdbackup -i 3 -I 512 -l +1430769 wdultra
cryptsetup: 1.6.6 --
cryptsetup -v luksFormat /dev/wdultra/wdbackup
Filesystem: btrfs 3.17 --
mount -t btrfs -o noatime /dev/mapper/<device> /mnt/backup
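As a rough sanity check on the lvcreate line above, the stripe geometry can be worked out with plain shell arithmetic. This is only a sketch: the 4 MiB extent size is an assumption (it is the LVM2 default when vgcreate is run without -s, as it was here), and the variable names are made up for illustration.

```shell
#!/bin/sh
# Geometry implied by: lvcreate -n wdbackup -i 3 -I 512 -l +1430769 wdultra
stripes=3          # -i 3: one stripe per HDD
stripe_kib=512     # -I 512: stripe size in KiB, same as the 512K data alignment
extents=1430769    # -l +1430769: logical extents in the LV
pe_mib=4           # ASSUMPTION: default extent size (vgcreate without -s)

full_stripe_kib=$((stripes * stripe_kib))   # data written per pass over all PVs
lv_mib=$((extents * pe_mib))                # total LV size
extents_per_pv=$((extents / stripes))       # extents taken from each PV
remainder=$((extents % stripes))            # non-zero would mean an uneven split

echo "full stripe width: ${full_stripe_kib} KiB"
echo "LV size: ${lv_mib} MiB"
echo "extents per PV: ${extents_per_pv} (remainder ${remainder})"
```

A remainder of zero means the extent count divides evenly across the three stripes.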
I've created a LUKS container on top of an LVM2 logical volume, which is striped across the 3 HDDs. According to the benchmark, aes-xts with a 256-bit key can encrypt data at over 2000 MiB/s, and the benchmark only uses a single core.
Code:
# cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1        1242388 iterations per second
PBKDF2-sha256       732245 iterations per second
PBKDF2-sha512       595781 iterations per second
PBKDF2-ripemd160    708497 iterations per second
PBKDF2-whirlpool    267493 iterations per second
# Algorithm | Key  | Encryption   | Decryption
aes-cbc       128b    738.2 MiB/s   2525.3 MiB/s
serpent-cbc   128b     93.6 MiB/s    364.0 MiB/s
twofish-cbc   128b    209.1 MiB/s    394.1 MiB/s
aes-cbc       256b    544.2 MiB/s   1929.9 MiB/s
serpent-cbc   256b     93.6 MiB/s    364.8 MiB/s
twofish-cbc   256b    210.6 MiB/s    395.9 MiB/s
aes-xts       256b   2198.6 MiB/s   2199.0 MiB/s
serpent-xts   256b    378.0 MiB/s    358.6 MiB/s
twofish-xts   256b    390.9 MiB/s    389.3 MiB/s
aes-xts       512b   1704.8 MiB/s   1713.5 MiB/s
serpent-xts   512b    378.2 MiB/s    359.1 MiB/s
twofish-xts   512b    389.2 MiB/s    390.1 MiB/s
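Which row matters depends on the cipher the container actually uses; the luksFormat command above does not name one, so the build defaults apply (`cryptsetup luksDump /dev/wdultra/wdbackup` shows the cipher and key size). A small helper can pull a single rate out of the benchmark output. This is a sketch: `bench_rate` is a hypothetical name, and picking the aes-xts 256b row is an assumption about the defaults.

```shell
#!/bin/sh
# Print the encryption rate (MiB/s) of one row of `cryptsetup benchmark`
# output read from stdin.
# $1 = algorithm (e.g. aes-xts), $2 = key size (e.g. 256b)
bench_rate() {
    awk -v alg="$1" -v key="$2" '$1 == alg && $2 == key { print $3; exit }'
}

# Demo on two rows copied from the output above. On a live system the input
# would be piped in instead:  cryptsetup benchmark | bench_rate aes-xts 256b
rate=$(bench_rate aes-xts 256b <<'EOF'
aes-cbc 256b 544.2 MiB/s 1929.9 MiB/s
aes-xts 256b 2198.6 MiB/s 2199.0 MiB/s
EOF
)
echo "aes-xts 256b encrypts at ${rate} MiB/s on one core"
```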
hdparm shows that the 3 drives together can sustain reads of roughly 300 MB/s.
Code:
# hdparm -tT /dev/wdultra/wdbackup
/dev/wdultra/wdbackup:
Timing cached reads: 24850 MB in 2.00 seconds = 12435.30 MB/sec
Timing buffered disk reads: 896 MB in 3.00 seconds = 298.34 MB/sec
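Note that hdparm -t only times reads, while the rsync comparison is about writes. A sequential write test with dd gives a number that is easier to compare. The following is a minimal sketch: `write_test` is a hypothetical helper, the file name is arbitrary, and writing zeros is only meaningful as long as the filesystem is not compressing (the mount options above do not enable compression).

```shell
#!/bin/sh
# Sequential write test: write N MiB of zeros into a directory and let dd
# report the rate on stderr. conv=fsync makes dd flush the file to disk
# before it prints its timing, so the page cache does not inflate the result.
write_test() {
    dir=$1
    mib=$2
    file="$dir/ddtest.$$"
    dd if=/dev/zero of="$file" bs=1M count="$mib" conv=fsync
    status=$?
    rm -f "$file"        # clean up the scratch file either way
    return "$status"
}

# Small demo against a scratch directory. On the real system it would point
# at the mount instead, e.g.  write_test /mnt/backup 4096
# once with and once without the LUKS layer.
scratch=$(mktemp -d)
write_test "$scratch" 8
rmdir "$scratch"
```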
When I run rsync to back up to the 3 HDDs without cryptsetup, I can regularly write data at 320 MiB/s. When I use cryptsetup with the options mentioned above, that throughput seems to be cut in half, and all 8 logical cores (4 cores with Hyper-Threading) seem to be maxed out with I/O traffic.
I would have expected the CPU to be waiting on the slower I/O of the HDDs, not the HDDs waiting on the CPU to process the data.
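To put numbers on that expectation, the single-core benchmark rate can be compared with the unencrypted write rate. This is a back-of-the-envelope sketch, assuming the container uses aes-xts with a 256-bit key and that the in-memory benchmark carries over to real I/O, which it may not.

```shell
#!/bin/sh
# Figures taken from the post, both in MiB/s.
bench_mibs=2198.6   # aes-xts 256b encryption, single core (ASSUMED cipher)
disk_mibs=320       # rsync write rate without cryptsetup

# awk handles the floating-point division that sh arithmetic cannot.
headroom=$(awk -v b="$bench_mibs" -v d="$disk_mibs" 'BEGIN { printf "%.1f", b / d }')
core_pct=$(awk -v b="$bench_mibs" -v d="$disk_mibs" 'BEGIN { printf "%.0f", 100 * d / b }')

echo "one core could encrypt ${headroom}x the disk write rate"
echo "expected cipher load at ${disk_mibs} MiB/s: about ${core_pct}% of one core"
```

If those assumptions hold, encryption alone should occupy only a fraction of one core at full disk speed, which is why the observed load on all 8 logical cores is surprising.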