LinuxQuestions.org
Linux - Kernel This forum is for all discussion relating to the Linux kernel.

Old 09-24-2014, 04:08 AM   #1
der_hede
LQ Newbie
 
Registered: Sep 2014
Posts: 6

Rep: Reputation: Disabled
dm-crypt single vs. multithreaded


The behaviour of dm-crypt seems to depend on the machine type. Sometimes it is multithreaded, sometimes it is single-threaded, even with the same software.

I have three different PC systems here. Thinkpad X60, T400 and an AMD PC. All are multicore machines (X60: Intel Core Duo, T400: Intel Core 2 Duo, AMD PC: Phenom II X6)

All booted from the same linux live image (via usb flash drive). All accessing the same encrypted external hard drive (LUKS with aes-cbc-essiv:sha256).

I'm watching "top"/"htop" in one terminal while starting a reading thread in another one:
cat /dev/mapper/encrypteddisk > /dev/null

On the T400 there are about two active "kworker/n:m" kernel threads in the process list, where n and m are varying digits. There seem to be four threads running, but only two of them load the two CPU cores heavily.

On the X60 and the AMD PC there is only one active "kworker/n:m" thread at a time, while n and m change constantly. So they are multithreaded too, and there are indeed multiple "kworker/n:m" threads with low CPU load, but the multithreading does not seem to work: only one thread at a time shows high CPU load.
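A rough way to put a number on "how many kworkers are busy" is to filter a batch-mode snapshot from top. This is only a sketch: the helper name `busy_kworkers` is made up, and it assumes the default procps top column layout (%CPU in column 9, COMMAND in column 12).

```shell
# Print "name %CPU" for every kworker whose %CPU is at least $1.
# Expects `top -b` process lines on stdin (default procps column layout assumed).
busy_kworkers() {
    LC_ALL=C awk -v t="$1" '$12 ~ /^kworker\// && $9 + 0 >= t { print $12, $9 }'
}

# Usage while the read test is running in another terminal.
# The first top iteration is skipped because its %CPU values are not interval-based:
#   LC_ALL=C top -b -d 2 -n 2 | awk '/^top -/ { n++ } n == 2' | busy_kworkers 50
```

If the description above holds, the T400 should list about two kworkers and the other machines only one.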

Does anybody know why?

The expected behaviour would be 2 (X60) to 6 (AMD PC) active kworker threads, like with the T400.

With external USB 2.0 media there isn't much speed difference, but with an internal hard drive (especially an SSD) the T400 is faster, even though the AMD PC has the more powerful CPU. On the T400 the load shown in top rises to 200% (2 cores at 100% each), while on the other two it stays at 100% (1 core at 100%). In terms of total CPU capacity that is 100% load for the T400, 50% for the X60 and 17% for the AMD X6.
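The percentages follow directly from "busy cores divided by available cores". A small sketch of that arithmetic (the function name `total_load_pct` is only illustrative):

```shell
# Overall CPU utilisation in percent when $1 cores are fully busy
# on a machine with $2 cores.
total_load_pct() {
    LC_ALL=C awk -v busy="$1" -v cores="$2" 'BEGIN { printf "%.0f\n", 100 * busy / cores }'
}

total_load_pct 2 2   # T400: both cores busy
total_load_pct 1 2   # X60: one of two cores busy
total_load_pct 1 6   # Phenom II X6: one of six cores busy
```

This prints 100, 50 and 17, matching the figures above.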

By the way, this affects only reading. Writing to the disk spawns multiple kworker threads with CPU load on all of the machines.
 
Old 10-04-2014, 03:53 PM   #2
metaschima
Senior Member
 
Registered: Dec 2013
Distribution: Slackware
Posts: 1,982

Rep: Reputation: 492
I know that in CBC mode encryption is limited to one thread, while decryption can be multi-threaded. If you want better encryption performance use CTR mode.
https://en.wikipedia.org/wiki/Block_...n#Common_modes

You will have to re-encrypt of course.
 
Old 02-01-2015, 07:07 AM   #3
der_hede
LQ Newbie
 
Registered: Sep 2014
Posts: 6

Original Poster
Rep: Reputation: Disabled
CBC encryption is limited to one thread per block. But every single filesystem block can be encrypted and decrypted in parallel with the others. And by the way, I said "this affects only reading" (i.e. decrypting). Writing (i.e. encrypting) is multithreaded, so parallelism across blocks seems to be in use.

Also, I'm using XTS on all of those systems, and TrueCrypt is able to encrypt and decrypt it on multiple cores.

And I've used the same drive and software (Linux/dm-crypt) on all 3 PCs. It uses multiple cores on one machine and a single core on the other two multicore machines. Seems odd.
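Since the thread mentions both cbc-essiv and XTS, it may help to confirm which cipher mode a mapping actually uses. A sketch, assuming the usual `dmsetup table` layout for crypt targets (start, length, "crypt", cipher, key, ...); the helper name and the sample line are made up for illustration:

```shell
# Print the cipher specification from one `dmsetup table` line read on stdin.
crypt_cipher() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "crypt") { print $(i + 1); exit } }'
}

# Hypothetical sample line with a placeholder in the key field:
printf '%s\n' '0 976771072 crypt aes-xts-plain64 0000000000000000 0 8:2 4096' | crypt_cipher

# On a real system (as root), for the mapping from the first post:
#   dmsetup table encrypteddisk | crypt_cipher
```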
 
Old 02-01-2015, 10:36 AM   #4
metaschima
Senior Member
 
Registered: Dec 2013
Distribution: Slackware
Posts: 1,982

Rep: Reputation: 492
I'm guessing you believe there is less performance with one thread? Try measuring throughput instead of worrying about how the CPUs are being used.
 
Old 02-01-2015, 11:16 AM   #5
der_hede
LQ Newbie
 
Registered: Sep 2014
Posts: 6

Original Poster
Rep: Reputation: Disabled
Why do you think a CPU-bound process should _not_ run faster with more parallel threads?

#######
root@pc:~# sync; echo 3 > /proc/sys/vm/drop_caches
root@pc:~# pv /dev/mapper/sda2_crypt > /dev/null
^C15GB 0:00:19 [ 117MB/s] [> ] 1% ETA 0:17:12
root@pc:~# echo 0 > /sys/devices/system/cpu/cpu1/online
root@pc:~# sync; echo 3 > /proc/sys/vm/drop_caches
root@pc:~# pv /dev/mapper/sda2_crypt > /dev/null
892MB 0:00:11 [84,6MB/s] [> ] 0% ETA 0:24:52
^C
#######

Interestingly, after reactivating the second CPU core the speed doesn't fully recover:
#######
root@pc:~# echo 1 > /sys/devices/system/cpu/cpu1/online
root@pc:~# sync; echo 3 > /proc/sys/vm/drop_caches
root@pc:~# pv /dev/mapper/sda2_crypt > /dev/null
920MB 0:00:10 [95,6MB/s] [> ] 0% ETA 0:21:55
^C
#######
I wonder why it's still faster than with a single core, because htop/top shows only one active kworker and core.

This is with an old FSB-based dual core. A modern CPU with far more CPU cores should produce even better results.

Nevertheless, it's far from doubling the performance when doubling the number of CPU cores.
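To quantify "far from doubling", the pv rates from the transcripts above can be turned into speedup factors. A minimal sketch (the helper name `speedup` is only illustrative; decimal commas from the pv output are written as dots):

```shell
# Ratio of two throughput figures, rounded to two decimals.
speedup() {
    LC_ALL=C awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.2f\n", fast / slow }'
}

speedup 117 84.6    # both cores online vs. cpu1 offline
speedup 95.6 84.6   # cpu1 re-enabled vs. cpu1 offline
```

This prints 1.38 and 1.13, where perfect scaling over two cores would give 2.00.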

With dm-crypt (kernelspace) the X6 also gets ~100 MB/s, with truecrypt (userspace, but multithreaded) it's >400 MB/s!

It seems dm-crypt's implementation is less than ideal here. The old, unloved TrueCrypt is superior in this case.
 
Old 02-01-2015, 12:49 PM   #6
metaschima
Senior Member
 
Registered: Dec 2013
Distribution: Slackware
Posts: 1,982

Rep: Reputation: 492
Yeah, it could be a kernel issue. You didn't provide much info about the systems, so I can't tell. Try newer kernels.

More threads does not always equal more performance. I was thinking that this may not be a CPU-bound process, but rather a HDD throughput bound process.
 
Old 02-04-2015, 04:49 AM   #7
der_hede
LQ Newbie
 
Registered: Sep 2014
Posts: 6

Original Poster
Rep: Reputation: Disabled
Ok. More details.

By the way, it's not a serious problem. I have lived with it ever since I started using encryption, and I have been using encrypted drives for many years now...

On the X6 the CPU load is ~18%, which is close to 100/6 (not by chance). If I switch to the powersave governor (to limit the CPU clock), the CPU load is still ~18%, just slightly above one fully loaded core, but throughput drops to ~30 MB/s. On the bare (unencrypted) device it's ~180 MB/s. So yes, it seems CPU-bound.
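The raw-versus-decrypted comparison can be repeated with a small helper like the following. It is a sketch only: `measure_read` is a made-up name, it assumes GNU date and dd, and the device paths in the usage comments are examples from this thread.

```shell
# Read the first $2 MiB of $1 sequentially and print the throughput in MiB/s.
# If the target is shorter than $2 MiB the result is overstated.
measure_read() {
    target=$1
    mib=$2
    start=$(date +%s.%N)
    dd if="$target" of=/dev/null bs=1M count="$mib" 2>/dev/null
    end=$(date +%s.%N)
    LC_ALL=C awk -v s="$start" -v e="$end" -v m="$mib" \
        'BEGIN { d = e - s; if (d <= 0) d = 0.001; printf "%.1f\n", m / d }'
}

# Usage (as root; drop the page cache before each run so nothing is served from RAM):
#   sync; echo 3 > /proc/sys/vm/drop_caches
#   measure_read /dev/sda2 2048                  # bare device
#   sync; echo 3 > /proc/sys/vm/drop_caches
#   measure_read /dev/mapper/sda2_crypt 2048     # through dm-crypt
```

If the second figure is much lower than the first while one core sits at 100%, decryption rather than the disk is the bottleneck.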

When I started this thread I think I was using Ubuntu 12.04 on the T400 and the X6. I switched to the new Ubuntu LTS late in 2014, but I don't know exactly when. Right now it's Ubuntu 14.04 with kernel 3.13.0-45-generic #74-Ubuntu SMP. The X60 runs Arch Linux with its i686 LTS kernel (3.14 now and 3.12 in September?). Anyway, I have been observing this behaviour for many years now, so it is not limited to a single kernel version. While in the past dm-crypt was purely single-threaded, today it seems to be multithreaded only under some rare conditions.

I also have a small embedded ARM board running Linux 3.18 with a Debian 7 userland. It uses LUKS (aes, xts-plain, sha1) and also uses only a single core for decryption (quad-core SoC).

The LUKS parameters for the tests on the PCs are various combinations of AES 128 or 256 with xts-plain64 or cbc-essiv mode and ripemd160 or sha1 hashes. I've copied (via dd and nc) the same small test partition to the internal drives of all three computers to compare them.

The Truecrypt partition is created by Truecrypt 7 with aes and default parameters (AFAIK also xts mode) and mounted in Linux via either Truecrypt or:
$ dmsetup-tc /boot/TC-header1.img /dev/sdc3 | dmsetup create WinTC
$ mount /dev/mapper/WinTC /mnt/tmp
It's a USB/eSATA drive, so it's limited to USB 2.0 speed on the older Thinkpads. This is not relevant for the TrueCrypt vs. dm-crypt comparison on the X6, because that machine has eSATA.

The latter shows that it does not depend on the encryption parameters or the disk layout. TrueCrypt uses multiple cores while dm-crypt still uses a single one, with the same disk and the same data partition. (I used pv [pipe viewer] on the block device to get a sequential read, but even with ntfs-3g it's CPU-limited by the decrypting kworker.)

To me it seems dm-crypt itself is multithreaded. There are several different kworkers involved. But instead of running in parallel as on the T400, they run serialized on the other machines. Maybe dm-crypt multithreading works only on modern Genuine Intel...

Thanks for your efforts, but maybe I should simply wait (further) for the problem to vanish spontaneously with future upgrades. ;-)
 
Old 02-04-2015, 12:54 PM   #8
metaschima
Senior Member
 
Registered: Dec 2013
Distribution: Slackware
Posts: 1,982

Rep: Reputation: 492
First make sure this is enabled in the kernel:
Code:
  ┌──────────────────────── Parallel crypto engine ─────────────────────────┐
  │ CONFIG_CRYPTO_PCRYPT:                                                   │  
  │                                                                         │  
  │ This converts an arbitrary crypto algorithm into a parallel             │  
  │ algorithm that executes in kernel threads.                              │  
  │                                                                         │  
  │ Symbol: CRYPTO_PCRYPT [=y]                                              │  
  │ Type  : tristate                                                        │  
  │ Prompt: Parallel crypto engine                                          │  
  │   Location:                                                             │  
  │     -> Cryptographic API (CRYPTO [=y])                                  │  
  │   Defined at crypto/Kconfig:136                                         │  
  │   Depends on: CRYPTO [=y] && SMP [=y]                                   │  
  │   Selects: PADATA [=y] && CRYPTO_MANAGER [=y] && CRYPTO_AEAD [=y]       │
If you tried different kernels and the problem persists, then the problem is unlikely to go away. It could have to do with the age of the processors. The newer Core 2 Duo threads better than the older Core Duo.
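Whether CONFIG_CRYPTO_PCRYPT is set in the running kernel can usually be read from the kernel config file. A sketch, assuming the distribution installs it as /boot/config-$(uname -r) (Ubuntu does; others may only offer /proc/config.gz); the helper name `kconfig_value` is made up:

```shell
# Print the value (y, m, ...) of kernel config symbol $1 in config file $2,
# or "n" if the symbol is absent or marked "is not set".
kconfig_value() {
    symbol=$1
    file=$2
    value=$(sed -n "s/^${symbol}=//p" "$file" | head -n 1)
    if [ -n "$value" ]; then
        printf '%s\n' "$value"
    else
        printf 'n\n'
    fi
}

# Assumed config location; skipped silently if the file is not there.
cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ]; then
    kconfig_value CONFIG_CRYPTO_PCRYPT "$cfg"
fi
```

A result of "m" means the option is built as a module, which would still have to be loaded before it can have any effect.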

I would try to do some benchmarks locally on an internal drive. Just make a file, encrypt it with various programs, methods and algorithms, and see if things change. Technically, newer cryptsetup installs have a benchmark option, but unfortunately it's not too reliable: it can give absurdly high numbers for some algorithms.
 
  

