LinuxQuestions.org


Flowsen 03-12-2013 03:16 AM

Mount 2TB encrypted-container file through SSHFS (performance question)
 
Dear Community,

because the storage server lacks proper SSH/rsync support, I tried using a mounted, encrypted container file to store a large amount of data.

Setup:
Dedicated server (source) and storage server (target), connected via 1 Gbit.

I mounted the remote storage and built the encrypted container file the following way:

Code:

sshfs xxxx@xxxx.xxx:/ /mnt/backup                                      # mount the remote storage via SSHFS
dd if=/dev/urandom of=/mnt/backup/container.img bs=1M count=2000000    # create a ~2 TB container file
losetup /dev/loop1 /mnt/backup/container.img                           # attach it to a loop device
cryptsetup luksFormat /dev/loop1                                       # set up LUKS encryption on the loop device
cryptsetup luksOpen /dev/loop1 container                               # open the encrypted device as /dev/mapper/container
mkfs.ext2 /dev/mapper/container                                        # create an ext2 filesystem inside
mount /dev/mapper/container /mnt/container                             # mount the decrypted filesystem

This is my setup for the mounted container file, and it works like a charm. Running an rsync:
Code:

rsync -aAXv /xxx/* /mnt/container
goes through with incredible speed. :D
About 800 GB of data were backed up correctly this way.
The second rsync run was also very fast, just a few minutes.

Now my problem:
After a few days some data changed, and there is now roughly 100 GB of new data to be stored.
Rsync now seems to run forever. There is still activity from rsync and on the network interface, but it is nowhere near using the available bandwidth, and as said it won't even finish a single rsync run within 24 hours...

After running `ls` in various directories of the mounted device, I noticed that every `ls` in each directory takes a very long time (up to 30 seconds).
This leads me to the conclusion that it must have something to do with the filesystem structure and inode reads. It would also explain why rsync, which mainly spends its time building and comparing incremental file lists, takes forever.

As you can see in the code above, I formatted the container as ext2. I have since tried converting it to ext3 (tune2fs -j -O dir_index). The filesystem should now be ext3, but this did not solve my problem.
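
For reference, the conversion roughly looks like this (device path as in my setup above); if I understand correctly, dir_index only helps existing directories after an additional e2fsck -D pass on the unmounted filesystem:

Code:

umount /mnt/container                            # the filesystem must be unmounted for this
tune2fs -j -O dir_index /dev/mapper/container    # add a journal (ext3) and enable hashed directory indexes
e2fsck -fD /dev/mapper/container                 # -D rebuilds existing directories so they actually get indexed
mount /dev/mapper/container /mnt/container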

Does anyone have an idea how I can increase the performance, and what would be the best way to mount such a large container file? Do I have to change block sizes, convert to ext4, use ReiserFS, or is this just not possible at all?

Thanks for your suggestions and feedback.

jschiwal 03-12-2013 11:40 PM

Is the target a host you control?
Do you need the traffic to it to be encrypted? You are using a user mode filesystem (sshfs) to loop mount an encrypted filesystem.

Could you have an encrypted partition on the backup server instead? I'm suggesting using the network block device. If you can't repartition the backup server, you could loop-mount the encrypted file at the server and use the encrypted loopback device as the nbd source. Then use cryptsetup on the local machine. I think this may give you faster throughput, but you probably want to test it to be sure.

Since the file is already encrypted, you don't need ssh to encrypt the traffic.

http://www.linuxjournal.com/article/3778
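
Something like this, roughly (untested sketch; the server name, port, and device node are placeholders, and newer nbd versions use named exports in a config file rather than a port on the command line):

Code:

# on the backup server: export the container file directly
nbd-server 2000 /srv/backup/container.img

# on the local machine: attach the remote block device and decrypt locally
modprobe nbd
nbd-client backupserver.example 2000 /dev/nbd0
cryptsetup luksOpen /dev/nbd0 container
mount /dev/mapper/container /mnt/container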

jschiwal 03-13-2013 12:10 AM

P.S. The article also creates a file on the server, but the nbd-server uses the file directly instead of a loop device. So their example is even closer to what you are doing than I first thought.

Flowsen 03-13-2013 03:49 AM

Unfortunately I don't have control over the target host. The only ways to connect to it are FTP/SFTP/SCP/SAMBA/CIFS.
I think using SFTP with an sshfs mount is still the fastest way. I don't see any other way to get the ability to use rsync in combination with hardlinking.
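
By rsync with hardlinking I mean something along the lines of --link-dest, e.g. (the backup directory names are just examples):

Code:

# hardlink unchanged files against the previous backup instead of copying them again
rsync -aAXv --link-dest=/mnt/container/backup.1 /xxx/ /mnt/container/backup.0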

Do you think the encryption reduces the speed, or the way files and inodes are read from the target? On the source host I have 12 cores, and the first rsync run used the full 1 Gbit of bandwidth, so encryption does not seem to affect throughput. Do you think reading encrypted data from the target influences how directory and inode listings are read?

From the ext4 wiki I read the following:
Quote:

In a regular UNIX filesystem, the inode stores all the metadata pertaining to the file (time stamps, block maps, extended attributes, etc), not the directory entry. To find the information associated with a file, one must traverse the directory files to find the directory entry associated with a file, then load the inode to find the metadata for that file. ext4 appears to cheat (for performance reasons) a little bit by storing a copy of the file type (normally stored in the inode) in the directory entry. (Compare all this to FAT, which stores all the file information directly in the directory entry, but does not support hard links and is in general more seek-happy than ext4 due to its simpler block allocator and extensive use of linked lists.)
This leads me to the conclusion that running rsync against an existing backup causes a lot of disk access at various locations in the container file: traversing each directory also requires reading from the inode table. That would explain why an `ls` on each directory on the target host takes so long.
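
A simple way to test that assumption (the path is just an example) is to run the same recursive listing twice and compare cold vs. warm cache:

Code:

time ls -lR /mnt/container/somedir > /dev/null   # cold: every inode read goes through ext3 -> dm-crypt -> loop -> sshfs
time ls -lR /mnt/container/somedir > /dev/null   # warm: mostly served from the local dentry/inode cache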

Quote:

Since the file is already encrypted, you don't need ssh to encrypt the traffic.
I will give it a try...

jschiwal 03-13-2013 05:53 AM

I don't know how well sshfs does at seeking into the encrypted file. The decryption and the filesystem are handled locally.

Try catting a large file to /mnt/container through a pv pipe to measure the bandwidth. Then mount the /mnt/backup/ share using cifs instead, replacing your first step. See if there is an improvement.
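
For example (untested; the share name and credentials are placeholders, and the container has to be unmounted and closed before switching the underlying mount):

Code:

# measure sequential write throughput into the mounted container
cat /path/to/largefile | pv > /mnt/container/testfile

# tear down the stack, then remount the share with cifs instead of sshfs
umount /mnt/container
cryptsetup luksClose container
losetup -d /dev/loop1
fusermount -u /mnt/backup
mount -t cifs //xxxx.xxx/backup /mnt/backup -o username=xxxx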

Flowsen 03-16-2013 01:45 PM

It turned out that there were performance issues on the network/storage server itself.
Now everything seems to run smoothly, so there appears to be no problem with mounting a 2 TB image file over a network connection.

In any case, using cifs was more stable for me than sshfs.

