Old 01-11-2012, 08:15 AM   #1
*Dark Dragon*
Member
 
Registered: Apr 2005
Distribution: Debian Testing + KDE
Posts: 53

Rep: Reputation: 17
Is my data safe after running "sync" if I use data=writeback,barrier=0 (ext4)?


I'm thinking about using data=writeback,barrier=0 on my root partition. I'm doing full/incremental daily backups, so I cannot lose more than a day of work, and I'm on a Xeon-based workstation with RDIMM memory, protected by a UPS. However, the possibility of an accident (like a power supply or UPS failure, or a kernel panic) is never zero. Sometimes when I write something important (for example, a finished hard part of my program/script), I want a way to make sure that even if something bad happens (for example, the UPS battery or the UPS itself fails unexpectedly and power disappears), my data is 100% safe (at the file system level).

I think that running sync manually is the way to do it, and that after sync has finished its job, all my data (saved before the sync) is safe in the file system (ext4) even if I use data=writeback,barrier=0. But I'm not 100% sure about this. I would really appreciate it if somebody knowledgeable could tell me whether I'm right.
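
For reference, what I mean by running sync manually is just this (a minimal sketch; the /proc/meminfo check is only a sanity check):
Code:
# on Linux, sync does not return until the dirty data has been written out
sync
# afterwards the Dirty/Writeback counters should be at or near zero
grep -E 'Dirty|Writeback' /proc/meminfo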

Last edited by *Dark Dragon*; 01-11-2012 at 08:22 AM.
 
Old 01-11-2012, 08:24 AM   #2
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1301
I think you are right in that if sync finishes writing the data, your data will be safe.

http://en.wikipedia.org/wiki/Ext4#De...tial_data_loss

However, if you want your data to be 100% safe, I do wonder why you are using data=writeback ...

If you wanted 100% safety, or close to it, you would use data=ordered, or ext3 or another filesystem that uses data=ordered by default.
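
For example, a root entry with the safe defaults would look something like this (the UUID is just a placeholder; data=ordered and barrier=1 are already the ext4 defaults, so listing them only makes the choice explicit):
Code:
# /etc/fstab (placeholder UUID)
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /  ext4  defaults,data=ordered,barrier=1  0  1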
 
Old 01-11-2012, 09:54 AM   #3
*Dark Dragon*
Member
 
Registered: Apr 2005
Distribution: Debian Testing + KDE
Posts: 53

Original Poster
Rep: Reputation: 17
So what is the worst-case scenario?

Thanks for the quick answer. I had already read many docs before posting my question, including the Wikipedia article and its "Delayed allocation and potential data loss" section. But none of these docs tell me whether my data is guaranteed to be safe when sync exits, and none of them talk about the worst-case scenario when using data=writeback,barrier=0.

But at least it looks like it is guaranteed that after sync, all data has either been written to the file system or is being written out as fast as possible. Also, I discovered that the kernel writes dirty data back automatically, and I can control this with the "sysctl vm.dirty_expire_centisecs=N" command, where N is how long (in centiseconds) dirty data may sit in memory before the kernel writes it out. After changing the setting I can put "vm.dirty_expire_centisecs=N" into /etc/sysctl.d/local.conf so the value I like is set automatically on each boot.
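
For example (1500 here is only an illustrative value):
Code:
# apply immediately; the value is in centiseconds (1500 = 15 seconds)
sysctl -w vm.dirty_expire_centisecs=1500
# make it persistent across reboots
echo 'vm.dirty_expire_centisecs=1500' >> /etc/sysctl.d/local.conf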

Quote:
Originally Posted by H_TeXMeX_H View Post
if you want your data to be 100% safe, I do wonder why you are using data=writeback
Because I want maximum performance. But I also want to understand the risks I'm taking.

So what is the worst-case scenario? By default vm.dirty_expire_centisecs corresponds to 30 s and vm.dirty_writeback_centisecs to 5 s (and that can be overridden with the commit=N mount option), so as far as I understand, with data=writeback,barrier=0 the worst-case scenario is losing 30 s + time_to_finish_sync (if I keep the default values for both settings). So basically I'm risking losing up to 35 seconds of my work if a complete sync takes 5 seconds. Am I right? Or is an even worse scenario possible when using data=writeback,barrier=0?
 
Old 01-11-2012, 10:58 AM   #4
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1301
I think that you are thinking about this in the wrong way.

The worst that can happen is that you forget to fsync or fsync doesn't complete in time, and in this case you will likely get massive and possibly unrecoverable filesystem corruption using those options. Yeah, 35 seconds would be right in your calculation, but you can never know when something will fail. I would not take these kinds of chances, especially with backups. Find other ways to optimize.

https://wiki.archlinux.org/index.php...ata_corruption

I would do other things to improve performance, not this. This is too dangerous. From the benchmarks that I did and that I've seen, I would use ext4 (default, safe settings) or reiserfs and the deadline I/O scheduler.
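
Switching the scheduler is easy to try, for example (sda is just an example device; to make it permanent you would add elevator=deadline to the kernel command line):
Code:
# show the available schedulers; the active one is shown in brackets
cat /sys/block/sda/queue/scheduler
# use deadline for this disk until the next reboot
echo deadline > /sys/block/sda/queue/scheduler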

Some other things that I've messed with that have helped are:

Code:
# increase the read-ahead for sda to 16384 sectors (8 MiB)
blockdev --setra 16384 /dev/sda
# allow up to 512 requests in sda's queue
echo 512 > /sys/block/sda/queue/nr_requests
http://kb.lsi.com/KnowledgebaseArticle11050.aspx
The max_sectors_kb setting mentioned there is specific to RAID 5 and 6, so don't use that one.
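
Neither of these settings survives a reboot, so they have to be re-applied at boot (from an rc script, for example). To check the current values:
Code:
# current read-ahead (in 512-byte sectors) and queue depth
blockdev --getra /dev/sda
cat /sys/block/sda/queue/nr_requests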
 
1 member found this post helpful.
Old 01-11-2012, 02:25 PM   #5
*Dark Dragon*
Member
 
Registered: Apr 2005
Distribution: Debian Testing + KDE
Posts: 53

Original Poster
Rep: Reputation: 17

Quote:
Originally Posted by H_TeXMeX_H View Post
The worst that can happen is that you forget to fsync or fsync doesn't complete in time, and in this case you will likely get massive and possibly unrecoverable filesystem corruption using those options. Yeah, 35 seconds would be right in your calculation, but you can never know when something will fail. I would not take these kinds of chances, especially with backups. Find other ways to optimize.
It looks like you are right. I read somewhere that disabling the barrier improved performance when writing many files (for example when unpacking the Linux kernel source), but it seems this is not the case:
Code:
# mount / -o remount,barrier=0; rm -rf linux-3.1.6
# sync; echo 3 > /proc/sys/vm/drop_caches
# time tar -xjf linux-3.1.6.tar.bz2      
tar -xjf linux-3.1.6.tar.bz2  15.36s user 1.79s system 106% cpu 16.053 total
# mount / -o remount,barrier=1; rm -rf linux-3.1.6                     
# sync; echo 3 > /proc/sys/vm/drop_caches
# time tar -xjf linux-3.1.6.tar.bz2      
tar -xjf linux-3.1.6.tar.bz2  15.36s user 1.82s system 109% cpu 15.747 total
In the results above, please ignore the total time (I deliberately did not measure a sync after tar; it does not matter if you have enough memory). I ran the test above many times and there is no significant difference at all. So disabling the barrier is a pointless risk, then. Perhaps disabling the barrier is useful only when there is very little free memory; I usually have 10-25 GiB of RAM available for disk cache, and this is enough for most activities.
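
(If I wanted to include the flush in the measurement, something like this would do it:)
Code:
# time the unpack including the final flush to disk
sync; echo 3 > /proc/sys/vm/drop_caches
time sh -c 'tar -xjf linux-3.1.6.tar.bz2; sync'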

data=writeback is safer than barrier=0, but after searching for benchmarks I found out that it does not give any performance benefit except in a few very specific cases, and even then the difference is so small that it's not worth the risk. Here is one of the benchmarks I found: https://natzo.com/doku.php?id=catego...nux:filesystem

Quote:
From the benchmarks that I did and that I've seen, I would use ext4 (default, safe settings) or reiserfs and the deadline I/O scheduler.
The deadline scheduler is a bad choice for me: it does not support ionice priorities, and I use ionice a lot, so I'm happy with the default CFQ.
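
For example, under CFQ a heavy job can be kept out of the way like this (the paths are just illustrative):
Code:
# run a big archive job in the idle I/O class so interactive work stays responsive
ionice -c3 tar -cjf /backup/home.tar.bz2 /home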

Quote:
Some other things that I've messed with that have helped are:
Code:
blockdev --setra 16384 /dev/sda
echo 512 > /sys/block/sda/queue/nr_requests
Thank you for the excellent suggestions. These adjustments made a noticeable difference in many of my tasks (up to 3%-8%). I have not done a lot of benchmarking yet, but it looks like only blockdev --setra is useful in my case.

I managed to improve performance even further with the following settings (use sysctl to apply them without a reboot):
Quote:
# cat /etc/sysctl.d/local.conf
# vm.dirty.* are explained in http://www.westnet.com/~gsmith/conte...ux-pdflush.htm
vm.dirty_writeback_centisecs=60000
vm.dirty_expire_centisecs=120000
vm.dirty_ratio=60
vm.dirty_background_ratio=80
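To apply the file without a reboot:
Code:
# load the values from the file immediately (they are also read at boot)
sysctl -p /etc/sysctl.d/local.conf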
Yes, the writeback and expire values are intentionally that large, and I understand that I can lose up to 20 minutes of work, but in my case this is noticeably better than, for example, 6000 and 12000, because I often do a massive amount of writes (many GiB in total size) and then overwrite them again and again. Large simulations of custom network-like systems are one example where I see a big difference; also, I often save my work while I'm actively doing something, for example when using Maya: saving large files and frequently overwriting them then happens in RAM, which is much faster (and keeps my work safe in case the program crashes). Losing up to 20 minutes of work is an acceptable risk to me, especially considering that so far my current workstation has never crashed, so I do not expect accidents/crashes more often than once every 2-5 years.

Warning to other people: do not blindly copy my adjustments. Benchmark your typical activities with the default settings, then try changing something, then benchmark again, and so on. If you decide to increase dirty_writeback_centisecs and dirty_expire_centisecs, try to use the smallest values that work for you (but not smaller than the defaults). For example, if I increase my already large expire and writeback values 10 times, I do not see any noticeable improvement. But if I decrease them 10 times, I see performance degradation and too much load on my disk during certain activities. So I stick with the values I posted. For another person the optimal values will be different.

Last edited by *Dark Dragon*; 01-23-2012 at 10:30 AM.
 
  

