Unmounting USB disk after successful Rsync led to data loss

oneleaf · 10-03-2021, 12:46 PM

Hello,

I had a strange issue happen yesterday that I am trying to figure out what happened.

I am on Linux Mint Mate 19.1, and I ran a regular rsync script. Here's what happened:

At 1:13:00, Rsync finished successfully
At 1:13:10, I unmounted the hard disk from the file manager, and it gave no error.
I then proceeded to disconnect the USB hard disk.
24 hours later, I ran the same rsync script and was surprised to see a file copied again that already successfully copied the day before.
Since I use hard-links for versioning, I was able to go back and see that in the prior day, the file in question was only partially there (instead of 5.3GB file, it was only 4.0GB).
However, the Rsync log indicated a successful transfer the day before, so I found this puzzling.

So I went into /var/log/syslog.1 (which had the events from the initial rsync job). I was surprised to find the following:

As expected, at 1:13:10, it does show the attempted unmounting of the disk.
However, at 1:13:19, it is littered with I/O errors on the disk.

The output of the pertinent section of the log is below. Any ideas what happened? Is there still some disk activity that occurs after the rsync job is done? And is it possible the file manager falsely indicates a successful unmounting while this is still happening? Thanks for any insight!

Code:

Oct  2 01:13:10 mintmate systemd[1]: Stopping Clean the /media/yudy/MovieBackup mount point...
Oct  2 01:13:10 mintmate systemd[1]: Stopped Clean the /media/yudy/MovieBackup mount point.
Oct  2 01:13:13 mintmate dhclient[2782]: XMT: Solicit on enp0s31f6, interval 123960ms.
Oct  2 01:13:19 mintmate kernel: [1431446.560051] usb 1-9: USB disconnect, device number 12
Oct  2 01:13:19 mintmate kernel: [1431446.576008] sd 6:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Oct  2 01:13:19 mintmate kernel: [1431446.576010] sd 6:0:0:0: [sdd] tag#0 CDB: Write(10) 2a 00 0d e6 8f 80 00 00 f0 00
Oct  2 01:13:19 mintmate kernel: [1431446.576011] print_req_error: I/O error, dev sdd, sector 233213824
Oct  2 01:13:19 mintmate kernel: [1431446.576072] EXT4-fs warning (device sdd1): ext4_end_bio:323: I/O error 10 writing to inode 40894678 (offset 4169138176 size 8388608 starting block 29152512)
Oct  2 01:13:19 mintmate kernel: [1431446.576074] buffer_io_error: 8182 callbacks suppressed
Oct  2 01:13:19 mintmate kernel: [1431446.576074] Buffer I/O error on device sdd1, logical block 29151744
Oct  2 01:13:19 mintmate kernel: [1431446.576077] Buffer I/O error on device sdd1, logical block 29151745
Oct  2 01:13:19 mintmate kernel: [1431446.576078] Buffer I/O error on device sdd1, logical block 29151746
Oct  2 01:13:19 mintmate kernel: [1431446.576079] Buffer I/O error on device sdd1, logical block 29151747
Oct  2 01:13:19 mintmate kernel: [1431446.576080] Buffer I/O error on device sdd1, logical block 29151748
Oct  2 01:13:19 mintmate kernel: [1431446.576081] Buffer I/O error on device sdd1, logical block 29151749
Oct  2 01:13:19 mintmate kernel: [1431446.576082] Buffer I/O error on device sdd1, logical block 29151750
Oct  2 01:13:19 mintmate kernel: [1431446.576083] Buffer I/O error on device sdd1, logical block 29151751
Oct  2 01:13:19 mintmate kernel: [1431446.576084] Buffer I/O error on device sdd1, logical block 29151752
Oct  2 01:13:19 mintmate kernel: [1431446.576085] Buffer I/O error on device sdd1, logical block 29151753
Oct  2 01:13:19 mintmate kernel: [1431446.576290] EXT4-fs warning (device sdd1): ext4_end_bio:323: I/O error 10 writing to inode 40894678 (offset 4169138176 size 8388608 starting block 29153024)
Oct  2 01:13:19 mintmate kernel: [1431446.576515] EXT4-fs warning (device sdd1): ext4_end_bio:323: I/O error 10 writing to inode 40894678 (offset 4169138176 size 8388608 starting block 29153536)
Oct  2 01:13:19 mintmate kernel: [1431446.576704] EXT4-fs warning (device sdd1): ext4_end_bio:323: I/O error 10 writing to inode 40894678 (offset 4177526784 size 8388608 starting block 29154048)
Oct  2 01:13:19 mintmate kernel: [1431446.576871] EXT4-fs warning (device sdd1): ext4_end_bio:323: I/O error 10 writing to inode 40894678 (offset 4177526784 size 8388608 starting block 29154560)
Oct  2 01:13:19 mintmate kernel: [1431446.577039] EXT4-fs warning (device sdd1): ext4_end_bio:323: I/O error 10 writing to inode 40894678 (offset 4177526784 size 8388608 starting block 29155072)
Oct  2 01:13:19 mintmate kernel: [1431446.577216] EXT4-fs warning (device sdd1): ext4_end_bio:323: I/O error 10 writing to inode 40894678 (offset 4177526784 size 8388608 starting block 29155584)
Oct  2 01:13:19 mintmate kernel: [1431446.577374] sd 6:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Oct  2 01:13:19 mintmate kernel: [1431446.577375] sd 6:0:0:0: [sdd] tag#0 CDB: Write(10) 2a 00 0d e6 90 70 00 00 f0 00
Oct  2 01:13:19 mintmate kernel: [1431446.577375] print_req_error: I/O error, dev sdd, sector 233214064
Oct  2 01:13:19 mintmate kernel: [1431446.577377] EXT4-fs warning (device sdd1): ext4_end_bio:323: I/O error 10 writing to inode 40894678 (offset 4169138176 size 8388608 starting block 29152000)
Oct  2 01:13:19 mintmate kernel: [1431446.577673] EXT4-fs warning (device sdd1): ext4_end_bio:323: I/O error 10 writing to inode 40894678 (offset 4185915392 size 4198400 starting block 29156096)
Oct  2 01:13:19 mintmate kernel: [1431446.578112] EXT4-fs warning (device sdd1): ext4_end_bio:323: I/O error 10 writing to inode 40894678 (offset 4185915392 size 6295552 starting block 29156608)
Oct  2 01:13:19 mintmate kernel: [1431446.579199] JBD2: Detected IO errors while flushing file data on sdd1-8
Oct  2 01:13:19 mintmate kernel: [1431446.579202] Aborting journal on device sdd1-8.
Oct  2 01:13:19 mintmate kernel: [1431446.579210] JBD2: Error -5 detected when updating journal superblock for sdd1-8.
Oct  2 01:13:19 mintmate kernel: [1431446.580009] EXT4-fs error (device sdd1) in ext4_reserve_inode_write:5875: Journal has aborted
Oct  2 01:13:19 mintmate kernel: [1431446.580122] EXT4-fs error (device sdd1): mpage_map_and_submit_extent:2594: comm kworker/u8:2: Failed to mark inode 40894678 dirty
Oct  2 01:13:19 mintmate kernel: [1431446.580124] EXT4-fs (sdd1): previous I/O error to superblock detected
Oct  2 01:13:19 mintmate kernel: [1431446.580130] EXT4-fs error (device sdd1) in ext4_writepages:2923: Journal has aborted
Oct  2 01:13:19 mintmate kernel: [1431446.580132] EXT4-fs (sdd1): previous I/O error to superblock detected
Oct  2 01:13:19 mintmate kernel: [1431446.580374] JBD2: Detected IO errors while flushing file data on sdd1-8
Oct  2 01:13:19 mintmate kernel: [1431446.583085] EXT4-fs (sdd1): previous I/O error to superblock detected
Oct  2 01:13:19 mintmate kernel: [1431446.583098] EXT4-fs error (device sdd1): ext4_journal_check_start:61: Detected aborted journal
Oct  2 01:13:19 mintmate kernel: [1431446.583099] EXT4-fs (sdd1): Remounting filesystem read-only
Oct  2 01:13:19 mintmate kernel: [1431446.583100] EXT4-fs (sdd1): previous I/O error to superblock detected
Oct  2 01:13:19 mintmate kernel: [1431446.583107] EXT4-fs (sdd1): ext4_writepages: jbd2_start: 9223372036854775807 pages, ino 40894678; err -30
Oct  2 01:13:20 mintmate udisksd[959]: Unmounted /dev/sdd1 on behalf of uid 1000
Oct  2 01:13:20 mintmate upowerd[2378]: unhandled action 'unbind' on /sys/devices/pci0000:00/0000:00:14.0/usb1/1-9/1-9:1.0
Oct  2 01:13:20 mintmate upowerd[2378]: unhandled action 'unbind' on /sys/devices/pci0000:00/0000:00:14.0/usb1/1-9

rknichols · 10-03-2021, 01:16 PM

What file manager?

After rsync completes there can be a lot of data still in kernel buffers needing to be flushed out to the device. I know that the Nautilus file manager will pop up a window warning that the data transfer is in progress. Does yours not do that?

oneleaf · 10-03-2021, 01:23 PM

Quote:

Originally Posted by rknichols

What file manager?

After rsync completes there can be a lot of data still in kernel buffers needing to be flushed out to the device. I know that the Nautilus file manager will pop up a window warning that the data transfer is in progress. Does yours not do that?

It's Caja (which is the default file manager for Mate). I've had pop-ups indicating that data transfer is still in progress, but it didn't happen this time, which I found puzzling.

rknichols · 10-03-2021, 01:53 PM

By any chance are you doing this in a virtual machine? I just tried it that way (CentOS 8 VM with Mate desktop), and the VM considers the transfer complete while the host is still flushing buffers. I don't know of any simple fix for that.

If you are running on the "bare metal", this sounds like a bug in Caja. That notification can be important, since it can take anywhere from a few seconds to several minutes to flush those buffers. The time depends on the amount of memory available for buffers and the speed of the target device (think: 10GB of dirty buffers and a 25MB/s slow USB flash drive -- do the math).

oneleaf · 10-03-2021, 02:15 PM

I'm running it on bare metal. It sure does seem like a Caja bug. I was quite disturbed by the behavior, because it called into question the reliability of rsync (i.e. did it have a silent error), or possible hardware problems (i.e. was it file corruption on the USB hard disk). But looking at the log, it does seem like Rsync did its job and that the hard drive was still doing what it was supposed to do, when Caja mistakenly made me feel like I could unplug the device.

I don't know enough about rsync and the EXT4 filesystem to know, but I am surprised that rsync shows a successful transfer when the data is still vulnerable to a situation like this though.

Thanks for the insight, I'm glad to have a better understanding of where the problem resides.

rknichols · 10-03-2021, 10:33 PM

Allowing the kernel to decide when, and in what order, to complete I/O requests is important to the performance of the I/O subsystem. And, this is independent of the type of filesystem. There are a few commands (dd is the only one that comes to mind) that issue an fsync() after a transfer and do not exit until that returns. Commands like rsync that write to multiple files really can't do that. An fsync() after each file would seriouly hurt performance, and a global sync() at the end would affect the whole system, not just the files written by that process.

There is an interesting story about a clever programmer who thought he was protecting himself against data loss by patching an editor to issue a sync() call after every line. This was on a system with a large number of users. Whenever he was using that program, performance of the whole system would grind to a near halt.

dugan · 10-04-2021, 12:49 AM

I've definitely seen this. This is the only interesting thing I got when I googled for it:

https://www.reddit.com/r/linuxquesti...ovable_device/

My guess (I have definitely not confirmed this) is that when you click Eject in the graphical file manager, it returns control to the user immediately, it runs the sync operation in the background, and you're expected to wait for a "It is now safe to eject this device" message to pop up.

SlowCoder · 10-04-2021, 01:35 PM

Sucks to potentially lose gigs of data.

I built an rsync backup script. At the end of the script, I run the sync command to force all cached data to be written. Only after sync has returned do I consider the backup to be complete, and I can unmount my device.

Quote:

Originally Posted by dugan

... and you're expected to wait for a "It is now safe to eject this device" message to pop up.

Not global. I use Cinnamon desktop, and when I unmount via the GUI, the icon disappears. There is no message.

dugan · 10-04-2021, 02:25 PM

Quote:

Originally Posted by SlowCoder

I use Cinnamon desktop, and when I unmount via the GUI, the icon disappears. There is no message.

I think you mean there is no immediate message.

Are you sure you won't get one later?

ondoho · 10-05-2021, 12:57 AM

If this is a single user system I'd still recommend to simply add a

Code:

sync

to the end of the script, and also unmount it via command line after that.

If that doesn't help/solve the mystery, I'd say you have a different problem. To me, your log messages don't look like the sort of error you see when reconnecting a previously yanked out peripheral. Maybe the drive itself is on its last leg?

SlowCoder · 10-07-2021, 12:11 PM

Quote:

Originally Posted by dugan

I think you mean there is no immediate message.

Are you sure you won't get one later?

No, there is no message.

sundialsvcs · 10-12-2021, 06:55 PM

My guess is that you have a hard-drive or an interface that is dying. I don't know what "error code 10" is, but I'd find out right away and I'll say that it looks bad. I suspect that the actual problem isn't that buffered data failed to write, but that "some of the writes failed, period." I don't know why you didn't get a failure message at the time.

allend · 10-12-2021, 08:20 PM

Quote:

Oct 2 01:13:19 mintmate kernel: [1431446.579199] JBD2: Detected IO errors while flushing file data on sdd1-8
Oct 2 01:13:19 mintmate kernel: [1431446.579202] Aborting journal on device sdd1-8.
Oct 2 01:13:19 mintmate kernel: [1431446.579210] JBD2: Error -5 detected when updating journal superblock for sdd1-8.

Seems clear that the USB drive was unplugged without being unmounted first.

GazL · 10-14-2021, 04:45 AM

In the old days usb sticks all had activity lights on then to give you a clue they were still in use. I guess manufacturers wanted to save the cost of a LED as most modern ones seem to have done away with them.

'udisks --detach' on the cli has never let me down (yes, its the older udisks version but I prefer it to udiskctl). I wouldn't trust a GUI for this sort of operation.

sundialsvcs · 10-14-2021, 09:16 AM

I would consider it a very serious bug if a GUI "dismount" command returned to you before it did a sync to ensure that all of the data was actually written to the disk and the volume had actually been dismounted by the OS. When the "wait cursor" disappears and the icon vanishes from the desktop, you have every reason to assume that the process is actually finished.