
LinuxQuestions.org (/questions/)
-   Linux - Server (https://www.linuxquestions.org/questions/linux-server-73/)
-   -   Clone multiply-claimed blocks (https://www.linuxquestions.org/questions/linux-server-73/clone-multiply-claimed-blocks-4175714269/)

mfoley 07-05-2022 10:10 AM

Clone multiply-claimed blocks
 
I ran 'fsck.ext4 -f -y /dev/md0' on my RAID-6 array. Why am I getting "Clone multiply-claimed blocks" errors? This fsck has been running for 14 hours; the last "Clone" error shown, for 2022-01-01-pensionFilesFullBackup.tar.bz2, was displayed 6 hours ago.

Code:

Pass 1D: Reconciling multiply-claimed blocks
(There are 8 inodes containing multiply-claimed blocks.)

File /Backups/MAIL/2021-11-30-MAILfullbackupUSR.tar.bz2 (inode #10657818, mod time Wed Dec
  has 4449 multiply-claimed block(s), shared with 1 file(s):
        ... (inode #437919747, mod time Sat Jul  2 20:10:42 2022)
Clone multiply-claimed blocks? yes

File /Backups/MAIL/2021-11-30-MAILfullbackupSYS.tar.bz2 (inode #10657869, mod time Wed Dec
  has 4252 multiply-claimed block(s), shared with 1 file(s):
        ... (inode #437919747, mod time Sat Jul  2 20:10:42 2022)
Clone multiply-claimed blocks? yes


File /Backups/SQLServerBackup/Quarterly/master/master_backup_20220701201002.bak (inode #1070:03 2022)
  has 996 multiply-claimed block(s), shared with 1 file(s):
        /Backups/public/2022-01-01-publicFullBackup.tar.bz2 (inode #407691271, mod time Sun
Clone multiply-claimed blocks? yes

File /Backups/PensionFiles/2022-01-01-pensionFilesFullBackup.tar.bz2 (inode #12787733, mod
  has 9980 multiply-claimed block(s), shared with 1 file(s):
        ... (inode #437919747, mod time Sat Jul  2 20:10:42 2022)
Clone multiply-claimed blocks? yes


mfoley 07-06-2022 03:01 PM

No one has anything on this, eh? The fsck on the RAID has been running for 37 hours so far. Is this normal?

I'm beginning to wonder if a 4-drive RAID-6 is doing what was intended. A RAID may guard against hardware failure in one of its members, but corruption in the filesystem apparently negates the RAID benefit completely. I'm thinking about converting these 4 drives into two RAID-1s: one the main online device, the other the target of a periodic rsync that clones the production drive. That way, if the production drive's filesystem gets corrupted, the mirror could be used. This would be a heck of a lot quicker than fsck'ing for 4 or more days, with possible data loss as well. I could have rebuilt this RAID from scratch in less time!
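The periodic clone step could be a simple rsync run from cron; a minimal sketch, where the mount points and the helper name are my assumptions, not anything from this setup:

```shell
# Minimal sketch: clone the production RAID-1 onto the standby RAID-1.
# -aH preserves permissions, ownership, times, and hard links;
# --delete keeps the standby an exact clone of the source.
clone_mirror() {
    rsync -aH --delete "$1"/ "$2"/
}

# e.g. from cron (hypothetical mount points):
#   clone_mirror /mnt/raid1-main /mnt/raid1-clone
```

Note the trailing slashes: they copy the *contents* of the source directory rather than the directory itself.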

rknichols 07-06-2022 10:05 PM

RAID has absolutely nothing to do with filesystem corruption or any other cause of data loss or corruption originating from higher up in the software/firmware stack. RAID levels higher than RAID-0 protect against ONE cause of data loss (disk failure). If the OS writes bad metadata to the filesystem, all any level of RAID can do is faithfully record that.

As for the time taken by the fsck, how big is that filesystem? Judging by those large inode numbers, I'd guess it's pretty big.

syg00 07-06-2022 10:45 PM

A single backup is subject to the same frailties as the source. Multiple copies have always been the answer.

As for fsck - it is designed to ensure the integrity of the filesystem, not specifically the files within. If you have multiply-linked blocks in a tar backup and you don't know which file wrote to those blocks last, the tar is too suspect to be any use. Scrub the lot and restore what you have.

mfoley 07-07-2022 01:27 AM

rknichols and syg00: Yes, it is dawning on me that RAID is not a silver bullet against all types of failure, as I concluded in my post #2. I did have multiple backups of the production data, which I restored to a different drive, and the office is happily using that w/o problem. I am also backing that up every 20 minutes to both local and offsite storage.

The other main thing this RAID is used for is storing backups going back many years according to retention policy, the most important of which are also on other backups (done quarterly and stored on external USBs in a fireproof safe). So far, as shown by my initial post, the files with multiply-claimed blocks are temporary backups kept for no more than a year. fsck has found only 7 such files after 47 hours of running, but more may crop up as it progresses.

Here's another thought: since a RAID-6 can physically lose two of the four members and supposedly not lose data, would pulling two of the drives make fsck go faster?

While I may in the end "Scrub the lot and restore what you have," per syg00's suggestion, my current plan is to let it keep grinding through the weekend and see if it completes. Since the affected files are tarfiles, I should be able to 'tar -tv' them to see whether they are OK; if not, I can delete them. My hope is that most of the other tar and zip files on the drive are OK, and I can verify all of them the same way.
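That verification pass could be scripted as a sketch like this, assuming bzip2 tarballs under /Backups (the helper name is mine):

```shell
# Sketch: list the contents of every .tar.bz2 under a directory.
# A listing that fails flags a damaged archive.
check_tars() {
    find "$1" -name '*.tar.bz2' -print0 |
    while IFS= read -r -d '' f; do
        if tar -tjf "$f" > /dev/null 2>&1; then
            echo "OK:  $f"
        else
            echo "BAD: $f"
        fi
    done
}

# e.g.  check_tars /Backups > tar-report.txt
```

Only archives flagged BAD would need to be deleted or restored from the quarterly externals.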

Then, I'll see about implementing my own suggestion of breaking these up into two sets of RAID-1s and rsync'ing one to the other -- being first sure to do a check on each RAID to make sure the filesystem is w/o errors before copying.

I'll go ahead and leave this thread open for a while and post my progress. If anyone has a better idea than my two-RAID-mirror idea, please speak up!

rknichols 07-07-2022 08:13 AM

Quote:

Originally Posted by mfoley (Post 6366163)
Here's another thought: since a RAID-6 can physically lose two of the four members and supposedly not lose data, would pulling two of the drives make fsck go faster?

No. fsck just checks the one virtual device presented by the RAID driver and is ignorant of the RAID structure, and the RAID driver will satisfy each read from one device. fsck wouldn't even detect a mismatch between the mirrored devices unless it happened to get data from the device that was wrong. That type of error is detected by scrubbing the array, not by fsck.
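With Linux md, such a scrub is triggered through sysfs; a sketch, assuming an array named md0:

```shell
# Sketch: start a "check" scrub, which makes the md driver read and
# compare every stripe's copies/parity. mismatch_cnt is nonzero after
# the scrub completes if any stripe's copies disagreed.
scrub_array() {
    md="/sys/block/$1/md"
    if [ -d "$md" ]; then
        echo check > "$md/sync_action"
        cat /proc/mdstat        # shows scrub progress
        cat "$md/mismatch_cnt"
    else
        echo "no array named $1 on this host"
    fi
}

# e.g.  scrub_array md0   (needs root)
```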

mfoley 07-08-2022 03:25 PM

The fsck finally finished at 3:00am yesterday, so about 48 hours to run. Only 4 files were affected by the "multiply-claimed blocks" error. I've removed those. They were temporary, short-term backup files, so no big deal. I ran fsck again, just to be sure. I've done 'unzip -t' and am now running 'tar -tf' on all the remaining backup files on that drive to be as sure as possible that everything else is OK.

When that's done, I intend to convert the RAID-6 to RAID-5 (note that I've changed my mind about having two RAID-1s). Here's what I propose, and here's where I could use some expert LQ feedback:
Code:

mdadm --fail /dev/md0 /dev/sdd1
mdadm --remove /dev/md0 /dev/sdd1
mdadm --grow /dev/md0 --level=raid5 --raid-devices=3 --backup-file=/root/mdadm-backupfile

I have 4 hot-swap bays with four 4TB drives.

I'm planning on removing sdd1 for two reasons:

1) to ensure that mdadm builds the RAID-5 with sda1, sdb1, and sdc1. The --grow examples I've found do not specify the physical drives, so I don't know whether mdadm will pick 3 of the 4 in alphabetical order or at random.

2) Freeing up the 4th bay will let me put an 8TB drive there, and I'll do an rsync backup of the RAID-6 to that drive before converting to RAID-5.

Does this seem reasonable?
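One way to take the guesswork out of which drives mdadm keeps is to list the array's members before failing anything; a sketch (the helper name is mine):

```shell
# Sketch: print an array's member devices, or say why we can't.
# mdadm --detail needs root and an existing array.
list_members() {
    if ! command -v mdadm >/dev/null 2>&1; then
        echo "mdadm not installed"
    elif ! mdadm --detail "$1" >/dev/null 2>&1; then
        echo "cannot read $1"
    else
        # member device paths appear in the last column of --detail output
        mdadm --detail "$1" | awk '$NF ~ /^\/dev\// {print $NF}'
    fi
}

# e.g.  list_members /dev/md0
```

Running this before and after the --fail/--remove steps would confirm that exactly sda1, sdb1, and sdc1 remain.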

1 week later ...

The grow finally finished after 6 days! I now have an 8TB RAID-5 with 3 disks. The 4th bay holds an 8TB normal drive, and I back up the RAID to it twice daily. I think I've finally got "belt and suspenders." The last thing I needed to do was convert the RAID filesystem to ext4; for whatever reason it was ext2. I did the following:
Code:

umount /mnt/md0
tune2fs -O has_journal,dir_index,filetype,extent,flex_bg,sparse_super,large_file,uninit_bg,dir_nlink,extra_isize /dev/md0
fsck.ext4 -f /dev/md0

Everything works. I hope this info proves useful to someone.
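For anyone wanting to rehearse the same conversion first, the identical tune2fs feature list can be tried on a throwaway file-backed image with nothing at risk; a sketch, assuming e2fsprogs is installed:

```shell
# Sketch: build a small ext2 image, add the ext4 features from the post
# above, then fsck it -- the same steps as on /dev/md0, but on a temp file.
demo_upgrade() {
    command -v mke2fs >/dev/null 2>&1 || { echo "e2fsprogs not available"; return 0; }
    img=$(mktemp)
    dd if=/dev/zero of="$img" bs=1M count=16 status=none
    mke2fs -q -F -t ext2 "$img"
    tune2fs -O has_journal,dir_index,filetype,extent,flex_bg,sparse_super,large_file,uninit_bg,dir_nlink,extra_isize "$img" >/dev/null 2>&1
    fsck.ext4 -f -y "$img" >/dev/null 2>&1   # required after changing features
    dumpe2fs -h "$img" 2>/dev/null | grep -i 'filesystem features'
    rm -f "$img"
}
```

The dumpe2fs line shows the resulting feature set, which is a quick way to confirm the filesystem is now ext4-capable before doing it for real on the array.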

