My raid arrays stopped assembling. I have a nested configuration of:
/dev/sda1 + /dev/sdb1 in raid0 > /dev/md132
/dev/sde1 + /dev/sdc1 in raid0 > /dev/md131
/dev/md132 + /dev/md131 + /dev/sdd1 in raid5 > /dev/md128.
Both raid0 arrays refuse to assemble, and (I assume as a consequence) /dev/md128 also fails.
The arrays worked fine for weeks before the current issue. I am on Fedora 30; yesterday I did a dnf update and then a manual shutdown (with the shutdown command).
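For reference, the nested setup was created with commands along these lines (reconstructed; not the exact original commands):
Code:
mdadm --create /dev/md132 --level=raid0 --raid-devices=2 /dev/sda1 /dev/sdb1
mdadm --create /dev/md131 --level=raid0 --raid-devices=2 /dev/sde1 /dev/sdc1
mdadm --create /dev/md128 --level=raid5 --raid-devices=3 /dev/md132 /dev/md131 /dev/sdd1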
Some mdadm diagnostic output:
Code:
[root@piglet ~]# mdadm --detail /dev/md132
/dev/md132:
Version : 1.2
Creation Time : Tue Sep 10 18:13:21 2019
Raid Level : raid0
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Tue Sep 10 18:13:21 2019
State : active, FAILED, Not Started
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Chunk Size : 512K
Consistency Policy : unknown
Name : piglet:132 (local to host piglet)
UUID : 360771e4:018d3478:81883fa3:a6b5f578
Events : 0
Number Major Minor RaidDevice State
- 0 0 0 removed
- 0 0 1 removed
- 8 1 0 sync /dev/sda1
- 8 17 1 sync /dev/sdb1
Code:
mdadm --detail /dev/md131
/dev/md131:
Version : 1.2
Creation Time : Wed Jun 7 19:38:24 2017
Raid Level : raid0
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Wed Jun 7 19:38:24 2017
State : active, FAILED, Not Started
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Chunk Size : 512K
Consistency Policy : unknown
Name : piglet:131 (local to host piglet)
UUID : 1dea08ea:326b7b82:1430d1bc:1c2fac1c
Events : 0
Number Major Minor RaidDevice State
- 0 0 0 removed
- 0 0 1 removed
- 8 65 0 sync /dev/sde1
- 8 33 1 sync /dev/sdc1
Code:
[root@piglet ~]# mdadm --detail /dev/md128
/dev/md128:
Version : 1.2
Raid Level : raid0
Total Devices : 1
Persistence : Superblock is persistent
State : inactive
Working Devices : 1
Name : piglet:128 (local to host piglet)
UUID : 1a4e32eb:1cd1a4bf:122e69cc:c9e996c9
Events : 68089
Number Major Minor RaidDevice
- 8 49 - /dev/sdd1
Code:
[root@piglet ~]# mdadm --examine /dev/sda1
/dev/sda1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 360771e4:018d3478:81883fa3:a6b5f578
Name : piglet:132 (local to host piglet)
Creation Time : Tue Sep 10 18:13:21 2019
Raid Level : raid0
Raid Devices : 2
Avail Dev Size : 3906762752 (1862.89 GiB 2000.26 GB)
Data Offset : 264192 sectors
Super Offset : 8 sectors
Unused Space : before=264112 sectors, after=0 sectors
State : clean
Device UUID : 01d01cd7:06904ff9:313e8481:579b89e1
Update Time : Tue Sep 10 18:13:21 2019
Bad Block Log : 512 entries available at offset 8 sectors
Checksum : 96fb86ce - correct
Events : 0
Chunk Size : 512K
Device Role : Active device 0
Array State : AA ('A' == active, '.' == missing, 'R' == replacing)
Code:
[root@piglet ~]# mdadm --examine /dev/sdb1
/dev/sdb1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 360771e4:018d3478:81883fa3:a6b5f578
Name : piglet:132 (local to host piglet)
Creation Time : Tue Sep 10 18:13:21 2019
Raid Level : raid0
Raid Devices : 2
Avail Dev Size : 1953257472 (931.39 GiB 1000.07 GB)
Data Offset : 264192 sectors
Super Offset : 8 sectors
Unused Space : before=264112 sectors, after=0 sectors
State : clean
Device UUID : 330150e2:948d0372:5211b844:1b9a5da0
Update Time : Tue Sep 10 18:13:21 2019
Bad Block Log : 512 entries available at offset 8 sectors
Checksum : 287a6773 - correct
Events : 0
Chunk Size : 512K
Device Role : Active device 1
Array State : AA ('A' == active, '.' == missing, 'R' == replacing)
I am skipping the other outputs of mdadm --examine /dev/sd*1, because I think the problems have a common root, and this information should be enough.
Stopping and re-assembling a raid0 array gives an error (output reproduced from memory):
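Code:
[root@piglet ~]# mdadm --stop /dev/md132
mdadm: stopped /dev/md132
[root@piglet ~]# mdadm --assemble /dev/md132
mdadm: failed to RUN_ARRAY /dev/md132: Unknown error 524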
Since both raid0 arrays broke at the same time, I would guess the cause is a software problem (a driver?), the update itself, or the manual shutdown.
Any help on how to get the arrays running again, or how to diagnose this further, is very much appreciated.
I found the source of the problem, but still have questions.
Code:
dmesg -T |grep raid0
[Sat Oct 19 10:03:27 2019] md/raid0:md131: cannot assemble multi-zone RAID0 with default_layout setting
[Sat Oct 19 10:03:27 2019] md/raid0: please set raid.default_layout to 1 or 2
[Sat Oct 19 10:03:27 2019] md/raid0:md132: cannot assemble multi-zone RAID0 with default_layout setting
[Sat Oct 19 10:03:27 2019] md/raid0: please set raid.default_layout to 1 or 2
This means the cause is indeed the kernel patch (link in previous post) that refuses to assemble the array when the layout it should use is not explicitly set. Apparently it is technically not possible to automatically detect the layout (for disassembled arrays?), but it is beyond my understanding why the update cannot detect the layout on systems where the array was still working (i.e. set the parameter before applying the patch). When working, the driver is using a specific layout, right? And the kernel version is also known. This way of patching will make, or already has made, many systems unbootable.
To make things worse, the documentation of how and where to set the layout is confusing:
- First, contrary to the error message, the kernel parameter should be raid0.default_layout (note the 0), not raid.default_layout. I expect this will be corrected in a future update.
- Second, the description of the patch mentions values of 0 and 1, but the possible parameter values have since been changed to 0, 1, and 2, where 0 means 'not set' (so the array will not assemble); the meaning of 1 and 2 is not properly documented.
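For the record, the parameter can be set in two places: at runtime via sysfs (this is what I use in the tests below), or persistently on the kernel command line. The grub2 steps below are how I would make it permanent on Fedora; adjust the config path to your setup:
Code:
# runtime, until the next reboot:
echo 2 > /sys/module/raid0/parameters/default_layout

# permanent: add raid0.default_layout=2 to GRUB_CMDLINE_LINUX
# in /etc/default/grub, then regenerate the grub config:
grub2-mkconfig -o /boot/grub2/grub.cfg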
I also read a suggestion that it should be possible to set the parameter to 1 or 2 (at random), assemble the array with --readonly, mount it, and see whether all files are there (use at your own risk). However, I cannot do that, because md131 and md132 are parts of a raid5 array, and I am not sure how I could test them individually.
My plan is to set the parameter and assemble all devices in read-only mode first. Hopefully that provides enough safety in case the wrong value is used.
The plan:
1. Set the parameter to 2 (as that is more likely to be correct)
6. Mount /dev/md128 (also read-only) and see whether the files look normal. It's a 6TB device, so a full check is not feasible.
7. Manually initiate a raid-check on /dev/md128. Is that possible for read-only devices? I only need a confirmation that all raid5 devices are in sync. If they are not in sync, I don't want it to be corrected (changed); instead I should probably assemble the raid0 devices with parameter 1.
8. If any of steps 2-7 fails, maybe try parameter 1 (depending on the nature of the failure).
9. If everything went fine, make the parameter setting permanent. Document the correct parameter value, because if the system crashes and the arrays are moved to a new system, the question will come back. (And in my opinion the layout should be stored in the metadata of the array.)
Before I execute the plan (a rough command sketch follows below), I would highly appreciate an expert opinion on whether this would be a safe and working plan.
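The sketch (untested; the --readonly flag is per the mdadm man page, and /mnt/test is just an example mount point):
Code:
echo 2 > /sys/module/raid0/parameters/default_layout
mdadm --assemble --readonly /dev/md131
mdadm --assemble --readonly /dev/md132
mdadm --assemble --readonly /dev/md128
mount -o ro /dev/md128 /mnt/test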
[caveat]not an expert, but I feel your pain[/caveat]
However I'm not short of opinions, so here's a few:
- kudos for your persistence
- that's amongst the wackiest RAID constructions I could imagine
- RAID0 is just asking for trouble
- data not backed up is by definition not worth the trouble
- whoever let this patch through needs their arse kicked forever
- few (none ?) of us will be able to offer relevant advice without direct experience.
- have you tried booting from one of the prior kernels Fedora maintains by default
If it were me, I'd take an image of all the partitions and do all your fiddling from another system using the images. I keep old systems around for just this sort of circumstance - it doesn't need to be latest-and-greatest. Even a liveCD would probably work, but you'd need to do the dnf update.
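For the imaging itself, something as simple as this per member partition would do, assuming the other system has the space (sdX1 is a placeholder):
Code:
dd if=/dev/sdX1 bs=64K status=progress | gzip -c > sdX1.img.gz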
Thanks. Actually, I am writing here in this much detail because I expect that many people will experience the same problem and will find that the information currently available is lacking. Hopefully more helpful information will accumulate here, eventually.
Quote:
- that's amongst the wackiest RAID constructions I could imagine
- RAID0 is just asking for trouble
- data not backed up is by definition not worth the trouble
Those are different discussions. For me this setup provides a meaningful trade-off between disk capacity and risk. The raid array in question is an intermediate backup step between files in daily use and an off-line backup.
Quote:
- whoever let this patch through needs their arse kicked forever
Yeah ... I think it was a mistake with quite some impact. At the same time, these are probably the same people who normally keep our systems working properly.
Quote:
- few (none ?) of us will be able to offer relevant advice without direct experience.
The main questions are:
- Whether the --readonly option provides enough protection to prevent ruining the array if the wrong parameter is chosen.
- How the resulting array can be verified (through a sync/scrub) in read-only mode.
I would expect that people who know more about the internals of mdadm can say something about this, even without having tested it explicitly in relation to this patch.
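For the scrub part, the usual interface is the md sync_action file in sysfs; whether a check can be started on a read-only array is exactly what I don't know:
Code:
echo check > /sys/block/md128/md/sync_action
cat /sys/block/md128/md/mismatch_cnt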
Quote:
- have you tried booting from one of the prior kernels Fedora maintains by default
I haven't. I could, and it would probably work, but then what? It worked with the kernel before my last update, so I know that already; I just don't know how to benefit from that knowledge.
Quote:
If it were me, I'd take an image of all the partitions and do all your fiddling from another system using the images. I keep old systems around for just this sort of circumstance - it doesn't need to be latest-and-greatest. Even a liveCD would probably work, but you'd need to do the dnf update.
I can't take an image, because the disks are too large, and I simply don't have a spare set of disks of this size lying around (array size is 6TB).
Though, I could test by creating another raid0 array with raid0.default_layout=1, stopping the array, changing the parameter to 0 and trying to assemble it (which should fail), then changing the parameter to 2, assembling it, and testing it. Then change the parameter back to 0, then to 1, and test again. I have a VM for this kind of test. I will need a bit of time for that. I will report back. Thanks for the idea.
My purpose was to verify that setting the raid0.default_layout parameter to the wrong value and assembling the array with the --readonly option does no harm to the array (so it can be re-assembled with the correct setting later), and that assembling with the wrong value leads to some sort of detectable failure. However, the array seems to work even with the wrong parameter.
This is what I did:
1. Create two partitions with *different* sizes: /dev/sdb1 (200MiB), /dev/sdb2 (400MiB)
2. Create array
Code:
mdadm --create /dev/md0 --level=raid0 --raid-devices=2 /dev/sdb1 /dev/sdb2
mdadm: Defaulting to version 1.2 metadata
mdadm: RUN_ARRAY failed: Unknown error 524
Okay, so I probably need to set the kernel parameter first. The error message could have been more descriptive.
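3.-5. Set the parameter to 1 and re-create the array (the same mdadm --create command as above, which now succeeds):
Code:
echo 1 > /sys/module/raid0/parameters/default_layout
mdadm --create /dev/md0 --level=raid0 --raid-devices=2 /dev/sdb1 /dev/sdb2
Then make a filesystem:
Code: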
mkfs.ext4 /dev/md0
mke2fs 1.44.6 (5-Mar-2019)
Creating filesystem with 523264 4k blocks and 130816 inodes
Filesystem UUID: fc54fa2f-c9cb-4467-b3a0-08c60be6ae5d
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912
Allocating group tables: done
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done
6. Mount device on a dir named test:
Code:
mount /dev/md0 test
7. Generate random files to fill up the whole device. A copy of the files is stored on another device in the folder test_copy and will be used to check the array after re-assembling.
Code:
for i in {1..600}; do dd if=/dev/urandom bs=1M count=1 of=test/file$i; done
...
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0145325 s, 72.2 MB/s
...
dd: failed to open 'test/file558': No space left on device
cp -rv test/* test_copy
8. Check for difference, just to be sure that the initial state is correct.
Code:
diff -rq test test_copy
(no output means they are identical)
Okay, now let's try to ruin the array by setting the parameter to something different from 1 (the value set before the array was created).
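9. Unmount and stop the array (so it can be re-assembled):
Code:
umount test
mdadm --stop /dev/md0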
10. Unset default_layout parameter and try to assemble
Code:
echo 0 > /sys/module/raid0/parameters/default_layout
mdadm --assemble --readonly /dev/md0
mdadm: failed to RUN_ARRAY /dev/md0: Unknown error 524
dmesg -T |grep raid0
[Sat Oct 19 22:27:28 2019] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting
[Sat Oct 19 22:27:28 2019] md/raid0: please set raid.default_layout to 1 or 2
So, this correctly refuses to assemble, because the default_layout has not been specified.
11. Now set it to 2.
Code:
echo 2 > /sys/module/raid0/parameters/default_layout
mdadm --assemble --readonly /dev/md0
mdadm: /dev/md0 has been started with 2 drives.
It assembles,
12. but does it work?
Code:
mount /dev/md0 test
mount: /mnt/test: WARNING: device write-protected, mounted read-only.
diff -r test test_copy
No difference found, so the files are identical. This confuses me, because I thought a wrong parameter would ruin the array. I also tried the other way around: setting the parameter to 2 before creation, stopping the array, setting the parameter to 1, and assembling. That also seemed to result in a working array, which is equally unexpected.
I don't know what is going on. Or is my testing procedure wrong? Maybe my way of creating the raid0 array is not complicated enough? I also did a similar test with three devices of different sizes (100M, 200M, 300M), and that led to the same result.
I thought the patch makes systems unbootable because manual intervention and a sysadmin's wisdom are needed to select the one right value for the parameter (and a wrong value may ruin the data forever), and now it looks like the value doesn't matter?
The link says the situation can lead to corruption, not that it (always) will.
If this is, as you say, merely transient data, erase the array and rebuild it from scratch on the latest kernel. Get on with life. Just hope they don't revert the change at some future time.
Yes, you (we) will remain in ignorance, but is it really worth the angst to find out?
I re-assembled the array with the default_layout parameter set to 2. Comparing with a recent backup showed that the data is fine.
Just re-creating the raid0 array without understanding the meaning of the parameters makes no sense. A choice has to be made between 1 and 2, otherwise the array will not even be created. You'd better know why you chose the red or the blue pill.
Please note that on the kernel raid mailing list there are messages asking for better documentation. I am marking this issue as solved, because I don't expect more info here.
Ran into the same issue after upgrading to Fedora 31 and having a disk crash just afterwards.
Tried to recreate a raid0 array with a new drive as part of the array and boom, same issue as BearTom. Just wanted to thank BearTom for his research in this matter.
Set my raid0.default_layout to 2, rebuilt my grub2 config, and away she went.