LinuxQuestions.org > Forums > Linux Forums > Linux - Software
02-07-2023, 04:16 PM #1
WizadNoNext
Member
Registered: Nov 2009
Posts: 140
Reputation: 9
RAID6 on top of LVM failed with md/raid:mdX: not enough operational devices (3/8 failed)


Hello.

I had a flaky power connection on my RAID6 array of 8 SAS HDDs.
This caused 3 LVs to fail within an hour.
All the data is still on the drives; I just cannot get the array working with either LVM commands or dmsetup commands.
I did manage to assemble RAID6 by hand with both LVM and dmsetup commands.
This is actually a thin-provisioned LV: I can assemble the metadata, but not the data.

I even tried to stop the LVs using
Code:
mdadm --stop
but it just complained that it is not an md device (go figure: it very much looks like md on top of LVM).

I am quite desperate and am thinking of doing the recovery manually; I just have no idea how to deal with the RAID layer. Dealing with the thin-provisioned LV would be quite simple: convert the XML metadata dump with awk and run a dd script.
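The awk-plus-dd idea for the thin layer can be sketched in Python instead of awk. This is a minimal sketch, assuming the XML comes from `thin_dump` (thin-provisioning-tools) and uses its usual elements: a `superblock` root whose `data_block_size` is in 512-byte sectors, `device` elements keyed by `dev_id`, and `range_mapping`/`single_mapping` children. The function `dd_commands` and all paths are hypothetical; it only builds `dd` command lines for review and runs nothing.

```python
import xml.etree.ElementTree as ET

SECTOR = 512  # thin_dump reports data_block_size in 512-byte sectors (assumption)


def dd_commands(xml_text, tdata_path, out_path, dev_id):
    """Turn the mappings of one thin device in a thin_dump XML dump into dd commands.

    tdata_path: the assembled pool data device (hypothetical path)
    out_path:   image file that receives the reconstructed thin volume
    """
    root = ET.fromstring(xml_text)  # the <superblock> element is the root
    bs = int(root.get("data_block_size")) * SECTOR
    dev = root.find(f"device[@dev_id='{dev_id}']")
    if dev is None:
        raise ValueError(f"no thin device with dev_id={dev_id}")
    cmds = []
    for m in dev:
        if m.tag == "range_mapping":  # a run of consecutive blocks
            src, dst, n = m.get("data_begin"), m.get("origin_begin"), m.get("length")
        elif m.tag == "single_mapping":  # exactly one block
            src, dst, n = m.get("data_block"), m.get("origin_block"), "1"
        else:
            continue
        # skip= is the offset in the pool data device, seek= the offset in the thin volume
        cmds.append(f"dd if={tdata_path} of={out_path} bs={bs} "
                    f"skip={src} seek={dst} count={n} conv=notrunc")
    return cmds
```

When the output is a regular file, regions that no mapping covers stay as holes and read back as zeros, which is also what an unprovisioned thin block returns.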

I know it should be, by all means, possible to repair this within minutes, not hours or days.

I just haven't found the way yet.

Any suggestions would help.
 
02-07-2023, 07:51 PM #2
computersavvy
Senior Member
Registered: Aug 2016
Posts: 3,345
Reputation: 1484
You probably have LVM on top of RAID.
The RAID array failed because it lost disks.
An 8-disk RAID 6 array should be able to lose 2 devices and still function in a degraded state. Thus, before you can access the data, you need to use mdadm to recover the array to a functional state.
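The arithmetic behind that is simple: RAID6 stores two independent parity blocks per stripe, so any stripe can be rebuilt with up to two members missing, and an N-member set needs at least N - 2 working members. A tiny illustrative helper (the function names are made up for this example):

```python
def raid6_min_members(total: int) -> int:
    """Smallest number of working members an N-member RAID6 set can run with."""
    return total - 2


def raid6_state(total: int, failed: int) -> str:
    """Classify a RAID6 set by how many of its members have failed."""
    if not 0 <= failed <= total:
        raise ValueError("failed must be between 0 and total")
    if failed == 0:
        return "optimal"
    if failed <= 2:
        return "degraded"  # still readable; with 2 failures no redundancy is left
    return "failed"  # e.g. the kernel's "Cannot continue operation (3/8 failed)"
```

For the 8-member array in this thread, `raid6_min_members(8)` is 6, and three failed members put it past the limit.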

Use the mdadm man page to work out how to recover the array. It will need 6 or more functional devices before it can become active.
"cat /proc/mdstat" should tell you the status of the array and of each member device. You can then use mdadm to recover the devices that are not working at present.

However, you stated that you have a power problem. It would seem critical to fix the cause of this failure before attempting to recover the array. Failing to fix the cause first may backfire and cause more problems with the array.
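About the `/proc/mdstat` suggestion: for arrays listed there, the `[total/working]` counter and the `[UUU_]` string on each array's status line give the per-member state (`U` = up, `_` = missing). A minimal parsing sketch, assuming that usual layout; `parse_mdstat` is a hypothetical helper. One hedged caveat: RAID LVs created by LVM are driven through the device-mapper `dm-raid` target, and as far as I know they may not be listed in `/proc/mdstat` at all, in which case `dmsetup status` is where their health shows up.

```python
import re

ARRAY_RE = re.compile(r"^(md\w+) : ")                   # "md0 : active raid6 ..."
HEALTH_RE = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")  # "[8/6] [UUUUUU__]"


def parse_mdstat(text: str) -> dict:
    """Map each md array name to its member counts and the indices of missing members."""
    arrays, current = {}, None
    for line in text.splitlines():
        m = ARRAY_RE.match(line)
        if m:
            current = m.group(1)  # the health line follows the array's header line
            continue
        h = HEALTH_RE.search(line)
        if current and h:
            flags = h.group(3)
            arrays[current] = {
                "total": int(h.group(1)),
                "working": int(h.group(2)),
                "missing_slots": [i for i, c in enumerate(flags) if c == "_"],
            }
            current = None
    return arrays
```

On a live system it would be called as `parse_mdstat(open("/proc/mdstat").read())`.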
 
02-07-2023, 11:49 PM #3
WizadNoNext
Member
Registered: Nov 2009
Posts: 140
Original Poster
Reputation: 9
The power problem is fixed.
I haven't tried mdadm yet.
I'll give it a go this evening (UK time) and see.
Hopefully I can assemble it in read-only mode (I want to avoid writes to it; the drives are read-write).
 
02-08-2023, 03:14 AM #4
syg00
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,126
Reputation: 4120
Quote:
Originally Posted by WizadNoNext
I know it should be, by all means, possible to repair this within minutes, not hours or days.
One has to admire your optimism.
Very likely to be misplaced.

You have to get the underlying architecture sorted first (LVM on mdadm, or mdadm on LVM?), then deal with the thin provisioning, which appears to be particularly fragile in error situations.
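One way to settle which layer sits on which is to walk the `slaves` directories that sysfs exposes for each block device (`lsblk` shows similar information as a tree). A minimal sketch, assuming the usual `/sys/class/block/<dev>/slaves` and `/sys/class/block/<dev>/dm/name` layout; `device_stack` is a hypothetical helper. Roughly: if an `md` device appears underneath the LVs, it is LVM on mdadm; if only `dm-*` sub-LVs sit on the disks, the RAID is LVM's own.

```python
import os


def device_stack(dev, sys_block="/sys/class/block", depth=0, lines=None):
    """Return indented lines describing dev and, recursively, the devices it is built on."""
    if lines is None:
        lines = []
    label = dev
    name_file = os.path.join(sys_block, dev, "dm", "name")
    if os.path.exists(name_file):  # device-mapper devices expose their mapped name here
        with open(name_file) as f:
            label = f"{dev} ({f.read().strip()})"
    lines.append("  " * depth + label)
    slaves = os.path.join(sys_block, dev, "slaves")
    if os.path.isdir(slaves):  # each entry is a lower-level device this one sits on
        for lower in sorted(os.listdir(slaves)):
            device_stack(lower, sys_block, depth + 1, lines)
    return lines
```

Usage would be along the lines of `print("\n".join(device_stack("dm-130")))` for the top-level device of the LV.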

Good luck.
 
02-08-2023, 01:30 PM #5
WizadNoNext
Member
Registered: Nov 2009
Posts: 140
Original Poster
Reputation: 9
This is pure LVM.
No MD or DM array was created explicitly; all commands were LVM tools commands.

There was a minimal amount of writes to this array.
This is a Nextcloud instance with just 2 users.
This happened between 22:00 and 23:00, so little, if any, data was written.
I saw 8 failed writes to the filesystem.
That is not much; I can live with losing 8 writes.

The HDDs are perfectly fine. Nothing is wrong with them.
I am going to duplicate them, but I don't have another 8 SAS HDDs, and I don't trust SATA HDDs in RAID setups.
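Duplicating the members before experimenting is the safe order. GNU ddrescue is the usual tool for that; the core idea (copy in chunks, zero-fill and record anything unreadable so later data keeps its offset, much like `dd conv=noerror,sync`) can be sketched in Python. `image_device` is a hypothetical helper, the 1 MiB chunk size is an arbitrary choice, and the source is only ever opened read-only.

```python
import os


def image_device(src, dst, chunk=1 << 20):
    """Copy src to dst chunk by chunk; zero-fill unreadable chunks and return their offsets."""
    bad = []
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        size = fin.seek(0, os.SEEK_END)  # works for regular files and block devices
        offset = 0
        while offset < size:
            n = min(chunk, size - offset)
            fin.seek(offset)
            try:
                data = fin.read(n)
            except OSError:  # e.g. EIO from an unreadable sector
                data = b""
            if len(data) < n:
                bad.append(offset)
                data += bytes(n - len(data))  # pad so the next chunk lands at the right offset
            fout.write(data)
            offset += n
    return bad
```

The returned list of bad offsets tells you which regions of the copy are filler rather than real data.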
 
02-08-2023, 02:12 PM #6
WizadNoNext
Member
Registered: Nov 2009
Posts: 140
Original Poster
Reputation: 9
Can I get some more information about the failure?
Some form of kernel debugging?
Does anybody know the relevant command or kernel command-line option?
The documentation is absolutely massive. I read it at least twice, years ago, and it was there; now I cannot find the file in the kernel source's documentation.
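Two hedged pointers: in recent kernel trees the dm-raid target is documented in `Documentation/admin-guide/device-mapper/dm-raid.rst`, and extra kernel messages can often be switched on at runtime through dynamic debug (`Documentation/admin-guide/dynamic-debug-howto.rst`, which also describes a boot-time `dyndbg=` parameter). This needs `CONFIG_DYNAMIC_DEBUG`, a mounted debugfs and root. The sketch below only writes `module <name> +p` queries to the control file, the same as `echo 'module dm_raid +p' > /sys/kernel/debug/dynamic_debug/control`; the module names `dm_raid` and `raid456` are assumptions about how your kernel is built, and how much extra they actually print varies by version.

```python
CONTROL = "/sys/kernel/debug/dynamic_debug/control"  # default debugfs location


def dyndbg_queries(modules, enable=True):
    """Build one dynamic-debug query per module; '+p' enables printing, '-p' disables it."""
    flag = "+p" if enable else "-p"
    return "".join(f"module {m} {flag}\n" for m in modules)


def set_dyndbg(modules, enable=True, control=CONTROL):
    """Write all queries in a single write; the control file accepts newline-separated queries."""
    with open(control, "w") as f:
        f.write(dyndbg_queries(modules, enable))
```

For example, `set_dyndbg(["dm_raid", "raid456"])`, then repeat the activation attempt and watch `dmesg` or `journalctl -k`.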
 
02-08-2023, 03:39 PM #7
WizadNoNext
Member
Registered: Nov 2009
Posts: 140
Original Poster
Reputation: 9
I am trying to assemble (or create) this existing RAID6 array using dmsetup.
I cannot see any error in the table syntax,
but I always get
Code:
raid: Cannot understand number of raid devices parameters (-EINVAL)
in the kernel log.
I get this even if I create another linear dm device on top of the LVM RAID6 component devices.
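For reference, the kernel's dm-raid documentation gives the table as `<start> <length> raid <raid_type> <#raid_params> <raid_params> <#raid_devs> <metadata_dev0> <dev0> [.. <metadata_devN> <devN>]`. Two counts are easy to get wrong: `<#raid_params>` counts every word of the parameter list (the mandatory chunk size plus each optional key and each value separately), and `<#raid_devs>` counts metadata/data pairs, not individual devices. If `<#raid_params>` is off, the parser reads `<#raid_devs>` from the wrong word, which is one plausible (unconfirmed) way to hit that error. A minimal sketch that derives both counts; `raid_table` is hypothetical, `length` must be the array size in 512-byte sectors (not computed here), and `-` marks an absent device.

```python
def raid_table(length, raid_type, chunk_sectors, pairs, extra_params=()):
    """Build a dm-raid table line with both counts derived from the lists.

    pairs: (metadata_dev, data_dev) tuples in array order; use "-" for an absent device.
    extra_params: optional words after the chunk size, e.g. ("region_size", "1024").
    """
    params = [str(chunk_sectors), *map(str, extra_params)]
    devices = [dev for pair in pairs for dev in pair]
    return " ".join(
        ["0", str(length), "raid", raid_type,
         str(len(params)), *params,   # <#raid_params> counts every word, keys and values alike
         str(len(pairs)), *devices]   # <#raid_devs> counts pairs, not devices
    )
```

The resulting line can be given to `dmsetup create <name> --table "..."` after double-checking `length` and the device order against `lvs -a -o +devices`.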
I know for a fact that the problem is with rimage_4, rimage_5 and rimage_6.
All of them are readable,
and I can duplicate them onto other drives with no problem.
It seems that the kernel is simply refusing to use them in the RAID6 set for some reason. Maybe they are marked busy or dirty?
How can I check that and clear it?
I cannot find a way.
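As far as I know, the per-member state the kernel holds for an LVM RAID LV can be read with `dmsetup status <vg>-<lv>`: per the dm-raid documentation the status line is `<start> <length> raid <raid_type> <#devices> <health_chars> ...`, where each health character is `A` (alive and in-sync), `a` (alive but not in-sync) or `D` (dead/failed). `lvs -a -o +lv_health_status` gives a higher-level view. A minimal sketch of decoding that field; `raid_health` is a hypothetical helper.

```python
HEALTH = {"A": "alive, in-sync", "a": "alive, not in-sync", "D": "dead/failed"}


def raid_health(status_line):
    """Decode the health field of one dm-raid `dmsetup status` line into per-slot states."""
    fields = status_line.split()
    if fields and fields[0].endswith(":"):  # `dmsetup status` without a name prefixes "name:"
        fields = fields[1:]
    if len(fields) < 6 or fields[2] != "raid":
        raise ValueError("not a dm-raid status line")
    raid_type, ndev, chars = fields[3], int(fields[4]), fields[5]
    if len(chars) != ndev:
        raise ValueError("health string length does not match device count")
    return raid_type, {slot: HEALTH.get(c, f"unknown ({c})") for slot, c in enumerate(chars)}
```

Slots reported as `dead/failed` are the members the kernel refuses to use, whatever state the underlying sub-LVs are in.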
 
02-09-2023, 02:15 PM #8
WizadNoNext
Member
Registered: Nov 2009
Posts: 140
Original Poster
Reputation: 9
I am still investigating.
I got this log (just an excerpt) from
Code:
lvchange ThinVG/ThinLV
Code:
19:49:02.991204 lvchange[28034] device_mapper/libdm-deptree.c:1341  Resuming ThinVG-ThinLV_tmeta_rmeta_0 (254:98).
19:49:02.991228 lvchange[28034] device_mapper/libdm-common.c:2552  Udev cookie 0xd4dee3b (semid 65551) created
19:49:02.991238 lvchange[28034] device_mapper/libdm-common.c:2572  Udev cookie 0xd4dee3b (semid 65551) incremented to 1
19:49:02.991249 lvchange[28034] device_mapper/libdm-common.c:2444  Udev cookie 0xd4dee3b (semid 65551) incremented to 2
Code:
19:49:02.991617 lvchange[28034] device_mapper/libdm-deptree.c:1341  Resuming ThinVG-ThinLV_tmeta_rimage_0 (254:99).
19:49:02.991627 lvchange[28034] device_mapper/libdm-common.c:2444  Udev cookie 0xd4dee3b (semid 65551) incremented to 3
Code:
19:49:02.992012 lvchange[28034] device_mapper/libdm-deptree.c:1341  Resuming ThinVG-ThinLV_tmeta_rmeta_1 (254:100).
19:49:02.992023 lvchange[28034] device_mapper/libdm-common.c:2444  Udev cookie 0xd4dee3b (semid 65551) incremented to 4
Code:
19:49:02.992729 lvchange[28034] device_mapper/libdm-deptree.c:1341  Resuming ThinVG-ThinLV_tmeta_rimage_1 (254:101).
19:49:02.992770 lvchange[28034] device_mapper/libdm-common.c:2444  Udev cookie 0xd4dee3b (semid 65551) incremented to 5
Code:
19:49:02.993402 lvchange[28034] device_mapper/libdm-deptree.c:1341  Resuming ThinVG-ThinLV_tmeta_rmeta_2 (254:102).
19:49:02.993414 lvchange[28034] device_mapper/libdm-common.c:2444  Udev cookie 0xd4dee3b (semid 65551) incremented to 6
Code:
19:49:02.993841 lvchange[28034] device_mapper/libdm-deptree.c:1341  Resuming ThinVG-ThinLV_tmeta_rimage_2 (254:103).
19:49:02.993853 lvchange[28034] device_mapper/libdm-common.c:2444  Udev cookie 0xd4dee3b (semid 65551) incremented to 7
Code:
19:49:02.994265 lvchange[28034] device_mapper/libdm-deptree.c:1341  Resuming ThinVG-ThinLV_tmeta_rmeta_3 (254:104).
19:49:02.994274 lvchange[28034] device_mapper/libdm-common.c:2444  Udev cookie 0xd4dee3b (semid 65551) incremented to 8
Code:
19:49:03.3062 lvchange[28034] device_mapper/libdm-deptree.c:1341  Resuming ThinVG-ThinLV_tmeta_rimage_3 (254:105).
19:49:03.3082 lvchange[28034] device_mapper/libdm-common.c:2444  Udev cookie 0xd4dee3b (semid 65551) incremented to 9
Code:
19:49:03.4863 lvchange[28034] device_mapper/libdm-deptree.c:1341  Resuming ThinVG-ThinLV_tmeta_rmeta_4 (254:106).
19:49:03.4893 lvchange[28034] device_mapper/libdm-common.c:2444  Udev cookie 0xd4dee3b (semid 65551) incremented to 10
Code:
19:49:03.9737 lvchange[28034] device_mapper/libdm-deptree.c:1341  Resuming ThinVG-ThinLV_tmeta_rimage_4 (254:107).
19:49:03.9752 lvchange[28034] device_mapper/libdm-common.c:2444  Udev cookie 0xd4dee3b (semid 65551) incremented to 8
Code:
19:49:03.14662 lvchange[28034] device_mapper/libdm-deptree.c:1341  Resuming ThinVG-ThinLV_tmeta_rmeta_5 (254:108).
19:49:03.14678 lvchange[28034] device_mapper/libdm-common.c:2444  Udev cookie 0xd4dee3b (semid 65551) incremented to 7
Code:
19:49:03.15395 lvchange[28034] device_mapper/libdm-deptree.c:1341  Resuming ThinVG-ThinLV_tmeta_rimage_5 (254:109).
19:49:03.15417 lvchange[28034] device_mapper/libdm-common.c:2444  Udev cookie 0xd4dee3b (semid 65551) incremented to 8
Code:
19:49:03.18786 lvchange[28034] device_mapper/libdm-deptree.c:1341  Resuming ThinVG-ThinLV_tmeta_rmeta_6 (254:110).
19:49:03.18807 lvchange[28034] device_mapper/libdm-common.c:2444  Udev cookie 0xd4dee3b (semid 65551) incremented to 7
Code:
19:49:03.22841 lvchange[28034] device_mapper/libdm-deptree.c:1341  Resuming ThinVG-ThinLV_tmeta_rimage_6 (254:111).
19:49:03.22864 lvchange[28034] device_mapper/libdm-common.c:2444  Udev cookie 0xd4dee3b (semid 65551) incremented to 5
Code:
19:49:03.25771 lvchange[28034] device_mapper/libdm-deptree.c:1341  Resuming ThinVG-ThinLV_tmeta_rmeta_7 (254:112).
19:49:03.25781 lvchange[28034] device_mapper/libdm-common.c:2444  Udev cookie 0xd4dee3b (semid 65551) incremented to 6
Code:
19:49:03.26224 lvchange[28034] device_mapper/libdm-deptree.c:1341  Resuming ThinVG-ThinLV_tmeta_rimage_7 (254:113).
19:49:03.26233 lvchange[28034] device_mapper/libdm-common.c:2444  Udev cookie 0xd4dee3b (semid 65551) incremented to 7
Code:
19:49:03.67351 lvchange[28034] device_mapper/libdm-deptree.c:1341  Resuming ThinVG-ThinLV_tmeta (254:114).
19:49:03.67397 lvchange[28034] device_mapper/libdm-common.c:2444  Udev cookie 0xd4dee3b (semid 65551) incremented to 2
Code:
19:49:03.83138 lvchange[28034] device_mapper/libdm-deptree.c:1341  Resuming ThinVG-ThinLV_tdata_rmeta_0 (254:115).
19:49:03.83185 lvchange[28034] device_mapper/libdm-common.c:2444  Udev cookie 0xd4dee3b (semid 65551) incremented to 3
Code:
19:49:03.84717 lvchange[28034] device_mapper/libdm-deptree.c:1341  Resuming ThinVG-ThinLV_tdata_rimage_0 (254:116).
19:49:03.84780 lvchange[28034] device_mapper/libdm-common.c:2444  Udev cookie 0xd4dee3b (semid 65551) incremented to 4
Code:
19:49:03.86509 lvchange[28034] device_mapper/libdm-deptree.c:1341  Resuming ThinVG-ThinLV_tdata_rmeta_1 (254:117).
19:49:03.86565 lvchange[28034] device_mapper/libdm-common.c:2444  Udev cookie 0xd4dee3b (semid 65551) incremented to 5
Code:
19:49:03.88255 lvchange[28034] device_mapper/libdm-deptree.c:1341  Resuming ThinVG-ThinLV_tdata_rimage_1 (254:118).
19:49:03.88268 lvchange[28034] device_mapper/libdm-common.c:2444  Udev cookie 0xd4dee3b (semid 65551) incremented to 6
Code:
19:49:03.88858 lvchange[28034] device_mapper/libdm-deptree.c:1341  Resuming ThinVG-ThinLV_tdata_rmeta_2 (254:119).
19:49:03.88870 lvchange[28034] device_mapper/libdm-common.c:2444  Udev cookie 0xd4dee3b (semid 65551) incremented to 7
Code:
19:49:03.90173 lvchange[28034] device_mapper/libdm-deptree.c:1341  Resuming ThinVG-ThinLV_tdata_rimage_2 (254:120).
19:49:03.90184 lvchange[28034] device_mapper/libdm-common.c:2444  Udev cookie 0xd4dee3b (semid 65551) incremented to 8
Code:
19:49:03.91857 lvchange[28034] device_mapper/libdm-deptree.c:1341  Resuming ThinVG-ThinLV_tdata_rmeta_3 (254:121).
19:49:03.91870 lvchange[28034] device_mapper/libdm-common.c:2444  Udev cookie 0xd4dee3b (semid 65551) incremented to 9
Code:
19:49:03.94323 lvchange[28034] device_mapper/libdm-deptree.c:1341  Resuming ThinVG-ThinLV_tdata_rimage_3 (254:122).
19:49:03.94343 lvchange[28034] device_mapper/libdm-common.c:2444  Udev cookie 0xd4dee3b (semid 65551) incremented to 10
Code:
19:49:03.94846 lvchange[28034] device_mapper/libdm-deptree.c:1341  Resuming ThinVG-ThinLV_tdata_rmeta_4 (254:123).
19:49:03.94858 lvchange[28034] device_mapper/libdm-common.c:2444  Udev cookie 0xd4dee3b (semid 65551) incremented to 11
Code:
19:49:03.96758 lvchange[28034] device_mapper/libdm-deptree.c:1341  Resuming ThinVG-ThinLV_tdata_rimage_4 (254:124).
19:49:03.96772 lvchange[28034] device_mapper/libdm-common.c:2444  Udev cookie 0xd4dee3b (semid 65551) incremented to 11
Code:
19:49:03.97519 lvchange[28034] device_mapper/libdm-deptree.c:1341  Resuming ThinVG-ThinLV_tdata_rmeta_5 (254:125).
19:49:03.97539 lvchange[28034] device_mapper/libdm-common.c:2444  Udev cookie 0xd4dee3b (semid 65551) incremented to 12
Code:
19:49:03.100877 lvchange[28034] device_mapper/libdm-deptree.c:1341  Resuming ThinVG-ThinLV_tdata_rimage_5 (254:126).
19:49:03.100891 lvchange[28034] device_mapper/libdm-common.c:2444  Udev cookie 0xd4dee3b (semid 65551) incremented to 11
Code:
19:49:03.103441 lvchange[28034] device_mapper/libdm-deptree.c:1341  Resuming ThinVG-ThinLV_tdata_rmeta_6 (254:127).
19:49:03.103457 lvchange[28034] device_mapper/libdm-common.c:2444  Udev cookie 0xd4dee3b (semid 65551) incremented to 10
Code:
19:49:03.105216 lvchange[28034] device_mapper/libdm-deptree.c:1341  Resuming ThinVG-ThinLV_tdata_rimage_6 (254:128).
19:49:03.105230 lvchange[28034] device_mapper/libdm-common.c:2444  Udev cookie 0xd4dee3b (semid 65551) incremented to 11
Code:
19:49:03.114615 lvchange[28034] device_mapper/libdm-deptree.c:1341  Resuming ThinVG-ThinLV_tdata_rmeta_7 (254:129).
19:49:03.114637 lvchange[28034] device_mapper/libdm-common.c:2444  Udev cookie 0xd4dee3b (semid 65551) incremented to 10
Code:
19:49:03.122410 lvchange[28034] device_mapper/libdm-deptree.c:1341  Resuming ThinVG-ThinLV_tdata_rimage_7 (254:130).
19:49:03.122499 lvchange[28034] device_mapper/libdm-common.c:2444  Udev cookie 0xd4dee3b (semid 65551) incremented to 7
The "increments" are inconsistent: the counter goes up by one, stays the same, or drops by one, two, three, or even five.

rimage_4, rimage_5 and rimage_6 are not loading.
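The falling values are probably not a fault in themselves. As far as I understand libdevmapper's udev synchronisation, the cookie is a SysV semaphore that the tool increments for every udev event it waits on, while udev rules decrement it (via `dmsetup udevcomplete`) as each event finishes, so the logged number roughly counts still-outstanding events rather than a running total. A small sketch that pairs each `Resuming` line with the last cookie value logged after it makes the pattern easier to read; `cookie_values` is a hypothetical helper and the regexes assume exactly the line format in the log above.

```python
import re

RESUME_RE = re.compile(r"Resuming (\S+) \(\d+:\d+\)")
COOKIE_RE = re.compile(r"Udev cookie \S+ \(semid \d+\) incremented to (\d+)")


def cookie_values(log_text):
    """Return (device, last cookie value logged after its resume, change since previous device)."""
    rows = []
    for line in log_text.splitlines():
        r = RESUME_RE.search(line)
        if r:
            rows.append([r.group(1), None])
            continue
        c = COOKIE_RE.search(line)
        if c and rows:
            rows[-1][1] = int(c.group(1))  # keep the latest value seen for this device
    out, prev = [], None
    for dev, value in rows:
        delta = None if prev is None or value is None else value - prev
        out.append((dev, value, delta))
        if value is not None:
            prev = value
    return out
```

Fed the whole `lvchange` debug output rather than an excerpt, the deltas show how far udev was behind at each resume.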
 
02-09-2023, 02:40 PM #9
WizadNoNext
Member
Registered: Nov 2009
Posts: 140
Original Poster
Reputation: 9
I have found this
Code:
__le64 failed_devices;
in
Code:
drivers/md/dm-raid.c
Maybe I just need to clear these flags?
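If that is the route, it helps to see where the field sits. A minimal, heavily hedged sketch: it assumes the `struct dm_raid_superblock` layout as I read it in `drivers/md/dm-raid.c` (little-endian `magic`, `compat_features`, `num_devices`, `array_position` as `__le32`, then `events` and `failed_devices` as `__le64`, putting `failed_devices` at byte offset 24), a magic value of `0x64526D44` ("DmRd"), and that bit N stands for member slot N; check all of this against your kernel's source. I believe each `rmeta` sub-LV carries its own copy of this superblock at its start. The helpers are hypothetical, handle only the first 64 slots (newer metadata keeps further bits in an extended field), and only transform bytes in memory; any write-back should go to copies of the metadata devices first.

```python
import struct

DM_RAID_MAGIC = 0x64526D44  # "DmRd" as a little-endian u32 (assumed, per dm-raid.c)
# magic, compat_features, num_devices, array_position, events, failed_devices (assumed layout)
HEADER = struct.Struct("<IIIIQQ")
FAILED_DEVICES_OFFSET = 24  # 4 x __le32 + 1 x __le64 precede failed_devices


def read_sb(buf):
    """Decode the leading fields of a dm-raid superblock from raw bytes."""
    magic, _compat, num_devices, position, events, failed = HEADER.unpack_from(buf, 0)
    if magic != DM_RAID_MAGIC:
        raise ValueError("no dm-raid superblock magic at offset 0")
    return {"num_devices": num_devices, "array_position": position,
            "events": events, "failed_devices": failed}


def clear_failed_bits(buf, slots):
    """Return a copy of buf with the given member slots cleared in failed_devices."""
    read_sb(buf)  # refuse to touch anything that does not look like a dm-raid superblock
    out = bytearray(buf)
    (failed,) = struct.unpack_from("<Q", out, FAILED_DEVICES_OFFSET)
    for slot in slots:
        if not 0 <= slot < 64:
            raise ValueError("only slots 0-63 live in failed_devices")
        failed &= ~(1 << slot)
    struct.pack_into("<Q", out, FAILED_DEVICES_OFFSET, failed)
    return bytes(out)
```

Everything outside the 8 bytes of `failed_devices` is copied through unchanged, including the `events` counter.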
 
02-09-2023, 03:16 PM #10
WizadNoNext
Member
Registered: Nov 2009
Posts: 140
Original Poster
Reputation: 9
In case it helps at all,
I dug this out of the kernel log:
Code:
Feb 04 22:45:01 Bedroom kernel: md/raid:mdX: Disk failure on dm-110, disabling device.
Feb 04 22:45:01 Bedroom kernel: md/raid:mdX: Operation continuing on 7 devices.
Code:
Feb 04 23:19:41 Bedroom kernel: md: super_written gets error=-5
Feb 04 23:19:41 Bedroom kernel: md/raid:mdX: Disk failure on dm-108, disabling device.
Feb 04 23:19:41 Bedroom kernel: md/raid:mdX: Operation continuing on 6 devices.
Feb 04 23:19:41 Bedroom lvm[694]: WARNING: Device #4 of raid6_zr array, ThinVG-ThinLV_tdata, has failed.
Code:
Feb 04 23:59:04 Bedroom kernel: md: super_written gets error=-5
Feb 04 23:59:04 Bedroom kernel: md/raid:mdX: Disk failure on dm-112, disabling device.
Feb 04 23:59:04 Bedroom kernel: md/raid:mdX: Cannot continue operation (3/8 failed).
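The `error=-5` in those lines is `-EIO`, a plain I/O error, which fits the power problem. To line the failures up, a small sketch that pulls the timestamps and `dm-N` names out of such journal lines and can look up a `dm-N` name in sysfs. Both helpers are hypothetical, the regex assumes the journal format shown above, and since `dm-N` numbers are assigned at activation time, a lookup done after a reboot or re-activation may not point at the same sub-LV as at the time of the failure.

```python
import os
import re

FAIL_RE = re.compile(
    r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2}) .*md/raid:([^:\s]+): Disk failure on (\S+), disabling device"
)


def failure_timeline(journal_text):
    """Return (timestamp, md name, failed dm device) for each disk-failure line."""
    out = []
    for line in journal_text.splitlines():
        m = FAIL_RE.search(line)
        if m:
            out.append(m.groups())
    return out


def dm_name(dev, sys_block="/sys/class/block"):
    """Current device-mapper name of e.g. 'dm-110', or None if no such device exists now."""
    path = os.path.join(sys_block, dev, "dm", "name")
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return f.read().strip()
```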
 
02-09-2023, 05:21 PM #11
WizadNoNext
Member
Registered: Nov 2009
Posts: 140
Original Poster
Reputation: 9
I have recovered the array.
It was a 5-minute fix:
I just needed to clear the failed bits in the metadata.

Please don't assume you know everything.
I assumed this would be the case, but that was an educated guess.

I lost a bit of data, not much, but just enough that MariaDB would not start.
 
  


All times are GMT -5.