Linux - Server: This forum is for the discussion of Linux software used in a server-related context.
I have a VG called BU with 5 LVs and >20 PVs.
I once tried to start up the VG while a group of 13 PVs was not yet online, and vgchange failed, as could be expected. After bringing the group of 13 PVs online, the vgchange -ay BU command still failed, suggesting the use of --activationmode partial.
I then ran
vgchange -ay --activationmode partial BU
which was successful. All 5 LVs were activated and could be mounted correctly.
While this would be a usable way to start up my VG, it prevents me from making changes or adding a new LV to the VG, as the 13 PVs are still identified as missing (in fact they are not!). Message:
Code:
WARNING: Missing device /dev/sdu1 reappeared, updating metadata for VG BU to version 83.
WARNING: Device /dev/sdu1 still marked missing because of allocated data on it, remove volumes and consider vgreduce --removemissing.
Cannot change VG BU while PVs are missing.
Consider vgreduce --removemissing.
Cannot process volume group BU
I tried vgreduce --removemissing BU to no avail.
I would like to get rid of the error messages and fully use my VG (which is currently working OK as a single activation). Any suggestions would be greatly appreciated.
[root@san ~]# pvs -v --segments
There are 13 physical volumes missing.
PV VG Fmt Attr PSize PFree Start SSize LV Start Type PE Ranges
/dev/sda1 BU lvm2 a-m <1,82t 0 0 476931 AV 668852 linear /dev/sda1:0-476930
/dev/sdc2 clearos lvm2 a-- <36,27g 4,00m 0 512 swap 0 linear /dev/sdc2:0-511
/dev/sdc2 clearos lvm2 a-- <36,27g 4,00m 512 8772 root 0 linear /dev/sdc2:512-9283
/dev/sdc2 clearos lvm2 a-- <36,27g 4,00m 9284 1 0 free
/dev/sdd1 BU lvm2 a-- <931,51g 0 0 238466 DVD 715382 linear /dev/sdd1:0-238465
/dev/sde1 BU lvm2 a-- <1,54t <1,54t 0 402823 0 free
/dev/sdf1 BU lvm2 a-m <1,82t 0 0 476931 AV 191921 linear /dev/sdf1:0-476930
/dev/sdg1 BU lvm2 a-m <1,82t 0 0 476931 Jeugd 0 linear /dev/sdg1:0-476930
/dev/sdi1 BU lvm2 a-m <931,51g 0 0 238466 DVD 476916 linear /dev/sdi1:0-238465
/dev/sdk1 BU lvm2 a-m <931,51g 0 0 238466 DVD 953848 linear /dev/sdk1:0-238465
/dev/sdl1 BU lvm2 a-- 1,36t 1,36t 0 357699 0 free
/dev/sdm1 BU lvm2 a-m <931,51g 0 0 238466 BU 503072 linear /dev/sdm1:0-238465
/dev/sdn1 BU lvm2 a-m <931,51g 0 0 238466 PC 238466 linear /dev/sdn1:0-238465
/dev/sdp1 BU lvm2 a-m <931,51g 0 0 238466 PC 0 linear /dev/sdp1:0-238465
/dev/sdr1 BU lvm2 a-- <465,70g <465,70g 0 119219 0 free
/dev/sds1 BU lvm2 a-m 1,36t 0 0 357683 BU 145389 linear /dev/sds1:0-357682
/dev/sdt1 BU lvm2 a-- <931,45g <931,45g 0 238451 0 free
/dev/sdu1 BU lvm2 a-m <1,82t 0 0 476916 DVD 0 linear /dev/sdu1:0-476915
/dev/sdv1 BU lvm2 a-m 1,36t 0 0 164937 AV 1145783 linear /dev/sdv1:0-164936
/dev/sdv1 BU lvm2 a-m 1,36t 0 164937 47357 Jeugd 476931 linear /dev/sdv1:164937-212293
/dev/sdv1 BU lvm2 a-m 1,36t 0 212294 145389 BU 0 linear /dev/sdv1:212294-357682
/dev/sdw1 BU lvm2 a-m 1,36t 0 0 189443 AV 2478 linear /dev/sdw1:0-189442
/dev/sdw1 BU lvm2 a-m 1,36t 0 189443 118406 DVD 1192314 linear /dev/sdw1:189443-307848
/dev/sdw1 BU lvm2 a-m 1,36t 0 307849 47356 PC 476932 linear /dev/sdw1:307849-355204
/dev/sdw1 BU lvm2 a-m 1,36t 0 355205 2478 AV 0 linear /dev/sdw1:355205-357682
/dev/sdx1 BU lvm2 a-m 1,36t 197,83g 0 262144 PC 524288 linear /dev/sdx1:0-262143
/dev/sdx1 BU lvm2 a-m 1,36t 197,83g 262144 44894 BU 741538 linear /dev/sdx1:262144-307037
/dev/sdx1 BU lvm2 a-m 1,36t 197,83g 307038 50645 0 free
[root@san ~]#
Side-issue: The 13 PVs are external USB drives. I need to disconnect the USB cable before starting up the server, or else the server freezes at the initial splash screen, even before POST. I have been struggling with this for a long time, but disconnecting and reconnecting the USB cable after POST was a minor issue to me, until I once forgot to reconnect the USB cable in time, causing the above problem. Can you think of a fix for this minor side problem too, please? I have already tried various BIOS changes.
Have you tried "pvscan --cache" ? That should cause a re-scan of the physical drives, ignoring the cached metadata and passing new metadata to the lvmetad daemon. Since you are apparently able to mount the affected filesystems, I'm guessing that the BU volume group is actually OK and that the problem is just bad cached metadata.
Quote from the pvscan manpage:
"When lvmetad is used, LVM commands avoid scanning disks by reading metadata from lvmetad. When new disks appear, they must be scanned so their metadata can be cached in lvmetad. This is done by the command pvscan --cache, which scans disks and passes the metadata to lvmetad."
Thanks. As always, the Linux manpages are very concise.
pvscan --cache didn't solve the problem, but your excellent explanation made me think. If VG=BU is OK (as it is) and the lvm2 info on the relevant PVs is OK (it must be, otherwise I could not activate the VG), but lvmetad is still cluttered with odd info, then that info could come from PVs belonging to VG=BU that currently do not contain any LVs, i.e. /dev/sde1, /dev/sdl1, /dev/sdr1 and /dev/sdt1.
My suggestion would be trying to pvremove these currently unused PVs and vgreduce the VG accordingly. If that doesn't help either, I would suggest running pvcreate --uuid --restorefile on all PVs, then pvremove the same, and vgcfgrestore BU. What do you think?
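For the first step, I was thinking of something like this, using the four free PVs from the listing above (not tried yet):
Code:
# the VG has to be reduced first; only then can the PV labels be removed
vgreduce BU /dev/sde1 /dev/sdl1 /dev/sdr1 /dev/sdt1
pvremove /dev/sde1 /dev/sdl1 /dev/sdr1 /dev/sdt1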
The "pvscan -v --segments" output is showing the "m" (missing) flag on 13 PVs that do have LVs allocated. I'm pretty much at a loss for things to suggest. I don't understand how the system could be successfully mapping the LVs on those "missing" PVs. You couldn't mount those filesystems without that. What does "lsblk -f" report?
You might try stopping the lvm2-lvmetad service (perhaps rebooting with that service disabled/masked) and see if that changes anything. Somehow I doubt it, but it's one more straw to grasp for.
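If you want to try that, something like the following should do it (note that the socket has to be stopped as well, otherwise it will just re-activate the service):
Code:
systemctl stop lvm2-lvmetad.socket lvm2-lvmetad.service
systemctl mask lvm2-lvmetad.socket lvm2-lvmetad.service
# set use_lvmetad = 0 in /etc/lvm/lvm.conf as well, so the LVM tools
# don't keep trying to talk to the (now absent) daemon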
BTW, I hope you have some redundancy in those LVs. With 20+ physical devices the likelihood of a device failure is significant, and losing any part of an LV means its entire filesystem is lost.
I will try running the server without lvmetad after first trying vgreduce -a; report follows.
Regarding redundancy: Yes, I know. This system really is my backup system for personal files, but it's also intended to teach me about LVM by using it heavily. The disks are former hard disks from my production system that were replaced over the (many) years. Sometimes there are I/O errors that ddrescue can resolve. If an HD suddenly fails entirely, I know I will have to rebuild the LV. It's not the most efficient way to back up files, but I learn a lot. Thanks for your help, I really appreciate it.
Edit:
vgreduce -a BU :
Cannot change VG BU while PVs are missing.
Consider vgreduce --removemissing.
----- I think that could be a dangerous thing to do as it might wipe the PVs currently active
[root@san ~]# systemctl stop lvm2-lvmetad.service
Warning: Stopping lvm2-lvmetad.service, but it can still be activated by:
lvm2-lvmetad.socket
[root@san ~]# pvscan --cache
[root@san ~]# pvs -v --segments
----- Same output as before with "m" attributes. Haven't tried a reboot yet.
I see things that are amiss. The first is that the pvs output shows /dev/sdg1 as an LVM2 PV, but lsblk shows it formatted as swap space and not LVM at all. Contrast that with the way that your actual swap space on /dev/sdc2 shows up as an LV that contains swap space.
Another that I notice (not looking in any particular order) is /dev/sds1, which pvs thinks is an LVM2 member but lsblk does not.
It looks like the LVM header on some of the partitions has become corrupted. One thing to note is that the entire structure of the VG is recorded on each of the PVs. That's all of the information that you can see in /etc/lvm/backup/BU, also in ASCII, just not as nicely formatted.
It would be interesting to see what "file -s /dev/sd?1" reports. I suspect it's going to say that some of those supposedly LVM partitions are not LVM2 members. If that's the case, you need to copy /etc/lvm/backup/BU to another location and then use that copy as the restorefile for pvcreate (with the -ff, --uuid, and --restorefile options) for each affected partition, and then a vgcfgrestore. You'll almost certainly have to unmount all of the LVs in the BU VG while doing that.
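For one affected partition that would look roughly like this (the copy name is just an example, and the UUID placeholder has to be replaced with the UUID recorded for that device in the backup file):
Code:
cp /etc/lvm/backup/BU /root/BU-restore.vg
# repeat for each damaged PV, e.g. /dev/sdg1, using its UUID from the backup file
pvcreate -ff --uuid "<uuid-from-backup-file>" --restorefile /root/BU-restore.vg /dev/sdg1
# vgcfgrestore just gets run once, after all the pvcreate steps
vgcfgrestore -f /root/BU-restore.vg BU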
My fault, I was too quick in responding to your earlier request and had not noticed the difference in device names after a reboot. Normally I get the same device names, but not this time.
Your latest suggestion was exactly what I already intended to do. I have tried 1 PV and found it nearly impossible to pvremove a PV before pvcreating it. pvcreate/lvmetad takes more than an hour to process a single command, reflecting the complicated state the VG is in. My suggestion is (a rough sketch in commands follows the list):
- list current device names with their respective PV UUIDs (and do not reboot !)
- remove all LVM info on all BU-disks, not by pvremove/lvmetad but through dd
- pvcreate all disks with --uuid --restorefile
- vgcfgrestore BU
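In commands, the plan would look roughly like this (the UUIDs have to come from the list made in the first step; not tested yet):
Code:
# 1. record the current device name <-> PV UUID mapping (and do NOT reboot afterwards)
pvs -o pv_name,pv_uuid,vg_name > /root/pv-uuid-map.txt
# 2. wipe the LVM label and metadata area at the start of each BU partition, e.g.
dd if=/dev/zero of=/dev/sda1 bs=1M count=1
# 3. recreate each PV with its old UUID from the saved list
pvcreate -ff --uuid "<uuid-from-saved-list>" --restorefile /etc/lvm/backup/BU /dev/sda1
# 4. restore the VG metadata once, after all PVs exist again
vgcfgrestore BU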
If "pvcreate -ff ..." is taking a long time, I guess it's worth removing the LVM signature with dd, though I'm not sure that's going to help. LVM will still try to find and update the state of all the devices. You might have to resort to having just one disk at a time online. Otherwise, I don't see anything better than your plan.
If I first do a dd on all VG=BU disks, clearing the LVM signatures of all PVs in the VG, then pvcreate/LVM wouldn't have much to check anymore. I would expect it to quickly report 20 PVs missing. After doing that for each of the 16 disks that carry data, it would probably report 4 PVs still missing, these being the devices that were part of the original VG=BU but not actively used. Those missing devices could then be removed by vgreduce -a --removemissing BU, followed by vgcfgrestore BU.
I will try this anyway, and if it takes 20 times more than an hour, that's OK. If it takes much longer, I will connect 1 disk at a time, running pvdisplay -m each time, in order to know the exact PV UUID that each device should be linked to.
I will report back as soon as I'm done.
I will leave this for a while. Running nearly any LVM command on VG=BU is blocked because of missing PVs. All LVs are still OK, so I can use the system except for making changes to the VG. Clearing all LVM signatures seems risky as I may not be able to use pvcreate to re-create the PVs or re-extend them to the VG.
Thanks anyway for your help, it has been a good learning experience. I may come back with more news.
I kind of solved the problem, but a small problem remains.
Original problem: My server with 17 external USB devices forming an LVM VG=BU had been started up with only 4 of the 17 USB disks. After reconnecting the 13 external disks, LVM reported the 13 PVs as "reappearing" but kept them marked as missing. All 5 LVs of the VG worked fine, but I could not change metadata until the missing-PV problem was solved. I left the problem until I needed to extend 1 of the 5 LVs.
Solution: edit /etc/lvm/backup/BU manually, removing all "MISSING" and "LOCKED" flags, then run vgcfgrestore BU.
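In commands this amounted to roughly the following (the edit itself was done by hand in a text editor):
Code:
# remove the words MISSING and LOCKED from the flags lines by hand
vi /etc/lvm/backup/BU
# vgcfgrestore reads /etc/lvm/backup/BU by default
vgcfgrestore BU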
This worked fine, I could still use all 5 LVs.
The next step was to extend /dev/BU/AV from 5 TiB to 7 TiB, using
lvextend -L +2T BU/AV
lvextend reported that the size was correctly increased from 5 TiB to 7 TiB but seemed to stall while changing the ext4 filesystem.
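(For reference, growing the filesystem is a separate step that can be done once the LV is active again; roughly:)
Code:
# grow the ext4 filesystem to fill the extended LV
resize2fs /dev/BU/AV
# lvextend can also do both steps at once with the -r/--resizefs option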
I tried a reboot; now 4 out of 5 LVs are active, but the extended LV=AV remains inactive.
Code:
[root@san ~]# lvscan
ACTIVE '/dev/clearos/swap' [2,00 GiB] inherit
ACTIVE '/dev/clearos/root' [<34,27 GiB] inherit
ACTIVE '/dev/BU/DVD' [5,00 TiB] inherit
ACTIVE '/dev/BU/PC' [3,00 TiB] inherit
inactive '/dev/BU/AV' [7,00 TiB] inherit
ACTIVE '/dev/BU/Jeugd' [2,00 TiB] inherit
ACTIVE '/dev/BU/BU' [3,00 TiB] inherit
[root@san ~]# lvchange -ay BU/AV
device-mapper: reload ioctl on (253:6) failed: Invalid argument
[root@san ~]#
I guess I can go back to a former version of the metadata but then I still don't have the extension to BU/AV.
Would it be possible to solve the above ioctl error?
Edit:
I solved this problem by going back to an earlier *.vg file, redoing the above procedure and rebooting. Funnily enough, it worked this time.