Linux - Virtualization and Cloud
This forum is for the discussion of all topics relating to Linux Virtualization and Linux Cloud platforms. Xen, KVM, OpenVZ, VirtualBox, VMware, Linux-VServer and all other Linux Virtualization platforms are welcome. OpenStack, CloudStack, ownCloud, Cloud Foundry, Eucalyptus, Nimbus, OpenNebula and all other Linux Cloud platforms are welcome. Note that questions relating solely to non-Linux OS's should be asked in the General forum.
I have a Gentoo KVM host running 2 Gentoo guests. They'd been working for 2 years without issue. Recently, one guest started locking up, with no error message, kernel panic, or other logged failure, when I started MySQL. Shortly after, it started locking up on any drive access. When I loaded a backup into the other guest and ran MySQL on it, it started locking up too. Unable to produce any software fault log, and unable to reproduce the behavior on the host itself, I tried swapping the motherboard out of desperation, thinking that something was corrupting the disk. I should have known that wasn't the cause, but oh well.
Both virtual machines lock at this line in the boot process:
Code:
[ 1.135647] EXT3-fs (sda1): error: couldn't mount because of unsupported optional features (240)
[ 1.144455] EXT4-fs (sda1): couldn't mount as ext2 due to feature incompatibilities
[ 1.160868] EXT4-fs (sda1): INFO: recovery required on readonly filesystem
[ 1.163865] EXT4-fs (sda1): write access will be enabled during recovery
Booting either VM to a livecd and trying to mount, fsck, or otherwise do anything with the drives causes the VM to lock as well. I tried blowing away / on one VM and doing a full reinstall with no luck. Immediately after, it resumed locking on boot, even with a clean filesystem.
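Since the guests lock on any disk access but the host stays stable, one low-risk diagnostic is to inspect the guest's disk image from the host without booting the guest at all. A rough sketch, assuming a file-backed qcow2 image (the `guest-test.img` below is a throwaway created purely for illustration; you'd point `qemu-img` at your real image instead):

```shell
# Sketch: validate a VM disk image from the host without booting the guest.
# 'guest-test.img' is a throwaway created here for illustration only;
# run 'qemu-img check' against your real image file instead.
if command -v qemu-img >/dev/null 2>&1; then
  qemu-img create -f qcow2 guest-test.img 64M
  qemu-img check guest-test.img   # reports corrupt qcow2 metadata, leaked clusters, etc.
  rm -f guest-test.img
else
  echo "qemu-img not installed"
fi
```

Note that `qemu-img check` only validates image-format metadata (so it only applies to qcow2 and similar formats, not raw partitions); it won't fsck the filesystem inside, but it does tell you whether the image container itself is damaged.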
At first, I thought it was power related, so I slammed the host with as much CPU load as possible, with no failures. I also ran memtest86 with no failures reported. I've run emerge -e world on the host in the hope that a broken build was the cause, as well as revdep-rebuild.
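For the record, a host-side CPU burn needs nothing beyond coreutils. A minimal sketch of that kind of test (the 2-second burst is only to show the shape; a real run should go for hours, and `stress` or `stress-ng` are the usual tools for it):

```shell
# Sketch: busy-loop every core for a short burst while watching for lockups.
# Real stress runs should last hours; 2 seconds here only shows the shape.
NCPU=$(nproc)
for i in $(seq "$NCPU"); do
  timeout 2 sh -c 'while :; do :; done' &   # one spinner per core
done
wait   # returns 0 once all spinners are reaped
echo "stress burst finished on $NCPU cores"
```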
I've been running these VMs with the following manual kvm lines:
I wonder if moving the VMs to a cleanly installed new host would show any improvement. At first it would seem to be an issue with the guests, but if both fail, it leads me to consider the host or host hardware. I'd bet on host hardware at this point.
What's strange about this is that the host has no issues whatsoever. The host is on 2 320GB SATA drives in md RAID1, and the RAID shows as clean. The other strange thing is that it wasn't both guests at first. At first it was just one, and the problem only spread to the other when I started up MySQL there. Shutting down all VMs and pushing the host's resource consumption higher than it ever gets with MySQL running in either guest does nothing.
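The md array state can be double-checked from the host with a couple of read-only commands. A quick sketch (the `/dev/md0` name in the comment is a placeholder for whatever your arrays are called):

```shell
# Sketch: read-only md RAID health check; safe to run on any Linux box.
if [ -r /proc/mdstat ]; then
  cat /proc/mdstat            # look for [UU] (healthy) vs [U_] (degraded)
  # mdadm --detail /dev/md0   # per-array detail; device name is a placeholder
else
  echo "no md arrays on this machine"
fi
```

Even when mdstat shows clean, a periodic `echo check > /sys/block/md0/md/sync_action` scrub is what actually forces a full read of both mirrors and would surface a drive that only errors under sustained I/O.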
I'd really rather not reinstall unless I were sure this is the cause.
I also just tried using virtio, and booting from an ISO, and still no luck. It locked up here:
I do see that there's a "no such file or directory" on anon_inode:kvm-vcpu, but I'm entirely unsure what this is or whether it's the cause of my problems, and Google doesn't show much useful info. At least, none that I've found.
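For anyone following along, a manual KVM invocation with a virtio disk and an ISO attached looks roughly like this. Every path, size, and count below is a placeholder for illustration, not the poster's actual configuration:

```shell
# Hypothetical invocation only -- paths, memory, and CPU count are placeholders
qemu-system-x86_64 -enable-kvm \
  -m 2048 -smp 2 \
  -drive file=/var/vm/guest.img,if=virtio,format=raw \
  -cdrom /var/vm/rescue.iso -boot d \
  -nographic
```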
I'd previously run memtest on it for several hours with no errors detected. Regardless, on the host OS I've used more RAM than the guests use, with no ill effects.
"RAM than the guests use with the host OS with no ill effects."
Do you mean you assigned more RAM than is available?
My apologies. I meant the guests use less RAM than I used to test with on the host. The guests use about 2.5GB of RAM out of 8GB total. On the host, I used enough to push the box into swap without issue. Plus, memtest86 completed without error.
I feel it is the hardware. Without any evidence of the cause, you need to run more aggressive tests.
"I also just tried using virtio, and booting from an ISO, and still no luck. It locked up here:" In a very real sense, I guess it could be components related to the VM.
Is there anything that could cause RAM to pass memtest but still actually fail? Or do you think it could be the CPU? I've already replaced the mobo, so it's definitely not the SATA controller. What more aggressive tests could I run? I don't have the cash to just replace everything.
RAM and any associated device can fail in seconds or in days. A long time ago, companies used hot/cold chambers and ran diagnostics on computers to try to weed out failures. Causes of failure include poor connections, such as cold solder joints or any connector, and perhaps most commonly, PN junctions in components. Damage from time, heat, and ESD, as well as poor production quality, can cause any of the few million gates to fail.
We don't really have any way to do full diagnostics under hot and cold conditions. When you run memtest, it may or may not exercise all the components in your system, and it may need to run for days before any failure shows. There are a few memtests out there, and any one of them may or may not be the best way to test.
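On the "memtest may not catch it" point: running a userspace tester while the full OS is up exercises the memory controller and caches differently than memtest86 does, and sometimes surfaces faults it misses. A guarded sketch (the 16M / 1-pass numbers are deliberately tiny to show the shape; a real run should claim most of your free RAM and loop for hours):

```shell
# Sketch: userspace RAM test with memtester (sys-apps/memtester on Gentoo).
# 16M / 1 pass only shows the shape; real runs use most of free RAM, for hours.
if command -v memtester >/dev/null 2>&1; then
  memtester 16M 1 && ram_ok=yes
else
  echo "memtester not installed (try 'emerge sys-apps/memtester')"
  ram_ok=skipped
fi
```

Running it as root lets memtester mlock the region so the kernel can't page it out mid-test; without root it still runs, just with unlocked pages.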
Might be worth reseating all the components and trying a live environment again.
I assume you have server RAM, such as ECC. You might have to disable that and run memtest again.
Sadly, this server was built on a budget, and uses desktop components. I try not to skimp on things like RAM, CPU, etc, and instead skimp on areas like the case. I was afraid you'd say RAM, as that's the only thing that makes sense to me still, too. I get paid in a few days, so I'll probably be purchasing replacements then. Thanks for your help. :-)