Slackware 14.2 virtual machine hangs regularly (perhaps due to snmpd)
Hello,
I have a fully patched Slackware 14.2 VMware virtual machine (version 8) running on an ESXi 5.5 U1 (build 1623387) hypervisor. From time to time this virtual machine hangs completely and starts burning CPU on the host, which makes the other running virtual machines less responsive. I haven't been able to identify the reason for these hangs so far, and I haven't had this problem on any other Slackware virtual machine (I have others ranging from Slackware 11.0 to current, both 32 and 64 bit). The last time the hang occurred I was logged in to the terminal through SSH, so I caught the following output:
Code:
Message from syslogd@slack-142 at Wed Jul 19 10:26:01 2017 ...
slack-142 kernel: [ 2649.351954] CPU: 1 PID: 1085 Comm: snmpd Not tainted 4.4.75-smp #2
Message from syslogd@slack-142 at Wed Jul 19 10:26:01 2017 ...
slack-142 kernel: [ 2649.352612] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/30/2013
Message from syslogd@slack-142 at Wed Jul 19 10:26:03 2017 ...
slack-142 kernel: [ 2649.353910] task: f2d19200 ti: f0176000 task.ti: f0176000
Message from syslogd@slack-142 at Wed Jul 19 10:26:03 2017 ...
slack-142 kernel: [ 2649.358574] Stack:
Message from syslogd@slack-142 at Wed Jul 19 10:26:03 2017 ...
slack-142 kernel: [ 2649.359265] 00000018 00000010 f3263000 f4dad4b8 00000000 00000000 f4003d80 00000000
Message from syslogd@slack-142 at Wed Jul 19 10:26:03 2017 ...
slack-142 kernel: [ 2649.361261] Call Trace:
Message from syslogd@slack-142 at Wed Jul 19 10:26:03 2017 ...
slack-142 kernel: [ 2649.359925] 024280c0 00000000 f2d19200 00000000 00000001 00000000 024000c0 f4003d80
Message from syslogd@slack-142 at Wed Jul 19 10:26:03 2017 ...
slack-142 kernel: [ 2649.361899] [<c1197208>] ? inode_init_always+0xe8/0x180
Message from syslogd@slack-142 at Wed Jul 19 10:26:03 2017 ...
slack-142 kernel: [ 2649.360602] 024080c0 f0177d78 f0177da0 80200020 f0177d78 c1fef180 f017000a e46e4ecc
Message from syslogd@slack-142 at Wed Jul 19 10:26:03 2017 ...
slack-142 kernel: [ 2649.362572] [<c11dc1cc>] ? proc_alloc_inode+0x1c/0x90
Message from syslogd@slack-142 at Wed Jul 19 10:26:03 2017 ...
slack-142 kernel: [ 2649.363252] [<c11696c1>] ___slab_alloc+0x51/0x4a0
Message from syslogd@slack-142 at Wed Jul 19 10:26:03 2017 ...
slack-142 kernel: [ 2649.364242] [<c11a5196>] ? __inode_wait_for_writeback+0x56/0x90
Message from syslogd@slack-142 at Wed Jul 19 10:26:03 2017 ...
slack-142 kernel: [ 2649.364955] [<c10adb4c>] ? call_rcu_sched+0x1c/0x20
Message from syslogd@slack-142 at Wed Jul 19 10:26:03 2017 ...
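The console lines above did not arrive in order: the kernel timestamps in square brackets show the Stack and Call Trace lines interleaved. A minimal sketch for putting such a capture back in order, assuming it was saved to a file and piped in on stdin (the function name `sort_kmsg` is made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: read a captured console log on stdin, keep only the
# kernel lines, and order them by the kernel timestamp inside "[ ... ]".
sort_kmsg() {
    # Drop the "Message from syslogd@..." wrapper lines, then sort
    # numerically on the text that follows the first "[".
    # LC_ALL=C makes sure "." is treated as the decimal point.
    grep 'kernel: \[' | LC_ALL=C sort -t'[' -k2,2n
}
```

For example, `sort_kmsg < hang-capture.txt` (file name assumed) would print the Stack dump lines together, followed by the Call Trace entries.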
This makes me think that the problem is due to snmpd, but what could it be? The virtual machine is being polled through SNMP by Cacti running on another (real) machine in the same network segment, just like my other Slackware virtual machines. However, only the Slackware 14.2 virtual machine hangs.
Any ideas anyone?
I would really appreciate any help in resolving this strange issue. And please let me know if I need to provide any other information.
I haven't used VMware ESXi for about a decade (more or less), but looking at the kernel errors from the virtual machine, and this is a wild guess, might it be that the vmdk image file of the vm (more probably) and/or the esxi filesystem (where the vms are stored) are damaged?
Quote:
Originally Posted by ponce
I haven't used VMware ESXi for about a decade (more or less)
And what are you using instead? I'm using ESX on a dedicated real machine for running all my virtual machines (~40).
Quote:
Originally Posted by ponce
might it be that the vmdk image file of the vm (more probably) and/or the esxi filesystem (where the vms are stored) are damaged?
I have run the following:
Code:
# vmkfstools --fix check Slack-14.2.vmdk
Disk is error free
to check the disk image file, but I'm not able to run voma to check the vmfs volume, because it's the one on which ESXi is installed and thus it is in use (i.e. I received "Found 1 actively heartbeating hosts on device"). Do you know other ways to check for damage?
Anyway, I'll try to move the virtual machine to another vmfs volume (on another disk) and will report back if this has any effect.
We don't run Slackware on ESXi ( not yet, but we are evaluating it )
However, we do have more than a few CentOS 6 Machines on ESXi out there among our Customer Sites.
On some Customers' Systems, we experienced occasional periods where the CentOS VM would be unresponsive for a few seconds for all users ( and a few seconds feels like a long time when running a terminal-based application via ssh ).
What made a difference for us were a few of the Linux-on-VM tuning recommendations:
We did not implement all the recommendations in #1.
However, Kernel Parameter elevator=noop ; set noatime in fstab ; vm.swappiness=1 were low-hanging fruit that helped a lot ( I assume you're running vmtools and you're running the para-virtualized devices ) ...
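As an illustration of the noatime part only, here is a minimal sketch of a filter that adds `noatime` to the mount options of disk filesystems in fstab-style input. The function name `add_noatime` and the list of filesystem types are assumptions for this example; review the output before replacing a real /etc/fstab.

```shell
#!/bin/sh
# Hypothetical helper: read fstab lines on stdin and append "noatime" to the
# options column (4th field) of ext2/ext3/ext4/xfs entries that lack it.
# Note: a modified line is re-joined with single spaces, so column
# alignment is lost on the lines that change.
add_noatime() {
    awk '
        /^[ \t]*(#|$)/ { print; next }   # pass comments and blank lines through
        $3 ~ /^(ext[234]|xfs)$/ && $4 !~ /(^|,)noatime(,|$)/ {
            $4 = $4 ",noatime"           # touch only the options field
        }
        { print }
    '
}
```

Running `add_noatime < /etc/fstab > /tmp/fstab.new` leaves the real file alone, so the result can be diffed first. Swap, proc and other non-disk entries pass through unchanged.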
Another obvious one: if you don't need a GUI Console, be sure to boot into runlevel 3, and not runlevel 4 ... the GUI Console is just a `startx` away if you ever need one.
Setting vm.swappiness seemed to help a lot on our Customer's VMs ( we chose to set vm.swappiness=1 as in #2 instead of vm.swappiness=0 as in #1 ).
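For the swappiness setting, a minimal sketch of making the value survive a reboot, assuming the system reads /etc/sysctl.conf at boot. The function name `persist_sysctl` is hypothetical; it rewrites the given file so that the key appears exactly once.

```shell
#!/bin/sh
# Hypothetical helper: persist_sysctl KEY VALUE [FILE]
# Removes any existing "KEY = ..." line from FILE (default /etc/sysctl.conf,
# an assumption) and appends a fresh "KEY = VALUE" line.
persist_sysctl() {
    key=$1
    value=$2
    conf=${3:-/etc/sysctl.conf}
    [ -f "$conf" ] || : > "$conf"            # create the file if it is missing
    tmp=$(mktemp) || return 1
    # Keep every line whose name part (text before "=") is not exactly KEY.
    awk -v k="$key" -F= '{ name = $1; gsub(/[ \t]/, "", name); if (name != k) print }' \
        "$conf" > "$tmp"
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    cat "$tmp" > "$conf"                     # overwrite in place, keeping permissions
    rm -f "$tmp"
}
```

As root, `sysctl -w vm.swappiness=1` applies the value immediately, and `persist_sysctl vm.swappiness 1` would then record it for the next boot.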
Answer 1 in URL #2 also shows how the user solved his own problem via the results of `sar -W` which URL #1 says "don't do that !" ...
Anyhow, we've tried `sar` but never got any useful info from it.
As for other logs and sorry to belabor the obvious if you've already looked ... do you see anything in /var/log/{dmesg,messages,syslog} when the system freezes up ?
Or maybe `top` shows something useful ?
One thing about dmesg is you lose it after each reboot. If you can manage to reboot cleanly, then adding the following line to /etc/rc.d/rc.local_shutdown will save your dmesg for the current boot so you can inspect it after a reboot:
Code:
#
# save dmesg for this boot. Append at end of /etc/rc.d/rc.local_shutdown
#
/bin/dmesg > /var/log/dmesg-last-boot
I do run VMWare Workstation on my main Slackware64 14.2 + Multilib Laptop with a couple Slackware Guests ( 14.2 and Current ).
I follow most of the recommendations in URL #1 for the VMWare Workstation Guests and they run very well on my Laptop.
HTH and good luck !
-- kjh
P.S. there are also a few customers running Hyper-V and we also tune CentOS on Hyper-V pretty much the same way.
Last edited by kjhambrick; 07-20-2017 at 07:04 AM.
Reason: add p.s.
First, thanks for the extensive reply and for sharing your experience, kjhambrick!
Now, to the different points and questions:
Quote:
Originally Posted by kjhambrick
We don't run Slackware on ESXi ( not yet, but we are evaluating it )
Apart from the problem I described in this post, I can say Slackware runs just fine on ESXi (and I mean different x86 and x86_64 versions) :-)
Quote:
Originally Posted by kjhambrick
However, Kernel Parameter elevator=noop ; set noatime in fstab ; vm.swappiness=1 were low-hanging fruit that helped a lot ( I assume you're running vmtools and you're running the para-virtualized devices ) ...
Setting vm.swappiness seemed to help a lot on our Customer's VMs ( we chose to set vm.swappiness=1 as in #2 instead of vm.swappiness=0 as in #1 ).
I haven't tried these parameters, but I will right after I finish the test of moving the virtual disk to another physical drive. The default value in Slackware 14.2 seems quite high (see below), so thanks for the suggestion!
Code:
# cat /proc/sys/vm/swappiness
60
Quote:
Originally Posted by kjhambrick
Another obvious one: if you don't need a GUI Console, be sure to boot into runlevel 3, and not runlevel 4 ... the GUI Console is just a `startx` away if you ever need one.
All my Slackware virtual machines are set up like this. If I ever have to test or use an X application, I redirect it to an X server on another PC (e.g. with export DISPLAY=host:session).
Quote:
Originally Posted by kjhambrick
As for other logs and sorry to belabor the obvious if you've already looked ... do you see anything in /var/log/{dmesg,messages,syslog} when the system freezes up ?
No, there's nothing written in the logs, so the message that was printed on the terminal is the only trace I have.
Quote:
Originally Posted by kjhambrick
Or maybe `top` shows something useful ?
I cannot check this, because the virtual machine becomes completely unresponsive and I can access it neither by SSH nor through the VMware vSphere Client console.
Quote:
Originally Posted by kjhambrick
If you can manage to reboot cleanly, then adding the following line to /etc/rc.d/rc.local_shutdown will save your dmesg for the current boot so you can inspect it after a reboot:
No, I cannot reboot, so the only option is to power off the virtual machine.
I'll write again later when I try your suggestions, because I prefer trying them one by one to find out what actually makes a difference :-)
So moving the virtual machine to a different physical drive didn't solve the problem (it crashed the next day). Currently trying with /proc/sys/vm/swappiness set to 1.
I'm writing back since I seem to have found the solution, so I hope it helps someone else. After some time I realized that I was running an ESXi 6.0 version that was too old: build 3620759 from 2016-03-15. Just upgrading to the next build, 3825889 from 2016-05-12, seems to have solved the problem, because the virtual machine has now run for 3 days without a single crash. I'll write again if it starts crashing again.
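For anyone wanting to compare their own host against the build numbers above, here is a minimal sketch. It assumes the version string has the form `VMware ESXi 6.0.0 build-3620759`, as printed by `vmware -v` in the ESXi shell; the function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for comparing an ESXi build number with a known-good one.

# Print the digits that follow "build-" in a version string (nothing if absent).
build_of() {
    printf '%s\n' "$1" | sed -n 's/.*build-\([0-9][0-9]*\).*/\1/p'
}

# needs_upgrade VERSION_STRING MIN_BUILD
# Succeeds (exit 0) only when a build number was found and it is below MIN_BUILD.
needs_upgrade() {
    build=$(build_of "$1")
    [ -n "$build" ] && [ "$build" -lt "$2" ]
}
```

On the host, something like `needs_upgrade "$(vmware -v)" 3825889 && echo "build is older than 3825889"` would flag the older build mentioned in this post.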