LinuxQuestions.org


mbvpixies78 02-15-2013 10:58 PM

How to Warn Via Email Whenever Virtualbox VM Takes a Dump?
 
I have an Apache Mirror running on a VirtualBox VM and it occasionally stops serving web pages and stops updating its content via rsync. I know this because I get an e-mail from Apache telling me the content of the mirror is 3 days old and then I type up the address of my web site and it fails to load.

On the web server I type:
Code:

# service httpd status
httpd is dead

***I'm typing that response from memory, but there's more to it and I'll edit this to correct when I see it next-- something extremely brief about being dead but still having children or threads or subprocesses...

Why this is happening I have my suspicions and will wait til later to pursue. For now, I need a method of knowing more quickly (same day, within minutes, preferably) when my web server is down, not after 3 days.

I would like to handle this warning internally, without reliance on anyone else. Ideally, I'd like the host to see that the VM shut down, and immediately send me an e-mail.

Better yet, have the VM recognize that httpd just died and initialize a script to restart httpd?

How do I get the VM to see httpd is dead and initiate a daemon restart?

Or is an e-mail the best I can hope for?

Bonus points if the answer involves my writing a script in Python, which I'm trying to learn right now.
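
For the e-mail half of the question, a minimal Python sketch follows. It assumes a mail server is listening on localhost port 25 (for example a local Postfix or Sendmail), and the function names and addresses are placeholders made up for illustration, not anything from this setup.

```python
import smtplib
from email.message import EmailMessage


def build_alert(service, detail, sender, recipient):
    """Build a plain-text alert message; no network access happens here."""
    msg = EmailMessage()
    msg["Subject"] = f"[ALERT] {service} is down"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"{service} failed its check.\n\nDetail: {detail}\n")
    return msg


def send_alert(msg, host="localhost", port=25):
    """Hand the message to an SMTP server (assumed: a local MTA on port 25)."""
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.send_message(msg)
```

Calling send_alert(build_alert("httpd", "status check failed", "root@localhost", "you@example.org")) from whatever check detects the failure would deliver the warning. Keeping message construction separate from sending makes it possible to try the first part without a mail server.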

unSpawn 02-16-2013 08:32 PM

Quote:

Originally Posted by mbvpixies78 (Post 4892753)
Why this is happening I have my suspicions and will wait til later to pursue.

Core problems should be worth fixing first: the rest is only combating symptoms.


Quote:

Originally Posted by mbvpixies78 (Post 4892753)
I need a method of knowing more quickly (same day, within minutes, preferably) when my web server is down

How about letting Monit poll the web server status (or whatever else) page and restart the service on failure?
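
As a rough illustration of the poll-and-restart idea (this is not how Monit itself is implemented), here is a small Python sketch. It assumes the check runs as root inside the guest and that a SysV-style `service httpd restart` is the right restart command; the function names and the URL you pass in are placeholders.

```python
import http.client
import subprocess
import urllib.request


def site_is_up(url, timeout=10, opener=urllib.request.urlopen):
    """Return True if the URL answers with a 2xx/3xx status within the timeout."""
    try:
        with opener(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, refused connections and timeouts are all OSError.
        return False


def check_and_restart(url, restart_cmd=("service", "httpd", "restart"),
                      probe=site_is_up, run=subprocess.call):
    """Probe once; on failure run the restart command and report what happened."""
    if probe(url):
        return "ok"
    exit_code = run(list(restart_cmd))
    return "restarted" if exit_code == 0 else "restart-failed"
```

Run from cron every few minutes, check_and_restart("http://localhost/") would cover the simple case. Monit adds things this sketch lacks, such as restart limits and alerting, so treat the script as a learning exercise rather than a replacement.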

scheidel21 02-16-2013 09:55 PM

unSpawn has a good suggestion. I've done something similar with Webmin monitoring and alerting me of issues as well as restarting services.

mbvpixies78 02-17-2013 09:05 PM

Quote:

Originally Posted by unSpawn (Post 4893363)
Core problems should be worth fixing first: the rest is only combating symptoms.



How about letting Monit poll the web server status (or whatever else) page and restart the service on failure?


I understand what you're saying, but how do I uncover other symptoms that would point to a cause?

The VM seems fine, it's just that httpd dies. My logs in /var/log/httpd only go up to 2/11/13. I also have email logs from Logwatch, but neither is saying anything I recognize as a clue. Are there particular things I should grep for in the logs?

Where else should I look?


I'm setting up Monit now.

unSpawn 02-18-2013 06:26 AM

Quote:

Originally Posted by mbvpixies78 (Post 4894080)
I understand what you're saying, but how do I uncover other symptoms that would point to a cause?

Generally speaking there's two approaches you could use: I) walk the tree from the outside inwards and eliminate potential causes or II) target most likely causes first.
In the case of I.) You've got three major leads:
0) the Virtual Box server and client log files on the Virtual Box host,
1) the guest system and log files and
2) whatever configuration and logs its web stack has or produces.

If the Virtual Box server and client logs don't hold any clues you could make it log more (debug mode if possible) in the hope it could reveal something. If over time these (after all you said it happens occasionally) logs don't show anything worthy of investigation you could then draw the conclusion the "outer layer" is OK. Then you would move on to the VB guest and start assessing like you would any regular system: is all software (including whatever runs in the web stack) up to date? Is file system integrity intact? Any unwanted subsystems or processes running? Does it run (any form of) SAR to produce resource usage stats over time? (If you don't run that you should, I described a few tools tersely here.) In what way does its (web stack) configuration deviate from the standard and for what reason? And the same goes here: if log files don't hold clues make it log more so you can draw the conclusion it's OK or not.


In the case of II.) you could for example start by checking for the (approximate) first time you experienced outage and look at what software or system / service configuration changed around that time. (I keep a time stamped admin log so I can read back major changes and I have all configuration under revision control so I can revert back for diagnostics or in case of error) Or you could start with the process itself:

Quote:

Originally Posted by mbvpixies78 (Post 4894080)
The VM seems fine, it's just that httpd dies.

A killed process should be easier to diagnose but what you say here kind of contradicts what you said in your OP: Apache occasionally stopping to serve web pages. So the first question would be if httpd processes still exist, and if they do, what process state they're in.
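
One way to answer that question from a script is to read the Linux /proc file system. The sketch below is an illustration only: the function names are made up, and it assumes the process name shows up as "httpd" in /proc/<pid>/status.

```python
import os


def parse_status(text):
    """Extract (name, state) from the contents of /proc/<pid>/status."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    return fields.get("Name"), fields.get("State")


def find_processes(name, proc_root="/proc"):
    """Return [(pid, state), ...] for every process whose name matches."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                proc_name, state = parse_status(fh.read())
        except OSError:
            continue  # the process exited while we were looking
        if proc_name == name:
            found.append((int(entry), state))
    return sorted(found)
```

find_processes("httpd") returns an empty list if nothing is left. Otherwise the state strings tell you more: "S (sleeping)" is normal for idle workers, while "D (disk sleep)" or "Z (zombie)" would be worth a closer look.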


Quote:

Originally Posted by mbvpixies78 (Post 4894080)
My logs in /var/log/httpd only go up to 2/11/13. I also have email logs from Logwatch, but neither is saying anything I recognize as a clue.

So the first question here would be: why do your /var/log/httpd logs only go up to 2013-02-11? If that's due to log rotation then allow it to archive more, if it's due to file system space constraints or other reasons then list them. Wrt Logwatch you should check what the default detail level is set to first. If it's "Medium" / 5 then setting it to "High" / 10 should reveal more detail. Don't forget to run it again with "--archives --range All".
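
To pin down exactly where a log goes quiet, something like the following Python sketch could help. It assumes the access_log uses Apache's common or combined format with bracketed timestamps and English month names; the six hour threshold and the function name are arbitrary choices for illustration.

```python
import re
from datetime import datetime, timedelta

# Matches the bracketed timestamp of Apache's common/combined log format,
# e.g. [11/Feb/2013:22:55:01 -0500]
STAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def find_gaps(lines, threshold=timedelta(hours=6)):
    """Yield (previous, current) timestamp pairs further apart than threshold."""
    previous = None
    for line in lines:
        match = STAMP.search(line)
        if not match:
            continue  # skip lines without a recognizable timestamp
        current = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        if previous is not None and current - previous > threshold:
            yield previous, current
        previous = current
```

Feeding it an open file, for example list(find_gaps(open("/var/log/httpd/access_log"))), lists every silent period. On a low-traffic mirror a quiet night can also exceed the threshold, so a gap is a lead to follow up, not proof of an outage.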


Quote:

Originally Posted by mbvpixies78 (Post 4894080)
Are there particular things I should grep for in the logs? Where else should I look

That's difficult to say as you haven't posted any host, guest and service details. Generally speaking Logwatch at detail level 10 should show errors if it can find any, another way could be to check logs for event recurrence (often, way too much, rarely) and check message significance (obviously the message level matters because if you don't log it you won't find it) for example by running 'cat /var/log/httpd/access_log|petit --hash|less'.
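
If petit isn't installed, a crude stand-in for the recurrence check can be written in a few lines of Python. This is not petit's algorithm, just a sketch of the same idea: strip the parts of each line that vary, then count what is left. It assumes the Apache 2.2 error_log layout (bracketed timestamp, level, optional client), and the names are made up.

```python
import re
from collections import Counter

TIMESTAMP = re.compile(r"^\[[^\]]*\]\s*")   # leading [Mon Feb 11 22:55:01 2013]
CLIENT = re.compile(r"\[client [^\]]*\]")   # [client 203.0.113.7]
NUMBER = re.compile(r"\d+")                 # any remaining run of digits


def normalize(line):
    """Strip the parts of an error_log line that differ between repeats."""
    line = TIMESTAMP.sub("", line.strip())
    line = CLIENT.sub("[client X]", line)
    return NUMBER.sub("N", line)


def recurrence(lines):
    """Count how often each normalized message occurs, most frequent first."""
    counts = Counter(normalize(line) for line in lines if line.strip())
    return counts.most_common()
```

The head of the list shows what happens "way too much" and the tail shows rare events, which are often the more interesting ones when hunting for the moment a daemon died.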

mbvpixies78 02-19-2013 05:38 PM

Today upon arriving home I found that the VM state is 'aborted.' I haven't looked through logs yet-- will post more when I have a chance to do so.

Restarted the VM and httpd started up just fine on its own. Web site is accessible again.

Here are two concerns about VirtualBox:

(1) There's an update available but yum fails to update (yes, I shut down VirtualBox when I tried to update)
(2) There's a warning about the filesystem format (ext4) which may cause problems for VirtualBox


I tried to update VirtualBox manually, using:
rpm -ivh VirtualBox... .rpm
"File [ ... ] conflicts with [ ... ]"

It fails to update from 4.1.4... to 4.2.4...

Right now I am creating a clone of my Apache Mirror VM file and will uninstall VirtualBox and reinstall per https://forums.virtualbox.org/viewtopic.php?f=7&t=52605 which suggests that I have a fork installed and that uninstalling and reinstalling will fix the update problem.

Will update when finished cloning, uninstalling and reinstalling VirtualBox.

scheidel21 02-19-2013 05:46 PM

That certainly sounds like a VirtualBox issue if the status of the VM was "Aborted"
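
For the original wish of having the host notice when the VM goes down, the host can ask VirtualBox directly. The Python sketch below parses the VMState line from `VBoxManage showvminfo <name> --machinereadable`. It assumes the script runs on the host as the user who owns the VM; the function names are made up, and the VM name you pass in must match your own.

```python
import re
import subprocess

# The machine-readable output contains a line such as VMState="running"
STATE_LINE = re.compile(r'^VMState="([^"]*)"', re.MULTILINE)


def parse_vm_state(info):
    """Pull the VMState value out of `showvminfo --machinereadable` output."""
    match = STATE_LINE.search(info)
    return match.group(1) if match else None


def vm_state(vm_name, run=subprocess.check_output):
    """Ask VBoxManage for the VM's current state, e.g. 'running' or 'aborted'."""
    output = run(["VBoxManage", "showvminfo", vm_name, "--machinereadable"])
    return parse_vm_state(output.decode("utf-8", "replace"))
```

A cron job on the host could call vm_state() every few minutes and send a mail whenever the result is anything other than "running". Restarting the VM automatically is also possible, but with an unexplained abort it seems wiser to be told first and look at the VirtualBox logs before starting it again.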

mbvpixies78 02-19-2013 07:02 PM

Quote:

Originally Posted by scheidel21 (Post 4895517)
That certainly sounds like a VirtualBox issue if the status of the VM was "Aborted"

Ok. I'm still going to install Monit as unSpawn suggested since I did tweak the Apache config file a bit and would like a little more info on how the mirror fares day in and day out. Will post more when I get there.

I know what I'm saying sounds contradictory, i.e., VM is fine, process dies, then I typed that the VM was down, 'aborted' state-- but that's the reality of the symptoms, which may indicate multiple causes or a changing problem.

mbvpixies78 02-21-2013 06:32 PM

I'm trying to uninstall VirtualBox so that I can install a newer version, as this might be what's causing the problem. Unfortunately I can't locate the uninstall file mentioned in the VirtualBox manual. Manual uninstall is not explained in the manual/not applicable, so any suggestions of how to remove VirtualBox would be greatly appreciated as I try to track down the steps elsewhere.

I wanted to download a tar.gz for the version I have installed in the hopes that I can extract the uninstaller file from it, but there are only rpms available...

Synopsis: When I try to install the latest version of VirtualBox, I get many "conflicts with ..." errors between my installed version and the new version. Where to go from here?

scheidel21 02-21-2013 07:37 PM

How was VB installed originally? If it was installed through the package manager, use the package manager to remove it. The same goes for an RPM package: if it was installed via RPM, you should be able to remove it via the package manager as well.

unSpawn 02-21-2013 09:20 PM

Before you update or remove I would list the contents of the package and the %scripts section, because I vaguely remember that a long time ago when I installed Virtual Box it needed to mess with kernel modules. If there are no stock ones for your kernel it may compile them, and then they won't be removed when you update the package; with the listings you can check that. Otherwise just post the stdout / stderr your upgrade causes, preferably in vBB code tags.

mbvpixies78 02-21-2013 09:22 PM

I didn't recall how VB was installed, so I typed:
Code:

#yum remove VirtualBox
Setting up Remove Process
No Match for argument: VirtualBox
[ ... ]
No Packages marked for removal

So it turns out I installed via an rpm. For anyone else like me who's never had to remove a program installed via downloaded rpm before, it's really straightforward, as mentioned here

mbvpixies78 02-22-2013 12:12 AM

(Side Note:)
 
Quote:

Originally Posted by unSpawn (Post 4894304)
Generally speaking there's two approaches you could use: I) walk the tree from the outside inwards and eliminate potential causes or II) target most likely causes first.
In the case of I.) You've got three major leads:
0) the Virtual Box server and client log files on the Virtual Box host,
1) the guest system and log files and
2) whatever configuration and logs its web stack has or produces.

If the Virtual Box server and client logs don't hold any clues you could make it log more (debug mode if possible) in the hope it could reveal something. If over time these (after all you said it happens occasionally) logs don't show anything worthy of investigation you could then draw the conclusion the "outer layer" is OK. Then you would move on to the VB guest and start assessing like you would any regular system: is all software (including whatever runs in the web stack) up to date? Is file system integrity intact?

I updated VirtualBox (didn't see your most recent post until just now), but still get this warning (re: file system integrity)

Code:

The virtual machine execution may run into an error condition as described below. We suggest that you take an appropriate action to avert the error.
The host I/O cache for at least one controller is disabled and the medium '/home/admin/VirtualBox VMs/Apache Mirror 3/Apache Mirror.vdi' for this VM is located on an ext4 partition. There is a known Linux kernel bug which can lead to the corruption of the virtual disk image under these conditions.
Either enable the host I/O cache permanently in the VM settings or put the disk image and the snapshot folder onto a different file system.
The host I/O cache will now be enabled for this medium.

Weird thing is I/O cache is enabled, yet this warning still shows every time.

Quote:

Any unwanted subsystems or processes running?
I must embarrassingly admit I had snort running on the same VM as the apache mirror briefly due to interesting hacking attempts I noticed in the logs. I was thinking it wasn't running but now it seems it was starting on boot. I deleted the links which cause this to happen though.
Quote:

Does it run (any form of) SAR to produce resource usage stats over time? (If you don't run that you should, I described a few tools tersely here.) In what way does its (web stack) configuration deviate from the standard and for what reason? And the same goes here: if log files don't hold clues make it log more so you can draw the conclusion it's OK or not.
I'm not running a SAR but will do so.

Quote:

In the case of II.) you could for example start by checking for the (approximate) first time you experienced outage and look at what software or system / service configuration changed around that time. (I keep a time stamped admin log so I can read back major changes and I have all configuration under revision control so I can revert back for diagnostics or in case of error)
I'm not sure when it started and logs are periodically being auto-deleted to save space-- I need to look into how to allow logs to grow larger because I do have the space.



Quote:

Or you could start with the process itself:


A killed process should be easier to diagnose but what you say here kind of contradicts what you said in your OP: Apache occasionally stopping to serve web pages. So the first question would be if httpd processes still exist, and if they do, what process state they're in.
I need to let it happen again-- I can't remember 100% if it has always been the VM aborted or if sometimes it was just httpd dead, or perhaps 100% the latter. I had assumed at first it was perhaps due to strain on my residential internet connection or something unrelated to Apache or VB causing the problem, and so didn't pay too close attention to VM or host.

Quote:

So the first question here would be: why do your /var/log/httpd logs only go up to 2013-02-11? If that's due to log rotation then allow it to archive more, if it's due to file system space constraints or other reasons then list them. Wrt Logwatch you should check what the default detail level is set to first. If it's "Medium" / 5 then setting it to "High" / 10 should reveal more detail. Don't forget to run it again with "--archives --range All".
Logging goes up to 2-11-13, and then the next entry is 2-19-13. Logwatch was set to medium detail, but starting today, it's at high detail level. I have daily emails (for the most part) from back to 10-23-12.

Apache tells me on 2-12-13 warning mirror content is 3 days old--- last successful probe 2-10-13 3:48. 2-10-13 16:29 I see a tag Romanian Black Hat calling itself ZmEu after a Romanian mythological figure that kidnaps and rapes young girls... that spells issues with primary caregiver(s.)

All I see are rudimentary exploit attempts involving phpmyadmin, which I don't use anyway. Apache is still serving pages until 2-11-13 22:55, and I don't see anything unusual outside of the Romanian rapist(s).



Quote:

That's difficult to say as you haven't posted any host, guest and service details. Generally speaking Logwatch at detail level 10 should show errors if it can find any, another way could be to check logs for event recurrence (often, way too much, rarely) and check message significance (obviously the message level matters because if you don't log it you won't find it) for example by running 'cat /var/log/httpd/access_log|petit --hash|less'.
Host and guest are CentOS 6.3, guest is minimal, host is full install. What else do you need to know?

mbvpixies78 02-24-2013 07:11 PM

No failure of the web server and/or the VM yet... meanwhile I'll start a separate thread w/other questions semi-related to the above

unSpawn 02-25-2013 08:48 PM

Quote:

Originally Posted by mbvpixies78 (Post 4897217)
Code:

There is a known Linux kernel bug which can lead to the corruption of the virtual disk image under these conditions.

It's best to check the VirtualBox bug tracker to find out exactly which conditions this bug manifests itself under. Then you can make an informed decision to either leave it as it is or move it to an ext2 partition. In my opinion anything between the host and the guest that isn't a performance benefit should be measured, questioned, removed or disabled as much as possible. Your ASF mirror for example is just that: a mirror. You're right to run minimal CentOS installs because it doesn't need all the kernel modules and user land applications a generic server uses, it doesn't need a "real disk" scheduler (virtual storage so NOOP should do fine) and IMHO it doesn't need journaling. Hell, it wouldn't even need to run Apache (there are way lighter web servers around) and it may benefit from request caching (depending on how much traffic you get, performance benefits being relative because you obviously have to clear the cache after you rsync with ASF).
*The benefit of running a capable SAR is that you will be able to determine the effect of some of the tuning you do. Don't change configuration, sysctls or turn other knobs because some (misinformed, outdated) web log says so: to measure is to know and IMHO that's the only reasonable approach.
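
To make the "measure first" point concrete, here is a very small Python sketch of periodic resource sampling. It is no substitute for a real SAR tool; it only shows the idea of recording a timestamped row that can be compared before and after a change. It assumes the Linux /proc/loadavg and /proc/meminfo formats, and the function names and the choice of columns are made up.

```python
import time


def parse_meminfo(text):
    """Turn the contents of /proc/meminfo into {field: number}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[key] = int(parts[0])
    return values


def sample(loadavg_text, meminfo_text, now=None):
    """Build one CSV row: epoch seconds, 1-minute load, MemFree kB, SwapFree kB."""
    mem = parse_meminfo(meminfo_text)
    load1 = float(loadavg_text.split()[0])
    stamp = int(time.time() if now is None else now)
    return f"{stamp},{load1:.2f},{mem.get('MemFree', 0)},{mem.get('SwapFree', 0)}"
```

Reading both files and appending the row to a log from cron every minute gives a crude history. If free memory or swap trends toward zero before each outage, that points at the guest rather than at VirtualBox.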


Quote:

Originally Posted by mbvpixies78 (Post 4897217)
I'm not sure when it started and logs are periodically being auto-deleted to save space-- I need to look into how to allow logs to grow larger because I do have the space.

First thing is that when installed out of the box 'logrotate' comes with a retention time that suits workstations more than servers IMHO. Apart from changing that you could also opt to have the guest's rsyslogd send logs to the host for processing.


Quote:

Originally Posted by mbvpixies78 (Post 4897217)
I need to let it happen again

Then just configure appropriate logging and let it happen.


Quote:

Originally Posted by mbvpixies78 (Post 4897217)
that spells issues with primary caregiver(s.)

I detect professional deformation here ;-p Anyway, if you run a publicly accessible (web) server you're bound to find evidence of vulnerability scanners. Apart from the fact your web server will mostly send 4nn-type return codes back anyway, hardening the server and web stack should be a given (also see mod_security, Netfilter rate limiting, fail2ban).


All times are GMT -5.