LinuxQuestions.org
09-02-2009, 07:51 AM   #1
steve51184
Member
 
Registered: Dec 2006
Posts: 372

Rep: Reputation: 30
monit keeps stopping monitoring my services


Hi all. I have a dual-core server with 4 GB of RAM, and I get high Apache memory usage and sometimes high MySQL usage. I installed monit and set up some limits, and things seem to be working fine, but monit keeps stopping monitoring my services and Apache/MySQL keep crashing. Is there a way to make sure monit is always monitoring?

Here is my monit config:

Quote:
set daemon 60
set logfile /var/log/monit.log
set mailserver localhost port 25
set mail-format { from: monit@server1.domain.com }
set alert email@gmail.com
set httpd port 2812 and
    allow admin:pass

check process mysql with pidfile /var/run/mysqld/mysqld.pid
    group database
    start program = "/etc/init.d/mysql start"
    stop program = "/etc/init.d/mysql stop"
    if failed host H045 port 3306 then restart
    if 5 restarts within 5 cycles then timeout
    if totalmem > 500 MB for 2 cycles then restart

check process apache with pidfile /var/run/apache2.pid
    group www
    start program = "/etc/init.d/apache2 start"
    stop program = "/etc/init.d/apache2 stop"
    if failed host www.domain.com port 80 protocol http
        and request "/monit/token" then restart
    if cpu is greater than 60% for 2 cycles then alert
    if cpu > 80% for 5 cycles then restart
    if totalmem > 768 MB for 2 cycles then restart
    if children > 250 then restart
    if loadavg(5min) greater than 10 for 8 cycles then stop
    if 3 restarts within 5 cycles then timeout

And here is my log file:

Quote:
[CEST Sep 2 05:54:00] error : 'apache' loadavg(5min) of 11.1 matches resource limit [loadavg(5min)>10.0]
[CEST Sep 2 05:55:20] error : 'apache' loadavg(5min) of 25.5 matches resource limit [loadavg(5min)>10.0]
[CEST Sep 2 05:55:44] error : HTTP: error receiving data -- Resource temporarily unavailable
[CEST Sep 2 05:55:45] error : 'apache' failed protocol test [HTTP] at INET[www.domain.com:80] via TCP
[CEST Sep 2 05:55:48] error : Cannot open a connection to the mailserver 'localhost:25' -- Transport endpoint is not connected
[CEST Sep 2 05:55:49] error : No mail servers are available
[CEST Sep 2 05:55:49] error : Aborting event
[CEST Sep 2 05:55:49] info : 'apache' trying to restart
[CEST Sep 2 05:55:49] info : 'apache' stop: /etc/init.d/apache2
[CEST Sep 2 05:56:25] error : 'apache' failed to stop
[CEST Sep 2 05:56:35] error : Cannot open a connection to the mailserver 'localhost:25' -- Transport endpoint is not connected
[CEST Sep 2 05:56:35] error : No mail servers are available
[CEST Sep 2 05:56:35] error : Aborting event
[CEST Sep 2 05:58:09] error : 'apache' loadavg(5min) of 52.6 matches resource limit [loadavg(5min)>10.0]
[CEST Sep 2 05:58:59] error : HTTP: error receiving data -- Resource temporarily unavailable
[CEST Sep 2 05:59:03] error : 'apache' failed protocol test [HTTP] at INET[www.domain.com:80] via TCP
[CEST Sep 2 05:59:03] info : 'apache' trying to restart
[CEST Sep 2 05:59:03] info : 'apache' stop: /etc/init.d/apache2
[CEST Sep 2 05:59:40] error : 'apache' failed to stop
[CEST Sep 2 06:21:31] error : 'apache' loadavg(5min) of 396.5 matches resource limit [loadavg(5min)>10.0]
[CEST Sep 2 06:24:55] error : HTTP: error receiving data -- Success
[CEST Sep 2 06:25:47] error : 'apache' failed protocol test [HTTP] at INET[www.domain.com:80] via TCP
[CEST Sep 2 06:26:33] info : 'apache' trying to restart
[CEST Sep 2 06:27:54] info : 'apache' stop: /etc/init.d/apache2
[CEST Sep 2 06:39:03] error : 'apache' failed to stop
[CEST Sep 2 06:43:18] error : 'apache' service timed out and will not be checked anymore
[CEST Sep 2 06:43:27] error : Cannot open a connection to the mailserver 'localhost:25' -- Transport endpoint is not connected
[CEST Sep 2 06:43:30] error : No mail servers are available
[CEST Sep 2 06:43:31] error : Aborting event
[CEST Sep 2 07:39:56] error : 'mysql' failed, cannot open a connection to INET[H045:3306] via TCP
[CEST Sep 2 07:40:01] error : Sendmail: error receiving data from the mailserver 'localhost' -- Resource temporarily unavailable
[CEST Sep 2 07:40:01] error : Aborting event
[CEST Sep 2 07:40:01] info : 'mysql' trying to restart
[CEST Sep 2 07:40:01] info : 'mysql' stop: /etc/init.d/mysql
[CEST Sep 2 07:40:03] info : 'mysql' start: /etc/init.d/mysql
[CEST Sep 2 07:41:24] error : 'mysql' failed, cannot open a connection to INET[H045:3306] via TCP
[CEST Sep 2 07:41:24] info : 'mysql' trying to restart
[CEST Sep 2 07:41:24] info : 'mysql' stop: /etc/init.d/mysql
[CEST Sep 2 07:41:25] info : 'mysql' start: /etc/init.d/mysql
[CEST Sep 2 07:42:26] info : 'mysql' connection succeeded to INET[H045:3306] via TCP
[CEST Sep 2 14:10:31] debug : start service 'apache' on user request
[CEST Sep 2 14:10:31] info : Awakened by User defined signal 1
[CEST Sep 2 14:10:31] info : 'apache' start: /etc/init.d/apache2
[CEST Sep 2 14:10:31] info : monit daemon at 2313 awakened
[CEST Sep 2 14:10:33] info : 'apache' start action done

As you can see, my server (i.e. Apache) goes nuts from 05:54 AM until 07:42 AM, when it seems to just give up, until I manually started Apache back up at 02:10 PM. But why did monit give up and just stop Apache, and why didn't it start it back up again when the server load etc. was lower?
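
The point where monit gives up can be picked out of the log mechanically. Below is a minimal Python sketch, assuming the log keeps the line format quoted above; the function name `find_unmonitored` and the `SAMPLE` list are made up for illustration and are not part of monit.

```python
import re

# Matches lines such as:
# [CEST Sep 2 06:43:18] error : 'apache' service timed out and will not be checked anymore
TIMEOUT_RE = re.compile(
    r"^\[(?P<stamp>[^\]]+)\]\s+error\s*:\s*'(?P<service>[^']+)' "
    r"service timed out and will not be checked anymore"
)


def find_unmonitored(log_lines):
    """Return (timestamp, service) pairs for every timeout event in the log."""
    events = []
    for line in log_lines:
        match = TIMEOUT_RE.match(line.strip())
        if match:
            events.append((match.group("stamp"), match.group("service")))
    return events


# Three lines copied from the log above, used as sample input.
SAMPLE = [
    "[CEST Sep 2 06:39:03] error : 'apache' failed to stop",
    "[CEST Sep 2 06:43:18] error : 'apache' service timed out and will not be checked anymore",
    "[CEST Sep 2 07:42:26] info : 'mysql' connection succeeded to INET[H045:3306] via TCP",
]

if __name__ == "__main__":
    for stamp, service in find_unmonitored(SAMPLE):
        print(f"{stamp}: monit stopped checking '{service}'")
```

To run it against the real file, pass `open("/var/log/monit.log")` instead of `SAMPLE` (the path is the one set by `set logfile` in the config above).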

Also, is there any better alternative to monit for service monitoring? All I want is for it to keep Apache/MySQL running stably ;)

Last edited by steve51184; 09-02-2009 at 08:10 AM.
 
09-02-2009, 08:27 AM   #2
unSpawn
Moderator

Registered: May 2001
Posts: 28,399
Blog Entries: 54

Rep: Reputation: 3229
Quote:
Originally Posted by steve51184
Why did monit give up and just stop Apache, and why didn't it start it back up again when the server load etc. was lower?
Because by default a timeout is configured, as indicated by the "error : 'apache' service timed out and will not be checked anymore" message. If you would like Monit to keep checking, you have to explicitly configure it not to honour a timeout for the check.


Quote:
Originally Posted by steve51184
Is there any better alternative to monit for service monitoring? All I want is for it to keep Apache/MySQL running stably
My guess is you're not looking for something "better" but just need to understand how to configure Monit. However, sticking your head in the sand (figuratively speaking) and trying to mitigate service failures by continuously restarting the services is addressing symptoms, not the cause.
 
09-02-2009, 08:39 AM   #3
steve51184
Member
 
Registered: Dec 2006
Posts: 372

Original Poster
Rep: Reputation: 30
Quote:
Originally Posted by unSpawn
Because by default a timeout is configured, as indicated by the "error : 'apache' service timed out and will not be checked anymore" message. If you would like Monit to keep checking, you have to explicitly configure it not to honour a timeout for the check.
So if I remove these two lines from the apache/mysql config, it'll just keep trying? And would you recommend it?

if 5 restarts within 5 cycles then timeout
if 3 restarts within 5 cycles then timeout

Quote:
Originally Posted by unSpawn
trying to mitigate service failures by continuously restarting the services is addressing symptoms, not the cause.
You're totally right on that, but I have no idea how to fix the cause :\
 
09-02-2009, 08:52 AM   #4
unSpawn
Moderator

Registered: May 2001
Posts: 28,399
Blog Entries: 54

Rep: Reputation: 3229
Quote:
Originally Posted by steve51184
So if I remove these two lines from the apache/mysql config, it'll just keep trying?
It should, yes.
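
Before editing the control file, it can help to list which checks carry a timeout rule. The following is a minimal Python sketch, assuming the file follows the layout quoted in the first post (one `check process NAME ...` line followed by its rules); `find_timeout_rules` and `SAMPLE_CONFIG` are hypothetical names, and the sample is an abbreviated copy of that config.

```python
import re

# A service block starts with "check process NAME ..."
CHECK_RE = re.compile(r"^check\s+process\s+(\S+)", re.IGNORECASE)


def find_timeout_rules(config_text):
    """Map each 'check process' name to its rules that end in 'then timeout'."""
    rules = {}
    current = None
    for raw in config_text.splitlines():
        line = raw.strip()
        match = CHECK_RE.match(line)
        if match:
            current = match.group(1)
            rules[current] = []
        elif current and line.lower().endswith("then timeout"):
            rules[current].append(line)
    return rules


# Abbreviated copy of the config from the first post.
SAMPLE_CONFIG = """\
check process mysql with pidfile /var/run/mysqld/mysqld.pid
  if failed host H045 port 3306 then restart
  if 5 restarts within 5 cycles then timeout

check process apache with pidfile /var/run/apache2.pid
  if loadavg(5min) greater than 10 for 8 cycles then stop
  if 3 restarts within 5 cycles then timeout
"""

if __name__ == "__main__":
    for service, lines in find_timeout_rules(SAMPLE_CONFIG).items():
        for rule in lines:
            print(f"{service}: {rule}")
```

Note that removing a rule probably does not revive a service that has already timed out. Monit's command line has a `monit monitor apache` subcommand for re-enabling a check, though it is worth confirming this against the manual for the installed version.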


Quote:
Originally Posted by steve51184
And would you recommend it?
With messages like "error : 'apache' loadavg(5min) of 396.5" I'd try addressing the cause first.


Quote:
Originally Posted by steve51184
I have no idea how to fix the cause
What's the distribution? Is it in colo or local? Does it have a fat pipe? Do you run any SAR for logging system performance? What does the machine serve? Do you limit bandwidth? Use a proxy in front of your webserver? Anything else you'd like to add?
 
09-02-2009, 09:05 AM   #5
steve51184
Member
 
Registered: Dec 2006
Posts: 372

Original Poster
Rep: Reputation: 30
Quote:
Originally Posted by unSpawn
With messages like "error : 'apache' loadavg(5min) of 396.5" I'd try addressing the cause first.
OK, will do.

Quote:
Originally Posted by unSpawn
What's the distribution? Is it in colo or local? Does it have a fat pipe? Do you run any SAR for logging system performance? What does the machine serve? Do you limit bandwidth? Use a proxy in front of your webserver? Anything else you'd like to add?
OK, I'll try to answer each question you asked, one per line:

Ubuntu 8.10 Server.

It's not colo or local, it's just a normal rented dedicated server (AMD dual-core 4800+ 2.5 GHz, 4 GB RAM, 2 x 320 GB HDD).

Yeah, it's quite fat: 100 Mbit/s full duplex and fully burstable.

SAR? No idea what that is, but no, I don't log system performance.

The machine serves a few websites, but mostly a large SMF forum and a file server.

No, I don't limit bandwidth; if I'm correct, things like mod_bandwidth only work for Apache 1.x and I use 2.x.

Nope, no proxy.

No, nothing else I think I need to add.
 
09-02-2009, 09:27 AM   #6
unSpawn
Moderator

Registered: May 2001
Posts: 28,399
Blog Entries: 54

Rep: Reputation: 3229
Logging and running Atop/Dstat/Collectl/(At)sar may make it easier to keep tabs on system performance and to spot bottlenecks and other problems. Is the forum and website software up to date? Are the fora and file server all publicly accessible? Does this load spiking happen each day at the same time (cronjobs)? If not, is it accompanied by huge amounts of remote or outbound requests?
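
Until one of those tools is set up, even a crude timestamped load log can show whether the spikes line up with cronjobs. Here is a minimal Python sketch, assuming a Linux `/proc/loadavg`; the function names, the `HIGH` marker and the default limit of 10 (borrowed from the `loadavg(5min)` rule in the first post) are illustrative choices, not part of any of the tools named above.

```python
import os
import time


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into (1min, 5min, 15min) floats."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def format_sample(text, limit=10.0, now=None):
    """Build one log line; flag it when the 5-minute average exceeds `limit`."""
    one, five, fifteen = parse_loadavg(text)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    flag = " HIGH" if five > limit else ""
    return f"{stamp} load {one:.2f} {five:.2f} {fifteen:.2f}{flag}"


if __name__ == "__main__":
    # Only attempt the read where the Linux proc file exists.
    if os.path.exists("/proc/loadavg"):
        with open("/proc/loadavg") as handle:
            print(format_sample(handle.read()))
```

Run once a minute (from cron, for example) with the output appended to a file, this gives a timeline to compare against the monit log.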
 
09-02-2009, 09:50 AM   #7
steve51184
Member
 
Registered: Dec 2006
Posts: 372

Original Poster
Rep: Reputation: 30
Yeah, the forum software is up to date, and no, the file server is not public; only a few people have access to it, and even then bandwidth usage is low/OK.

And no, the load spikes happen at random times, I believe (I only just started logging monit, so I don't know yet). I believe there are HUGE amounts of outbound requests (using all 100 Mbit/s of bandwidth), as I use a script called "qooy" (a remote file mirroring script), and I believe it's what is causing the problem.
 
09-02-2009, 10:22 AM   #8
unSpawn
Moderator

Registered: May 2001
Posts: 28,399
Blog Entries: 54

Rep: Reputation: 3229
Quote:
Originally Posted by steve51184
I use a script called "qooy" (a remote file mirroring script), and I believe it's what is causing the problem
Should be easy to test if that's it, right?
 
09-02-2009, 10:28 AM   #9
steve51184
Member
 
Registered: Dec 2006
Posts: 372

Original Poster
Rep: Reputation: 30
Quote:
Originally Posted by unSpawn
Should be easy to test if that's it, right?
Not really, as the script itself works fine, but the remote uploading part sometimes gets stuck and/or loops.

This is what I can tell from htop: there are MANY processes uploading to the same host (mirror) when there shouldn't be. Again, this doesn't happen all the time, but it does use a lot of RAM and all 100 Mbit/s of bandwidth, and it causes a very high load if nothing is done. Might it be fixed by simply limiting the bandwidth, and/or by closing/limiting the child processes somehow?
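
One way to catch the pile-up without watching htop is to count identical command lines in the process list. This is a minimal Python sketch, assuming the stuck uploads show up as many processes with the same command line; the `php upload.php mirror1.example.com` entries in `SAMPLE_PS` are invented placeholders (the real qooy process names are not known here), and the threshold of 5 is arbitrary.

```python
import subprocess
from collections import Counter


def count_commands(ps_args_output):
    """Count identical command lines in the output of `ps -eo args`."""
    lines = [line.strip() for line in ps_args_output.splitlines()]
    # Skip blank lines and the header row that ps prints first ("COMMAND").
    return Counter(line for line in lines if line and line != "COMMAND")


def suspicious(counts, max_instances=5):
    """Return commands that run more often than `max_instances`, busiest first."""
    return [(cmd, n) for cmd, n in counts.most_common() if n > max_instances]


# Hypothetical process list: seven identical uploads and three Apache workers.
SAMPLE_PS = (
    "COMMAND\n"
    + "php upload.php mirror1.example.com\n" * 7
    + "/usr/sbin/apache2 -k start\n" * 3
)

if __name__ == "__main__":
    result = subprocess.run(["ps", "-eo", "args"], capture_output=True, text=True)
    for command, instances in suspicious(count_commands(result.stdout)):
        print(f"{instances:4d}  {command}")
```

This only detects the symptom; it does not limit or kill anything.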

Last edited by steve51184; 09-02-2009 at 10:30 AM.
 
09-02-2009, 10:54 AM   #10
unSpawn
Moderator

Registered: May 2001
Posts: 28,399
Blog Entries: 54

Rep: Reputation: 3229
I don't know the script; all I know is that it's commercially licensed software. If you bought it, you may be eligible for support. If the script isn't able to properly detect and deal with remote problems (and considering that bandwidth limiting is also not treating the cause), then it would be a good candidate for replacing with a better alternative.
 
  



All times are GMT -5.
