LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - Newbie (https://www.linuxquestions.org/questions/linux-newbie-8/)
-   -   cron error (https://www.linuxquestions.org/questions/linux-newbie-8/cron-error-737424/)

linuxentry 07-03-2009 04:56 AM

cron error
 
Hi,

Sorry, this is a bit of a long post. Being totally new to Linux, I'm having some difficulty handling this particular problem. Any help with this would be great!

We have the MRTG application for router monitoring running on a Linux server, monitoring and polling around 300 routers. The server has 2 CPUs and 6 GB of RAM.

For this we have 200+ cron jobs scheduled to run every 5 minutes, basically one to poll each router every 5th minute.

My question is: is there a limit on how many cron jobs we can run on a server (both as good practice and as a limitation of the cron queue)? I don't see any such configuration currently on the server.

The problem is that the server goes down every 2-3 days, and we don't have any panic or other entries in the messages file that relate to any kind of problem.

The only message we see is:

crond[30090]: System error


which keeps appearing every minute or so.

This has prompted me to suspect that the server is going down because the cron queue keeps growing.

Also, when I run the top command I can see around 800-1400 zombie processes, and in the ps output lots of crond [defunct] processes.

Keep :)
linuxentry

unSpawn 07-03-2009 06:26 AM

Quote:

Originally Posted by linuxentry (Post 3595324)
Sorry, this is a bit of a long post.

Don't be. The more factual information the better: it will help us help you.


Quote:

Originally Posted by linuxentry (Post 3595324)
I can see that there are around 800-1400 zombie processes and in the ps output lots of crond [defunct] processes.

While crond may cause the system to run out of resources, this may indicate there is something wrong with the jobs themselves. I would suggest looking at that first.
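A quick way to inspect them (generic ps usage, not specific to this setup) is to list processes in state Z together with their parent PID; the parent is whatever is failing to reap its dead children:

```shell
# List zombie (defunct) processes with their parent PID.
# The PPID column points at the process that is not reaping them.
ps -eo stat=,ppid=,pid=,comm= | awk '$1 ~ /^Z/ { print }'

# Count them, to watch whether the number keeps growing over time
ps -eo stat= | awk '$1 ~ /^Z/' | wc -l
```

On this server the parent would likely show up as crond or the jobs it launches, which should narrow down where the leak is.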

centosboy 07-03-2009 07:07 AM

Quote:

Originally Posted by linuxentry (Post 3595324)
For this we have almost 200+ cronjobs scheduled to start every 5th minute. Basically to poll every router on every 5th minute.

Also when I see top command I can see that there are around 800-1400 zombie processes and in the ps output lots of crond [defunct] processes.

Hi.

I don't wish to sound unhelpful, but how do you manage to have that many mrtg commands in the crontab? One or two should surely be enough. You certainly wouldn't need one for each device; all the devices should be listed in the mrtg.cfg file.

Then a command like this

Code:

/usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok
would poll every device listed in the mrtg config file. I would seriously look at how mrtg is set up on your server :)

Anyway, I would try Cacti over mrtg. It is better (better graphs, output, and information).

Also, you then need only one line in your crontab to monitor all your devices.
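For example, a single crontab entry along these lines (the schedule is illustrative, and the paths should match your actual installation) replaces the hundreds of per-router jobs:

```
# Illustrative crontab entry: one mrtg run every 5 minutes against a
# single config file that lists all routers (paths are examples only).
*/5 * * * * /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok
```

The `*/5` in the minute field means "every 5 minutes", and the lock file stops a new run from starting while the previous one is still going.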

linuxentry 07-10-2009 12:14 AM

Hi, thanks for the reply.

The teams here went for multiple cron entries because polling all the routers sequentially from a single config file may take more than 5 minutes for one complete run, and every router has to be polled every 5 minutes. There are almost 250+ routers, geographically spread out.

New update:
The server stopped responding after a week of operation, and on the console I can also see "Out of memory: killed process PID" messages.

Any help on this is greatly appreciated.
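After a crash like that, a couple of generic diagnostics (standard tools, nothing specific to mrtg) can show what the OOM killer did and which processes are eating the memory:

```shell
# Recent OOM-killer activity recorded in the kernel log
# (no output simply means no OOM kills are currently logged)
dmesg | grep -iE 'out of memory|oom' || true

# Current memory and swap usage, in MB
free -m

# Top processes by resident memory (RSS, in KB)
ps -eo rss=,pid=,comm= | sort -rn | head
```

Watching the last two between crashes should show whether memory use climbs steadily as the zombies accumulate.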

Thanks
linuxentry :)

chrism01 07-10-2009 12:32 AM

As unSpawn said in post #2, it looks like you've got problems in your perl programs. That many defunct/zombie processes is bad; you need to find out why they end up that way, because it will kill your server, as you've seen.
If you want to poll that many devices that often, don't spawn hundreds of processes every 5 minutes from cron. Break them into blocks of, e.g., 50 routers, and look at running each set inside a perl daemon that doesn't have to create a new process for each individual router poll.
Use a timer at the bottom of each daemon's loop and wait 5 minutes minus (the time it took to do that set of routers).
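That loop-with-timer can be sketched in shell (a minimal illustration, not a production daemon; poll_block is a hypothetical stand-in for whatever polls one block of ~50 routers, and the demo uses a short interval and a bounded loop so it terminates when run directly, where a real daemon would set INTERVAL=300 and loop forever):

```shell
#!/bin/sh
# Sketch of one per-block polling daemon.
# poll_block is a hypothetical placeholder for the real polling command
# (e.g. an mrtg run on this block's config file). INTERVAL and CYCLES
# are kept small here so the demo terminates; a real daemon would use
# INTERVAL=300 and replace the counter with "while true".
INTERVAL=2
CYCLES=2

poll_block() {
    sleep 1   # stand-in for polling this block's ~50 routers
}

i=0
while [ "$i" -lt "$CYCLES" ]; do
    start=$(date +%s)
    poll_block
    elapsed=$(( $(date +%s) - start ))
    # Sleep only for whatever is left of the interval, if anything
    if [ "$elapsed" -lt "$INTERVAL" ]; then
        sleep $(( INTERVAL - elapsed ))
    fi
    i=$(( i + 1 ))
done
echo "completed $i cycles"
```

This way each cycle starts roughly INTERVAL seconds after the previous one, however long the polling itself took, instead of cron piling hundreds of fresh processes onto the system every 5 minutes.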

jeromeNP7 07-10-2009 05:30 AM

Instead of cron jobs, you should create a daemon application that does all the polling by itself at regular intervals. The usual problem with frequently repeated cron jobs is that they are only as good as the script or application they launch. Launching a buggy tool simply multiplies the number of problems by the number of cron jobs.

Linux


All times are GMT -5. The time now is 09:25 PM.