Shell script required to identify if a cron job is stuck for several days and notify users through email.
Recently, one of our cron jobs was stuck for several days and we were not aware of it.
We want to build a shell/bash script that monitors whether the log trace is still rolling. If the log has not rolled for a long time, we want to send an email to our tech team with the details.
P.S. We have multiple cron jobs and want to implement the same check for all of them. Right now, the jobs do not write any details of their processing to log files.
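A minimal sketch of the log-freshness check described above. It assumes each job is first changed to append to its own log file (the post says they currently do not) and that GNU stat is available. The function name stale_log and the path in the usage comment are hypothetical.

```shell
#!/bin/sh
# stale_log FILE MAX_AGE_SECONDS
# Succeeds (exit 0) if FILE is missing or was last modified more than
# MAX_AGE_SECONDS ago; fails otherwise.
# Assumption: GNU stat, where "stat -c %Y" prints the mtime in epoch seconds.
stale_log() {
    [ -e "$1" ] || return 0                    # no log at all counts as stale
    sl_mtime=$(stat -c %Y -- "$1") || return 0
    sl_now=$(date +%s)
    [ $((sl_now - sl_mtime)) -gt "$2" ]
}

# Example use from a checker cron job (hypothetical path, 6 hour limit):
#   if stale_log /var/log/myjob.log 21600; then
#       echo "myjob.log has not changed for 6 hours" |
#           mailx -s "myjob may be hung" support@yourdomain.com
#   fi
```

The limit would need to be longer than the quietest normal stretch of the job, otherwise a slow but healthy run gets reported.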
What AwesomeMachine posted will work fine if your scripts are written to return an error. Are they? Or did you mean the process was hung and never returned at all? There are lots of ways to accomplish this, but they depend on how your scripts work now, what they're doing/calling, and how long these jobs should take to run.
First wild idea I'd suggest is to write a simple shell script, to do a "ps -ef" and look for the name of any of your cron jobs, and run this on an off-cycle schedule. Meaning that if your cron job normally runs every 5 minutes, run the 'checker' every 7 minutes, so you won't catch any jobs running normally. Again, timing will depend on how often these jobs run, how long they take to complete, etc. If this shell script finds the cron script present, it will send an email to whomever. Could even have that script loop through a simple text file with the names of any of your cron scripts and check them all.
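A rough sketch of that checker, assuming the script names live in a plain text file with one name per line. It uses pgrep -f rather than "ps -ef | grep" so the checker does not match its own grep. The function name running_jobs and the file name in the usage comment are made up for illustration.

```shell
#!/bin/sh
# running_jobs LISTFILE
# Prints every name from LISTFILE that currently matches a running process.
# Assumption: pgrep (procps) is installed; names are treated as regex patterns.
running_jobs() {
    while IFS= read -r rj_name; do
        case $rj_name in ''|'#'*) continue ;; esac   # skip blanks and comments
        # pgrep -f matches the pattern against the full command line
        if pgrep -f -- "$rj_name" >/dev/null; then
            printf '%s\n' "$rj_name"
        fi
    done < "$1"
}

# Example use from the off-cycle checker (hypothetical list file):
#   found=$(running_jobs /etc/cron-watch.list)
#   [ -n "$found" ] && printf 'Still running:\n%s\n' "$found" |
#       mailx -s "cron jobs still running" support@yourdomain.com
```

This only reports that a job is present. For jobs that legitimately run for hours, presence alone does not mean the job is hung.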
Thanks all for your replies.
In my case, the process was hung and never returned. This particular job runs once every 3 hrs 30 mins, and sometimes it runs for more than 5 hrs (depending on the load). The average run time is 2 hrs. We already have some shell scripting in place that checks for duplicate processes; if a duplicate is found, it won't trigger the next run.
We are basically looking for a generic solution that identifies whether any of the cron jobs are hung and, if so, sends an email notification to the users. We have 10 cron jobs in total.
There isn't a generic solution in this case. If the job is just hung, it's not returning any errors at all...it's just sitting there. Again, you're going to have to modify your scripts and do some good planning to make this work. Sounds like you already know average run times, so you have some data points at least.
Unless you write the process checker as suggested, I don't see many options beyond looking at the real commands IN those cron scripts to see if they have any built-in error/process checking, and enabling it if they do.
Pseudocode:
get the start time of the cron job in seconds:
Code:
# start time of the oldest matching process (needs procps ps and GNU date)
STARTTIME=$(date -d "$(ps -o lstart= -p "$(pgrep -o -f "<name of script>")")" +%s)
get the current time from the output of the date command in seconds:
Code:
NOW=$(date +%s)
if the difference is greater than <3 hours>, send email
Code:
if [ $((NOW - STARTTIME)) -gt 10800 ]; then
echo "<name of script> has been running for more than 3 hours" | mailx -s "<name of script> appears to be hung" support@yourdomain.com
fi
The <name of script> could be fed in at the command line from a file
or
by parsing the output of crontab -l, which would be more dynamic: adding or removing a cron job would automagically add it to or remove it from the check.
Take a shot at that. Get back to us if you run into problems.
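Pulling those pieces together, here is one possible sketch that parses crontab -l and flags long-running jobs. It is not the poster's script: it swaps the start-time arithmetic for the etimes column (elapsed seconds) of procps ps, and the function names cron_commands, older_than and report_hung are invented for this example. It assumes user crontab format (no user field, unlike /etc/crontab).

```shell
#!/bin/sh
# cron_commands: read "crontab -l" output on stdin, print only the commands.
cron_commands() {
    awk '
        /^[ \t]*#/ || /^[ \t]*$/ { next }                 # comments, blank lines
        /^[ \t]*[A-Za-z_][A-Za-z0-9_]*[ \t]*=/ { next }   # VAR=value settings
        {
            n = ($1 ~ /^@/) ? 1 : 5     # @daily etc. use one schedule field
            for (i = 1; i <= n; i++) sub(/^[ \t]*[^ \t]+[ \t]+/, "")
            print
        }'
}

# older_than PID SECONDS: succeed if PID has been running longer than SECONDS.
# Assumption: procps ps, which provides the etimes (elapsed seconds) column.
older_than() {
    ot_secs=$(ps -o etimes= -p "$1" 2>/dev/null | tr -d ' ')
    [ -n "$ot_secs" ] && [ "$ot_secs" -gt "$2" ]
}

# report_hung SECONDS: crontab text on stdin, prints "script (pid N)" for
# every matching process that has been running longer than SECONDS.
report_hung() {
    cron_commands | while IFS= read -r rh_cmd; do
        rh_script=${rh_cmd%% *}                  # first word of the command
        for rh_pid in $(pgrep -f -- "$rh_script"); do
            if older_than "$rh_pid" "$1"; then
                printf '%s (pid %s)\n' "$rh_script" "$rh_pid"
            fi
        done
    done
}

# Example use (3 hour limit, as in the post above):
#   hung=$(crontab -l | report_hung 10800)
#   [ -n "$hung" ] && echo "$hung" |
#       mailx -s "cron jobs appear to be hung" support@yourdomain.com
```

A single limit for every job is a simplification. With run times ranging from 2 to 5 hours, a per-job limit kept in a small text file would probably work better.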
One option you have is to do a timeout - a small shell script put in the background for a given amount of time, started from the job like this:
timeout.sh 3h $$ &
Code:
#!/bin/sh
# timeout.sh - sleep for a given time then send an alarm to the specified process
# usage:
# timeout.sh 3h $$ &
# where:
# 3h -> sleep for three hours
# $$ -> process to send the alarm to
sleep "$1"
# POSIX sh wants the signal name without the SIG prefix
kill -s ALRM "$2"
for instance.
Then in the cron job have
Code:
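#!/bin/sh
# sample startup...
RECIPIENT=support@yourdomain.com   # placeholder address, change to suit
mail_error(){
mail -s "$1" "$RECIPIENT" <<EOF
Timeout error from xyz
EOF
}
# single quotes so "$0" is expanded when the trap runs; ALRM without the SIG prefix
trap 'mail_error "$0"' ALRM
timeout.sh 3h $$ &
<the rest of your script>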
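One caveat with the trap approach: a POSIX shell does not run a trap until the current foreground command has finished, so if the script is blocked inside a hung command the alarm may never be acted on. A possible variation, sketched below, runs the real work in the background and waits for it, because the wait builtin does return when a trapped signal arrives. The function name run_with_alarm and the exit code 124 (borrowed from GNU timeout's convention) are choices made for this example only.

```shell
#!/bin/sh
# run_with_alarm SECONDS COMMAND [ARGS...]
# Runs COMMAND in the background with a watchdog that sends SIGALRM to this
# shell after SECONDS. Returns COMMAND's exit status, or 124 if the alarm
# fired first (in which case COMMAND is killed).
# Call it from the main shell, not inside $(...), since the alarm goes to $$.
run_with_alarm() {
    rwa_limit=$1
    shift
    rwa_fired=0
    trap 'rwa_fired=1' ALRM

    "$@" &                                   # the real work
    rwa_job=$!
    ( sleep "$rwa_limit"; kill -s ALRM "$$" ) >/dev/null 2>&1 &   # watchdog
    rwa_dog=$!

    wait "$rwa_job"                          # returns early on a trapped signal
    rwa_status=$?

    if [ "$rwa_fired" -eq 1 ]; then
        kill "$rwa_job" 2>/dev/null          # stop the hung job
        wait "$rwa_job" 2>/dev/null          # reap it
        rwa_status=124
    else
        # Job finished in time: disarm the watchdog. Its sleep may linger
        # until it expires, but nothing is left to send the alarm.
        kill "$rwa_dog" 2>/dev/null
    fi
    trap - ALRM
    return "$rwa_status"
}

# Example use in the cron script (3 hours = 10800 seconds, hypothetical path):
#   run_with_alarm 10800 /path/to/real_job.sh
#   [ $? -eq 124 ] && echo "real_job.sh timed out" |
#       mail -s "Timeout error" support@yourdomain.com
```

Whether killing the hung job is the right reaction depends on the job. Dropping the kill and only sending the mail would keep the process around for inspection.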