07-28-2017, 02:06 PM
|
#1
|
LQ Newbie
Registered: Jul 2017
Posts: 2
|
Shell script required to identify if a cron job is stuck for several days and notify users through email.
Recently, one of our cron jobs was stuck for several days and we were not aware of the situation.
We want to build a shell/bash script to monitor whether the log trace is rolling. If the log trace has not rolled for a long time, we want to send an email to our Tech team with the details.
P.S. We have multiple cron jobs and we want to implement the same for all the jobs. Right now, we are not storing the details of the processing of the cron jobs in any log files.
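A minimal sketch of the log check described above, assuming each job is first changed to append to its own *.log file in one directory (the post says none of them log yet). The directory /var/log/cronjobs, the 360-minute threshold and the recipient are placeholders, and find's -mmin test is assumed to be available (GNU or BSD find).

```shell
#!/bin/sh
# stale_logs.sh - report cron job logs that have not been written to recently.
# LOGDIR, MAX_AGE_MIN and RECIPIENT are assumptions for illustration only.
LOGDIR=${LOGDIR:-/var/log/cronjobs}
MAX_AGE_MIN=${MAX_AGE_MIN:-360}
RECIPIENT=${RECIPIENT:-techteam@example.com}

# Succeeds when file $1 was last modified more than $2 minutes ago.
is_stale() {
    [ -n "$(find "$1" -maxdepth 0 -mmin +"$2" 2>/dev/null)" ]
}

# Mail one warning per stale *.log file found in directory $1.
check_logs() {
    for log in "$1"/*.log; do
        [ -e "$log" ] || continue          # the glob matched nothing
        if is_stale "$log" "$MAX_AGE_MIN"; then
            echo "$log has not changed for over $MAX_AGE_MIN minutes" |
                mail -s "Stale cron log: $log" "$RECIPIENT"
        fi
    done
}

# Do nothing when the log directory does not exist.
if [ -d "$LOGDIR" ]; then check_logs "$LOGDIR"; fi
```

A log also stays quiet between runs, so the threshold has to be longer than the normal gap between one run finishing and the next one writing its first line.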
|
|
|
07-28-2017, 02:19 PM
|
#2
|
LQ Guru
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,530
|
Welcome.
Another way might be to put a time limit on how long the job can take and kill it if it goes over the time limit. timeout can do that:
Code:
timeout 600 /usr/local/bin/someslowscript || echo "Script Failed" | mail -s "Fail" techteam@example.com
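See "man timeout". If the script completes before the time limit, there is no problem. If it runs over, it is killed, timeout returns a non-zero exit status (124), and the mail is sent. The same mail is also sent if the script fails on its own with a non-zero exit status.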
A prerequisite for all that however is the ability to send mail from that machine. Is it set up?
Code:
echo "This is a test. $(date)" | mail -s "A test" techteam@example.com
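If it matters whether the job timed out or failed for some other reason, the exit status can be checked. A rough sketch, assuming GNU timeout (which exits with 124 on a timeout); run_with_limit and RECIPIENT are names made up for this example.

```shell
#!/bin/sh
# run_with_limit LIMIT COMMAND [ARGS...]
# Runs COMMAND under timeout(1) and mails a different subject for a
# timeout than for an ordinary failure. RECIPIENT is a placeholder.
RECIPIENT=${RECIPIENT:-techteam@example.com}

run_with_limit() {
    limit=$1
    shift
    timeout "$limit" "$@"
    status=$?
    case $status in
        0)   ;;   # finished in time and succeeded: nothing to report
        124) echo "$1 was killed after running longer than $limit" |
                 mail -s "Timeout: $1" "$RECIPIENT" ;;
        *)   echo "$1 exited with status $status" |
                 mail -s "Fail: $1" "$RECIPIENT" ;;
    esac
    return "$status"
}

# Example crontab use, after sourcing this file from a wrapper script:
#   run_with_limit 600 /usr/local/bin/someslowscript
```

With a numeric limit such as 600 the unit is seconds; GNU timeout also accepts suffixes like 5h.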
|
|
|
07-28-2017, 04:00 PM
|
#3
|
LQ Guru
Registered: Jan 2005
Location: USA and Italy
Distribution: Debian testing/sid; OpenSuSE; Fedora; Mint
Posts: 5,524
|
You can configure cron to send an email on errors. Put this at the top of crontab:
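The snippet from this post did not come through. The setting normally used for this is cron's MAILTO variable, so what follows is a guess at what was meant, not the poster's exact text; the address, schedule and script path are placeholders. Cron mails whatever a job prints, so discarding standard output leaves only error output to trigger a mail.

```
# Mail any output from the jobs below to this address (placeholder)
MAILTO=techteam@example.com

# stdout is discarded, so only messages on stderr produce a mail
30 2 * * * /usr/local/bin/someslowscript >/dev/null
```

As far as I know, cron sends that mail only once the job has exited, so this reports failures but stays silent for a job that hangs and never returns.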
|
|
|
07-30-2017, 09:07 AM
|
#4
|
LQ Guru
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 27,192
|
Quote:
Originally Posted by mpat86
Recently, one of our cron jobs was stuck for several days and we were not aware of the situation. We want to build a shell/bash script to monitor whether the log trace is rolling. If the log trace has not rolled for a long time, we want to send an email to our Tech team with the details.
P.S. We have multiple cron jobs and we want to implement the same for all the jobs. Right now, we are not storing the details of the processing of the cron jobs in any log files.
|
What AwesomeMachine posted will work fine, if your scripts are written to return an error. Are they? Or did you mean the process was hung, and never returned at all? There are lots of ways to accomplish this, but they are going to depend on how your scripts work now, what they're doing/calling, and how long these jobs should take to run.
First wild idea I'd suggest is to write a simple shell script, to do a "ps -ef" and look for the name of any of your cron jobs, and run this on an off-cycle schedule. Meaning that if your cron job normally runs every 5 minutes, run the 'checker' every 7 minutes, so you won't catch any jobs running normally. Again, timing will depend on how often these jobs run, how long they take to complete, etc. If this shell script finds the cron script present, it will send an email to whomever. Could even have that script loop through a simple text file with the names of any of your cron scripts and check them all.
But more details are needed.
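A rough sketch of that checker, assuming pgrep is installed. The job list path, the recipient and the function names are made up for the example; the list file holds one script name per line.

```shell
#!/bin/sh
# check_running.sh - mail a warning for every listed job that is still running.
# JOBLIST is a hypothetical text file with one cron script name per line.
JOBLIST=${JOBLIST:-/usr/local/etc/cronjobs.txt}
RECIPIENT=${RECIPIENT:-techteam@example.com}

# Print the PIDs of processes whose full command line matches $1.
# pgrep exits non-zero when nothing matches.
job_pids() {
    pgrep -f -- "$1"
}

# Read job names from file $1 and mail once per job found running.
check_jobs() {
    while IFS= read -r job; do
        case $job in ''|'#'*) continue ;; esac   # skip blanks and comments
        pids=$(job_pids "$job") || continue      # not running: nothing to do
        echo "$job is still running (PIDs: $(echo $pids))" |
            mail -s "$job may be hung" "$RECIPIENT"
    done < "$1"
}

# Do nothing when the job list is missing.
if [ -r "$JOBLIST" ]; then check_jobs "$JOBLIST"; fi
```

This only reports that a job is present when the checker runs, so the checker's schedule has to be picked so that a healthy job would normally have finished by then.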
|
|
|
07-31-2017, 10:34 AM
|
#5
|
LQ Newbie
Registered: Jul 2017
Posts: 2
Original Poster
|
Thanks, all, for your replies.
Hi LQ Guru,
In my case, the process was hung and it never returned. This particular job runs once every 3 hrs 30 mins, and sometimes it runs for more than 5 hrs (depending on the load). The average run time of this job is 2 hrs. We already have some shell scripting in place that checks for duplicate processes; if a duplicate process is found, it won't trigger the next run.
We are basically looking for a generic solution that would identify whether any of the cron jobs are hung and send an email notification to the users if any are found. We have 10 cron jobs in total.
|
|
|
07-31-2017, 11:46 AM
|
#6
|
LQ Guru
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 27,192
|
Quote:
Originally Posted by mpat86
Thanks, all, for your replies.
In my case, the process was hung and it never returned. This particular job runs once every 3 hrs 30 mins, and sometimes it runs for more than 5 hrs (depending on the load). The average run time of this job is 2 hrs. We already have some shell scripting in place that checks for duplicate processes; if a duplicate process is found, it won't trigger the next run.
We are basically looking for a generic solution that would identify whether any of the cron jobs are hung and send an email notification to the users if any are found. We have 10 cron jobs in total.
|
There is no generic solution in this case. If the job is just hung, it's not returning any errors at all...just sitting there. Again, you're going to have to modify your scripts and do some good planning to make this work. Sounds like you already know average run times, so you have some data points at least.
Unless you write the process checker as suggested, I don't see much past looking into whatever real commands are IN those cron scripts, to see if they have any built-in error/process checking, and enabling those if they do.
|
|
|
07-31-2017, 12:04 PM
|
#7
|
LQ Veteran
Registered: Feb 2013
Location: Tucson, AZ, USA
Distribution: Rocky 9.4
Posts: 5,805
|
Pseudocode:
Get the start time of the cron job in seconds:
Code:
PID=$(pgrep -of '<name of script>') || exit 0   # oldest matching process; nothing to check if none
STARTTIME=$(date -d "$(ps -o lstart= -p "$PID")" +%s)
Get the current time in seconds from the output of the date command.
If the difference is greater than <3 hours>, send an email:
Code:
NOW=$(date +%s)
if [ $((NOW - STARTTIME)) -gt 10800 ]; then
echo "Running since $(date -d "@$STARTTIME")" | mailx -s "<name of script> appears to be hung" support@yourdomain.com
fi
The <name of script> could be fed in at the command line from a file
or
by parsing the output of crontab -l, which would be more dynamic...a cron job that is added or removed would automagically be picked up or dropped by the check.
Take a shot at that. Get back to us if you run into problems.
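One way the whole thing could look as a single script. This is only a sketch: it assumes a ps from procps-ng that supports the etimes (elapsed seconds) column, and filter_long_running and report_hung are names made up for the example.

```shell
#!/bin/sh
# hung_jobs.sh - mail a warning for matching processes that have run too long.
# Usage: hung_jobs.sh PATTERN [MAX_SECONDS]     (default: 10800 = 3 hours)
RECIPIENT=${RECIPIENT:-support@yourdomain.com}

# Read "PID ELAPSED_SECONDS COMMAND..." lines on stdin and print those whose
# command matches regex $1 and whose elapsed time is greater than $2.
filter_long_running() {
    awk -v pat="$1" -v max="$2" '
        { cmd = $0; sub(/^ *[0-9]+ +[0-9]+ +/, "", cmd) }
        $2 > max && cmd ~ pat { print }
    '
}

# List all processes, keep the long-running matches, mail once per process.
report_hung() {
    ps -eo pid=,etimes=,args= |
    filter_long_running "$1" "$2" |
    while read -r pid secs cmd; do
        echo "PID $pid ($cmd) has been running for $secs seconds" |
            mailx -s "$1 appears to be hung" "$RECIPIENT"
    done
}

# Only act when a pattern was given on the command line.
if [ $# -ge 1 ]; then report_hung "$1" "${2:-10800}"; fi
```

The checker and its awk child also carry the pattern on their own command lines, but they have only just started, so the elapsed-time test filters them out without any "grep -v grep".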
Last edited by scasey; 07-31-2017 at 07:24 PM.
|
|
|
08-02-2017, 10:50 AM
|
#8
|
Senior Member
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,908
|
One option you have is to do a timeout: a small shell script put in the background for a given amount of time, started from the job as "timeout.sh 3h $$ &".
Code:
#!/bin/sh
# timeout.sh - sleep for a given time then send an alarm to the specified process
# usage:
# timeout.sh 3h $$ &
# where:
# 3h -> sleep for three hours
# $$ -> process to send the alarm to
sleep "$1"
kill -s ALRM "$2"
for instance.
Then in the cron job have
Code:
#!/bin/sh
# sample startup...
mail_error(){
# techteam@example.com is a placeholder recipient
mail -s "$1" techteam@example.com <<EOF
Timeout error from xyz
EOF
}
trap 'mail_error "$0"' ALRM
timeout.sh 3h $$ &
<the rest of your script>
Or something like that.
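One thing to watch with the trap approach: a shell runs a trap handler only after the foreground command it is waiting on has returned, so if that command is the one that hangs, the mail is held back too. A sketch of a way around that is to run the real work in the background and wait on it, because wait is interrupted by a trapped signal. run_with_alarm and RECIPIENT are names made up for this example, and a delay like 3h assumes GNU sleep.

```shell
#!/bin/sh
# run_with_alarm DELAY COMMAND [ARGS...]
# Runs COMMAND in the background with a watchdog that sends SIGALRM to this
# shell after DELAY. Returns COMMAND's status, or 1 if the alarm fired first.
RECIPIENT=${RECIPIENT:-techteam@example.com}

run_with_alarm() {
    delay=$1
    shift
    timed_out=0
    trap 'timed_out=1' ALRM

    "$@" &                       # the real work, in the background
    job=$!
    # the watchdog; $$ is still the PID of the main shell inside ( )
    ( sleep "$delay"; kill -s ALRM "$$" ) >/dev/null 2>&1 &
    dog=$!

    wait "$job"                  # returns early if the trapped ALRM arrives
    status=$?

    kill "$dog" 2>/dev/null      # stop the watchdog if it is still pending
    trap - ALRM
    if [ "$timed_out" -eq 1 ]; then
        echo "$1 still running after $delay (PID $job)" |
            mail -s "Timeout: $1" "$RECIPIENT"
        return 1
    fi
    return "$status"
}

# Example: run_with_alarm 3h /usr/local/bin/someslowscript
```

The hung job is left running here so it can be inspected; adding kill "$job" before the mail would turn this into a hard limit.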
|
|
All times are GMT -5.