Identifying a hangup / Idle / Crashed Process in Linux
Hi Guys
Can you help me identify a hung, idle, or crashed process in Linux?
Assuming I have a continuously running Java program, how do I use Linux commands to find out, at any point in time, whether that process has hung, gone idle, or crashed?
Probably the simplest thing to do would be to create, say, a cron job that periodically does ps, then greps it to extract references to the desired program-name, and sees if it's actually out there and does not say <defunct>.
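A minimal sketch of that ps check, assuming a procps-style ps. The helper name proc_status is made up for illustration, and XYZ stands in for the program name:

```shell
#!/bin/sh
# Sketch only: classify one process by PID using ps.
# proc_status is a hypothetical helper name, not part of any tool.
proc_status() {
    # "-o stat=" prints just the state column, with no header line.
    state=$(ps -o stat= -p "$1" 2>/dev/null | tr -d ' ')
    case "$state" in
        "") echo "missing" ;;   # no such PID: the process exited or crashed
        Z*) echo "defunct" ;;   # zombie, the kind ps labels <defunct>
        *)  echo "alive" ;;     # running, sleeping, or waiting on disk
    esac
}

# Possible use from a cron job (XYZ is a placeholder program name):
#   pid=$(pgrep -o -f XYZ) && proc_status "$pid"
```

Note that "alive" here only means the process exists and is not a zombie. A hung process usually still shows up as sleeping, so this catches crashes but not hangs.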
The /proc pseudo-filesystem in Linux can tell you these things about processes; that's basically where ps and similar tools get their information.
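For example, the kernel's view of a process state can be read straight from /proc. This is a rough sketch; proc_state is a hypothetical helper name, and it assumes /proc is mounted as on a normal Linux box:

```shell
#!/bin/sh
# Sketch only: print the one-letter process state from /proc/PID/status.
# proc_state is a hypothetical helper name.
proc_state() {
    # No readable status file means the PID is not there any more.
    [ -r "/proc/$1/status" ] || { echo "gone"; return 1; }
    # The line looks like "State:  S (sleeping)"; print only the letter.
    awk '/^State:/ { print $2; exit }' "/proc/$1/status"
}

# Typical letters: R running, S sleeping, D uninterruptible disk wait,
# T stopped, Z zombie.
```

Calling proc_state with the PID of the Java process prints a single letter, or "gone" if the process no longer exists.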
Still, it's just a script, and the minimum cron job interval is one long minute. If the application can be monitored through something more tweakable in terms of how and what to test, like say Monit, I think that would be better.
But this seems too complex for a guy like me.
I have a standalone Java app running on a RH box, which "continuously" writes standard output to a log file.
If the app hangs, I notice it by monitoring the log: if it's unchanged for quite some time (approx. 15 to 30 minutes), I restart the app.
What's the best way to monitor that app using Linux commands, so that I get notified through sendmail when the app has hung?
The app is a Java program that processes requests sent to it. It runs continuously; it's a multithreaded program that acts like a queue for jobs sent to it and processes them one by one. Its name can be anything depending on the setup, like XYZ or something else; ps shows the name as XYZ. While checking for pending jobs it writes a lot of statements to the log file.
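One way to sketch the "log unchanged for 15 minutes" test described above, assuming GNU stat and date. The helper name log_is_stale, the log path, and the mail address are all placeholders:

```shell
#!/bin/sh
# Sketch only: succeed if FILE was last modified more than SECONDS ago.
# log_is_stale is a hypothetical helper name.
log_is_stale() {
    now=$(date +%s)
    # "stat -c %Y" (GNU coreutils) prints the mtime as seconds since epoch.
    mtime=$(stat -c %Y "$1") || return 2
    [ $((now - mtime)) -gt "$2" ]
}

# Possible cron job body (path and address are placeholders):
#   log_is_stale /path/to/app.log 900 &&
#       printf 'Subject: XYZ looks hung\n\nNo log output for 15 minutes.\n' |
#       /usr/sbin/sendmail admin@example.com
```

Run from cron every few minutes, this would send a mail once the log has been quiet for longer than the threshold. It keeps mailing on every run until the log changes again, so a real setup would probably want a flag file or similar to avoid repeats.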
Well, that's not enough info to go on. Here's a lame way to watch a logfile for changes. Save as "watchfile" and make executable. Change the watchFile= param to the logfile to watch, pollTime and pollMax to what's reasonable, and "/some/dir/binary --with-args --to-match" to the path and name of the binary to signal, or else use your own kill command. Works for me but as always YMMV(VM).
Code:
#!/bin/sh
# watchfile: signal a process when a logfile stops changing.
trap 'whoAmI' INT
progn="watchfile"
watchFile="/var/log/messages" # File to watch.
pollTime="10" # int, seconds to sleep between polls.
pollMax="2" # int, unchanged polls before acting.
pollCount="0" # Keep this zero.
alertMessage() { logger "${progn} [$$] ${watchFile} didn't change for $((pollMax * pollTime)) seconds."; }
restartCmd() { /usr/bin/pkill -HUP -f "/some/dir/binary --with-args --to-match"; sleep "${pollTime}"; }
getMtime() { stat -c %Y "$watchFile"; }
whoAmI() { echo "${progn}: ($$)"; }
if [ ! -f "${watchFile}" ]; then echo "${progn} [FATAL] No file ${watchFile}, exiting." >&2; exit 127; fi
# "-k" kills every running watchfile instance, including this one.
if [ "$#" -eq 1 ] && [ "$1" = "-k" ]; then /usr/bin/pkill -KILL -f "watchfile"; fi
# Initialise old mtime
mtime_o=$(getMtime)
while true; do
    sleep "${pollTime}"
    mtime_n=$(getMtime)
    if [ "$mtime_n" -ne "$mtime_o" ]; then
        # File changed: remember the new mtime and reset the counter.
        mtime_o=$mtime_n
        pollCount=0
    else
        # File unchanged: count the poll and act once the threshold is hit.
        pollCount=$((pollCount + 1))
        [ "$pollCount" -ge "$pollMax" ] && { alertMessage; pollCount=0; restartCmd; }
    fi
done