LinuxQuestions.org
08-11-2006, 02:15 AM   #1
ravi2082
LQ Newbie
 
Registered: Aug 2006
Posts: 5

Rep: Reputation: 0
Identifying a hangup / Idle / Crashed Process in Linux


Hi Guys

Can you help me identify a hung, idle, or crashed process in Linux?

Suppose I have a continuously running Java program. At any point in time, how can I use Linux commands to find out whether that process has hung, is idle, or has crashed?

Please help me out with this.

Thanks
Ravi
 
08-11-2006, 06:25 AM   #2
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 9,291
Blog Entries: 4

Rep: Reputation: 3318
Probably the simplest thing to do would be to create, say, a cron job that periodically runs ps, greps the output for the desired program name, and checks that the process is actually out there and is not marked <defunct>.

The /proc pseudo-filesystem can tell you these things about processes; in Linux, that's basically how ps and similar tools work.
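
As a rough illustration of the /proc approach, the sketch below reads the State: line from /proc/<pid>/status and maps it to a one-word verdict. The function names (proc_state, classify_state, check_pid) are made up for this example, and it assumes you already know the PID, for instance from pgrep or a pid file.

```shell
#!/bin/sh
# Sketch only: classify a process by the state letter the kernel reports.
# All function names here are hypothetical, not from the thread.

# Print the state letter for a PID, or nothing if the PID does not exist.
proc_state() {
    # /proc/<pid>/status contains a line such as "State:  S (sleeping)".
    [ -r "/proc/$1/status" ] || return 1
    awk '/^State:/ { print $2; exit }' "/proc/$1/status"
}

# Map a state letter to a verdict.
classify_state() {
    case "$1" in
        Z*) echo "defunct" ;;   # zombie: exited but not yet reaped by its parent
        "") echo "missing" ;;   # no such process
        *)  echo "alive" ;;     # R, S, D, T, ...: the process exists
    esac
}

# Convenience wrapper: verdict for a PID.
check_pid() {
    classify_state "$(proc_state "$1")"
}
```

Calling check_pid 1234 prints alive, defunct or missing. Note that a hung process usually still shows up as alive, so a check like this only catches crashes and zombies, not hangs.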
 
08-11-2006, 07:33 AM   #3
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3594
Here's an example cronjob:
Code:
#!/bin/sh
# Find zombies (state Z) and kill each zombie's parent so that init can reap the zombie.
/bin/ps -eo state,pid,ppid --sort=state 2>/dev/null | grep '^Z' | while read -r state pid ppid; do
    msg="Zombie PID $pid: killing parent $ppid"
    [ "$ppid" -gt 20 ] || { logger "$msg skipped (low PID)."; continue; }  # leave init and other low PIDs alone
    if kill -9 "$ppid" >/dev/null 2>&1; then msg="$msg succeeded."; else msg="$msg FAILED."; fi
    logger "$msg"
done
exit 0
Still, it's just a script, and the minimum cron job interval is one long minute. If the application can be monitored through something more tweakable in terms of how and what to test, like say Monit, I think it would be better.
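
One common workaround for the one-minute granularity, sketched here under the assumption that staying with cron is preferred over a tool like Monit, is to let cron start a small wrapper that repeats the check several times within the minute. The function name run_every and its argument order are invented for this example.

```shell
#!/bin/sh
# Sketch only: repeat a command several times inside one cron minute.
# Usage: run_every INTERVAL_SECONDS COUNT COMMAND [ARGS...]
run_every() {
    interval=$1
    count=$2
    shift 2
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@"                       # run the check; its exit status is not inspected
        i=$((i + 1))
        # Sleep between runs, but not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
    return 0
}
```

For example, a script that ends with run_every 10 6 /path/to/check (the path is a placeholder) and is started by cron once a minute runs the check roughly every ten seconds. The check's own runtime adds to each interval, so keep the total under a minute to avoid overlapping runs.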

Last edited by unSpawn; 08-11-2006 at 07:34 AM.
 
08-14-2006, 02:45 AM   #4
ravi2082
LQ Newbie
 
Registered: Aug 2006
Posts: 5

Original Poster
Rep: Reputation: 0
Thanks

But this seems too complex for a guy like me.
I have a standalone Java app running on an RH box, which continuously writes its standard output to a log file.
To detect a hang, I monitor the log: if it is unchanged for quite some time (approx. 15 to 30 minutes), I restart the app.
What's the best way to monitor the app using Linux commands, so that I am notified through sendmail when the app has hung?

Thanks
Ravi
 
08-14-2006, 04:10 AM   #5
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3594
What does the app do? What's the application's name?
 
08-14-2006, 08:39 AM   #6
ravi2082
LQ Newbie
 
Registered: Aug 2006
Posts: 5

Original Poster
Rep: Reputation: 0
The app is a Java program that processes requests sent to it. It runs continuously and is multithreaded: it acts like a queue for the jobs sent to it and processes them one by one. Its name can be anything depending on the setup, like XYZ or something else; ps shows the name as XYZ. While checking for pending jobs it writes a lot of statements to the log file.

Thanks
Ravi
 
08-14-2006, 12:01 PM   #7
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3594
Well, that's not enough info to go on. Here's a lame way to watch a logfile for changes. Save it as "watchfile" and make it executable. Set watchFile= to the logfile to watch, set pollTime and pollMax to what's reasonable (the alert fires after pollMax consecutive polls, pollTime seconds apart, without a change), and change "/some/dir/binary --with-args --to-match" to the path and name of the binary to signal, or else use your own kill command. Works for me but as always YMMV(VM).

Code:
#!/bin/sh
# watchfile: poll a file's mtime and signal a process when the file stops changing.
trap 'whoAmI' INT # Ctrl-C prints the script name and PID instead of exiting.
progn="watchfile"
watchFile="/var/log/messages" # File to watch.
pollTime="10" # int, seconds to sleep between polls.
pollMax="2" # int, number of unchanged polls before alerting.
pollCount="0" # Keep this zero.

alertMessage() { logger "${progn} [$$] ${watchFile} didn't change for $((pollMax * pollTime)) seconds."; }
# Sends SIGHUP to the matching process; substitute your own restart or kill command.
restartCmd() { /usr/bin/pkill -HUP -f "/some/dir/binary --with-args --to-match"; sleep "${pollTime}"; }
getMtime() { stat -c %Y "$watchFile"; }
whoAmI() { echo "${progn}: ($$)"; }

if [ ! -f "${watchFile}" ]; then echo "${progn} [FATAL] No file ${watchFile}, exiting." >&2; exit 127; fi
# "watchfile -k" kills every running watchfile, including this one.
if [ "$#" -eq 1 ] && [ "$1" = "-k" ]; then /usr/bin/pkill -KILL -f "watchfile"; fi
# Initialise old mtime
mtime_o=$(getMtime)

while :; do
        sleep "${pollTime}"
        mtime_n=$(getMtime)
        if [ "$mtime_n" -ne "$mtime_o" ]; then
                # File changed: remember the new mtime and start counting again.
                mtime_o=$mtime_n
                pollCount=0
        else
                pollCount=$((pollCount + 1))
                [ "$pollCount" -eq "$pollMax" ] && { alertMessage; pollCount=0; restartCmd; }
        fi
done
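
For the requirement stated earlier in the thread (a log untouched for roughly 15 minutes, with a notification by mail), a one-shot variant that cron can run might look like the sketch below. It is only an illustration: LOG_FILE, MAX_AGE and MAIL_TO are placeholder values, the function names and the --run flag are invented here, and it assumes GNU stat and a sendmail binary at /usr/sbin/sendmail.

```shell
#!/bin/sh
# Sketch only: one-shot "is the log stale?" check meant to be run from cron.
# The three values below are placeholders, not taken from the thread.
LOG_FILE="${LOG_FILE:-/var/log/myapp.log}"
MAX_AGE="${MAX_AGE:-900}"     # seconds; 900 = 15 minutes
MAIL_TO="${MAIL_TO:-root}"

# Print how many seconds ago the file was last modified.
file_age() {
    now=$(date +%s)
    mtime=$(stat -c %Y "$1") || return 1
    echo $((now - mtime))
}

# Return 0 if the file is older than $2 seconds, 1 if not, 2 if it cannot be read.
is_stale() {
    age=$(file_age "$1") || return 2
    [ "$age" -gt "$2" ]
}

# Mail a short alert through the local sendmail binary.
notify() {
    printf 'Subject: %s looks hung\n\n%s has not changed for more than %s seconds.\n' \
        "$LOG_FILE" "$LOG_FILE" "$MAX_AGE" | /usr/sbin/sendmail "$MAIL_TO"
}

main() {
    if is_stale "$LOG_FILE" "$MAX_AGE"; then
        notify
    fi
}

# Only act when called with the (hypothetical) --run flag.
if [ "${1:-}" = "--run" ]; then
    main
fi
```

Run from cron every few minutes with --run, this mails once per run for as long as the log stays stale. Restarting the app could be added next to notify, in the same way restartCmd is used in the script above.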
 
08-16-2006, 05:51 AM   #8
ravi2082
LQ Newbie
 
Registered: Aug 2006
Posts: 5

Original Poster
Rep: Reputation: 0
Hi

Thanks for the script. With a little customization it worked well. Thanks very much to all of you.

Thanks
Ravi
 
  


All times are GMT -5.
