LinuxQuestions.org
Old 11-14-2016, 11:57 PM   #1
Red Squirrel
Senior Member
 
Registered: Dec 2003
Distribution: Kubuntu 20.04 on workstation, CentOS 6.x on servers
Posts: 1,206

Rep: Reputation: 49
Watchdog script for a program that crashes all the time


I use Synergy to share a mouse and keyboard across multiple machines, each with its own monitor. The problem is that this program crashes all the damn time and I'm constantly having to reset it. On their forums, a lot of people have this issue too, even with newer versions. I can't get anything past 1.7.6 to install because of dependency errors that I can't figure out how to resolve, but even 1.8.x crashes.

When it crashes, it actually stays as a task. The only way to stop it is to kill -9 by process ID; even killall does not work. Otherwise it would be as simple as writing a script that checks the ps aux output and relaunches as required, but the process is still listed after it crashes.

How would I go about writing a script that monitors a program to see if it stopped responding, basically? Would this even be doable?
 
Old 11-15-2016, 10:34 AM   #2
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, RPi OS, Mint & Android
Posts: 13,009

Rep: Reputation: 1727
It's probably doable but difficult in a script. A small C patch to the source, along with a script, would probably be a better way to go. Debugging it might be optimal.

Hardware watchdogs have a timer and need regular reset pulses. If the timer runs out, the watchdog performs a master reset of whatever CPU or microcontroller it is running on. Software implements a version of the same thing, except you have more variety in how to get out of it.

How about a patch to touch some memory-resident file regularly, along with a script to check that the touch is happening? If the memory-resident file stops being updated, kill the app and restart.
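
A rough sketch of the script half of that idea, assuming the patched program touches a heartbeat file on a RAM-backed filesystem such as /dev/shm. The file path, the threshold, and the function name are illustrative assumptions, not anything Synergy provides, and `stat -c %Y` is the GNU coreutils form.

```shell
#!/usr/bin/env bash
# heartbeat_is_stale FILE MAX_AGE_SECONDS
# Succeeds (exit 0) when FILE is missing or was last modified more than
# MAX_AGE_SECONDS ago; fails (exit 1) when the heartbeat is fresh.
heartbeat_is_stale() {
    local file=$1 max_age=$2 now mtime
    [ -e "$file" ] || return 0                # no heartbeat at all counts as stale
    now=$(date +%s)
    mtime=$(stat -c %Y "$file") || return 0   # GNU stat: mtime in epoch seconds
    [ $(( now - mtime )) -gt "$max_age" ]
}
```

A watchdog loop could call it every few seconds, for example `heartbeat_is_stale /dev/shm/synergys.heartbeat 10 && killall -s 9 synergys`, and then relaunch the program. The heartbeat path here is hypothetical.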
 
Old 11-15-2016, 10:43 AM   #3
szboardstretcher
Senior Member
 
Registered: Aug 2006
Location: Detroit, MI
Distribution: GNU/Linux systemd
Posts: 4,237

Rep: Reputation: 1656
Question: What status does the process start in and what status does it end up in when it is broken?

For example:

Code:
ps auxw
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.6  40896  3444 ?        Ss   15:58   0:00 /usr/lib/systemd/systemd --switched-root --system --deserialize 21
 
Old 11-15-2016, 03:55 PM   #4
dr_agon
Member
 
Registered: Sep 2007
Location: Poland
Distribution: Ubuntu LTS
Posts: 104
Blog Entries: 12

Rep: Reputation: 26
If you can kill -9 PID you can killall -s 9 <name>.

But how do you recognize when it has crashed?
 
Old 11-16-2016, 01:15 AM   #5
Red Squirrel
Senior Member
 
Registered: Dec 2003
Distribution: Kubuntu 20.04 on workstation, CentOS 6.x on servers
Posts: 1,206

Original Poster
Rep: Reputation: 49
Ohhh, I did not know about being able to use -s 9 in killall. At the very least, that makes it easier to relaunch. I can maybe write a script that does a killall -s 9, restarts it, then SSHes into the other machines and does the same thing. At least then relaunching is just a small script I run.
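
A minimal sketch of such a relaunch script. The client host names, the start commands, and the DRY_RUN switch are assumptions for illustration only, and it ignores details such as setting DISPLAY for the remote X session or whether the start commands stay in the foreground.

```shell
#!/usr/bin/env bash
# Relaunch Synergy on the server and on each client over SSH.
# CLIENTS, SERVER_CMD and CLIENT_CMD are placeholders (assumptions), not real defaults.
CLIENTS=${CLIENTS:-"client1 client2"}
SERVER_CMD=${SERVER_CMD:-"synergys"}
CLIENT_CMD=${CLIENT_CMD:-"synergyc server-host"}

# run CMD...: execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

restart_all() {
    local host
    # Force-kill the hung server by name, then start a fresh one.
    run killall -s 9 synergys
    run $SERVER_CMD
    for host in $CLIENTS; do
        # Same on each client: kill the old client, then start a new one.
        run ssh "$host" "killall -s 9 synergyc; $CLIENT_CMD"
    done
}
```

Running `DRY_RUN=1 restart_all` first prints the commands without executing them, which is a cheap way to check the host list before letting it kill anything.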

It is an open source program so I may just have to dive into the source and see if I can strip out the basic functionality and write my own program, too. That will be a longer term solution though.

I will also try to notice the status in ps aux. If it changes, I could go by that.
 
Old 11-16-2016, 02:08 AM   #6
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, RPi OS, Mint & Android
Posts: 13,009

Rep: Reputation: 1727
There's also prep and pkill, which take a process name instead of the PID. Prep searches and pkill kills, accepting the same options as kill.
 
Old 11-16-2016, 04:58 PM   #7
Red Squirrel
Senior Member
 
Registered: Dec 2003
Distribution: Kubuntu 20.04 on workstation, CentOS 6.x on servers
Posts: 1,206

Original Poster
Rep: Reputation: 49
Yeah, so the status does not seem to change. The first one is before the crash, the second one is after:

Code:
ryan     26750  0.4  1.0 1189752 84732 ?       Ssl  Nov15   6:25 synergys

ryan     26750  0.4  1.6 1593720 129080 ?      Ssl  Nov15  10:02 synergys
But knowing that I can kill -9 by name, I guess I'll just write a script that I can run manually. It will be good enough for now, until I can dig through the source to find out how it does some of the things it does, like controlling the cursor, and then I'll just write my own program.
 
Old 11-17-2016, 09:59 AM   #8
dr_agon
Member
 
Registered: Sep 2007
Location: Poland
Distribution: Ubuntu LTS
Posts: 104
Blog Entries: 12

Rep: Reputation: 26
Funny: synergys even has a dedicated option to restart on failure - it seems that this is a common issue.
It may be worth trying to disable this (there is an option for it, too) to see if it still stays as an active task. It may help in recognizing the failure.
 
Old 11-17-2016, 10:06 PM   #9
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 7.7 (?), Centos 8.1
Posts: 17,873

Rep: Reputation: 2600
Minor typo by business_kid: prep -> pgrep just in case you (OP) try to google it....
 
Old 11-18-2016, 10:06 AM   #10
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, RPi OS, Mint & Android
Posts: 13,009

Rep: Reputation: 1727
Thanks. This tablet has an annoying habit of MIScorrects.
 
Old 11-18-2016, 11:42 AM   #11
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 9,235
Blog Entries: 4

Rep: Reputation: 3263
I idly wonder if you could cook-up a little process that spawns this program as a child task?

Well, no ... if the program's interface crashes while the process does not, it might (or might not ...) be difficult for this parent-program to know.

Maybe if it spawned two children: the Synergy application, and a second child which watches its brother to tattle-tale to its Mommy if he starts mis-behaving . . . ?
 
Old 11-22-2016, 10:57 PM   #12
Red Squirrel
Senior Member
 
Registered: Dec 2003
Distribution: Kubuntu 20.04 on workstation, CentOS 6.x on servers
Posts: 1,206

Original Poster
Rep: Reputation: 49
Quote:
Originally Posted by dr_agon
Funny: synergys even has a dedicated option to restart on failure - it seems that this is a common issue.
It may be worth trying to disable this (there is an option for it, too) to see if it still stays as an active task. It may help in recognizing the failure.
Yeah, that option does not actually work, oddly. There may be odd cases where it does, but for the server at least, it does not work. (I tried with and without it, with the same result.)

I think I will have to dive into the code some time to see if I can either fix the crash issue or build in a watchdog. It could be as simple as sending a packet to a custom piece of software that simply waits for that packet: basically an "I'm still alive" packet. It could literally be a single byte every second or something. Or I could make it a bit smarter and have all the instances call in to the same server process, so I only have to start the server process to start it on all machines. It can use SSH to go into the other machines to start the clients.
 
Old 11-22-2016, 11:06 PM   #13
Emerson
LQ Sage
 
Registered: Nov 2004
Location: Saint Amant, Acadiana
Distribution: Gentoo ~arch
Posts: 7,231

Rep: Reputation: Disabled
I use daemontools to keep applications alive, but I'm not sure how it handles processes that do not terminate.
 
Old 11-24-2016, 04:17 AM   #14
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, RPi OS, Mint & Android
Posts: 13,009

Rep: Reputation: 1727
It's been a while, but iirc daemontools does nothing with processes that don't terminate, and restarts ones that do on cue. It's sort of inetd functionality, but done in an obscure and non-standard way.

Last edited by business_kid; 11-24-2016 at 04:19 AM.
 
Old 11-24-2016, 05:00 PM   #15
Jjanel
Member
 
Registered: Jun 2016
Distribution: any&all, in VBox; Ol'UnixCLI; NO GUI resources
Posts: 999
Blog Entries: 12

Rep: Reputation: 363
A couple random ideas: (I love 'puzzles' like this!)
I looked at the `ps [-?]` in #7, looking for something like D state
but I see what seems to be TWO of the SAME PID#!?
(I was thinking of a trivial script loop, to look for ... every n seconds)
And then there's the zillion ... in /proc/`pgrep synergys`/...
(I welcome LQgurus' ideas on what to look for there)

How about: strace -f -o myfile -p `pgrep synergys`
(Ctrl-C to detach it.) ... ForWhatItsWorth... Best wishes...
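
One thing to look for under /proc is the process state, the same letter ps shows in its STAT column. A small sketch, assuming a Linux /proc filesystem; the function name is made up for illustration. As post #7 shows, a hung synergys may well still report S, so this would only catch cases such as D (uninterruptible sleep) or Z (zombie).

```shell
#!/usr/bin/env bash
# proc_state PID: print the one-letter state (R, S, D, Z, T, ...) from
# /proc/PID/status, or fail if the process does not exist.
proc_state() {
    local status_file="/proc/$1/status"
    [ -r "$status_file" ] || return 1
    # The line looks like "State:  S (sleeping)"; print the second field.
    awk '$1 == "State:" { print $2; exit }' "$status_file"
}
```

In a trivial loop, something like `proc_state "$(pgrep -o synergys)"` could be polled every n seconds and compared against the states that should trigger a restart.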
 
  


All times are GMT -5. The time now is 03:01 AM.
