LinuxQuestions.org
05-25-2019, 01:16 PM   #1
jamtat
Member
 
Registered: Oct 2004
Distribution: Debian/Ubuntu, Arch, Gentoo, Void
Posts: 138

Rep: Reputation: 24
Script to monitor CPU usage, run command at threshold: input?


I want to put together a script that checks for high CPU usage and, at a certain threshold, runs a command. The reason is that lately my system unpredictably runs into high CPU usage, making the GUI difficult or impossible to use.

It seems Xorg is the culprit process soaking up cycles. The fix is rather simple: I simply restart my window manager (JWM) and CPU usage goes back down to normal levels. I can issue jwm -restart from a terminal or I set up a key combination that does the same as an interim solution to the problem. So I want my CPU-monitoring script to run that command so as to automate things--something that would be a real help when I'm not physically present to restart the WM.

I've found a script on the internet that seems like it could be easily adapted to my scenario and I'd like to ask here for some input on it (and on the task in general). The script, with my modifications added, would look as follows:

Code:
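#!/bin/bash

# uptime's last field is always the 15 min. load average, however long the system has been up
CPU_LOAD=$(uptime | awk '{print $NF}' | sed -e "s/\.//g") #remove decimal point, e.g. 0.45 -> 045
CPU_THRESHOLD=060
#based on recent monitoring 060 seems like probably a good number for the 15 min. field threshold for this system

# [ ] compares both values as decimal integers, so the leading zero is harmless here
if [ "$CPU_LOAD" -gt "$CPU_THRESHOLD" ] ; then
  /usr/bin/jwm -restart #do I need to specify display? like :0.0?
fi

exit 0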
So I would run this as a cron job, say, at 10-minute intervals. My main question has to do with issuing the jwm -restart command. Past experience has shown that I cannot issue that command from another tty and have it be effective (for example, if I log into the system remotely in an ssh session and try to run it from there). I'm guessing that may be because I need to specify the display to which the command needs to be sent. Does that make sense? Also, might it be better to use the 5-minute field for monitoring threshold?

Any input on the task, means for accomplishing it, or improvements to the script, will be appreciated.
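
One possible variation on the comparison step, sketched under assumptions: instead of deleting the decimal point and comparing strings of digits, the load values can be compared as decimals with awk. load_exceeds is a hypothetical helper name, not part of the script above; it takes a load value and a threshold and succeeds only when the load is strictly higher.

```shell
# load_exceeds LOAD THRESHOLD
# Hypothetical helper: exit status 0 if LOAD > THRESHOLD, 1 otherwise.
# awk compares the values as decimal numbers, so no leading-zero handling is needed.
load_exceeds() {
    awk -v load="$1" -v limit="$2" 'BEGIN { exit !(load + 0 > limit + 0) }'
}

# Usage sketch: the third field of /proc/loadavg is the 15-minute average.
# load15=$(awk '{print $3}' /proc/loadavg)
# if load_exceeds "$load15" 0.60; then /usr/bin/jwm -restart; fi
```

With this approach the threshold can be written the way uptime prints it (0.60 rather than 060).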
 
05-25-2019, 03:11 PM   #2
WideOpenSkies
Member
 
Registered: May 2019
Location: /home/
Distribution: Arch Linux
Posts: 166

Rep: Reputation: 61
Can you run the cron job as is and see if that works first? If it fails without a display specified, we can debug from there.

As for this:

Quote:
Originally Posted by jamtat
Also, might it be better to use the 5-minute field for monitoring threshold?
I do think every couple of minutes would be good. Ten minutes seems too long. I don't use jwm, but for my wm -- dwm -- I have a script checking CPU usage every half second. Maybe you'll want to do the same thing.
 
05-25-2019, 06:36 PM   #3
berndbausch
LQ Addict
 
Registered: Nov 2013
Location: Tokyo
Distribution: Mostly Ubuntu and Centos
Posts: 6,316

Rep: Reputation: 2002
Quote:
Originally Posted by jamtat
might it be better to use the 5-minute field for monitoring threshold?
The 5-minute field gives you the load average over five minutes. With the 15-minute field, it takes very roughly three times as long to detect a high-load condition. Also, the longer the averaging window, the more it smooths out your load, which may make it harder to detect peaks.

It really depends on factors like:
  • How long does your high-load condition last? If it’s forever, a longer average will eventually lead to a restart.
  • How much disruption is caused by the restart? If it’s not much, you can restart more often.
  • How painful is it to work under load? If it’s very painful, restart more often.
One more consideration: What you are doing is an interesting exercise, but the real problem is obviously elsewhere. Where does the load come from - misconfiguration? Not enough RAM? A process that runs under jwm causes problems? This is where you should do your troubleshooting.
 
05-27-2019, 09:58 AM   #4
jamtat
Member
 
Registered: Oct 2004
Distribution: Debian/Ubuntu, Arch, Gentoo, Void
Posts: 138

Original Poster
Rep: Reputation: 24
Thanks for the input so far. The 15-minute load average seemed to me like the better metric to monitor since there is a greater chance that some legitimate process might meet the designated threshold for 5 minutes. I think about the only thing I ever did that demanded that many CPU cycles over a 5-minute period was video transcoding and I don't do much of that anymore. But in any case, a process that demands that kind of load over a 15-minute period is more likely to be rogue than one that does so over a 5-minute period. So I'll probably go with the 15-minute field in my preliminary testing.

And yes, I do need to determine what exactly is causing this. I did a bit of troubleshooting a few weeks ago and all I was able to determine at that time is that Xorg is using the cycles. The machine has 8 GB RAM and it is not being completely used up so I doubt it's that. So I'll be continuing to try to determine what graphical process/program might lie behind that.

DISPLAY=:0.0 jwm -restart is what seems to work to restart the WM from within an ssh session, btw. CORRECTION: no, that works from a tty. Trying to restart from an ssh session is a different matter and involves its own set of issues. Since I'm focusing in this post on automating this from the host machine (as a cron job), I'm going to set aside the issue of possibly restarting the WM from within an ssh session.
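
For the cron case, here is a minimal sketch of how the display could be passed along, since cron starts jobs without the X session's environment. run_on_display is a hypothetical helper name, and the ~/.Xauthority fallback is an assumption about where the X authority file lives; some display managers keep it elsewhere.

```shell
# run_on_display DISPLAY COMMAND [ARGS...]
# Hypothetical helper: run COMMAND with DISPLAY and XAUTHORITY set in its environment.
run_on_display() {
    target_display=$1
    shift
    # Assumption: the X authority file is ~/.Xauthority unless XAUTHORITY is already set.
    DISPLAY="$target_display" XAUTHORITY="${XAUTHORITY:-$HOME/.Xauthority}" "$@"
}

# Usage sketch from a cron-run script:
# run_on_display :0.0 /usr/bin/jwm -restart
```

The cron job would also need to run as the user who owns the X session, otherwise the authority file will not match.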

Last edited by jamtat; 05-27-2019 at 10:18 AM.
 
05-29-2019, 11:19 PM   #5
ondoho
LQ Addict
 
Registered: Dec 2013
Posts: 19,872
Blog Entries: 12

Rep: Reputation: 6053
I appreciate the effort you put into this, and the expertise required to accomplish it.

However, I question the usefulness of the chosen "solution".

You say:
Quote:
Originally Posted by jamtat
It seems Xorg is the culprit process soaking up cycles. The fix is rather simple: I simply restart my window manager (JWM) and CPU usage goes back down to normal levels.
I think you should really find out what is happening there, and try to fix that.

troubleshooting steps:
  • try a different window manager and see if that fixes it
  • is there always a certain program open when the freeze happens? a full-blown web browser would be a common culprit. what are you doing with it? are you allowing all javascript? using media a lot?
  • is your graphics unit fully supported by its driver, i.e. is hardware acceleration available?
PS: if you want my help, you need to provide more code output.
 
06-03-2019, 04:41 PM   #6
jamtat
Member
 
Registered: Oct 2004
Distribution: Debian/Ubuntu, Arch, Gentoo, Void
Posts: 138

Original Poster
Rep: Reputation: 24
Yeah, I know some further troubleshooting of the high CPU usage issue is needed, ondoho. I did a little of that a few weeks ago but didn't get too far. I've done a little more now and have a potential candidate process other than Xorg. But while I continue my efforts I'm hoping to ensure that the machine, while unattended, doesn't get into a state where it's difficult or impossible to use upon my return--thus the rationale for the script I've tried to create. Besides, troubleshooting the issue, if it comes down to soliciting help here, really belongs in its own thread (look for one later, should my current troubleshooting attempts meet with failure).

Meantime, my script for automating the restart of the WM, for reasons that are not yet clear to me, is so far not working. So I've come up with another related script that should be helpful to my troubleshooting efforts, as follows:

Code:
#!/bin/bash
LOAD15=$(awk '{print $3}' /proc/loadavg) #poll 15 min. CPU load avg.
CPULEVEL15=${LOAD15/./} #remove decimal point, e.g. 0.52 -> 052
CPUHIST=$(tail -n 40 /home/user/cpu_usage.txt) #file containing record of system's CPU usage grabbed at 5 min. intervals
CPU_THRESHOLD=050 #set 15 min. load avg. threshold above which notification should be sent

# Force base 10: in bash arithmetic a leading zero means octal, so 050 would
# be read as 40 and a value like 008 would raise an error.
if (( 10#$CPULEVEL15 > 10#$CPU_THRESHOLD )) ; then
  #echo "comparison succeeded" # <----test whether comparison is working
  echo -e "Current 15-minute CPU load average is: $LOAD15\n$CPUHIST" | mail -s "My-host high CPU load alert" me@my-mail.com
fi

exit 0
As may be clear, the script relies on a program like mailx being installed, along with a valid SMTP configuration and a sending utility (I personally use msmtp). It also relies on another script I created which polls CPU load averages every 5 minutes and saves them to a file (named cpu_usage.txt, located in the user's home directory). It compares the 15-minute average CPU load with a threshold limit set by the user and, if that load is higher than the stipulated threshold, triggers an e-mail notification. I've now set that up as a cron job that runs at 15-minute intervals; testing indicates it should work as intended.
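
The companion logging script is not shown in this thread, so the following is only a guess at what such a poller could look like. log_loadavg and its arguments are hypothetical names; the sketch appends a timestamp and the three load averages to a history file and is meant to be run from cron every 5 minutes.

```shell
# log_loadavg HISTORY_FILE [SOURCE_FILE]
# Hypothetical helper: append "date time 1min 5min 15min" to HISTORY_FILE.
# SOURCE_FILE defaults to /proc/loadavg; the parameter exists so the
# function can be tried on sample data.
log_loadavg() {
    history_file=$1
    source_file=${2:-/proc/loadavg}
    # The first three fields of /proc/loadavg are the 1-, 5-, and 15-minute averages.
    read -r one five fifteen rest < "$source_file" || return 1
    printf '%s %s %s %s\n' "$(date '+%Y-%m-%d %H:%M')" "$one" "$five" "$fifteen" >> "$history_file"
}

# Usage sketch (crontab entry, every 5 minutes; the script path is an assumption):
# */5 * * * * /home/user/bin/log_loadavg.sh
```

Keeping one line per sample means tail -n 40 on the history file covers a little over three hours at 5-minute intervals.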

Likely more to come later.

Last edited by jamtat; 06-03-2019 at 11:56 PM.
 
06-04-2019, 12:57 AM   #7
ondoho
LQ Addict
 
Registered: Dec 2013
Posts: 19,872
Blog Entries: 12

Rep: Reputation: 6053
I prefer to troubleshoot the simplest things first, even if they're less likely to be the cause of the problem.
try another window manager.
 
06-05-2019, 10:57 PM   #8
jamtat
Member
 
Registered: Oct 2004
Distribution: Debian/Ubuntu, Arch, Gentoo, Void
Posts: 138

Original Poster
Rep: Reputation: 24
The script I created to trigger e-mail notifications when a certain 15-minute CPU load threshold is reached seems to be working great so far. As to troubleshooting strategy and starting with simpler things, a lengthy engagement with computer problems and determining their causes has definitely led me to appreciate that approach and it is one I typically use. In this case it is less applicable since I'm running a WM custom configured to be usable by my wife, and if I switch to some other it may be a barrier to her using the computer. So I'm trying to avoid switching WMs.

When the high load average occurred today, I managed once again to bring loads back down to normal by killing a particular process that runs under Xorg but is neither Xorg itself nor the WM. So it seems I am zeroing in on the true culprit. So perhaps I will wind up modifying my script so that it will kill and then restart that application when high average CPU loads occur, rather than sending me an e-mail notification. Or perhaps both.
 
06-07-2019, 05:21 AM   #9
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,781

Rep: Reputation: 1198
If you look at the 15-minute load curve, you'll see that there is a quick rise and a slow fall.
Better to take the minimum of the 5-minute and the 15-minute values; the resulting curve becomes more symmetric, i.e. it takes longer to trigger an alert and less time to cancel it.
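
That suggestion could be sketched as follows. min_load and alert_needed are hypothetical names, and the input is assumed to be one line in /proc/loadavg format (1-minute, 5-minute, and 15-minute averages first).

```shell
# min_load A B
# Hypothetical helper: print the smaller of two load values.
min_load() {
    awk -v a="$1" -v b="$2" 'BEGIN { if (a + 0 < b + 0) print a; else print b }'
}

# alert_needed LOADAVG_LINE THRESHOLD
# Hypothetical helper: succeed when min(5-minute, 15-minute) is above THRESHOLD.
alert_needed() {
    five=$(printf '%s\n' "$1" | awk '{print $2}')
    fifteen=$(printf '%s\n' "$1" | awk '{print $3}')
    lowest=$(min_load "$five" "$fifteen")
    awk -v load="$lowest" -v limit="$2" 'BEGIN { exit !(load + 0 > limit + 0) }'
}

# Usage sketch:
# alert_needed "$(cat /proc/loadavg)" 0.50 && echo "sustained high load"
```

Because both averages must be above the threshold, a short spike that lifts only the 5-minute value does not trigger, and the alert clears as soon as the 5-minute value drops back.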
 
06-12-2019, 03:58 AM   #10
ondoho
LQ Addict
 
Registered: Dec 2013
Posts: 19,872
Blog Entries: 12

Rep: Reputation: 6053
Quote:
Originally Posted by jamtat
In this case it is less applicable since I'm running a WM custom configured to be usable by my wife, and if I switch to some other it may be a barrier to her using the computer. So I'm trying to avoid switching WMs.
"Troubleshooting" is not meant to become a solution, just help you find what's going on.

Quote:
Originally Posted by jamtat
When the high load average occurred today, I managed once again to bring loads back down to normal by killing a particular process that runs under Xorg but is neither Xorg itself nor the WM.
I wonder why you aren't telling us what that process is.
Could help to propose real solutions.
 
06-13-2019, 07:32 AM   #11
Mike_Walsh
Member
 
Registered: Jul 2017
Location: King's Lynn, UK
Distribution: Nowt but Puppies....
Posts: 660

Rep: Reputation: 362
Puppy Linux uses JWM as its default WM. I occasionally get this same problem; re-starting 'X' always seems to 'cure' it, but for me the problem is invariably the same one.

It's not Xorg, or the WM. I'm a long-term Chrome user, and recent versions don't always kill the --nacl-helper process at Chrome startup, after it's done its part of the startup process. Killing the process in mate-system-monitor always brings it back under control. Your problem, however, sounds a bit different to mine; I just wanted to point out that your assertion that JWM isn't responsible is like as not correct.


Mike.
 
06-15-2019, 01:04 AM   #12
ondoho
LQ Addict
 
Registered: Dec 2013
Posts: 19,872
Blog Entries: 12

Rep: Reputation: 6053
Quote:
Originally Posted by Mike_Walsh
Chrome
Settings => Advanced => Uncheck "keep background processes running when chrome is closed" or some such.

also: closed source, big G, grumble grumble.
 
  


All times are GMT -5.