Script to monitor CPU usage, run command at threshold: input?
I'm wanting to put together a script that checks for high CPU usage and, at a certain threshold, runs a command. The reason for the script is that lately my system will unpredictably get high CPU usage, making the GUI difficult or impossible to use.
It seems Xorg is the process soaking up cycles. The fix is rather simple: I restart my window manager (JWM) and CPU usage goes back down to normal levels. As an interim solution, I can issue jwm -restart from a terminal or use a key combination I set up to do the same. So I want my CPU-monitoring script to run that command to automate things, which would be a real help when I'm not physically present to restart the WM. I've found a script on the internet that seems easy to adapt to my scenario, and I'd like to ask here for some input on it (and on the task in general). The script, with my modifications added, would look as follows: Code:
#!/bin/bash
Any input on the task, the means of accomplishing it, or improvements to the script will be appreciated. |
Can you run the cron job as is and see if that works, first? We can debug if not setting a specific size works.
As for this: Quote:
|
Quote:
It really depends on factors like:
|
Thanks for the input so far. The 15-minute load average seemed to me like the better metric to monitor, since there is a greater chance that some legitimate process might meet the designated threshold for 5 minutes. I think about the only thing I ever did that demanded that many CPU cycles over a 5-minute period was video transcoding, and I don't do much of that anymore. In any case, a process that demands that kind of load over a 15-minute period is more likely to be rogue than one that does so over a 5-minute period. So I'll probably go with the 15-minute field in my preliminary testing.
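For anyone following along, here is a minimal sketch of the kind of check described above: read the 15-minute load average and compare it against a threshold. It is not the script from the first post, which is not reproduced here. It assumes a Linux system where the third field of /proc/loadavg is the 15-minute average; the 2.00 threshold, the variable names, and the function names are made up for illustration.

```bash
#!/bin/bash
# Sketch only: THRESHOLD and the function names are illustrative assumptions.

THRESHOLD="${THRESHOLD:-2.00}"
LOADAVG_FILE="${LOADAVG_FILE:-/proc/loadavg}"

# Print the 15-minute load average (third field of a /proc/loadavg-style file).
load15() {
    awk '{ print $3 }' "$1"
}

# Succeed (exit status 0) when load $1 is at or above limit $2.
# awk does the comparison because the shell has no floating-point arithmetic.
load_at_or_above() {
    awk -v load="$1" -v limit="$2" 'BEGIN { exit !(load + 0 >= limit + 0) }'
}

current="$(load15 "$LOADAVG_FILE" 2>/dev/null)"
if [ -n "$current" ] && load_at_or_above "$current" "$THRESHOLD"; then
    echo "15-minute load $current is at or above $THRESHOLD"
    # The restart command from the first post (jwm -restart) would go here.
fi
```

Keeping the comparison in its own function makes it easy to try different thresholds by hand before wiring the script into cron.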
And yes, I do need to determine what exactly is causing this. I did a bit of troubleshooting a few weeks ago and all I was able to determine at that time is that Xorg is using the cycles. The machine has 8 GB RAM and it is not being completely used up, so I doubt it's that. So I'll be continuing to try to determine what graphical process/program might lie behind that. DISPLAY=:0.0 jwm -restart is what seems to work to restart the WM from within an ssh session, btw. CORRECTION: no, that works from a tty. Trying to restart from an ssh session is a different matter and involves its own set of issues. Since I'm focusing in this post on automating this from the host machine (as a cron job), I'm going to set aside the issue of possibly restarting the WM from within an ssh session. |
I appreciate the effort you put into this, and the expertise required to accomplish it.
However, I question the usefulness of the chosen "solution". You say: Quote:
troubleshooting steps:
|
Yeah, I know some further troubleshooting of the high CPU usage issue is needed, ondoho. I did a little of that a few weeks ago but didn't get too far. I've done a little more now and have a potential candidate process other than Xorg. But while I continue my efforts I'm hoping to ensure that the machine, while unattended, doesn't get into a state where it's difficult or impossible to use upon my return--thus the rationale for the script I've tried to create. Besides, troubleshooting the issue, if it comes down to soliciting help here, really belongs in its own thread (look for one later, should my current troubleshooting attempts meet with failure).
Meantime, my script for automating the restart of the WM, for reasons that are not yet clear to me, is so far not working. So I've come up with another related script that should be helpful to my troubleshooting efforts, as follows: Code:
#!/bin/bash
Likely more to come later. |
i prefer to troubleshoot the simplest things first, even if they're less likely to be the cause of the problem.
try another window manager. |
The script I created to trigger e-mail notifications when a certain 15-minute CPU load threshold is reached seems to be working great so far. As to troubleshooting strategy and starting with simpler things, a lengthy engagement with computer problems and determining their causes has definitely led me to appreciate that approach and it is one I typically use. In this case it is less applicable since I'm running a WM custom configured to be usable by my wife, and if I switch to some other it may be a barrier to her using the computer. So I'm trying to avoid switching WMs.
When the high load average occurred today, I managed once again to bring loads back down to normal by killing a particular process that runs under Xorg but is neither Xorg itself nor the WM. So it seems I am zeroing in on the true culprit. So perhaps I will wind up modifying my script so that it will kill and then restart that application when high average CPU loads occur, rather than sending me an e-mail notification. Or perhaps both. |
If you look at the 15-minute load curve, you'll see that there is a quick rise and a slow fall.
Better to take the minimum of the 5-minute and the 15-minute values; the resulting curve becomes more symmetric, i.e. it takes longer to trigger an alert and less time to cancel it. |
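A minimal sketch of the suggestion above, assuming a Linux /proc/loadavg whose second and third fields are the 5-minute and 15-minute averages. The function names and the 2.00 threshold are hypothetical and not taken from any script in this thread.

```bash
#!/bin/bash
# Sketch only: alert on min(5-minute, 15-minute) load instead of the
# 15-minute value alone. Names and threshold are illustrative assumptions.

THRESHOLD="${THRESHOLD:-2.00}"
LOADAVG_FILE="${LOADAVG_FILE:-/proc/loadavg}"

# Print the smaller of the 5- and 15-minute load averages
# (fields 2 and 3 of a /proc/loadavg-style file).
min_load_5_15() {
    awk '{ m = ($2 < $3) ? $2 : $3; print m }' "$1"
}

# Succeed (exit status 0) when load $1 is at or above limit $2.
at_or_above() {
    awk -v load="$1" -v limit="$2" 'BEGIN { exit !(load + 0 >= limit + 0) }'
}

current="$(min_load_5_15 "$LOADAVG_FILE" 2>/dev/null)"
if [ -n "$current" ] && at_or_above "$current" "$THRESHOLD"; then
    echo "min(5-min, 15-min) load $current is at or above $THRESHOLD"
fi
```

With this check the alert fires only once both averages are high, and it clears as soon as the 5-minute value drops back under the threshold.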
Quote:
Quote:
Could help to propose real solutions. |
Puppy Linux uses JWM as its default WM. I occasionally get this same problem; re-starting 'X' always seems to 'cure' it, but for me the problem is invariably the same one.
It's not Xorg, or the WM. I'm a long-term Chrome user, and recent versions don't always kill the Code:
--nacl-helper
Mike. ;) |
Quote:
also: closed source, big G, grumble grumble. |