Zabbix warns us several times during the night that, on a 2-CPU storage server, there are on average more than 5 processes queued for the CPU. The trigger is this one below:
{<server>:system.cpu.load[percpu,avg1].last(0)}>5
I would like to save the list of all queued processes to a file during the alert period so that I can deal with them. How could I do that?
The ps command in your manual says that there is the state:
At any given instant, you can't know (in advance) the difference between running or runnable - hence the single metric.
The command from the fu site looks OK; note the comments about state "D", which is likely to be your concern. These types of alerts are largely misleading, IMHO.
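For anyone who wants to capture this from cron rather than by hand, here is a minimal sketch in Python, not a tested production tool. It assumes a Linux /proc filesystem and reads the state field from /proc/<pid>/stat, keeping processes in state R (running or runnable) or D (uninterruptible sleep). The log path, the function names, and the choice to divide the 1-minute load by the CPU count (to mirror the percpu trigger) are my own assumptions. It also only looks at each process's own stat file, so per-thread states are not listed.

```python
#!/usr/bin/env python3
"""Append a snapshot of R and D state processes to a log file."""
import os
import time

LOG_PATH = "/var/tmp/runqueue.log"  # hypothetical path, pick your own
THRESHOLD = 5.0                     # assumed: same per-CPU threshold as the trigger


def parse_stat(text):
    """Return (pid, command, state) from the contents of /proc/<pid>/stat."""
    # The command is wrapped in parentheses and may itself contain spaces
    # or ')', so split on the first '(' and the last ')'.
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left])
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return pid, comm, state


def queued_processes(proc_root="/proc"):
    """List (pid, command, state) for every process in state R or D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError):
            continue  # the process exited while we were reading it
        if state in ("R", "D"):
            found.append((pid, comm, state))
    return found


def per_cpu_load():
    """1-minute load average divided by the CPU count."""
    return os.getloadavg()[0] / (os.cpu_count() or 1)


def snapshot(log_path=LOG_PATH, threshold=THRESHOLD):
    """Append one timestamped snapshot if the per-CPU load exceeds threshold."""
    load = per_cpu_load()
    if load <= threshold:
        return False
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as log:
        log.write(f"{stamp} load/cpu={load:.2f}\n")
        for pid, comm, state in queued_processes():
            log.write(f"  {state} {pid:>7} {comm}\n")
    return True


if __name__ == "__main__":
    snapshot()
```

Run from cron once a minute during the night, this would leave a timestamped list only for the minutes where the load was above the threshold. Since a snapshot is a single instant, it can easily miss short-lived processes.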
Quote:
Originally Posted by cesarsj
Zabbix warns us several times during the night that, on a 2-CPU storage server, there are on average more than 5 processes queued for the CPU. The trigger is this one below:
{<server>:system.cpu.load[percpu,avg1].last(0)}>5
I would like to save the list of all queued processes to a file during the alert period so that I can deal with them. How could I do that?
Just about any process could be waiting for CPU time.
Frankly, a queue of five waiting jobs doesn't sound so bad; it surely doesn't seem like the system's being Slashdotted or anything like that. At one site, we used to see alerts like this all the time in Nagios. Users always wanted to schedule all of their jobs to run at midnight because a.) they were processing the previous day's transactions so running them at midnight was necessary (though running them some time after midnight never occurred to them) and b.) they assumed they'd be the only ones using the system at night, conveniently forgetting about the database loads that ran all night, the system backups, etc., etc. As a result we had four beefy CPUs that were, towards the end of the month, saturated with a system load often over 30 for several hours at a time.
Even during the day, if response time was slower than normal, some users of the middle part of the three-tier application would resubmit insanely complex ad hoc database queries thinking that because it didn't return results immediately, it must not have "taken" (you know, like a failed vaccination), and now two of them are running.
It was a mess until we educated the user community about the ways to run their jobs sequentially rather than in parallel, spread the job start times, and, best of all, let the people whose job it was to schedule jobs within the job scheduler do their job. Fortunately, we weren't getting calls in the wee hours about the load, though I got more than one call during the day about why sendmail wasn't emailing job results during the periods of high load.
({TRIGGER.VALUE}=0 and {<server>:system.cpu.load[percpu,avg1].last(0)}>5) or ({TRIGGER.VALUE}=1 and {<server>:system.cpu.load[percpu,avg1].min(10m)}>5)
What this trigger does: it goes into the problem state if the last collected value is greater than 5, and it recovers as soon as any one of the values collected in the last 10 minutes is 5 or less.
What I would like is for it to recover only if ALL values collected in the last 10 minutes are 5 or less. How could I adjust the trigger for this case?
I think I now understand the min and max functions, and I think the expression below will be better:
({TRIGGER.VALUE}=0 and {<server>:system.cpu.load[percpu,avg1].last(0)}>5) or ({TRIGGER.VALUE}=1 and {<server>:system.cpu.load[percpu,avg1].max(10m)}>5)
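To see how min(10m) and max(10m) differ in the recovery branch, here is a small simulation in Python. It is only a sketch of the trigger logic, not Zabbix itself: it assumes one collected value per minute (so a 10-minute window is the last 10 samples), and the function name and sample values are made up for illustration.

```python
def evaluate(samples, recovery="min", threshold=5.0, window=10):
    """Replay samples (assumed one per minute) through the hysteresis trigger.

    Returns the trigger state after each sample: 0 = OK, 1 = PROBLEM.
    `recovery` picks the function used in the {TRIGGER.VALUE}=1 branch.
    """
    aggregate = {"min": min, "max": max}[recovery]
    state = 0
    history = []
    for i, value in enumerate(samples):
        # Values that fall inside the window ending at the current sample.
        recent = samples[max(0, i - window + 1):i + 1]
        if state == 0:
            # ({TRIGGER.VALUE}=0 and last(0)>5)
            state = 1 if value > threshold else 0
        else:
            # ({TRIGGER.VALUE}=1 and min(10m)>5), or max(10m)>5
            state = 1 if aggregate(recent) > threshold else 0
        history.append(state)
    return history


# Made-up per-CPU load values: a burst above 5, then a quiet period.
load = [2, 6, 7, 4, 7, 7] + [3] * 10
print("min:", evaluate(load, "min"))
print("max:", evaluate(load, "max"))
```

With min, the trigger drops back to OK as soon as any sample in the window is 5 or less, including samples from before the alert started, so it tends to flap. With max, it stays in the problem state until every sample in the window is 5 or less.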