LinuxQuestions.org
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

08-05-2022, 07:47 PM   #1
sixmuga
LQ Newbie
 
Registered: Aug 2022
Posts: 1

Rep: Reputation: 0
Advanced monitoring for load average


I am using Nagios to monitor load average and send alerts. However, it triggers incidents whenever a job runs, which is expected behavior. So I am looking for a better Nagios plugin (or an equivalent) that alerts only when the system has a real problem, taking disk I/O or other metrics into account. Any suggestions?
 
08-06-2022, 04:05 AM   #2
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,804

Rep: Reputation: 1203
You can try my Nagios check, which I used some years ago on Linux and Unix.
Compared to the default check, it starts an alert later and ends it sooner, because it alerts only when the 1-, 5-, and 15-minute load averages all exceed the threshold.

Code:
#!/bin/sh
#
# check_load5 - Nagios plugin, measures load with uptime

set -f
PATH=/bin:/usr/bin:/usr/sbin:/sbin:/usr/ucb
export PATH

print_usage() {
  echo "Usage: $0 [ -r ] [ -w WLOAD ] [ -c CLOAD ]"
}

checkfloat(){
  case $1 in
  *[!0-9.]*|"") echo "$0: '$1' is not a floating-point number"; print_usage; exit 3;;
  esac
}

# default thresholds
warn=1.1
crit=2.2

# parse command-line options
while getopts "w:c:rh" OPT
do
  case $OPT in
  w) warn=$OPTARG;;
  c) crit=$OPTARG;;
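  r)
    # Estimate "effective" CPUs as cores * (logical CPUs / cores)^0.25,
    # so hardware threads beyond one per core count for less than a full core.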
    if [ -f /proc/cpuinfo ]; then
      ecores=`
awk '/^processor/ {lcpu++} /^core.id/ && s[$NF]++==0 {core++} /^physical id/ && t[$NF]++==0 {phys++} END {c=phys*core; print c?c*(lcpu/c)^0.25:lcpu+0}' /proc/cpuinfo
`
    elif [ -x /usr/sbin/psrinfo ]; then
      # Solaris: count on-line CPUs and cores with kstat
      ecores=`
kstat -m cpu_info |
nawk '$2=="on-line" {lcpu++} ($1=="core_id" && s[$2]++==0) {c1++} ($1=="pg_id" && t[$2]++==0) {c2++} END {c=(c2==0||(c1&&c1<=c2))?c1:c2; print c?c*(lcpu/c)^0.25:lcpu+0}'
`
    elif [ -x /usr/bin/lparstat ]; then
      # AIX: logical CPUs from lparstat, processors from lsdev
      lcpu=`lparstat | sed -n 's/.*lcpu=\([0-9]*\).*/\1/p'`
      ecores=`lsdev -c processor | awk 'END {print NR?NR*(lcpu/NR)^0.25:lcpu+0}' lcpu="$lcpu"`
    else
      ecores=1
    fi
  ;;
  h)
    echo "This plugin tests the current system load average.
"
    print_usage
    echo "
Options:
 -w WLOAD
    Exit with WARNING status if all three load averages exceed WLOAD
 -c CLOAD
    Exit with CRITICAL status if all three load averages exceed CLOAD
    The load average format is the same as that used by 'uptime' and 'w'
 -r
    Relative load, per CPU. Divide by the number of effective CPUs"
    exit 3
  ;;
  *) print_usage; exit 3;;
  esac
done

if [ $OPTIND -le $# ]; then
  print_usage; exit 3
fi

case $ecores in
""|0)
  if [ -x /usr/sbin/ioscan ]; then
    ecores=`/usr/sbin/ioscan -k -C processor | awk '$1~/^[0-9]/ {c++} END {print c?1/c:1}'`
  else
    ecores=1
  fi
  ;;
esac

checkfloat "$warn"
checkfloat "$crit"

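# Force the C locale so uptime prints "." as the decimal separator;
# otherwise tr would strip decimal commas along with the field separators.
LC_ALL=C uptime | tr -d ',' | awk '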
{
 v1=$(NF-2)/cores; v2=$(NF-1)/cores; v3=$NF/cores
 msg=sprintf ("- load average: %4.2f, %4.2f, %4.2f",v1,v2,v3)
 graph=sprintf ("|load5=%5.3f;%5.3f;%5.3f;0;",v2,w,c)
}
(v1>c && v2>c && v3>c) { print "CRITICAL",msg graph; exit 2 }
(v1>w && v2>w && v3>w) { print "WARNING",msg graph; exit 1 }
{ print "OK",msg graph; exit 0 }
' w="$warn" c="$crit" cores=$ecores
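
The decision rule in the awk program above can be tried in isolation. The following is only a minimal sketch: `load_state` is a hypothetical helper name that is not part of the plugin, and it assumes an awk that supports `-v`. It shows why a short spike does not alert and why the alert clears as soon as the 1-minute average drops.

```shell
# load_state WARN CRIT L1 L5 L15
# Hypothetical helper (not part of check_load5): prints CRITICAL, WARNING
# or OK. A level is raised only if the 1-, 5- and 15-minute averages ALL
# exceed its threshold, mirroring the three pattern rules in the plugin.
load_state() {
  awk -v w="$1" -v c="$2" -v v1="$3" -v v2="$4" -v v3="$5" 'BEGIN {
    # "+ 0" forces numeric comparison regardless of the awk flavour
    w += 0; c += 0; v1 += 0; v2 += 0; v3 += 0
    if (v1 > c && v2 > c && v3 > c)      print "CRITICAL"
    else if (v1 > w && v2 > w && v3 > w) print "WARNING"
    else                                 print "OK"
  }'
}

# Fresh spike: only the 1-minute average is high, so no alert yet.
load_state 1.1 2.2 3.0 1.0 0.5
# Sustained load: all three averages are above CRIT.
load_state 1.1 2.2 3.0 2.5 2.3
# Recovery: the 1-minute average has dropped, so the alert ends at once.
load_state 1.1 2.2 0.9 2.5 2.3
```

With the default thresholds of the plugin (1.1 and 2.2), the three calls print OK, CRITICAL and OK respectively.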
 
08-06-2022, 04:38 AM   #3
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,130

Rep: Reputation: 4121
Neat.

loadavg is a terrible metric, and is universally misunderstood (except by MadeInGermany, apparently ...). I don't use Nagios, but it appears the defaults are for a single CPU. Seriously, in this day and age?
There have to be plugins that do something similar to the above.
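
To illustrate the single-CPU point: a load of 6.00 means something very different on 1 CPU than on 8. A minimal sketch of per-CPU normalization follows; `per_cpu_load` is a hypothetical helper name, and the commented-out call assumes a Linux system with `/proc/loadavg` and a `getconf` that knows `_NPROCESSORS_ONLN`.

```shell
# per_cpu_load LOAD NCPU
# Hypothetical helper: prints LOAD divided by NCPU with two decimals.
# Falls back to the raw load if NCPU is missing or not positive.
per_cpu_load() {
  awk -v avg="$1" -v n="$2" 'BEGIN {
    avg += 0; n += 0
    printf "%.2f\n", (n > 0 ? avg / n : avg)
  }'
}

# A load of 6.00 on an 8-CPU machine is 0.75 per CPU.
per_cpu_load 6.00 8

# On Linux (assumption), the live 1-minute value could be fed in like this:
# per_cpu_load "$(cut -d' ' -f1 /proc/loadavg)" "$(getconf _NPROCESSORS_ONLN)"
```

Thresholds such as 1.1 and 2.2 then keep the same meaning regardless of how many CPUs the host has.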
 
08-06-2022, 08:03 AM   #4
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,804

Rep: Reputation: 1203
AFAIR I ran it with -r, which divides by the number of effective CPUs.
And with NRPE for remote execution:
https://exchange.nagios.org/director...ecutor/details
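
For reference, the divisor that -r uses is not the plain CPU count. The awk one-liners in the script compute cores * (logical CPUs / cores)^0.25. The sketch below reproduces only that formula with the counts passed in by hand; `effective_cpus` is a hypothetical helper name, and gathering the counts from the system is left out on purpose.

```shell
# effective_cpus CORES LCPU
# Hypothetical helper reproducing the formula used by the -r option:
#   cores * (logical CPUs / cores)^0.25
# If the core count is unknown (0), the logical CPU count is used as is.
effective_cpus() {
  awk -v c="$1" -v lcpu="$2" 'BEGIN {
    c += 0; lcpu += 0
    printf "%.2f\n", (c ? c * (lcpu / c) ^ 0.25 : lcpu)
  }'
}

# 8 cores with 2 hardware threads each: 8 * 2^0.25
effective_cpus 8 16
# 4 cores without extra hardware threads: 4 * 1^0.25
effective_cpus 4 4
```

The two calls print 9.51 and 4.00, so 16 hardware threads on 8 cores are weighted as roughly 9.5 CPUs rather than 16.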
 
  

