LinuxQuestions.org

upnort 09-20-2019 02:29 PM

Monitoring uptime
 
Any uptimed users here?

Monitoring uptime used to be a badge of honor and a fun way to measure reliability and availability. These days with frequent reboots for security patching, that is now somewhat out of style.

Ok, go to the next level. How many nines of uptime? Most folks won't know without some kind of help.

uptimed seems suited for monitoring how many nines of availability.

Just looking for conversation.

Thanks again. :)

frankbell 09-20-2019 05:47 PM

I use GKrellM and it has a built-in uptime display.

Per the man page, I believe that it monitors /proc/uptime. I think the uptime command also reads /proc/uptime.

upnort 09-20-2019 08:32 PM

My bad for not explaining better. I don't want to monitor or display uptime. I am thinking about uptime history because a reboot restarts the uptime counter.

A cron job run every minute to store data and a script to do the math could suffice.

If on average a server is rebooted once every two weeks, and the reboot takes about 4 minutes, that is about 99.98% uptime. Might be nice to display that history.
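
That figure can be checked with a few lines of shell. This is only a sketch of the arithmetic: availability_pct is a made-up helper name, and it assumes awk is available.

```shell
#!/bin/bash
# availability_pct PERIOD_MINUTES DOWNTIME_MINUTES
# Prints the percentage of the period during which the system was up.
# (Hypothetical helper, not part of uptimed or any other tool.)
availability_pct() {
    awk -v period="$1" -v down="$2" \
        'BEGIN { printf "%.4f\n", (period - down) * 100 / period }'
}

# Two weeks is 14 * 24 * 60 = 20160 minutes; one reboot costs 4 of them.
availability_pct $((14 * 24 * 60)) 4    # prints 99.9802
```

So one 4-minute reboot per fortnight sits between three and four nines.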

0XBF 09-20-2019 08:53 PM

What if you put something like "uptime >> /some_directory/uptime.log" in rc.local_shutdown? Then as long as you shutdown/reboot cleanly you will have a log of all your uptimes. You could also timestamp it and then make a script to parse the log and do the math to average it all out for your percentage stats.

Just a thought
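
A timestamped variant of that idea might look like the sketch below. It reads /proc/uptime directly instead of parsing the output of uptime, so the logged value is already a plain number of seconds. The function name log_uptime, the log line format, and the optional second argument are all assumptions made for illustration.

```shell
#!/bin/bash
# log_uptime LOG_FILE [UPTIME_SOURCE]
# Appends one line of the form "<date> <time> up_seconds=<n>" to LOG_FILE.
# UPTIME_SOURCE defaults to /proc/uptime; the parameter exists only so the
# function can be tried against a sample file. (Hypothetical helper.)
log_uptime() {
    local log_file=$1
    local source_file=${2:-/proc/uptime}
    local up_seconds

    # The first field of /proc/uptime is the number of seconds since boot,
    # with a fractional part; the second field (idle time) is discarded.
    read -r up_seconds _ < "$source_file" || return 1

    # Strip the fractional part and append a timestamped record.
    printf '%s up_seconds=%s\n' "$(date '+%F %T')" "${up_seconds%.*}" >> "$log_file"
}
```

With the function defined in rc.local_shutdown, a call such as log_uptime /some_directory/uptime.log would add one record per clean shutdown.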

glorsplitz 09-20-2019 09:02 PM

my slackware stable uptime is always the length of time between kernel updates

no need to monitor anything, it's all in the changelog

frankbell 09-20-2019 09:17 PM

Perhaps you could write a script to log the uptime to a file at shutdown.

This article tells how to do it with SysVinit and SystemD: https://opensource.com/life/16/11/ru...shutdown-linux

Here's a (very old) LQ thread on the topic: https://www.linuxquestions.org/quest...utdown-323412/

upnort 09-20-2019 09:49 PM

Thanks for the replies. The original question was whether anybody is using uptimed. :)

That said, a script at shutdown/reboot is okay most of the time but won't help with inadvertent shutdowns. A cron job run every minute or two would be better. The last command contains boot times and might suffice. Either way a script is needed to do the math. How to display the data is another question.
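
One way to sketch the cron approach is a heartbeat file: cron appends the current epoch time every minute, and any gap noticeably longer than the interval is counted as downtime. The names heartbeat and downtime_seconds, the 1.5x gap threshold, and the file layout are assumptions for illustration, and the result is only accurate to within one cron interval.

```shell
#!/bin/bash
# heartbeat LOG_FILE
# Appends the current epoch time to LOG_FILE; intended to be run from cron.
heartbeat() {
    date +%s >> "$1"
}

# downtime_seconds LOG_FILE [INTERVAL]
# Treats any gap between consecutive heartbeats longer than 1.5 * INTERVAL
# (default 60 seconds) as an outage, and counts the part of the gap beyond
# one normal INTERVAL as downtime. Prints the total in seconds.
downtime_seconds() {
    awk -v interval="${2:-60}" '
        NR > 1 && $1 - prev > interval * 1.5 { down += $1 - prev - interval }
        { prev = $1 }
        END { print down + 0 }
    ' "$1"
}
```

Because the heartbeat keeps running until the machine actually stops, this covers inadvertent shutdowns as well as clean ones, at the cost of a log file that grows by one line per minute.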

Yeah, I can do that. :) I started thinking about the idea. I love shell scripting but I looked online to see if anybody had already invented the same wheel. I found uptimed and hence the original question.

BTW, late this afternoon at work I installed uptimed on some test systems. Looks like the tool does not understand containers because I got the same results as the host system. I need to look into that after the weekend. :)

0XBF 09-21-2019 12:16 AM

Quote:

Originally Posted by upnort (Post 6038864)
Thanks for the replies. The original question was whether anybody is using uptimed. :)

That said, a script at shutdown/reboot is okay most of the time but won't help with inadvertent shutdowns. A cron job run every minute or two would be better. The last command contains boot times and might suffice. Either way a script is needed to do the math. How to display the data is another question.

Yeah, I can do that. :) I started thinking about the idea. I love shell scripting but I looked online to see if anybody had already invented the same wheel. I found uptimed and hence the original question.

I hope you don't mind but I took a crack at it and this script seems to do the job.

Disclaimer: I've only tested it for a short while since I've only hacked this out over the last hour. Please make suggestions or edits if you please.

Code:

#!/bin/bash
#
#
# %%%%%%%%%%%%%%%%%%%% uptime_report.sh %%%%%%%%%%%%%%%%%%%%%%%%%
#
#
# Script to generate stats about uptime
#
# Place this script in a folder to keep uptime logs (this is <path>)
# Set up the following as root to utilize uptime logging:
# "/<path>/uptime_report.sh shutdown_log" in /etc/rc.d/rc.local_shutdown
# "/<path>/uptime_report.sh check_log" in /etc/rc.d/rc.local
# "* * * * * /<path>/uptime_report.sh temp_log" in crontab -e
# Edit <path> in the UP_LOG and TMP_LOG lines below to match your uptime log path
#
# Note: above crontab is every minute. Adjust as needed
#
# Executing this script without parameters will generate the
# report if valid files are found
#

# Location of logfiles
UP_LOG="/<path>/_uptime.log"
TMP_LOG="/<path>/.tmplog"

# Generate log report at shutdown and remove temp log after
# Used by rc.local_shutdown
shutdown_log () {
    {
        echo "LOG@$(date +%F\ %T)"
        echo "UP@$(uptime -s)"
    } >> "$UP_LOG"
    rm -f "$TMP_LOG"
}

# Generate a temp log
# Used by crontab, once per minute
temp_log () {
    {
        echo "LOG@$(date +%F\ %T)"
        echo "UP@$(uptime -s)"
    } > "$TMP_LOG"
}

# Check if there's a temp log at startup.
# If present then shutdown was improper so grab temp log time
# Used by rc.local
check_log () {
    if [ -f "$TMP_LOG" ]; then
        # Relabel the last cron entry as TMP to mark an improper shutdown
        sed 's/^LOG@/TMP@/' "$TMP_LOG" >> "$UP_LOG"
        rm "$TMP_LOG"
    fi
}


# Function to format a duration given in seconds
format_time() {
    echo "$(($1 / 86400)) Days, $(date -u -d "@$1" +'%H Hours, %M Minutes, %S Seconds')"
}

# Function to subtract time
sub_time() {
    echo $(( $(date -d "$1" +%s) - $(date -d "$2" +%s) ))
}

# Function to read log file and generate report
report_data() {
    if ! [ -f "$UP_LOG" ]; then
        echo "Error: No uptime log <$UP_LOG> found. Please set up logging and/or set UP_LOG variable." >&2
        exit 1
    fi

    # Initialize variables to add up uptime and read through logfile
    TOTAL_UPTIME=0
    LINE_CNT=0
    LONGEST_UPTIME=0
    SHORTEST_UPTIME=$(date +%s)
    BAD_POWERDN=0

    # Read through log file.
    # Entry date and times are on odd lines
    # uptimes are on following even lines
    while IFS= read -r LINE
    do
        ((LINE_CNT++))
        LOG_TYPE="$(echo "$LINE" | cut -f1 -d '@')"
        if (( LINE_CNT % 2 )); then
            case "$LOG_TYPE" in
                'LOG')
                    LOG_TIME="$(echo "$LINE" | cut -f2 -d '@')"
                    ;;
                'TMP')
                    LOG_TIME="$(echo "$LINE" | cut -f2 -d '@')"
                    ((BAD_POWERDN++))
                    ;;
                *)
                    echo "Error: Missing or improperly formatted log." >&2
                    exit 1
                    ;;
            esac
        else
            if [ "$LOG_TYPE" = "UP" ]; then
                UP_TIME="$(echo "$LINE" | cut -f2 -d '@')"
            else
                echo "Error: Missing or improperly formatted log." >&2
                exit 1
            fi

            # Get logged uptime increment
            UPTIME_DELTA=$(sub_time "$LOG_TIME" "$UP_TIME")
           
            # Find longest and shortest uptimes
            if [ $UPTIME_DELTA -gt $LONGEST_UPTIME ]; then
                LONGEST_UPTIME=$UPTIME_DELTA
            fi
            if [ $UPTIME_DELTA -lt $SHORTEST_UPTIME ]; then
                SHORTEST_UPTIME=$UPTIME_DELTA
            fi
            # Collect total uptimes
            TOTAL_UPTIME=$(($TOTAL_UPTIME + $UPTIME_DELTA))
        fi
    done < "$UP_LOG"

    # Include the uptime of current session
    CURRENT_UPTIME=$(sub_time "$(date +%F\ %T)" "$(uptime -s)")
    TOTAL_UPTIME=$(($TOTAL_UPTIME + $CURRENT_UPTIME))

    # Read first log entry date and time
    FIRST_LOG=$(head -n 1 "$UP_LOG" | cut -f2 -d '@')

    # Find uptime for first log entry
    FIRST_LOG_UPTIME=$(sub_time "$FIRST_LOG" "$(sed '2q;d' "$UP_LOG" | cut -f2 -d '@')")

    # Add first log uptime to total logged time
    # Required because log may be initiated on a system that was already "up" for a while
    TOTAL_LOG_TIME=$(($FIRST_LOG_UPTIME + $(sub_time "$(date +%F\ %T)" "$FIRST_LOG")))

    # Find total downtime
    TOTAL_DOWNTIME=$(($TOTAL_LOG_TIME - $TOTAL_UPTIME))

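    # Find average uptime (an empty log would otherwise divide by zero)
    if [ "$LINE_CNT" -lt 2 ]; then
        echo "Error: No complete entries in <$UP_LOG>." >&2
        exit 1
    fi
    AVG_UPTIME=$(($TOTAL_UPTIME / $(($LINE_CNT / 2)) ))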

    # Format the report
    echo "Current uptime of $(format_time $CURRENT_UPTIME)"
    echo ""
    echo "Total uptime of $(format_time $TOTAL_UPTIME)"
    echo "Total downtime of $(format_time $TOTAL_DOWNTIME)"
    echo "in the past $(format_time $TOTAL_LOG_TIME)"
    echo ""
    echo "Records:"
    echo "Longest uptime period: $(format_time $LONGEST_UPTIME)"
    echo "Shortest uptime period: $(format_time $SHORTEST_UPTIME)"
    echo "Average uptime period: $(format_time $AVG_UPTIME)"
    echo "Percentage uptime: $(bc<<<"scale=2; $TOTAL_UPTIME*100/$TOTAL_LOG_TIME")%"
    echo ""
    echo "$BAD_POWERDN improper shutdown(s)."
    echo "$(($LINE_CNT / 2)) log entries since $FIRST_LOG"

}

# Main routine:
if [ -n "$1" ]; then
    case "$1" in
        "shutdown_log")
            shutdown_log
            exit
            ;;
        "temp_log")
            temp_log
            exit
            ;;
        "check_log")
            check_log
            ;;
        *)
            echo "Error: Improper call. Did you mean to run ./uptime_report.sh?" >&2
            exit 1
            ;;
    esac
else
    report_data
    exit
fi

Output looks like:

Code:

Current uptime of 0 Days, 00 Hours, 15 Minutes, 53 Seconds

Total uptime of 0 Days, 05 Hours, 28 Minutes, 32 Seconds
Total downtime of 0 Days, 01 Hours, 04 Minutes, 47 Seconds
in the past 0 Days, 06 Hours, 33 Minutes, 19 Seconds

Records:
Longest uptime period: 0 Days, 03 Hours, 41 Minutes, 30 Seconds
Shortest uptime period: 0 Days, 00 Hours, 00 Minutes, 18 Seconds
Average uptime period: 0 Days, 00 Hours, 29 Minutes, 52 Seconds
Percentage uptime: 83.52%

3 improper shutdown(s).
11 log entries since 2019-09-28 17:12:01

Edit: Also that percentage is going to hit 99.99% pretty quick, so add more precision if ya want it. It would be pretty simple to work out total downtime in there as well.
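
As a sketch of what more precision could look like, the helper below takes the number of decimal places as a parameter. It uses awk rather than the bc call in the script, mainly so the value is rounded instead of truncated; uptime_pct is a hypothetical name. In the script itself, raising scale=2 in the bc expression should have a similar effect.

```shell
#!/bin/bash
# uptime_pct UPTIME_SECONDS LOGGED_SECONDS [DECIMALS]
# Prints uptime as a percentage of the logged period, rounded to DECIMALS
# places (default 4). (Hypothetical helper, not part of the script above.)
uptime_pct() {
    awk -v up="$1" -v total="$2" -v places="${3:-4}" 'BEGIN {
        fmt = "%." places "f\n"      # e.g. "%.4f\n"
        printf fmt, up * 100 / total
    }'
}

# Totals from the sample report: 5h 28m 32s up out of 6h 33m 19s logged
uptime_pct 19712 23599               # prints 83.5290
```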

Edit 2: I cleaned up the script a bit and added more statistics

Edit 3: Added cron job routines, streamlined script and better documented

upnort 09-21-2019 06:12 PM

Quote:

I hope you don't mind but I took a crack at it and this script seems to do the job.

Nope, don't mind. :)

I'm not yet committed to uptimed. Just curious if others use the tool. As uptimed might not support containers, a home-grown solution might be preferred.

Quote:

I guess you could also set up a cron to run those logging commands too. The logfile will grow faster though.
One could probably make a more sophisticated way of trimming the log record to just keep a running total in there.

A cron job would help avoid history log errors during inadvertent shutdowns/reboots.

The last command uses /var/log/wtmp. The log stores system boot start times and uptime, which could be used to create a cumulative history. I tried a quick test in a VM and the last command results don't nicely handle inadvertent shutdowns/reboots. The last command probably is not a reliable candidate for creating an uptime history.

0XBF 09-22-2019 10:40 AM

I edited the script in my previous post to add some more stats like shortest, longest, average, and total downtime. See the edited post for more details. The only reason for the crappy stats there is because I rebooted a couple times to test functionality :)

The issue with the cron job will be that you won't get clean stats on individual uptime sessions because it will be logging in intervals determined by the cron's frequency, not actual lengths of full sessions. This wouldn't be an issue if all you're interested in is total uptime and percentage, but you won't be able to easily determine the other stats I've mentioned, and you also still have a chance of an unclean shutdown between cron jobs. That would be the point where I'd break off into more of an actual program to keep track of everything, and I guess "uptimed" is an option where someone has done this.

Also, I just like practicing my scripting and saw no need to make things more complex with writing/compiling programs in C or some other language. :)

I guess you could use the output of "last" like you mentioned but I think the script would get ugly with cleaning up and parsing that info to get what you want.

Mark Pettit 09-22-2019 11:59 AM

There's a very nice program that is most suitable here, called downtimed. Instead of measuring uptime, it measures downtime, which, if you think about it, is possibly more important. https://github.com/snabb/downtimed

upnort 09-22-2019 01:43 PM

Quote:

Instead of measuring uptime, it measures downtime.

As Mr. Spock would say, "Fascinating."

ttk 09-23-2019 03:40 PM

I didn't know about uptimed until reading about it here.

Traditionally I've used Nagios to track/monitor system uptimes, etc, but that's a heavy solution compared to uptimed, which might fill the niche for systems which don't warrant Nagios monitoring.

upnort 09-23-2019 07:05 PM

Quote:

which might fill the niche for systems which don't warrant Nagios monitoring.

Seems like a decent choice for bare metal but not containers. In a container, while the uptime command reports correctly, uptimed reports the uptime of the host.

0XBF 09-28-2019 08:13 PM

This thread is getting a little old but I had time today and I updated the bash script in post #8 again to include catching improper shutdowns with a cronjob. It'll require setting lines in rc.local_shutdown, rc.local, and crontab to function. See the script header for details.

The output report has also expanded a bit. Edit out whatever you don't want if you use it, or don't use it; it was fun to play around with scripting again.


All times are GMT -5.