LinuxQuestions.org - Monitoring uptime


Any uptimed users here?

Monitoring uptime used to be a badge of honor and a fun way to measure reliability and availability. These days, with frequent reboots for security patching, long uptimes are somewhat out of style.

OK, take it to the next level: how many nines of uptime? Most folks won't know without some kind of help.

uptimed seems suited to tracking how many nines of availability a system achieves.

Just looking for conversation.

Thanks again. :)

I use GKrellM and it has a built-in uptime display.

Per the man page, I believe that it monitors /proc/uptime. I think the uptime command also reads /proc/uptime.
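For reference, the counter itself can be read without any tool. A minimal sketch, assuming Linux with a mounted /proc; the output format below is only an illustration, not what uptime or GKrellM actually print:

```shell
#!/bin/bash
# /proc/uptime holds two numbers: field 1 is seconds since boot,
# field 2 is idle time (summed across CPUs on multi-core systems,
# so it can be larger than field 1).
read -r up_secs idle_secs < /proc/uptime
up=${up_secs%.*}    # drop the fractional part for integer math
printf 'up %d days, %02d:%02d:%02d (idle %s s)\n' \
    $((up / 86400)) $((up % 86400 / 3600)) $((up % 3600 / 60)) $((up % 60)) "$idle_secs"
```

Because this counter restarts at zero on every boot, it only describes the current session, which is why a history needs something that outlives reboots.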

My bad for not explaining better. I don't want to monitor or display uptime. I am thinking about uptime history because a reboot restarts the uptime counter.

A cron job run every minute to store data and a script to do the math could suffice.

If on average a server is rebooted once every two weeks, and the reboot takes about 4 minutes, that is about 99.98% uptime. Might be nice to display that history.
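The arithmetic behind that figure, as a small sketch. The helper names are hypothetical, and it assumes the two-week interval is measured from one reboot to the next (so the 4 minutes of downtime sit inside it). The "number of nines" is taken as -log10 of the fraction of time spent down:

```shell
#!/bin/bash
# Hypothetical helpers: a system rebooted every INTERVAL minutes,
# each reboot taking DOWN minutes (downtime counted inside INTERVAL).
availability() {   # usage: availability INTERVAL DOWN -> percent up
    awk -v t="$1" -v d="$2" 'BEGIN { printf "%.2f\n", (t - d) * 100 / t }'
}
nines() {          # usage: nines INTERVAL DOWN -> -log10(DOWN/INTERVAL)
    awk -v t="$1" -v d="$2" 'BEGIN { printf "%.1f\n", -log(d / t) / log(10) }'
}

# Two weeks between reboots, about 4 minutes per reboot
availability $((14 * 24 * 60)) 4   # 99.98
nines $((14 * 24 * 60)) 4          # 3.7
```

So 4 minutes out of 20160 is roughly 3.7 nines: better than "three nines" (99.9%) but short of "four nines" (99.99%).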

What if you put something like "uptime >> /some_directory/uptime.log" in rc.local_shutdown? Then as long as you shut down/reboot cleanly you will have a log of all your uptimes. You could also timestamp it and then make a script to parse the log and do the math to average it all out for your percentage stats.
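A minimal sketch of that idea, assuming a hypothetical log format of one line per clean shutdown holding two epoch timestamps (much easier to do math on than the human-readable uptime output). /some_directory is the placeholder from the post, and summarize_log is a made-up name:

```shell
#!/bin/bash
# Hypothetical log format, one line per clean shutdown:
#   <shutdown epoch> <boot epoch>
# rc.local_shutdown could append such a line with GNU date and
# procps-ng uptime (both on Slackware), e.g.:
#   echo "$(date +%s) $(date -d "$(uptime -s)" +%s)" >> /some_directory/uptime.log
summarize_log() {   # usage: summarize_log LOGFILE
    awk '{ s = $1 - $2; total += s; n++ }
         END { if (n) printf "%d sessions, average %d seconds\n", n, total / n }' "$1"
}
```

For example, a log containing `1000 400` and `2000 1800` describes sessions of 600 and 200 seconds, so `summarize_log` prints `2 sessions, average 400 seconds`.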

Just a thought

my slackware stable uptime is always the length of time between kernel updates

no need to monitor anything, it's all in the changelog

Perhaps you could write a script to log the uptime to a file at shutdown.

This article tells how to do it with SysVinit and systemd: https://opensource.com/life/16/11/ru...shutdown-linux

Here's a (very old) LQ thread on the topic: https://www.linuxquestions.org/quest...utdown-323412/

Thanks for the replies. The original question was whether anybody is using uptimed. :)

That said, a script at shutdown/reboot is okay most of the time but won't help with inadvertent shutdowns. A cron job run every minute or two would be better. The output of the last command includes boot times and might suffice. Either way, a script is needed to do the math. How to display the data is another question.
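One way to cover inadvertent shutdowns with cron is a "heartbeat": cron stamps the current time into a file every minute, and at boot the gap between the last stamp and the boot time estimates how long the box was down. A minimal sketch; the file path and function names are hypothetical, and boot time is derived from /proc/uptime (Linux):

```shell
#!/bin/bash
# Hypothetical heartbeat file (placeholder path)
HEARTBEAT="${HEARTBEAT:-/var/lib/uptime/heartbeat}"

# Run from cron, e.g. "* * * * * /path/to/script heartbeat" (hypothetical)
heartbeat() {
    date +%s > "$HEARTBEAT"
}

# Boot time in epoch seconds: now minus seconds since boot
boot_epoch() {
    local up
    read -r up _ < /proc/uptime
    echo $(( $(date +%s) - ${up%.*} ))
}

# Estimated downtime in seconds, given the boot epoch. It can overstate
# the real downtime by up to one cron interval, because the system may
# have kept running for part of a minute after the last heartbeat.
estimated_downtime() {   # usage: estimated_downtime BOOT_EPOCH
    local last
    read -r last < "$HEARTBEAT"
    echo $(( $1 - last ))
}
```

At boot (for example from rc.local), `estimated_downtime "$(boot_epoch)"` would give the figure to log. The trade-off is the one discussed later in the thread: the cron interval bounds the error.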

Yeah, I can do that. :) I started thinking about the idea. I love shell scripting but I looked online to see if anybody had already invented the same wheel. I found uptimed and hence the original question.

BTW, late this afternoon at work I installed uptimed on some test systems. Looks like the tool does not understand containers because I got the same results as the host system. I need to look into that after the weekend. :)

Quote:

Originally Posted by upnort (Post 6038864)

I hope you don't mind but I took a crack at it and this script seems to do the job.

Disclaimer: I've only tested it for a short while, since I hacked this out over the last hour. Suggestions or edits are welcome.

Code:

#!/bin/bash
#
#
# %%%%%%%%%%%%%%%%%%%% uptime_report.sh %%%%%%%%%%%%%%%%%%%%%%%%%
#
#
# Script to generate stats about uptime
#
# Place this script in a folder to keep uptime logs (this is <path>)
# Set up the following as root to utilize uptime logging:
# "/<path>/uptime_report.sh shutdown_log" in /etc/rc.d/rc.local_shutdown
# "/<path>/uptime_report.sh check_log" in /etc/rc.d/rc.local
# "* * * * * /<path>/uptime_report.sh temp_log" in crontab -e
# Edit <path> in the two lines below to match your uptime log path
#
# Note: the crontab entry above runs every minute. Adjust as needed
#
# Executing this script without parameters will generate the
# report if valid files are found
#

# Location of logfiles
UP_LOG="/<path>/_uptime.log"
TMP_LOG="/<path>/.tmplog"

# Generate log report at shutdown and remove temp log after
# Used by rc.local_shutdown
shutdown_log () {
    (echo "LOG@$(date +%F\ %T)"
    echo "UP@$(uptime -s)") >> "$UP_LOG"
    if [ -f "$TMP_LOG" ]; then
        rm "$TMP_LOG"
    fi
}

# Generate a temp log
# Used by crontab, once per minute
temp_log () {
    (echo "LOG@$(date +%F\ %T)"
    echo "UP@$(uptime -s)") > "$TMP_LOG"
}

# Check if there's a temp log at startup.
# If present then shutdown was improper so grab temp log time
# Used by rc.local
check_log () {
    if [ -f "$TMP_LOG" ]; then
        # Mark the entry as TMP (improper shutdown). POSIX sed does not
        # allow a backslash as the s/// delimiter, so use slashes.
        sed 's/^LOG@/TMP@/' "$TMP_LOG" >> "$UP_LOG"
        rm "$TMP_LOG"
    fi
}

# Function to format a duration given in seconds
format_time() {
    echo "$(($1 / 86400)) Days, $(date -u -d @"$1" +'%H Hours, %M Minutes, %S Seconds')"
}

# Function to subtract time
sub_time() {
    echo $(( $(date -d "$1" +%s) - $(date -d "$2" +%s) ))
}

# Function to read log file and generate report
report_data() {
    if ! [ -f "$UP_LOG" ]; then
        echo "Error: No uptime log <$UP_LOG> found. Please set up logging and/or set UP_LOG variable."
        exit 1
    fi

    # Initialize variables to add up uptime and read through logfile
    TOTAL_UPTIME=0
    LINE_CNT=0
    LONGEST_UPTIME=0
    SHORTEST_UPTIME=$(date +%s)
    BAD_POWERDN=0

    # Read through log file.
    # Entry date and times are on odd lines
    # uptimes are on following even lines
    while IFS= read -r LINE
    do
        ((LINE_CNT++))
        LOG_TYPE="${LINE%%@*}"
        if (( LINE_CNT % 2 )); then
            case "$LOG_TYPE" in
                'LOG')
                    LOG_TIME="${LINE#*@}"
                    ;;
                'TMP')
                    LOG_TIME="${LINE#*@}"
                    ((BAD_POWERDN++))
                    ;;
                *)
                    echo "Error: Missing or improperly formatted log."
                    exit 1
                    ;;
            esac
        else
            if [ "$LOG_TYPE" = "UP" ]; then
                UP_TIME="${LINE#*@}"
            else
                echo "Error: Missing or improperly formatted log."
                exit 1
            fi

            # Get logged uptime increment
            UPTIME_DELTA=$(sub_time "$LOG_TIME" "$UP_TIME")

            # Find longest and shortest uptimes
            if [ "$UPTIME_DELTA" -gt "$LONGEST_UPTIME" ]; then
                LONGEST_UPTIME=$UPTIME_DELTA
            fi
            if [ "$UPTIME_DELTA" -lt "$SHORTEST_UPTIME" ]; then
                SHORTEST_UPTIME=$UPTIME_DELTA
            fi
            # Collect total uptimes
            TOTAL_UPTIME=$((TOTAL_UPTIME + UPTIME_DELTA))
        fi
    done < "$UP_LOG"

    # An empty log would cause a division by zero below
    if (( LINE_CNT < 2 )); then
        echo "Error: Uptime log <$UP_LOG> has no complete entries yet."
        exit 1
    fi

    # Include the uptime of current session
    CURRENT_UPTIME=$(sub_time "$(date +%F\ %T)" "$(uptime -s)")
    TOTAL_UPTIME=$((TOTAL_UPTIME + CURRENT_UPTIME))

    # Read first log entry date and time
    FIRST_LOG=$(head -n 1 "$UP_LOG" | cut -f2 -d '@')

    # Find uptime for first log entry
    FIRST_LOG_UPTIME=$(sub_time "$FIRST_LOG" "$(sed '2q;d' "$UP_LOG" | cut -f2 -d '@')")

    # Add first log uptime to total logged time
    # Required because log may be initiated on a system that was already "up" for a while
    TOTAL_LOG_TIME=$((FIRST_LOG_UPTIME + $(sub_time "$(date +%F\ %T)" "$FIRST_LOG")))

    # Find total downtime
    TOTAL_DOWNTIME=$((TOTAL_LOG_TIME - TOTAL_UPTIME))

    # Find average uptime
    AVG_UPTIME=$((TOTAL_UPTIME / (LINE_CNT / 2)))

    # Format the report
    echo "Current uptime of $(format_time "$CURRENT_UPTIME")"
    echo ""
    echo "Total uptime of $(format_time "$TOTAL_UPTIME")"
    echo "Total downtime of $(format_time "$TOTAL_DOWNTIME")"
    echo "in the past $(format_time "$TOTAL_LOG_TIME")"
    echo ""
    echo "Records:"
    echo "Longest uptime period: $(format_time "$LONGEST_UPTIME")"
    echo "Shortest uptime period: $(format_time "$SHORTEST_UPTIME")"
    echo "Average uptime period: $(format_time "$AVG_UPTIME")"
    echo "Percentage uptime: $(bc <<< "scale=2; $TOTAL_UPTIME*100/$TOTAL_LOG_TIME")%"
    echo ""
    echo "$BAD_POWERDN improper shutdown(s)."
    echo "$((LINE_CNT / 2)) log entries since $FIRST_LOG"
}

# Main routine:
if [ -n "$1" ]; then
    case "$1" in
        "shutdown_log")
            shutdown_log
            exit
            ;;
        "temp_log")
            temp_log
            exit
            ;;
        "check_log")
            check_log
            ;;
        *)
            echo "Error: Improper call. Did you mean to run ./uptime_report.sh?"
            exit 1
            ;;
    esac
else
    report_data
    exit
fi

Output looks like:

Code:

Current uptime of 0 Days, 00 Hours, 15 Minutes, 53 Seconds

Total uptime of 0 Days, 05 Hours, 28 Minutes, 32 Seconds
Total downtime of 0 Days, 01 Hours, 04 Minutes, 47 Seconds
in the past 0 Days, 06 Hours, 33 Minutes, 19 Seconds

Records:
Longest uptime period: 0 Days, 03 Hours, 41 Minutes, 30 Seconds
Shortest uptime period: 0 Days, 00 Hours, 00 Minutes, 18 Seconds
Average uptime period: 0 Days, 00 Hours, 29 Minutes, 52 Seconds
Percentage uptime: 83.52%

3 improper shutdown(s).
11 log entries since 2019-09-28 17:12:01

Edit: Also, that percentage is going to hit 99.99% pretty quickly; add more precision if ya want it. It would be pretty simple to work out total downtime as well.
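In the script itself, raising `scale=2` in the bc line is the direct way to get more digits. Note that bc's `scale` truncates rather than rounds. As an alternative, here is a small sketch using awk (pct_uptime is a hypothetical name), with the number of decimals as a parameter:

```shell
#!/bin/bash
# Percentage uptime with a configurable number of decimals.
# awk's printf rounds, whereas bc's scale= truncates, so the last
# digit can differ from the script's bc output.
pct_uptime() {   # usage: pct_uptime UP_SECONDS TOTAL_SECONDS DECIMALS
    awk -v up="$1" -v total="$2" -v d="$3" \
        'BEGIN { fmt = "%." d "f\n"; printf fmt, up * 100 / total }'
}

# Totals from the sample report above (5:28:32 up in 6:33:19 logged)
pct_uptime 19712 23599 4   # 83.5290
```

With 2 decimals this prints 83.53, while the script's bc line prints 83.52 for the same totals because of the truncation.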

Edit 2: I cleaned up the script a bit and added more statistics

Edit 3: Added cron job routines, streamlined the script and improved the documentation

Quote:

I hope you don't mind but I took a crack at it and this script seems to do the job.

Nope, don't mind. :)

I'm not yet committed to uptimed. Just curious if others use the tool. As uptimed might not support containers, a home-grown solution might be preferred.

Quote:

I guess you could also set up a cron to run those logging commands too. The logfile will grow faster though.
One could probably make a more sophisticated way of trimming the log record to just keep a running total in there.

A cron job would help avoid history log errors during inadvertent shutdowns/reboots.
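The running-total idea from the quote above can be sketched like this. It is a hypothetical format (one line, `total_seconds sessions`), and the file path and function names are made up. Each finished session is folded into the totals, so the file never grows:

```shell
#!/bin/bash
# Hypothetical totals file: a single line "total_seconds sessions"
TOTALS_FILE="${TOTALS_FILE:-/tmp/uptime_totals}"   # placeholder path

add_session() {   # usage: add_session SECONDS
    local secs=$1 total=0 count=0
    if [ -f "$TOTALS_FILE" ]; then
        read -r total count < "$TOTALS_FILE"
    fi
    # Write a new file and rename it, so a crash mid-write cannot
    # leave a truncated totals file behind
    echo "$((total + secs)) $((count + 1))" > "$TOTALS_FILE.new" &&
        mv "$TOTALS_FILE.new" "$TOTALS_FILE"
}

average_session() {
    local total count
    read -r total count < "$TOTALS_FILE"
    echo $((total / count))
}
```

After `add_session 3600` and `add_session 1800`, the file holds `5400 2` and `average_session` prints 2700. The cost of this design is that per-session records such as longest and shortest are lost unless they are kept as extra fields.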

The last command uses /var/log/wtmp. That file records boot (and shutdown) events, and last reports how long each boot session lasted, which could be used to create a cumulative history. I tried a quick test in a VM, and the last command's results don't handle inadvertent shutdowns/reboots nicely, so it probably is not a reliable candidate for creating an uptime history.

I edited the script in my previous post to add some more stats like shortest, longest, average, and total downtime. See the edited post for more details. The only reason the stats look so crappy is that I rebooted a couple of times to test functionality :)

The issue with the cron job is that you won't get clean stats on individual uptime sessions, because it logs at intervals set by the cron schedule, not at the actual ends of sessions. That isn't a problem if all you're interested in is total uptime and percentage, but you won't be able to easily determine the other stats I've mentioned, and there is still a chance of an unclean shutdown between cron runs. That's the point where I'd move on to more of an actual program to keep track of everything, and I guess uptimed is an option where someone has already done this.

Also, I just like practicing my scripting and saw no need to make things more complex with writing/compiling programs in C or some other language. :)

I guess you could use the output of "last" like you mentioned but I think the script would get ugly with cleaning up and parsing that info to get what you want.

There's a very nice program that is well suited to this, called downtimed. Instead of measuring uptime, it measures downtime, which, if you think about it, is possibly more important. https://github.com/snabb/downtimed

Quote:

Instead of measuring uptime, it measures downtime.

As Mr. Spock would say, "Fascinating."

I didn't know about uptimed until reading about it here.

Traditionally I've used Nagios to track/monitor system uptimes, etc., but that's a heavy solution compared to uptimed, which might fill the niche for systems which don't warrant Nagios monitoring.

Quote:

which might fill the niche for systems which don't warrant Nagios monitoring.

Seems like a decent choice for bare metal but not containers. In a container, while the uptime command reports correctly, uptimed reports the uptime of the host.

This thread is getting a little old, but I had time today and updated the bash script in post #8 again to catch improper shutdowns with a cron job. It requires adding lines to rc.local_shutdown, rc.local, and crontab. See the script header for details.

The output report has also expanded a bit. Edit out whatever you don't want if you use it. Or don't use it; it was fun to play around with scripting again.