LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Slackware (http://www.linuxquestions.org/questions/slackware-14/)
-   -   Problem with acpupsd (http://www.linuxquestions.org/questions/slackware-14/problem-with-acpupsd-4175461647/)

tronayne 05-12-2013 11:14 AM

Problem with acpupsd
 
Just went through the first log rotate; had a power outage which looked OK. Here's the rotated log
Code:

cat apcupsd.events.1
2013-05-07 15:25:49 -0400  apcupsd 3.14.10 (13 September 2011) slackware startup succeeded
2013-05-11 15:44:38 -0400  Power failure.
2013-05-11 15:44:40 -0400  Power is back. UPS running on mains.
2013-05-12 04:40:01 -0400  apcupsd exiting, signal 15
2013-05-12 04:40:01 -0400  apcupsd shutdown succeeded

I believe the shutdown was the restart direction in the log rotate:
Code:

cat /etc/logrotate.d/apcupsd
/var/log/apcupsd.events {
  rotate 4
  weekly
  notifempty
  postrotate
    /etc/rc.d/rc.apcupsd restart 1>/dev/null || true
  endscript
}

However, bad pizza:
Code:

cat apcupsd.events
2013-05-12 04:40:01 -0400  Valid lock file for pid=3009, but not ours pid=818
2013-05-12 04:40:01 -0400  apcupsd FATAL ERROR in apcupsd.c at line 285
Failed to acquire device lock file
2013-05-12 04:40:01 -0400  apcupsd error shutdown completed
2013-05-12 11:59:33 -0400  apcupsd 3.14.10 (13 September 2011) slackware startup succeeded

I restarted the daemon manually at 11:59:33.

So, I'm wondering if I need to add a line to the log rotate (or to /etc/rd.d/rc.apcupsd to blow away the existing lock file)? Or is there some configuration parameter I might have missed?

tux_dude 05-12-2013 11:45 AM

The issue is apcupsd take a while to shutdown, so another instance get started before the kill completes. Add "sleep" line the the rc script to fix the issue.

My rc script:
Code:

stop)
        echo -n "Stopping apcupsd power management.."
        if [ -f ${APCPID} ]; then
                THEPID=`cat ${APCPID}`
                kill ${THEPID} || return="  Failed."
                rm -f ${APCPID}

                # APC seem to enter race condition when restart by logrotate.
                # Sleep for a bit to allow application to shutdown.
                sleep 5
        else
                return="  Nothing to stop."
        fi
        rm -f /var/lock/subsys/apcupsd
        echo -e "$return"
    ;;

5 second is a bit excessive so you can play around with a lower times.

tronayne 05-12-2013 11:51 AM

OK, digging around (meaning: read /etc/rc.d/rc.apcupsd), I find that the start does
Code:

case "$1" in
    start)
        rm -f /etc/apcupsd/powerfail
        rm -f /etc/nologin
        echo -n "Starting apcupsd power management.."
        if [ -f ${APCPID} ]; then
                return="  Already running."
        else
                mkdir -p /var/lock/subsys
                /sbin/apcupsd && touch /var/lock/subsys/apcupsd \
                || return="  Failed."
        fi

        echo -e "$return"
    ;;

Hmm.

In /etc/apcupsd/apcupsd/.conf,
Code:

# LOCKFILE <path to lockfile>
#  Path for device lock file. Not used on Win32.
LOCKFILE /var/lock

which isn't quite the same, so I changed that to
Code:

# LOCKFILE <path to lockfile>
#  Path for device lock file. Not used on Win32.
LOCKFILE /var/lock/subsys

Start the daemon and
Code:

ls /var/lock/subsys
LCK..  apcupsd

... that LCK.. file seems to be the actual lock file (so why is the file apcupsd created? Is there some other configuration file that needs fiddling with?).

Should I edit /etc/rc.d/rc.apcupsd and add -P /var/lock/subsys/apcupsd? Don't really want to mess with the "standard" rc.apcupsd file but it doesn't seem to work quite right otherwise.

[EDIT]
No, -P /var/lock/subsys/apcupsd is the PID file in /var/run/apcupsd.pid, not the lock file, so that's no good. Arrggh!
[/EDIT]

Tiz a puzzlement.

tux_dude 05-12-2013 12:11 PM

I don't think changing the location of lockfile will solve the issue. You can test it by trying "restart" of apcupsd (that is pretty much what logrotate does). It has been a long, long while since I applied my fix so I can't recall how I found the problem. You can create a simple patch to modify the rc script if you ever need to rebuild apcupsd.

tronayne 05-12-2013 12:28 PM

Thanks for those -- just entering the directory path in apcupsd.conf seems to take care of the problem (I did start, stop, restart and all that works). It's a little odd that the file gets created then not used (the actual file is LCK..); it gets created in /var/lock/subsys along with the apcupsd file and both get blown away when the stop removes the directory tree.

Heck with it, it ain't broke now (but it was by default and I don't remember seeing any hints about that in the documentation).

All is well that ends.

rworkman 05-14-2013 05:39 PM

I'm not sure I understand the problem (or the solution, but then, I guess that logically follows) - I think I got lost in the followups. Can you summarize the problem, and more importantly, what I need to do in order to fix the build at SlackBuilds.org?

tux_dude 05-14-2013 08:31 PM

I'll try to summarize base on what I understand.

When apcupsd starts, it creates a LCK.. base on the path set in the config file (note, the LCK.. file is only created on apcupsd server which has direct access to the UPS device). When apcupsd is stopped, it removes the LCK.. file before shutting down. The issue with using the rc script to restart is the rc script will attempt to restart apcupsd before the previous instance shuts down completely and before the LCK.. file is removed. Thus, generating a lock file error because the PID does not match.

tronayne's solution is the set the lock file path in the config to /var/lock/subsys, which is where the empty "apcupsd" is created by the rc script. I have no idea how this solves the race issue, but it seems to work.

Quote:

Originally Posted by tronayne (Post 4949639)
... both get blown away when the stop removes the directory tree.

I also do no know what this means the becayuse subsys directory is not removed.

My solution is the add a sleep to allow apcupsd to shutdown normally and remove the LCK.. file.

Note that apcupsd file created by "touch /var/lock/subsys/apcupsd" in the rc script is not used and is not required.

tronayne 05-15-2013 06:19 AM

Hi, Robby,

tax_dude's discussion is pretty much on target with the exception of the race condition part (I think). Let's see if I can explain.

I built apcupsd from SlackBuilds, installed it, checked /etc/apcupsd/apcupsd.conf, looked OK, started it and it died. That was because
Code:

UPSTYPE usb
#DEVICE /dev/ttyS0
DEVICE

This UPS is USB-connected, so the /dev/ttyS0 needed to go. No big deal, crawl before you walk. And, right now, the UPS is controlling one server if that matters (later, maybe another).

Everything went along until Sunday morning when log rotation ran and /etc/rc.d/rc.apcupsd restart ran. As far as I can see, the problem was that the lock file, named LCK.., was not removed at shut down by /etc/rc.d/rc.apcupsd -- which creates /var/lock/subsys then touches apcupsd in that directory but the actual lock file, LCK.. just gets created in the defined ${LOCKFILE} location in /etc/apcupsd/apcupsd.conf (so I changed that to LOCKFILE /var/lock/subsys in the conf file).

The daemon is running right now and
Code:

tree /var/lock/subsys
/var/lock/subsys
|-- LCK..
`-- apcupsd

I think the intent of /var/lock/subsys/acpupsd is that it should be the lock file? I just edited the stop section of /etc/rc.d/rc.apcupsd to remove both files so the thing will restart without the problem -- a kludge on my part to be sure (and I've too lazy to go find the lock file creation code yet).

So, I hope that is clear.

rworkman 05-15-2013 01:44 PM

Does this new build fix everything?
http://slackware.com/~rworkman/apcupsd/

The relevant commit is here: http://slackbuilds.org/cgit/slackbui...889924b7b6cee2

tronayne 05-15-2013 02:19 PM

Not real sure -- built it, upgradepkg --install-new, looked at the conf and rc.apcupsd files. Moved the .new files over the old.

This may or may not be relevant: still had to comment out (no big deal)
Code:

UPSTYPE usb
#DEVICE /dev/ttyS0
DEVICE

Started it up, content of /var/lock/subsys:
Code:

ls -al /var/lock/subsys
total 12
drwxr-xr-x 2 root root 4096 May 15 15:07 ./
drwxrwxrwt 5 root root 4096 May 12 12:23 ../
-rw-r--r-- 1 root root  11 May 15 15:07 LCK..
-rw-r--r-- 1 root root    0 May 15 15:07 apcupsd

Was I supposed to place the .diffs in the SlackBuild directory (I did not have the .diffs in the directory)?

It's working, but I think I'll need to explicitly remove /var/lock/subsys/LCK.. along with /var/locak/subsys/apcupsd in /etc/rc.d/rc.apcupsd in the stop section.

Or, probably, I just screwed up and didn't get the right stuff (got it all from your link)?

rworkman 05-15-2013 10:24 PM

Quote:

Originally Posted by tronayne (Post 4951945)
Not real sure -- built it, upgradepkg --install-new, looked at the conf and rc.apcupsd files. Moved the .new files over the old.

This may or may not be relevant: still had to comment out (no big deal)
Code:

UPSTYPE usb
#DEVICE /dev/ttyS0
DEVICE


Yeah, this commit will fix that:

Code:

commit 100b865f014031887d8ff6a5562a4a9228653eda
Author: Robby Workman <rworkman@slackbuilds.org>
Date:  Wed May 15 22:13:14 2013 -0500

    system/apcupsd: Add --with-dev="" to configure flags
   
    This leaves DEVICE blank in apcupsd.conf so that UPSTYPE=usb
    will work properly (it autodetects the UPS).
   
    Thanks to tronayne for the bug report and fix.
   
    Signed-off-by: Robby Workman <rworkman@slackbuilds.org>

Quote:

Started it up, content of /var/lock/subsys:
Code:

ls -al /var/lock/subsys
total 12
drwxr-xr-x 2 root root 4096 May 15 15:07 ./
drwxrwxrwt 5 root root 4096 May 12 12:23 ../
-rw-r--r-- 1 root root  11 May 15 15:07 LCK..
-rw-r--r-- 1 root root    0 May 15 15:07 apcupsd

Was I supposed to place the .diffs in the SlackBuild directory (I did not have the .diffs in the directory)?
They should have been in the patches/ subdir. Based on what you pasted above, they got used.

Quote:

It's working, but I think I'll need to explicitly remove /var/lock/subsys/LCK.. along with /var/locak/subsys/apcupsd in /etc/rc.d/rc.apcupsd in the stop section.
That's possible.

Aside from the "I maintain the apcupsd build at SBo" reason, I waded into this thread because I've had a mysterious problem related to apcupsd and cron. logrotation from cron.daily is throwing a "file not found" or some such error on /var/log/apcupsd.events (and mailing me about it) every night, but the file is actually present (and has been all along). I added "missingok" to the apcupsd logrotate.d entry, but the mails are still arriving, so I'm a bit lost. I kept thinking I'd missed something obvious, but in those cases, I generally have an epiphany in the shower or while cutting grass or some such, and I've not had one of those yet :-) Anyway, given the locking problem and how it affects restart() in rc.apcupsd, I can't help but wonder if it's related, so here I am. I'll grant that I don't see how it could be, but given that my other ideas have been dead ends, it can't hurt to pursue another, right? :-) I upgraded to the rebuilt apcupsd package a bit ago, so maybe I'll know something more tomorrow.


All times are GMT -5. The time now is 05:36 PM.