at daemon bugs and proposed fix for Slackware
Edit: To save newcomers time, the real problem begins at post No. 8. The original post is not applicable because the -l option applies to batch jobs and not to at jobs. Apparently at is running the scheduled jobs, but in an unexpected manner.
I believe I stumbled across some bugs with the at daemon. One bug I hope Pat agrees can be fixed in the rc.M script: the default Slackware startup for atd is hard-coded to -l 1. From what I have read, a general rule of thumb nowadays is to set that value to at least the number of CPU cores.

I stumbled across this bug with my HTPC. I use atd to start TV recordings. This works fine 99% of the time, yet my scheduled recordings occasionally never triggered. There were never any log entries. No clues other than a mysterious system mail whose source I never could trace. Just nothing happened.

Last night the same thing happened. Because of the utter lack of log entries, I proceeded in my research on the presumption that the at daemon was not executing, despite ps indicating the daemon was running. I read the atd man page, which gives only a vague explanation of the -l option. Curious whether my using xbmc during the time of the scheduled recording had any effect, I ran top in another console while running xbmc and a DVD ISO movie. I waited about 10 minutes before I noticed the load factor bump just above 1, but only momentarily. If I understand the atd man page correctly, when the load factor exceeds the -l option, even momentarily, and that happens at the moment the scheduled recording should trigger, atd will not run the job.

Which leads to the second bug: I am much puzzled that there is no log entry from atd explaining why the scheduled job did not execute. That absolutely nothing happens is a mystery. Silence. One simple log entry could have resolved this mystery long ago. Perhaps I'm not noticing an obvious way to trigger a log entry when the load factor is greater than the -l setting.

I am modifying my rc.d scripts accordingly. I welcome any comments about the issue. Pat, please consider revising rc.M to programmatically select the -l setting. As CPU-intensive tasks such as video playback become more common with modern usage, I suspect -l 1 is no longer adequate.

Edit: My inelegant fix: Code:
LOAD_FACTOR_LIMIT="`cat /proc/cpuinfo | grep cores | uniq | awk -F ': ' '{print $2}'`"
/proc/cpuinfo isn't very reliable, as its contents vary depending on the vendor. Something along the lines of "$(grep '^cpu' /proc/stat | wc -l)" would be more reliable, and is what I use for my "make -j"s in my build scripts.
A couple of thoughts/questions: does "-l 0" turn the limit checking off completely? And which loadavg value does it use: 1 min, 5 min, or 15 min? I just tried it with a 3.0+ loadavg on a 2-core box, and it still ran my 'at' job despite the "-l 1".

Update: I just tried again after running some long-duration tasks for over 15 minutes, and even with all three loadavgs > 2 it still ran the job.
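For anyone wanting to repeat that kind of test, a minimal sketch run from an interactive shell; the 'yes' loops are just load generators, and /tmp/at-test.log is a made-up path: Code:

# Generate sustained CPU load so all three load averages climb.
yes > /dev/null & PID1=$!
yes > /dev/null & PID2=$!
yes > /dev/null & PID3=$!

sleep 900                 # let the 5- and 15-minute averages catch up
cat /proc/loadavg         # confirm the averages exceed the -l limit

# Schedule a trivial job and see whether atd still runs it:
echo 'date >> /tmp/at-test.log' | at now + 1 minute

kill $PID1 $PID2 $PID3    # stop the load generators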
I agree the parsing of /proc/cpuinfo is incomplete. For example, on a single-processor machine with a non-SMP kernel, LOAD_FACTOR_LIMIT will be empty.
With /proc/stat, you might prefer $(grep -Ec '^cpu[0-9]+' /proc/stat) so the aggregate 'cpu' line doesn't get over-counted. You might also look into coreutils' nproc(1). N.B. in a hyper-threaded environment you'll have more logical processing units than physical ones, which might matter depending on what you're trying to achieve. --mancha
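Putting the thread's suggestions together, a sketch of what rc.M might do (nproc first, /proc/stat as a fallback, and a final default of 1 to preserve the current hard-coded behavior; the atd invocation itself is only illustrative): Code:

#!/bin/sh
# Choose atd's load limit from the number of online processors.
if command -v nproc >/dev/null 2>&1; then
    LOAD_FACTOR_LIMIT=$(nproc)
else
    LOAD_FACTOR_LIMIT=$(grep -Ec '^cpu[0-9]+' /proc/stat)
fi
# Fall back to the stock value of 1 if detection produced nothing usable.
case "$LOAD_FACTOR_LIMIT" in
    ''|*[!0-9]*|0) LOAD_FACTOR_LIMIT=1 ;;
esac

/usr/sbin/atd -l "$LOAD_FACTOR_LIMIT"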
I wasn't aware 'nproc' existed. Thanks for that tip, Mancha.
You schedule a recording with 'at'; atd runs it at the specified time regardless of the load average. Batch processing is for workloads like large numerical simulations: you submit the job using 'batch', and atd runs it when the load is low enough (as specified with -l).
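To illustrate the distinction (the script names here are made up): Code:

# 'at' runs the job at the given time, regardless of system load:
echo '/usr/local/bin/record-show.sh' | at 20:00

# 'batch' queues the job, and atd starts it only once the load
# average falls below the daemon's -l threshold:
echo 'nice ./run-simulation.sh' | batch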
Update
I might have a clue, but I need help deciphering what happened.
In my original post I mentioned a mysterious system mail whose source I never could trace. Now I have traced the source. The email:

sh: line 61: cd: /root: Permission denied
Execution directory inaccessible

From backups I learned that last line comes from the at job script created when I scheduled the recording, yet the at job script contains the following: Code:
cd /home/htpc || {

The time of the email matches the time of the scheduled recording. Basically, the reason the job did not execute as expected is that something changed the script to cd to /root rather than to /home/htpc. Backups of the script show /home/htpc, not /root, so why the target changed is what confuses me. Any ideas?
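For context, an atd-generated spool file typically looks roughly like this (a sketch from memory; details vary by at version, and the uid/gid values are placeholders, not the poster's actual job): Code:

#!/bin/sh
# atrun uid=1000 gid=1000
# mail htpc 0
umask 22
HOME=/home/htpc; export HOME
LOGNAME=htpc; export LOGNAME
# ...more saved environment variables...
# atd enters the submission-time working directory before running
# the job; the 'Execution directory inaccessible' text in the email
# comes from exactly this guard:
cd /home/htpc || {
        echo 'Execution directory inaccessible' >&2
        exit 1
}
# ...the commands submitted via 'at' follow here...

Note that 'at' records the submitter's working directory at queue time and generates the cd from it, so a job queued from a root shell would get 'cd /root' baked in.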
Isn't "atd -l" for batch jobs? Are at jobs affected?
So the correct question is in my previous post: apparently the at job is executing, but the mysterious email indicates the at job script is being changed from 'cd /home/htpc' to 'cd /root', and I have no idea why.
Gee, I'd be paranoid if any of my executable scripts got modified without my knowledge, particularly to attempt to get something executed in root's directory. Perhaps an attempt to attack the machine?
Paranoid? No.
The mysterious email does not necessarily mean the at job scripts are being modified. Something is awry that results in 'cd /root' rather than 'cd /home/htpc'. I won't be surprised at all if the problem is PEBKAC; I just want to know what actually happens. As this happens only once in a blue moon, debugging is a challenge. /var/spool/atjobs is owned daemon:daemon. Each at job (scheduled recording) within that directory is owned htpc:daemon (chmod 700).