at daemon bugs and proposed fix for Slackware
Edit: To save newcomers time, the real problem begins at post No. 8. The original post is not applicable because the -l option applies to batch jobs and not to at jobs. Apparently at is running the scheduled jobs, but in an unexpected manner.
I believe I stumbled across some bugs with the at daemon. One bug I hope Pat agrees can be fixed in the rc.M script: the default Slackware startup for atd is hard-coded to -l 1. From what I have read, a general rule of thumb nowadays is to set that value to at least the number of CPU cores.

I stumbled across this bug with my HTPC. I use atd to start TV recordings. This works fine 99% of the time, yet my scheduled recordings occasionally never triggered. There were never any log entries. No clues other than a mysterious system mail whose source I never could trace. Just nothing happened.

Last night the same thing happened. Because of the utter lack of log entries, I proceeded in my research on the presumption that the at daemon was not executing, despite ps indicating the daemon was running. I read the atd man page, which gives only a vague explanation of the -l option. Curious whether my using xbmc during the time of the scheduled recording had any effect, I ran top in another console while running xbmc and a DVD ISO movie. I waited about 10 minutes before I noticed the load factor bump just above 1, but only momentarily. If I understand the atd man page correctly, when the load factor exceeds the -l option, even momentarily, and that happens at the moment the scheduled recording should trigger, atd will not run the job.

Which leads to the second bug: I am much puzzled that there is no log entry from atd explaining why the scheduled job did not execute. That absolutely nothing happens is a mystery. Silence. One simple log entry could have resolved this mystery long ago. Perhaps I'm not noticing an obvious way to trigger a log entry when the load factor is greater than the -l setting.

I am modifying my rc.d scripts accordingly. I welcome any comments about the issue. Pat, please consider revising rc.M to programmatically select the -l setting. As CPU-intensive tasks such as video playback become more common with modern usage, I suspect -l 1 is no longer adequate.

Edit: My inelegant fix: Code:
LOAD_FACTOR_LIMIT="`cat /proc/cpuinfo | grep cores | uniq | awk -F ': ' '{print $2}'`"
/proc/cpuinfo isn't very reliable, as its contents vary depending on the vendor. Something along the lines of "$(grep '^cpu' /proc/stat | wc -l)" would be more reliable, and is what I use for my "make -j"s in my build scripts.
A couple of thoughts/questions: does "-l 0" turn the limit checking off completely? And which loadavg value does it use: 1 min, 5 min, or 15 min? I just tried it with a 3.0+ loadavg on a 2-core box, and it still ran my 'at' job despite the "-l 1".

Update: I just tried again after running some long-duration tasks for over 15 minutes, and even with all three loadavgs > 2 it still ran the job.
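For anyone wanting to repeat that kind of test, a minimal sketch run from an interactive shell; the 'yes' loops are just load generators, and /tmp/at-test.log is a made-up path: Code:

# Generate sustained CPU load so all three load averages climb.
yes > /dev/null & PID1=$!
yes > /dev/null & PID2=$!
yes > /dev/null & PID3=$!

sleep 900                 # let the 5- and 15-minute averages catch up
cat /proc/loadavg         # confirm the averages exceed the -l limit

# Schedule a trivial job and see whether atd still runs it:
echo 'date >> /tmp/at-test.log' | at now + 1 minute

kill $PID1 $PID2 $PID3    # stop the load generators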
I agree the parsing of /proc/cpuinfo is incomplete. For example, on a single-processor machine with a non-SMP kernel, LOAD_FACTOR_LIMIT will be empty.
With /proc/stat, you might prefer $(grep -Ec '^cpu[0-9]+' /proc/stat) so the aggregate 'cpu' line doesn't get over-counted. You might also look into coreutils' nproc(1). N.B. in a hyper-threaded environment you'll have more logical processing units than physical ones, which might matter depending on what you're trying to achieve. --mancha
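Putting the thread's suggestions together, a sketch of what rc.M might do (nproc first, /proc/stat as a fallback, and a final default of 1 to preserve the current hard-coded behavior; the atd invocation itself is only illustrative): Code:

#!/bin/sh
# Choose atd's load limit from the number of online processors.
if command -v nproc >/dev/null 2>&1; then
    LOAD_FACTOR_LIMIT=$(nproc)
else
    LOAD_FACTOR_LIMIT=$(grep -Ec '^cpu[0-9]+' /proc/stat)
fi
# Fall back to the stock value of 1 if detection produced nothing usable.
case "$LOAD_FACTOR_LIMIT" in
    ''|*[!0-9]*|0) LOAD_FACTOR_LIMIT=1 ;;
esac

/usr/sbin/atd -l "$LOAD_FACTOR_LIMIT"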
I wasn't aware 'nproc' existed. Thanks for that tip, Mancha.
You schedule a recording with 'at'; atd runs it at the specified time regardless of the load average. Batch processing is for workloads like large numerical simulations: you submit the job using 'batch', and atd runs it when the load is low enough (as specified with -l).
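To illustrate the distinction (the script names here are made up): Code:

# 'at' runs the job at the given time, regardless of system load:
echo '/usr/local/bin/record-show.sh' | at 20:00

# 'batch' queues the job, and atd starts it only once the load
# average falls below the daemon's -l threshold:
echo 'nice ./run-simulation.sh' | batch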
Update
I might have a clue, but I need help deciphering what happened.
In my original post I mentioned a mysterious system mail whose source I never could trace. Now I have traced the source. The email:

sh: line 61: cd: /root: Permission denied
Execution directory inaccessible

From backups I learned that last line comes from the at job script created when I scheduled the recording, yet the at job script contains the following: Code:
cd /home/htpc || {

The time of the email matches the time of the scheduled recording. Basically, the reason the job did not execute as expected is that something changed the script to cd to /root rather than to /home/htpc. Backups of the script show /home/htpc, not /root, so why the target changed is what confuses me. Any ideas?
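For context, an atd-generated spool file typically looks roughly like this (a sketch from memory; details vary by at version, and the uid/gid values are placeholders, not the poster's actual job): Code:

#!/bin/sh
# atrun uid=1000 gid=1000
# mail htpc 0
umask 22
HOME=/home/htpc; export HOME
LOGNAME=htpc; export LOGNAME
# ...more saved environment variables...
# atd enters the submission-time working directory before running
# the job; the 'Execution directory inaccessible' text in the email
# comes from exactly this guard:
cd /home/htpc || {
        echo 'Execution directory inaccessible' >&2
        exit 1
}
# ...the commands submitted via 'at' follow here...

Note that 'at' records the submitter's working directory at queue time and generates the cd from it, so a job queued from a root shell would get 'cd /root' baked in.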
Isn't "atd -l" for batch jobs? Are at jobs affected?
So the correct question is in my previous post: apparently the at job is executing, but the mysterious email indicates the at job script is being changed from 'cd /home/htpc' to 'cd /root', and I have no idea why.
Gee, I'd be paranoid if any of my executable scripts got modified without my knowledge, particularly to attempt to get something executed in root's directory. Perhaps an attempt to attack the machine?
Paranoid? No.
The mysterious email does not necessarily mean the at job scripts are being modified. Something is awry that results in 'cd /root' rather than 'cd /home/htpc'. I won't be surprised at all if the problem is PEBKAC; I just want to know what actually happens. As this happens only once in a blue moon, debugging is a challenge. /var/spool/atjobs is owned daemon:daemon. Each at job (scheduled recording) within that directory is owned htpc:daemon (chmod 700).