Old 11-15-2013, 01:39 PM   #1
Woodsman
Senior Member
 
Registered: Oct 2005
Distribution: Slackware 14.1
Posts: 3,482

Rep: Reputation: 546
at daemon bugs and proposed fix for Slackware


Edit: To save newcomers' time, the real problem begins at post No. 8. The original post is not applicable because the -l option applies to batch, not to at. Apparently at is running the scheduled jobs, but in an unexpected manner.

I believe I stumbled across some bugs with the at daemon.

One bug I hope Pat agrees can be fixed in the rc.M script.

The default Slackware startup for atd is hard-coded to -l 1. From what I have read, a general rule of thumb nowadays is to set that value to at least the number of CPU cores.

I stumbled across this bug with my HTPC. I use atd to start TV recordings. This works fine 99% of the time. Yet my scheduled recordings occasionally never triggered. There never were any log entries, and no clues other than a mysterious system mail whose source I could never trace. Just nothing happened.

Last night the same thing happened. Because of the utter lack of log entries, I proceeded in my research on the presumption that the at daemon was not executing jobs, despite ps indicating the daemon was running.

I read the atd man page, which offers only a vague explanation of the -l option. Curious, I wondered whether my using xbmc during the time of the scheduled recording had any effect.

I ran top in another console while running xbmc and a DVD ISO movie. I waited about 10 minutes before I noticed the load average bump just above 1, but only momentarily. If I understand the atd man page correctly, when the load average exceeds the -l limit at the moment a scheduled job should trigger, atd will not run the job, even if the spike is only momentary.

Which leads to the second bug: I'm puzzled that atd writes no log entry explaining why a scheduled job did not execute. That absolutely nothing happens is a mystery. Silence. One simple log entry could have resolved this long ago. Perhaps I'm missing an obvious way to trigger a log entry when the load average is greater than the -l setting.

I am modifying my rc.d scripts accordingly. I welcome any comments about the issue.

Pat, please consider revising rc.M to select the -l setting programmatically. As CPU-intensive tasks such as video playback increase with modern usage, I suspect -l 1 is no longer adequate.

Edit: My inelegant fix:

Code:
# count physical cores from /proc/cpuinfo (inelegant; see the replies below)
LOAD_FACTOR_LIMIT="$(grep 'cpu cores' /proc/cpuinfo | uniq | awk -F ': ' '{print $2}')"
/usr/sbin/atd -b 15 -l $LOAD_FACTOR_LIMIT

Last edited by Woodsman; 11-18-2013 at 07:14 PM.
 
Old 11-15-2013, 03:29 PM   #2
GazL
LQ Veteran
 
Registered: May 2008
Posts: 7,115

Rep: Reputation: 5268
/proc/cpuinfo isn't very reliable, as its contents vary depending on the vendor. Something along the lines of "$( grep '^cpu' /proc/stat | wc -l)" would be more reliable, and is what I use for the "make -j"s in my build scripts.

A couple of thoughts/questions. Does "-l 0" turn the limit checking off completely? Also which loadavg value does it use: 1min, 5min, or 15min? I just tried it with a 3.0+ loadavg on a 2-core box, and it still ran my 'at' job despite the "-l 1".

Update: just tried again after running some long-duration tasks for over 15 minutes, and even with all 3 loadavgs > 2 it still ran it.
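
For reference, this is roughly how I tested it (a sketch; the test job and paths are just illustrative):

Code:
# push the load up on a 2-core box, then schedule a test job
yes > /dev/null &
yes > /dev/null &
echo 'date > /tmp/at-test' | at now + 2 minutes
cat /proc/loadavg              # watch all three averages climb past 2
# ...after the scheduled time has passed:
kill %1 %2
cat /tmp/at-test               # the job ran anyway, despite "-l 1"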

Last edited by GazL; 11-15-2013 at 03:51 PM.
 
Old 11-15-2013, 10:00 PM   #3
mancha
Member
 
Registered: Aug 2012
Posts: 484

Rep: Reputation: Disabled
I agree the parsing of /proc/cpuinfo is incomplete. For example, on a single-processor machine with a non-SMP kernel, LOAD_FACTOR_LIMIT will be empty.

With /proc/stat, you might prefer $(grep -Ec '^cpu[0-9]+' /proc/stat) to not over-count. Might also look into coreutils' nproc(1).

n.b. In a hyper-threaded environment you'll have more logical processing units than physical ones, which might matter depending on what you're trying to achieve.
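
A quick side-by-side on a 2-core (non-HT) box, just to illustrate (your numbers will differ):

Code:
grep -c '^cpu' /proc/stat          # 3 -- also counts the aggregate "cpu" line
grep -Ec '^cpu[0-9]+' /proc/stat   # 2 -- logical CPUs only
nproc                              # 2 -- coreutils; also honors CPU affinity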

--mancha

Last edited by mancha; 11-15-2013 at 10:02 PM.
 
3 members found this post helpful.
Old 11-16-2013, 04:55 AM   #4
GazL
LQ Veteran
 
Registered: May 2008
Posts: 7,115

Rep: Reputation: 5268
Quote:
Originally Posted by mancha
With /proc/stat, you might prefer $(grep -Ec '^cpu[0-9]+' /proc/stat) to not over-count.
Yep, what I use does over-count by one; I should probably have pointed that out. For a make -j, that was exactly what I wanted.

I wasn't aware 'nproc' existed. Thanks for that tip, mancha.
 
1 member found this post helpful.
Old 11-16-2013, 06:35 AM   #5
Petri Kaukasoina
Senior Member
 
Registered: Mar 2007
Posts: 2,421

Rep: Reputation: 2036
Quote:
Originally Posted by Woodsman
The default Slackware startup for atd is hard-coded to -l 1.

I use atd to start TV recordings. This works fine 99% of the time. Yet my scheduled recordings occasionally never triggered.
It may be a red herring.

You schedule a recording with 'at'. atd runs it at the specified time regardless of the load average.

You might want to use batch processing e.g. if you run large numerical simulations. Then you submit the job using 'batch'. atd runs it when the load is low enough (specified with -l).
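
For example (the recording script path is just a placeholder):

Code:
# at: runs at the given time, load average notwithstanding
echo '/home/htpc/bin/record-show.sh' | at 20:00
# batch: runs whenever the load average falls below atd's -l limit
echo './run-simulation' | batch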
 
1 member found this post helpful.
Old 11-16-2013, 06:41 AM   #6
GazL
LQ Veteran
 
Registered: May 2008
Posts: 7,115

Rep: Reputation: 5268
Quote:
Originally Posted by Petri Kaukasoina
It may be a red herring.

You schedule a recording with 'at'. atd runs it at the specified time regardless of the load average.

You might want to use batch processing e.g. if you run large numerical simulations. Then you submit the job using 'batch'. atd runs it when the load is low enough (specified with -l).
Ahh, of course. Thanks Petri, I completely missed that: -l only applies to 'batch' jobs.
 
Old 11-16-2013, 03:38 PM   #7
Woodsman
Senior Member
 
Registered: Oct 2005
Distribution: Slackware 14.1
Posts: 3,482

Original Poster
Rep: Reputation: 546
Quote:
Something along the lines of "$( grep '^cpu' /proc/stat | wc -l)" would be more reliable
Quote:
grep -Ec '^cpu[0-9]+' /proc/stat
After reading GazL's first post I had changed my usage to egrep -c '^cpu[0-9]+' /proc/stat. Then I read mancha's bit about nproc, which seems the easiest method.
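
So my startup sketch now becomes something like:

Code:
# use the count of available processing units as the load limit
LOAD_FACTOR_LIMIT="$(nproc)"
/usr/sbin/atd -b 15 -l $LOAD_FACTOR_LIMIT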

Quote:
Does "-l 0" turn the limit checking off completely?
I don't know.

Quote:
Also which loadavg value does it use: 1min, 5min, or 15min?
I don't know C, but browsing through the atd sources suggests all three.

Quote:
Update: just tried again after running some long-duration tasks for over 15 minutes, and even with all 3 loadavgs > 2 it still ran it.
Thank you. The only thing I know with certainty is that atd is not running my scheduled jobs in the manner intended or expected. As there are no log entries, I am as yet clueless as to what happens.

Quote:
It may be a red herring.
Could be. As I mentioned, all I know with certainty is that the jobs are not executing. If the problem occurred often I might have a clue even without log entries, but this happens only once in a blue moon.

Quote:
You schedule a recording with 'at'. atd runs it at the specified time regardless of the load average.
Okay, atd runs the scheduled job at the specified time, but something happens to prevent the job from executing.

Quote:
You might want to use batch processing e.g. if you run large numerical simulations.
Would that work? This is for an HTPC to run scheduled recordings. The recordings have to execute at a specified time. If I understand correctly, at is used to run jobs at specific times, whereas batch runs jobs as soon as the load average allows. Seems my use case needs at rather than batch.

Quote:
I completely missed that: -l only applies to 'batch' jobs.
Therefore changing atd to '-l `nproc`' has no effect when using 'at' and is useful only when using 'batch'. So I'm back to the beginning, knowing nothing at all about why the scheduled jobs fail to execute.
 
Old 11-17-2013, 05:39 PM   #8
Woodsman
Senior Member
 
Registered: Oct 2005
Distribution: Slackware 14.1
Posts: 3,482

Original Poster
Rep: Reputation: 546
Update

I might have a clue, but I need help deciphering what happened.

In my original post I mentioned a mysterious system mail whose source I could never trace. Now I have traced the source. The email:

Code:
sh: line 61: cd: /root: Permission denied
Execution directory inaccessible

From backups I learned that the last line comes from the at job script created when I scheduled the recording, yet the at job script contains the following:

Code:
cd /home/htpc || {
  echo 'Execution directory inaccessible' >&2
  exit 1
}
'Line 61' in the email matches the line number of the 'exit 1' line in the at job script.

The time of the email matches the time of the scheduled recording.

Basically, the reason the job did not execute as expected is that something changed the script to cd to /root rather than to /home/htpc.

What confuses me is why the script was changed to cd to /root rather than to /home/htpc. Backups of the script show /home/htpc, not /root.
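
For reference, at(1) itself writes that cd block into every job file, recording whatever the working directory was at submission time. A quick way to see it (needs root to read the spool; the spool path is the Slackware default, and the atrm job number is illustrative):

Code:
cd /tmp
echo 'true' | at now + 1 hour
grep '^cd ' /var/spool/atjobs/a*   # shows: cd /tmp || {
atrm 1                             # remove the test job (number from atq)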

Any ideas?

Last edited by Woodsman; 11-17-2013 at 05:43 PM.
 
Old 11-17-2013, 06:47 PM   #9
guanx
Senior Member
 
Registered: Dec 2008
Posts: 1,191

Rep: Reputation: 239
Isn't "atd -l" for batch jobs? Are at jobs affected?
 
Old 11-18-2013, 12:53 PM   #10
Woodsman
Senior Member
 
Registered: Oct 2005
Distribution: Slackware 14.1
Posts: 3,482

Original Poster
Rep: Reputation: 546
Quote:
Isn't "atd -l" for batch jobs? Are at jobs affected?
From what I've learned in this thread, yes, the -l option applies only to batch, not to at.

So the correct question is the one in my previous post.

As mentioned in my previous post, apparently the at job is executing, but the mysterious email indicates the at job script is being changed from 'cd /home/htpc' to 'cd /root', and I have no idea why.
 
Old 11-18-2013, 03:35 PM   #11
mostlyharmless
Senior Member
 
Registered: Jan 2008
Distribution: Arch/Manjaro, might try Slackware again
Posts: 1,859
Blog Entries: 14

Rep: Reputation: 284
Gee, I'd be paranoid if any of my executable scripts got modified without my knowledge, particularly to attempt to get something executed in root's directory. Perhaps an attempt to attack the machine?
 
Old 11-18-2013, 07:10 PM   #12
Woodsman
Senior Member
 
Registered: Oct 2005
Distribution: Slackware 14.1
Posts: 3,482

Original Poster
Rep: Reputation: 546
Paranoid? No.

The mysterious email does not necessarily mean the at job scripts are being modified, but something is awry that results in 'cd /root' rather than 'cd /home/htpc'. I won't be surprised at all if the problem is PEBKAC. I just want to know what actually happens. As this happens only once in a blue moon, debugging is a challenge.

/var/spool/atjobs is owned by daemon:daemon. Each at job (scheduled recording) within that directory is owned by htpc:daemon (chmod 700).
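
For completeness, checking with ls (expect daemon:daemon on the directory, and htpc:daemon with mode 700 on each job file):

Code:
ls -ld /var/spool/atjobs
ls -l /var/spool/atjobs/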
 
  

