cript interrupted at same time every night
I'm writing a python script for my TS-7550 (running Debian) that reads data from a serial port every 10 seconds. The script works perfectly, getting data in every second ending in 5 (e.g. 05, 15, 25, etc.), until it reaches 12:25am CST (GMT-6). At this time almost every other night, our program is interrupted for 7-11 seconds, which means we miss one of these ten second readings, or it is off by almost 10 seconds. Here is the error in my log file:
Code:
2010-08-12 00:25:25,017 1879 (DEBUG) : Wrote Q In this particular case, the reading is off by 11 seconds, meaning we missed an entire 10 second reading that should have been at 12:25:45. The delay above occurred between sending the Q and receiving data, but it has also appeared after receiving the data and before calculating the sleep time. I looked at /var/log/syslog and found the following: Code:
ts7500 kernel: imklog 3.18.6, log source = /proc/kmsg started. /etc/crontab reveals that the daily cron jobs are run at 6:25am GMT: Code:
# m h dom mon dow user command Code:
apache2 Code:
#!/bin/sh Code:
nice -n -20 python Main.py Does anyone know what might be causing this? It's very important to us that we don't miss a 10-second reading, but it's acceptable if the reading is off by a few seconds. Thanks! EDIT: Is there a way to change the title? It should say 'script', not 'cript' |
Try disabling all other tasks for a night or two
Greetings from industrial control nightmare land!
1. I'd try disabling all other jobs (just chmod -x the scripts in daily, weekly, etc or comment out the lines in crontab). Your case is strong for it likely being the logrotate job or something else in daily. Once you can confirm that it is definitely the logrotate job, you can fix the issue. 2. logrotate runs scripts that may disable services. You should examine the logrotate.d directory to see what is rotating and whether or not the services are being restarted/stopped/etc. I've run aground of this kind of issue before. 3. Also, try nice -20 to launch your query job. However, if the logrotate job is killing services you need then priority isn't going to count. IF you find that logrotate is killing your services, just clean up the job to either allow for a blank period or, better, disable the non-error logging of your services so you don't even need logrotate. That way you can schedule it to happen once a year and just toss this issue. Good luck! Edit: Ignore point 3, I somehow read past that without registering... |
It just hit me.... Hard.
Howdy again, sorry to spam...
Part of the logrotate is to kill HUP the syslog service. If you're using it, there's your issue. That's the service that is dead when your system attempts to write. Have your system create a manual log instead of relying on syslog if you can't disable the logrotate for syslog (unadvisable as it is VERY chatty). HTH. |
Thanks, Lyle. I've made changes to crontab to run the daily cron jobs at a different time, but I won't know until tomorrow morning if that had any effect. I'm also going to disable all of the other cron scripts as you recommended (as soon as we're done with some slight hardware modifications).
Looking at logrotate.d, only syslog appears to be rotating daily. apache2 and ppp are weekly, and the others are all monthly. In rsyslog: Code:
/var/log/syslog The logging for my program is handled with the python logging module. Specifically, I use logging.getlogger() and add a RotatingFileHandler to that. I'm not familiar with the details of logging, so I'm not sure how I would know if this is using whatever it is that logrotate kills. Replacing this logging would be very time consuming, so I'd rather consider other options. |
With options reduced... Go for better options...
Hello again.
Just a thought, but comment out the compress option in the syslog rotate. It DOES only keep 7 rotates of the log, so compress (which is the slowest portion of the operation) would be the thing killing your operation, likely... Syslog's restart shouldn't take more than a couple of seconds at most. Unless, of course, you're running this on a 486 rig. :-D Probably the best option, then, provided you are still having the issue at the new time and without compress, is to consider disabling the logrotate for the syslog. As I mentioned before, this is definitely non-optimal. However, if you disable it for a week or two and check the log file size vs. your free space you may find that you're ok. Often people over-provision their space (the best option for maintaining quality in MOST linux filesystems), if you have done so and your /var/log containing partition is of sufficient size to handle your syslog not rotating as often, you may wish to move the syslog rotation to annual or monthly at a time where you will not be bothered by the skippage. Depending on your operations you may find that this is suitable for even weekly (Sunday night or whatever). Again, I hope this helps. |
I've commented out compress and delaycompress and hopefully that will make enough of a difference that we won't have to make any more drastic changes. Since this machine is really only responsible for running our program and running ntpdate, I think we could easily set it to rotate weekly as well.
For now, I'm leaving for a week of vacation. I'll let you know what I find when I get back. Thanks again. -Ben |
NUDGE
Ben, how is it working now? I'm guessing we go with the IT motto that no news is good news, but I'd like to hear for certain if it's not too much trouble... Thanks, Lyle |
Your guess is mostly correct. Commenting out those lines didn't make a significant difference, so we decided to go a slightly different route, but this requires a little explanation:
The reason that we needed every ten second reading is because our old website and scripts (which we don't have time to update right now) expect to find a file every minute that contains six readings. We wanted the six readings to be every ten seconds, but there's no requirement that they are, only that they represent a minute. So in our main loop we wrote a line that checks to see if we have as many ten second readings as we should based on how far into the minute we are. If we don't have enough readings, we'll take more readings until we catch up to the expected value. This way every minute will have six readings, even when the daily cron jobs run, and the vast majority of these readings will be evenly spaced. Here's the basic code behind it: Code:
if (int(time.time())%60/5%11+1)/2 != self.counter: |
All times are GMT -5. The time now is 02:09 PM. |