LinuxQuestions.org - cript interrupted at same time every night

- Linux - General (https://www.linuxquestions.org/questions/linux-general-1/)

- - cript interrupted at same time every night (https://www.linuxquestions.org/questions/linux-general-1/cript-interrupted-at-same-time-every-night-828190/)

cript interrupted at same time every night

I'm writing a python script for my TS-7550 (running Debian) that reads data from a serial port every 10 seconds. The script works perfectly, getting data in every second ending in 5 (e.g. 05, 15, 25, etc.), until it reaches 12:25am CST (GMT-6). At this time almost every other night, our program is interrupted for 7-11 seconds, which means we miss one of these ten second readings, or it is off by almost 10 seconds. Here is the error in my log file:

Code:

2010-08-12 00:25:25,017 1879 (DEBUG)        : Wrote Q

2010-08-12 00:25:26,271 1879 (DEBUG)        : line: 5FF        8DA        17DCA        8DB        0        0        0        0

2010-08-12 00:25:26,306 1879 (DEBUG)        : Calculated sleep time:8.69449496269

2010-08-12 00:25:35,007 1879 (DEBUG)        : Wrote Q

2010-08-12 00:25:35,843 1879 (DEBUG)        : line: 601        8DD        17DCA        8DB        0        0        0        0

2010-08-12 00:25:35,878 1879 (DEBUG)        : Calculated sleep time:9.12337088585

2010-08-12 00:25:45,017 1879 (DEBUG)        : Wrote Q

2010-08-12 00:25:56,217 1879 (DEBUG)        : line: 5FF        8DD        17DC9        8DB        0        0        0        0

2010-08-12 00:25:56,232 1879 (WARNING)        : Gap in data between 1281594335 and 1281594356 (21 seconds)

2010-08-12 00:25:56,565 1879 (DEBUG)        : Calculated sleep time:8.43716692924

2010-08-12 00:26:05,007 1879 (DEBUG)        : Wrote Q

2010-08-12 00:26:05,963 1879 (DEBUG)        : line: 602        8DD        17DC6        8DB        0        0        0        0

2010-08-12 00:26:05,997 1879 (DEBUG)        : Calculated sleep time:9.00356197357

(Sending the 'Q' over the serial port tells our PIC chip to send us data)
In this particular case, the reading is off by 11 seconds, meaning we missed an entire 10 second reading that should have been at 12:25:45. The delay above occurred between sending the Q and receiving data, but it has also appeared after receiving the data and before calculating the sleep time.

I looked at /var/log/syslog and found the following:

Code:

ts7500 kernel: imklog 3.18.6, log source = /proc/kmsg started.

ts7500 rsyslogd: [origin software="rsyslogd" swVersion="3.18.6" x-pid="745" x-info="http://www.rsyslog.com"] restart

at 12:25am CST (the system lists it as 06:25 because it is set to GMT). This was the only thing logged at about that time.

/etc/crontab reveals that the daily cron jobs are run at 6:25am GMT:

Code:

# m h dom mon dow user        command

17 *        * * *        root    cd / && run-parts --report /etc/cron.hourly

25 6        * * *        root        test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )

47 6        * * 7        root        test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.weekly )

52 6        1 * *        root        test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.monthly )

The following files are in /etc/cron.daily:

Code:

apache2

apt

aptitude

bsdmainutils

logrotate

man-db

standard

But as far as I can tell only logrotate actually executes. I decided to modify logrotate to use the nice command in hopes that lowering the priority would help

Code:

#!/bin/sh



test -x /usr/sbin/logrotate || exit 0

nice -n 19 /usr/sbin/logrotate /etc/logrotate.conf

and then executed my script by running

Code:

nice -n -20 python Main.py

but this doesn't seem to have alleviated the problem.

Does anyone know what might be causing this? It's very important to us that we don't miss a 10-second reading, but it's acceptable if the reading is off by a few seconds.

Thanks!

EDIT: Is there a way to change the title? It should say 'script', not 'cript'

Try disabling all other tasks for a night or two

Greetings from industrial control nightmare land!

1. I'd try disabling all other jobs (just chmod -x the scripts in daily, weekly, etc or comment out the lines in crontab). Your case is strong for it likely being the logrotate job or something else in daily. Once you can confirm that it is definitely the logrotate job, you can fix the issue.

2. logrotate runs scripts that may disable services. You should examine the logrotate.d directory to see what is rotating and whether or not the services are being restarted/stopped/etc. I've run aground of this kind of issue before.

3. Also, try nice -20 to launch your query job. However, if the logrotate job is killing services you need then priority isn't going to count.

IF you find that logrotate is killing your services, just clean up the job to either allow for a blank period or, better, disable the non-error logging of your services so you don't even need logrotate. That way you can schedule it to happen once a year and just toss this issue.

Good luck!

Edit: Ignore point 3, I somehow read past that without registering...

It just hit me.... Hard.

Howdy again, sorry to spam...

Part of the logrotate is to kill HUP the syslog service.

If you're using it, there's your issue. That's the service that is dead when your system attempts to write. Have your system create a manual log instead of relying on syslog if you can't disable the logrotate for syslog (unadvisable as it is VERY chatty).

HTH.

Thanks, Lyle. I've made changes to crontab to run the daily cron jobs at a different time, but I won't know until tomorrow morning if that had any effect. I'm also going to disable all of the other cron scripts as you recommended (as soon as we're done with some slight hardware modifications).

Looking at logrotate.d, only syslog appears to be rotating daily. apache2 and ppp are weekly, and the others are all monthly. In rsyslog:

Code:

/var/log/syslog

{

        rotate 7

        daily

        missingok

        notifempty

        delaycompress

        compress

        postrotate

                invoke-rc.d rsyslog reload > /dev/null

        endscript

}

I'm not familiar with what these files are supposed to look like, but it appears that rsyslog is being restarted (reloaded?), which is what the log files suggested.

The logging for my program is handled with the python logging module. Specifically, I use logging.getlogger() and add a RotatingFileHandler to that. I'm not familiar with the details of logging, so I'm not sure how I would know if this is using whatever it is that logrotate kills. Replacing this logging would be very time consuming, so I'd rather consider other options.

With options reduced... Go for better options...

Hello again.

Just a thought, but comment out the compress option in the syslog rotate. It DOES only keep 7 rotates of the log, so compress (which is the slowest portion of the operation) would be the thing killing your operation, likely... Syslog's restart shouldn't take more than a couple of seconds at most. Unless, of course, you're running this on a 486 rig. :-D

Probably the best option, then, provided you are still having the issue at the new time and without compress, is to consider disabling the logrotate for the syslog.

As I mentioned before, this is definitely non-optimal. However, if you disable it for a week or two and check the log file size vs. your free space you may find that you're ok.

Often people over-provision their space (the best option for maintaining quality in MOST linux filesystems), if you have done so and your /var/log containing partition is of sufficient size to handle your syslog not rotating as often, you may wish to move the syslog rotation to annual or monthly at a time where you will not be bothered by the skippage. Depending on your operations you may find that this is suitable for even weekly (Sunday night or whatever).

Again, I hope this helps.

I've commented out compress and delaycompress and hopefully that will make enough of a difference that we won't have to make any more drastic changes. Since this machine is really only responsible for running our program and running ntpdate, I think we could easily set it to rotate weekly as well.

For now, I'm leaving for a week of vacation. I'll let you know what I find when I get back.

Thanks again.

-Ben

NUDGE

Ben, how is it working now?

I'm guessing we go with the IT motto that no news is good news, but I'd like to hear for certain if it's not too much trouble...

Thanks,

Lyle

Your guess is mostly correct. Commenting out those lines didn't make a significant difference, so we decided to go a slightly different route, but this requires a little explanation:

The reason that we needed every ten second reading is because our old website and scripts (which we don't have time to update right now) expect to find a file every minute that contains six readings. We wanted the six readings to be every ten seconds, but there's no requirement that they are, only that they represent a minute. So in our main loop we wrote a line that checks to see if we have as many ten second readings as we should based on how far into the minute we are. If we don't have enough readings, we'll take more readings until we catch up to the expected value. This way every minute will have six readings, even when the daily cron jobs run, and the vast majority of these readings will be evenly spaced.

Here's the basic code behind it:

Code:

if (int(time.time())%60/5%11+1)/2 != self.counter:

        timeLeft = 0

else:

        timeLeft = (Config.UPDATE_FREQUENCY/2-(time.time()%Config.UPDATE_FREQUENCY))%Config.UPDATE_FREQUENCY

time.sleep(timeLeft)

self.getData()

So, we didn't really fix the original error, we just accommodated it. I go back to school in a week, but I'll probably be working on this project again over winter break and may take another go at it then. Thanks again for your help.