Linux - GeneralThis Linux forum is for general Linux questions and discussion.
If it is Linux Related and doesn't seem to fit in any other forum then this is the place.
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
Introduction to Linux - A Hands on Guide
This guide was created as an overview of the Linux Operating System, geared toward new users as an exploration tour and getting started guide, with exercises at the end of each chapter.
For more advanced trainees it can be a desktop reference, and a collection of the base knowledge needed to proceed with system and network administration. This book contains many real life examples derived from the author's experience as a Linux system and network administrator, trainer and consultant. They hope these examples will help you to get a better understanding of the Linux system and that you feel encouraged to try out things on your own.
Click Here to receive this Complete Guide absolutely free.
I'm writing a python script for my TS-7550 (running Debian) that reads data from a serial port every 10 seconds. The script works perfectly, getting data in every second ending in 5 (e.g. 05, 15, 25, etc.), until it reaches 12:25am CST (GMT-6). At this time almost every other night, our program is interrupted for 7-11 seconds, which means we miss one of these ten second readings, or it is off by almost 10 seconds. Here is the error in my log file:
(Sending the 'Q' over the serial port tells our PIC chip to send us data)
In this particular case, the reading is off by 11 seconds, meaning we missed an entire 10 second reading that should have been at 12:25:45. The delay above occurred between sending the Q and receiving data, but it has also appeared after receiving the data and before calculating the sleep time.
I looked at /var/log/syslog and found the following:
1. I'd try disabling all other jobs (just chmod -x the scripts in daily, weekly, etc or comment out the lines in crontab). Your case is strong for it likely being the logrotate job or something else in daily. Once you can confirm that it is definitely the logrotate job, you can fix the issue.
2. logrotate runs scripts that may disable services. You should examine the logrotate.d directory to see what is rotating and whether or not the services are being restarted/stopped/etc. I've run aground of this kind of issue before.
3. Also, try nice -20 to launch your query job. However, if the logrotate job is killing services you need then priority isn't going to count.
IF you find that logrotate is killing your services, just clean up the job to either allow for a blank period or, better, disable the non-error logging of your services so you don't even need logrotate. That way you can schedule it to happen once a year and just toss this issue.
Edit: Ignore point 3, I somehow read past that without registering...
Last edited by lylemwood; 08-24-2010 at 10:15 AM.
Part of the logrotate is to kill HUP the syslog service.
If you're using it, there's your issue. That's the service that is dead when your system attempts to write. Have your system create a manual log instead of relying on syslog if you can't disable the logrotate for syslog (unadvisable as it is VERY chatty).
Thanks, Lyle. I've made changes to crontab to run the daily cron jobs at a different time, but I won't know until tomorrow morning if that had any effect. I'm also going to disable all of the other cron scripts as you recommended (as soon as we're done with some slight hardware modifications).
Looking at logrotate.d, only syslog appears to be rotating daily. apache2 and ppp are weekly, and the others are all monthly. In rsyslog:
I'm not familiar with what these files are supposed to look like, but it appears that rsyslog is being restarted (reloaded?), which is what the log files suggested.
The logging for my program is handled with the python logging module. Specifically, I use logging.getlogger() and add a RotatingFileHandler to that. I'm not familiar with the details of logging, so I'm not sure how I would know if this is using whatever it is that logrotate kills. Replacing this logging would be very time consuming, so I'd rather consider other options.
Just a thought, but comment out the compress option in the syslog rotate. It DOES only keep 7 rotates of the log, so compress (which is the slowest portion of the operation) would be the thing killing your operation, likely... Syslog's restart shouldn't take more than a couple of seconds at most. Unless, of course, you're running this on a 486 rig. :-D
Probably the best option, then, provided you are still having the issue at the new time and without compress, is to consider disabling the logrotate for the syslog.
As I mentioned before, this is definitely non-optimal. However, if you disable it for a week or two and check the log file size vs. your free space you may find that you're ok.
Often people over-provision their space (the best option for maintaining quality in MOST linux filesystems), if you have done so and your /var/log containing partition is of sufficient size to handle your syslog not rotating as often, you may wish to move the syslog rotation to annual or monthly at a time where you will not be bothered by the skippage. Depending on your operations you may find that this is suitable for even weekly (Sunday night or whatever).
I've commented out compress and delaycompress and hopefully that will make enough of a difference that we won't have to make any more drastic changes. Since this machine is really only responsible for running our program and running ntpdate, I think we could easily set it to rotate weekly as well.
For now, I'm leaving for a week of vacation. I'll let you know what I find when I get back.
Your guess is mostly correct. Commenting out those lines didn't make a significant difference, so we decided to go a slightly different route, but this requires a little explanation:
The reason that we needed every ten second reading is because our old website and scripts (which we don't have time to update right now) expect to find a file every minute that contains six readings. We wanted the six readings to be every ten seconds, but there's no requirement that they are, only that they represent a minute. So in our main loop we wrote a line that checks to see if we have as many ten second readings as we should based on how far into the minute we are. If we don't have enough readings, we'll take more readings until we catch up to the expected value. This way every minute will have six readings, even when the daily cron jobs run, and the vast majority of these readings will be evenly spaced.
So, we didn't really fix the original error, we just accommodated it. I go back to school in a week, but I'll probably be working on this project again over winter break and may take another go at it then. Thanks again for your help.