LinuxQuestions.org
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

03-06-2007, 09:58 AM   #1
Avatar
Member
 
Registered: May 2001
Location: Canada
Distribution: old ones
Posts: 532

Rep: Reputation: 30
Server hangs at the same time every day


This post is a continuation of this thread: http://www.linuxquestions.org/questi...d.php?t=517683

Due to new developments, it turns out not to be the problem I originally thought, so I believe it is appropriate to start a new thread.

The problem: my Mandrake Multi Network Firewall-based server (the hardware isn't old, but the OS is) crashes every single night just after 4:00 AM. We come in to find the server hung and have to do a hard reset to get it back up.

The /var/log/messages log looks like this:

Code:
Mar  6 04:02:01 MDKSERV CROND[12928]: (root) CMD (nice -n 19 run-parts /etc/cron.daily)
Mar  6 04:02:01 MDKSERV CROND[12929]: (root) CMD (   /usr/share/msec/promisc_check.sh)
Mar  6 04:02:01 MDKSERV anacron[12939]: Updated timestamp for job `cron.daily' to 2007-03-06
Mar  6 04:03:00 MDKSERV CROND[12996]: (root) CMD (   /usr/share/msec/promisc_check.sh)
Mar  6 04:04:00 MDKSERV CROND[13048]: (root) CMD (   /usr/share/msec/promisc_check.sh)
Mar  6 08:09:23 MDKSERV syslogd 1.4.1: restart.
Mar  6 08:09:23 MDKSERV kernel: klogd 1.4.1, log source = /proc/kmsg started.
Mar  6 08:09:23 MDKSERV kernel: Inspecting /boot/System.map-2.4.18-8.1mdksecure
Mar  6 08:09:24 MDKSERV kernel: Loaded 16536 symbols from /boot/System.map-2.4.18-8.1mdksecure.
Mar  6 08:09:24 MDKSERV kernel: Symbols match kernel version 2.4.18.
(etc...)
How can I disable whatever is causing the server to hang? This log doesn't tell me what the problem is; how can I find it?

Thanks! This has been driving me nuts for a long time!
 
03-06-2007, 10:23 AM   #2
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,395
Blog Entries: 2

Rep: Reputation: 903
The problem appears to be in a daily cron job. Look in the files in /etc/cron.daily/. You may be able to see which one is blowing up. If you can't see any obvious problem, start removing various jobs until the pain stops. You may be able to test individual components by running them at the command line (as root).
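A minimal sketch of the two checks suggested above, written as POSIX sh functions whose names are hypothetical: list what run-parts would pick up from a cron directory, and run a single job by hand with `sh -x` tracing so you can see the last command it reached. It assumes the jobs are shell scripts and that run-parts runs executable files in glob (alphabetical) order and skips the rest.

```shell
#!/bin/sh
# List the files in a cron directory in glob order, marking which ones
# run-parts would execute (assumption: it skips non-executable files).
list_jobs() {
    dir="${1:-/etc/cron.daily}"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue        # also handles an empty directory
        if [ -x "$f" ]; then
            echo "run  $(basename "$f")"
        else
            echo "skip $(basename "$f")"
        fi
    done
}

# Run one job by hand with tracing: sh -x prints each command, prefixed
# with "+", on stderr before executing it, so the output shows where it stops.
trace_job() {
    sh -x "$1"
}
```

Usage (as root): `list_jobs /etc/cron.daily`, then for example `trace_job /etc/cron.daily/logrotate 2>&1 | tee /root/trace.txt`.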

--- rod.
 
03-06-2007, 10:57 AM   #3
Avatar
Member
 
Registered: May 2001
Location: Canada
Distribution: old ones
Posts: 532

Original Poster
Rep: Reputation: 30
Hi rod, thanks for your answer. I looked in my cron.daily directory but did not see what could be causing a problem.

Do you think it would be possible for me to add a line like this to each of those files
Code:
echo "Running <filename>"
and then run them all at once with the command from my /etc/crontab, i.e.
Code:
nice -n 19 run-parts /etc/cron.daily
What I am hoping is that this will show me output like
Code:
Running 0anacron
Running 0sarg
Running clean-naat
Running logrotate
(...)
and that it would stop at the one causing the problem. Do you think this would work?
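Rather than editing every job, the same diagnostic can be sketched as a wrapper loop. Two details are assumptions of this sketch, not part of the plan above: each line goes to a log file instead of the screen (the console tends to blank long before morning), and `sync` is called before each job so the last "Running" line is on disk if the machine freezes. The function name and default log path are hypothetical.

```shell
#!/bin/sh
# Run every executable file in a cron directory in turn, logging the job
# name before it starts and its exit status after it finishes.
run_parts_logged() {
    dir="${1:-/etc/cron.daily}"
    log="${2:-/root/cron-daily-trace.log}"   # hypothetical location
    for job in "$dir"/*; do
        [ -f "$job" ] && [ -x "$job" ] || continue
        name=$(basename "$job")
        echo "$(date '+%b %e %H:%M:%S') Running $name" >> "$log"
        sync                      # flush the log line before starting the job
        nice -n 19 "$job"
        status=$?
        echo "$(date '+%b %e %H:%M:%S') Finished $name (status $status)" >> "$log"
    done
    sync
}
```

Run it as root at a quiet time, e.g. `run_parts_logged /etc/cron.daily /root/cron-daily-trace.log`. After a hang and reboot, a "Running" line with no matching "Finished" line names the job that was executing. Jobs that clean up logs or temp files may do little on a repeat run, so a clean pass does not prove much.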
 
03-06-2007, 12:23 PM   #4
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,395
Blog Entries: 2

Rep: Reputation: 903
That should be a valid diagnostic. Note that some things that run as cron jobs will behave differently when run repeatedly in the manner we are discussing. Jobs that clean up logfiles, temp directories, etc. may not do anything on the second or third run in close succession.

--- rod.
 
03-06-2007, 02:13 PM   #5
TigerOC
Senior Member
 
Registered: Jan 2003
Location: Devon, UK
Distribution: Debian Etc/kernel 2.6.18-4K7
Posts: 2,380

Rep: Reputation: 49
Quote:
Mar 6 04:04:00 MDKSERV CROND[13048]: (root) CMD ( /usr/share/msec/promisc_check.sh)
Look at your crontab: the problem lies either in this line or, more likely, in the one after it. Examine that instruction carefully and check that it is valid.
Also check /var/log/dmesg and syslog for errors.

Last edited by TigerOC; 03-06-2007 at 02:24 PM.
 
03-06-2007, 02:54 PM   #6
Avatar
Member
 
Registered: May 2001
Location: Canada
Distribution: old ones
Posts: 532

Original Poster
Rep: Reputation: 30
Hi TigerOC,

That promisc_check.sh job runs every single minute, so I think it is scheduled from somewhere else. The problem I have happens only once a day, just after 4:00 AM. How can I know which instruction would execute immediately after? It seems to me (and I'm just guessing) that the daily, hourly, and per-minute cron jobs can run at the same time, i.e. the per-minute one can run in between the daily ones?

The only cron.daily entry I see in the log is the one from anacron, since there is an entry in /etc/cron.daily called "0anacron".

Edit: I checked /var/log/dmesg; it contains the same messages my screen shows when I boot up, and nothing else is logged there. If you want it, I will post it.

I also checked /var/log/syslog and was shocked to find it is 2.9 GB in size. I could hardly believe my eyes: 2.9 GB of TEXT?? Yikes. It looks like that log hasn't been rotated since December 10 at 4:03 AM (right around when the crashing started). I am trying to pull out only today's entries, but it will take a while...
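A sketch for pulling one day out of a multi-gigabyte syslog without loading it into an editor (the function name is hypothetical). It assumes the classic syslogd timestamp shown in the logs in this thread, where single-digit days are padded with a space (`Mar  6`), so the prefix must contain both spaces.

```shell
#!/bin/sh
# Print only the lines of a syslog file that start with the given prefix,
# e.g. 'Mar  6'. index(...) == 1 is a fixed-string "starts with" test,
# so no characters in the prefix are treated as regex syntax.
extract_day() {
    log="$1"
    prefix="$2"
    awk -v p="$prefix" 'index($0, p) == 1' "$log"
}
```

Usage: `extract_day /var/log/syslog 'Mar  6' > /tmp/syslog-mar06.txt`. Because the newest entries sit at the end of the file, filtering only the tail, e.g. `tail -n 100000 /var/log/syslog | awk -v p='Mar  6' 'index($0, p) == 1'`, is usually much faster; 100000 is an arbitrary guess at one day's volume.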

Last edited by Avatar; 03-06-2007 at 03:03 PM.
 
03-06-2007, 03:08 PM   #7
Avatar
Member
 
Registered: May 2001
Location: Canada
Distribution: old ones
Posts: 532

Original Poster
Rep: Reputation: 30
Here is my syslog. It's the same as the other log:
Code:
(...)
Mar  6 04:00:26 MDKSERV adsl: adsl-start startup succeeded
Mar  6 04:00:59 MDKSERV CROND[12900]: (root) CMD (nice -n 19 run-parts /etc/cron.hourly)
Mar  6 04:00:59 MDKSERV CROND[12902]: (root) CMD (   /usr/share/msec/promisc_check.sh)
Mar  6 04:02:01 MDKSERV CROND[12928]: (root) CMD (nice -n 19 run-parts /etc/cron.daily)
Mar  6 04:02:01 MDKSERV CROND[12929]: (root) CMD (   /usr/share/msec/promisc_check.sh)
Mar  6 04:02:01 MDKSERV anacron[12939]: Updated timestamp for job `cron.daily' to 2007-03-06
Mar  6 04:03:00 MDKSERV CROND[12996]: (root) CMD (   /usr/share/msec/promisc_check.sh)
Mar  6 04:04:00 MDKSERV CROND[13048]: (root) CMD (   /usr/share/msec/promisc_check.sh)
Mar  6 08:09:23 MDKSERV syslogd 1.4.1: restart.
Mar  6 08:09:23 MDKSERV kernel: klogd 1.4.1, log source = /proc/kmsg started.
Mar  6 08:09:23 MDKSERV kernel: Inspecting /boot/System.map-2.4.18-8.1mdksecure
Mar  6 08:09:24 MDKSERV kernel: Loaded 16536 symbols from /boot/System.map-2.4.18-8.1mdksecure.
Mar  6 08:09:24 MDKSERV kernel: Symbols match kernel version 2.4.18.
Mar  6 08:09:24 MDKSERV kernel: Loaded 257 symbols from 11 modules.
(...)
 
03-06-2007, 03:09 PM   #8
BillyGalbreath
Member
 
Registered: Nov 2005
Location: Houston Texas
Distribution: Debian Sid
Posts: 379

Rep: Reputation: 31
Is there a backup application installed on the server?

The last time I saw a server do this, it was a daily overnight backup job freezing because it ran out of memory. It also happened at 4:07 AM every single night, like clockwork.

Try disabling your backup system for a night or two and see what happens, or just upgrade your RAM (and maybe swap too) to at least double what you currently have.
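One way to test the out-of-memory theory before buying RAM, sketched under assumptions: append a timestamped snapshot of free memory, page cache and free swap from `/proc/meminfo` to a log, calling `sync` after each sample so the readings leading up to a freeze survive the hard reset. The function name and log path are hypothetical.

```shell
#!/bin/sh
# Append one timestamped memory sample to a log file and flush it to disk.
log_memory() {
    log="${1:-/root/meminfo.log}"   # hypothetical location
    {
        date '+%b %e %H:%M:%S'
        # ^ anchors the match, so SwapCached is not picked up by "Cached"
        grep -E '^(MemFree|SwapFree|Cached):' /proc/meminfo
    } >> "$log"
    sync
}
```

To sample every minute overnight, put the function and a call to it in a script (a hypothetical `/root/log_memory.sh`) and add a line such as `* * * * * root /root/log_memory.sh` to /etc/crontab. A steady fall in MemFree and SwapFree just before 4:00 AM would support the theory; flat numbers would point elsewhere.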
 
03-07-2007, 01:57 AM   #9
TigerOC
Senior Member
 
Registered: Jan 2003
Location: Devon, UK
Distribution: Debian Etc/kernel 2.6.18-4K7
Posts: 2,380

Rep: Reputation: 49
There are some oddities here, and they also relate to the previous thread. First, there is no way your syslog should be that big. Normally the system (mine, anyway) starts a new log every day and dumps the oldest one. So how old is the syslog? Apache is stopped and restarted once a week by cron and a new log started. This is not normal. I would say this is not a crash but the system freezing up because of a lack of resources. Is the system totally unresponsive to input? Do you have to reboot, and if so, are you using reset or powering off?
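To answer "how old is the syslog?" without reading the whole file, the first and last lines are enough, because syslogd appends entries in time order. A minimal sketch with a hypothetical function name, assuming each line starts with the 15-character `Mon dd hh:mm:ss` timestamp seen in this thread:

```shell
#!/bin/sh
# Print the oldest and newest timestamps in a log, plus its size in bytes.
log_span() {
    log="${1:-/var/log/syslog}"
    echo "oldest: $(head -n 1 "$log" | cut -c1-15)"
    echo "newest: $(tail -n 1 "$log" | cut -c1-15)"
    echo "size:   $(wc -c < "$log") bytes"
}
```

`ls -l /var/log/syslog*` additionally shows whether any rotated copies exist. If rotation is working, the oldest entry should be no older than the rotation interval set in /etc/logrotate.conf.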
 
03-07-2007, 02:59 PM   #10
Avatar
Member
 
Registered: May 2001
Location: Canada
Distribution: old ones
Posts: 532

Original Poster
Rep: Reputation: 30
Edit: To Billy: No, there is no backup system, but you are right that SOMETHING is freezing up.
Tiger: Yes, it is locking up completely; the keyboard is unresponsive. By the time we come in in the morning, the screen has gone to sleep, so I never saw the error message that was (apparently) being displayed (see below). We had to power the server off and on with the power button.

OK, so by removing everything from /etc/cron.d and then re-adding the jobs one at a time, I managed to trace it to the logrotate script. (This explains the huge syslog file.)

Then I looked in /etc/logrotate.d and, by process of elimination, narrowed it down to squid's logrotate script. (I found that 2 of squid's logs were being rotated but not the other 2, so it must crash in between.) The actual error message behind the hang is:

Code:
Serverworks OSB4 in impossible state.
Disable UDMA or if you are using Seagate then try switching disk types on this controller.
OSB4: Continuing might cause disk corruption
I have seen this error message before, and it is apparently a bug in the kernel version I am using. I previously tried upgrading to the latest 2.4 kernel, because this error would sometimes appear on boot, but that didn't work at all, so I had to revert.

Anyway, a workaround for now is to remove the squid script from logrotate entirely. The bad news is that we use squid, so its logs are going to get huge.
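The process of elimination above, and the workaround of leaving squid's config out, can be sketched as two small functions. Assumptions: the configs live in /etc/logrotate.d and /etc/logrotate.conf pulls them in with an `include` line, so a config only stops being read once it is moved out of that directory. The parking directory /root/logrotate.parked and the function names are hypothetical.

```shell
#!/bin/sh
CONFDIR="${CONFDIR:-/etc/logrotate.d}"
PARKDIR="${PARKDIR:-/root/logrotate.parked}"   # hypothetical location

# Move every logrotate config out of CONFDIR into the parking directory.
park_all() {
    mkdir -p "$PARKDIR" || return 1
    for f in "$CONFDIR"/*; do
        if [ -f "$f" ]; then
            mv "$f" "$PARKDIR"/
        fi
    done
}

# Put one parked config back, e.g. "restore_one syslog".
restore_one() {
    mv "$PARKDIR/$1" "$CONFDIR"/
}
```

For example, run `park_all` once, then `restore_one <name>` for one config per night until the hang returns; whatever is still parked stays disabled. To see what a config would do without rotating anything, `logrotate -d /etc/logrotate.d/squid` runs it in debug mode, which makes no changes.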

Any suggestions welcome.

Last edited by Avatar; 03-07-2007 at 03:05 PM.
 
03-07-2007, 04:49 PM   #11
BillyGalbreath
Member
 
Registered: Nov 2005
Location: Houston Texas
Distribution: Debian Sid
Posts: 379

Rep: Reputation: 31
Try upgrading to the newest 2.4 kernel again. If no dice, try a 2.6 kernel. If still no dice, disable DMA. If that fails too, just don't run that script.
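If it comes to disabling DMA, `hdparm -d0` turns it off for one IDE disk at runtime, and `hdparm -d DEVICE` reports the current setting. A sketch, assuming the affected disk is /dev/hda on the OSB4 controller (check this before running) and that hdparm is installed; `set_dma` and the `DRY_RUN` switch are hypothetical conveniences. Expect disk throughput to drop sharply with DMA off.

```shell
#!/bin/sh
# Turn DMA on or off for one IDE disk. With DRY_RUN=1, only print the
# hdparm command that would be run.
set_dma() {
    dev="${1:-/dev/hda}"            # assumption: the disk on the OSB4 controller
    case "$2" in
        on)  flag=1 ;;
        off) flag=0 ;;
        *)   echo "usage: set_dma DEVICE on|off" >&2; return 2 ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "hdparm -d$flag $dev"
    else
        hdparm -d"$flag" "$dev"     # needs root
    fi
}
```

Usage: `DRY_RUN=1 set_dma /dev/hda off` to preview, then `set_dma /dev/hda off` as root. The setting does not survive a reboot, so if it stops the lockups it would need to be reapplied from a boot script such as rc.local.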
 
03-08-2007, 01:52 AM   #12
TigerOC
Senior Member
 
Registered: Jan 2003
Location: Devon, UK
Distribution: Debian Etc/kernel 2.6.18-4K7
Posts: 2,380

Rep: Reputation: 49
I would suggest installing a 2.6 kernel. Mandrake should have a package available for download, which would be easy to install. I am very surprised that you have not already had corruption, either from the hard reboots or from the kernel bug. At least you know the cause, and it should be fairly easy to correct. If all else fails, install a new drive and dd the contents over.
 
03-09-2007, 12:56 PM   #13
Avatar
Member
 
Registered: May 2001
Location: Canada
Distribution: old ones
Posts: 532

Original Poster
Rep: Reputation: 30
Thanks for the replies! I just wanted to confirm that it is that script: I moved it out of the logrotate.d directory, and there have been no more lockups in 2 days!

I am installing Ubuntu Edgy 6.10, which has kernel 2.6.17, on another machine that will replace this one. Hopefully I will never see that error message again!

Thanks for the help, I appreciate it.
 
  


All times are GMT -5.