[SOLVED] httpd not restarting at logrotate (seg fault or similar nasty error)

niels.horn · 12-26-2009, 04:23 AM

Hi,

Several times a month, when log rotation takes place and httpd is restarted by the /etc/logrotate.d/httpd "script", I get the following message:

Code:

error_log:[Sat Dec 26 04:44:57 2009] [notice] seg fault or similar nasty error detected in the parent process

(this one is from today...)

I searched Google and found several older articles / posts about problems with PHP etc., but this box is running Slackware 13.0 with all the security patches.

Restarting by hand (/etc/rc.d/rc.httpd restart) causes the same crash.

Since it's an "older" machine (PIII 800MHz, 1GB RAM) running several other processes, I tried the following change in /etc/rc.d/rc.httpd:

Code:

case "$1" in
...
  'restart')
    #/usr/sbin/apachectl -k restart
    echo "Restarting httpd..."
    /usr/sbin/apachectl -k stop
    killall httpd
    rm -f /var/run/httpd/*.pid
    for s in 5 4 3 2 1 ; do sleep 1; echo -n "$s "; done
    /usr/sbin/apachectl -k start
    echo "done..."
  ;;
...
esac

Basically, I swapped the "restart" for a "stop" => "Wait for five seconds" => "start" sequence.
Restarting manually does not crash httpd any longer.

I know that this will leave my httpd server off-line for a few seconds, but this is no professional box, just a private server for the family (pictures etc.) and nobody accesses it at 04:40am when log rotation takes place.
But on a 24x7 production server this would be out of the question.
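
A possible refinement of the block above, as a rough sketch only: instead of always sleeping five seconds, poll until the old parent process is really gone and start httpd right after. The helper name wait_for_exit and the 10-second timeout are made up for illustration, and the pid file name /var/run/httpd/httpd.pid is an assumption. It relies on "kill -0", which only tests whether the process exists (run it as root, as rc.httpd normally is).

Code:

```shell
# Hypothetical helper: wait until process $1 is gone, or give up after
# $2 seconds (default 10). Returns 0 once the process has exited, 1 on timeout.
wait_for_exit() {
    pid=$1
    timeout=${2:-10}
    # kill -0 sends no signal; it only checks that the process still exists
    while kill -0 "$pid" 2>/dev/null; do
        [ "$timeout" -le 0 ] && return 1
        sleep 1
        timeout=$((timeout - 1))
    done
    return 0
}

# Possible use inside the 'restart' case (pid file name is an assumption):
#   pid=$(cat /var/run/httpd/httpd.pid 2>/dev/null)
#   /usr/sbin/apachectl -k stop
#   [ -n "$pid" ] && wait_for_exit "$pid" 10
#   /usr/sbin/apachectl -k start
```

This keeps the downtime as short as the shutdown actually takes, but it is still a stop/start and not a graceful restart.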

Any thoughts on this? Anyone having the same problems?

kbp · 12-26-2009, 07:34 AM

Hi,

httpd should only be sent a SIGHUP by logrotate, not a restart. Could you please compare your logrotate postrotate command to this:

Code:

/bin/kill -HUP `cat /var/run/httpd/httpd.pid 2>/dev/null` 2> /dev/null || true

<edit>
By the way, this is an excerpt from my standard httpd init script:

Code:

# When stopping httpd a delay of >10 second is required before SIGKILLing the
# httpd parent; this gives enough time for the httpd parent to SIGKILL any
# errant children.
stop() {
	echo -n $"Stopping $prog: "
	killproc -p ${pidfile} -d 10 $httpd
	RETVAL=$?
	echo
	[ $RETVAL = 0 ] && rm -f ${lockfile} ${pidfile}
}

</edit>

cheers

niels.horn · 12-26-2009, 07:49 AM

@kbp: I *think* that's how the upstream logrotate file does it (I've seen this on non-Slackware systems). Slackware's logrotate file instead calls "rc.httpd restart" (a shell script), which calls "apachectl -k restart" (also a shell script), which in turn runs "httpd -k restart".

This should be equivalent to the "kill -HUP" approach, but I tried it anyway and guess what: kill -HUP causes the same "seg fault or similar nasty error" message.

Well, thanks for the suggestion anyway...

GooseYArd · 12-26-2009, 08:12 AM

If it dies on a SIGHUP, my guess is that it chokes while parsing httpd.conf. Are you using the stock httpd.conf, or have you added some extra LoadModule directives? If you've modified your httpd.conf, maybe post a diff, or the whole file.

BTW, is it producing a core file? That would simplify debugging.

.andy

niels.horn · 12-26-2009, 08:30 AM

@GooseYArd: It dies on a SIGHUP, but it stops and starts without problems if I wait a few seconds... If I run the stop & start commands immediately one after the other, httpd also crashes with that same error.

My httpd.conf is definitely different, but nothing extraordinary. This box runs test installations for ntop, nagios, cacti, zabbix, zarafa & some sites with pictures, files to download, etc. - all in PHP.
Nothing of "production" value, that's why I can use the 5-second interval at night, but this would be unacceptable on a "real" server.

The differences between a stock httpd.conf and mine:
- "Listen" to a non-standard port on one of the interfaces
- "ServerAdmin" with a real e-mail address
- "Options" Indexes disabled, left FollowSymLinks enabled
- "DirectoryIndex" index.php added
- "Include /etc/httpd/mod_php.conf" enabled
- One "Include" command for nagios (defining authentication etc. for the nagios directories)

If it is producing a core file, where would I find it?

GooseYArd · 12-26-2009, 08:44 AM

Oh, weird. I can think of a few random things that might cause that kind of behavior.

If it dumps core, the file will be in the working directory of the parent process. The best way I know to find that is to go into /proc/<parent pid> while it's running and look at the symlink called "cwd"; it points to the directory the process is currently in. Get httpd to die and check for a file called "core" there. If you find one, set it aside and we can look at it later.
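
The /proc lookup can be sketched roughly like this (Linux only). The helper names proc_cwd and proc_core_limit are made up for illustration, and the pid file path in the usage comment is an assumption. The second helper reads /proc/<pid>/limits, which is only present on reasonably recent 2.6 kernels.

Code:

```shell
# Hypothetical helpers, Linux /proc only.

# Print the working directory of process $1; with the kernel's default
# core_pattern this is where a "core" file would be written.
proc_cwd() {
    readlink "/proc/$1/cwd"
}

# Print the core file size limit that applies to process $1.
proc_core_limit() {
    grep 'Max core file size' "/proc/$1/limits"
}

# Example, assuming the parent's pid is stored in /var/run/httpd/httpd.pid:
#   pid=$(cat /var/run/httpd/httpd.pid)
#   proc_cwd "$pid"
#   proc_core_limit "$pid"
```

If the soft limit shown there is 0, no core file will be written no matter which directory the process is in, so that is worth checking before hunting for the file.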

I guess the next thing I'd do is try disabling the modules one at a time. Start with mod_php: disable it, try the HUP/restart, and see if that's the module causing it to choke. Once we know whether one of those modules is the culprit, we can try to find out why.
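
To get the list of candidates to disable, something like the following might help. It is only a sketch: the helper name list_module_lines is made up, and the config path in the usage comment is an assumption, so adjust it to your layout.

Code:

```shell
# Hypothetical helper: list the active (uncommented) LoadModule and
# Include lines of config file $1, prefixed with their line numbers.
list_module_lines() {
    grep -niE '^[[:space:]]*(LoadModule|Include)[[:space:]]' "$1"
}

# Example (path is an assumption):
#   list_module_lines /etc/httpd/httpd.conf
```

Comment out one of the listed lines at a time, send the HUP, and check error_log after each attempt.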

niels.horn · 12-26-2009, 09:18 AM

There is no core dump... (cwd=/)
Disabling php stops httpd from crashing, so the problem seems to be there.

I checked my php.ini and these are the changes compared to a "stock" version:
- "short_open_tag = On" (needed by Zarafa)
- "max_execution_time = 300" (needed by Zabbix)
- "extension=mapi.so" (needed by Zarafa)
- "date.timezone = America/Sao_Paulo"

The only really different thing is the mapi.so extension. So I disabled it and did some tests and indeed, httpd stops segfaulting without it.
Now I know what the cause is... It probably takes a bit longer to stop and causes the segfault if httpd receives the SIGHUP.

I'll start pestering upstream about this....

Thanks for helping me out!

niels.horn · 12-26-2009, 09:39 AM

UPDATE

I checked the Zarafa forum (now that I knew that the problem lies in mapi.so) and found this post.

They give a "workaround" that worked for me: enabling ssl (disabled by default in Slackware).
With mod_ssl enabled, httpd stopped crashing.

I'll mark this thread as "solved".

Thanks for all the help!

GooseYArd · 12-26-2009, 09:56 AM

Aha, I think I can at least give you a hint about where to look. We saw this same issue a few years back, and the cause was that we had a mod_php that was statically linked with OpenSSL, whereas Apache's mod_ssl was using the system's shared OpenSSL libs. The crash occurred at initialization time, and we traced it down to an OpenSSL symbol that was duplicated in one of the Apache modules. It makes sense here, since MAPI is almost certainly linking with libssl.

I'd start out by running ldd against mod_php.so, mapi.so, and mod_ssl.so. If one of those is missing a dynamic link against libssl.so/libcrypto.so, one of them is probably statically linked with libssl, and just needs to be rebuilt to use the shared libs. The other less likely possibility is that you have a second set of shared openssl libs around someplace, and one of those modules is using the wrong set.
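
Roughly, the ldd check could look like this. It is a sketch only: the helper names ssl_deps and report_ssl_deps are made up, and the module paths are left as placeholders because they differ between systems.

Code:

```shell
# Hypothetical helper: keep only the OpenSSL lines from ldd output on stdin.
ssl_deps() {
    grep -E 'lib(ssl|crypto)\.so'
}

# Hypothetical helper: for each file given, show which shared
# libssl/libcrypto it links against, or say that none was found.
report_ssl_deps() {
    for m in "$@"; do
        echo "== $m"
        ldd "$m" 2>/dev/null | ssl_deps || echo "   (no shared libssl/libcrypto found)"
    done
}

# Example (replace the placeholders with the real module locations):
#   report_ssl_deps /path/to/mod_php.so /path/to/mapi.so /path/to/mod_ssl.so
```

A module that uses OpenSSL but shows no libssl/libcrypto line is the one to suspect of static linking; two modules pointing at different library paths would suggest the second possibility.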

Good luck!

niels.horn · 12-26-2009, 10:50 AM

Well, I haven't found any missing libs... I ran ldd against the modules, but everything seems to be fine.

Also did not find duplicate ssl libs on this system.

Well, enabling mod_ssl stopped the crashes and I posted in the thread on the Zarafa forum, where they say it is "hard to solve"...
I'll keep this on my radar for now.

GooseYArd · 12-26-2009, 10:56 AM

welp, good luck with the hunt!

niels.horn · 12-26-2009, 10:59 AM

Thanks... I think I'll need it.

I'll post updates here if / when I find out more.

rpedrica · 04-03-2011, 12:44 AM

Hi Niels, I've just run into this same issue, but on CentOS rather than Slackware, and this time it's a 3rd-party module that kills httpd when logs are rotated with 'service httpd reload'. I'm already running SSL, so that wouldn't be my problem. Changing 'reload' to 'restart' in the logrotate httpd script sorts the issue out.