CentOS 7 fully updated

rylan76 · 10-02-2017, 09:12 AM

Hi All

On my fully updated CentOS 7 box, systemd goes to 100% CPU and stops responding to any commands, usually after about three weeks of uptime.

For example, running

Code:

systemctl status mysqld

times out after about two minutes with "cannot connect to systemd: timeout" at the Bash prompt.

Trying

Code:

systemctl reboot

or

Code:

shutdown -r now

gives the same timeout error.

There is no recourse other than physically powering off the machine, an uncontrolled shutdown that is obviously dangerous for MySQL etc.

In this state certain network services also die; notably, SSH logins are very slow and sometimes time out completely, which is why the machine HAS to be restarted.
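
To confirm that it really is PID 1 (systemd) that is spinning, rather than one of the services it started, something like the following should work from the console, assuming a shell can still be obtained there. Note that ps reports CPU usage averaged over the whole lifetime of the process, so top gives the better picture of what is happening right now.

Code:

# Live view of PID 1 (systemd) only; press q to quit
top -p 1

# One-shot snapshot: PID, lifetime-average CPU %, elapsed time since start, command line
ps -o pid,pcpu,etime,cmd -p 1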

Has anybody else seen CentOS 7's systemd go to 100% CPU, leaving no way to shut down or reboot the system short of pulling the power plug out of the wall socket?

Thx!

bimboleum · 10-02-2017, 12:40 PM

Hi,
Try "reboot -fn". This is a brute-force reboot, so it may work when other efforts don't.

I can't give you any advice on systemd; I only use CentOS as a reference for other projects at work.

cheers
pete

pete hilton

saruman@ruvolo-hilton.org

bimboleum · 10-02-2017, 01:00 PM

Hi,
Further reading indicates that reboot is a symlink to /bin/systemctl, and that

Code:

/bin/systemctl --force --force reboot

will brute-force a reboot without terminating any processes or unmounting any file systems. With a bit of luck, this will save you from having to pull the power plug.
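
For reference, the systemctl(8) man page describes two levels of --force for reboot. Below is a sketch of escalating through them, run as root. As far as I know, with --force given twice systemctl issues the reboot system call itself instead of asking PID 1, which is why it may still work when systemd is stuck; the single --force form still goes through systemd and may time out like the other commands. In "reboot -fn", the -n means --no-sync (the disks are not synced first), so running sync by hand beforehand is a sensible precaution.

Code:

# 1. Flush dirty file-system buffers to disk first (this can itself hang if I/O is stuck)
sync

# 2. Skip the orderly stop of services; all processes are still killed and file
#    systems unmounted or remounted read-only. Still asks systemd, so it may time out.
systemctl --force reboot

# 3. Last resort: reboot immediately without terminating processes or unmounting
#    file systems. Data loss is possible (e.g. for MySQL).
systemctl --force --force reboot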

As always, YMMV.

cheers
pete


MadeInGermany · 10-02-2017, 04:55 PM

Any frequent error messages?

Code:

journalctl -f
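
If the journal is busy, filtering by priority and counting repeated messages can make "frequent" errors stand out. A sketch using standard journalctl options (-b limits output to the current boot, -p to a priority and anything more severe, and -o cat prints only the message text); the cut-off of 20 lines is arbitrary.

Code:

# Follow only messages of priority "err" and worse
journalctl -f -p err

# The 20 most frequent warning-or-worse messages in the current boot, with counts
journalctl -b -p warning -o cat | sort | uniq -c | sort -rn | head -n 20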

syg00 · 10-02-2017, 06:59 PM

I would look for messages just before the problem starts. CentOS should have persistent journals, so they can be checked at any time. In particular, check for stack traces indicating a recovery loop.
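
To narrow things down to the window just before the problem, journalctl can restrict output by time, and -k restricts it to kernel messages (implying the current boot), which is where kernel stack traces ("Call Trace:") end up. The timestamps below are hypothetical placeholders; substitute the period leading up to your hang.

Code:

# Everything logged in a (hypothetical) two-hour window before the hang
journalctl --since "2017-10-02 07:00" --until "2017-10-02 09:00"

# Kernel messages from the current boot, searched for stack traces with some context
journalctl -k | grep -B 5 -A 20 "Call Trace"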

rylan76 · 10-03-2017, 02:34 AM

Thanks for the pointers, guys; I'll see what I can find.

Running

Code:

journalctl -f

doesn't show any frequent errors at the moment, but I hard powered off the system yesterday, as mentioned. The problem always occurs after three or four weeks of uptime.

As an aside, I'm already running

Code:

/usr/bin/systemctl daemon-reexec

every ten minutes from a crontab.
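
The exact entry wasn't posted, but in root's crontab (edited with crontab -e as root) such a job would presumably look something like the following. In a file under /etc/cron.d, a user field (e.g. root) would go between the schedule and the command.

Code:

# min   hour  day  month  weekday  command
*/10    *     *    *      *        /usr/bin/systemctl daemon-reexec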

Thanks a lot for the responses!

Stefan

syg00 · 10-03-2017, 02:45 AM

After a hard power-off, run "journalctl -b -1" from a terminal; that's the journal for the previous boot. <Shift>-g gets you to the end of it (the viewer is just less). Start paging up and see what happened.
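
Two shortcuts that may help here, once the journal is persistent: --list-boots shows which earlier boots are stored, and -e (--pager-end) opens the pager already positioned at the end, so the <Shift>-g step isn't needed.

Code:

# List the boots recorded in the journal (0 is the current boot, -1 the previous one)
journalctl --list-boots

# Previous boot, starting at the end of the log
journalctl -b -1 -e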

MadeInGermany · 10-04-2017, 06:29 AM

Do you clean /tmp or /var/tmp?
Be careful: systemd stores files in them!
The systemd-tmpfiles-clean service is safe: it has the appropriate excludes in /usr/lib/tmpfiles.d/tmp.conf
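
To see which paths the stock configuration excludes, you can list its x and X lines; in tmpfiles.d(5) syntax those types mean "ignore this path during cleaning". The exact entries depend on the systemd version, and local overrides may live in /etc/tmpfiles.d/.

Code:

# Show the clean-up exclusions in the stock tmp.conf
grep -E '^[xX]' /usr/lib/tmpfiles.d/tmp.conf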

rylan76 · 10-06-2017, 01:41 AM

Quote:

Originally Posted by MadeInGermany

Do you clean /tmp or /var/tmp?
Be careful: systemd stores files in them!
The systemd-tmpfiles-clean service is safe: it has the appropriate excludes in /usr/lib/tmpfiles.d/tmp.conf

Hi

Thanks for the reply.

No, I'm not cleaning /tmp or /var/tmp, nor have I written my own scripts that touch them...

Regards

Stefan

rylan76 · 10-06-2017, 01:42 AM

Quote:

Originally Posted by syg00

After a hard power-off, run "journalctl -b -1" from a terminal; that's the journal for the previous boot. <Shift>-g gets you to the end of it (the viewer is just less). Start paging up and see what happened.

Hi syg00

Thanks for the reply!

I ran the above but got

Code:

# journalctl -b -1
Specifying boot ID has no effect, no persistent journal was found
#

I guess I'll have to figure out how to make the journal persistent first...
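
One common way to do that, assuming the default Storage=auto in /etc/systemd/journald.conf (under which journald only writes to disk if /var/log/journal exists), is sketched below; setting Storage=persistent in that file is the alternative. Run as root.

Code:

# With Storage=auto, the existence of this directory is what enables persistent logs
mkdir -p /var/log/journal
systemctl restart systemd-journald

# After the next crash and reboot, the previous boot should then be readable
journalctl --list-boots
journalctl -b -1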

Regards

Stefan

syg00 · 10-06-2017, 03:25 AM

That can only mean the logs are copied over to /var/log to be consistent with how it was done historically. Check there.
I don't have a CentOS to check at present.
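
On a stock CentOS 7 install, rsyslog picks up the journal's messages and writes most of them to /var/log/messages (readable by root only). Assuming that default configuration, the messages from before the hard power-off should sit just above the start-up messages of the current boot, which typically begin with the kernel's "Linux version" line. Older entries are usually in rotated files such as /var/log/messages-<date>.

Code:

# Open the log at its end (+G is passed to less as a startup command), then page up
less +G /var/log/messages

# Or just the systemd-related lines, most recent last
grep -i systemd /var/log/messages | tail -n 50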