Extremely high load average around 03:30 (AM) each night
They tell me nothing of interest... I temporarily removed the logging to try other things (and didn't want it affecting the load average), but I'll put it back now and send you a list tomorrow!
This was the uptime output from tonight:
...
03:33:08 up 205 days, 7:20, 0 users, load average: 0.05, 0.08, 0.09
03:33:09 up 205 days, 7:20, 0 users, load average: 0.05, 0.08, 0.09
03:33:10 up 205 days, 7:20, 0 users, load average: 0.05, 0.08, 0.09
03:33:11 up 205 days, 7:20, 0 users, load average: 0.05, 0.08, 0.09
03:33:12 up 205 days, 7:20, 0 users, load average: 93.58, 19.47, 6.36
03:33:13 up 205 days, 7:20, 0 users, load average: 93.58, 19.47, 6.36
03:33:14 up 205 days, 7:20, 0 users, load average: 93.58, 19.47, 6.36
03:33:15 up 205 days, 7:20, 0 users, load average: 93.58, 19.47, 6.36
03:33:16 up 205 days, 7:20, 0 users, load average: 93.58, 19.47, 6.36
...
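A rough back-of-the-envelope reading of the jump above, assuming the standard Linux load-average calculation. Roughly every 5 seconds the kernel updates each average as new = old * e + n * (1 - e), where n is the number of tasks (individual threads, not just processes) that are runnable or in uninterruptible sleep, and e is a fixed-point decay factor (1884/2048, 2014/2048 and 2037/2048 for the 1-, 5- and 15-minute averages). Solving for n, under the assumption that exactly one update happened between 03:33:11 and 03:33:12, and ignoring that uptime rounds to two decimals, gives the size of that single sample. The helper name estimate_tasks is hypothetical:

```shell
#!/bin/sh
# Estimate n, the number of runnable + uninterruptible tasks the kernel
# counted in the one sample that moved a load average from OLD to NEW.
# Assumes exactly one 5-second update between the two uptime readings.
estimate_tasks() {
    old=$1 new=$2 window=$3   # window: 1, 5 or 15 (minutes)
    awk -v old="$old" -v new="$new" -v w="$window" 'BEGIN {
        # Kernel decay factors, fixed point with 2048 == 1.0
        if (w == 1)      e = 1884 / 2048
        else if (w == 5) e = 2014 / 2048
        else             e = 2037 / 2048
        # new = old * e + n * (1 - e)  =>  n = (new - old * e) / (1 - e)
        printf "%.0f\n", (new - old * e) / (1 - e)
    }'
}

estimate_tasks 0.05 93.58 1    # prints 1168
estimate_tasks 0.08 19.47 5    # prints 1168
estimate_tasks 0.09 6.36 15    # prints 1167
```

All three averages point to roughly 1,170 tasks counted in one sample, which fits a single short burst rather than sustained load. It is only an estimate: the kernel's own fixed-point rounding and the two-decimal display shift the result slightly.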
You can run ps too, to list the processes and compare them before and after.
I've done so, many times. I even have a script that starts automatically as soon as the load average gets high and spits out a bunch of ps, top, iotop and iostat output... Nothing looks out of the ordinary. It seems to be an extreme spike for a very short period of time... But, yeah, it _didn't_ happen for the last two days... Don't ask me why... =)
No. There already is some standard process accounting installed by default, so I can run a lot of these commands and get output. But I only see accumulated information, like the load average, not which processes are involved. I think this is a little weird, since I can see in my top output that Linux spawns a bunch of process-gathering processes (Python scripts and various other stuff) when it believes the load average is too high. This is done by default; I haven't installed or set it up. But for the life of me I can't figure out which command to run, or where to look, to actually see the detailed process information gathered by these utilities, though I think it should be there somewhere. Otherwise I don't really see why the accounting processes would spawn.
I'll try to look through your link and see if that pushes me in the right direction. This is a production server, so I'd rather not install a bunch of new stuff on it, or restart the machine (heavens no!)...
PS. Fun fact: the link doesn't work in Chrome (I can't scroll the page, and there are JavaScript errors, probably caused by the cookie dialog), but it works in Firefox.
Then check your Chrome settings. It works with mine.
Display everything in a single pidstat run:
Code:
pidstat -urdwhl 1 400
-h  combine all reports into one line per process
-l  show the long command name (with arguments)
If this is a container and you don't see anything relevant, run it on the container's host.
If you still don't see any high per-process values on the host, include the kernel tasks:
Code:
pidstat -p ALL -urdwhl 2 400
Last edited by MadeInGermany; 11-08-2023 at 04:30 AM.
Hm. I don't know whether the load average counts threads (LWPs) or processes. I also don't know how pidstat handles threads (try -t -v). It is probably just a single multithreaded application.
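Since the spike happens at a predictable time, one option is to start the capture shortly before it instead of waiting for a trigger. A minimal sketch, assuming a cron daemon that reads /etc/cron.d; the file name, start time, duration and log path are hypothetical choices. It reuses the pidstat options suggested earlier plus -v (one of the options suggested above), which adds per-task threads and fd-nr columns; -t could be added for per-thread rows:

```shell
# /etc/cron.d/pidstat-spike  (hypothetical file name; cron.d lines include a user field)
# Start a 10-minute capture at 1-second intervals at 03:25, before the spike.
# % is special in crontab entries, so it is escaped as \%.
25 3 * * * root pidstat -urdvwhl 1 600 > /var/tmp/pidstat-$(date +\%F).log 2>&1
```

Remove the file once the culprit is found, so the nightly logs don't pile up.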
I don't see where restarting was suggested, but a server which cannot go down for maintenance is a disaster waiting to happen.
Anything important enough that it must stay online is important enough to have sufficient redundancy such that any single server can be swapped out of a pool for a while without causing issues.
Okay, this is my fault. You said earlier you were already getting summary statistics from process accounting, so I assumed you had turned it on. Now I'm pretty sure you are unfamiliar with process accounting and never turned it on. It is very common for a Linux distro to ship with process accounting support but not actually start it at boot time. The distro I run, Slackware, checks at boot time whether the log file exists and, if it does, starts process accounting. If the log file doesn't exist, it doesn't bother.
You need the accton command. This tells the kernel to start (or stop) writing every process termination to a file. For usage, check man 8 accton.
Before running accton, /var/log/pacct should be created if it doesn't exist. For security reasons, /var/log/pacct should not be world readable. You can touch /var/log/pacct and then chmod 640 /var/log/pacct.
The startup command is typically: accton /var/log/pacct. And when you're done collecting data, use accton off to stop process accounting so you don't fill up your drive later.
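The steps above can be collected into a few shell functions. This is a sketch to be run as root; the function names are hypothetical, and PACCT defaults to the Slackware location given above (other distributions, including RHEL's psacct package, may use a different default path, so check first). lastcomm and sa are the GNU acct reporting tools that read the file accton writes:

```shell
#!/bin/sh
# Hypothetical wrappers for the process accounting steps (run as root).
# PACCT path is an assumption; override it if your distro uses another file.
PACCT=${PACCT:-/var/log/pacct}

acct_start() {
    # Create the log if missing and keep it out of reach of ordinary users
    [ -e "$PACCT" ] || touch "$PACCT"
    chmod 640 "$PACCT"
    # Tell the kernel to append a record for every process that exits
    accton "$PACCT"
}

acct_stop() {
    # Stop accounting so the file cannot keep growing and fill the disk
    accton off
}

acct_report() {
    # lastcomm: one line per finished process recorded in the file
    lastcomm -f "$PACCT" | head -n 50
    # sa: per-command summary of the same file
    sa "$PACCT" | head -n 20
}
```

For example, run acct_start in the evening, acct_stop the next morning, and then acct_report (or lastcomm on its own) to see which commands ran around 03:30.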
Perfect! Thank you so much! Will look into this on Monday!
Sorry if I confused you. What _is_ installed is some kind of monitoring program, since I see it waking up when shit happens. But it's most probably not the process accounting you describe above; instead it's some sort of process-information-gathering utility. It was enabled by the default installation (RHEL clone).
There's some process that creates an extreme number of threads.
So, as I already suspected, there isn't really high CPU usage. Instead, around 3k threads start up, probably stepping on each other's toes, which spikes the load average. I'm now going to try a ps command that shows the number of threads per process.
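One way to get the number of threads per process. With procps ps, the nlwp output field is the thread count, so sorting on it works directly (shown in the first comment below). The sketch reads the same figure from the Threads: line of /proc/PID/status instead, so it needs nothing beyond the shell and awk; top_threads is a hypothetical helper name:

```shell
#!/bin/sh
# Print "threads pid name" for the processes with the most threads.
# Equivalent procps one-liner: ps -eo pid,nlwp,comm --sort=-nlwp | head
top_threads() {
    count=${1:-10}
    for dir in /proc/[0-9]*; do
        pid=${dir#/proc/}
        # Processes can exit while we loop; ignore status files that vanished
        awk -v pid="$pid" '
            /^Name:/    { sub(/^Name:[ \t]*/, ""); name = $0 }
            /^Threads:/ { print $2, pid, name }
        ' "$dir/status" 2>/dev/null
    done | sort -rn | head -n "$count"
}

top_threads 10
```

Because the burst is short, a one-off run only helps if it lands inside the spike; running it every second (for example from a loop started just before 03:30) raises the chance of catching the process.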
So far I've been looking for high CPU usage and/or high (blocking) I/O, and for dead processes/zombies, but haven't found anything yet.
Too bad accton only logs processes, not thread creation.
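Process accounting won't show the thread burst, but the kernel's task counters will. The fourth field of /proc/loadavg is runnable/total, where total is the number of scheduling entities (processes and threads) currently on the system, so sampling it once a second around 03:30 should show the jump as it happens. A minimal sketch; sample_tasks is a hypothetical helper name:

```shell
#!/bin/sh
# Log the kernel's task counters SAMPLES times, INTERVAL seconds apart.
# /proc/loadavg fields: load1 load5 load15 runnable/total last_pid
sample_tasks() {
    samples=${1:-60} interval=${2:-1}
    i=0
    while [ "$i" -lt "$samples" ]; do
        read -r load1 load5 load15 tasks lastpid < /proc/loadavg
        # Split "runnable/total" into its two numbers
        runnable=${tasks%/*} total=${tasks#*/}
        printf '%s load1=%s runnable=%s total=%s\n' \
            "$(date +%H:%M:%S)" "$load1" "$runnable" "$total"
        i=$((i + 1))
        # Skip the sleep after the last sample
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, starting sample_tasks 1200 1 > /var/tmp/tasks.log at 03:25 covers about twenty minutes; a sudden rise in total marks the moment the threads are created, which can then be matched against other logs.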