LinuxQuestions.org
Old 11-14-2023, 11:29 AM   #46
metaed
Member
 
Registered: Apr 2022
Location: US
Distribution: Slackware64 15.0
Posts: 366

Rep: Reputation: 171

Quote:
Originally Posted by MadeInGermany View Post
the main pid should have the sum of the threads
Maybe that's kernel-dependent. The Linux 5.15 kernel in Slackware 15.0 does not log a sum of threads to the process accounting file, if that's what you meant.

According to man 5 acct, kernels up to and including 2.6.9 wrote a separate accounting record for each NPTL thread. Since 2.6.10, a single record is written for the whole process when its last thread exits. Which is a shame.
 
Old 11-15-2023, 03:14 AM   #47
elgholm
LQ Newbie
 
Registered: Oct 2023
Posts: 24

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by pan64 View Post
see post #38, pidstat has a -t flag
Oh, yes, I use that: pidstat -p ALL -turdwhl 1 300
I just don't see anything that weird in the output. But there are a LOT of processes and threads, so it's hard to tell.
I need some way to flush out what is actually happening...
For some reason my pidstat log stopped after 3 min and 15 sec tonight, which is weird since it should've gone for 5 minutes.
And the extremely high load avg of course happened after that.

03:33:15 up 213 days, 7:20, 0 users, load average: 1.36, 0.81, 0.49
03:33:16 up 213 days, 7:20, 0 users, load average: 22.23, 5.15, 1.89
03:33:17 up 213 days, 7:20, 0 users, load average: 22.23, 5.15, 1.89
...
03:33:56 up 213 days, 7:21, 0 users, load average: 64.38, 15.51, 5.38
...
03:34:00 up 213 days, 7:21, 0 users, load average: 64.38, 15.51, 5.38
03:34:01 up 213 days, 7:21, 0 users, load average: 243.73, 53.51, 17.72
03:34:02 up 213 days, 7:21, 0 users, load average: 243.73, 53.51, 17.72
03:34:03 up 213 days, 7:21, 0 users, load average: 243.73, 53.51, 17.72
...
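The jumps in the log above can be turned into a rough task count. This is a minimal sketch, assuming the kernel samples the number of runnable plus uninterruptible tasks about every 5 seconds and folds it into each average with the usual fixed-point decay factors (1884, 2014 and 2037 out of 2048). The function name `implied_active_tasks` is made up for illustration, and the two-decimal rounding in the uptime output limits the precision.

```python
# Fixed-point decay factors for the 1-, 5- and 15-minute load averages,
# assumed from include/linux/sched/loadavg.h (FIXED_1 = 2048).
FIXED_1 = 2048
DECAY = {"1min": 1884 / FIXED_1, "5min": 2014 / FIXED_1, "15min": 2037 / FIXED_1}


def implied_active_tasks(before, after, window):
    """Invert one update step: after = before * e + n * (1 - e), solved for n."""
    e = DECAY[window]
    return (after - before * e) / (1 - e)


if __name__ == "__main__":
    # Consecutive readings from the log above (03:33:15 -> 03:33:16).
    old = {"1min": 1.36, "5min": 0.81, "15min": 0.49}
    new = {"1min": 22.23, "5min": 5.15, "15min": 1.89}
    for window in DECAY:
        n = implied_active_tasks(old[window], new[window], window)
        print(f"{window}: about {n:.0f} runnable or uninterruptible tasks")
```

If all three windows give a similar figure, that would point to one burst of tasks caught at a single sampling instant rather than sustained load. The same function can be applied to the 03:34:00 -> 03:34:01 readings.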

The file my accton writes just yields "ERROR: unknown acct file format" when I try to read it with the sa command or dump-acct.
 
Old 11-22-2023, 04:24 AM   #48
elgholm
LQ Newbie
 
Registered: Oct 2023
Posts: 24

Original Poster
Rep: Reputation: 0
I'm giving up on this... It's a shame!

There's something spawning around 3k threads each night, which pushes the load average through the roof - and makes monitoring the server for true problems much harder.
03:34:20 up 220 days, 7:21, 0 users, load average: 0.95, 0.37, 0.19
03:34:21 up 220 days, 7:21, 0 users, load average: 66.38, 13.94, 4.58

I've tried dumping pidstat* output with thread information, and also ps** output focused on threads, but I can't manage to catch the culprit while it's doing its thing. I dumped the information every second to try to catch it in the act, but no luck. =(
I just can't find out which process is causing this. =(

I'm thinking of installing a new server, and moving our production environment over to that one, and sunsetting this one.
It's a shame, since this one was supposed to be the new one.
(And moving our largest production environment is not plug'n'play...)

Unfortunately we can't have a server doing stuff in the middle of the night that we don't know about or understand.
So far so good, but this could easily create issues down the road.

* pidstat -p ALL -turdwhl 1 600
** ps axo nlwp,pid,cmd | sort -rn | head -10
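
If a one-second interval is too coarse to catch short-lived threads, the same per-process thread counts can be read straight from /proc at a shorter interval. This is only a sketch: it assumes the `Name` and `Threads` fields of /proc/<pid>/status, and the function names and the 0.2 s default interval are arbitrary choices for illustration.

```python
import os
import time


def parse_threads(status_text):
    """Return (name, thread_count) from the text of /proc/<pid>/status."""
    name, threads = "?", 0
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key == "Name":
            name = value.strip()
        elif key == "Threads":
            threads = int(value)
    return name, threads


def top_thread_counts(proc="/proc", limit=10):
    """List (threads, pid, name) for the processes with the most threads."""
    rows = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "status")) as fh:
                name, threads = parse_threads(fh.read())
        except OSError:
            continue  # the process exited between listdir() and open()
        rows.append((threads, int(entry), name))
    return sorted(rows, reverse=True)[:limit]


def watch(samples=3000, interval=0.2):
    """Print /proc/loadavg and the top thread counts once per interval."""
    for _ in range(samples):
        with open("/proc/loadavg") as fh:
            print(time.strftime("%H:%M:%S"), fh.read().strip())
        for threads, pid, name in top_thread_counts():
            print(f"  {threads:6d} {pid:7d} {name}")
        time.sleep(interval)
```

Calling `watch()` from cron shortly before the nightly spike and redirecting the output to a file would give a log to compare against the load average. The fourth field of /proc/loadavg (running/total scheduling entities) is printed on each sample as well.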

We're running kernel 5.4.17 on a RHEL clone. So much for "enterprise" kernels...
I'd love to go back to Debian, but since we're running Oracle RDBMS that's not supported.

Thank you all for trying to point me in the right direction, and I've learned some new flags for the standard Linux utilities which will help me in the future! <3

/Charlie
 
Old 11-22-2023, 04:46 AM   #49
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,925

Rep: Reputation: 7320
If you don't know what it is, you can't avoid it on your next system either. It probably won't happen again, but it probably belongs to your production environment.
Anyway, I don't know why it is that important at all.
 
Old 11-22-2023, 02:12 PM   #50
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,806

Rep: Reputation: 1207
Quote:
For some reason my pidstat log stopped after 3 min and 15 sec tonight, which is weird since it should've gone for 5 minutes.
Perhaps this is related!?

What happens at night?
Does the system time jump backward? This would cause all threads to be overdue, and load would go high.
Is it a virtual system? Then examine the host.
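
One way to check the time-jump question is to log the wall clock against the monotonic clock. This is a minimal sketch, assuming a stepped wall clock is what is meant. The function names and the 0.5 s threshold are illustrative, and a guest paused by its host may not show up this way.

```python
import time


def clock_step(prev_wall, prev_mono, wall, mono):
    """Seconds the wall clock moved relative to the monotonic clock."""
    return (wall - prev_wall) - (mono - prev_mono)


def watch_clock(samples=600, interval=1.0, threshold=0.5):
    """Print a line whenever the wall clock is stepped by more than threshold."""
    prev_wall, prev_mono = time.time(), time.monotonic()
    for _ in range(samples):
        time.sleep(interval)
        wall, mono = time.time(), time.monotonic()
        step = clock_step(prev_wall, prev_mono, wall, mono)
        if abs(step) > threshold:
            stamp = time.strftime("%H:%M:%S")
            print(f"{stamp} wall clock stepped by {step:+.3f} s")
        prev_wall, prev_mono = wall, mono
```

A negative step means the wall clock went backward between two samples. Running `watch_clock()` across the nightly window would show whether that lines up with the load spike.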
 
  

