Slackware: This forum is for the discussion of Slackware Linux.
Is it possible to limit number of /sbin/sh instances???
I was recently forced to do a power-button reboot on my server. Basically, I couldn't ssh in with any username other than root, and I noticed that when I ran ps -A there were hundreds of "sh" entries in there.
I also noticed my messages log showed multiple failed password attempts for root and the other usual suspects.
So basically my machine had several hundred processes running and CPU usage was through the roof, so I'm wondering whether there is a way to limit the number of sh instances that can be opened, which might prevent this going forward.
If you limit the number of sh processes you'd still have the same problem, because at some point you wouldn't be able to log in, as your login opens a shell.
It sounds to me almost as if someone is exiting the system improperly and leaving sh processes running. You can kill those with a "kill -1 <pid>", but I'd try to track down the owners of the processes and find out how they're exiting the system. My guess is they're just closing windows or turning off workstations.
I think you might be interested in the "ulimit" section of "man bash", maybe the -u option.
Quote:
ulimit [-HSTabcdefilmnpqrstuvx [limit]]
Provides control over the resources available to the shell and to processes started by it, on systems that allow such control. The -H and -S options specify that the hard or soft limit is set for the given resource. A hard limit cannot be increased by a non-root user once it is set; a soft limit may be increased up to the value of the hard limit. If neither -H nor -S is specified, both the soft and hard limits are set. The value of limit can be a number in the unit specified for the resource or one of the special values hard, soft, or unlimited, which stand for the current hard limit, the current soft limit, and no limit, respectively. If limit is omitted, the current value of the soft limit of the resource is printed, unless the -H option is given.
When more than one resource is specified, the limit name and unit are printed before the value. Other options are interpreted as follows:
-a All current limits are reported
-b The maximum socket buffer size
-c The maximum size of core files created
-d The maximum size of a process's data segment
-e The maximum scheduling priority ("nice")
-f The maximum size of files written by the shell and its children
-i The maximum number of pending signals
-l The maximum size that may be locked into memory
-m The maximum resident set size (many systems do not honor this limit)
-n The maximum number of open file descriptors (most systems do not allow this value to be set)
-p The pipe size in 512-byte blocks (this may not be set)
-q The maximum number of bytes in POSIX message queues
-r The maximum real-time scheduling priority
-s The maximum stack size
-t The maximum amount of cpu time in seconds
-u The maximum number of processes available to a single user
-v The maximum amount of virtual memory available to the shell and, on some systems, to its children
-x The maximum number of file locks
-T The maximum number of threads
and you really should have a look also at "man initscript" (reference).
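As a rough sketch of the "ulimit -u" suggestion, the function below lowers the per-user process limit inside a subshell, so only the command started there is affected. The name run_limited, the value 200 and the script path in the comment are illustrative assumptions, not settings taken from this thread.

```shell
#!/bin/bash
# Run a command with a capped per-user process count (ulimit -u).
# The limit is set inside a subshell, so the calling shell keeps its own limits.
# NOTE: run_limited is a hypothetical helper name.
run_limited() {
    local max_procs="$1"
    shift
    (
        # With neither -H nor -S given, bash sets both the soft and hard limit.
        ulimit -u "$max_procs" || exit 1
        "$@"
    )
}

# Possible use (the script path is hypothetical):
#   run_limited 200 /usr/local/bin/adsl-check.sh
```

Note that on Linux this limit is generally not enforced for root, so it only helps if the runaway job runs under an unprivileged account.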
A few times now, I've had to reboot my server as a result of hundreds of /bin/sh processes spawning. I'm not entirely sure of the cause yet, but regardless, I'm having problems killing these processes.
Using kill -9 PID or killall sh doesn't seem to remove any of them.
Now I'm assuming I can't kill the init process, so without actually rebooting the machine, are there any other options open to me until I find out why so many are spawning?
I can't change to runlevel 1 or run init 6; I'm forced to do a power off/on to get the server back to normal.
@OP: I've merged your "Is it possible to limit number of /sbin/sh instances???" thread with this recent one as it is the same topic. Also note that you never responded to replies in that thread. If you did you might have solved or mitigated the problem over a month ago. Next to that it shouldn't just be one way traffic and any usable replies should warrant a response from you.
Sorry, I couldn't find that post actually; thanks for the heads up.
Update:
I had a cron script that ran every 5 minutes. I'm now thinking this may have been the cause of the problem, and if so, I just won't bother using it. I was using it to log ADSL connection drops.
With regards to checking the logs, all logs after my logrotate were empty, which shouldn't normally be the case, I know.
Boring bit: I didn't think that limiting the number of sh instances, as was posted above, was actually what I was looking for, which might have contributed to me not replying to the original post.
My cry for help now is that when this happened again, I was unable to kill the offending processes, and that's why I posted again.
Sorry for any hassle and thanks for all interest and posts.
Quote:
I also noticed my messages log showed multiple failed password attempts for root and the other usual suspects.
Are you not worried about this? It looks to me as if someone was trying to hack into your box. Perhaps consider using a program like 'denyhosts' or 'fail2ban'.
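To get a feel for how heavy the password guessing is, something along these lines could count failed attempts per source address. It is only a sketch: it assumes OpenSSH's usual "Failed password for ... from ADDR port N" log wording, and the function name failed_by_source and the log path in the comment are assumptions.

```shell
#!/bin/sh
# Count failed SSH password attempts per source address.
# Reads syslog-style lines on stdin and prints "COUNT ADDRESS", busiest first.
# NOTE: failed_by_source is a hypothetical helper name.
failed_by_source() {
    awk '/Failed password for/ {
             # Take the word after the last "from" on the line.
             for (i = NF - 1; i >= 1; i--)
                 if ($i == "from") { count[$(i + 1)]++; break }
         }
         END { for (addr in count) print count[addr], addr }' | sort -rn
}

# Possible use (the log file location is an assumption and may differ):
#   failed_by_source < /var/log/messages | head
```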
Quote:
Are you not worried about this? It looks to me as if someone was trying to hack into your box. Perhaps consider using a program like 'denyhosts' or 'fail2ban'.
If you look at his process table you'll see he already does.
Quote:
I had a cron script that ran every 5 minutes. I'm now thinking this may have been the cause of the problem
If that is the case then deleting the cron job should show the load go down after the processes die or get killed or after the box is rebooted. And there's no need to "think": MensaWater is right about tracking down the UID of the processes first for analysis. If you can't do that manually then run Atop, collectl or dstat to gather system stats automagically.
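A minimal sketch of the manual route: group the "sh" processes by owner and parent PID, so whatever keeps spawning them (crond, for instance) stands out. The function name summarise_sh is made up for illustration.

```shell
#!/bin/sh
# Summarise "sh" processes by owner and parent PID.
# Input: lines of "USER PID PPID STAT COMMAND", as printed by
#   ps -e -o user= -o pid= -o ppid= -o stat= -o comm=
# NOTE: summarise_sh is a hypothetical helper name.
summarise_sh() {
    awk '$5 == "sh" { n[$1 " ppid=" $3]++ }
         END { for (k in n) print n[k], k }' | sort -rn
}

# Possible use on a live system:
#   ps -e -o user= -o pid= -o ppid= -o stat= -o comm= | summarise_sh | head
# Processes whose STAT starts with "D" are in uninterruptible sleep:
#   ps -e -o user= -o pid= -o ppid= -o stat= -o comm= | awk '$4 ~ /^D/'
```

The STAT column is worth a look too: processes in state "D" (uninterruptible sleep) do not react to kill -9 until whatever they are blocked on completes, which is one possible reason the kills appeared to do nothing.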
Quote:
Originally Posted by plisken
With regards to checking the logs, all logs after my logrotate were empty, which shouldn't normally be the case, I know.
That is odd. Has this happened before?
Do you run a standard logrotate configuration?
Are all logs empty including all rotated ones?
Does your syslog, cron or any other daemon log show any anomalies around the time of the log rotation?
Are there any login (attempts) during or prior to this?
Quote:
Originally Posted by plisken
I was unable to kill the offending processes
See Ponce's advice: if the processes ran as root then see if the cron job can be run from an unprivileged account and apply a process limit.
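Alongside a process limit, a cron job that fires every 5 minutes can be guarded so a new run is skipped while an earlier one is still active. This is a complementary technique, not the process limit itself, and the sketch below rests on assumptions: run_once, the account name and every path shown are hypothetical.

```shell
#!/bin/sh
# Run a command only if no earlier run still holds the lock, so a slow
# job started by cron every few minutes cannot pile up.
# mkdir either creates the directory or fails, which makes it a simple lock.
# NOTE: run_once is a hypothetical helper name.
run_once() {
    lockdir="$1"
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 1
    fi
    "$@"
    status=$?
    rmdir "$lockdir"
    return "$status"
}

# Possible crontab entry for an unprivileged account (paths are hypothetical):
#   */5 * * * * /bin/sh /home/monitor/adsl-wrapper.sh
# where the wrapper defines run_once and then calls, for example:
#   run_once /tmp/adsl-check.lock /home/monitor/adsl-check.sh
```

If the job is killed before it reaches rmdir, the lock directory is left behind and has to be removed by hand.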
A few remarks if I may in random order:
Quote:
Originally Posted by plisken
Code:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 403 0.0 0.2 3212 1052 ? S May14 0:33 /usr/sbin/sshd
root 29247 0.0 0.3 5892 1840 ? S 10:32 0:00 sshd: root@pts/1
root 29266 0.0 0.2 2304 1316 pts/1 S 10:32 0:00 -bash
root 29596 0.0 0.1 2948 1004 pts/1 R 10:37 0:00 ps -ux
root 29597 0.0 0.1 2764 872 pts/1 S 10:37 0:00 mail ***.*******@*******.com
- 'sshd' doesn't show the "[priv]" tag on your login process ID 29247, and I don't know if that's due to 0) your distro's (which one?) implementation of 'ps' (unlikely), 1) ps output doctored by you (please confirm), 2) your distro's implementation of 'sshd' (unlikely), or 3) you running OpenSSH without privilege separation. In case of the latter, please verify your binaries' integrity and correct the configuration, as it shouldn't be configured to run without it.
- you seem to be logging in as root user. That is not a security best practice regardless of any seemingly mitigating arguments. Do use an unprivileged user account with pubkey auth to log in with.
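A quick way to see what the SSH daemon is configured to allow might look like this. It is a sketch: sshd_setting is a made-up helper, it ignores Match blocks and compiled-in defaults, and the config path in the comment is the usual location rather than one confirmed in this thread.

```shell
#!/bin/sh
# Print the first uncommented value of an sshd_config keyword read from stdin.
# sshd uses the first value it finds for each keyword, and keywords are
# case-insensitive, so the comparison is done in lower case.
# NOTE: sshd_setting is a hypothetical helper name.
sshd_setting() {
    awk -v key="$1" '
        tolower($1) == tolower(key) { print $2; found = 1; exit }
        END { if (!found) print "unset" }'
}

# Possible use (path assumed):
#   sshd_setting PermitRootLogin < /etc/ssh/sshd_config
#   sshd_setting PubkeyAuthentication < /etc/ssh/sshd_config
#   sshd_setting UsePrivilegeSeparation < /etc/ssh/sshd_config   # older OpenSSH only
```

"unset" means the built-in default applies, and that default differs between OpenSSH versions.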
Quote:
Originally Posted by plisken
Code:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 581 0.0 0.3 9184 1968 ? S May14 0:14 /usr/bin/perl /usr/local/webmin/miniserv.pl /etc/webmin/miniserv.conf
Please ensure your Webmin installation is current, access is restricted to "known good" IP (ranges?) and preferably over SSL.
Quote:
Originally Posted by plisken
Code:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 761 0.0 0.0 1368 416 tty1 S May14 0:00 [agetty]
root 762 0.0 0.0 1368 416 tty2 S May14 0:00
root 763 0.0 0.0 1368 416 tty3 S May14 0:00
root 764 0.0 0.0 1368 416 tty4 S May14 0:00 :?? @q??) 0u?? p???
root 765 0.0 0.0 1368 416 tty5 S May14 0:00 ? ???
root 766 0.0 0.0 1368 416 tty6 S May14 0:00
I don't know what to make of this but it seems odd.
Quote:
Originally Posted by plisken
Code:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 415 0.0 0.2 4836 1380 ? S May14 0:39 sendmail: rejecting connections on daemon MSA: load average: 266
root 6517 0.0 0.4 5772 2520 ? S May19 0:00 sendmail: ./q4JHR1EF006516 from queue
root 6520 0.0 0.4 5772 2516 ? S May19 0:00 sendmail: ./q4JHTTEF006519 from queue
root 6526 0.0 0.4 5776 2516 ? S May19 0:00 sendmail: ./q4JHUeEF006525 from queue
root 6531 0.0 0.4 5776 2516 ? S May19 0:00 sendmail: ./q4JHXLEF006530 from queue
root 6541 0.0 0.4 5776 2516 ? S May19 0:00 sendmail: ./q4JHcBEF006540 from queue
root 6554 0.0 0.4 5776 2516 ? S May19 0:00 sendmail: ./q4JHiYEF006553 from queue
root 6560 0.0 0.4 5776 2512 ? S May19 0:00 sendmail: ./q4JHk9EF006559 from queue
root 6599 0.0 0.4 5776 2516 ? S May19 0:00 sendmail: ./q4JHoWEF006598 from queue
root 6602 0.0 0.4 5776 2516 ? S May19 0:00 sendmail: ./q4JHovEF006601 from queue
root 6608 0.0 0.4 5776 2524 ? S May19 0:00 sendmail: ./q4JHshEF006607 from queue
root 6614 0.0 0.4 5776 2512 ? S May19 0:00 sendmail: ./q4JHw1EF006613 from queue
root 6627 0.0 0.4 5780 2532 ? S May19 0:00 sendmail: ./q4JI2ZEF006626 from queue
root 6637 0.0 0.4 5776 2516 ? S May19 0:00 sendmail: ./q4JI6HEF006635 from queue
root 6640 0.0 0.4 5772 2508 ? S May19 0:00 sendmail: ./q4JI6YEF006639 from queue
root 6644 0.0 0.4 5776 2520 ? S May19 0:00 sendmail: ./q4JI74EF006643 from queue
root 6659 0.0 0.4 5776 2524 ? S May19 0:00 sendmail: ./q4JICQEF006658 from queue
root 6662 0.0 0.4 5776 2520 ? S May19 0:00 sendmail: ./q4JICfEF006661 from queue
root 6665 0.0 0.4 5776 2516 ? S May19 0:00 sendmail: ./q4JICrEF006664 from queue
root 6672 0.0 0.4 5776 2504 ? S May19 0:00 sendmail: ./q4JIDSEF006671 from queue
Apart from a load of 266 being ludicrous please check the mail spool for clues why these messages aren't sent.
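The "rejecting connections ... load average: 266" line suggests sendmail is refusing work because the load is above its configured ceiling. A small check like the following could be used to watch for that; it is a sketch only, the function name load_above is made up, and the threshold of 12 is just an example value.

```shell
#!/bin/sh
# Succeed if the 1-minute load average (first field of a /proc/loadavg
# style line on stdin) is above the given threshold.
# NOTE: load_above is a hypothetical helper name.
load_above() {
    awk -v limit="$1" 'NR == 1 { high = ($1 > limit) }
                       END { exit !high }'
}

# Possible use on Linux:
#   if load_above 12 < /proc/loadavg; then
#       echo "load is high; sendmail may be refusing connections"
#   fi
# The queued messages themselves can be listed with sendmail's mailq command.
```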
Quote:
If that is the case then deleting the cron job should show the load go down after the processes die or get killed or after the box is rebooted.
I thought killing crond would have done this, but it, along with certain other processes, wouldn't die; but aye, rebooting returns things to normal.
Quote:
That is odd. Has this happened before?
Do you run a standard logrotate configuration?
Are all logs empty including all rotated ones?
Does your syslog, cron or any other daemon log show any anomalies around the time of the log rotation?
Are there any login (attempts) during or prior to this?
Never noticed it before, my logrotate is pretty much standard, a few extra entries in there for things but nothing has been changed in this for years, with the exception of adding my fail2ban entry.
From what I remember, only the secure/messages/maillog entries were empty, rotated ones were populated as expected.
Additionally, I've killed webmin for the time being.
I could only log in as root; all other users simply hung after password entry from the console.
As always all comments are appreciated and I'm looking into the other points mentioned and when/if this happens again, I'll try and better gather the information to answer the questions I've been asked but as yet not been able to answer.