LinuxQuestions.org


shib4u 07-16-2012 09:25 PM

Linux - File descriptors exhausted, how to recover
 
I am logged in as "root" and after a while my CentOS box runs out of file descriptors. How do I find out who (user/process) is polluting my server and how to get the server back without rebooting? I am on a bash shell.

TB0ne 07-17-2012 08:58 AM

Quote:

Originally Posted by shib4u (Post 4730076)
I am logged in as "root" and after a while my CentOS box runs out of file descriptors. How do I find out who (user/process) is polluting my server and how to get the server back without rebooting? I am on a bash shell.

You don't say what version of CentOS you're running, how many users, what's running on the server, the environment it's in, etc., so there's no way we can even guess as to what's causing it.

A brief Google search turns up lots about increasing your file descriptors:
https://www.centos.org/modules/newbb...&viewmode=flat
http://prefetch.net/blog/index.php/2...linux-servers/

but that only addresses the initial problem, and doesn't identify the root cause. Unless you provide details, we can't help.

shib4u 07-25-2012 04:54 PM

CentOS 5.5. 64-bit. 4-5 users. It is a test box and guys just run test code. I use a bash shell.

So one of the guys runs a script which creates a lot of files and keeps them open so that the system runs out of file-descriptors. I am on another shell, still logged in as root. How can I recover the system?

chrism01 07-25-2012 07:23 PM

If he's opening too many files (the default limits are pretty high, so this shouldn't be a problem unless he's doing something odd or wrong), then he needs to close them. Alternatively, you can kill some of his processes, which will have the same effect.

Don't mess with the system settings; that's addressing the symptom(s), not the root cause.

shib4u 07-26-2012 10:15 AM

That is what my question is... how? All the file descriptors are exhausted by the system. You cannot run any commands (kill, lsof, ls, ps, cat, etc.) except the inbuilt shell commands. I first need to find out the offending user, the processes and then take action... since I can't run most of the commands, how to go about it?

chrism01 07-26-2012 07:39 PM

Taking this quote literally as accurate
Quote:

All the file descriptors are exhausted by the system. You cannot run any commands (kill, lsof, ls, ps, cat, etc.) except the inbuilt shell commands
then, quite honestly, I believe reboot is the only answer in this case.
HOWEVER, you might want to wait to see if one of the Mods has a better answer.

I can't think of a built-in cmd that would do what you need without using a file descriptor.
Why don't you ask your users? Maybe the offender knows a way to signal his/her procs to die.

kauuttt 07-26-2012 09:59 PM

Quote:

How do I find out who (user/process) is polluting my server and how to get the server back without rebooting?
Please check the /proc/<PID>/fd directory.
If a process is not closing its file descriptors properly, you'll see a hell of a lot of fds in that directory!
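
A minimal sketch of that check using only shell builtins (a glob instead of ls, shell arithmetic instead of wc), on the assumption that external commands cannot be started. The function names count_fds and list_fd_counts and the FD_COUNT variable are made up for illustration. Note that expanding the glob still makes the shell open the directory, so this may fail as well if descriptors are truly exhausted system-wide.

```shell
#!/bin/bash
# Count the entries in /proc/<pid>/fd without running ls or wc.
# The result is left in the (hypothetical) variable FD_COUNT so that no
# command substitution, and therefore no fork or pipe, is needed.
count_fds() {
    local entry
    FD_COUNT=0
    for entry in /proc/"$1"/fd/*; do
        # Each open descriptor appears as a symlink; an unmatched glob
        # stays a literal string, fails this test and is skipped.
        [ -L "$entry" ] || continue
        FD_COUNT=$((FD_COUNT + 1))
    done
}

# Print "<count> <pid>" for every process directory under /proc.
list_fd_counts() {
    local dir
    for dir in /proc/[0-9]*; do
        [ -d "$dir" ] || continue
        count_fds "${dir#/proc/}"
        echo "$FD_COUNT ${dir#/proc/}"
    done
}
```

Running list_fd_counts as root and looking for the largest count should point at the process that is holding the descriptors.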

TB0ne 07-27-2012 04:28 PM

Quote:

Originally Posted by kauuttt (Post 4739057)
Please check the /proc/<PID>/fd directory.
If a process is not closing its file descriptors properly, you'll see a hell of a lot of fds in that directory!

Did you read what the OP posted???
Quote:

Originally Posted by shib4u
You cannot run any commands (kill, lsof, ls, ps, cat, etc.) except the inbuilt shell commands

...so, if they can't run an ls or cat....HOW will they check that directory???

Quote:

Originally Posted by shib4u
So one of the guys runs a script which creates a lot of files and keeps them open so that the system runs out of file-descriptors. I am on another shell, still logged in as root. How can I recover the system?

You don't. You reboot the system, and go to the guy who wrote that script, and tell them to not run it again, or learn how to program correctly. To solve ANY problem, you need to identify the root cause. You have...now fix it.

sharadchhetri 07-27-2012 05:08 PM

Quote:

Originally Posted by TB0ne (Post 4739668)
Did you read what the OP posted???

...so, if they can't run an ls or cat....HOW will they check that directory???


You don't. You reboot the system, and go to the guy who wrote that script, and tell them to not run it again, or learn how to program correctly. To solve ANY problem, you need to identify the root cause. You have...now fix it.

I was stuck in the same situation.
Let us know whether you are able to run the df -i command. You may find that inode usage is at 100% on the suspect partition.

I used the find command to remove files older than 20 days.
Quote:

find /path/to/files* -mtime +20 -exec rm {} \;
If you know the file extension or the leading part of the file name, you can modify the command accordingly.
On my system the extension was .txt, so I used that in the command. (Beware: the extension must be unique to the files you want removed; otherwise don't take the risk, and match on the leading part of the file name instead.)

For example:
Quote:

find /path/to/test* -mtime +20 -exec rm {} \;

PTrenholme 07-27-2012 06:59 PM

If you have busybox on your system, you could try using it as your shell. That might let you run (some) commands without needing additional descriptors.

You might find this tuning tip interesting.

chrism01 07-29-2012 05:18 PM

@sharadchhetri: inodes != file descriptors..
He basically can't run any cmds because all the useful ones require one or more file descriptors to be opened...
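
To make the distinction concrete: inode usage is what df -i reports per filesystem, whereas the kernel's system-wide count of open file handles is exposed in /proc/sys/fs/file-nr. A rough sketch that reads it with the read builtin follows. The function and variable names are hypothetical, and the redirection itself needs one free descriptor, so treat it as an illustration of where to look rather than a guaranteed way in.

```shell
#!/bin/bash
# /proc/sys/fs/file-nr holds three numbers: allocated file handles,
# allocated-but-unused handles, and the system-wide maximum (fs.file-max).
read_file_nr() {
    read -r FILE_ALLOC FILE_UNUSED FILE_MAX < /proc/sys/fs/file-nr || return 1
    echo "allocated file handles: $FILE_ALLOC, maximum: $FILE_MAX"
}
```

When the first number is close to the last, the system-wide table is the bottleneck, regardless of what df -i shows.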

sharadchhetri 07-29-2012 10:53 PM

Quote:

Originally Posted by chrism01 (Post 4740823)
@sharadchhetri: inodes != file descriptors..
He basically can't run any cmds because all the useful ones require one or more file descriptors to be opened...



"So one of the guys runs a script which creates a lot of files and keeps them open so that the system runs out of file-descriptors. I am on another shell, still logged in as root. How can I recover the system?"

Shibu, is it only in that directory that you can't run the ls command, or in every directory?

shib4u 07-30-2012 03:32 AM

@sharadchhetri - No, 'ls' won't run.

rknichols 07-30-2012 09:08 AM

The only commands that have any chance of running are the shell builtins, and any of those that need to access a file or directory won't work either. If the PID of the offending process were known, you could use the kill command (a shell builtin) to terminate the process, but I don't know of any way to find that PID without being able to access /proc, and you'd need a file descriptor to do that.
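
For reference, a sketch of what that would look like once a PID is known. Both functions stick to builtins (read, kill). Their names and the PID_UID variable are invented for this example, and the redirection that reads /proc/<pid>/status still needs a free descriptor, as noted above.

```shell
#!/bin/bash
# Find the real UID that owns a process by parsing /proc/<pid>/status
# with the read builtin. The result is stored in PID_UID.
pid_uid() {
    local key real rest
    PID_UID=
    while read -r key real rest; do
        if [ "$key" = "Uid:" ]; then
            PID_UID=$real
            break
        fi
    done < "/proc/$1/status"
}

# Ask a process to exit. In bash, "type kill" reports that kill is a shell
# builtin, so no new program has to be started to send the signal.
stop_pid() {
    kill -TERM "$1"
}
```

If the process ignores SIGTERM, kill -KILL with the same PID is the next step.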

PTrenholme 07-30-2012 06:25 PM

Perhaps the OP's problem is as simple as changing the ulimit -n value for his shell (if his distribution permits it). From the manual page:
Code:

      ulimit [-HSTabcdefilmnpqrstuvx [limit]]
              Provides  control  over the resources available to the shell and
              to processes started by it, on systems that allow such  control.
              The -H and -S options specify that the hard or soft limit is set
              for the given resource.  A hard limit cannot be increased  by  a
              non-root  user  once it is set; a soft limit may be increased up
              to the value of the hard limit.  If neither -H nor -S is  speci‐
              fied, both the soft and hard limits are set.  The value of limit
              can be a number in the unit specified for the resource or one of
              the special values hard, soft, or unlimited, which stand for the
              current hard limit,  the  current  soft  limit,  and  no  limit,
              respectively.  If  limit  is  omitted, the current value of the
              soft limit of the resource is printed, unless the -H  option  is
              given.  When more than one resource is specified, the limit name
              and unit are printed before the value.  Other options are inter‐
              preted as follows:
              -a    All current limits are reported
              -b    The maximum socket buffer size
              -c    The maximum size of core files created
              -d    The maximum size of a process's data segment
              -e    The maximum scheduling priority ("nice")
              -f    The  maximum  size  of files written by the shell and its
                    children
              -i    The maximum number of pending signals
              -l    The maximum size that may be locked into memory
              -m    The maximum resident set size (many systems do not  honor
                    this limit)
              -n    The maximum number of open file descriptors (most systems
                    do not allow this value to be set)

Note that it's implied that this is a "per-process" limit, so a simple Ctrl-Alt-F2 to start a new tty session might be all that's needed. (Provided that the OP is not in an X session with VTSwitch turned off, which is the default setting in newer Xorg releases.)
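
A quick way to check this from the affected shell with the ulimit builtin described above. This is only a sketch: the function names are hypothetical, and raising the limit helps only if the shortage is the per-process limit rather than the system-wide one.

```shell
#!/bin/bash
# Show the per-process descriptor limits of the current shell.
show_nofile_limits() {
    echo "soft limit: $(ulimit -S -n)"
    echo "hard limit: $(ulimit -H -n)"
}

# Raise the soft limit to the current hard limit. A non-root user may do
# this; raising the hard limit itself is reserved for root.
raise_soft_nofile() {
    ulimit -S -n "$(ulimit -H -n)"
}
```

The new soft limit applies to this shell and to anything it starts afterwards; processes that are already running keep their own limits.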


All times are GMT -5.