LinuxQuestions.org - kill -9 not working

https://www.linuxquestions.org/questions/linux-newbie-8/kill-9-not-working-712488/

mdarwin

03-18-2009 07:59 AM

Hi guys,

I'm a newbie to Linux (but not POSIX), and I have the following problem:

There seem to be 32 instances of the "df" command running, some of which have been hanging for nearly a month (the count below includes the grep itself).
[root@ussd-apps2 root]# ps -ef |grep " df" |wc -l
33

I tried killing them all:
for proc in `ps -ef |grep " df" |grep -v grep |awk '{print $2}'`; do echo "killing ${proc}"; kill ${proc}; done

But it simply didn't work. No error message, and the processes are all still there.
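
As a side note on the loop above: the grep -v grep step is only there to keep grep out of its own results. A minimal sketch of an alternative, assuming the usual ps -ef layout where the command name is the 8th column, is to match on that column directly. The function name df_pids is made up for illustration.

```shell
# Hypothetical helper: read `ps -ef` output on stdin and print the PID
# (column 2) of every line whose command (column 8) is exactly "df".
# Assumes the standard ps -ef columns: UID PID PPID C STIME TTY TIME CMD.
df_pids() {
    awk '$8 == "df" { print $2 }'
}

# Usage (sends the default SIGTERM, like the loop above):
#   for proc in $(ps -ef | df_pids); do echo "killing ${proc}"; kill "${proc}"; done
```

This only tidies up the selection of PIDs; it does not make a process any easier to kill.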

I tried killing them individually:
root 14800 14397 0 14:49 pts/11 00:00:00 df -kh
root 14896 14397 0 15:10 pts/11 00:00:00 grep df
[root@ussd-apps2 root]# kill -9 14800
[root@ussd-apps2 root]# ps -ef |grep 14800
root 14800 14397 0 14:49 pts/11 00:00:00 df -kh
root 14898 14397 0 15:11 pts/11 00:00:00 grep 14800

I'm probably just being thick - I'm sure it's something simple I'm missing.

Can anyone help?

More info in case you need it:

[root@ussd-apps2 root]# uname -a
Linux ussd-apps2 2.4.21-9.ELsmp #1 SMP Thu Jan 8 17:08:56 EST 2004 i686 i686 i386 GNU/Linux

Matt

kpraveen455

03-18-2009 08:45 AM

Hi,

I don't understand why you are using such a lengthy procedure to kill your applications.

You can use the 'pkill' command to kill all your 'df' processes:

pkill -9 <application name>

Example: pkill -9 df

Or else kill all the processes with the "killall" command (see its man page for more):
killall -s <signal name> <process name>

Example: killall -s KILL df

mdarwin

03-18-2009 09:07 AM

Hi there kpraveen,

Thanks for your help.

Neither of the commands seems to work:

[root@ussd-apps2 root]# pkill -9 df
[root@ussd-apps2 root]# killall -s KILL df
[root@ussd-apps2 root]# ps -ef |grep df |head
root 18413 18311 0 Feb23 ? 00:00:00 df -kh
root 18803 18802 0 Feb24 ? 00:00:00 df -h
cvs 19647 1 0 Feb24 ? 00:00:00 df -kh
root 20611 20610 0 Feb25 ? 00:00:00 df -h
root 21246 21245 0 Feb26 ? 00:00:00 df -h
root 21969 21968 0 Feb27 ? 00:00:00 df -h

Any thoughts?

Matt

openSauce

03-18-2009 09:27 AM

Could they be zombie processes?

http://www.linux-mag.com/id/5707

Quote:

Even an “uncatchable” signal may not terminate a process. For instance, a zombie is the remnant of a process that’s waiting to exit. (A zombie has Z in the ps STAT column.) A process that’s being traced can also be unkillable.

Unfortunately that article doesn't tell you what you can do about zombies, if anything.
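
One way to check is to read the state letter directly. The sketch below assumes a Linux /proc filesystem with a State: line in /proc/<pid>/status; proc_state and explain_state are illustrative names, not standard tools.

```shell
# Hypothetical helper: print the one-letter state of a PID (e.g. S, D, Z).
# Assumes /proc/<pid>/status contains a line such as "State:  S (sleeping)".
proc_state() {
    awk '/^State:/ { print $2; exit }' "/proc/$1/status" 2>/dev/null
}

# Hypothetical helper: give a short description of a state letter.
explain_state() {
    case "$1" in
        D)   echo "uninterruptible sleep: blocked in the kernel, signals stay pending" ;;
        Z)   echo "zombie: already exited, waiting for its parent to reap it" ;;
        T|t) echo "stopped (job control) or being traced" ;;
        R|S) echo "running or in an ordinary, interruptible sleep" ;;
        *)   echo "state not covered by this sketch" ;;
    esac
}

# Usage: explain_state "$(proc_state 14800)"
```

The same letter appears in the STAT column of ps, so either source should tell a zombie apart from other kinds of stuck process.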

bitpicker

03-18-2009 09:40 AM

root 18413 18311 0 Feb23 ? 00:00:00 df -kh
root 18803 18802 0 Feb24 ? 00:00:00 df -h
cvs 19647 1 0 Feb24 ? 00:00:00 df -kh
root 20611 20610 0 Feb25 ? 00:00:00 df -h
root 21246 21245 0 Feb26 ? 00:00:00 df -h
root 21969 21968 0 Feb27 ? 00:00:00 df -h

You should find out what those parent processes are (like 21968 in the case of the last one). These processes are not waiting for their spawned df processes.

It may help to visualize what is going on if you install htop and use it to display the processes as a tree. It refreshes regularly, so you can see which processes are spawning all those instances of df. Normally df finishes too quickly for you even to catch its process ID; it just prints its results and exits. I suppose there is a badly written script or something which spawns these zombie processes.

Robin

mdarwin

03-18-2009 09:46 AM

Ok so now I have a bit more info:

Code:

[root@ussd-apps2 root]# ps -axl |grep " df" |head
F  UID  PID  PPID PRI  NI  VSZ  RSS WCHAN  STAT TTY        TIME COMMAND
0    0 18413 18311  15  0  3580  508 end    D    ?          0:00 df -kh
0    0 18803 18802  25  0  3772  456 end    D    ?          0:00 df -h
0  506 19647    1  15  0  3580  508 end    D    ?          0:00 df -kh
0    0 20611 20610  25  0  3792  456 end    D    ?          0:00 df -h
0    0 21246 21245  25  0  3780  456 end    D    ?          0:00 df -h
0    0 21969 21968  25  0  3772  456 end    D    ?          0:00 df -h
0    0 25745 25744  25  0  3796  456 end    D    ?          0:00 df -h
0    0 26112 26111  25  0  3772  456 end    D    ?          0:00 df -h
0    0  1036  1035  25  0  3784  456 end    D    ?          0:00 df -h
0    0  3502  3501  25  0  3784  456 end    D    ?          0:00 df -h

From the same site (http://www.slackbook.org/html/process-control-ps.html)

Quote:

D stands for a process that has entered an uninterruptible sleep. Often, these processes refuse to die even when passed a SIGKILL.

Ok so what now?

I was thinking that it may have something to do with nfs mounts. When I run the df command I do get some output - it hangs when it gets to the nfs mount.

[root@ussd-apps2 root]# df -kh &
[1] 15292
[root@ussd-apps2 root]# Filesystem Size Used Avail Use% Mounted on
/dev/cciss/c0d0p2 99G 64G 31G 68% /
/dev/cciss/c0d0p1 97M 15M 77M 17% /boot
none 881M 0 881M 0% /dev/shm

So maybe I could try turning off the mountd or something?
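
To check the NFS theory without running a full df, one option is to list the NFS mounts first. This is a minimal sketch, assuming the kernel's mount table is readable at /proc/mounts with device, mount point and type in the first three columns; nfs_mounts is a hypothetical helper name.

```shell
# Hypothetical helper: print "device mountpoint" for every NFS mount.
# Optional argument: a file in /proc/mounts format (defaults to /proc/mounts).
nfs_mounts() {
    # Match types such as "nfs" and "nfs4", but not pseudo-filesystems like "nfsd".
    awk '$3 ~ /^nfs[0-9]*$/ { print $1, $2 }' "${1:-/proc/mounts}"
}

# Usage: nfs_mounts
# GNU df also has -l (--local) to limit the listing to local filesystems.
```

Reading the mount table does not touch the NFS server, so it should return even when the mount itself is unresponsive.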

mdarwin

03-18-2009 09:49 AM

Hi bitpicker,

There is no script spawning these processes - it's curious people like me who want to find out what the disk usage is and run df -kh. So even if I can kill these processes I'll still create a new one every time I run df -kh. I suppose I should try and find out the reason the df command is hanging - as per my last post, I think it's something to do with nfs....

bitpicker

03-18-2009 09:56 AM

Have you started all those df instances yourself, or is something launching them automatically? All those ppids look as if they got created just to spawn a df instance. There must be a reason why df cannot read the nfs mount, so there's something wrong there.

Robin

bitpicker

03-18-2009 09:57 AM

Never mind the previous post, I wrote it while you were replying.

Robin

mdarwin

03-18-2009 10:06 AM

Ok I found the problem and fixed it:

Code:

[root@ussd-apps2 root]# grep nfs /etc/fstab
10.113.97.24:/var/SP    /nfs            nfs    rsize=8192,wsize=8192,timeo=14,intr
[root@ussd-apps2 root]# umount -f /nfs
umount2: Device or resource busy
umount: /nfs: device is busy
[root@ussd-apps2 root]#
[root@ussd-apps2 root]# ps -axl |grep " df" |wc -l
      1

Looks like the nfs mount was hanging.

I found some useful info here: http://linuxgazette.net/issue83/tag/6.html

Quote:

If the NFS server (or the network connection thereto) becomes unavailable all processes that try to access any part of that share will be set into D state. (Use intr or soft mount options on NFS to avoid all that).

Thanks for your help and note the solution for future reference!
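
Following the quote's advice, it can be worth checking which NFS entries in /etc/fstab actually carry soft or intr. The sketch below is illustrative only: check_nfs_opts is a made-up name, and it assumes the usual fstab column order (device, mount point, type, options).

```shell
# Hypothetical helper: report, for each nfs entry in an fstab-format file,
# whether its option list contains "soft" or "intr".
# Optional argument: the file to read (defaults to /etc/fstab).
check_nfs_opts() {
    awk '
        /^[ \t]*#/ { next }                 # skip comment lines
        $3 == "nfs" {
            found = 0
            n = split($4, opts, ",")        # options are comma-separated
            for (i = 1; i <= n; i++)
                if (opts[i] == "soft" || opts[i] == "intr") found = 1
            print $2 ": " (found ? "has soft or intr" : "neither soft nor intr set")
        }
    ' "${1:-/etc/fstab}"
}

# Usage: check_nfs_opts
```

The entry shown in this thread already has intr, so it would be reported as "has soft or intr". Note that soft makes requests fail with an error once the retries run out instead of blocking, and the nfs(5) man page describes intr as deprecated after kernel 2.6.25, so on newer systems the check matters mainly for soft.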

AlucardZero

03-18-2009 11:38 AM

Yep. A little more info: state "D" is uninterruptible sleep. In that state a process is blocked inside a kernel or driver call (here, a request to the NFS server) and cannot be interrupted, even by kill -9; the signal stays pending until the call returns. Fixing or removing the NFS mount, or mounting with different options, solves that, as you saw.
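
To spot processes in this state without grepping ps output, a rough sketch that scans /proc is shown below. It assumes Linux's /proc/<pid>/status layout (Name:, State:, Pid: and PPid: lines); list_dstate is a hypothetical name.

```shell
# Hypothetical helper: print "PID PPID STATE NAME" for every process that is
# currently in state D (uninterruptible sleep).
list_dstate() {
    for status in /proc/[0-9]*/status; do
        # A process may exit while we scan; errors for vanished files are hidden.
        awk '/^Name:/  { name = $2 }
             /^State:/ { state = $2 }
             /^Pid:/   { pid = $2 }
             /^PPid:/  { ppid = $2 }
             END { if (state == "D") print pid, ppid, state, name }' "$status" 2>/dev/null
    done
    return 0
}

# Usage: list_dstate
```

A process that shows up here only briefly is usually just doing ordinary I/O; the ones to worry about are those that stay listed, like the df processes in this thread.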

All times are GMT -5.