LinuxQuestions.org - High machine load when idle

Hi,

I have two Linux WS4 machines on Dell PE1950 hardware where the load average sits at 3.00 even when the machines are idle, and the number of jobs in the "D" state is 3. When I run ps aux | grep " D", I get the following output:

root 3693 0.0 0.0 3788 608 ? D 2007 0:14 df -h
root 27617 0.0 0.0 3788 612 ? D 2007 0:00 df -h
root 28433 0.0 0.0 3780 608 ? D 2007 0:00 df -h
root 23192 0.0 0.0 3796 608 ? D 2007 0:00 df -h
root 32255 0.0 0.0 3792 608 ? D Jan19 0:00 df -h
root 25120 0.0 0.0 3800 608 ? D Jan25 0:00 df -h
root 6591 0.0 0.0 1540 544 ? D Feb15 0:00 df -k
root 6980 0.0 0.0 3780 608 ? D Feb16 0:00 df -h
root 30379 0.0 0.0 1544 544 ? D Feb19 0:00 df -k
root 30390 0.0 0.0 1536 544 ? D Feb19 0:00 df -k
root 30793 0.0 0.0 3788 608 ? D Feb20 0:00 df -h

If I try to kill the processes with kill -9, nothing happens; even kill -11 doesn't work. Any clue as to what might be causing this? It seems the only thing I can do is reboot the machine. This is unexpected: as far as I know, a high load average is due to either a task using a lot of CPU time or a task stuck in uninterruptible sleep.

Any help would be greatly appreciated.
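
A note on the ps aux | grep " D" pipeline above: it matches any line that contains " D" anywhere, not only processes whose state is D. A minimal sketch that filters on the STAT column instead is below. The function names filter_d_state and list_d_state are made up for illustration, and the wchan column (the kernel function the task is sleeping in) may just show "-" on some systems.

```shell
# Keep only lines whose second field (STAT) starts with "D",
# i.e. processes in uninterruptible sleep.
# Reads "PID STAT ..." lines on stdin.
filter_d_state() {
    awk '$2 ~ /^D/'
}

# Hypothetical wrapper: ask ps for pid, state, wait channel and command
# (headers suppressed with "=") and pass the result through the filter.
list_d_state() {
    ps -eo pid=,stat=,wchan=,args= | filter_d_state
}
```

Running list_d_state as root should then list only the stuck df processes (and anything else in D state), together with where in the kernel they are waiting.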

Why do you have so many 'df' processes running? Are you calling these?

What does 'top' have to say about them?

What is your iowait percentage?
Looks like a looping script - check cron.
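
If top is not handy, a rough iowait figure can be read from the first line of /proc/stat. The sketch below is only an approximation: it gives the average since boot, not the current value, and the function name iowait_percent is made up for illustration. It assumes the usual field order on the "cpu" line (user, nice, system, idle, iowait, irq, softirq, steal).

```shell
# Print iowait as a percentage of total CPU time since boot.
# Reads /proc/stat-style input on stdin and uses the aggregate "cpu" line:
#   $1="cpu" $2=user $3=nice $4=system $5=idle $6=iowait $7=irq $8=softirq $9=steal
iowait_percent() {
    awk '$1 == "cpu" {
        total = 0
        for (i = 2; i <= 9 && i <= NF; i++) total += $i
        if (total > 0) printf "%.1f\n", 100 * $6 / total
    }'
}

# Only run against the real file where it exists (Linux).
[ -r /proc/stat ] && iowait_percent < /proc/stat
```

For a live figure, the "wa" value in the top header or in vmstat output is the better source.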

One way I have found to deal with runaway processes that cannot seem to be killed in an ordinary fashion, is doing a

Code:

gdb -p pid

(replace pid with the process id of the program), and then inside gdb giving the command

Code:

kill

Well, when I run w, I get the following:

16:01:17 up 187 days, 3:30, 1 user, load average: 3.00, 3.00, 3.00
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
adams pts/0 nyusge03 3:54pm 0.00s 0.06s 0.01s w

But then after running
ps -aux | grep D
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 2321 0.0 0.0 11636 3396 ? D 2007 1:32 /usr/sbin/snmpd -s -l /dev/null -P /var/run/snmpd -C -c /etc/snmp/snmpd.conf
root 27823 0.0 0.0 3788 612 ? D 2007 0:14 df -h
root 11305 0.0 0.0 3784 612 ? D 2007 0:00 df -h

It seems to me that snmpd is making a call out, but the call never completes. When I try to stop snmpd, it fails. I'm not sure why it would show so many df processes running, and there is no script running in the crontab. What I had to do was reboot the machine to get it back to normal. We use a monitoring service called watch tower that uses an SNMP walk against snmpd to communicate with the machine; I'm not sure whether this may be a factor in this issue.

I am having a similar issue, caused by an nfs mount that broke.
I can umount it, but I still have the high CPU load.

Well, it seems that if I do a kill RPCIOD, this resolves the high CPU load without having to reboot the machine. Still trying to figure out why this would happen, though.

Thanks, I think "kill RPCIOD" will help me.
It seems this was caused by mounting the NFS share without the "intr" or "soft" option. I got that from:
http://www.redhat.com/magazine/005ma...s/tips_tricks/
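
To see which NFS mounts are affected, one option is to scan /proc/mounts for NFS entries whose option list contains neither "soft" nor "intr". This is a sketch only: the function name hard_nfs_mounts is made up for illustration, and it assumes the kernel reports these options in /proc/mounts (newer kernels ignore "intr", so there it would only tell you about "soft").

```shell
# Print the mount point of every NFS mount that has neither "soft" nor "intr".
# Reads /proc/mounts-style lines on stdin:
#   device mountpoint fstype options dump pass
hard_nfs_mounts() {
    awk '$3 == "nfs" || $3 == "nfs4" {
        n = split($4, opts, ",")
        flagged = 1
        for (i = 1; i <= n; i++)
            if (opts[i] == "soft" || opts[i] == "intr") flagged = 0
        if (flagged) print $2
    }'
}

# Only run against the real file where it exists (Linux).
[ -r /proc/mounts ] && hard_nfs_mounts < /proc/mounts
```

Any mount point it prints is one where a process such as df can block in the "D" state if the server goes away.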