LinuxQuestions.org

grzeslaw 11-28-2017 03:28 AM

lsof: WARNING: can't stat() nfs file system
 
Hi guys,

On some servers I randomly get NFS issues. This thread is not about resolving those issues; I just need an lsof command that will not hang.

Whenever I run the following command on an affected system, it hangs. For example:
Code:

nagios@myhost:~$ sudo lsof -u appuser -b
lsof: avoiding readlink(/): -b was specified.
lsof: avoiding stat(/): -b was specified.
lsof: WARNING: can't stat() rootfs file system /
      Output information may be incomplete.
lsof: avoiding readlink(/sys): -b was specified.
lsof: avoiding stat(/sys): -b was specified.
lsof: WARNING: can't stat() sysfs file system /sys
      Output information may be incomplete.
lsof: avoiding readlink(/proc): -b was specified.
lsof: avoiding stat(/proc): -b was specified.
lsof: WARNING: can't stat() proc file system /proc
      Output information may be incomplete.
lsof: avoiding readlink(/dev): -b was specified.
lsof: avoiding stat(/dev): -b was specified.
lsof: WARNING: can't stat() devtmpfs file system /dev
      Output information may be incomplete.
lsof: avoiding readlink(/dev/pts): -b was specified.
lsof: avoiding stat(/dev/pts): -b was specified.
lsof: WARNING: can't stat() devpts file system /dev/pts
      Output information may be incomplete.
lsof: avoiding readlink(/dev/pts): -b was specified.
lsof: avoiding stat(/dev/pts): -b was specified.
lsof: WARNING: can't stat() devpts file system /dev/pts
      Output information may be incomplete.
lsof: avoiding readlink(/run): -b was specified.
lsof: avoiding stat(/run): -b was specified.
lsof: WARNING: can't stat() tmpfs file system /run
      Output information may be incomplete.
lsof: avoiding readlink(/): -b was specified.
lsof: avoiding stat(/): -b was specified.
lsof: WARNING: can't stat() ext4 file system /
      Output information may be incomplete.
lsof: avoiding readlink(/run/lock): -b was specified.
lsof: avoiding stat(/run/lock): -b was specified.
lsof: WARNING: can't stat() tmpfs file system /run/lock
      Output information may be incomplete.
lsof: avoiding readlink(/run/shm): -b was specified.
lsof: avoiding stat(/run/shm): -b was specified.
lsof: WARNING: can't stat() tmpfs file system /run/shm
      Output information may be incomplete.
lsof: avoiding readlink(/boot): -b was specified.
lsof: avoiding stat(/boot): -b was specified.
lsof: WARNING: can't stat() ext2 file system /boot
      Output information may be incomplete.
lsof: avoiding readlink(/home): -b was specified.
lsof: avoiding stat(/home): -b was specified.
lsof: WARNING: can't stat() ext4 file system /home
      Output information may be incomplete.
lsof: avoiding readlink(/tmp): -b was specified.
lsof: avoiding stat(/tmp): -b was specified.
lsof: WARNING: can't stat() ext4 file system /tmp
      Output information may be incomplete.
lsof: avoiding readlink(/usr): -b was specified.
lsof: avoiding stat(/usr): -b was specified.
lsof: WARNING: can't stat() ext4 file system /usr
      Output information may be incomplete.
lsof: avoiding readlink(/var): -b was specified.
lsof: avoiding stat(/var): -b was specified.
lsof: WARNING: can't stat() ext4 file system /var
      Output information may be incomplete.
lsof: avoiding readlink(/app-logs): -b was specified.
lsof: avoiding stat(/app-logs): -b was specified.
lsof: WARNING: can't stat() ext4 file system /app-logs
      Output information may be incomplete.
lsof: avoiding readlink(/var/log/app): -b was specified.
lsof: avoiding stat(/var/log/app): -b was specified.
lsof: WARNING: can't stat() ext4 file system /var/log/app
      Output information may be incomplete.
lsof: avoiding readlink(/opt): -b was specified.
lsof: avoiding stat(/opt): -b was specified.
lsof: WARNING: can't stat() ext4 file system /opt
      Output information may be incomplete.
lsof: avoiding readlink(/var/lib/nfs/rpc_pipefs): -b was specified.
lsof: avoiding stat(/var/lib/nfs/rpc_pipefs): -b was specified.
lsof: WARNING: can't stat() rpc_pipefs file system /var/lib/nfs/rpc_pipefs
      Output information may be incomplete.
lsof: avoiding readlink(/NFS_DATA): -b was specified.
lsof: avoiding stat(/NFS_DATA): -b was specified.
lsof: WARNING: can't stat() nfs file system /NFS_DATA
      Output information may be incomplete.

As you can see, I already use the "-b" option, which should make lsof avoid kernel functions that might block, but it looks like it hangs anyway.

I need to run this command in my monitoring script to list the files opened by a given user. On affected systems, when the NFS issue occurs, the command hangs and leaves orphaned processes behind, which increases the system load. While the issue persists we can end up with 288 orphaned processes per day, because the plugin runs every 5 minutes.
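Independent of fixing the hang itself, one way to stop hung checks from piling up is to have each run take a non-blocking lock with `flock` (util-linux) and bail out while a previous run still holds it. This is a minimal sketch, not a tested plugin: the lock path and the `run_exclusive` name are hypothetical, and exit status 3 follows the Nagios UNKNOWN convention.

```shell
#!/bin/bash
# Hypothetical lock file; any path writable by the nagios user works.
LOCK_FILE=${LOCK_FILE:-/tmp/check_user_files.lock}

# Hypothetical helper: run "$@" only if no other invocation holds the lock.
run_exclusive() {
    (
        # -n: do not wait for the lock; fail immediately if it is taken.
        flock -n 9 || { echo "UNKNOWN: previous check still running"; exit 3; }
        "$@"
    ) 9>"$LOCK_FILE"
}

# Example (user name taken from the post):
# run_exclusive sudo lsof -u appuser -b | wc -l
```

Whether the lock survives after Nagios kills a timed-out plugin depends on which processes keep descriptor 9 open (sudo, for instance, closes inherited descriptors above 2 by default), so treat this as limiting concurrent runs rather than as a guarantee of at most one stuck lsof.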

Any help would be appreciated!
Thanks,

MadeInGermany 12-02-2017 05:08 AM

I think you need to supervise lsof, and let the supervision thread kill -9 lsof after some time.
Do you have the timeout command?
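A minimal sketch of that kind of supervision in plain bash, for hosts without the timeout command: the command runs in the background and a second background job sends it kill -9 after a deadline. The function name is hypothetical. If the supervised command is started through sudo as another user, an unprivileged supervisor may not be allowed to signal it, so this assumes the supervisor and the command run as the same user.

```shell
#!/bin/bash
# Hypothetical helper: run "$@" in the background and SIGKILL it if it
# is still running after $1 seconds. Returns the command's exit status
# (137 = 128 + 9 when it had to be killed).
run_with_watchdog() {
    local secs=$1 pid watchdog rc
    shift
    "$@" &
    pid=$!
    # Supervisor: wait, then kill -9 the command if it is still alive.
    # Its output goes to /dev/null so callers capturing stdout are not held open.
    ( sleep "$secs"; kill -9 "$pid" 2>/dev/null ) >/dev/null 2>&1 &
    watchdog=$!
    wait "$pid"
    rc=$?
    # The command finished or was killed: stop the supervisor.
    kill "$watchdog" 2>/dev/null
    return "$rc"
}

# Example: run_with_watchdog 55 lsof -u appuser -b
```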

grzeslaw 12-02-2017 05:47 AM

Thanks for your comment!

Yes, I have a timeout value, but it hangs forever anyway. Another problem is that I run the script as the nagios user, so I can't kill the process after some time because of user permissions. I could allow kill via sudo for nagios, but that is not a real solution, since it requires modifying sudoers on thousands of hosts. So I would much prefer a workaround in the script I'm responsible for: check whether the NFS share is working properly, and only if it is, run the lsof command; otherwise print an error and exit 0.
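A minimal sketch of such a pre-check, assuming it is acceptable to probe each NFS mount with a short `stat` before running lsof. The mount list is read from `/proc/mounts`, which does not touch the mounts, and each `stat` runs as the calling user, so `timeout` can kill it without sudo. The function name and the optional mount-table argument (added only to make it testable) are hypothetical. Caveats: `stat` may be answered from the client's attribute cache while the server is already gone, and depending on kernel and mount options a process blocked on NFS may not die right away even on SIGKILL.

```shell
#!/bin/bash
# Hypothetical helper: probe every NFS mount with a short, killable stat().
# $1: per-mount time limit in seconds (default 5)
# $2: mount table to read (default /proc/mounts; overridable for testing)
# Prints one line per failed mount and returns 1 if any probe failed.
nfs_mounts_responsive() {
    local secs=${1:-5} table=${2:-/proc/mounts} mnt fstype failed=0
    # Fields: device mountpoint fstype options dump pass.
    # Note: spaces in mount points appear as \040 here and are not decoded.
    while read -r _ mnt fstype _; do
        case $fstype in
            nfs|nfs4) ;;
            *) continue ;;
        esac
        # </dev/null keeps the probe from consuming the mount table on stdin.
        if ! timeout --signal=KILL "$secs" stat -t "$mnt" >/dev/null 2>&1 </dev/null; then
            echo "NFS check failed for $mnt (limit ${secs}s)"
            failed=1
        fi
    done < "$table"
    return "$failed"
}

# Usage in the plugin, following the "print an error and exit 0" idea:
# if ! msg=$(nfs_mounts_responsive 5); then echo "$msg"; exit 0; fi
```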

Interestingly, I found NFS-related messages in kern.log:
Code:

~# dmesg -T|tail -2
[Fri Dec  1 00:23:15 2017] nfs: server 10.10.4.30 not responding, still trying
[Fri Dec  1 02:37:32 2017] nfs: server 10.10.4.30 not responding, still trying

I had the idea to simply run "dmesg -T | tail -2 | grep 'not responding'" in my script, so that I can skip lsof when there is an NFS issue. But those messages refer to the old NetApp; we moved to a new one with a different IP months ago, so this is strange. I think I need to find another way to check whether NFS is healthy.
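The kernel logs "nfs: server X not responding, ..." when requests to a server stall and "nfs: server X OK" once it answers again, so a stale message about an old server can be avoided by filtering on the current server's address and looking only at the most recent message for it. A minimal sketch; the function name and the `NEW_NETAPP_IP` variable are hypothetical. Note that reading dmesg may require root (kernel.dmesg_restrict), and the ring buffer may already have dropped older lines.

```shell
#!/bin/bash
# Hypothetical helper: read kernel log lines on stdin and print the latest
# known state (OK or DOWN) of one NFS server. Returns 1 if DOWN.
nfs_server_last_state() {
    local server=$1
    # The trailing space stops 10.10.4.3 from matching 10.10.4.30.
    awk -v prefix="nfs: server $server " '
        index($0, prefix) { last = $0 }
        END {
            if (last ~ /not responding/) { print "DOWN"; exit 1 }
            print "OK"
        }'
}

# Example (NEW_NETAPP_IP is a placeholder for the current NetApp address):
# dmesg | nfs_server_last_state "$NEW_NETAPP_IP"
```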

MadeInGermany 12-03-2017 02:27 AM

I meant the timeout command
Code:

man timeout
Then you can run lsof with a timeout, for example:
Code:

timeout --signal=9 55s lsof ...

grzeslaw 12-03-2017 01:52 PM

Thanks!

It looks like this command works from the CLI, which is great!
Sadly, when I put this code into a script, it hangs forever ;/

Code:

nagios@host002:~$ timeout --signal=9 5s sudo lsof -u user1 2>/dev/null|wc -l
Killed
nagios@host002:~$ echo "timeout --signal=9 5s sudo lsof -u user1 2>/dev/null|wc -l" >test.sh
nagios@host002:~$
nagios@host002:~$ chmod +x test.sh
nagios@host002:~$
nagios@host002:~$
nagios@host002:~$ ./test.sh

^C
nagios@host002:~$


MadeInGermany 12-06-2017 12:34 PM

Wrong order: sudo first, then timeout, then lsof!
That way the kill -9 is sent with root privileges.
Code:

sudo timeout --signal=9 5s lsof -u user1 2>/dev/null|wc -l
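In a plugin it also helps to tell a timeout apart from a normal result. GNU timeout exits with 124 when the time limit is hit, and with 128+9 = 137 when the KILL signal had to be used. A minimal sketch, assuming a sudoers rule that lets the nagios user run timeout (and through it lsof) without a password; the function names are hypothetical. `sudo -n` makes sudo fail instead of prompting for a password that nobody can type inside a check.

```shell
#!/bin/bash
# Hypothetical helper: turn lsof's exit status ($1) and output ($2) into a
# Nagios-style message. Returns 3 (UNKNOWN) on timeout or ambiguous failure.
lsof_result() {
    local rc=$1 out=$2
    case $rc in
        124|137) echo "UNKNOWN: lsof timed out"; return 3 ;;
    esac
    # lsof exits 1 on errors, including finding nothing for the user, so
    # empty output with a non-zero status is ambiguous and reported as UNKNOWN.
    if [ -z "$out" ] && [ "$rc" -ne 0 ]; then
        echo "UNKNOWN: lsof failed with exit status $rc"
        return 3
    fi
    echo "OK: $(printf '%s\n' "$out" | grep -c .) lines of lsof output"
}

# Hypothetical check: sudo first, so timeout runs as root and may SIGKILL lsof.
check_user_files() {
    local user=$1 secs=${2:-5} out rc
    out=$(sudo -n timeout --signal=9 "${secs}s" lsof -u "$user" 2>/dev/null)
    rc=$?
    lsof_result "$rc" "$out"
}

# Example: check_user_files user1 5
```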

grzeslaw 12-14-2017 03:27 AM

None of those solutions work when the server has an NFS issue; lsof still hangs every time.
To get around it, I wrote my own lsof based on /proc/PID/smaps and /proc/PID/fd.
With that in place, we can consider the issue resolved.
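The actual script was not posted, so the following is only a sketch of the same idea under stated assumptions. It lists open descriptors from `/proc/PID/fd` and mapped files from `/proc/PID/maps`, which shows the same path column as `smaps` without the per-mapping statistics. `readlink` on a `/proc/PID/fd/N` link asks procfs for the stored path instead of calling stat() on the target, so it should not need to contact the NFS server. The function name and output format are hypothetical; run it as root (or as the target user) so the fd directories are readable.

```shell
#!/bin/bash
# Hypothetical /proc-only replacement for "lsof -u USER".
# Output lines: "PID fd TARGET" for open descriptors, "PID mem PATH" for mapped files.
proc_lsof() {
    local uid status pid_dir pid ruid fd target
    uid=$(id -u "$1") || return 3
    for status in /proc/[0-9]*/status; do
        pid_dir=${status%/status}
        pid=${pid_dir#/proc/}
        # "Uid:" line lists real, effective, saved and filesystem UID; field 2 is the real UID.
        ruid=$(awk '/^Uid:/ { print $2; exit }' "$status" 2>/dev/null) || continue
        [ "$ruid" = "$uid" ] || continue
        for fd in "$pid_dir"/fd/*; do
            # Skips unreadable fd directories and processes that exited meanwhile.
            target=$(readlink "$fd" 2>/dev/null) || continue
            printf '%s fd %s\n' "$pid" "$target"
        done
        # Column 6 of /proc/PID/maps is the mapped file, if any (paths with spaces are truncated).
        awk -v pid="$pid" '$6 ~ /^\// { print pid " mem " $6 }' "$pid_dir/maps" 2>/dev/null | sort -u
    done
}

# Example: proc_lsof appuser | wc -l
```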

MadeInGermany 12-14-2017 01:17 PM

Well done.
More and more often I run into "featurism" that puts the basic function at risk, and I would like to write my own "simply works" programs...

grzeslaw 12-14-2017 01:53 PM

Sometimes you have no choice. This is not the first time I was forced to code my own functions; that's life. But the good thing is that you know exactly what it does, and you can quickly implement a fix if other issues come up :)

PinoyUser 12-27-2017 09:27 AM

Use strace to find out where it is hanging
 
Try:
strace lsof
ps aux | grep <pid where it is hanging>
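strace shows the last system call lsof made before it stopped. Once a process is already stuck, the kernel's own view of it in /proc is another place to look: state "D" (uninterruptible sleep) is what a process blocked on an unresponsive NFS server typically shows, and `/proc/PID/wchan` names the kernel function it is waiting in. A minimal sketch with a hypothetical function name; the wchan value may be "0" or empty when kernel symbols are hidden.

```shell
#!/bin/bash
# Hypothetical helper: print the scheduler state and wait channel of a process.
show_wait_state() {
    local pid=$1 state wchan
    [ -r "/proc/$pid/stat" ] || { echo "no such process: $pid" >&2; return 1; }
    # The command name (field 2) may contain spaces, so drop everything up to ") ".
    state=$(sed 's/.*) //' "/proc/$pid/stat" | cut -d' ' -f1)
    wchan=$(cat "/proc/$pid/wchan" 2>/dev/null)
    echo "$pid state=$state wchan=${wchan:-?}"
}

# Example: show_wait_state "$(pgrep -n lsof)"
```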


All times are GMT -5.