LinuxQuestions.org

Ziggie 08-26-2008 10:31 AM

File System Locked Up when NFS mount went offline
 
Aight, first the basics:
Running RHEL 3 (2.4 kernel), Dell PowerEdge 2850, 4GB RAM, 2TB EMC RAID on ServerA.
A Sun system running Solaris on ServerB.

I do not control ServerB.

ServerB exports an NFS share that I have mounted on ServerA as /mnt/serverb.

ServerA has the 2TB RAID mounted as /mnt/raid.

On the RAID is a Samba-shared directory we'll call public, mounted at /mnt/raid/public.

Inside this Samba-shared directory is a symlink, /mnt/raid/public/serverb, that points to /mnt/serverb.

Yesterday, ServerB dropped into single-user mode (as it tends to do from time to time, for whatever reason) for approximately 45 minutes. Fifteen minutes into the outage, ServerA noticed that ServerB was missing and threw an error into the system log:
(ServerA kernel: nfs: server {ip address} not responding, still trying)

About this time, the directory /mnt/raid/public stopped responding. Completely. Any attempt to ls the directory hung, and even kill -9 could not stop it. Windows clients could not connect to the share at all (except an old Win98 box, and even it couldn't list the directory contents). I also tried the Java-based file manager in Webmin to look at the file system, but it locked up as well. Oddly enough, I could still tab-complete in the directory with no problems.

No other shares had problems, and server utilization looked normal. Only this one share/directory was affected.

When ServerB came back up, all of my frozen console sessions started responding again and the Windows clients could connect. The Samba logs show no errors during this period.

So, the short question is this: should a symlink be able to bring an entire directory to a standstill?
The longer question is: how do I keep the symlink but avoid being held hostage by ServerB's erratic uptime?

Thanks in advance and sorry for the long explanation. I've never seen anything like this before.

--zigg

CRC123 08-26-2008 11:04 AM

Are you mounting the shares manually or automatically via /etc/fstab?

Also, look at section 5 of the nfs man page:

Code:

man 5 nfs
That documents the fstab options specific to NFS. In particular, look at the soft/hard mount options. NFS defaults to hard, which retries requests for the data indefinitely (hence your hang until the server came back up). With the soft option, the client gives up after a set number of retries and returns an error instead.
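For illustration, here is roughly what the two choices look like as /etc/fstab lines. The server name, export path, mount point, and the timeo/retrans values are placeholders, not recommendations; check man 5 nfs on your own system for the defaults. timeo is in tenths of a second, and retrans is the number of retransmissions before a "major" timeout. The intr option only matters for hard mounts: it lets signals interrupt an NFS call that has hit a major timeout, which is why kill -9 did nothing on a plain hard mount.

```
# soft: after retrans retries the call fails with an I/O error instead of hanging
server:/export  /mnt/point  nfs  soft,timeo=30,retrans=3  0 0

# alternative: stay hard (retry forever) but let signals kill hung processes
server:/export  /mnt/point  nfs  hard,intr  0 0
```

One trade-off: the nfs(5) man page warns that soft mounts can cause silent data corruption in some cases, since a write can be abandoned partway. soft is the safer fit for a share that is mostly read from; hard,intr is the more conservative choice if clients write to it.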

Good luck! I'm fairly confident the info you need is in that manual page ;)

Ziggie 08-26-2008 11:43 AM

Wow, a quick reply. Thank you!

Yes, we do use fstab to mount this at startup. The entry is here:

{IP Address}:/iu6/autoproc /mnt/serverb nfs suid,dev,exec 0 0


I have added soft to the fstab entry (you're right; I believe that will solve all the issues I saw).
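Editing /etc/fstab does not change a filesystem that is already mounted; the new options only take effect after the share is unmounted and mounted again (for example umount /mnt/serverb followed by mount /mnt/serverb, while nothing is using it). Below is a minimal sketch for confirming the live mount actually picked up soft. It assumes the kernel lists the soft/hard flag in the options field of /proc/mounts, which Linux generally does for NFS mounts. The function name nfs_mount_is_soft is hypothetical, and the optional second argument just lets you point it at a saved copy of the mounts table.

```shell
#!/bin/sh
# Hypothetical helper: exits 0 if MOUNTPOINT is an NFS mount whose options
# include "soft"; exits non-zero otherwise (including when it is not mounted).
# Usage: nfs_mount_is_soft MOUNTPOINT [MOUNTS_FILE]
nfs_mount_is_soft() {
    mp=$1
    mounts=${2:-/proc/mounts}   # default: the kernel's live mount table
    # /proc/mounts fields: device mountpoint fstype options dump pass
    awk -v mp="$mp" '
        $2 == mp && $3 ~ /^nfs/ {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "soft") found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$mounts"
}
```

After remounting, something like nfs_mount_is_soft /mnt/serverb && echo soft || echo "still hard" gives a quick answer without waiting for ServerB's next outage.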

Given ServerB's uptime (or lack thereof), I'm sure I'll know fairly soon whether it worked.

Thanks!

(Amazing, a simple RTFM moment after all.)


All times are GMT -5.