We have a number of Solaris 10 servers that are paired, i.e. 1 middleware server (MW) with 1 database server (DB). The MW server mounts 3 NFS shares exported by the DB server. We have 7 identical pairs of MW/DB servers. Although the databases hold different data, 6 of the pairs are clones of the original 'Master' set.
All the servers are virtuals running under VMware.
What is happening is that some (not all) of the MW servers are having a problem with the NFS shares they have mounted:
if you run df -h, the command hangs
if you umount the share and then attempt to mount it again, you get an error
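Because a plain df -h blocks on a dead mount, one way to see which shares are stuck without hanging your shell is to run the check in a child process with a timeout. This is only a minimal sketch, assuming Python 3 is available on the MW server; the function name probe_mount and the 5-second timeout are illustrative choices, not part of the setup described in this thread.

```python
import subprocess


def probe_mount(path, timeout=5.0, cmd=("df", "-h")):
    """Run `cmd path` in a child process and classify the result.

    Returns "ok" if the command exits 0 within `timeout` seconds,
    "error" if it exits non-zero or cannot be started,
    and "hung" if it is still running when the timeout expires.
    """
    try:
        proc = subprocess.Popen(
            [*cmd, path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # The command itself could not be launched (e.g. not on PATH).
        return "error"

    try:
        rc = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Best-effort cleanup. A process blocked on a dead NFS mount may
        # not die immediately, so do not wait for it indefinitely.
        proc.kill()
        try:
            proc.wait(timeout=1.0)
        except subprocess.TimeoutExpired:
            pass
        return "hung"

    return "ok" if rc == 0 else "error"
```

Calling probe_mount on each of the mount points in turn should show which ones respond and which ones time out. Note that a child blocked on a hard mount may linger after the kill until the server answers again, so this only keeps the calling shell responsive; it does not clear the hang.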
I changed the NIC driver from E1000 to VMXNET3 and rebooted all 7 pairs of servers. Now a different group of servers is experiencing the problem, but still not all of them.
I have tried using NFSv3 instead of NFSv4, but that lasted only 2 days before failing as well.
So back to square one...
Q) Does anyone know why this is happening?
Q) Does anyone know how to stop it from happening (and I don't mean the df -h hanging)?
I have spent several fruitless days on Google researching this problem, but none of the answers actually explain what the underlying fault is, nor do they explain how to fix it.
Yours very sincerely
Mr Frustrated
Ready To Have A Nervous Breakdown
What is the error you get when you attempt to mount it again?
Are the clients using DHCP?
What are the permissions on the share on the NFS server (showmount -e nfsserver)?
1. We don't use DHCP at all - static IP addresses only
2. The shares have been mounted for some days before the problem manifests itself. However, once the problem occurs and the share is umounted and a re-mount is attempted, we get:
NFS compound failed for server DB707: error 5 (RPC: Timed out)
3. The result of showmount:
export list for DB707:
/patch (everyone)
/stage (everyone)
/share (everyone)
The client server (MW707) mounts all 3 shares (/share, /patch and /stage) and all 3 can no longer be listed from the MW server (which includes doing a df -h).
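Since the re-mount fails with an RPC timeout, a quick first check is whether the DB server still accepts TCP connections on the ports NFS normally uses: 111 for rpcbind and 2049 for NFS itself. The following is a rough sketch, assuming Python 3 on the MW server; the function names and the 3-second timeout are made up for illustration.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_nfs_server(host, ports=(111, 2049), timeout=3.0):
    """Map each port to True/False depending on whether it accepted a connection."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

For example, check_nfs_server("DB707") returns a dictionary keyed by port number. A True result only means the port accepted a connection, not that the NFS service behind it is healthy, but a False on 2049 from an affected MW server while a working MW server gets True would point towards the network path rather than the NFS configuration.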
I've already tried using NFSv3 and we still get the problem.
The problem with 'udp' is that it's not really suitable when read/write operations are in use: it doesn't report errors, which can result in a corrupt file. 'udp' is best used for read-only 'fire-and-forget' type transactions. However, whilst I was away a colleague tried using 'udp' and it failed as well.
The Sun NFS client has robust error correction in its application layer (which is used on top of udp).
You should at least try it temporarily, in order to rule out a tcp problem.
I had a similar problem mounting shares from a virtual server. It turned out that the time on the virtual server was 2 seconds behind the physical system. After correcting the time on the server (which also took some time, because it was a virtual one), I was able to mount them again.
Since the last post I have managed to download the rolled-up zip file of recommended patches for Solaris 10 1/13 u11 and applied it to 1 set of servers. The fault seems to have disappeared since the patches were applied. I will be rolling out the patches to the other servers over the next week or so. Hopefully, that should be the end of this NFS issue.