LinuxQuestions.org
Solaris / OpenSolaris This forum is for the discussion of Solaris, OpenSolaris, OpenIndiana, and illumos.
General Sun, SunOS and Sparc related questions also go here. Any Solaris fork or distribution is welcome.

Old 05-29-2015, 05:15 AM   #1
cpseed
LQ Newbie
 
Registered: May 2015
Posts: 5

Rep: Reputation: Disabled
df -h hangs - nfs shares causing problem


We have a number of Solaris 10 servers that are paired, i.e. 1 middleware server (MW) with 1 database server (DB). The MW server mounts 3 NFS shares exported by the DB server. We have 7 identical pairs of MW/DB servers. Although the databases have different data, 6 pairs are clones of the original 'Master' set.

All the servers are virtuals running under VMware.

What is happening is that some (not all) of the MW servers are having a problem with the NFS shares that have been mounted:

If you run df -h, the command hangs.

If you umount the share and then attempt to mount it again, you get an error.

I changed the NIC driver from E1000 to VMXNET3 and rebooted all 7 pairs of servers. Now a different group of servers is experiencing the problem, but still not all of them.

I have tried using NFSv3 instead of NFSv4, but that lasted only 2 days before failing as well.

So back to square one...

Q) Does anyone know why this is happening?
Q) Does anyone know how to stop it from happening (and I don't mean the df -h hanging)?

I have spent several fruitless days on Google researching this problem but none of the answers actually explain what the underlying fault is. Nor do they give an explanation as to how to fix the problem.

Yours very sincerely
Mr Frustrated
Ready To Have A Nervous Breakdown

Last edited by cpseed; 05-29-2015 at 06:53 AM.
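
While chasing a hang like this, it can help to probe each mount without tying up the terminal. The following is a minimal sketch, assuming bash (the prompts in this thread show bash-3.2). `run_with_timeout` is a hypothetical helper written for illustration, not a Solaris command, and the return value 124 is an arbitrary choice. A process blocked on a hard NFS mount may not react to signals immediately, which is why the helper polls instead of waiting on the command.

```shell
# Sketch only: run_with_timeout is a hypothetical helper, not a Solaris tool.
# Usage: run_with_timeout SECONDS COMMAND [ARGS...]
# Returns the command's own exit status, or 124 (an arbitrary value chosen
# here) if the command was still running when the time limit was reached.
run_with_timeout() {
    limit=$1
    shift
    "$@" &                      # start the command in the background
    cmd_pid=$!
    waited=0
    # Poll once per second instead of blocking in wait, so that the caller
    # gets control back even if the command cannot be killed right away.
    while kill -0 "$cmd_pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            kill -9 "$cmd_pid" 2>/dev/null
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    wait "$cmd_pid"             # collect the exit status of the finished command
}

# Example use on the client (the mount points are assumed to be
# /share, /patch and /stage):
# for fs in /share /patch /stage; do
#     run_with_timeout 5 df -h "$fs" >/dev/null 2>&1
#     [ $? -eq 124 ] && echo "$fs: no answer within 5 seconds"
# done
```

This only identifies which mount is stuck; it does nothing about the underlying fault.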
 
Old 05-29-2015, 09:08 AM   #2
AlucardZero
Senior Member
 
Registered: May 2006
Location: USA
Distribution: Debian
Posts: 4,824

Rep: Reputation: 615
What is the error you get when you attempt to mount it again?
Are the clients using DHCP?
What are the permissions on the share on the NFS server (showmount -e nfsserver)?
 
Old 06-01-2015, 01:24 AM   #3
cpseed
LQ Newbie
 
Registered: May 2015
Posts: 5

Original Poster
Rep: Reputation: Disabled
1. We don't use DHCP at all - static IP addresses only
2. The shares had been mounted for some days before the problem manifested itself. However, once the problem occurs and the share is unmounted and a re-mount is attempted, we get:

NFS compound failed for server DB707: error 5 (RPC: Timed out)

3. The result of showmount:

export list for DB707:
/patch (everyone)
/stage (everyone)
/share (everyone)

The client server (MW707) mounts all 3 shares (/share, /patch and /stage) and all 3 can no longer be listed from the MW server (which includes doing a df -h).

4. In anticipation of queries regarding rpc:

On the NFS Server:
bash-3.2# rpcinfo -p DB707
program vers proto port service
100000 4 tcp 111 rpcbind
100000 3 tcp 111 rpcbind
100000 2 tcp 111 rpcbind
100000 4 udp 111 rpcbind
100000 3 udp 111 rpcbind
100000 2 udp 111 rpcbind
100024 1 udp 32772 status
100024 1 tcp 32771 status
100133 1 udp 32772
100133 1 tcp 32771
1073741824 1 tcp 32772
100021 1 udp 4045 nlockmgr
100021 2 udp 4045 nlockmgr
100021 3 udp 4045 nlockmgr
100021 4 udp 4045 nlockmgr
100021 1 tcp 4045 nlockmgr
100021 2 tcp 4045 nlockmgr
100021 3 tcp 4045 nlockmgr
100021 4 tcp 4045 nlockmgr
100011 1 udp 32773 rquotad
100005 1 udp 32774 mountd
100005 1 tcp 32777 mountd
100005 2 udp 32774 mountd
100005 2 tcp 32777 mountd
100005 3 udp 32774 mountd
100005 3 tcp 32777 mountd
100003 2 udp 2049 nfs
100003 3 udp 2049 nfs
100227 2 udp 2049 nfs_acl
100227 3 udp 2049 nfs_acl
100003 2 tcp 2049 nfs
100003 3 tcp 2049 nfs
100003 4 tcp 2049 nfs
100227 2 tcp 2049 nfs_acl
100227 3 tcp 2049 nfs_acl

On the Client (MW707):
bash-3.2# rpcinfo -p
program vers proto port service
100000 4 tcp 111 rpcbind
100000 3 tcp 111 rpcbind
100000 2 tcp 111 rpcbind
100000 4 udp 111 rpcbind
100000 3 udp 111 rpcbind
100000 2 udp 111 rpcbind
100024 1 udp 32772 status
100024 1 tcp 32771 status
100133 1 udp 32772
100133 1 tcp 32771
1073741824 1 tcp 32772
100021 1 udp 4045 nlockmgr
100021 2 udp 4045 nlockmgr
100021 3 udp 4045 nlockmgr
100021 4 udp 4045 nlockmgr
100021 1 tcp 4045 nlockmgr
100021 2 tcp 4045 nlockmgr
100021 3 tcp 4045 nlockmgr
100021 4 tcp 4045 nlockmgr
100011 1 udp 32773 rquotad

Regards

Last edited by cpseed; 06-01-2015 at 01:26 AM.
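
Since the re-mount fails with "RPC: Timed out", one check that can be scripted is whether the server still advertises the nfs (100003) and mountd (100005) programs seen in the listing above when queried from the client. A minimal sketch follows. `rpc_registered` is a hypothetical helper, and it assumes an awk that accepts `-v` (on Solaris 10 that would mean `nawk` rather than the default `awk`).

```shell
# Sketch only: rpc_registered is a hypothetical helper.
# Reads `rpcinfo -p` output on stdin and succeeds if the given
# program number, version and protocol appear on one line.
# Usage: rpcinfo -p HOST | rpc_registered PROGRAM VERSION PROTO
# Assumption: awk supports -v (use nawk on Solaris 10).
rpc_registered() {
    awk -v prog="$1" -v vers="$2" -v proto="$3" '
        $1 == prog && $2 == vers && $3 == proto { found = 1 }
        END { exit !found }
    '
}

# Example use from the client, with the server name from this thread:
# rpcinfo -p DB707 | rpc_registered 100003 4 tcp || echo "NFSv4 over tcp not registered"
# rpcinfo -p DB707 | rpc_registered 100005 3 tcp || echo "mountd v3 over tcp not registered"
```

If `rpcinfo -p DB707` itself times out from the client while it works locally on the server, that points at the path between the two machines and not at the NFS services.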
 
Old 06-01-2015, 04:12 AM   #4
jlliagre
Moderator
 
Registered: Feb 2004
Location: Outside Paris
Distribution: Solaris 11.4, Oracle Linux, Mint, Tribblix, Ubuntu/WSL
Posts: 9,761

Rep: Reputation: 459
On the NFS client, when the issue happens, what does this say:

Code:
svcs -xv nfs/client
?
 
Old 06-01-2015, 04:50 AM   #5
cpseed
LQ Newbie
 
Registered: May 2015
Posts: 5

Original Poster
Rep: Reputation: Disabled
bash-3.2# svcs -xv nfs/client
svc:/network/nfs/client:default (NFS client)
State: online since Tue May 19 14:01:52 2015
See: man -M /usr/share/man -s 1M mount_nfs
See: /var/svc/log/network-nfs-client:default.log
Impact: None.
 
Old 06-01-2015, 02:50 PM   #6
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 1,071

Rep: Reputation: 466
Could be a problem with tcp; try to mount with options vers=3,proto=udp
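
For reference, the suggested options fit into a Solaris mount command roughly as sketched below. The helper `nfs3_udp_mount_cmd` is hypothetical and only prints the command so it can be reviewed before running it. The server name DB707 and the export /share come from this thread; the local mount point /share is an assumption.

```shell
# Sketch only: prints (does not run) a Solaris mount command that forces
# NFSv3 over UDP. nfs3_udp_mount_cmd is a hypothetical helper.
# Usage: nfs3_udp_mount_cmd SERVER EXPORT MOUNTPOINT
nfs3_udp_mount_cmd() {
    printf 'mount -F nfs -o vers=3,proto=udp %s:%s %s\n' "$1" "$2" "$3"
}

# The local mount point /share is assumed here.
nfs3_udp_mount_cmd DB707 /share /share
```

Printing the command first makes it easy to compare against the options currently in use before changing anything on a live server.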
 
Old 06-03-2015, 04:09 AM   #7
cpseed
LQ Newbie
 
Registered: May 2015
Posts: 5

Original Poster
Rep: Reputation: Disabled
I've already tried using NFSv3 and we still get the problem.

The problem with 'udp' is that it's not really suitable for use when read/write operations are in use - it doesn't report errors, which can result in a corrupt file. 'udp' is best used in read-only 'fire-and-forget' type transactions. However, whilst I was away a colleague tried using 'udp' and it failed as well.

Thanks anyway for the suggestion.
 
Old 06-06-2015, 10:36 AM   #8
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 1,071

Rep: Reputation: 466
The Sun NFS client has robust error correction in its application layer (which is used on top of udp).
You should at least try it temporarily, in order to rule out a tcp problem.
 
Old 08-29-2015, 05:53 AM   #9
Axel van Moorsel
Member
 
Registered: Jan 2011
Location: Netherlands (Zuid Holland)
Distribution: Debian 8
Posts: 31

Rep: Reputation: 4
Maybe a time problem?

I had a similar problem mounting shares from a virtual server. It turned out that the time on the virtual server was 2 seconds behind the physical system. After correcting the time on the server (which also took some time, because it was a virtual one), I was able to mount them again.
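
A quick way to test the clock theory is to compare epoch timestamps taken on both machines. The sketch below only does the arithmetic. `clock_skew` is a hypothetical helper, and obtaining the timestamps with `perl -e 'print time'` locally and over ssh is an assumption about the environment, not something confirmed in this thread.

```shell
# Sketch only: clock_skew is a hypothetical helper.
# Prints the absolute difference in seconds between two epoch timestamps.
# Usage: clock_skew EPOCH_A EPOCH_B
clock_skew() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( -diff ))
    fi
    echo "$diff"
}

# Example use (assumes perl on both hosts and ssh access to the server):
# local_now=$(perl -e 'print time')
# remote_now=$(ssh DB707 "perl -e 'print time'")
# echo "skew: $(clock_skew "$local_now" "$remote_now") seconds"
```

The ssh round trip adds some delay of its own, so a difference of a second or so measured this way is inconclusive.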
 
Old 09-01-2015, 07:24 AM   #10
cpseed
LQ Newbie
 
Registered: May 2015
Posts: 5

Original Poster
Rep: Reputation: Disabled
The problem affects several servers.

Since the last post I have managed to download the rolled-up zip file of recommended patches for Solaris 10 1/13 u11 and applied it to 1 set of servers. The fault seems to have disappeared since applying the patches. I will be rolling out the patches to the other servers over the next week or so. Hopefully, that should be the end of this NFS issue.

Regards,

Paul Seed
 
  


All times are GMT -5. The time now is 03:40 PM.
