Linux - Networking
This forum is for any issue related to networks or networking. Routing, network cards, OSI, etc. Anything is fair game.
I’m a DBA working across a number of Linux servers (Red Hat EL5, kernel 2.6.18-274.el5) and have a common NFS mount available to all of them. This mounted filesystem is used as a general-purpose storage location for housing extracts, patches, some utility scripts, etc. It does not contain any actual database files.
We have two data centers approximately 70 miles (113 km) apart connected by a 10 gigabit network.
There is one NFS server.
In the “remote” data center I am experiencing significant performance issues with read operations.
Copying a 10 GB file from the NFS mount to SAN storage attached to the server in the data center where the NFS server lives (“local”) takes roughly 7 minutes 57 seconds (477 sec). The same copy performed in the “remote” data center takes 1 hour, 31 minutes, 48 seconds (5508 sec).
I understand that I should expect some performance degradation, but how much? Should I expect it to be 11 times slower?
Copying the same file from one SAN based filesystem to another completes in approximately 60 seconds on both servers.
A ping from the “local” server to the NFS server averages 0.200 ms. From the “remote” server the ping averages 1.652 ms.
Using SCP to copy the file from the “local” server’s NFS mount to the “remote” server’s SAN takes 4 minutes 10 seconds (250 sec).
So it would appear to be an issue with the NFS protocol itself.
My sysadmin and network admin agree that the difference seems excessive, but none of us have any prior experience with NFS over this distance.
Do others run NFS over WANs?
If so, do they experience similar read performance?
Write performance, while degraded, is much better than reads; e.g. copying from SAN disk to the NFS mount on the “remote” server completes in approximately 15 minutes. I’m guessing this is an asynchronous operation.
Any suggestions on reference materials and/or diagnostics would be appreciated.
The speed of light is 186,000 miles per second, or 186 miles per 1 msec. If you are doing a 140-mile round trip in 0.2 msec then I would like to buy some of the cable you are using, because mine only gets about 0.7 c.
NFS operations happen via RPC calls, each of which should have about 1 msec latency in your setup.
Oh, sorry, I misread. The problem is you have plenty of bandwidth but can't keep the pipe filled. Every time you have to wait on an operation, it takes 1-2 msec. Over long-latency lines it's typical to use a local cache and remote replication of some kind.
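Back-of-envelope, assuming the client waits for each read RPC to complete before issuing the next one (I don't know your actual mount options, so the rsize values below are just examples): read throughput is capped at roughly rsize divided by the round-trip time.
Code:
  throughput ≈ rsize / round-trip time
  32 KB / 1.652 ms ≈ 19 MB/s    (remote, 32 KB reads)
   4 KB / 1.652 ms ≈ 2.5 MB/s   (remote, 4 KB reads)
  32 KB / 0.200 ms ≈ 160 MB/s   (local, 32 KB reads)
The 4 KB case lands close to the ~1.8 MB/s implied by your 5508-second copy, so it would be worth checking what read size the remote mount actually negotiated.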
Could you add a file checkout stage to your process? Use git, maybe? That's easy to set up and allows many concurrent users.
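A minimal sketch of what I mean, assuming a bare repository on some central host you control (names and paths here are placeholders):
Code:
  # one-time: create a shared bare repository on a central host
  git init --bare /srv/repos/dba-scripts.git
  # on each server: take a local working copy
  git clone ssh://central-host/srv/repos/dba-scripts.git /opt/dba-scripts
  # later, pick up changes that others have pushed
  cd /opt/dba-scripts && git pull
Reads then come from local disk instead of crossing the WAN on every access.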
An approach that's probably overkill is a multi-node, replicated-data filesystem like lustre.
I do not run NFS over WAN links. On the internal VPN networks and links I use FUSE to mount an unencrypted FTP share so that it looks just like a directory. I have seen extreme increases in network throughput, and drops in overhead, from this change.
If you can, I suggest trying it on a dev box to see the difference across the WAN links.
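Something along these lines, using curlftpfs as one example of a FUSE FTP client (the host, credentials, and mount point are placeholders, not an actual setup):
Code:
  # mount an FTP account so it looks like a local directory
  curlftpfs ftp://user:password@fileserver.internal /mnt/ftp -o allow_other
  # then use it like any other path
  cp /mnt/ftp/extracts/bigfile.dmp /local/scratch/
  # unmount when finished
  fusermount -u /mnt/ftp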
Am I the only one that's amazed by how slow ALL of the speeds quoted by the OP are?
This is a 10GB file being transferred across a 10Gb link, theoretically it should take 8 seconds. Of course you'll [almost] never see that in real life, but I wouldn't expect it to take any longer than 30-60 seconds. The OP is seeing 477 seconds to transfer to another machine located just a few feet away, that's 21 MB/s on a 10Gb link!
All of those speeds are way too slow, even ignoring the remote transfer. Me thinks there's something wrong with the network, NIC, cables, etc.
For comparison's sake, I get somewhere around 200 MB/s over NFS or SCP (using arcfour) on my 10Gb link, and somewhere around 100 MB/s on a 1Gb link between neighboring machines. The single-transfer bandwidth on the 10Gb isn't all that much better than the 1Gb, but I can run five of those 200 MB/s transfers simultaneously on the 10Gb link and get somewhere around 9-9.5 Gb/s total throughput. If I was only getting 20 MB/s on my 10Gb network, I would be dropping everything and diving into the equipment to find out what's wrong. Never mind the 2 MB/s he's getting over the 70 mile link...
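For what it's worth, the parallel case is nothing clever, just several transfers running at the same time, roughly like this (hosts and paths are placeholders):
Code:
  # start five copies in parallel and wait for all of them to finish
  for f in file1 file2 file3 file4 file5; do
      scp -c arcfour /data/$f otherhost:/data/ &
  done
  wait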
Last edited by suicidaleggroll; 04-29-2014 at 04:10 PM.
While I used the transfer of a large file to test/measure the difference in performance between local and remote NFS clients, the filesystem serves many purposes, including acting as a common repository for many files and directory structures used across all the client servers. I’ve grown used to the convenience of not having to synchronize them and of having changes take effect immediately.
There were a couple of suggestions for alternatives that may facilitate this and I’ll have to take a look at them, although as a mere DBA my ability to drive the infrastructure here is limited.
I’d also have to admit that I may be too focused on determining whether my view, that the difference in performance is greater than it should be, is correct, and if it is, how to diagnose the problem.
As to the issues raised by suicidaleggroll: the throughput I’m seeing locally IS significantly less than what he is reporting:
scp (using fiber-attached SAN disk on both servers): ~46 MB/s
NFS (cp to SAN): ~21 MB/s
Interestingly, the scp throughput only drops to ~42 MB/s transferring to the remote site,
while the NFS read throughput drops to ~2 MB/s at the remote site.
The problem is that (to the best of my knowledge) I’m the only one “complaining” about network throughput in any context, and it has been most noticeable using NFS. The focus to date has been on trying to understand why the difference between scp and NFS is so large. I don’t know much about the internals of scp, but I understand NFS to be a very “chatty” protocol (e.g. 65 Kbyte block reads with verification for each).
Are there diagnostic tools/techniques I can use, or suggest to my system and network admins, for better analyzing this?
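To make the question more concrete, the sort of thing I was imagining is below (command names are my best guess at what is available on these RHEL 5 boxes, and the interface/host in the capture are placeholders):
Code:
  # confirm what the client actually negotiated (rsize/wsize, tcp vs udp, version)
  nfsstat -m
  # client-side RPC counters -- look for retransmissions and the mix of operations
  nfsstat -c
  # raw per-mount counters; newer nfs-utils wraps this up as nfsiostat
  cat /proc/self/mountstats
  # packet capture during a test copy, to hand to the network admin
  tcpdump -i eth0 -s 0 -w /tmp/nfs_test.pcap host nfsserver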