LinuxQuestions.org
Old 04-28-2014, 04:00 PM   #1
bozzo99
LQ Newbie
 
Registered: Jul 2010
Location: Madison, Wisconsin
Distribution: RH
Posts: 4

Rep: Reputation: 0
NFS Read Performance over WAN (70 Miles)?


I’m a DBA working across a number of Linux servers (Red Hat, kernel 2.6.18-274.el5) and have a common NFS mount available to all of them. This mounted filesystem is used as a general-purpose storage location for housing extracts, patches, some utility scripts, etc. It does not contain any actual database files.

We have two data centers approximately 70 miles (113 km) apart connected by a 10 gigabit network.
There is one NFS server.

In the “remote” data center I am experiencing significant performance issues with read operations.

Copying a 10 GB file from the NFS mount to SAN storage attached to the server in the data center where the NFS server lives (“local”) takes roughly 7 minutes 57 seconds (477 sec). The same copy performed in the “remote” data center takes 1 hour, 31 minutes, 48 seconds (5,508 sec).

I understand that I should expect some performance degradation, but how much? Should I expect it to be 11 times slower?

Copying the same file from one SAN based filesystem to another completes in approximately 60 seconds on both servers.

A ping from the “local” server to the NFS server averages 0.200 ms. From the “remote” server the ping averages 1.652 ms.

Using SCP to copy the file from the “local” server’s NFS mount to the “remote” server’s SAN takes 4 minutes 10 seconds (250 sec).
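
In case anyone wants to reproduce the comparison, the figures above come from nothing fancier than timed copies along these lines (the paths and hostname are placeholders, not the real ones):

Code:
# NFS read -> SAN write, run on the "local" and then the "remote" client
time cp /nfs_mount/bigfile.dat /san_fs/bigfile.dat
# same file pulled over the WAN with scp instead of NFS, run on the "remote" client
time scp local-client:/nfs_mount/bigfile.dat /san_fs/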

So it would appear to be an issue with the NFS protocol itself.

My sysadmin and network admin agree that the difference seems excessive, but none of us have any prior experience with NFS over this distance.

Do others run NFS over WANs?
If so, do they experience similar read performance?

Write performance, while degraded, is much better than reads; e.g., copying from SAN disk to the NFS mount on the “remote” server completes in approximately 15 minutes. I’m guessing this is because the writes are asynchronous.

Any suggestions on reference materials and/or diagnostics would be appreciated.

Thanks in Advance
Bozzo99
 
Old 04-28-2014, 04:25 PM   #2
smallpond
Senior Member
 
Registered: Feb 2011
Location: Massachusetts, USA
Distribution: Fedora
Posts: 4,140

Rep: Reputation: 1263
The speed of light is 186,000 miles per second, or 186 miles per 1 msec. If you are doing a 140-mile round trip in 0.2 msec, then I would like to buy some of the cable you are using, because mine only gets about 0.7 c.

NFS operations happen via RPC calls, each of which should have about 1 msec latency in your setup.
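
As a ballpark (assuming serialized 32 KB READ RPCs, which is a common default rsize, and roughly 1 msec per round trip; check your actual mount options), the best a single stream could hope for is on the order of:

Code:
# per-stream ceiling for serialized NFS READs: rsize / round-trip time
# 32 KB rsize and 1 msec RTT are assumptions -- adjust for your mount
echo "32768 / 0.001 / 1048576" | bc -l    # ~31 MB/s, no matter how fat the pipe is

And that's an optimistic ceiling; extra round trips for lookups and attribute checks only push it down.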
 
Old 04-29-2014, 10:52 AM   #3
bozzo99
LQ Newbie
 
Registered: Jul 2010
Location: Madison, Wisconsin
Distribution: RH
Posts: 4

Original Poster
Rep: Reputation: 0
I'm sorry it wasn't clear, but the 0.200 ms ping was from a client server in the same data center as the NFS server,
perhaps 10-20 feet away.

The ping from the "remote" client to the NFS server was 1.652 ms (70 miles).
 
Old 04-29-2014, 02:25 PM   #4
smallpond
Senior Member
 
Registered: Feb 2011
Location: Massachusetts, USA
Distribution: Fedora
Posts: 4,140

Rep: Reputation: 1263
Oh, sorry, I misread. The problem is that you have plenty of bandwidth but can't keep the pipe filled. Every time you have to wait on an operation, it takes 1-2 msec. Over long-latency links it's typical to use a local cache and remote replication of some kind.
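
For example, a periodic rsync of the directories the remote clients actually need is a crude but effective form of replication (the hostname and paths below are made up, adjust to your layout):

Code:
# pull a local mirror at the remote site, e.g. from cron every few minutes
# (hostname and paths are placeholders)
rsync -av --delete nfs-server:/export/tools/ /local/tools-mirror/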

Could you add a file checkout stage to your process? Use git, maybe? That's easy to set up and allows many concurrent users.
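
Roughly (the repository URL and paths are placeholders):

Code:
# one-time setup on each client
git clone ssh://git-server/repos/dba-tools.git /local/dba-tools
# pick up the latest changes whenever you need them
cd /local/dba-tools && git pull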

An approach that's probably overkill is a multi-node, replicated-data filesystem like Lustre.
 
Old 04-29-2014, 02:59 PM   #5
jefro
Moderator
 
Registered: Mar 2008
Posts: 21,976

Rep: Reputation: 3623
"connected by a 10 gigabit network." Dedicated?? Wireless?? Fiber?? what?

I'd suspect that you have some network issue if it takes 10-12 times longer.

A wrong device, a wrong setting, a bad connection, slow devices, a routing issue, heavy VPN encryption on weak systems?

This assumes the remote site is in fact able to deliver full speed locally.
 
Old 04-29-2014, 03:05 PM   #6
szboardstretcher
Senior Member
 
Registered: Aug 2006
Location: Detroit, MI
Distribution: GNU/Linux systemd
Posts: 4,278

Rep: Reputation: 1694
I do not run NFS over WAN links. On internal VPN networks and links I use FUSE to mount an unencrypted FTP share so that it looks just like a directory. I have seen large increases in network throughput, and drops in overhead, from this change.
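
One way to do this is with curlftpfs, roughly like so (host, credentials and paths are placeholders):

Code:
# mount an FTP server as a local directory via FUSE
curlftpfs ftp://user:password@ftp-host/exports /mnt/ftp -o allow_other
# unmount when finished
fusermount -u /mnt/ftp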

If you can, I suggest trying it on a dev box to see the difference across the WAN links.
 
Old 04-29-2014, 04:09 PM   #7
suicidaleggroll
LQ Guru
 
Registered: Nov 2010
Location: Colorado
Distribution: OpenSUSE, CentOS
Posts: 5,573

Rep: Reputation: 2142
Am I the only one that's amazed by how slow ALL of the speeds quoted by the OP are?

This is a 10 GB file being transferred across a 10 Gb link; theoretically it should take 8 seconds. Of course you'll [almost] never see that in real life, but I wouldn't expect it to take any longer than 30-60 seconds. The OP is seeing 477 seconds to transfer to another machine located just a few feet away. That's 21 MB/s on a 10 Gb link!

All of those speeds are way too slow, even ignoring the remote transfer. Methinks there's something wrong with the network, NIC, cables, etc.

For comparison's sake, I get somewhere around 200 MB/s over NFS or SCP (using arcfour) on my 10Gb link, and somewhere around 100 MB/s on a 1Gb link between neighboring machines. The single-transfer bandwidth on the 10Gb isn't all that much better than the 1Gb, but I can run five of those 200 MB/s transfers simultaneously on the 10Gb link and get somewhere around 9-9.5 Gb/s total throughput. If I was only getting 20 MB/s on my 10Gb network, I would be dropping everything and diving into the equipment to find out what's wrong. Never mind the 2 MB/s he's getting over the 70 mile link...
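
First thing I'd do is take NFS and scp out of the picture entirely and measure the raw TCP throughput, e.g. with iperf (the hostname is a placeholder):

Code:
# on the NFS server
iperf -s
# on each client, local and then remote
iperf -c nfs-server -t 30 -P 4    # 4 parallel streams for 30 seconds

If that also comes in far below line rate, the problem is in the network or NICs, not NFS.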

Last edited by suicidaleggroll; 04-29-2014 at 04:10 PM.
 
Old 05-02-2014, 03:05 PM   #8
bozzo99
LQ Newbie
 
Registered: Jul 2010
Location: Madison, Wisconsin
Distribution: RH
Posts: 4

Original Poster
Rep: Reputation: 0
Thank you for your responses.

The network is dedicated fiber.

While I used the transfer of a large file to test/measure the difference in performance between local and remote NFS clients, the filesystem serves many purposes, including acting as a common repository for files and directory structures used across all the client servers. I’ve grown used to the convenience of not having to synchronize them and of having changes take effect immediately.

There were a couple of suggestions for alternatives that may facilitate this, and I’ll have to take a look at them, although as a mere DBA my ability to drive the infrastructure here is limited.

I’d also have to admit that I may be too focused on determining whether my view (that the difference in performance is greater than it should be) is correct, and, if it is, how to diagnose the problem.

As to the issues raised by suicidaleggroll: the throughput I’m seeing locally IS significantly less than what he is reporting.
scp (using fiber-attached SAN disk on both servers): ~46 MB/sec
NFS (cp to SAN): ~21 MB/sec

Interestingly, the scp throughput only drops to ~42 MB/sec transferring to the remote site, while the NFS read throughput drops to ~2 MB/sec at the remote site.

The problem is that (to the best of my knowledge) I’m the only one “complaining” about network throughput in any context, and it has been most noticeable using NFS. The focus to date has been on trying to understand why the difference between scp and NFS is so large. I don’t know much about the internals of scp, but I understand NFS to be a very “chatty” protocol (e.g. 65 Kbyte block reads with verification for each).

Are there diagnostic tools/techniques I can use, or suggest to my system and network admins, for better analyzing this?
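
For instance, would something along these lines be a sensible starting point, or is there a better approach? (The interface name and capture path below are just placeholders.)

Code:
# confirm the negotiated mount options (rsize/wsize, tcp vs udp, NFS version)
grep nfs /proc/mounts
# per-operation RPC counts and retransmissions on the client
nfsstat -c
# detailed per-mount RPC timing statistics
cat /proc/self/mountstats
# capture NFS traffic during a test copy to hand to the network admins
tcpdump -i eth0 -s 0 -w /tmp/nfs.pcap port 2049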

Again, thank you for your feedback.
 
  


Tags
nfs, wan


