LinuxQuestions.org
Linux - Networking This forum is for any issue related to networks or networking.
Routing, network cards, OSI, etc. Anything is fair game.

08-21-2014, 04:48 AM   #1
TheTuxKeeper
LQ Newbie
 
Registered: Aug 2014
Posts: 2

Rep: Reputation: Disabled
Weird retransmission problem with NFSv3 over TCP between Debian wheezy and Netapp


Hello,

I am trying to understand a problem we have.
Our setup:
  • Debian wheezy VMs (on multiple ESXi 5.0 hosts) with vmxnet3 NICs, an NFS rootfs (based on LTSP) and other NFS mounts
  • NFS server: Netapp Ontap 8.1.3P3 (the rootfs and the other NFS mounts of the clients all come from this IP, so we usually have only two connections but more mounts)
  • 10GBASE network: ESXi hosts with CX4 (newer ones with Cat6), Netapp with fiber
  • everything in the same network (VLAN), no router
  • MTU is 9000 on the clients and on the Netapp
  • NFS mount options:
    Code:
    rw,noatime,nodiratime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,nolock,noacl,proto=tcp,timeo=100,retrans=360,sec=sys,mountvers=3,mountport=4046,mountproto=udp,local_lock=all
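
For readers who want to check the timeout settings, here is a minimal Python sketch that splits the option string above into a dictionary and converts `timeo` into seconds. According to nfs(5), `timeo` is given in tenths of a second, so `timeo=100` should mean the client waits 10 seconds before retrying a request. The function name `parse_mount_options` is hypothetical and only used for illustration.

```python
def parse_mount_options(options: str) -> dict:
    """Split a comma-separated mount option string into a dict.

    Options without a value (for example 'hard') map to True.
    """
    parsed = {}
    for item in options.split(","):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


# Option string copied from the post above.
OPTIONS = (
    "rw,noatime,nodiratime,vers=3,rsize=65536,wsize=65536,namlen=255,"
    "hard,nolock,noacl,proto=tcp,timeo=100,retrans=360,sec=sys,"
    "mountvers=3,mountport=4046,mountproto=udp,local_lock=all"
)

opts = parse_mount_options(OPTIONS)
# nfs(5): timeo is expressed in deciseconds (tenths of a second).
timeo_seconds = int(opts["timeo"]) / 10
print("proto:", opts["proto"], "timeo:", timeo_seconds, "s", "retrans:", opts["retrans"])
```

Note that these are RPC-level timeouts of the NFS client. They are separate from the TCP retransmission mechanisms the question below is about.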


The problem (see also the Wireshark CSV export captured on the NFS client: http://pastie.org/pastes/9490898/tex...sbkkculwwkszoq):
  1. Two packets of the NFS connections get lost (reason not yet found): sequence numbers 4251210111 and 4251219059.
  2. The Netapp sends duplicate ACKs with the sequence number of the first lost packet in the ACK field (as far as I know this should trigger a fast retransmit): seq 1681885773+ and ack 4251210111.
  3. The client sends more write requests, which the Netapp acknowledges with SACK (the ACK field still holds the sequence number of the lost packet): seq 1681885773 onwards.
  4. 12 s of nothing. The client VM is frozen, since nearly all filesystem operations hang (only two connections handle all the NFS traffic, and no other filesystems are in use except tmpfs).
  5. SYN from the client (trying to reestablish the connection).
  6. Another ACK from the Netapp with ack 4251210111 (still waiting for a retransmit).
  7. RST from the client (wants to reestablish the connection).
  8. 73 s of nothing (client VM still almost completely frozen).
  9. The connection is reestablished successfully.
  10. Apart from timeouts triggered in the applications, everything works again.
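
To make step 2 concrete, the following is a simplified Python sketch of the duplicate-ACK rule from RFC 5681: a sender is expected to fast-retransmit once it has seen three duplicate ACKs for the same sequence number. The function name and the list of ACKs are hypothetical; a real TCP stack also checks that the duplicates carry no data and do not change the window, which this sketch ignores.

```python
DUP_ACK_THRESHOLD = 3  # RFC 5681: the third duplicate ACK triggers fast retransmit


def fast_retransmit_points(acks):
    """Return the ACK numbers at which a sender should fast-retransmit.

    `acks` is the sequence of ACK numbers as received by the sender,
    in arrival order.
    """
    triggered = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == DUP_ACK_THRESHOLD:
                triggered.append(ack)
        else:
            last_ack = ack
            dup_count = 0
    return triggered


# Sequence numbers of the two lost packets, taken from the post.
FIRST_LOST = 4251210111
SECOND_LOST = 4251219059

# Hypothetical input: one original ACK followed by four duplicates.
acks_seen = [FIRST_LOST] * 5
retransmit_for = fast_retransmit_points(acks_seen)
gap_bytes = SECOND_LOST - FIRST_LOST
print(retransmit_for, gap_bytes)
```

The gap between the two lost sequence numbers is 8948 bytes. Assuming 20 bytes of IP header, 20 bytes of TCP header and 12 bytes of TCP timestamp option, that is exactly one full-size segment at MTU 9000, which would suggest that the two lost packets were consecutive segments.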

The question(s):
Why is there no retransmission of the two lost packets?
One of two mechanisms should have been triggered: fast retransmit or the retransmission timeout (RTO). Am I missing something?
Is this a bug?
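
As a rough illustration of why the 12 s of silence looks wrong, here is a Python sketch of the exponential RTO backoff described in RFC 6298 (the timer doubles after every timeout). The initial RTO of 200 ms and the cap of 120 s are assumptions based on the usual Linux minimum and maximum RTO; the real value depends on the measured round-trip time. The function name is hypothetical.

```python
def rto_retransmit_times_ms(initial_rto_ms, max_rto_ms, window_ms):
    """Return the times (in ms after the original send) at which
    RTO-driven retransmissions would fire within `window_ms`.

    The RTO doubles after each timeout, capped at `max_rto_ms`.
    """
    times = []
    rto = initial_rto_ms
    elapsed = rto
    while elapsed <= window_ms:
        times.append(elapsed)
        rto = min(rto * 2, max_rto_ms)
        elapsed += rto
    return times


# Assumed values: 200 ms initial RTO, 120 s cap, 12 s window from step 4.
schedule = rto_retransmit_times_ms(
    initial_rto_ms=200, max_rto_ms=120_000, window_ms=12_000
)
print(schedule)
```

Under these assumptions several timeout-driven retransmissions would be expected within the 12 s window, so a capture showing none at all points to something below the TCP stack, or to the stack itself.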

It happens quite randomly on multiple VMs on multiple ESXi hosts, but not on all VMs of an ESXi host. So far it has only happened on wheezy. We still have some squeeze VMs that don't have the problem (yet?).
Perhaps it really is a bug, since we had been using wheezy for longer than we have had the problem (or have discovered it). But I don't know how to debug it.

I hope someone can help me here. I have already dug deep into TCP and lost my way a little bit.

Regards
Daniel

Last edited by TheTuxKeeper; 08-22-2014 at 03:11 AM.
 
07-21-2017, 01:23 PM   #2
RikiRikRdo
LQ Newbie
 
Registered: Jul 2017
Posts: 1

Rep: Reputation: Disabled
Hey TuxKeeper.
I am seeing this exact behavior. Did you ever figure out what it was?
 
07-22-2017, 08:13 AM   #3
TheTuxKeeper
LQ Newbie
 
Registered: Aug 2014
Posts: 2

Original Poster
Rep: Reputation: Disabled
Hi,

Sorry, I forgot to write the solution here. But I found the ticket in our internal issue tracker!
It was a vmxnet3 driver bug (I think it had something to do with the offloading options).

Our solution was to switch from the VMware Tools to open-vm-tools. For Linux this is an official recommendation from VMware. They open-sourced the drivers, and these have been in the official kernel for some time now.
Uninstall the VMware Tools completely and check that all their modules are removed (they should be in /lib/modules/<kernel-version>/extra/; only the ones in /lib/modules/<kernel-version>/kernel/ should remain). Then install open-vm-tools. There should be an official package for the vmtoolsd daemon in Debian and in most other distributions (the drivers are already in the kernel and are merely overridden by the VMware Tools installation).
EDIT: don't forget to rebuild the initramfs (update-initramfs)! The old driver could still be in there.
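
The module check can be scripted. Below is a small Python sketch that lists vmxnet3 module files under a kernel's module directory and groups them by top-level subdirectory, so that leftovers in extra/ stand out. The file name pattern `vmxnet3*.ko*` and the function name are assumptions for illustration; other VMware Tools modules would need their own patterns.

```python
import platform
from pathlib import Path


def find_vmxnet3_modules(modules_root):
    """Group vmxnet3 module files below `modules_root` by their
    top-level subdirectory (for example 'extra' or 'kernel').

    Returns an empty dict if the directory does not exist.
    """
    root = Path(modules_root)
    found = {}
    if not root.is_dir():
        return found
    for path in sorted(root.rglob("vmxnet3*.ko*")):
        if not path.is_file():
            continue
        top = path.relative_to(root).parts[0]
        found.setdefault(top, []).append(path)
    return found


# Inspect the module tree of the currently running kernel.
current_root = Path("/lib/modules") / platform.release()
for top, paths in find_vmxnet3_modules(current_root).items():
    for path in paths:
        print(f"{top}: {path}")
```

If anything shows up under extra/ after the VMware Tools have been uninstalled, the old driver is probably still installed and may also still be inside the initramfs.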

I hope that helps you fix your issue!

Regards
Daniel

Last edited by TheTuxKeeper; 07-22-2017 at 08:16 AM.
 
  

