After a recent expansion of our GPFS storage (eight new NSD servers in another building, in addition to the four NSD [Network Shared Disk] servers we were already using), we are now seeing network slowdowns which netstat and tcpdump lead us to believe are due to a high number of retransmits.
After googling around a bit I discovered a tool available on RHEL/CentOS systems called dropwatch, which tells you where in the kernel packets are being dropped. I'm not sure about the relationship between dropped and retransmitted packets, but I got some interesting results from dropwatch that are peculiar to the NSD servers serving GPFS. We are still investigating our switches, but we noticed that certain GPFS and sysctl performance tweaks led to better throughput, so the problem might be host-side.
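To put a number on the retransmits rather than eyeballing netstat, one option is to sample the Tcp counters in /proc/net/snmp twice and compare RetransSegs against OutSegs over the interval. The following is only a minimal sketch under that assumption; the function names and the 5-second interval are illustrative, not anything taken from GPFS or dropwatch.

```python
import time


def tcp_counters(snmp_text):
    """Return the Tcp counters from /proc/net/snmp-style text as a dict."""
    # The file holds a header line and a value line per protocol,
    # both prefixed with the protocol name (here "Tcp:").
    tcp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    if len(tcp_lines) < 2:
        raise ValueError("no Tcp header/value line pair found")
    names, values = tcp_lines[0][1:], tcp_lines[1][1:]
    return dict(zip(names, (int(v) for v in values)))


def retransmit_ratio(before, after):
    """Retransmitted segments divided by segments sent between two samples."""
    sent = after["OutSegs"] - before["OutSegs"]
    retrans = after["RetransSegs"] - before["RetransSegs"]
    return retrans / sent if sent > 0 else 0.0


def sample_ratio(path="/proc/net/snmp", interval=5):
    """Read the counters twice, `interval` seconds apart (illustrative helper)."""
    with open(path) as f:
        before = tcp_counters(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = tcp_counters(f.read())
    return retransmit_ratio(before, after)
```

Calling sample_ratio() on an NSD server and on an unaffected host during the same transfer would give a rough side-by-side figure. It counts every TCP connection on the box, so it is an indicator only, not a per-flow measurement.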
A typical CentOS system on our network (I tried this on a few different servers) shows a few drops, which I assume are normal, like this (within a 5-second period):
Code:
2 drops at unix_stream_connect+1dc (0xffffffff814bdf1c)
1 drops at netlink_attachskb+398 (0xffffffff81454338)
13 drops at unix_dgram_poll+4cd (0xffffffff814bbf9d)
4 drops at unix_stream_connect+1dc (0xffffffff814bdf1c)
1 drops at netlink_unicast+251 (0xffffffff81454981)
1 drops at audit_log_lost+10b (0xffffffff810caf4b)
2 drops at neigh_event_ns+1b4 (0xffffffff81438544)
1 drops at netlink_attachskb+398 (0xffffffff81454338)
4 drops at unix_dgram_poll+4cd (0xffffffff814bbf9d)
5 drops at __netlink_create+e9 (0xffffffff81452bb9)
5 drops at netlink_unicast+251 (0xffffffff81454981)
1 drops at neigh_event_ns+1b4 (0xffffffff81438544)
8 drops at unix_dgram_poll+4cd (0xffffffff814bbf9d)
4 drops at unix_stream_connect+1dc (0xffffffff814bdf1c)
1 drops at arp_error_report+39 (0xffffffff8148c9a9)
1 drops at netlink_attachskb+398 (0xffffffff81454338)
The NSD servers serving GPFS (four Red Hat 5.8 + eight CentOS 6.2), however, consistently show something different, in high numbers, that appears on none of the other servers: netlabel_unlabel_acceptflg. I see this on pretty much all of the NSD servers (within a 5-second period):
Code:
1 drops at netlink_run_queue+ff
1 drops at netlink_broadcast+283
62 drops at netlabel_unlabel_acceptflg+7e4f032
64 drops at netlabel_unlabel_acceptflg+7e4f032
64 drops at netlabel_unlabel_acceptflg+7e4f032
1 drops at unix_stream_recvmsg+3fa
1 drops at unix_stream_recvmsg+3fa
16 drops at netlabel_unlabel_acceptflg+7e4f032
64 drops at netlabel_unlabel_acceptflg+7e4f032
64 drops at netlabel_unlabel_acceptflg+7e4f032
1 drops at unix_stream_recvmsg+3fa
2 drops at unix_stream_recvmsg+3fa
64 drops at netlabel_unlabel_acceptflg+7e4f032
1 drops at unix_stream_recvmsg+3fa
64 drops at netlabel_unlabel_acceptflg+7e4f032
57 drops at netlabel_unlabel_acceptflg+7e4f032
7 drops at netlabel_unlabel_acceptflg+7e52827
64 drops at netlabel_unlabel_acceptflg+7e4f032
2 drops at unix_stream_recvmsg+3fa
1 drops at unix_stream_recvmsg+3fa
64 drops at netlabel_unlabel_acceptflg+7e4f032
64 drops at netlabel_unlabel_acceptflg+7e4f032
1 drops at unix_stream_recvmsg+3fa
64 drops at netlabel_unlabel_acceptflg+7e4f032
64 drops at netlabel_unlabel_acceptflg+7e4f032
3 drops at netlink_broadcast+283
1 drops at unix_stream_recvmsg+3fa
2 drops at unix_stream_recvmsg+3fa
2 drops at netlink_broadcast+283
3 drops at unix_stream_recvmsg+3fa
64 drops at netlabel_unlabel_acceptflg+7e4f032
64 drops at netlabel_unlabel_acceptflg+7e4f032
11 drops at unix_stream_recvmsg+3fa
10 drops at unix_dgram_connect+531
1 drops at netlink_destroy_callback+11
5 drops at tcp_rcv_state_process+66
1 drops at unix_stream_recvmsg+3fa
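Dumps like the two above are easier to compare once the drops are totalled per symbol. The sketch below does that for text in the "N drops at symbol+offset" form shown here. It also flags symbols reported with a very large offset: 0x7e4f032 is over 100 MB past the start of netlabel_unlabel_acceptflg, far larger than any single function, so the address may not really belong to that symbol (possibly it falls inside a loadable module that dropwatch could not resolve). That reading is a guess, and the 0x10000 threshold is an arbitrary, hypothetical cut-off.

```python
import re
from collections import Counter

# Matches lines such as "62 drops at netlabel_unlabel_acceptflg+7e4f032";
# a trailing "(0x...)" address, when present, is ignored.
LINE_RE = re.compile(r"^\s*(\d+) drops at ([\w.]+)\+([0-9a-fA-F]+)")

# Hypothetical threshold: offsets beyond this are treated as suspicious.
SUSPICIOUS_OFFSET = 0x10000


def summarize(text):
    """Return (total drops per symbol, symbols seen with a suspicious offset)."""
    totals = Counter()
    suspicious = set()
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip "Code:" labels, blank lines, and anything else
        count = int(match.group(1))
        symbol = match.group(2)
        offset = int(match.group(3), 16)
        totals[symbol] += count
        if offset > SUSPICIOUS_OFFSET:
            suspicious.add(symbol)
    return totals, suspicious
```

Feeding it a saved capture, for example summarize(open("dropwatch.txt").read()) with dropwatch.txt being a hypothetical file name, and then calling totals.most_common() lists the heaviest drop points first.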
I googled netlabel_unlabel_acceptflg and all I came up with was something about "unlabeled packets", so I started searching on that, which led me to an Oracle page discussing RIPSO/CIPSO and packet labeling: http://docs.oracle.com/cd/E19109-01/...dhn/index.html. I have no idea whether that is even related, but I am really curious what the high number of drops at netlabel_unlabel_acceptflg indicates.
Any ideas or guesses?