Debian Squeeze: TCP stops working, UDP doesn't, with "unexpectedly shrunk window"

hnatt · 03-27-2012, 02:33 AM

Hi,
For some time I have had an issue that has finally made me angry enough to get to the bottom of it. Suddenly, web browsers, curl and wget stop working (that is, they refuse to load web pages), but BitTorrent and ping continue to work. That's why I conclude that TCP doesn't work but UDP does; please tell me if I'm wrong. In most but not all cases this is accompanied by messages in dmesg:

Code:

[114481.188607] TCP: Peer 192.162.164.1:33760/60908 unexpectedly shrunk window 3159965547:3159965552 (repaired)
[118194.106999] TCP: Peer 192.162.164.1:33760/36531 unexpectedly shrunk window 1431905130:1431905135 (repaired)
[118871.376599] TCP: Peer 192.162.164.1:33760/52432 unexpectedly shrunk window 1483300804:1483300809 (repaired)
[124194.158415] TCP: Peer 192.162.164.1:33760/59202 unexpectedly shrunk window 1945222584:1945222589 (repaired)
[125985.609105] TCP: Peer 80.234.42.237:10682/35066 unexpectedly shrunk window 3008698421:3008698426 (repaired)
[127195.970719] TCP: Peer 192.162.164.1:33760/51821 unexpectedly shrunk window 1421444199:1421444204 (repaired)
[130430.167525] TCP: Peer 192.162.164.1:33760/37678 unexpectedly shrunk window 415705804:415705809 (repaired)
[134511.031396] TCP: Peer 80.234.42.237:10682/46199 unexpectedly shrunk window 2395711739:2395711744 (repaired)
[136547.021033] TCP: Peer 80.234.42.237:10682/45208 unexpectedly shrunk window 3482768114:3482768119 (repaired)
[138152.067493] TCP: Peer 80.234.42.237:10682/52580 unexpectedly shrunk window 285955201:285955206 (repaired)
[143384.165370] TCP: Peer 192.162.164.1:33760/43433 unexpectedly shrunk window 1547569829:1547569834 (repaired)
[146863.178678] TCP: Peer 80.234.42.237:10682/41219 unexpectedly shrunk window 779253382:779253387 (repaired)
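To see which peers keep triggering these messages, a small shell sketch can tally them. It assumes the exact message format shown in the log, where the field after "Peer" looks like IP:PORT/LPORT (as far as I can tell from the kernel source, the remote address and port, then the local port). `shrunk_peers` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: count "unexpectedly shrunk window" messages per peer.
# Assumes lines look like:
#   [ts] TCP: Peer IP:PORT/LPORT unexpectedly shrunk window A:B (repaired)
# Prints "COUNT IP:PORT", most frequent peer first.
shrunk_peers() {
    awk '/unexpectedly shrunk window/ {
        for (i = 1; i < NF; i++)
            if ($i == "Peer") {
                split($(i + 1), a, "/")   # a[1] = IP:PORT, a[2] = LPORT
                count[a[1]]++
            }
    }
    END { for (p in count) print count[p], p }' | sort -rn
}

# Usage: dmesg | shrunk_peers
```

Fed the twelve lines quoted above, it would report 7 messages from 192.162.164.1:33760 and 5 from 80.234.42.237:10682.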

This happens once or twice a week, always when my BitTorrent client is running and my laptop has been left on for longer than a day. For a long time there was only one IP, 192.162.164.1, which made me suppose that this is some kind of attack on BitTorrent users, but this is truly a stab in the dark.

"/etc/init.d/networking restart" doesn't help, and neither does reconnecting with NetworkManager. The only thing I have found that helps is rebooting my laptop.

I connect to the Internet through a Wi-Fi router. Other machines connected to my router do not suffer from this issue.

If you need any other info to analyse my problem, tell me and I will write it down as soon as this happens again.

My questions are: what is this, how do I protect myself from it, and how can I regain my Internet connection without rebooting the whole system?

unSpawn · 03-27-2012, 07:07 PM

The kernel encountered a problem because the remote site changed its advertised window size for no apparent reason, and the kernel fixed this all by itself. It's an informational-level message, not a warning. Watching wireless-tools output and saving packet captures (something like 'tcpdump -i [DEVICE] -s 0 -n -nn -N -w /path/to/file') may (or may not) show clues. To me, Wi-Fi has always come across as rather fragile.
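Since the failure only shows up after a day or more of uptime, a capture started in advance has to be bounded or it will fill the disk. A sketch, assuming tcpdump is installed and run as root: the standard -C 100 -W 10 options make tcpdump rotate through at most ten files of roughly 100 MB each, overwriting the oldest. The helper names and the /var/tmp paths are hypothetical.

```shell
#!/bin/sh
# peer_filter IP:PORT -> BPF expression matching TCP traffic with that peer.
peer_filter() {
    printf 'host %s and tcp port %s\n' "${1%:*}" "${1##*:}"
}

# ring_capture DEV FILE [FILTER...]: ring-buffer capture, at most 10 files
# of ~100 MB each, so days of uptime don't fill the disk.
ring_capture() {
    dev=$1
    file=$2
    shift 2
    tcpdump -i "$dev" -s 0 -n -C 100 -W 10 -w "$file" "$@"
}

# Usage (as root): capture everything, or only the suspicious peer:
#   ring_capture wlan0 /var/tmp/wlan0.pcap
#   ring_capture wlan0 /var/tmp/peer.pcap "$(peer_filter 192.162.164.1:33760)"
```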

WizadNoNext · 03-30-2012, 11:38 AM

You have simply run into TCP memory pool issues. How much memory do you have? 64 MiB? You should never run into such a problem; it indicates that the TCP memory pool is exhausted and the kernel has started aggressively cutting TCP connections. You could raise the limits, but be very careful: do not add more than 25% to them, as they are stated in pages, not bytes!

You can try switching to a less aggressive congestion control algorithm, such as westwood or illinois:

Code:

modprobe tcp-westwood
echo westwood >/proc/sys/net/ipv4/tcp_congestion_control

The second command will fail with sudo: you have to be root to run it, because with sudo the redirection is still done by your shell as a normal user!
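If you'd rather stay with sudo, a common way around the redirection problem is to let tee, running as root, open the /proc file. A sketch; `set_cc` and `cc_in_list` are hypothetical helper names, and it checks tcp_available_congestion_control first so a typo doesn't go unnoticed. `sudo sysctl -w net.ipv4.tcp_congestion_control=westwood` would do the same job.

```shell
#!/bin/sh
# cc_in_list NAME "LIST": succeed if NAME is a word in the space-separated LIST.
cc_in_list() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# set_cc NAME: load the module if needed, then switch the default algorithm.
# tee runs under sudo, so the write to /proc happens as root.
set_cc() {
    sudo modprobe "tcp_$1" 2>/dev/null   # harmless failure for built-ins
    avail=$(cat /proc/sys/net/ipv4/tcp_available_congestion_control)
    if ! cc_in_list "$1" "$avail"; then
        echo "set_cc: '$1' not available (have: $avail)" >&2
        return 1
    fi
    echo "$1" | sudo tee /proc/sys/net/ipv4/tcp_congestion_control >/dev/null
}

# Usage: set_cc westwood   (or: set_cc illinois)
```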

Westwood is actually the best congestion control for wireless, but it cuts the TCP window more aggressively than illinois, which also suits wireless (although wireless is not its main target).

P.S. I have never run into such a problem. My two servers, desktop and laptop are constantly on; there was a time when KTorrent was running on my desktop 24/7/365, and I never had even the slightest problem.

hnatt · 03-30-2012, 01:16 PM

unSpawn, thanks! It was just my blind guess that those messages pointed directly to the problem. I'm quite a newbie, and all I knew was "dmesg | tail".

WizadNoNext, thank you, too! I will read up on congestion control and the TCP memory pool. That's the kind of answer I wanted to hear: something to start with, because I didn't know where to dig. I have a relatively modern laptop with 3 GB of RAM, so there's no lack of memory here. Could it be caused by a buggy BitTorrent client? I use qBittorrent. Many thanks again.

hnatt · 03-30-2012, 01:21 PM

I won't mark this thread as SOLVED until I know for sure that the problem is solved, and I will certainly post when that happens. Meanwhile, any guesses are still welcome.

unSpawn · 03-30-2012, 01:44 PM

Quote:

Originally Posted by WizadNoNext

You have simply run into TCP memory pool issues.

Could you please point to a document that supports your claim?

WizadNoNext · 03-30-2012, 04:58 PM

Then with such a large amount of memory, you should get a fair amount of memory for TCP.
For me it is (tcp_mem):

Code:

48276   64370   96552

The amounts are in pages (4KiB for IA32/AMD64).
Description:

Quote:

tcp_mem - vector of 3 INTEGERs: min, pressure, max
min: below this number of pages TCP is not bothered about its
memory appetite.

pressure: when amount of memory allocated by TCP exceeds this number
of pages, TCP moderates its memory consumption and enters memory
pressure mode, which is exited when memory consumption falls
under "min".

max: number of pages allowed for queueing by all TCP sockets.

Defaults are calculated at boot time from amount of available
memory.

These settings are system-wide, for all TCP connections!
Yours could be bigger.
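To see what those page counts mean in memory terms, here is a small sketch. `pages_to_mib` is a hypothetical helper name; in the usage line it reads the real page size with getconf PAGESIZE rather than assuming 4 KiB.

```shell
#!/bin/sh
# pages_to_mib PAGESIZE N...: print each page count as whole MiB (rounded down).
pages_to_mib() {
    psz=$1
    shift
    out=
    for n in "$@"; do
        out="$out${out:+ }$(( $n * $psz / 1048576 ))"
    done
    echo "$out"
}

# Usage: pages_to_mib "$(getconf PAGESIZE)" $(cat /proc/sys/net/ipv4/tcp_mem)
```

With 4 KiB pages, the tcp_mem values above come to 188, 251 and 377 MiB.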

Another set of parameters is tcp_wmem (but it shouldn't be a problem in your case). These values ARE in bytes! My (automatic) settings are:

Code:

4096    16384   2059840

And description:

Quote:

tcp_wmem - vector of 3 INTEGERs: min, default, max
min: Amount of memory reserved for send buffers for TCP sockets.
Each TCP socket has rights to use it due to fact of its birth.
Default: 1 page

default: initial size of send buffer used by TCP sockets. This
value overrides net.core.wmem_default used by other protocols.
It is usually lower than net.core.wmem_default.
Default: 16K

max: Maximal amount of memory allowed for automatically tuned
send buffers for TCP sockets. This value does not override
net.core.wmem_max. Calling setsockopt() with SO_SNDBUF disables
automatic tuning of that socket's send buffer size, in which case
this value is ignored.
Default: between 64K and 4MB, depending on RAM size.

These settings are per connection (each connection is counted separately).

These are the unchanged settings (set by the kernel) on my home server with 2 GiB of RAM.

unSpawn: it is just a guess, but look closely: TCP suddenly dies and won't work any more. If you know of any other explanation...

unSpawn · 03-31-2012, 04:39 AM

Quote:

Originally Posted by WizadNoNext

it is just a guess

As far as I am aware, computing is binary. This means there should be no reason to "worry", "think" or "guess", as conditions such as kernel runtime parameters for the machine ('uname -r; sysctl net.ipv4'), IP statistics ('cat /proc/net/sockstat') and memory object usage ('( grep sharedavail /proc/slabinfo|tr -d '#'; grep -i tcp /proc/slabinfo; grep -i udp /proc/slabinfo ) | column -t;') can be tested to be true or false. As far as I'm aware, the tcp_wmem and tcp_rmem settings you refer to do not require tuning unless a distinct need arises. IMHO such a conclusion should be supported by the results of a proper diagnosis and not "just a guess".

WizadNoNext · 03-31-2012, 10:31 AM

unSpawn
What could it be, then? I am actually quite curious about this problem. My guess does not explain the unexpected window shrinks, but those could be down to the other side of the connection, due to lost packets.

unSpawn · 04-01-2012, 04:40 AM

Quote:

Originally Posted by WizadNoNext

What could it be, then?

Now that is the right question.

First of all, you should establish a baseline, meaning the OP should provide details about the distribution (kernel), network stack information ('sysctl net.ipv4'), network device configuration (wherever that resides) and an indication of whether any sysctls were tweaked.

Second, since the OP is trying to load web pages and failing, when the situation arises he could first run 'dmesg' to list messages, run 'iwconfig' (or whichever tool in the wireless-tools package exposes the most information) in a loop to list changing network details, and start 'tcpdump' to save traffic. With that in place he should then run network diagnostics. Since, as he said, 2 out of 3 IP suite protocols seem to work, running 'tcptraceroute' (and not plain traceroute) and retrieving a page with 'curl' could help gather enough information for you to run the packet capture he might share through Wireshark.
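The grab-everything-at-once part of that could be scripted so it runs in one go the next time web pages stop loading. A sketch under assumptions: it is run as root, any of the tools may be missing (missing ones are just noted), it takes a single iwconfig snapshot rather than a loop, and www.debian.org is only an example target. `collect_diag` and `run_to` are hypothetical names. The packet capture itself is best started beforehand, since it has to be running when things break.

```shell
#!/bin/sh
# run_to FILE CMD [ARGS...]: run CMD, saving stdout and stderr to FILE.
# A missing tool is recorded in FILE instead of aborting the snapshot.
run_to() {
    out=$1
    shift
    if command -v "$1" >/dev/null 2>&1; then
        "$@" >"$out" 2>&1
    else
        echo "$1: not installed" >"$out"
    fi
}

# collect_diag [DIR]: snapshot kernel, device and network state into DIR.
collect_diag() {
    dir=${1:-/var/tmp/netdiag-$(date +%Y%m%d-%H%M%S)}
    mkdir -p "$dir" || return 1
    run_to "$dir/uname.txt"     uname -r
    run_to "$dir/dmesg.txt"     dmesg
    run_to "$dir/sysctl.txt"    sysctl -a
    run_to "$dir/sockstat.txt"  cat /proc/net/sockstat
    run_to "$dir/netstat-s.txt" netstat -s
    run_to "$dir/iwconfig.txt"  iwconfig
    # Example target only: any site that normally loads will do.
    run_to "$dir/curl.txt" curl -sv --max-time 20 -o /dev/null http://www.debian.org/
    run_to "$dir/tcptraceroute.txt" tcptraceroute www.debian.org 80
    echo "saved to $dir"
}

# Usage (as root, right after the failure): collect_diag
```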

hnatt · 04-01-2012, 08:03 AM

Code:

# uname -r
2.6.32-5-amd64

Code:

# sysctl net.ipv4
error: "Invalid argument" reading key "net.ipv4"

What's wrong here? Should the command be different on my distribution?

Code:

# cat /proc/net/sockstat
sockets: used 650
TCP: inuse 40 orphan 5 tw 2 alloc 50 mem 10
UDP: inuse 19 mem 9
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0

Code:

( grep sharedavail /proc/slabinfo|tr -d '#'; grep -i tcp /proc/slabinfo; grep -i udp /proc/slabinfo ) | column -t;
name           <active_objs>  <num_objs>  <objsize>  <objperslab>  <pagesperslab>  :  tunables  <limit>  <batchcount>  <sharedfactor>  :  slabdata  <active_slabs>  <num_slabs>  <sharedavail>
tw_sock_TCPv6  50             50          320        25            2               :  tunables  0        0             0               :  slabdata  2               2            0
TCPv6          38             51          1856       17            8               :  tunables  0        0             0               :  slabdata  3               3            0
tw_sock_TCP    64             64          256        32            2               :  tunables  0        0             0               :  slabdata  2               2            0
TCP            60             133         1664       19            8               :  tunables  0        0             0               :  slabdata  7               7            0
UDPLITEv6      0              0           1024       32            8               :  tunables  0        0             0               :  slabdata  0               0            0
UDPv6          64             64          1024       32            8               :  tunables  0        0             0               :  slabdata  2               2            0
UDP-Lite       0              0           832        39            8               :  tunables  0        0             0               :  slabdata  0               0            0
UDP            78             78          832        39            8               :  tunables  0        0             0               :  slabdata  2               2            0

Right now my connection is OK, so sorry if I pasted something that is not useful. I'll remember what I need to do when the failure happens again.

unSpawn · 04-01-2012, 10:55 AM

Quote:

Originally Posted by hnatt

Code:

error: "Invalid argument" reading key "net.ipv4"

Does 'sysctl net.ipv6' return anything?

Quote:

Originally Posted by hnatt

Code:

# cat /proc/net/sockstat
sockets: used 650
TCP: inuse 40 orphan 5 tw 2 alloc 50 mem 10
UDP: inuse 19 mem 9

And if you're on IPv6 then you want /proc/net/sockstat6 as well, or run 'netstat -s' for human-readable output.

Quote:

Originally Posted by hnatt

Right now my connection is OK, so sorry if I pasted something that is not useful. I'll remember what I need to do when the failure happens again.

No, it's OK. Basically, you want to grab as much network-related information as possible, as quickly as possible, since this is apparently a transient situation: kernel tunables, device configuration, network statistics and traffic captures.

hnatt · 04-01-2012, 05:37 PM

Quote:

Originally Posted by unSpawn

Does 'sysctl net.ipv6' return anything?

Same error. Maybe I need to install some packages?

Quote:

Originally Posted by unSpawn

And if you're on IPv6 then you want /proc/net/sockstat6 as well, or run 'netstat -s' for human-readable output.

No, it's IPv4.

Today I ran into this problem again. Here is the info I managed to gather:

Code:

# ifconfig wlan0
wlan0     Link encap:Ethernet  HWaddr ##:##:##:##:##:##  
          inet addr:192.168.1.100  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::226:82ff:fedf:27b4/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:18725404 errors:0 dropped:0 overruns:0 frame:0
          TX packets:13304772 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:19863677094 (18.4 GiB)  TX bytes:4728786052 (4.4 GiB)

# iwconfig wlan0
wlan0     IEEE 802.11bgn  ESSID:"###############"  
          Mode:Managed  Frequency:2.432 GHz  Access Point: ##:##:##:##:##:##   
          Bit Rate=48 Mb/s   Tx-Power=19 dBm   
          Retry  long limit:7   RTS thr:off   Fragment thr:off
          Encryption key:7369-6567-5F6F-6465-725F-746F-64
          Power Management:off
          Link Quality=70/70  Signal level=-38 dBm  
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:0  Invalid misc:0   Missed beacon:0

# cat /proc/net/sockstat
sockets: used 636
TCP: inuse 32 orphan 2 tw 0 alloc 42 mem 10
UDP: inuse 21 mem 10
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0

# ( grep sharedavail /proc/slabinfo | tr -d '#'; grep -i tcp /proc/slabinfo; grep -i udp /proc/slabinfo ) | column -t
name           <active_objs>  <num_objs>  <objsize>  <objperslab>  <pagesperslab>  :  tunables  <limit>  <batchcount>  <sharedfactor>  :  slabdata  <active_slabs>  <num_slabs>  <sharedavail>
tw_sock_TCPv6  50             50          320        25            2               :  tunables  0        0             0               :  slabdata  2               2            0
TCPv6          38             51          1856       17            8               :  tunables  0        0             0               :  slabdata  3               3            0
tw_sock_TCP    64             64          256        32            2               :  tunables  0        0             0               :  slabdata  2               2            0
TCP            55             152         1664       19            8               :  tunables  0        0             0               :  slabdata  8               8            0
UDPLITEv6      0              0           1024       32            8               :  tunables  0        0             0               :  slabdata  0               0            0
UDPv6          64             64          1024       32            8               :  tunables  0        0             0               :  slabdata  2               2            0
UDP-Lite       0              0           832        39            8               :  tunables  0        0             0               :  slabdata  0               0            0
UDP            78             78          832        39            8               :  tunables  0        0             0               :  slabdata  2               2            0
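This output already allows testing the memory-pool theory. As I understand it, the "mem" value on the TCP line of /proc/net/sockstat is counted in pages, the same unit as tcp_mem, so it can be compared directly with tcp_mem's "pressure" threshold. A sketch with hypothetical helper names:

```shell
#!/bin/sh
# tcp_pages_in_use: read sockstat text on stdin, print the TCP line's "mem" field.
tcp_pages_in_use() {
    awk '$1 == "TCP:" { for (i = 2; i < NF; i++) if ($i == "mem") print $(i + 1) }'
}

# tcp_mem_check: report TCP pages in use against tcp_mem (min pressure max);
# returns non-zero once usage reaches the "pressure" threshold.
tcp_mem_check() {
    used=$(tcp_pages_in_use < /proc/net/sockstat)
    set -- $(cat /proc/sys/net/ipv4/tcp_mem)
    echo "TCP pages in use: $used (min $1, pressure $2, max $3)"
    [ "$used" -lt "$2" ]
}

# Usage: tcp_mem_check || echo "TCP is under memory pressure"
```

In the failure-time sockstat output above, the TCP line shows mem 10. If that field really is in pages, it is nowhere near the tens of thousands of pages in the tcp_mem values posted earlier in the thread, which would argue against pool exhaustion.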

Unfortunately, I found out that I didn't have the tcpdump package installed at that moment, and I hadn't read your last post where you mentioned 'netstat -s', so I could not gather this potentially useful information this time.

One more detail: when I plugged in an Ethernet cable and turned off the Wi-Fi card, the connection did not come back. As far as I could tell, the symptoms stayed the same: curl, wget and browsers don't work, while ping, BitTorrent and the ICQ client do. So I wonder whether this problem really has anything to do with the wireless connection. Then again, that is just another blind guess, since there is no reason Wi-Fi couldn't cause a problem that isn't fixed by simply turning off Wi-Fi and switching to Ethernet.

WizadNoNext · 04-02-2012, 05:31 AM

The answer is simple: /proc/sys/net/ipv4 is a directory, not a file! You can neither set a directory nor get its value.
Query individual keys instead, for instance:

Code:

sysctl net.ipv4.tcp_rmem

If you wish to see all ipv4 values:

Code:

sysctl -a | grep ipv4

For ease of use (scrolling):

Code:

sysctl -a | grep ipv4 | less

You can even get all net values:

Code:

sysctl -a | grep net | less

Browsing it without less (or a similar program) would be quite awkward.
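A slightly stricter variant of the same idea: grep ipv4 matches any line containing that string anywhere, including in a value, while anchoring on the key prefix lists exactly one subtree. A sketch; `sysctl_subtree` is a hypothetical helper that simply filters "key = value" lines as printed by sysctl -a.

```shell
#!/bin/sh
# sysctl_subtree PREFIX: from "key = value" lines on stdin,
# keep only keys under PREFIX (e.g. net.ipv4 keeps net.ipv4.*).
sysctl_subtree() {
    awk -v p="$1." 'index($0, p) == 1'
}

# Usage: sysctl -a 2>/dev/null | sysctl_subtree net.ipv4 | less
```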

Actually, it seems that TCP is getting overloaded and either drops everything or simply stops working. I tried to work out which module is responsible for TCP, but either I was too lazy or it is compiled into the kernel. If it is compiled into the kernel and crashes, you have no choice but to reboot, as there would be no other fix.

I just checked the Makefile: it is built in, with no option to build it as a module. So somehow your TCP stack dies (crashes), and then the only option is to reboot. That should never happen!

Maybe try Linux kernel 3.2.13 or 3.3 and see whether it happens again. BTW, what kernel version are you running? Maybe there is some bug and you have run into it.

P.S. I have two servers; when I had just one, all services ran on it. I never had any problem, and I can assure you that from time to time I overloaded both TCP and UDP (FTP, NFS, Samba, proxy, DNS, at least three SSH connections always running, copying over FTP, NFS, Samba and SSH, and sometimes compiling several programs at once, with up to four kernel source trees on the server and the compilation running on the desktop). I have never run into such a problem. Something is terribly wrong with your usage, your connection or your kernel. It should never happen: the kernel should be able to counter such problems before they become serious.

hnatt · 04-02-2012, 08:47 AM

Quote:

Originally Posted by WizadNoNext

What kernel version are you running? Maybe there is some bug and you have run into it.

Code:

# uname -r
2.6.32-5-amd64

It is from the default Debian Squeeze repository (Squeeze is the stable branch at the moment), and the problem has persisted through several updates of the "linux-image" package, so I believe it's more likely something wrong with my configuration or hardware.

I must confess that I once tried to learn traffic analysis tools like Wireshark, but soon ran out of spare time and gave up. Maybe I broke something while thoughtlessly configuring Wireshark and the like?

Here's the output of sysctl -a | grep net.ipv: http://pastebin.com/pT0a2UgX