Rate limiting SSH via netfilter hashlimit module causes packet_write_wait: Broken pipe

witiko · 05-08-2018, 06:48 PM

I am maintaining a remote server over SSH. To ensure that my maintenance has only a minor impact on the server's network speed, I rate-limit my outbound traffic to 200 KiB/s by dropping packets using the following ip6tables rule:

Code:

# ip6tables -A INPUT -p tcp -m hashlimit --hashlimit-above 200kb/s -m tcp --destination 3ffe:ffff::dead:beef --dport 22 -j DROP

# ip6tables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
DROP       tcp      anywhere             3ffe:ffff::dead:beef  limit: above 200kb/s tcp dpt:22

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

However, when I saturate the link, the SSH sessions now drop at predictable 15-minute intervals:

Code:

$ for TRIAL in `seq 1 5`
> do
>     yes | dd status=progress if=/dev/stdin bs=1k count=$((500*1024)) 2>dd.$TRIAL.log |
>     ssh -vvv remotehost 'cat >/dev/null' 2>&1 |
>     while read LINE
>     do
>         printf '%s\t%s\n' `date +%H:%M:%S` "$LINE"
>     done | tee output.$TRIAL.log
> done

$ tail <output.1.log
20:51:53        debug2: channel 4: rcvd adjust 131072
20:51:54        debug2: channel 4: rcvd adjust 131072
20:51:55        debug2: channel 4: rcvd adjust 131072
20:51:55        debug2: channel 4: rcvd adjust 131072
20:51:56        debug2: channel 4: rcvd adjust 131072
20:51:56        debug2: channel 4: rcvd adjust 131072
20:51:57        debug2: channel 4: rcvd adjust 131072
20:51:58        debug2: channel 4: rcvd adjust 131072
20:51:58        debug3: send packet: type 1
20:51:58        packet_write_wait: Connection to 3ffe:ffff::dead:beef port 22: Broken pipe

$ for TRIAL in `seq 2 5`; do tail -n 1 <output.$TRIAL.log; done
21:07:34        packet_write_wait: Connection to 3ffe:ffff::dead:beef port 22: Broken pipe
21:23:11        packet_write_wait: Connection to 3ffe:ffff::dead:beef port 22: Broken pipe
21:38:47        packet_write_wait: Connection to 3ffe:ffff::dead:beef port 22: Broken pipe
21:54:24        packet_write_wait: Connection to 3ffe:ffff::dead:beef port 22: Broken pipe

$ for TRIAL in `seq 1 5`; do cat <dd.$TRIAL.log; echo; done
190336000 bytes (190 MB, 182 MiB) copied, 925.446 s, 206 kB/s
190317568 bytes (190 MB, 182 MiB) copied, 925.541 s, 206 kB/s
190258176 bytes (190 MB, 181 MiB) copied, 925.136 s, 206 kB/s
190503936 bytes (191 MB, 182 MiB) copied, 926.104 s, 206 kB/s
190619648 bytes (191 MB, 182 MiB) copied, 926.24 s, 206 kB/s
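
To make the "15 minutes" concrete, here is a minimal sketch that computes the gaps between the disconnect timestamps listed above. The helper names to_seconds and intervals are made up for this illustration, and the sketch assumes all timestamps fall on the same day:

```shell
# Convert an HH:MM:SS timestamp to seconds since midnight.
# awk is used so that fields like "08" are read as decimal, not octal.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Print the gap in seconds between consecutive timestamps given as arguments.
# Assumption: no midnight wrap-around between the first and last timestamp.
intervals() {
    prev=
    for ts in "$@"
    do
        cur=`to_seconds "$ts"`
        [ -n "$prev" ] && echo $((cur - prev))
        prev=$cur
    done
}

# Disconnect times of trials 1 to 5, taken from the logs above.
intervals 20:51:58 21:07:34 21:23:11 21:38:47 21:54:24
```

The gaps come out at 936 to 937 seconds, slightly more than the roughly 925 seconds that dd reports for each trial.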

On the remote side, the executed commands still hang, so the server does not detect that the session has dropped. This tells me this is a client issue:

Code:

$ ssh remotehost ps ax | grep -F 'cat >/dev/null'
 6999 ?        Ss     0:00 bash -c cat >/dev/null
13084 ?        Ss     0:00 bash -c cat >/dev/null
13425 ?        Ss     0:00 bash -c cat >/dev/null
13593 ?        Ss     0:00 bash -c cat >/dev/null
13779 ?        Ss     0:00 bash -c cat >/dev/null

If I rate-limit the data I send to SSH to 200 KiB/s with pv and loosen the iptables rule to 300kb/s, the SSH sessions no longer drop, so I use that as my workaround until I have found a solution:

Code:

# ip6tables -D INPUT -p tcp -m hashlimit --hashlimit-above 200kb/s -m tcp --destination 3ffe:ffff::dead:beef --dport 22 -j DROP

# ip6tables -A INPUT -p tcp -m hashlimit --hashlimit-above 300kb/s -m tcp --destination 3ffe:ffff::dead:beef --dport 22 -j DROP

$ for TRIAL in `seq 6 10`
> do
>     yes | dd status=progress if=/dev/stdin bs=1k count=$((500*1024)) 2>dd.$TRIAL.log |
>     pv -q -L 200k | ssh -vvv remotehost 'cat >/dev/null' 2>&1 |
>     while read LINE
>     do
>         printf '%s\t%s\n' `date +%H:%M:%S` "$LINE"
>     done | tee output.$TRIAL.log
> done

$ tail <output.6.log
22:48:14
22:48:14        debug1: channel 3: free: port listener, nchannels 1
22:48:14        debug3: channel 3: status: The following connections are open:
22:48:14
22:48:14        debug1: fd 0 clearing O_NONBLOCK
22:48:14        debug1: fd 1 clearing O_NONBLOCK
22:48:14        debug1: fd 2 clearing O_NONBLOCK
22:48:14        Transferred: sent 524986928, received 94512 bytes, in 2925.9 seconds
22:48:14        Bytes per second: sent 179429.8, received 32.3
22:48:14        debug1: Exit status 0

$ for TRIAL in `seq 6 10`; do tail -n 1 <dd.$TRIAL.log; done
524288000 bytes (524 MB, 500 MiB) copied, 2919.03 s, 180 kB/s
524288000 bytes (524 MB, 500 MiB) copied, 2559.03 s, 205 kB/s
524288000 bytes (524 MB, 500 MiB) copied, 2644.5 s, 198 kB/s
524288000 bytes (524 MB, 500 MiB) copied, 2559.03 s, 205 kB/s
524288000 bytes (524 MB, 500 MiB) copied, 2559.01 s, 205 kB/s

This tells me that the SSH session drops have something to do with SSH control messages not getting through, but why would this be the case?

The TCP keepalive packets are likely dropped, but the kernel does not send the first keepalive packet until after two hours, long after my SSH sessions have dropped at 15 minutes, so this is unlikely to be the cause of my problem (/proc/sys/net/ipv4/tcp_keepalive_time applies to IPv6 as well as IPv4):

Code:

$ cat /proc/sys/net/ipv4/tcp_keepalive_time
7200
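
For a rough idea of how late TCP keepalive would notice a dead peer, the sketch below adds the probe phase to the initial delay. Only the 7200 comes from the output above; the probe interval of 75 seconds and the probe count of 9 are the usual Linux defaults and are assumptions here, and keepalive_deadline is a made-up helper name:

```shell
# Seconds until TCP keepalive gives up on a silent peer:
# initial delay + probe interval * number of probes.
keepalive_deadline() {
    echo $(( $1 + $2 * $3 ))
}

# 7200 is from this system; 75 and 9 are assumed Linux defaults.
keepalive_deadline 7200 75 9

# On a live system, the three values can be read from:
#   /proc/sys/net/ipv4/tcp_keepalive_time
#   /proc/sys/net/ipv4/tcp_keepalive_intvl
#   /proc/sys/net/ipv4/tcp_keepalive_probes
```

Under these assumptions the result is well over two hours, which is far beyond the 15 minutes observed.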

The SSH client is set to time out if it does not receive a response to three consecutive SSH keepalive messages, which are sent at 5-minute intervals. Judging by the timing alone, this seems to be a much more likely cause, but the SSH debug output above does not indicate that the client is sending these messages. There is only a bunch of messages at the beginning (authentication, opening a channel, etc.) and then a disconnect message after 15 minutes, with nothing in between:

Code:

$ grep 'send packet' <output.1.log
20:36:33        debug3: send packet: type 20
20:36:33        debug3: send packet: type 30
20:36:33        debug3: send packet: type 21
20:36:33        debug3: send packet: type 5
20:36:33        debug3: send packet: type 50
20:36:33        debug3: send packet: type 50
20:36:33        debug3: send packet: type 50
20:36:33        debug3: send packet: type 90
20:36:33        debug3: send packet: type 80
20:36:33        debug3: send packet: type 98
20:36:33        debug3: send packet: type 98
20:51:58        debug3: send packet: type 1
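
As a cross-check of the timing argument, here is a minimal sketch that multiplies ServerAliveInterval by ServerAliveCountMax, which is approximately how long the client waits before giving up on an unresponsive server. The function name alive_timeout is made up for this illustration; it reads "key value" lines on stdin, which is the format that ssh -G prints on OpenSSH 6.8 or newer:

```shell
# Read "key value" lines on stdin and print
# ServerAliveInterval * ServerAliveCountMax in seconds.
# Keys are matched case-insensitively because ssh -G prints them lowercased.
alive_timeout() {
    awk '
        tolower($1) == "serveraliveinterval" { interval = $2 }
        tolower($1) == "serveralivecountmax" { count = $2 }
        END { print interval * count }
    '
}

# Values from the client configuration used in this post.
printf 'ServerAliveInterval 300\nServerAliveCountMax 3\n' | alive_timeout

# Against the effective configuration of a real host:
# ssh -G remotehost | alive_timeout
```

With the values from this post the product is 900 seconds, which is close to the roughly 925 seconds after which each trial drops.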

Not only that, but if I replace the remote command cat >/dev/null with tee >/dev/null, the output is echoed back to me and the SSH session still drops, so receiving server responses does not seem to be the problem. In the opposite direction, the server does not send SSH keepalive messages at all:

Code:

$ cat .ssh/config
Host remotehost
User username
Hostname 3ffe:ffff::dead:beef
ControlMaster auto
ControlPath /var/tmp/remotehost.socket
TCPKeepAlive yes
ServerAliveInterval 300
ServerAliveCountMax 3

$ ssh remotehost cat /etc/ssh/sshd_config
PasswordAuthentication no
TCPKeepAlive yes
ClientAliveInterval 0

I would appreciate your input. After reading the sticky, I figured this question would qualify as a networking question, because the issue is related to fine-tuning iptables and SSH, but please let me know if you think the question would be more at home elsewhere.

BillT440 · 05-10-2018, 04:07 PM

I'm not really sure what you're doing - are you leaving the session idle and you drop after some time? If so, maybe try keeping the ssh session going by periodically sending data through the encrypted channel. I have to do this at work because some systems are in a DMZ and the firewall team sets idle timers lower than what the OS keepalives would do.

from https://patrickmn.com/aside/how-to-k...-ssh-sessions:
To enable the keep alive system-wide (root access required), edit /etc/ssh/ssh_config; to set the settings for just your user, edit ~/.ssh/config (create the file if it doesn’t exist). Insert the following:
Host *
ServerAliveInterval 300
ServerAliveCountMax 2
You can also make your OpenSSH server keep alive all connections with clients by adding the following to /etc/ssh/sshd_config:
ClientAliveInterval 300
ClientAliveCountMax 2
These settings will make the SSH client or server send a null packet to the other side every 300 seconds (5 minutes), and give up if it doesn’t receive any response after 2 tries, at which point the connection is likely to have been discarded anyway.

witiko · 05-11-2018, 09:53 AM

Quote:

Originally Posted by BillT440

I'm not really sure what you're doing - are you leaving the session idle and you drop after some time?

Thank you for the response. I am not leaving the session idle. I am sending data to the other host (using the yes command) as fast as the SSH flow control and my iptables rule (rate limiting to 200 KiB/s) allow. This causes the session to drop after fifteen minutes. If I send data at 100 KiB/s, then the session does not drop, which leads me to believe that the keepalive messages are not getting through. I am investigating why that happens.

Quote:

Originally Posted by BillT440

To enable the keep alive system-wide (root access required), edit /etc/ssh/ssh_config; to set the settings for just your user, edit ~/.ssh/config (create the file if it doesn’t exist). Insert the following: ...

As you can see in the last code block of the original post, I configured the client to send both TCP and SSH keepalive messages.

witiko · 05-14-2018, 04:42 AM

Due to lack of attention, I cross-posted this question to superuser.com.

Turbocapitalist · 05-14-2018, 11:19 AM

Does it make a difference if you use the REJECT target instead of DROP?

Code:

# ip6tables -A INPUT -p tcp -m hashlimit --hashlimit-above 200kb/s \
        -m tcp --destination 3ffe:ffff::dead:beef --dport 22 -j REJECT

Also, have you looked at tc instead for limiting the outgoing bandwidth? I mostly use PF but recall that was the way to do traffic shaping on GNU/Linux.
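
For reference, a minimal untested sketch of what shaping with tc could look like, using a token bucket filter (tbf). The interface name eth0, the burst and latency values, and the helper names shape_egress and unshape_egress are assumptions for illustration. Note that this shapes all traffic leaving the interface, not only SSH; limiting only port 22 would need a classful qdisc with a filter, which is not shown here.

```shell
# Dry run by default: commands are printed, not executed.
# Run with RUN= (empty) as root to apply them for real.
RUN=${RUN-echo}
DEV=${DEV-eth0}    # hypothetical interface name

# Attach a token bucket filter to the root of the given device.
# In tc units, "kbps" means kilobytes per second ("kbit" would be kilobits).
shape_egress() {
    $RUN tc qdisc add dev "$1" root tbf rate "$2" burst 32kb latency 400ms
}

# Remove the root qdisc again, restoring the default.
unshape_egress() {
    $RUN tc qdisc del dev "$1" root
}

shape_egress "$DEV" 200kbps
```

Unlike the DROP rule, a shaping qdisc queues packets on the sending side and delays them rather than discarding everything above the limit.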

witiko · 05-17-2018, 07:49 PM

Quote:

Originally Posted by Turbocapitalist

Does it make a difference if you use the REJECT target instead of DROP?

Not really, the results are the same.

Quote:

Originally Posted by Turbocapitalist

Also, have you looked at tc instead for limiting the outgoing bandwidth? I mostly use PF but recall that was the way to do traffic shaping on GNU/Linux.

I am vaguely familiar with it. It lets you do a lot more with respect to traffic shaping, but in this simple case I did not see an advantage over a simple iptables rule.