iptables output filter dropping packets before correct routing decision is made
Hi,
This is a question/observation/discussion regarding when routing decisions are made in a context of multiple routing tables, iptables "mangling" and filtering.
My experience (shown below) indicates that routing decisions based on rules using "fwmark" take place after the OUTPUT filter is traversed. This is at odds with the documentation, which suggests that a second and final routing decision takes place between OUTPUT mangle and OUTPUT filter. If anyone is knowledgeable about this subject I would be very interested in your point of view.
I am using iptables and the routing policy database to make locally generated packets use a different default route from the one specified in table "main". Whilst this works, I find that rules in the OUTPUT chain of the filter table block some packets which I believe should not be blocked. My understanding of the order of processing for locally generated packets is:
1) Routing decision No 1.
2) mangle OUTPUT.
3) nat OUTPUT.
4) Routing decision No 2.
5) filter OUTPUT.
...
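One way to check this order empirically is to log the same packet at every hook it traverses and compare the OUT= field from stage to stage. The following is only a sketch: the function name trace_rules and the log prefixes are illustrative assumptions, not part of the setup described here, and the script prints the rules rather than applying them.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) one LOG rule per hook that a locally
# generated packet traverses, so OUT= can be compared stage by stage.
DST="${DST:-10.65.33.39}"   # destination used in this post

trace_rules() {
    # table:chain pairs, in the order listed above.
    # Note: the nat table only sees the first packet of each connection.
    for hook in mangle:OUTPUT nat:OUTPUT filter:OUTPUT \
                mangle:POSTROUTING nat:POSTROUTING; do
        table=${hook%%:*}
        chain=${hook##*:}
        echo "iptables -t $table -I $chain -d $DST -j LOG --log-prefix \"${table}_${chain}: \""
    done
}

trace_rules
# To apply the rules for real (as root): trace_rules | sh
```

Keeping the generator as a dry run makes it easy to review the rules before they touch a live firewall.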
My setup is as follows:
# Use CONNTRACK to mark New locally generated connections.
iptables -t mangle -A OUTPUT ! -d 127.0.0.1/32 -m state --state NEW -j CONNMARK --set-xmark 0xff/0xffffffff
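# Copy the connection mark to the packet mark (fwmark), for locally generated and for incoming packets.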
iptables -t mangle -A OUTPUT -m connmark --mark 0xff -j MARK --set-xmark 0x1/0xffffffff
iptables -t mangle -A PREROUTING -m connmark --mark 0xff -j MARK --set-xmark 0x1/0xffffffff
ip rule add fwmark 1 table alt1
ip route add 192.168.151.1/32 dev tun3 table alt1
ip route add default via 192.168.151.1 dev tun3 src 192.168.152.1 table alt1
iptables -t nat -I POSTROUTING -o tun3 -j SNAT --to 192.168.152.1
echo 0 >/proc/sys/net/ipv4/conf/all/rp_filter
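Before looking at the firewall, the policy routing state itself can be inspected. This is a read-only sketch; the function name show_policy_routing is made up for illustration, it assumes that "alt1" is defined in /etc/iproute2/rt_tables (as the commands above imply), and it assumes an iproute2 version whose "ip route get" accepts the "mark" selector.

```shell
#!/bin/sh
# Sketch: inspect the policy routing state set up above (read-only).
# Assumes table "alt1" is defined in /etc/iproute2/rt_tables.
show_policy_routing() {
    dst="${1:-10.65.33.39}"
    ip rule show                   # should list the fwmark rule for alt1
    ip route show table alt1       # should list the tun3 host and default routes
    ip route get "$dst"            # lookup without a mark
    ip route get "$dst" mark 0x1   # lookup as if the packet carried fwmark 1
}

# Usage: show_policy_routing 10.65.33.39
```

If the two "ip route get" results differ in their output device, the rule and table are being consulted for marked lookups, and any remaining discrepancy lies in when the lookup is repeated for a packet.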
On OUTPUT filter I have:
iptables -P OUTPUT DROP
iptables -I OUTPUT -o tun3 -j ACCEPT
iptables -I OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -I OUTPUT -d 10.65.33.39 -j LOG --log-prefix OUTPUT_FILTER
Tailing the log while telnetting to 10.65.33.39 yields:
# Mar 5 13:26:45 srv-internet kernel: OUTPUT_FILTERIN= OUT=eth0 SRC=10.65.47.193 DST=10.65.33.39 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=36300 DF PROTO=TCP SPT=42294 DPT=23 WINDOW=14600 RES=0x00 SYN URGP=0 MARK=0x1
Mar 5 13:26:46 srv-internet kernel: OUTPUT_FILTERIN= OUT=eth0 SRC=10.65.47.193 DST=10.65.33.39 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=36301 DF PROTO=TCP SPT=42294 DPT=23 WINDOW=14600 RES=0x00 SYN URGP=0 MARK=0x1
Mar 5 13:26:48 srv-internet kernel: OUTPUT_FILTERIN= OUT=eth0 SRC=10.65.47.193 DST=10.65.33.39 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=36302 DF PROTO=TCP SPT=42294 DPT=23 WINDOW=14600 RES=0x00 SYN URGP=0 MARK=0x1
The output shows that all the packets are marked correctly (0x1). However, a final routing decision has not yet taken place, because the packets are still being sent out through the default gateway of the main routing table (OUT=eth0). According to the available documentation, the second routing decision should take place before the OUTPUT filter is traversed. That would yield an outbound interface of "tun3", which the OUTPUT filter would accept.
If I add the two following rules:
iptables -A OUTPUT -d 10.65.33.39 -j ACCEPT
iptables -t mangle -I POSTROUTING -d 10.65.33.39 -j LOG --log-prefix POSTROUTING_MANGLE
The packets are allowed through the OUTPUT filter, after which the log output below suggests that another (final?) routing decision takes place:
# Mar 5 13:40:25 srv-internet kernel: OUTPUT_FILTERIN= OUT=eth0 SRC=10.65.47.193 DST=10.65.33.39 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=59041 DF PROTO=TCP SPT=34613 DPT=22 WINDOW=14600 RES=0x00 SYN URGP=0 MARK=0x1
Mar 5 13:40:25 srv-internet kernel: POSTROUTING_MANGLEIN= OUT=tun3 SRC=10.65.47.193 DST=10.65.33.39 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=59041 DF PROTO=TCP SPT=34613 DPT=22 WINDOW=14600 RES=0x00 SYN URGP=0 MARK=0x1
The second entry, generated in mangle POSTROUTING, shows that a routing decision has been made for the output interface (OUT=tun3). NB: the SRC IP has yet to be changed by the SNAT rule in nat POSTROUTING.
Interestingly, if I change the setup as follows:
# Remove the OUTPUT filter rule which allowed this to work.
iptables -D OUTPUT -d 10.65.33.39 -j ACCEPT
# Add a different kind of policy routing rule.
ip rule add to 10.65.33.39 table alt1
This works without any interference from the OUTPUT filter, as evidenced by the following log entries:
# Mar 5 13:49:38 srv-internet kernel: OUTPUT_FILTERIN= OUT=tun3 SRC=192.168.152.1 DST=10.65.33.39 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=6743 DF PROTO=TCP SPT=53322 DPT=22 WINDOW=14600 RES=0x00 SYN URGP=0 MARK=0x1
Mar 5 13:49:38 srv-internet kernel: POSTROUTING_MANGLEIN= OUT=tun3 SRC=192.168.152.1 DST=10.65.33.39 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=6743 DF PROTO=TCP SPT=53322 DPT=22 WINDOW=14600 RES=0x00 SYN URGP=0 MARK=0x1
Mar 5 13:49:38 srv-internet kernel: OUTPUT_FILTERIN= OUT=tun3 SRC=192.168.152.1 DST=10.65.33.39 LEN=52 TOS=0x10 PREC=0x00 TTL=64 ID=6744 DF PROTO=TCP SPT=53322 DPT=22 WINDOW=229 RES=0x00 ACK URGP=0 MARK=0x1
Mar 5 13:49:38 srv-internet kernel: POSTROUTING_MANGLEIN= OUT=tun3 SRC=192.168.152.1 DST=10.65.33.39 LEN=52 TOS=0x10 PREC=0x00 TTL=64 ID=6744 DF PROTO=TCP SPT=53322 DPT=22 WINDOW=229 RES=0x00 ACK URGP=0 MARK=0x1
Mar 5 13:49:38 srv-internet kernel: OUTPUT_FILTERIN= OUT=tun3 SRC=192.168.152.1 DST=10.65.33.39 LEN=52 TOS=0x10 PREC=0x00 TTL=64 ID=6745 DF PROTO=TCP SPT=53322 DPT=22 WINDOW=229 RES=0x00 ACK URGP=0 MARK=0x1
Mar 5 13:49:38 srv-internet kernel: POSTROUTING_MANGLEIN= OUT=tun3 SRC=192.168.152.1 DST=10.65.33.39 LEN=52 TOS=0x10 PREC=0x00 TTL=64 ID=6745 DF PROTO=TCP SPT=53322 DPT=22 WINDOW=229 RES=0x00 ACK URGP=0 MARK=0x1
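To compare runs like these side by side, each kernel log line can be reduced to the fields that matter here: the prefix, OUT, SRC and MARK. The helper below is a minimal sketch; the name summarize is hypothetical, and it assumes the log prefix was set without a trailing space, so that it is fused with "IN=" as in the lines quoted in this post.

```shell
#!/bin/sh
# Hypothetical helper: reduce kernel LOG lines to "prefix OUT SRC MARK".
# Assumes the prefix is fused with "IN=" (no trailing space in --log-prefix).
summarize() {
    awk '{
        prefix = "-"; out = "-"; src = "-"; mark = "-"
        for (i = 1; i <= NF; i++) {
            if ($i ~ /IN=$/)   { prefix = $i; sub(/IN=$/, "", prefix) }
            if ($i ~ /^OUT=/)  { out  = substr($i, 5) }
            if ($i ~ /^SRC=/)  { src  = substr($i, 5) }
            if ($i ~ /^MARK=/) { mark = substr($i, 6) }
        }
        print prefix, out, src, mark
    }'
}

# Usage: dmesg | grep 'DST=10.65.33.39' | summarize
```

Fed the two 13:40:25 lines from the fwmark run, it should print "OUTPUT_FILTER eth0 10.65.47.193 0x1" followed by "POSTROUTING_MANGLE tun3 10.65.47.193 0x1", which makes the change of interface between the two hooks easy to see.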
Conclusion: When using multiple routing tables and policy routing, routing decisions are not all taken at the same time. Routing decisions using "to PREFIX" appear to be taken earlier than routing decisions using "fwmark".
I have one additional point of curiosity: the SRC IP address is correct in the "to PREFIX" example but not in the "fwmark" example. Using fwmark, it is necessary to SNAT to get the correct SRC IP, but using "to PREFIX" the LOG output written as early as the OUTPUT filter already shows the correct SRC IP.
Regards,
Gerry.