My problem here hasn't really been solved; I just worked around it in a way I'm not happy with, so I'm returning to it. I now know why the marking approach I tried did not work: I had failed to understand that routing can't see the connection mark, only the packet mark, and I never tried copying the connection mark to the packet mark.
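A minimal sketch of that missing step, copying the connection mark onto each packet so that an fwmark-based routing rule can match it. This is an illustration under assumptions, not a tested configuration: the table name ctmark_sketch and the mark value 0x1 are made up, and only ppp0 and the ppproute table come from the setup described here.

```shell
# Build the ruleset as text so it can be inspected before it is loaded.
ruleset=$(cat <<'EOF'
table ip ctmark_sketch {
    chain prerouting {
        # priority -150 runs after conntrack (-200) and before the routing decision
        type filter hook prerouting priority -150; policy accept;
        # tag new connections that arrive on ppp0 (0x1 is an arbitrary value)
        iifname "ppp0" ct state new ct mark set 0x1
        # copy the connection mark onto every packet of a tagged connection,
        # including replies that come in on another interface
        ct mark 0x1 meta mark set ct mark
    }
}
EOF
)
printf '%s\n' "$ruleset"
# Loading it needs root:           printf '%s\n' "$ruleset" | nft -f -
# Matching routing rule (as root): ip rule add fwmark 0x1 lookup ppproute
```

The point of the sketch is only that the packet mark has to be set in prerouting, because that hook runs before the routing lookup for forwarded packets.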
However, I am still in the dark as to why the return packets are not routed correctly, so I'd like to revisit this. The one crucial point here is the routing of the return packets. After browsing literally hundreds of web pages, I still haven't found an answer to exactly when SNAT and DNAT are "reversed" (or un-SNATed and un-DNATed if you like).
When the first packet enters the ppp0 interface, the DNAT rule triggers and rewrites the destination address. Then the routing decision is made, and the packet should leave on the eth1 interface. As it hits postrouting, the masquerading rewrites the source address to the eth1 address, and the packet leaves eth1. So far so good, and the conntrack entry looks like this:
Code:
udp 17 28 src=37.38.39.40 dst=10.10.10.10 sport=12345 dport=51000 [UNREPLIED] src=192.168.100.1 dst=192.168.100.10 sport=51000 dport=12345 mark=0 use=1
Now the server replies as expected and the conntrack entry changes to:
Code:
udp 17 27 src=37.38.39.40 dst=10.10.10.10 sport=12345 dport=51000 src=192.168.100.1 dst=192.168.100.10 sport=51000 dport=12345 mark=0 use=1
The [UNREPLIED] state is removed, since conntrack was able to match the reply.
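The two address tuples in the entry above can be pulled apart with a few lines of shell, which makes it easier to see what conntrack expects the return packet to look like on the wire. This is only a text-processing sketch; the helper name ct_field is made up for illustration and is not part of any conntrack tool.

```shell
entry='udp 17 27 src=37.38.39.40 dst=10.10.10.10 sport=12345 dport=51000 src=192.168.100.1 dst=192.168.100.10 sport=51000 dport=12345 mark=0 use=1'

# Print the N-th occurrence of KEY= in a conntrack line
# (1 = original direction, 2 = reply direction).
ct_field() {  # usage: ct_field KEY N LINE
    printf '%s\n' "$3" | awk -v key="$1" -v n="$2" '{
        for (i = 1; i <= NF; i++) {
            split($i, kv, "=")
            if (kv[1] == key && ++seen == n) { print kv[2]; exit }
        }
    }'
}

echo "original: $(ct_field src 1 "$entry") -> $(ct_field dst 1 "$entry")"
echo "reply:    $(ct_field src 2 "$entry") -> $(ct_field dst 2 "$entry")"
```

For the entry above this prints 37.38.39.40 -> 10.10.10.10 for the original direction and 192.168.100.1 -> 192.168.100.10 for the reply direction. Neither address in the reply tuple is what the outside host expects to see (a packet from 10.10.10.10 to 37.38.39.40), which is why both have to be translated back.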
The reply packet (as it enters eth1) has a source address of 192.168.100.1, and a destination address of 192.168.100.10 (the router's eth1 address). At this point, the packet needs to have both the source and destination addresses changed, but in which order does this happen, and at which point is the routing table consulted?
The routing rules are:
Code:
0: from all lookup local
2000: from 10.10.10.10 lookup ppproute
32766: from all lookup main
32767: from all lookup default
And if I ask ip route how it would route a packet for 37.38.39.40 from 10.10.10.10:
Code:
ip route get to 37.38.39.40 from 10.10.10.10
37.38.39.40 from 10.10.10.10 dev ppp0
cache
..it tells me it would in fact use the ppp0 interface (which is what I want).
For the packet flow outlined above, this does not happen; the reply instead leaves via eth0. Evidently the routing decision is made before both the source and destination addresses have been rewritten/reversed.
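One way to probe this is to ask the kernel how it would route the reply if only the destination had been translated back at lookup time, that is, with the source still 192.168.100.1. If that is what routing sees, rule 2000 (from 10.10.10.10) cannot match. The sketch below assumes the interface names and addresses from this post; the run wrapper is a made-up helper that only prints the command unless DRY_RUN=0 is set, so it is safe to paste anywhere.

```shell
# Print the command by default; set DRY_RUN=0 on the router to execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# "iif eth1" asks for the lookup of a forwarded packet arriving on eth1,
# rather than a locally generated one as in the query above.
run ip route get to 37.38.39.40 from 192.168.100.1 iif eth1
```

If the real command answers with eth0, that matches the observed behaviour and points at the source address not yet being 10.10.10.10 when the lookup is made.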
So, can someone tell me exactly when the packet addresses are changed within netfilter/nftables/conntrack for a packet flow that is forwarded through a router and where both DNAT and SNAT (masquerading) are used? And at which points do the routing decisions happen?
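A possible way to observe this directly is to log the reply packet at three hooks and compare the SRC= and DST= fields in the kernel log. This is a debugging sketch under assumptions: the table name natdebug_sketch and the log prefixes are made up, the numeric priorities are chosen to sit before the usual destination-NAT priority (-100) and after the usual source-NAT priority (100), and the match on UDP source port 51000 relies on the reply keeping that port, as in the conntrack entry above.

```shell
# Build the ruleset as text so it can be inspected before it is loaded.
ruleset=$(cat <<'EOF'
table ip natdebug_sketch {
    chain pre {
        # after conntrack (-200), before destination NAT (-100)
        type filter hook prerouting priority -150; policy accept;
        udp sport 51000 log prefix "pre-nat "
    }
    chain fwd {
        # the forward hook runs after the routing decision
        type filter hook forward priority 0; policy accept;
        udp sport 51000 log prefix "forward "
    }
    chain post {
        # after source NAT (100)
        type filter hook postrouting priority 110; policy accept;
        udp sport 51000 log prefix "post-nat "
    }
}
EOF
)
printf '%s\n' "$ruleset"
# Loading it needs root: printf '%s\n' "$ruleset" | nft -f -
# Remove it afterwards:  nft delete table ip natdebug_sketch
```

The "forward" line is the interesting one: it shows the addresses as they are right after the routing lookup, together with the OUT= interface that was chosen.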
One more minor point: rule 2000 above does in fact work when I ping the ppp0 interface from the outside, and ensures that the ping reply goes back where it should. If I remove the rule, the ping reply goes out eth0 (the default route) and is lost. I'm mentioning this just to clarify that the routing rule itself seems to work as intended when the source address is 10.10.10.10.