I noticed a bug on one of my NAT boxes this weekend during a brief Sprint outage. I think it's partly due to an upgrade on one of my older NAT machines: I'm now getting 'martian source' errors (cool error msg).
It was running 2.2.14; now it's running 2.2.19. This machine has 3 NICs. One is connected to the internal net, and the other two are connected to our separate Internet connections. The idea is that if our primary carrier drops, a script will create a new default gw route for the second carrier. This has worked great.
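To make the failover idea concrete, here is a minimal sketch of that kind of script. It is not the actual script from this box: the gateway addresses, the interface assignments, and the function names are all made up for illustration, and it only prints the route commands instead of running them.

```shell
#!/bin/sh
# Hypothetical addresses and interface names, for illustration only.
PRIMARY_GW=192.0.2.1      # assumed gateway on the primary carrier
PRIMARY_IF=eth1           # assumed interface facing the primary carrier
BACKUP_GW=198.51.100.1    # assumed gateway on the second carrier
BACKUP_IF=eth2            # assumed interface facing the second carrier

# Succeeds if the given gateway answers at least one of three pings.
gw_alive() {
    ping -n -c 3 "$1" >/dev/null 2>&1
}

# Print the commands that repoint the default route at gateway $1 via
# interface $2. Printing rather than running keeps the sketch safe to try.
switch_default_cmds() {
    echo "route del default"
    echo "route add default gw $1 dev $2"
}

# Choose the carrier: primary if its state is "up", otherwise the backup.
plan_failover() {
    if [ "$1" = "up" ]; then
        switch_default_cmds "$PRIMARY_GW" "$PRIMARY_IF"
    else
        switch_default_cmds "$BACKUP_GW" "$BACKUP_IF"
    fi
}
```

A cron job could call gw_alive "$PRIMARY_GW", pass "up" or "down" to plan_failover accordingly, and pipe the printed commands to sh as root to actually apply them.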
Now, for some reason, this system is having problems keeping outside traffic separate.
If I ping either IP from any machine, internal or external, all is fine. If, however, I force the ping over a specific interface that doesn't have the default route (ping -n <w.x.y.z> -Ieth2), the ICMP traffic leaves the NAT box, successfully finds the destination, and comes back. Then, when the NAT box tries to interpret the reply, it freaks out with the martian source error.
It will always do this for the adapter whose default gw is not being used. So, if I have a default route for eth1 and no default route for eth2, then the martian source error appears for eth2. If, however, I create a new default gw for eth2 and leave the one for eth1 in place (so that it now uses eth2 for all normally originating traffic), it works fine for eth2 and generates martian source errors for eth1 with ping -n <w.x.y.z> -Ieth1.
This used to work w/ 2.2.14, and it seems like the upgrade hosed something. What's weird is that I have another box with 2.2.14 that I haven't upgraded yet, with an almost identical configuration, and it works fine. However, it's a P2/400, while the problem box is a P166. So, either this is a hardware glitch, or there is some setting screwed up on this wonky NAT box... any ideas? I did try flushing the route cache too.
Thanks,
Jon