LinuxQuestions.org


markfox 07-14-2010 10:17 PM

Server on multiple VLANs server not responding to pings from non-local subnets
 
I've got a machine running Ubuntu Server that is on several VLANs. Each VLAN has its own subnet and the server has an address on each subnet. The switches are set to allow tagged traffic to the server for each VLAN that it is on. Switch ports ending with workstations are given untagged ports on whatever VLAN is appropriate. Workstations are given addresses on a subnet for each VLAN via DHCP. All this works great and hosts on any subnet/VLAN can access the server as normal via its address on that subnet/VLAN.

Accessing the machine by its address on a non-local subnet is where I run into a problem. Inter-subnet traffic has to go through a router, which has been set up appropriately. Running tcpdump on the server and pinging it from a workstation on a subnet, using its address on a different subnet, shows the server receives the ping, but sends no response:

Code:

sudo tcpdump -i vlan4 -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vlan4, link-type EN10MB (Ethernet), capture size 96 bytes
21:00:33.692386 IP 192.168.48.17 > 192.168.49.130: ICMP echo request, id 34365, seq 1, length 64
21:00:34.697374 IP 192.168.48.17 > 192.168.49.130: ICMP echo request, id 34365, seq 2, length 64
21:00:35.697306 IP 192.168.48.17 > 192.168.49.130: ICMP echo request, id 34365, seq 3, length 64
21:00:36.697327 IP 192.168.48.17 > 192.168.49.130: ICMP echo request, id 34365, seq 4, length 64

I've been able to repeat this on several different machines, all running Ubuntu. I haven't tried to repeat this on a different distribution.

I've checked the iptables chains and they are all set to ACCEPT:

Code:

Chain INPUT (policy ACCEPT)
target    prot opt source              destination

Chain FORWARD (policy ACCEPT)
target    prot opt source              destination

Chain OUTPUT (policy ACCEPT)
target    prot opt source              destination

Any ideas why the machine can receive traffic from a non-local subnet, but not reply?

And if you are wondering why I don't just let the router do its job and have the server on a single VLAN, it's for performance. The server is connected to the network by an aggregated link (i.e. bonding), so it has 2 x gigabit to talk to workstations. The router is constrained to a single gigabit link.

dkm999 07-14-2010 10:42 PM

I am not using Ubuntu, but another Linux distro, but I think they all treat the 'Net the same. You might check /proc/sys/net/ipv4/icmp_echo_ignore_all, to see if it is set to 1. It should be 0 to allow ping (and pong). Making that change so that it survives a reboot in Ubuntu is outside my comfort zone, but I'm sure you can find that using Google.

markfox 07-15-2010 12:36 AM

Thanks for the suggestion. I checked and it is set to 0.

I can definitely ping from a subnet to the same subnet and get a response. It's when two subnets/VLANs are involved that it breaks down.

I'm wondering if the problem is that the IP stack is bothered by the traffic coming from the router (ie. via a gateway) when there is a direct connection. I can't see why that would be a problem, but there it is.



dkm999 07-15-2010 11:27 AM

The next area of investigation I would recommend is your routing table, especially if your server is not able to exchange data at all with non-local machines. Depending on your VLAN topology, you may need to let the server know about the subnets beyond the router, so that it will be able to route packets out the correct interface to get them to their destination. There are two ways to do this: either declare static routes for the networks beyond the router, or set up a routing protocol exchange between the router and your server. Either way, the routing table will end up with entries telling the IP stack which interface to use for traffic destined for one of these non-local networks.

Good luck, and write again if this only gets you part way there.

markfox 07-15-2010 06:52 PM

Yup. The routing table is good. All of the VLANs are accounted for and the address/netmask of each interface is as it should be.

But I think I'm homing in on the problem.

The key is that the server receives pings (or any other sort of traffic), but doesn't send anything back in reply. And the server doesn't even try to send it (ie. tcpdump on the server shows the request, but no reply). This only occurs when traffic is arriving via the router and would leave via a different interface. (I set up two Linux boxes so that they could connect directly to all of the VLANs, and everything worked as it should, but they didn't even need the router.)

If a ping request arrives at the server by way of the router, on which interface would the reply go back? The server will want to reply via the most direct route, the VLAN corresponding to whatever workstation sent the request. Unfortunately, that's a different interface from the one the request arrived on. My guess is that this is just too much to ask of a multi-homed server. Instead, the server has to be set up to do routing, or a separate router has to do the job.

I was able to duplicate the behaviour with a machine with two interfaces acting as the server. It had no knowledge of VLANs. This got me the same result. Exactly the same.

The bottom line is multi-homing can cause some fun. It would be far better to drop multi-homing and just use the router as it was intended. Since this is all coming about because my router can't do what I want (aggregating two GigE links), I'm going to explore Linux routing. Our Cisco routers are getting old anyway and I'm curious if I can get away with doing everything (routing, Samba, Squid, etc.) on one box.

Thanks all.




dkm999 07-15-2010 07:06 PM

One more long-shot. I think ICMP packets should be treated like any others; do you have ip_forwarding enabled?
(/proc/sys/net/ipv4/ip_forward = 1)?

markfox 07-16-2010 08:50 AM

Yup. Sadly, no difference. I tried that early on and should have noted it.

Mind you, there might be something in Linux routing that offers a simple solution. It still seems strange to me that the server doesn't even try to send a reply to pings or any other traffic. I'm going to try to find a way of getting more logs out of the kernel's IP stack. Heck, I've even downloaded the source for my kernel and have started perusing it. I really want to understand why this doesn't work.
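
One way to get more logs out of the kernel's IP stack, as a minimal sketch: the `log_martians` sysctl makes the kernel log packets it discards because their source address looks implausible for the interface they arrived on. The helper name `martian_lines` is made up for illustration, and the commented commands assume root and a `dmesg` that shows the kernel log.

```shell
# Keep only kernel log lines that mention "martian" packets (reads stdin).
martian_lines() {
    grep -i 'martian'
}

# As root, turn the logging on, repeat the ping from the other subnet,
# then filter the kernel log:
#   sysctl -w net.ipv4.conf.all.log_martians=1
#   dmesg | martian_lines
```

If nothing shows up after repeating the ping, the drop is probably happening somewhere else, and the logging can be switched off again by writing 0 to the same sysctl.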

In other news, I've realized that I have some switches capable of 802.3ad channel aggregation. So I may be able to set the router up to bond two interfaces (Etherchannel in Cisco-speak). This is probably the best way to go.

If I ever figure out why Linux can't do this, or a way to make it possible, I'll post here.



markfox 07-16-2010 11:09 AM

Solved!

Since I was now sure that what I'm doing qualifies as multi-homing (in a very non-traditional sense), I started reading some Linux multi-homing HOWTOs this morning. All of them started by disabling rp_filter, which is explained very well here.

That sounds exactly like my problem so I disabled it on all of the interfaces of my server like so:

Code:

echo 0 > /proc/sys/net/ipv4/conf/eth0/rp_filter

There are also conf/all/rp_filter and conf/default/rp_filter. I set them to 0 as well.

Bang! Everything started working.

So now I'm just trying to figure out the minimal set of rp_filter files that need to be set to 0, and how to make them stick over a reboot.
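
On the question of the minimal set, a hedged sketch: as far as the kernel's ip-sysctl documentation describes it, the value applied on an interface is the larger of conf/all/rp_filter and that interface's own rp_filter, while conf/default only seeds interfaces created later. The function name `effective_rp_filter` is invented for this example, and the loop only reads from /proc.

```shell
# effective_rp_filter ALL_VALUE IFACE_VALUE
# Prints the larger of the two values, which is what the kernel is
# documented to apply on that interface.
effective_rp_filter() {
    if [ "$1" -gt "$2" ]; then echo "$1"; else echo "$2"; fi
}

# Report the effective value for each interface on this machine (read-only).
conf=/proc/sys/net/ipv4/conf
if [ -r "$conf/all/rp_filter" ]; then
    all=$(cat "$conf/all/rp_filter")
    for f in "$conf"/*/rp_filter; do
        iface=${f%/rp_filter}
        iface=${iface##*/}
        case $iface in all|default) continue ;; esac
        echo "$iface: effective rp_filter = $(effective_rp_filter "$all" "$(cat "$f")")"
    done
fi
```

Under that rule, the filter is only off on an interface when both "all" and the interface's own entry are 0.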




dkm999 07-16-2010 01:06 PM

Congratulations. I learned something I hadn't picked up on as well. Thanks for posting the explanation.

suprstar 01-30-2012 01:36 PM

Quote:

Originally Posted by markfox (Post 4035439)
So now I'm just trying to figure out the minimal set of rp_filter files that need to be set to 0, and how to make them stick over a reboot.

I know this is an old thread, but I just had this problem, and solved it because of this thread. And for anyone else who may have this problem in the future, add:

echo 0 > /proc/sys/net/ipv4/conf/eth0/rp_filter
echo 0 > /proc/sys/net/ipv4/conf/eth1/rp_filter
echo 0 > /proc/sys/net/ipv4/conf/all/rp_filter

to your /etc/rc.local - (or /etc/init.d/rc.local I think in some distros) - this is a script that runs when the machine boots.
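
An alternative to rc.local, sketched under the assumption that the distro reads /etc/sysctl.conf at boot: the same three settings can be written as sysctl keys. The helper `rp_filter_sysctl_lines` is hypothetical and only prints the lines, so nothing on the system changes until the output is appended to the file by hand.

```shell
# rp_filter_sysctl_lines IFACE...
# Prints sysctl.conf-style lines that set rp_filter to 0 for "all"
# and for each interface named on the command line.
rp_filter_sysctl_lines() {
    echo "net.ipv4.conf.all.rp_filter = 0"
    for iface in "$@"; do
        echo "net.ipv4.conf.$iface.rp_filter = 0"
    done
}

rp_filter_sysctl_lines eth0 eth1

# To persist (as root), append the printed lines to /etc/sysctl.conf
# and load them without rebooting:
#   sysctl -p
```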

speculatrix 02-29-2012 05:39 PM

this thread has been very useful to me.

I have a remote server on a VPN, and I tunnel some traffic through it, so I mark local network traffic using the mangle table, then route marked traffic with a different routing table whose default route is the remote server over tun1.

I hadn't used it for some time and when I tried it again recently it had stopped working. The alternately routed packets went over to the remote server, and it sent responses back, but they simply disappeared. It turned out to be the rp_filter settings. To fix it I turned rp_filter on for everything individually, then off for "all" and the tun device. The kernel applies the higher of /proc/sys/net/ipv4/conf/all/rp_filter and the per-interface value, so both have to be 0 before the filter is actually off on tun1.

Code:

#!/bin/bash
# Must be run as root: it writes to /proc/sys.

echo "Before..."
for X in /proc/sys/net/ipv4/conf/*/rp_filter; do echo -n "$X " ; cat "$X"; done

echo "Turning on rp_filter for individuals, then removing for all and tun1"
for X in /proc/sys/net/ipv4/conf/*/rp_filter; do echo 1 > "$X"; done
echo 0 > /proc/sys/net/ipv4/conf/all/rp_filter
echo 0 > /proc/sys/net/ipv4/conf/tun1/rp_filter

echo "After..."
for X in /proc/sys/net/ipv4/conf/*/rp_filter; do echo -n "$X " ; cat "$X"; done


speculatrix 03-04-2012 08:28 AM

I asked some other Linux experts about this, as I thought there must be a better way to do it. I haven't tried it yet, but it seems what I should do is copy the main routing table to the alternate table, and then set the mark on packets coming in on the tun device. And finally do "sysctl net.ipv4.conf.all.src_valid_mark=1" so that the kernel obeys the mark when checking the reverse path.

speculatrix 03-04-2012 08:51 AM

Yup, just tried it.

so, basically, copy main table to newtable except the default, and add a different default, add packet marking..

Code:

echo "4 newtable" >> /etc/iproute2/rt_tables

ip route show table main | grep -Ev ^default \
  | while read ROUTE ; do
  ip route add table newtable $ROUTE
  done

ip rule add fwmark 4 table newtable

# a.b.c.d is the remote openvpn address
ip route add table newtable default via a.b.c.d dev tun0


# w.x.y.z is the internal host whose traffic is sent to vps
iptables -t mangle -I PREROUTING -i eth0 -s w.x.y.z -j MARK --set-mark 4

# return traffic over openvpn is marked
iptables -t mangle -I PREROUTING -i tun0 -j MARK --set-mark 4

# tell kernel to use packet mark so it stops rp_filter from quietly dropping packets
sysctl net.ipv4.conf.all.src_valid_mark=1
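
One caveat with the script above is that running it a second time appends a duplicate "4 newtable" line to rt_tables. A minimal sketch of a guard follows; the helper name `ensure_rt_table` is an invention for this example, and the demonstration uses a scratch file rather than the real /etc/iproute2/rt_tables.

```shell
# ensure_rt_table FILE ID NAME
# Appends "ID NAME" to FILE unless a line naming that table already exists.
# The function body runs in a subshell so its variables do not leak.
ensure_rt_table() (
    file=$1
    id=$2
    name=$3
    if ! grep -Eq "^[[:space:]]*[0-9]+[[:space:]]+$name([[:space:]]|\$)" "$file" 2>/dev/null; then
        echo "$id $name" >> "$file"
    fi
)

# Demonstration against a temporary file.
tmp=$(mktemp)
ensure_rt_table "$tmp" 4 newtable
ensure_rt_table "$tmp" 4 newtable   # second call finds the entry and does nothing
cat "$tmp"
rm -f "$tmp"
```

On a real system the first argument would be /etc/iproute2/rt_tables, and the call would need root.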


JacobOkanta 05-24-2021 12:45 PM

I know this is quite an old post, but I found it on Google while searching for a similar problem, which turned out to have a different solution. The difference is that I wasn't multihoming, and the server could communicate across subnets, just not with one particular subnet. As it turns out, I am running a lot of Docker containers on the server, and one of the Docker networks had taken the same subnet that the IoT devices were using. Docker apparently takes very large subnets, with a 255.255.240.0 netmask. I am currently looking for a way to "blacklist" certain subnets for Docker, but for now you can set the subnet in the compose or config files. Hopefully this will keep the next person from banging their head on a desk for 3 days.
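
To check for this kind of overlap, here is a rough sketch in plain shell arithmetic that tests whether an address falls inside a network. The function names and the 192.168.0.0/20 range are assumptions for illustration (a /20 is the 255.255.240.0 netmask mentioned above); the real ranges would come from `docker network ls` and `docker network inspect <name>`.

```shell
# ip_to_int A.B.C.D
# Converts a dotted-quad address to an integer (octets without leading zeros).
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# ip_in_cidr ADDRESS NETWORK/BITS
# Succeeds when ADDRESS lies inside NETWORK/BITS.
ip_in_cidr() (
    net=${2%/*}
    bits=${2#*/}
    mask=$(( (4294967295 << (32 - $bits)) & 4294967295 ))
    [ $(( $(ip_to_int "$1") & $mask )) -eq $(( $(ip_to_int "$net") & $mask )) ]
)

# Example: would a Docker network on 192.168.0.0/20 swallow these hosts?
for ip in 192.168.5.10 192.168.16.1; do
    if ip_in_cidr "$ip" 192.168.0.0/20; then
        echo "$ip is inside 192.168.0.0/20"
    else
        echo "$ip is outside 192.168.0.0/20"
    fi
done
```

For keeping Docker away from a range entirely, the daemon's `default-address-pools` setting in daemon.json looks like the relevant knob, though the exact syntax is worth checking against the Docker documentation for the installed version.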

GammaGames 02-02-2024 03:59 PM

Thank you JacobOkanta!!
 
I know, re-necroing an old thread, but this is the first result when you google "linux server not replying to certain subnet"
Thank you so much JacobOkanta! I had a test docker network hanging around after I thought I'd turned it all down with compose. I spent most of the day mucking around on the command line trying to find the right question to ask, when the solution was a simple `docker system prune`


All times are GMT -5.