Strange behaviour in a multi network router.

fdelvall · 08-06-2010, 10:42 AM

Hi.
We are working on a project based entirely on CentOS 5.5 servers.

Our problematic host (a CentOS 5.5 server) sits between our LAN and DMZ segments. It seems buggy, or at least its administrator (me) is.

The host has 5 interfaces. Eth0 has 3 subinterfaces.

Eth0 is the default gateway for the LAN, and its subinterfaces provide access to different Squid instances.

Eth4 is our LAN interface devoted to database connections.

Eth1 is our DMZ interface, thru which the DMZ servers request database connections.

The traffic has been segmented so as to apply traffic shaping to the eth0 interfaces (internet access) while keeping the database interface at full bandwidth.
The database connections running from eth1 thru eth4 are thus expected to keep working even when users saturate eth0.

The weird part is that, most of the time, other Unices answer through eth0, although they have static routes pointing at eth4's IP and eth0 is not their default gateway. This is the case for the HP-UX servers running Informix databases.

# netstat -rn
Routing tables
Destination      Gateway          Flags  Refs  Interface  Pmtu
127.0.0.1        127.0.0.1        UH     0     lo0        4136
172.16.200.105   172.16.200.105   UH     0     lan0       4136
172.16.200.106   172.16.200.106   UH     0     lan1       4136
192.168.200.18   172.16.200.17    UGH    0     lan1       0
172.16.0.0       172.16.200.106   U      2     lan1       1500
172.16.0.0       172.16.200.105   U      2     lan0       1500
127.0.0.0        127.0.0.1        U      0     lo0        0
default          172.16.100.100   UG     0     lan1       0

Curiously enough, other CentOS 5.5 servers running MySQL with a similar routing table work seamlessly.

root@Linux05# route -n
Kernel IP routing table
Destination      Gateway          Genmask          Flags  Metric  Ref  Use  Iface
192.168.200.16   172.16.200.17    255.255.255.240  UG     0       0    0    eth0
172.16.0.0       0.0.0.0          255.255.0.0      U      0       0    0    eth0
0.0.0.0          172.16.100.100   0.0.0.0          UG     0       0    0    eth0
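
For reference, here is a rough sketch of how I read that table: the most specific matching route wins, so replies to the DMZ address 192.168.200.18 should be handed to 172.16.200.17 rather than to the default gateway. The helper name route_lookup is made up for this post, and the script only does the netmask arithmetic on `route -n` style text; it assumes contiguous netmasks and does not query the kernel.

```shell
# route_lookup DEST: read `route -n` style rows on stdin and print the
# interface and gateway of the most specific route that matches DEST.
# (Hypothetical helper, written only to illustrate the lookup.)
route_lookup() {
    awk -v dest="$1" '
    # Convert a dotted quad to a plain integer.
    function ip2int(ip,    p) {
        split(ip, p, ".")
        return ((p[1] * 256 + p[2]) * 256 + p[3]) * 256 + p[4]
    }
    # Number of addresses covered by a contiguous netmask.
    function blocksize(mask) {
        return 4294967296 - ip2int(mask)
    }
    # Only rows whose first field looks like an IPv4 address are routes.
    $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
        size = blocksize($3)
        # DEST matches when it falls in the same block as the route network.
        if (int(ip2int(dest) / size) == int(ip2int($1) / size)) {
            # Keep the smallest block, i.e. the longest prefix.
            if (best == "" || size < bestsize) {
                bestsize = size
                best = $8 " via " $2
            }
        }
    }
    END { if (best != "") print best }
    '
}

# The Linux05 table from above, fed in as text.
route_lookup 192.168.200.18 <<'EOF'
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
192.168.200.16 172.16.200.17 255.255.255.240 UG 0 0 0 eth0
172.16.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0
0.0.0.0 172.16.100.100 0.0.0.0 UG 0 0 0 eth0
EOF
# prints: eth0 via 172.16.200.17
```

On a live Linux box, `ip route get 192.168.200.18` should give the kernel's own answer to the same question.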

Windows workstations behave strangely.
SSH is expected to go thru the default gateway, but they randomly try eth4 first and only then route thru eth0 as expected. The Windows workstations have no route to the remote eth4, just a default gateway pointing at the remote eth0.
Ubuntu and Fedora workstations, curiously enough, persistently try to route SSH traffic thru eth4, which is blocked. These workstations have nothing more than a default gateway pointing at the remote eth0.

When the problematic host is rebooted, services may work as expected for a few minutes. But then, once again, traffic from the database servers switches to eth0 and the SSH clients switch to eth4.

As our firewall permits only the intended traffic, when the bug shows up we lose both production and administration capability.

The servers run on VMware Enterprise/vSphere Enterprise Plus on an HP Blade system, on Xeon processors.

The other 8 virtualised CentOS 5.5 servers work fine, route well, and their interfaces don't overlap networks.

Eth0 and eth4 belong to the same network. Eth0:0, eth0:1 and eth0:2 are reserved for Squid access.
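
Since both interfaces sit on the same network, the ARP settings of the router host may matter here. With the kernel default (arp_ignore set to 0), a Linux host answers an ARP request for any of its local addresses on whichever interface receives the request. I have not confirmed that this is what is happening in our case; the sketch below only prints the relevant settings per interface so they can be inspected. The function name show_arp_settings is made up for this post.

```shell
# show_arp_settings [CONF_DIR]: print the ARP-related settings of every
# interface found under CONF_DIR (default: /proc/sys/net/ipv4/conf).
# Read-only: nothing is changed on the host.
show_arp_settings() {
    conf_dir="${1:-/proc/sys/net/ipv4/conf}"
    for dir in "$conf_dir"/*/; do
        [ -d "$dir" ] || continue
        iface=$(basename "$dir")
        for key in arp_ignore arp_announce arp_filter; do
            if [ -r "$dir$key" ]; then
                printf '%s %s=%s\n' "$iface" "$key" "$(cat "$dir$key")"
            fi
        done
    done
}

# Show the settings of the local host.
show_arp_settings
```

The optional directory argument is only there so the function can be tried against a copy of the files instead of the live /proc tree.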

In brief:
Problem #1: the DMZ Tomcat servers initiate DB connections. The DB receives the requests thru eth4 but replies thru eth0 (it should reply thru eth4).
Problem #2: the LAN workstations initiate SSH connections. The requests go thru eth4 (they should go thru eth0).

Any help would be greatly appreciated.

Note: 172.16.0.0/16 is our LAN segment.

172.16.200.101, 172.16.200.103, 172.16.200.105 and 172.16.200.112 are HP-UX Informix database servers.

172.16.200.25 is a CentOS 5.5 MySQL database server.

Thanks.

Fredrick.

Note: the configuration is described in the attachment.

fdelvall · 08-10-2010, 07:06 AM

It turns out the problem resides in either the HP blade switches or VMware: pinging eth0 and eth4 from our LAN resolves both to the same MAC address. Quite mystifying, as VMware and the CentOS machines show different MAC addresses for the two interfaces.
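
For anyone wanting to check for the same symptom from a LAN machine, here is a rough sketch that lists MAC addresses claimed by more than one IP. The function name shared_macs is made up for this post, and the addresses and MACs in the sample are invented for illustration; they are not our real ones.

```shell
# shared_macs: read "IP MAC" pairs on stdin and print every MAC address
# that is claimed by more than one IP, followed by those IPs.
# (Hypothetical helper; sample data below is invented.)
shared_macs() {
    awk '
    NF >= 2 {
        mac = tolower($2)
        # Count each IP only once per MAC.
        if (!((mac, $1) in seen)) {
            seen[mac, $1] = 1
            count[mac]++
            ips[mac] = (ips[mac] == "" ? $1 : ips[mac] " " $1)
        }
    }
    END {
        for (mac in count)
            if (count[mac] > 1) print mac ": " ips[mac]
    }
    '
}

# Invented sample: two addresses resolving to the same MAC.
shared_macs <<'EOF'
172.16.0.1 00:50:56:00:00:01
172.16.0.2 00:50:56:00:00:01
172.16.0.3 00:50:56:00:00:02
EOF
# prints: 00:50:56:00:00:01: 172.16.0.1 172.16.0.2

# On a Linux workstation, after pinging both interfaces, the neighbour
# table can be fed in like this:
#   ip neigh show | awk '$4 == "lladdr" { print $1, $5 }' | shared_macs
```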

F.