LinuxQuestions.org
Old 05-06-2007, 10:10 PM   #1
RobynWoodall
LQ Newbie
 
Registered: May 2007
Distribution: CentOS, Ubuntu
Posts: 7

Rep: Reputation: 0
Question I wish "dead gateway" would honour traffic it is routing.


I wish "dead gateway" would honour traffic it is routing.

I am 99% there but can't quite get the last bit to work. Please help!


The problem:

A client has two networks. For High Availability (HA) reasons they have two routers/firewalls between the networks, both active with different IP addresses, and rely on "dead gateway" detection by the PCs and servers on each side to work out when the primary router is down and switch over. This more or less works (Linux boxes are quick, but Windows takes a minute or two).

However, the client has some devices with a very limited, closed TCP/IP stack that supports only a single default gateway and no other additions to the routing table. Unfortunately they are the most important devices to the business.

To work around this I put a new Linux box on the same side of the routers as the problematic devices (and applied a whole bunch of HA to it). I then pointed the problematic devices' default gateway at the new Linux box. The new Linux box has both routers set up as default gateways. The idea is that this effectively gives the single-gateway devices dual default gateways and a way to participate in the client's existing dead gateway detection failover.

The new Linux box has a single NIC with ip_forward=1 (on) and all the "send_redirects"=0 (off). In the past I have had a lot of success with this router-on-a-stick approach (but then I have not relied on dead gateway detection before).

While everything is up this works. According to "iptraf" traffic from the "devices" goes to the new Linux box and then gets routed through the default router/firewall to servers in the other network.

However, when the default router is failed, all the other PCs and servers eventually detect the dead gateway and switch over to the secondary router/firewall, and so does the new Linux box for traffic to and from itself. The single-gateway problematic devices still don't work, though. Their traffic gets sent to the new Linux box, which has worked out via dead gateway detection that the primary is down and has switched over for its own traffic, but it will not do the same for traffic being routed through it and pass that on to the secondary (active) router/firewall (see below for detail).


The "why are you doing it this way" question:

Yes, it is a patch to a bad (legacy) design. In the short term I can't change the basic setup. The two firewalls/routers have been in for years and have hundreds of odd-looking rules that would take months to untangle. The client is a genuine 24/7 operation. I was hoping that this way I could leave the system as is, as far as possible, and just fix this one issue with minimum impact/change to the business. I am open to other suggestions, but there are politics and other issues that would make this posting way longer if I left it that open.

I could script something, but I want Linux's inbuilt functionality to work as is (so it is not on my neck if a script causes issues).


The config:

Example Problematic device: 10.9.1.100/24 with default gateway 10.9.1.3.

New Linux Box: eth0 10.9.1.3/24 with default gateways 10.9.1.133 and 10.9.1.14 (both metric 0). sysctl.conf also sets ip_forward=1 and send_redirects=0 for all, default and eth0.

Primary Router/Firewall : eth1 10.9.1.133/24, eth0 10.6.1.133/24 (for testing in lab these are un-firewall-ed routers)
Secondary router/Firewall : eth1 10.9.1.14/24, eth0 10.6.1.14/24

Example Server : 10.6.1.25/24 with dual gateways 10.6.1.133 and 10.6.1.14 (W2K3 and metric both 0).
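
For reference, the sysctl settings described for the new Linux box might look roughly like this. It is a sketch, not the actual file: the key names are the standard net.ipv4 ones, but the exact contents of sysctl.conf are not shown in this post, and the function name sysctl_lines is made up for illustration.

```shell
# Hypothetical reconstruction of the sysctl.conf lines described in the post:
# forwarding on, ICMP redirects off for all, default and eth0.
sysctl_lines() {
    cat <<'EOF'
net.ipv4.ip_forward = 1
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.eth0.send_redirects = 0
EOF
}

# Print the fragment for inspection (this changes nothing on the system).
sysctl_lines
```

Appending that output to /etc/sysctl.conf and running "sysctl -p" as root would be one way to apply it.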


What else have I tried:

The testing has all been done in a lab with clean built machines to first show it is possible (or not in this case).

I tried two NICs on the Linux box, both still in the same subnet: the problematic devices use one NIC as their default gateway, and the other NIC has the routes to the two firewalls/gateways. Same result.

For the new Linux box I have tried both Ubuntu 7.02 (server) and CentOS 4.4 (minimum install). Same result.

I know ping is not enough for “dead gateway detection”, so I have automated ftp scripts forcing TCP traffic to assist the gateway switch.

"ip route flush table cache" does not clear the problem.


The errors:

First up, everything works if the primary router/firewall is active. Secondly, the new Linux box itself does detect the dead gateway and switches over to the secondary router/firewall for any of its own traffic. Thirdly, everything works if I fail the primary router AND manually delete the route via 10.9.1.133 on the new Linux box (but of course I want an automatic failover).
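
The manual step mentioned here (deleting the route via 10.9.1.133) would be something like the following. The exact command is not given in this post, so treat it as an assumption; the function name drop_default_via and the DRY_RUN switch are made up for illustration, and by default it only prints the command instead of running it.

```shell
# Hypothetical helper: remove the default route via a given gateway.
# DRY_RUN=1 (the default here) only prints the command; DRY_RUN=0 runs it (needs root).
DRY_RUN=${DRY_RUN:-1}

drop_default_via() {
    gw=$1
    dev=${2:-eth0}   # eth0 is the interface named in the post
    cmd="ip route del default via $gw dev $dev"
    if [ "$DRY_RUN" = 1 ]; then
        echo "$cmd"
    else
        # Delete the route, then flush the route cache so new lookups are forced.
        $cmd && ip route flush cache
    fi
}

# Show what would be run for the failed primary router.
drop_default_via 10.9.1.133
```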

However, once the primary router/firewall is down, ftp (or ping, telnet, etc.) from the problematic devices to a server in the other network reports:

"From 10.9.1.3 icmp_seq=x Destination Host Unreachable" (even though a ping from 10.9.1.3 itself works).

On the new Linux box "ip route show table cache" shows:

"10.6.1.25 from 10.9.1.100 via 10.9.1.133 dev eth0 src 10.9.1.3 cache <src-direct> expires -xxsec mtu..." (i.e still holding the old route).

but also in cache table it reads:

"10.6.1.25 from 10.9.1.3 via 10.9.1.14 dev eth0 ..." (i.e. for its own traffic it has changed gateways so why not for other devices routing through?).

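To pick out which cached entries are still pointing at the dead gateway, the cache listing can be filtered on the "via" address. A rough sketch, assuming the destination, "from" and "via" fields of an entry sit on one line as quoted above; stale_via is a made-up name, not an existing tool.

```shell
# stale_via GATEWAY: read route-cache text on stdin and print "dst from src"
# for every entry whose next hop ("via") is GATEWAY.
stale_via() {
    awk -v gw="$1" '
    {
        src = "-"                       # placeholder when no "from" precedes "via"
        for (i = 1; i < NF; i++) {
            if ($i == "from") src = $(i + 1)
            if ($i == "via" && $(i + 1) == gw) print $1, "from", src
        }
    }'
}

# Intended use on the forwarding box (not run here):
#   ip route show table cache | stale_via 10.9.1.133
```

With the two entries quoted above as input, only the one for traffic from 10.9.1.100 would be reported, which is the symptom described here: the forwarded flow is still pinned to 10.9.1.133.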

Summary:

I could be barking up the wrong tree, but reading the "ip route show..." results it looks to me like "dead gateway detection" spots that the primary gateway is down and switches gateway for the box's own traffic, but does not do so for any traffic routed through it. Is there a way to get "dead gateway detection" to fail over all traffic, or am I asking too much of “dead gateway detection” with a single NIC?
 
Old 05-07-2007, 12:12 AM   #2
RobynWoodall
Original Poster

...continuing with my R&D, I built a Windows 2003 box with RRAS activated and it works. It fails over (slowly) and routes traffic that is not its own through the new route.

So now I at least have an option that works; however, I would still prefer a working Linux solution (Windows makes an expensive and resource-hungry router).

Last edited by RobynWoodall; 05-08-2007 at 04:25 PM.
 
Old 05-08-2007, 04:25 PM   #3
RobynWoodall
Original Poster
... and continuing with more R&D: CentOS 5 does not work either.
 
  



Tags
availability, dead, detection, failover, gateway, high, routing

