LinuxQuestions.org
Linux - Networking: This forum is for any issue related to networks or networking.
Routing, network cards, OSI, etc. Anything is fair game.

05-21-2021, 03:37 PM   #1
jmgibson1981
Senior Member
 
Registered: Jun 2015
Location: Tucson, AZ USA
Distribution: Debian
Posts: 1,148

Reputation: 393
Dual WAN with iptables. One WAN is abysmally slow.


This is the script that runs at boot on my Debian-based router.

https://gitlab.com/jmgibson1981/scri.../homerouter.sh

I'm unsure how to chase this down. Due to my location, internet service sucks. One WAN is for priority traffic: my wife's work, Zoom, and in my case World of Warcraft. These all work flawlessly all the time.

Everything else, which defaults to the other WAN, runs terribly. Today it took me 10 minutes to pay 3 credit card bills due to slow access and timeouts. I've called this ISP numerous times. I have a hard time believing the service is like this for all their customers. I'm hoping it's not my router. Can anyone have a look at the script? Maybe I overlooked something.

*EDIT* Not on my phone now. Posting the routing table as it currently sits.

Code:
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         10.254.197.1    0.0.0.0         UG    0      0        0 enxd03745c561a2
10.88.88.0      0.0.0.0         255.255.255.0   U     0      0        0 enxd03745d06539
10.254.197.0    0.0.0.0         255.255.255.0   U     0      0        0 enxd03745c561a2
192.168.1.0     0.0.0.0         255.255.255.0   U     0      0        0 enp2s0

Last edited by jmgibson1981; 05-21-2021 at 03:43 PM.
 
05-21-2021, 03:59 PM   #2
Ser Olmy
Senior Member
 
Registered: Jan 2012
Distribution: Slackware
Posts: 3,345

Rep: Reputation: Disabled
If I've understood the script correctly, it's supposed to do this:
  • Traffic destined for certain ports, as well as anything coming from the host $WIFEWORKGW (192.168.1.250), is marked '100' by an iptables fwmark (MARK) rule in the mangle table
  • An IP rule directs packets marked '100' towards routing table $GOODGWTABLE
  • The gateway in the default routing table is $MEDIOCREGW (10.254.197.1)
  • The gateway in the alternate table is $GOODGW (10.88.88.1)
Two possible issues:

1. Whenever the script is run, it clears the default routing table but not the alternate table. If you've edited this script and run it repeatedly, the alternate table may contain a number of incorrect routes.

2. The ip route command creating the alternate table, as well as the rule directing traffic to it, uses the variable $GOODGWTABLE, which contains the text "bluespan". But AFAIK, routing tables must be identified by numbers, not names.
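On point 2, iproute2 does accept table names: it resolves them through /etc/iproute2/rt_tables (and, on the OP's system, files in rt_tables.d), which the OP's bluespan.conf later in the thread confirms. On point 1, a minimal sketch of an idempotent setup that empties the alternate table before refilling it, so repeated runs cannot pile up stale routes. The variable names and the print-unless-DRY_RUN=0 wrapper are assumptions for illustration; this is not the actual homerouter.sh.

```bash
#!/usr/bin/env bash
# Sketch only: idempotent policy routing for the "good" WAN.
# Addresses and interfaces come from this thread; variable names are assumptions.
GOODGW=10.88.88.1
GOODIF=enxd03745d06539
GOODGWTABLE=bluespan   # mapped to 100 in /etc/iproute2/rt_tables.d/
MARK=100               # the fwmark set by the mangle rules

run() {
    # Print the command by default; execute it only when DRY_RUN=0.
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

setup_policy_routing() {
    # Empty the alternate table first so routes from earlier runs can't linger.
    run ip route flush table "$GOODGWTABLE"
    run ip route add default via "$GOODGW" dev "$GOODIF" table "$GOODGWTABLE"
    # Delete one existing copy of the rule (ignore "not found"), then add it,
    # so re-running does not stack duplicate rules.
    run ip rule del fwmark "$MARK" table "$GOODGWTABLE" 2>/dev/null || true
    run ip rule add fwmark "$MARK" table "$GOODGWTABLE"
}
```

Source it, review the printed commands, then run `DRY_RUN=0 setup_policy_routing` as root. `ip route show table bluespan` shows what ended up in the table.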

Have you verified that traffic is indeed flowing towards the right gateways? If not, you could add a dummy rule catching, say, ICMP pings towards a random host, and then use tcpdump to see that it actually works.
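A sketch of that test. TEST_HOST is a placeholder (192.0.2.10 is a documentation-only address, so substitute a real one), and the pings must come from a LAN client, because the mark is set in PREROUTING. `ip route get ... mark` asks the kernel which route a packet carrying that mark would take, without sending anything. Function names are hypothetical; commands are printed unless DRY_RUN=0.

```bash
#!/usr/bin/env bash
# Sketch: verify that marked traffic really leaves via the "good" WAN.
# TEST_HOST is a placeholder; pings must originate on a LAN client.
TEST_HOST=${TEST_HOST:-192.0.2.10}
GOODIF=enxd03745d06539
MARK=100

run() {
    # Print the command by default; execute it only when DRY_RUN=0.
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

mark_test_pings() {
    # Temporary rule: mark incoming ICMP addressed to TEST_HOST like priority traffic.
    run iptables -t mangle -I PREROUTING -p icmp -d "$TEST_HOST" -j MARK --set-mark "$MARK"
}

check_route() {
    # Which route would a packet to TEST_HOST carrying the mark use?
    run ip route get "$TEST_HOST" mark "$MARK"
}

watch_good_wan() {
    # While a LAN client pings TEST_HOST, the echo requests should appear here.
    run tcpdump -n -i "$GOODIF" icmp and host "$TEST_HOST"
}

unmark_test_pings() {
    # Remove the temporary rule (same specification as the insert).
    run iptables -t mangle -D PREROUTING -p icmp -d "$TEST_HOST" -j MARK --set-mark "$MARK"
}
```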
 
05-21-2021, 04:20 PM   #3
jmgibson1981
Senior Member
 
Registered: Jun 2015
Location: Tucson, AZ USA
Distribution: Debian
Posts: 1,148

Original Poster
Reputation: 393
Quote:
Whenever the script is run, it clears the default routing table but not the alternate table. If you've edited this script and run it repeatedly, the alternate table may contain a number of incorrect routes.
How can I check the alternate table? This script runs via a systemd service, so if there is anything in that table that isn't in the script, I don't know where it's coming from, unless it is saved across shutdown/reboot cycles.
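For a single table, `ip route show table bluespan` answers the question directly. A sketch that lists every table the current rules point at; `rule_tables` and `show_policy_tables` are hypothetical helpers written for this purpose, and neither command they call needs root.

```bash
#!/usr/bin/env bash
# Sketch: show every routing table referenced by the policy rules.

rule_tables() {
    # stdin: `ip rule show` output; prints each "lookup" table once, in order.
    awk '{ for (i = 1; i < NF; i++) if ($i == "lookup" && !seen[$(i + 1)]++) print $(i + 1) }'
}

show_policy_tables() {
    ip rule show | rule_tables | while read -r table; do
        echo "== table $table"
        ip route show table "$table"
    done
}
```

Routes in bluespan that the script never adds would point at leftovers from earlier runs; tables are kernel state and do not survive a reboot on their own.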

I have verified it. I ran tcptraceroute against a World of Warcraft server on its port, and it goes over bluespan. The same command on port 80/443 goes out through the normal default route.

Quote:
2. The ip route command creating the alternate table, as well as the rule directing traffic to it, uses the variable $GOODGWTABLE, which contains the text "bluespan". But AFAIK, routing tables must be identified by numbers, not names.
This is where I'm at, sourced from https://tldp.org/HOWTO/Adv-Routing-H...netfilter.html along with endless other googling. I'm terrible about documenting where I find stuff :/ Working on it.

Code:
root@mainrouter:/etc/iproute2/rt_tables.d# cat bluespan.conf
100 bluespan
root@mainrouter:/etc/iproute2/rt_tables.d# ip rule ls
0:      from all lookup local
32765:  from all fwmark 0x64 lookup bluespan
32766:  from all lookup main
32767:  from all lookup default
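The name-to-number mapping can be checked directly. `ip rule ls` prints the mark in hex, and 0x64 is 100 in decimal, matching the `100 bluespan` entry. The helper below is a hypothetical sketch that mimics the lookup against rt_tables-style files; iproute2 performs this resolution internally, and the default paths are the ones used on this system (an assumption for other distributions).

```bash
#!/usr/bin/env bash
# Sketch: resolve a routing-table name to its number from rt_tables files.
# table_id is a hypothetical helper, not part of iproute2.

table_id() {
    # Usage: table_id NAME [FILE...]; defaults to the standard locations.
    local name=$1 f
    local -a files=()
    shift
    [ "$#" -gt 0 ] || set -- /etc/iproute2/rt_tables /etc/iproute2/rt_tables.d/*.conf
    for f in "$@"; do
        [ -r "$f" ] && files+=("$f")
    done
    [ "${#files[@]}" -gt 0 ] || return 1
    # Skip comments; print the number of the first matching name, fail if none.
    awk -v n="$name" '$1 !~ /^#/ && $2 == n { print $1; found = 1; exit }
                      END { exit !found }' "${files[@]}"
}
```

For example, `table_id bluespan` should print the number from bluespan.conf, and `printf '%d\n' 0x64` converts the hex mark shown by `ip rule ls`.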

I'm perfectly OK with accepting that it's the ISP for this WAN. I live in an odd place, and service here is generally just bad, if it's even available. But I do want to eliminate myself as the problem if possible. Like I said in the OP, though, I have a hard time believing this company would still be in business if the service were this bad all the time.

I'm pretty much just waiting for Starlink. Assuming all is well, they anticipate mid to late this year in my area.

Last edited by jmgibson1981; 05-21-2021 at 04:25 PM.
 
05-21-2021, 04:51 PM   #4
Ser Olmy
Senior Member
 
Registered: Jan 2012
Distribution: Slackware
Posts: 3,345

Rep: Reputation: Disabled
Extreme delays like that are usually a sign of severe packet loss, timeouts, and subsequent retransmissions. Some possible reasons why that might happen:
  • An MTU blackhole somewhere along the path.

    Try pinging a host with the "-M do" option (which sets the "don't fragment" flag), and adjust the payload size with the "-s" parameter to see when it fails. Ideally, "-s 1472" should work (1472 bytes of payload + 28-byte header = 1500 bytes, which is the standard Ethernet MTU), but it is also acceptable if a router along the path returns an ICMP Type 3, Code 4 error ("Fragmentation needed and DF set") unless a lower value is used.

    What wouldn't be acceptable is if packets over a certain size just vanish without a trace.

  • NAT issues, which in turn create reverse path problems.

    Your NAT MASQUERADE rule seems fine, but you may want to double-check that no packets exit the outbound interface with a 192.168.1.n source address.

  • IP/netmask issues.

    I noticed that your gateways are both in the 10.0.0.0/8 network, but what are their netmasks? How is this physically connected? Are you using one or two NICs for outbound traffic? And if you're using one NIC, does it have multiple IP addresses?
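A minimal sketch of the MTU probe from the first point, done as a binary search over the payload size. Assumptions: PROBE_HOST is a host you choose that answers pings, and loss on the link is low enough that a single dropped ping isn't mistaken for a size limit. The 28 bytes added back are the 20-byte IPv4 header plus the 8-byte ICMP header. Function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: find the largest DF-flagged ping payload that gets through.
# PROBE_HOST must be set to a host that answers pings (an assumption).

probe() {
    # Succeeds if one ping with "don't fragment" and $1 bytes of payload is answered.
    [ -n "${PROBE_HOST:-}" ] || { echo "set PROBE_HOST first" >&2; return 2; }
    ping -c 1 -W 2 -M do -s "$1" "$PROBE_HOST" >/dev/null 2>&1
}

max_payload() {
    # Binary search in [lo, hi]; assumes every size below a working one also works.
    local lo=${1:-0} hi=${2:-1472} mid
    probe "$lo" || return 1
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi + 1) / 2 ))
        if probe "$mid"; then lo=$mid; else hi=$(( mid - 1 )); fi
    done
    echo "$lo"
}

path_mtu() {
    # Add back 20 bytes of IPv4 header and 8 bytes of ICMP header.
    local payload
    payload=$(max_payload "$@") || return 1
    echo $(( payload + 28 ))
}
```

`PROBE_HOST=<host> path_mtu` prints the largest packet size that made it through; on a clean Ethernet path, where 1472 bytes of payload pass, that is 1500.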
Regarding 1, if you do identify an MTU blackhole, that would explain why TCP traffic is dropped en masse. It can be solved with a rule in the POSTROUTING chain of the mangle table that forcibly sets the TCP MSS (Maximum Segment Size) to fit the highest working MTU (for IPv4, MSS = MTU minus 40 bytes of IP and TCP headers).
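If probing does turn up an MTU below 1500 (in this thread it later didn't), the clamp could look like the sketch below. It uses a fixed value via the TCPMSS target's `--set-mss`; the same target also offers `--clamp-mss-to-pmtu`. Helper names are hypothetical, and the rule is only printed unless DRY_RUN=0.

```bash
#!/usr/bin/env bash
# Sketch: clamp the TCP MSS on SYNs leaving a WAN interface.
# Only needed if the measured path MTU is below 1500.

run() {
    # Print the command by default; execute it only when DRY_RUN=0.
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

mss_for_mtu() {
    # IPv4 header (20 bytes) + TCP header without options (20 bytes).
    echo $(( $1 - 40 ))
}

clamp_mss_rule() {
    # $1 = outgoing WAN interface, $2 = highest working MTU on that path.
    local wan_if=$1 mtu=$2
    run iptables -t mangle -A POSTROUTING -o "$wan_if" -p tcp \
        --tcp-flags SYN,RST SYN -j TCPMSS --set-mss "$(mss_for_mtu "$mtu")"
}
```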

As for point 2, all outbound traffic must originate from an address that the respective routers can reach. NATing behind the outbound interface of the Linux router will work, and so will adding static routes on the other routers.

I can't say much more about point 3 without further details about your setup.
 
05-21-2021, 05:27 PM   #5
jmgibson1981
Senior Member
 
Registered: Jun 2015
Location: Tucson, AZ USA
Distribution: Debian
Posts: 1,148

Original Poster
Reputation: 393
Quote:
An MTU blackhole somewhere along the path.

Try pinging a host with the "-M do" option (which sets the "don't fragment" flag), and adjust the payload size with the "-s" parameter to see when it fails. Ideally, "-s 1472" should work (1472 bytes of payload + 28-byte header = 1500 bytes, which is the standard Ethernet MTU), but it is also acceptable if a router along the path returns an ICMP Type 3, Code 4 error ("Fragmentation needed and DF set") unless a lower value is used.
I just did this. Zero problem at 1472. 1473 though and it fails out.

Code:
ping: local error: message too long, mtu=1500
At first I thought my Windows KVM VM with PCI passthrough was the problem. But streaming is a stuttery mess, YouTube is unwatchable, and so on, on other devices around the LAN as well.

Quote:
NAT issues, which in turn create reverse path problems.

Your NAT MASQUERADE rule seems fine, but you may want to double-check that no packets exit the outbound interface with a 192.168.1.n source address.
I have no idea how to check this but I am curious.

Quote:
IP/netmask issues.

I noticed that your gateways are both in the 10.0.0.0/8 network, but what are their netmasks? How is this physically connected? Are you using one or two NICs for outbound traffic? And if you're using one NIC, does it have multiple IP addresses?
Two PoE devices up on the roof. I have no idea how to find their netmasks; I just know that the IPs I have are their IPs. They both come down into my router via a pair of USB 3.0 NICs, and these two NICs get an address via DHCP. The motherboard NIC is the LAN line.

Last edited by jmgibson1981; 05-21-2021 at 05:58 PM.
 
05-21-2021, 06:14 PM   #6
Ser Olmy
Senior Member
 
Registered: Jan 2012
Distribution: Slackware
Posts: 3,345

Rep: Reputation: Disabled
Quote:
Originally Posted by jmgibson1981
I just did this. Zero problem at 1472. 1473 though and it fails out.

Code:
ping: local error: message too long, mtu=1500
Excellent, that rules out MTU issues.
Quote:
Originally Posted by jmgibson1981
At first I thought my Windows KVM VM with PCI passthrough was the problem. But streaming is a stuttery mess, YouTube is unwatchable, and so on, on other devices around the LAN as well.
You have several components in your setup that could be the cause of the issue, which makes troubleshooting harder.

Have you tried connecting a PC/laptop directly to the cable from the "bad" gateway? If the issues persist, then it's either the cable or the connection itself. However, should that turn out to work reasonably well (and I wouldn't be surprised if it did), there's an issue with the router/setup.
Quote:
Originally Posted by jmgibson1981
I have no idea how to check this but I am curious.
tcpdump -i <outbound_device> src net 192.168.1.0/24 should capture exactly 0 packets on both outbound interfaces. If anything at all shows up, something is wrong.
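To turn that into a yes/no check, a sketch that counts LAN-sourced packets seen on a WAN interface during a fixed window. Assumptions: the LAN is 192.168.1.0/24, tcpdump prints its default one-line format, and `timeout` from coreutils is available. Anything other than 0 means traffic is leaving without NAT. Function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: count packets with a LAN source address seen on a WAN interface.
# Assumes a 192.168.1.0/24 LAN and tcpdump's one-line output format.

count_leaks() {
    # stdin: tcpdump lines like "TIME IP SRC.PORT > DST.PORT: ..."; prints a count.
    # The trailing dot in the prefix keeps 192.168.10.x from matching.
    awk -v p="${1:-192.168.1.}" '$2 == "IP" && index($3, p) == 1 { n++ } END { print n + 0 }'
}

check_wan_for_leaks() {
    # Needs root. Listens on interface $1 for $2 seconds (default 30).
    local dev=$1 secs=${2:-30}
    timeout "$secs" tcpdump -n -l -i "$dev" src net 192.168.1.0/24 2>/dev/null | count_leaks
}
```

Run `check_wan_for_leaks enxd03745c561a2` and `check_wan_for_leaks enxd03745d06539`; both should print 0.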
Quote:
Originally Posted by jmgibson1981
Two PoE devices up on the roof. I have no idea how to find their netmasks; I just know that the IPs I have are their IPs. They both come down into my router via a pair of USB 3.0 NICs, and these two NICs get an address via DHCP. The motherboard NIC is the LAN line.
Since the routers are handing out IP addresses (and netmasks) to your device via DHCP, you can check this simply by looking at which netmask you've received (ifconfig <devicename> or ip -4 addr list dev <devicename>).

If both netmasks are either 255.255.255.0 (/24) or 255.255.0.0 (/16), all is well since the NICs will have IP addresses belonging to different subnets. However, if either NIC shows a netmask of 255.0.0.0 (/8), you have overlapping networks.
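The overlap test can also be done mechanically: two networks overlap exactly when their addresses agree on the shorter of the two prefixes. A bash sketch; the function names are hypothetical, and the example addresses are ones that come up in this thread.

```bash
#!/usr/bin/env bash
# Sketch: do two IPv4 networks overlap? Function names are hypothetical.

ip_to_int() {
    # Dotted quad -> 32-bit integer.
    local IFS=.
    local -a o=($1)
    echo $(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))
}

cidrs_overlap() {
    # Usage: cidrs_overlap ADDR/LEN ADDR/LEN; exit status 0 if they overlap.
    local a=${1%/*} alen=${1#*/} b=${2%/*} blen=${2#*/}
    local len=$(( alen < blen ? alen : blen ))
    local mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
    (( ($(ip_to_int "$a") & mask) == ($(ip_to_int "$b") & mask) ))
}

iface_cidr() {
    # Address/prefix of an interface, e.g. iface_cidr enxd03745d06539.
    ip -4 -o addr show dev "$1" | awk '{ print $4; exit }'
}
```

With /24 masks, `cidrs_overlap 10.88.88.252/24 10.254.197.254/24` fails (no overlap), while a /8 on either side makes it succeed. On the router itself: `cidrs_overlap "$(iface_cidr enxd03745d06539)" "$(iface_cidr enxd03745c561a2)"`.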

But as you have two NICs, there may be a NAT issue. Your script contains the following line in the function router_config_func():
Code:
iptables -t nat -A POSTROUTING -o "$ifname" -j MASQUERADE
Unless this is called twice, once for each outgoing interface, you're going to have NAT issues.

Dump your NAT rules with iptables-save -t nat and make sure there are in fact two MASQUERADE rules, one for each outgoing interface.
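A sketch that reads `iptables-save -t nat` and reports any WAN interface lacking a MASQUERADE rule. It only recognizes POSTROUTING rules written with `-o`, as the OP's are; a catch-all rule without `-o` would not be counted. As the following posts show, having both rules is necessary but not sufficient. Function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: check that every WAN interface has a POSTROUTING MASQUERADE rule.

masq_interfaces() {
    # stdin: `iptables-save -t nat`; prints the -o interface of each MASQUERADE rule.
    awk '$1 == "-A" && $2 == "POSTROUTING" && / -j MASQUERADE/ {
        for (i = 3; i < NF; i++) if ($i == "-o") print $(i + 1)
    }'
}

missing_masq() {
    # Usage: iptables-save -t nat | missing_masq WAN_IF...; returns 1 if any lack a rule.
    local have wan rc=0
    have=$(masq_interfaces)
    for wan in "$@"; do
        if ! grep -qxF -- "$wan" <<< "$have"; then
            echo "no MASQUERADE rule for $wan"
            rc=1
        fi
    done
    return "$rc"
}
```

Usage: `iptables-save -t nat | missing_masq enxd03745d06539 enxd03745c561a2` prints nothing when both interfaces are covered.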
 
05-21-2021, 06:28 PM   #7
jmgibson1981
Senior Member
 
Registered: Jun 2015
Location: Tucson, AZ USA
Distribution: Debian
Posts: 1,148

Original Poster
Reputation: 393
Both masks are 255.255.255.0 (/24)

Code:
# iptables-save -t nat
# Generated by xtables-save v1.8.2 on Fri May 21 16:23:25 2021
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A PREROUTING -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 3130
-A POSTROUTING -o enxd03745d06539 -j MASQUERADE
-A POSTROUTING -o enxd03745c561a2 -j MASQUERADE
COMMIT
# Completed on Fri May 21 16:23:25 2021
This is the result on the global default route's interface; nothing shows up on bluespan. Based on what you said, I'm sure this is bad, but I'm unsure what it means exactly or how to fix it. That IP is not one of my static IPs, so I have no idea which device it is without chasing MAC addresses. *EDIT* It doesn't matter which device; I'm seeing all kinds of traffic scroll across the screen right now from a few IPs on my LAN.

Code:
# tcpdump -i enxd03745c561a2 src net 192.168.1.0/24
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on enxd03745c561a2, link-type EN10MB (Ethernet), capture size 262144 bytes
16:25:52.624099 IP 192.168.1.200.51116 > 137.221.106.103.http: Flags [F.], seq 2431791768, ack 2579697586, win 8209, length 0
16:25:52.936495 IP 192.168.1.200.51116 > 137.221.106.103.http: Flags [F.], seq 0, ack 1, win 8209, length 0
16:25:53.552071 IP 192.168.1.200.51116 > 137.221.106.103.http: Flags [F.], seq 0, ack 1, win 8209, length 0
16:25:54.765115 IP 192.168.1.200.51116 > 137.221.106.103.http: Flags [F.], seq 0, ack 1, win 8209, length 0
16:25:57.172386 IP 192.168.1.200.51116 > 137.221.106.103.http: Flags [F.], seq 0, ack 1, win 8209, length 0
16:26:01.986624 IP 192.168.1.200.51116 > 137.221.106.103.http: Flags [F.], seq 0, ack 1, win 8209, length 0

Last edited by jmgibson1981; 05-21-2021 at 06:31 PM.
 
05-21-2021, 06:44 PM   #8
Ser Olmy
Senior Member
 
Registered: Jan 2012
Distribution: Slackware
Posts: 3,345

Rep: Reputation: Disabled
Quote:
Originally Posted by jmgibson1981
Both masks are 255.255.255.0 (/24)
Another one down, a few more to go.
Quote:
Originally Posted by jmgibson1981
Code:
-A POSTROUTING -o enxd03745d06539 -j MASQUERADE
-A POSTROUTING -o enxd03745c561a2 -j MASQUERADE
You do indeed have one MASQUERADE rule for each interface, which should work...
Quote:
Originally Posted by jmgibson1981
Code:
# tcpdump -i enxd03745c561a2 src net 192.168.1.0/24
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on enxd03745c561a2, link-type EN10MB (Ethernet), capture size 262144 bytes
16:25:52.624099 IP 192.168.1.200.51116 > 137.221.106.103.http: Flags [F.], seq 2431791768, ack 2579697586, win 8209, length 0
16:25:52.936495 IP 192.168.1.200.51116 > 137.221.106.103.http: Flags [F.], seq 0, ack 1, win 8209, length 0
16:25:53.552071 IP 192.168.1.200.51116 > 137.221.106.103.http: Flags [F.], seq 0, ack 1, win 8209, length 0
16:25:54.765115 IP 192.168.1.200.51116 > 137.221.106.103.http: Flags [F.], seq 0, ack 1, win 8209, length 0
16:25:57.172386 IP 192.168.1.200.51116 > 137.221.106.103.http: Flags [F.], seq 0, ack 1, win 8209, length 0
16:26:01.986624 IP 192.168.1.200.51116 > 137.221.106.103.http: Flags [F.], seq 0, ack 1, win 8209, length 0
...and yet your NAT setup is leaking like a sieve. That's your problem right there, but I can see no obvious reason why it shouldn't be working.

See if flushing the conntrack table with conntrack -F makes a difference.
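A likely explanation for the leak: NAT is decided only for the first packet of a connection, and a packet that conntrack classifies as INVALID (for example a FIN retransmitted after its connection's entry is gone, much like the [F.] packets above) is not NATed and can leave with its LAN source address. Flushing resets the tracked state. Dropping INVALID packets in FORWARD is a commonly used mitigation; it is offered here as an assumption and is not necessarily the fix described in the Reddit thread linked later. Commands are printed unless DRY_RUN=0, which needs root and the conntrack package on Debian.

```bash
#!/usr/bin/env bash
# Sketch: reset connection tracking and stop INVALID packets from being
# forwarded un-NATed. Function names are hypothetical.

run() {
    # Print the command by default; execute it only when DRY_RUN=0.
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

conntrack_count() {
    # How many connections are currently tracked (compare before and after).
    run conntrack -C
}

flush_conntrack() {
    # Forget all tracked state; live TCP/UDP streams are briefly interrupted.
    run conntrack -F
}

drop_invalid_forward() {
    # Inserted first in FORWARD so no earlier ACCEPT can let INVALID packets out.
    run iptables -I FORWARD 1 -m conntrack --ctstate INVALID -j DROP
}
```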
 
1 members found this post helpful.
05-21-2021, 06:50 PM   #9
jmgibson1981
Senior Member
 
Registered: Jun 2015
Location: Tucson, AZ USA
Distribution: Debian
Posts: 1,148

Original Poster
Reputation: 393
I'll try it the minute my wife finishes her workday, 10-15 minutes from now. According to my machine, conntrack isn't installed, so I'll install it then. I don't want to risk losing her network connection.
 
05-21-2021, 06:53 PM   #10
Ser Olmy
Senior Member
 
Registered: Jan 2012
Distribution: Slackware
Posts: 3,345

Rep: Reputation: Disabled
Good idea.

Flushing the conntrack table means all ongoing TCP/UDP streams will lose their ESTABLISHED state and will have to be reestablished. Most software handles this quite well, but there will be a short interruption regardless.
 
05-21-2021, 07:19 PM   #11
jmgibson1981
Senior Member
 
Registered: Jun 2015
Location: Tucson, AZ USA
Distribution: Debian
Posts: 1,148

Original Poster
Reputation: 393
It seems to be working. I've got two PuTTY windows up, and not a thing has leaked on either in the past couple of minutes; before, it happened within 5 seconds on one of them. While waiting, I googled how to fix this.

https://www.reddit.com/r/linuxadmin/...to_be_leaking/

That's what I followed. I've committed the revisions to the script to git. I've got a TV stream going at the moment without a hiccup yet, and I'm about to start another one to test with multiple streams. I am quite grateful. The leaking problem alone is fixed, and you led me to it. Thank you much!
 
05-21-2021, 07:26 PM   #12
Ser Olmy
Senior Member
 
Registered: Jan 2012
Distribution: Slackware
Posts: 3,345

Rep: Reputation: Disabled
That's great news! And you're welcome.
 
05-21-2021, 09:51 PM   #13
jmgibson1981
Senior Member
 
Registered: Jun 2015
Location: Tucson, AZ USA
Distribution: Debian
Posts: 1,148

Original Poster
Reputation: 393
A bit-later update: my wife has successfully streamed a few episodes of her current binge show on Disney+. I think it has buffered on her once or twice in the last hour and a half, maybe 5-10 seconds each time. Based on my experience living here with only WISP internet options, I'll chalk that up to prime time for them. Before, it wasn't even watchable due to buffering and occasionally skipping forward and back. I've also had my PuTTY windows up running tcpdump on each interface. Not a whisper.

Thank you much again.
 
05-21-2021, 11:29 PM   #14
Ser Olmy
Senior Member
 
Registered: Jan 2012
Distribution: Slackware
Posts: 3,345

Rep: Reputation: Disabled
One final comment: I've been thinking about what could have caused this NAT leakage to occur in the first place, and I believe it may be related to your external interfaces getting addresses via DHCP.

As the router boots, the DHCP client receives IP addresses for both interfaces, and unless you've added the appropriate parameter to the config file, it will end up stuffing two different gateways into the same routing table (main).

That would be bad news on any day, as the kernel will now load balance over two different NAT endpoints (which absolutely will not work), but in your case it might also mess up the gateway handling in the "homerouter.sh" script, as the ip route delete default command will delete only one entry, and then probably the wrong one given how things tend to go south if at all possible.

I would recommend either configuring the external NICs with static IP addresses and no gateway parameter, or alternatively make sure the DHCP client leaves the routing table well alone. Leaving all gateway handling to the homerouter script has the additional benefit of preventing any conntrack entries from being created before the relevant NAT rules are in place, as there will simply be no next-hop information of any kind in the routing table before the script is run.
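For the second option, a sketch assuming the ISC client (isc-dhcp-client), whose dhclient-script sources the files in /etc/dhcp/dhclient-enter-hooks.d/ before acting on a lease. The hook file name is hypothetical. Clearing new_routers for the two WAN NICs keeps the DHCP-assigned addresses but leaves dhclient-script with no gateway to install, so the router script stays in charge of routing.

```bash
# Hypothetical hook file: /etc/dhcp/dhclient-enter-hooks.d/no-wan-gateway
# dhclient-script sources this (POSIX sh), so variables changed here affect
# what it does next. Assumes isc-dhcp-client is the DHCP client in use.

strip_wan_routers() {
    case "${interface:-}" in
        enxd03745d06539|enxd03745c561a2)
            # Without new_routers, no gateway is taken from this lease.
            unset new_routers
            ;;
    esac
}

strip_wan_routers
```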
 
05-22-2021, 05:03 PM   #15
jmgibson1981
Senior Member
 
Registered: Jun 2015
Location: Tucson, AZ USA
Distribution: Debian
Posts: 1,148

Original Poster
Reputation: 393
Interesting effect. I adjusted my interfaces file as follows; before, both were DHCP.

Code:
auto enxd03745d06539
iface enxd03745d06539 inet static
        address 10.88.88.252/24

auto enxd03745c561a2
iface enxd03745c561a2 inet static
        address 10.254.197.254/24
I removed the ip route delete default from the script and rebooted. Websites now load instantly. Even after fixing the NAT leak there was still a bit of load time, if minor; now pages come up almost as fast as I hit them. Major improvement. Thank you yet again...
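With the WAN NICs configured statically and no gateway lines, the script can own both default routes outright. A sketch (gateways and interfaces from this thread, variable names assumed) using `ip route replace`, which adds the route if it is missing and overwrites it otherwise, so no delete step is needed and re-running is harmless. Following the advice above, it is meant to run after the NAT rules are loaded. Commands are printed unless DRY_RUN=0.

```bash
#!/usr/bin/env bash
# Sketch: the router script as the only place default routes are created.
MEDIOCREGW=10.254.197.1
MEDIOCREIF=enxd03745c561a2
GOODGW=10.88.88.1
GOODIF=enxd03745d06539
GOODGWTABLE=bluespan

run() {
    # Print the command by default; execute it only when DRY_RUN=0.
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

set_default_routes() {
    # "replace" creates or overwrites, so no "ip route delete default" is needed.
    run ip route replace default via "$MEDIOCREGW" dev "$MEDIOCREIF"
    run ip route replace default via "$GOODGW" dev "$GOODIF" table "$GOODGWTABLE"
}
```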

Last edited by jmgibson1981; 05-22-2021 at 05:06 PM.
 
  


All times are GMT -5.
