[HOWTO] A quick explanation of routing setup with OpenVPN tunnels

sundialsvcs · 11-10-2016, 08:25 AM

Probably the most-frustrating thing about setting up OpenVPN for the first time is trying to figure out, either why the damned thing won't connect, or why this-computer can't ping that-one. So, here's a quick overview of how it should work, and how things should be set up to make it work ... considering, in this case, only the second scenario of "why I can't ping you, and/or why you can't ping me."

Let's say that Bob, on subnet #1 at IP-address 10.11.11.11 in Idaho, wants to ping Mary's computer, on subnet #2 at IP-address 10.22.22.22, in Vermont. (Neither Bob nor Mary is running OpenVPN on their own machine.)

Quote:

First(!) of all, kindly notice that "their employer is doing many things correctly." Each company subnet (10.11.11.xx, 10.22.22.xx, etc.) is situated on one of the several private-network IP-address ranges. Furthermore, each one is distinct from the other (within the company ...), and none is too-likely to conflict with the addresses used in a road-warrior's coffee shop, hotel, or home.
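If you want to sanity-check an address plan like this one before you build on it, here is a minimal sketch using Python's standard ipaddress module. The /24 prefix lengths and the dictionary names are my assumptions for illustration; substitute whatever your own subnets actually use.

```python
import ipaddress
from itertools import combinations

# Subnets from the example. The /24 prefix lengths are an assumption.
subnets = {
    "idaho": ipaddress.ip_network("10.11.11.0/24"),
    "vermont": ipaddress.ip_network("10.22.22.0/24"),
    "openvpn": ipaddress.ip_network("10.8.0.0/24"),
}

def check_plan(nets):
    """Return a list of problems: non-private ranges and overlapping pairs."""
    problems = []
    for name, net in nets.items():
        if not net.is_private:
            problems.append(f"{name} ({net}) is not a private range")
    for (a, net_a), (b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{a} ({net_a}) overlaps {b} ({net_b})")
    return problems

print(check_plan(subnets) or "address plan looks sane")
```

An empty list means every range is private and no two ranges collide, which is the property the rest of this walkthrough relies on.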

Necessarily, one computer on each subnet must be running OpenVPN, and the two must now be successfully connected to each other. Let's say that the computer on Bob's subnet is a client at 10.11.11.123 (owned by Greg), and the computer on Mary's is the server in the home-office at 10.22.22.234 (owned by Sue). Let's also say that the internal private network that the OpenVPN systems use to talk among themselves is at 10.8.0.0. Here we go ...

("10.8.0.0," "UDP," "port 1194," and so-on are arbitrary things specified in the server's configuration file. These are the values most commonly seen.)

(By the way, I'm not going to try to tell you here "how to make it happen." You can look on-line for that. What I'm going to try to do is to describe what's supposed to happen, every step of the way. Also please note that I presume that Greg's and Sue's machines are connected, and that digital certificates with unique CN= common-names have been used to secure the connection and to uniquely identify every client, so that client-config-dir works. In this way, I hope to help you quickly discover what exactly has gone wrong right now for you...)

(P.S.: What's the green text for? You guessed it! These are the "gotchas." These are the things that might be all or part of the problem at each stage.)

Onward ..!

(1) Bob says, ping 10.22.22.22, and crosses his fingers hopefully. An ICMP Ping-request packet is created and sent. But, where does it go and how does it get there? Well, there could be a route on Bob's machine (see (2b)), but there probably isn't. So, the packet gets sent by-default to the "default gateway" ... the office router. Notice that the outgoing packet bears a sender-address of 10.11.11.11 and a destination of 10.22.22.22.

(2) If the office router did not recognize a special-case for these packets, it would forward them to "the Internet," which would immediately deep-six them since they bear "private-network" addresses. Therefore, the office router must instead divert them to where they need to go. The office router must have a static route for a destination of "10.22.22.xx" ... (as well as for "10.8.0.x" for reasons that will be apparent later) ... which forwards the traffic to the OpenVPN machine (10.11.11.123) "as a gateway." In other words, it regards the OpenVPN machine as "another router," which it is.

(2b) Alternatively, Bob's machine could have had a route command which specified 10.11.11.123 as a "gateway" leading to subnet 10.22.22.x, thereby avoiding the bounce through the office router.

(2c) Certain 'big, expensive' routers might respond with a Redirect Host reply, telling you about the OpenVPN machine and inviting you to use it as your gateway, instead. If your router does this, you need to make it stop doing it. Your router must forward the traffic, itself. (Trust me on this one ... ...)
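To make the routing decision in steps (1) through (2b) concrete, here is a small model of what a router does with its table: it picks the most specific (longest-prefix) entry that contains the destination. This is only a sketch of the idea, not any router's actual code. The /24 masks and the placeholder name "isp-uplink" are assumptions.

```python
import ipaddress

def make_table(entries):
    """Turn (subnet-text, next-hop) pairs into (network, next-hop) pairs."""
    return [(ipaddress.ip_network(net), hop) for net, hop in entries]

def lookup(table, dest):
    """Return the next hop for dest using longest-prefix match."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, hop) for net, hop in table if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]

# Idaho office router WITH the static routes described in step (2).
office_router = make_table([
    ("0.0.0.0/0", "isp-uplink"),        # default route (placeholder name)
    ("10.22.22.0/24", "10.11.11.123"),  # Vermont subnet via the OpenVPN box
    ("10.8.0.0/24", "10.11.11.123"),    # OpenVPN's own subnet, same gateway
])

# The same router WITHOUT them: only the default route remains.
broken_router = make_table([("0.0.0.0/0", "isp-uplink")])

print(lookup(office_router, "10.22.22.22"))  # 10.11.11.123
print(lookup(broken_router, "10.22.22.22"))  # isp-uplink
```

The second result is the failure case from step (2): with no static route, Bob's ping heads for the Internet and is never seen again.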

(3a) The machine at 10.11.11.123 must have IP-address forwarding enabled so that the gateway packet will be received and acted-upon instead of just being dropped. This is what instructs the machine to "act as a router," accepting traffic that specifies its address as a gateway rather than a final destination. (Otherwise, you'll see the packets being sent to the machine but going no farther, like so-many unsuccessful spermatozoa.)

When you make this change on a Windows box (in the registry), remember to restart the machine.
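On a Linux box, the forwarding switch from step (3a) is visible as a one-character file under /proc. Here is a minimal sketch that reads it. It assumes Linux; the function name is mine, and Windows and other systems keep this setting elsewhere.

```python
from pathlib import Path

# Linux exposes the IPv4 forwarding switch here; other systems differ.
IP_FORWARD = Path("/proc/sys/net/ipv4/ip_forward")

def forwarding_enabled(path=IP_FORWARD):
    """Return True if the file at path contains "1" (forwarding is on)."""
    return path.read_text().strip() == "1"

if IP_FORWARD.exists():
    print("IPv4 forwarding enabled:", forwarding_enabled())
else:
    print("no", IP_FORWARD, "on this system")
```

If this reports False on your OpenVPN machine, that alone is enough to explain packets that arrive at it and go no farther. On Linux the setting is usually changed with sysctl (net.ipv4.ip_forward), but check your distribution's documentation for how to make it permanent.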

(3b) This computer consults its own routing-table, and it finds a route directing traffic for "10.22.22.xx" to its (virtual) tunX device. This is caused by a route directive in the OpenVPN configuration, which was probably "pushed" to the client from the server and then acted-upon by the client because of a pull directive in its own config.

(4) The OpenVPN machine encrypts the traffic and sends it via UDP port 1194 (usually ...) over the public Internet all the way to Vermont. (Yes, it is possible to use any UDP port. It's also possible to use TCP, instead. But this is not customarily done ...)

(5) A port-forwarding rule on the Internet router in Vermont causes "incoming UDP traffic on port 1194" to be forwarded to the OpenVPN server at 10.22.22.234.

(6a) The OpenVPN server decrypts the traffic and, seeing that it is destined for its local network, sends it to Mary's machine, which dutifully generates an ICMP Ping-reply. This packet has a sender address of 10.22.22.22 and a destination of 10.11.11.11. This packet must now manage to make its way back to the OpenVPN server at 10.22.22.234, as with steps 1-3a.

(6b) (If the destination was not on the local subnet, but instead was on a subnet controlled by a different VPN client, this would be a client-to-client case: the traffic will be re-encrypted and sent out to another client. See step (7b), below, for more information about how OpenVPN knows how to do this.)

(7a) (Since this is not a client-to-client case ...) The ping-reply packet, having been dutifully delivered to the OpenVPN Server by the local router or what-not, and having been accepted by the server-machine because of IP-address forwarding, as usual, is routed to the OpenVPN Server's tunX device by its local operating-system because there is a route 10.11.11.0 ... directive in its own OpenVPN configuration.

(7b) Well, the OpenVPN server in Vermont happens to have OpenVPN connections to dozens of offices around the country. Each office necessarily has its own, non-overlapping IP address range, so OpenVPN is able to look at the destination and decide to which client ("Idaho ...") this packet should be sent. This is because there is a client-config-dir, and within it there is a file with a name matching the Idaho client's "common name," and within that file there is an iroute 10.11.11.0 ... directive. Now, OpenVPN knows which client to send it to.

To clarify:

The route directive is always needed, so that OpenVPN can issue the correct operating-system command to cause the specified subnet to be directed to the tunX, and thence to OpenVPN itself. (Only OpenVPN can do this, because it creates the tunnel virtual-device first.)
The iroute command is actually a directive used by OpenVPN, to make it aware of the presence of the remote subnet and thus to know which of its clients (even if there is only one!) to send the traffic to. It does not correspond to an operating-system command, and does not cause any OS command to be issued.
Despite the very similar names, the directives have an entirely distinct purpose, and so, both are required if any client exposes a subnet that is to be reachable. OpenVPN will not "guess" about such things: it must be told.
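The two-stage decision described above can be modelled as two separate lookup tables: one that the operating system consults (filled by route) and one that OpenVPN consults internally (filled by iroute). This is a toy model of the behaviour described in this post, not OpenVPN's actual implementation. The device name "tun0", the common name "idaho-office", and the /24 mask are all hypothetical.

```python
import ipaddress

# Kernel routes installed because of "route" directives: subnet -> device.
os_routes = {ipaddress.ip_network("10.11.11.0/24"): "tun0"}

# OpenVPN's internal table from "iroute" lines in client-config-dir files:
# subnet -> common name of the client that owns it (hypothetical name).
iroutes = {ipaddress.ip_network("10.11.11.0/24"): "idaho-office"}

def deliver(dest, os_routes, iroutes):
    """Describe what happens to a packet for dest on the OpenVPN server."""
    addr = ipaddress.ip_address(dest)
    device = next((d for net, d in os_routes.items() if addr in net), None)
    if device is None:
        return "kernel: no route into the tunnel, packet follows the default route"
    client = next((c for net, c in iroutes.items() if addr in net), None)
    if client is None:
        return "openvpn: packet reached " + device + " but no client owns that subnet"
    return "openvpn: encrypt and send to client " + client

print(deliver("10.11.11.11", os_routes, iroutes))  # both directives present
print(deliver("10.11.11.11", {}, iroutes))         # "route" missing
print(deliver("10.11.11.11", os_routes, {}))       # "iroute" missing
```

The last two calls show the two distinct ways this can fail, which is why both directives are needed even though they name the same subnet.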

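(8) And so, another UDP packet winds its way back to Idaho. Because the client opened the connection outbound, the Idaho router's NAT ordinarily recognizes the reply as part of that existing exchange and delivers it to the OpenVPN client at 10.11.11.123; a port-forwarding rule is normally needed only on the server side.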

(9) The OpenVPN client, seeing that the packet is bound for Bob on its local network, forwards the packet to Bob, and the long-awaited "ping reply" message appears. (Much jubilation and merriment briefly ensues ...)

Also:

(10) If Greg had issued the ping 10.22.22.22 command, his packet would have carried a return-address somewhere in the 10.8.0.x address range, not 10.11.11.123, because the route commands in place on his own machine would have sent the ping directly to tunX. (See step (3b) above.) Therefore, it is necessary for the office routers to have static routes covering this address-range, as well. Traffic for both ranges (the remote subnet and 10.8.0.x) should be forwarded to the OpenVPN machine "as a gateway." The same is true for Sue. Correct routing must be in place, one way or another, for the entire round-trip, on both sides. All IP-addresses of all subnets, and the special 10.8.0.x subnet used by OpenVPN itself, must be covered.
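One way to audit the "everything must be covered" rule is to list the subnets a router has to know about and compare them against its static routes. Here is a minimal sketch, seen from the Idaho office router. The /24 masks and the function name are assumptions for illustration.

```python
import ipaddress

def uncovered(required, static_routes):
    """Return the required subnets not contained in any static route."""
    routes = [ipaddress.ip_network(r) for r in static_routes]
    missing = []
    for text in required:
        net = ipaddress.ip_network(text)
        if not any(net.subnet_of(route) for route in routes):
            missing.append(text)
    return missing

# What the Idaho router must forward to the OpenVPN machine:
# the remote (Vermont) subnet and OpenVPN's own 10.8.0.x subnet.
required = ["10.22.22.0/24", "10.8.0.0/24"]

print(uncovered(required, ["10.22.22.0/24"]))                 # ['10.8.0.0/24']
print(uncovered(required, ["10.22.22.0/24", "10.8.0.0/24"]))  # []
```

The first call is the classic half-finished setup: Bob's ping works, but replies addressed to a 10.8.0.x sender have nowhere to go.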

---
To properly diagnose and resolve communication issues, you first need to clearly understand how the traffic is supposed to flow ... as I have attempted to explain here ... and you need a tool such as tcpdump or Wireshark, and you need to know how to use it. Although you can't decrypt the UDP traffic (of course...), you can see it coming and going, or not-going, as the case may be.

Critically, it comes down to making sure that e-v-e-r-y participant in the entire exchange knows how to route the traffic (and knows to accept it).

(Whew! And I initially thought that this would be a short post!!)

HTH™ ...!!

sundialsvcs · 11-29-2016, 10:58 AM

I would like to add to this thread an explanation of a problem that I recently encountered, and how I diagnosed and fixed it.

The Problem:
Users at a remote site (10.30.40.xx) could connect to a service on the cloud server (10.44.55.11) that was also running OpenVPN, but could not connect to a service on a different machine within the virtual subnet (10.44.55.33), although they were pushed a route to it. In that case, the server "took too long to respond."

On "my" machine, which happens to use a direct OpenVPN client, I was able to reach all servers instantly.

Evidence Gathered:
traceroute from the remote side to the destination machine could be seen reaching the local OpenVPN server and being sent to the correct address, but after this hop, the route stalled.

The specific hop that was failing was from 10.44.55.11 to 10.44.55.33. Since these servers can reach one another directly, it had to be a problem with the packets being unable to return.

Explanation:
The packets made it to the destination machine (10.44.55.33, on the cloud side), but they didn't know how to make it home. The packets bore a sender-address of, say, 10.30.40.16, but there was no route command telling this server (10.44.55.33) to forward this traffic back to the OpenVPN machine (10.44.55.11). So, the packets got thrown onto the Internet by way of the default route, where they vanished into the gloom because these are non-routable addresses.

There was a routing command that covered the IP-address range used for the participants in the OpenVPN network itself (10.8.0.xx), which is the address-range that "my" machine would be using because I'm running an OpenVPN client directly. This is why "my" machine could see all of the servers.

Resolution:
Add the following command to /etc/network/interfaces on 10.44.55.33, specifying the appropriate address and interface. (This is the same routing that already necessarily existed for the 10.8.0.xx subnet.) In this example, the OpenVPN server is on 10.44.55.11 and is reached through interface ens35. This directive will cause this route command to be issued each time this interface is brought up in the future.

Quote:

up route add -net 10.30.40.0 netmask 255.255.254.0 gw 10.44.55.11 dev ens35

If you are fixing the problem on-the-spot, you'll also need to issue a one-time route command to establish this route "right now." (Omit the leading up keyword, which belongs only in /etc/network/interfaces.) You do not need to bring the interface down or up: as soon as this computer knows where to forward this traffic, the problem is solved.

To clarify:

OpenVPN traffic that originates from a directly connected user, who is running an OpenVPN client on his own machine, will bear a return-address in the range used by OpenVPN to denote such users: e.g. 10.8.0.xx. (The traffic goes immediately into the tunnel-device on their machine.) There must be a route on all machines to which an OpenVPN user might connect, to send their traffic back to the OpenVPN server for delivery.
But traffic from an indirectly connected user, who is using an OpenVPN router on their remote subnet, will bear the return address of their machine within that subnet. These packets, too, must be routed home back to the OpenVPN server machine for final delivery. And, as is the case with all remote subnets, the IP-address ranges on both sides must be distinct.
If you find that "traceroute stalls," spitting out rows of asterisks, this is a dead giveaway that you have a return-routing problem at that point (the packets get there, but they don't know how to get home).
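The return-routing failure described in this post can be reproduced on paper by asking where 10.44.55.33 would send its replies, before and after the fix. The sketch below is my own illustration. It assumes the remote subnet is 10.30.40.0 with netmask 255.255.254.0 and that the local subnet is a /24. The names "internet-gateway" and "on-link" are placeholders, and 10.8.0.6 is a hypothetical address for a directly connected client.

```python
import ipaddress

def next_hop(routes, dest):
    """Pick the most specific route containing dest; return its gateway."""
    addr = ipaddress.ip_address(dest)
    best = None
    for net_text, hop in routes:
        net = ipaddress.ip_network(net_text)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    return best[1] if best else None

# Routing table on 10.44.55.33 BEFORE the fix.
before = [
    ("0.0.0.0/0", "internet-gateway"),  # default route (placeholder name)
    ("10.44.55.0/24", "on-link"),       # local subnet, mask assumed
    ("10.8.0.0/24", "10.44.55.11"),     # OpenVPN's own range, already routed
]

# AFTER the fix: the remote subnet also goes back via the OpenVPN server.
after = before + [("10.30.40.0/255.255.254.0", "10.44.55.11")]

for sender in ("10.8.0.6", "10.30.40.16"):
    print(sender, "before:", next_hop(before, sender),
          "after:", next_hop(after, sender))
```

Replies to the directly connected client are routed correctly in both tables, while replies to 10.30.40.16 fall through to the default route until the extra entry is added. That matches the symptom: "my" machine worked, and the remote users' traffic stalled.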