Hi. I'm jon.404, a Unix/Linux/Database/Openstack/Kubernetes Administrator, AWS/GCP/Azure Engineer, mathematics enthusiast, and amateur philosopher. This is where I rant about that which upsets me, laugh about that which amuses me, and jabber about that which holds my interest most: *nix.

On-prem kubernetes, Part 3

Posted 12-14-2023 at 12:27 PM by rocket357
Updated 12-27-2023 at 04:47 AM by rocket357

Posts in this series:
Project Goals and Description: Background info and goals
Preparing the installers: pxeboot configs
Installing the Xen Hosts: installing Debian/Xen dom0
Installing the K8s VMs: installing the k8s domUs
Initializing the Control Plane: Bootstrapping a bare-bones HA Kubernetes Cluster
Installation/Configuration of Calico/MetalLB/ingress-nginx (this post): Installing the CNI/Network Infrastructure
Installation/Configuration of LVM-CSI, S3-CSI, and Kadalu (GlusterFS): Installing the CSIs for Persistent Volumes
Installation/Configuration of cert-manager: Installing/Configuring cert-manager
Automating the boring bits: Installing/Configuring ArgoCD and GitOps Concepts
Authentication considerations: Installing/Configuring Authelia/Vault and LDAP/OAuth Integrations
Authentication configurations: Securing Applications with Authelia
Staying up to date: Keeping your cluster up-to-date

Github for example configuration files: rocket357/on-prem-kubernetes

Overview

Today we're going to install and configure the Calico CNI (Container Network Interface), the MetalLB software loadbalancer, and the ingress-nginx ingress controller. Calico will enable the app pods in the kubernetes cluster to communicate with each other (and indirectly, the outside world). MetalLB is the software loadbalancer *within* kubernetes (the loadbalancing is getting out of hand: I use relayd on my physical network, haproxy/keepalived on my Xen "cluster", and now MetalLB within Kubernetes!).

MetalLB and Calico are listed as *mostly* compatible, seemingly because they have overlapping purposes, so we're going to need to customize the configuration of both so they work together properly. The main issue, to my understanding, is that they both speak BGP, and if both are reaching out from a network host to a BGP peer, the peer will see duplicate sessions and only accept one, meaning only one of them will work. The solution, essentially, is to use Calico to advertise routes to BGP peers, and MetalLB to assign services to loadbalancer IPs. This requires that MetalLB *not* be configured for BGP peering, and that Calico be configured so it knows which IPPool is supposed to be advertised.

The last piece of the puzzle is ingress-nginx (not to be confused with nginx-ingress, a different project!). ingress-nginx is an ingress controller, which essentially routes traffic for a given destination URL to specific pods to handle the requests.

Ok, with the background out of the way, we can focus on getting everything installed. Since we used a specific pod-network-cidr when bootstrapping the cluster (10.244.0.0/16, the one I've always used and the one that tends to show up in kubeadm init instructions), we'll need to double check that Calico and MetalLB are configured to match.
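For reference, the part that matters here is the --pod-network-cidr flag that was passed to kubeadm init back in the control plane post; a sketch only (the other flags from that post are omitted):

Code:
kubeadm init --pod-network-cidr=10.244.0.0/16   # plus the HA/control-plane flags from the earlier post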

Intro to Kubernetes Operators

If you follow the operator-based install at calico's website, they'll have you kubectl create -f a large yaml file first to install the operator framework (don't do this, though. See notes on helm below!). An operator, in kubernetes-speak, is simply a codebase that is responsible for checking the health, updates, and configuration of a specific set of applications through custom resource definitions, or CRDs. For instance, postgres-operator and k8ssandra-operator (which I manage at work regularly) are a pair of operators that assist in the deployment, configuration, and upkeep of postgresql and cassandra databases, respectively.

postgres-operator essentially takes a yaml definition, or manifest, that conforms to the expected custom resource definition (in this case, a definition of a multi-host postgres database cluster with realtime replication) and deploys it (the postgres cluster) to the kubernetes cluster. It then performs periodic health checks, keeps an eye on the manifest, and corrects as needed. So if you update the manifest and re-apply it to the cluster, postgres-operator will see the change and perform a rolling restart of the database cluster to update the configuration. The rolling update applies the changes to the replicas first, one at a time; once the replicas are back up and healthy, it fails the database primary over to a suitable replica and performs the updates on the old primary. The result is an update with only a single, minimal (few-second) window of downtime.
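To make that concrete, here's roughly what a minimal cluster manifest for Zalando's postgres-operator looks like (an illustrative sketch only; the names, sizes, and user/database entries below are made up, and the exact schema depends on which postgres operator and version you actually run):

Code:
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: example-pg-cluster      # hypothetical name (must be prefixed with the teamId)
spec:
  teamId: "example"
  numberOfInstances: 3          # one primary plus two replicas
  volume:
    size: 10Gi
  users:
    exampleowner: []            # role that will own the database below
  databases:
    exampledb: exampleowner     # database name: owning role
  postgresql:
    version: "15"
The operator watches for objects of this kind and turns each one into the StatefulSet, services, and credential secrets that make up the running cluster, then keeps reconciling them as described above.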

Some operators handle backups as well, such as postgres-operator backing up dbs to appropriate media (s3, gcs, ssh shares, etc...). Postgres-operator is going to be deployed to this cluster at some point, mainly for an authelia backing store (and probably a few apps that can use postgres as their backing db as well, but which applications is yet to be decided), so expect more discussion on operators in future blog posts. For the time being, just understand that an operator automates much of the deployment, configuration, and healthcheck/auto-healing of the resources it's intended to own.

You can, of course, apply a postgresql pod yaml definition to the cluster without the use of an operator's CRDs, but then you'll need to manually configure backup, healthchecks, and recovery, etc... for the database. And you'd have to do all that manual work for every single database pod in the cluster.

TL;DR: Operators aren't strictly necessary, but they're a really good idea.

Use Helm

Also, it should be noted that kubectl apply is not the preferred way to manage applications on a cluster. Using a package manager such as helm is ideal, as it gives you a way to track upgrades and perform rollbacks should something go wrong. As such, don't blindly follow directions on the calico website. Be smart, use helm instead!

Seriously, I cannot stress this enough: use helm at all points where you can. The tigera operator does have a helm chart, so use it! It's as simple as following the directions on its Artifact Hub page (Artifact Hub being a repository of helm charts that you can use to deploy applications to your kubernetes cluster) after you've installed helm on your management machine.
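For reference, the install boils down to something like the following (check the chart's Artifact Hub page for the current repo URL and chart version before running; the tigera-operator namespace is the one the chart documentation uses):

Code:
helm repo add projectcalico https://docs.tigera.io/calico/charts
helm repo update
helm install calico projectcalico/tigera-operator --namespace tigera-operator --create-namespace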

Install/Configure Calico

Once the tigera operator is installed (you used helm, right? Don't worry about customizations yet; I used all of the chart defaults, which is fine at this point), we can download the calico custom-resources.yaml file and edit it to match our pod-network-cidr. This is an important step (as noted in the docs). The default custom resource file at the time of this writing uses 192.168.0.0/16, which won't match the expected pod-network-cidr, so don't just blindly apply the file! (If you do, run the same command with kubectl delete -f instead of kubectl apply -f, which will remove the resources from your cluster so you can fix the file and try again.)
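After the edit, the ipPools section of the Installation resource in custom-resources.yaml should look roughly like this (only the cidr is changed from the shipped defaults, which may differ slightly between Calico versions):

Code:
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    ipPools:
    - blockSize: 26
      cidr: 10.244.0.0/16      # must match the kubeadm pod-network-cidr
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()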

Note that normally you could kubectl apply -f after editing the file to reconfigure the resources without having to delete, but calico specifically states in the custom-resources.yaml file that:

Code:
# Note: The ipPools section cannot be modified post-install.
If they say it can't be done, don't try it. Just kubectl delete, edit the file, and kubectl apply again. Unless stated otherwise, however, just edit the file and kubectl apply it to reconfigure in-line and save yourself the extra step of deletion.
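In practice, the cycle for the ipPools case is just:

Code:
kubectl delete -f custom-resources.yaml
# edit custom-resources.yaml (fix the cidr, BGP settings, etc...), then:
kubectl apply -f custom-resources.yaml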

We also need to edit the config to add our BGP configuration, so open the yaml file in your favorite editor and add the following to the end of it:

Code:
---
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Info
  nodeToNodeMeshEnabled: true
  nodeMeshMaxRestartTime: 120s
  asNumber: 64500
  serviceClusterIPs:
    - cidr: 10.96.0.0/12
  serviceLoadBalancerIPs:
    - cidr: 10.1.15.0/24
  listenPort: 179
  bindMode: NodeIP
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: my-bgp-peer
spec:
  peerIP: 10.1.0.1
  asNumber: 64501
At this point, I could step through packaging a new helm chart to deploy your specific calico custom resources yaml, and while that's not a bad idea, there will be plenty of work done on that in the future. So, against better judgement, let's just kubectl apply -f the edited calico custom resource yaml file to get the cluster in a working form. After a few moments, you should see your nodes go to the Ready state:

Code:
$ kubectl get nodes
NAME           STATUS   ROLES           AGE   VERSION
k8s-master-1   Ready    control-plane   26h   v1.28.2
k8s-master-2   Ready    control-plane   39h   v1.28.2
k8s-master-3   Ready    control-plane   39h   v1.28.2
k8s-worker-1   Ready    <none>          38h   v1.28.2
k8s-worker-2   Ready    <none>          26h   v1.28.2
k8s-worker-3   Ready    <none>          26h   v1.28.2
k8s-worker-4   Ready    <none>          22h   v1.28.2
In case you're wondering, the k8s-worker-4 VM listed above is the original single-controlplane cluster I'd built on the gaming laptop. I grabbed all of the configs I needed from it and repurposed it via the tftp/pxeboot/Debian preseed stuff as before, and popped it on top of the desktop tower stack. Maybe I'll provide pictures at some point. Maybe not.

You should see BGP sessions come up on your router (the one at 10.1.0.1 in the above configuration):

Code:
# bgpctl show
Neighbor                   AS    MsgRcvd    MsgSent  OutQ Up/Down  State/PrfRcvd
k8s-master-1            64500       2859       2859     0 00:00:19      9
k8s-master-2            64500       2852       2853     0 00:00:21      9
k8s-master-3            64500       2854       2853     0 00:00:20      9
k8s-worker-1            64500       2859       2858     0 00:00:21      9
k8s-worker-2            64500       2859       2857     0 00:00:20      9
k8s-worker-3            64500       2859       2858     0 00:00:20      9
k8s-worker-4            64500       2854       2852     0 00:00:21      9
There should be an advertisement for your LoadBalancer IP range, as well as some IP spaces within the pod-network-cidr (10.244.0.0/16) and the cluster IP range (10.96.0.0/12). Note that the MsgRcvd and MsgSent counts in the output above are really high because I had a MetalLB BGP configuration in place for testing, which I had to remove once Calico took over the BGP duties. The BGP router just sees a session from a given host drop and re-establish; it doesn't know which specific process on the host is handling the session...it just sees a session from a host, and that's the extent of its knowledge in the output above. Check the advertisements:

Code:
fw1# bgpctl show fib bgp
flags: B = BGP, C = Connected, S = Static
       N = BGP Nexthop reachable via this route
       r = reject route, b = blackhole route

flags prio destination                      gateway
B       48 10.1.15.0/24                     10.1.8.1
B       48 10.96.0.0/12                     10.1.8.1
B       48 10.244.69.192/26                 10.1.8.1
B       48 10.244.79.64/26                  10.1.8.1
B       48 10.244.140.0/26                  10.1.8.1
B       48 10.244.168.0/26                  10.1.8.1
B       48 10.244.182.64/26                 10.1.8.1
B       48 10.244.196.0/26                  10.1.8.1
B       48 10.244.230.0/26                  10.1.8.1
A quick check of the rib (routing information base) shows that while only k8s-master-1 (10.1.8.1) is listed as the gateway in the output above, all of the hosts are advertising these routes as well, so if k8s-master-1 goes down, another peer can be selected and used.
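If you want to see the alternate paths for yourself, bgpctl can dump the full rib (output omitted here since it's long and network-specific):

Code:
fw1# bgpctl show rib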

Install/Configure MetalLB

Next up we're going to use helm to install MetalLB. MetalLB is configured through its own custom resources rather than chart values, so create a file and name it "metallb-config.yaml", and put it somewhere safe (say, a configs/metallb folder that you can git push at some point to a private github repo, or a private gitea server...it's good practice to do this!). In this file, put the following:

Code:
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: first-pool
  namespace: metallb
spec:
  addresses:
  - 10.1.15.0/24
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default-l2advertisement  # this is only required if your k8s cluster resides on the same subnet with clients!
  namespace: metallb
spec:
  ipAddressPools:
  - first-pool
Adjust the addresses list to whatever suits your network. 10.1.15.0/24 works for my network, so I'm sticking with that here. This IPAddressPool is the list of IPs that MetalLB will use when handing out addresses to LoadBalancer services in your kubernetes cluster, and Calico will advertise it via BGP (so the rest of our network will know where to reach our kubernetes services and when any updates occur). Because IPAddressPool and L2Advertisement are custom resources rather than helm values, the file gets applied with kubectl after the chart is installed, as shown below. First, run the helm install:

Code:
helm install metallb metallb/metallb --namespace metallb --create-namespace
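Once the MetalLB pods are up (the chart also installs the CRDs that the manifest above depends on), apply the configuration file with kubectl:

Code:
kubectl apply -f metallb-config.yaml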
It's a good idea to keep track of the helm install commands you've used in a file somewhere so you can always reference it in the future. You might decide to move an installation to a different namespace, and thus need to helm delete or helm upgrade, so having a copy of the history is useful. You can always helm upgrade --reuse-values, of course, but I like to keep track of the install history for my own sanity. I'm OCD like that.
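Helm also keeps its own record of what's installed and what revisions each release has gone through, which is handy alongside that file:

Code:
helm list -A                      # all releases, all namespaces
helm history metallb -n metallb   # revision history for a single release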

Install/Configure ingress-nginx

This is simple via helm:

Code:
helm install ingress-nginx ingress-nginx/ingress-nginx --namespace ingress-nginx --create-namespace
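To verify, check the services in the new namespace:

Code:
kubectl get svc -n ingress-nginx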
You should see a service of type "LoadBalancer" in the ingress-nginx namespace come up shortly with an "External IP" taken from the MetalLB address pool defined above. If the service's External IP hangs on "Pending", double check the MetalLB IPAddressPool configuration and perhaps even the logs on the MetalLB controller pod. You can do that via:

Code:
kubectl logs $(kubectl get pods -n metallb | awk '/controller/ {print $1}') -n metallb | grep ingress-nginx
kubectl logs -f will follow the logs until you hit Ctrl-C, which is useful to keep the log tail going during live troubleshooting.
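As a preview of how this gets used later, routing traffic to an application is just a matter of creating an Ingress resource that points at the app's Service. A minimal sketch (the hostname, service name, and port below are placeholders for whatever app you eventually deploy):

Code:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-app              # hypothetical application
spec:
  ingressClassName: nginx        # the class ingress-nginx registers by default
  rules:
  - host: app.example.com        # placeholder hostname
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: example-app    # placeholder Service backing the app
            port:
              number: 80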

Once ingress-nginx is up and running, we can safely use the cluster to deploy applications...

Except...

Not quite. We could deploy completely stateless applications right now without issue, but at some point we're probably going to want to utilize data persistence for applications (postgresql would be entirely useless if we had to completely redeploy the schema and data every time a pod restarted, for example). For this, we'll need a CSI installed.

And that will come in the next blog post.

Cheers!