Balancing the Load
This part of the project was only necessary because I already had an existing Kubernetes cluster running production workloads. In other words, port 80 and port 443 were already forwarded on my router to the other cluster. I needed a way to selectively send app1.domain.tld to cluster A and app2.domain.tld to cluster B. I accomplished this with HAProxy and keepalived.
Traffic arriving on port 80 and port 443 of my external IP is now forwarded to the load balancer and then to the correct Kubernetes cluster based on the domain of the request.
Enter HAProxy. HAProxy is a high-performance reverse proxy and load balancer. It's used on high-traffic sites and by most cloud providers because of its capabilities. It can proxy both HTTP(S) and TCP-based applications and make decisions about how to route traffic to the backend servers.
Keepalived provides high availability for the HAProxy VMs. It uses the Virtual Router Redundancy Protocol (VRRP) to create a floating IP and ensures that one of the machines running keepalived takes over the IP if the other stops responding.
I used Terraform and Ansible to create the two VMs following the same pattern I used for the k3s VMs. The Terraform project creates the VMs and the Ansible inventory file, while the Ansible loadbalancer role does the heavy lifting of installing and configuring haproxy and keepalived.
Terraform
The Terraform main.tf looks like this:
variable "num_loadbalancers" {
default = 2
}
variable "pve_node" {
description = "The PVE node to target"
type = string
sensitive = false
default = "proxmox1"
}
variable "loadbalancer_ip_addresses" {
description = "List of IP addresses for loadbalancer(s)"
type = list(string)
default = ["xxx.xxx.xxx.250/24", "xxx.xxx.xxx.251/24"]
}
variable "loadbalancer_gateway" {
type = string
default = "xxx.xxx.xxx.1"
}
variable "tamplate_vm_name" {
default = "debian-bullseye-cloudinit-template"
}
variable "loadbalancer_mem" {
default = "4096"
}
variable "loadbalancer_cores" {
default = "2"
}
variable "loadbalancer_user" {
default = "ansible"
}
variable "loadbalancer_disk" {
default = "8G"
}
variable "loadbalancer_vlan" {
default = 33
}
variable "loadbalancer_nameserver_domain" {
type = string
default = "domain.tld"
}
variable "loadbalancer_nameserver" {
type = string
default = "xxx.xxx.xxx.1"
}
variable "loadbalancer_hagroup" {
type = string
default = "main"
}
terraform {
  required_providers {
    proxmox = {
      source = "telmate/proxmox"
      version = ">=2.8.0"
    }
  }
}
provider "proxmox" {
  pm_api_url = var.pve_api_url
  pm_api_token_id = var.pve_token_id
  pm_api_token_secret = var.pve_token_secret
  pm_log_enable = false
  pm_log_file = "terraform-plugin-proxmox.log"
  pm_parallel = 1
  pm_timeout = 600
  pm_log_levels = {
    _default = "debug"
    _capturelog = ""
  }
}
resource "proxmox_vm_qemu" "proxmox_vm_loadbalancer" {
count = var.num_loadbalancers
desc = "HAProxy Loadbalancer ${count.index}"
name = "lb-${count.index}"
ipconfig0 = "gw=${var.loadbalancer_gateway},ip=${var.loadbalancer_ip_addresses[count.index]}"
target_node = var.pve_node
hastate = "started"
# Same CPU as the Physical host, possible to add cpu flags
# Ex: "host,flags=+md-clear;+pcid;+spec-ctrl;+ssbd;+pdpe1gb"
cpu = "host"
numa = false
clone = var.tamplate_vm_name
onboot = true
os_type = "cloud-init"
agent = 1
hagroup = var.loadbalancer_hagroup
ciuser = var.loadbalancer_user
memory = var.loadbalancer_mem
cores = var.loadbalancer_cores
nameserver = var.loadbalancer_nameserver
searchdomain = var.loadbalancer_nameserver_domain
network {
model = "virtio"
bridge = "vmbr0"
tag = var.loadbalancer_vlan
}
serial {
id = 0
type = "socket"
}
vga {
type = "serial0"
}
disk {
size = var.loadbalancer_disk
storage = "vmdata"
type = "scsi"
format = "qcow2"
}
lifecycle {
ignore_changes = [
network,
]
}
}
data "template_file" "loadbalancer" {
template = file("./templates/loadbalancer.tpl")
vars = {
loadbalancer_ip = "${join("\n", [for instance in proxmox_vm_qemu.proxmox_vm_loadbalancer : join("", [instance.name, " ansible_host=", instance.default_ipv4_address])])}"
}
}
resource "local_file" "loadbalancer_file" {
content = data.template_file.loadbalancer.rendered
filename = "../../inventory/loadbalancer"
}
output "Loadbalancers-IPS" {
value = ["${proxmox_vm_qemu.proxmox_vm_loadbalancer.*.default_ipv4_address}"]
}
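As an aside, newer Terraform releases can render the same inventory with the built-in templatefile() function instead of the template provider, which is now deprecated. A rough equivalent using the same template would be:
resource "local_file" "loadbalancer_file" {
  # Render the inventory template directly, no template_file data source needed
  content = templatefile("./templates/loadbalancer.tpl", {
    loadbalancer_ip = join("\n", [
      for instance in proxmox_vm_qemu.proxmox_vm_loadbalancer :
      "${instance.name} ansible_host=${instance.default_ipv4_address}"
    ])
  })
  filename = "../../inventory/loadbalancer"
}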
templates/loadbalancer.tpl:
[loadbalancer]
${loadbalancer_ip}
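With two load balancer VMs, the rendered inventory file ends up looking something like this (addresses redacted the same way as above):
[loadbalancer]
lb-0 ansible_host=xxx.xxx.xxx.250
lb-1 ansible_host=xxx.xxx.xxx.251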
Loadbalancer Role
The Ansible loadbalancer role is pretty straightforward. It installs haproxy and keepalived using apt, copies over the configuration files, and restarts the services when their configuration changes.
roles/loadbalancer/tasks/main.yaml:
- name: Configure Floating IP
  ansible.builtin.template:
    src: templates/lb-floating-ip.j2
    dest: /etc/network/interfaces.d/60-lb-floating-ip
    owner: root
    group: root
    mode: "0644"
  notify:
    - network_restart
- name: Install keepalived
  ansible.builtin.apt:
    name: keepalived
    state: latest
- name: Configure Keepalived
  ansible.builtin.template:
    src: templates/keepalived-conf.j2
    dest: /etc/keepalived/keepalived.conf
    owner: root
    group: root
    mode: "0644"
  notify:
    - keepalived_restart
- name: Install haproxy
  ansible.builtin.apt:
    name: haproxy
    state: latest
- name: Configure haproxy
  ansible.builtin.template:
    src: templates/haproxy-cfg.j2
    dest: /etc/haproxy/haproxy.cfg
    owner: root
    group: root
    mode: "0644"
  notify:
    - haproxy_restart
The floating IP is defined in an inventory variable.
inventory/group_vars/loadbalancer:
lb_floating_ip: "xxx.xxx.xxx.254"
Network Interface
Create a new interface alias with the floating IP by adding a file to /etc/network/interfaces.d.
roles/loadbalancer/templates/lb-floating-ip.j2:
auto eth0:1
iface eth0:1 inet static
    address {{ lb_floating_ip }}/32
Keepalived Configuration
Create the keepalived configuration file in /etc/keepalived/keepalived.conf. This tells keepalived to check that the haproxy process is running and configures the heartbeat and priority of each host in the VRRP cluster.
roles/loadbalancer/templates/keepalived-conf.j2:
global_defs {
  script_user root
  enable_script_security
}
vrrp_script chk_haproxy {
  script "/usr/bin/pgrep haproxy"
  interval 2
}
vrrp_instance VI_1 {
  state BACKUP
  interface eth0
  virtual_router_id 33
  priority {{ lb_priority }}
  advert_int 1
  authentication {
    auth_type PASS
    auth_pass 9999
  }
  track_script {
    chk_haproxy
  }
  virtual_ipaddress {
    {{ lb_floating_ip }}
  }
}
The lb_priority variable is defined in the Ansible host variables:
inventory/host_vars/lb-0:
lb_priority: "100"
inventory/host_vars/lb-1:
lb_priority: "101"
HAProxy Configuration
The haproxy configuration goes in /etc/haproxy/haproxy.cfg and I’ll explain each section.
roles/loadbalancer/templates/haproxy-cfg.j2:
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
    ssl-server-verify none
    maxconn 10000
    # Default SSL material locations
    ca-base /etc/ssl/certs
    crt-base /etc/ssl/private
    # intermediate configuration See https://ssl-config.mozilla.org/#server=haproxy&server-version=2.0.3&config=intermediate
    ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
    ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
    ssl-default-bind-options prefer-client-ciphers no-sslv3 no-tlsv10 no-tlsv11 no-tls-tickets
    ssl-default-server-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
    ssl-default-server-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
    ssl-default-server-options no-sslv3 no-tlsv10 no-tlsv11 no-tls-tickets
defaults
    log global
    mode http
    option dontlognull
    timeout connect 5000
    timeout client 3600s
    timeout server 3600s
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http
frontend http
    mode http
    bind :80
    option httplog
    option dontlognull
    option forwardfor
    timeout client 3600s
    acl whoami hdr(host) -i whoami.domain.tld
    acl staging hdr(host) -i test.domain.tld
    acl blog hdr(host) -i blog.domain.tld
    use_backend k8s-http-ingress if whoami
    use_backend k3s-http-ingress if staging
    use_backend k3s-http-ingress if blog
    default_backend k8s-http-ingress
frontend https
    mode tcp
    bind :443
    option tcplog
    acl clienthello req_ssl_hello_type 1
    acl whoami req_ssl_sni -i whoami.domain.tld
    acl clienthello req_ssl_hello_type 1
    acl staging req_ssl_sni -i test.domain.tld
    acl clienthello req_ssl_hello_type 1
    acl blog req_ssl_sni -i blog.domain.tld
    tcp-request inspect-delay 10s
    tcp-request content accept if clienthello
    tcp-request content reject if WAIT_END
    use_backend k3s-https-ingress if whoami
    use_backend k3s-https-ingress if staging
    use_backend k3s-https-ingress if blog
    default_backend k8s-https-ingress
backend k8s-http-ingress
    balance roundrobin
    mode http
    timeout connect 1s
    timeout queue 5s
    timeout server 3600s
    server ingress1 xxx.xxx.xxx.5:80 maxconn 1000 weight 10 send-proxy
backend k8s-https-ingress
    mode tcp
    option redispatch
    balance roundrobin
    timeout connect 1s
    timeout queue 5s
    timeout server 3600s
    # maximum SSL session ID length is 32 bytes.
    stick-table type binary len 32 size 30k expire 30m
    acl clienthello req_ssl_hello_type 1
    # use tcp content accepts to detects ssl client and server hello.
    tcp-request inspect-delay 5s
    tcp-request content accept if clienthello
    # no timeout on response inspect delay by default.
    stick on payload_lv(43,1) if clienthello
    option ssl-hello-chk
    server ingress1 xxx.xxx.xxx.5:443 maxconn 1000 weight 10 send-proxy
backend k3s-http-ingress
    balance roundrobin
    option redispatch
    mode http
    timeout connect 1s
    timeout queue 5s
    timeout server 3600s
    server trfk-ingress xxx.xxx.xxx.240:80 maxconn 1000 weight 10 send-proxy-v2
backend k3s-https-ingress
    mode tcp
    option redispatch
    balance roundrobin
    timeout connect 1s
    timeout queue 5s
    timeout server 3600s
    # maximum SSL session ID length is 32 bytes.
    stick-table type binary len 32 size 30k expire 30m
    acl clienthello req_ssl_hello_type 1
    # use tcp content accepts to detects ssl client and server hello.
    tcp-request inspect-delay 5s
    tcp-request content accept if clienthello
    # no timeout on response inspect delay by default.
    stick on payload_lv(43,1) if clienthello
    option ssl-hello-chk
    server trfk-ingress xxx.xxx.xxx.240:443 maxconn 1000 weight 10 send-proxy
Global and defaults sections
The global section is pretty self-explanatory. It sets up the basic haproxy daemon with its own chroot jail, logging, and defaults for SSL ciphers.
Frontends
The frontend sections define the ports haproxy listens on and the rules that decide which backend receives the traffic. There are two frontends: one for HTTP and one for HTTPS. The HTTP frontend is simpler because haproxy can directly inspect the Host header coming from the browser and route the request. If a request doesn't match one of the defined rules, it goes to the default backend. This let me route all requests to the old Rancher-based k8s cluster unless a rule specifically sent them to the k3s cluster.
The HTTPS frontend is more complicated. Since HTTPS requests are encrypted, only the initial TLS handshake is in the clear. If the request is a TLS client hello, haproxy checks the Server Name Indication (SNI) against the subdomains and routes the connection to the correct HTTPS backend.
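Before pointing DNS at the floating IP, both rule sets can be tested directly against it. A rough sketch, using the hostnames from the ACLs above and the floating IP from the inventory variable:
# HTTP: haproxy routes on the Host header
curl -v --resolve whoami.domain.tld:80:xxx.xxx.xxx.254 http://whoami.domain.tld/
# HTTPS: haproxy routes on the SNI in the TLS client hello
openssl s_client -connect xxx.xxx.xxx.254:443 -servername blog.domain.tld </dev/null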
Backends
The backend sections define how the traffic is routed to the cluster itself. In this case, there is only one server directive per backend because the traffic is sent to the IP of the cluster ingress, which is itself a load-balanced IP created by MetalLB. The HTTPS backends also have directives that define "stickiness" so that once a balancing decision is made, the traffic keeps going to the same server. This isn't necessary with just one server, but it's there in case something changes in the future.
The send-proxy options ensure that the PROXY protocol header is added to the connection that is forwarded to the backend server. The header carries the IP address of the original client, since the traffic would otherwise appear to come from the IP of the load balancer, and knowing the original requester's IP address matters for logging and security. Both the Traefik and nginx ingresses have a configuration parameter that lets them trust proxy headers only from a specific IP or range of IP addresses.
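As an example of the receiving side, a Traefik static configuration that trusts the PROXY protocol only from the two load balancer VMs might look roughly like this (the entry point names and exact install method are assumptions; the IPs are the load balancer VMs):
entryPoints:
  web:
    address: ":80"
    proxyProtocol:
      trustedIPs:
        - "xxx.xxx.xxx.250/32"
        - "xxx.xxx.xxx.251/32"
  websecure:
    address: ":443"
    proxyProtocol:
      trustedIPs:
        - "xxx.xxx.xxx.250/32"
        - "xxx.xxx.xxx.251/32"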
Handlers
The last part of the Ansible role is the handlers. When a task notifies them via the notify directive that a configuration file has changed, they restart the corresponding service.
roles/loadbalancer/handlers/main.yml:
- name: network_restart
  service:
    name: "networking"
    state: restarted
- name: keepalived_restart
  service:
    name: "keepalived"
    state: restarted
- name: haproxy_restart
  service:
    name: "haproxy"
    state: restarted
Playbooks
The playbooks used to execute the Terraform plan and destroy are very similar to the ones for the k3s VMs.
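As a rough sketch, an apply playbook along those lines, using the community.general.terraform module (the project path here is a placeholder), looks like:
- hosts: localhost
  connection: local
  gather_facts: no
  tasks:
    # Runs terraform init/apply against the loadbalancer project
    - name: Apply the loadbalancer Terraform project
      community.general.terraform:
        project_path: "../terraform/loadbalancer"
        state: present
        force_init: yes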
The playbook that executes the loadbalancer role looks like this:
---
- hosts: loadbalancer
  gather_facts: yes
  become: yes
  pre_tasks:
  roles:
    - role: debian-cloud
    - role: loadbalancer
The debian-cloud role does basic configuration and customization of the Debian cloud image used as the template for the VMs. It includes things like setting the locale, configuring sshd, and updating the qemu-guest-tools package.
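With the inventory file generated by Terraform, applying both roles is a single command (the playbook filename here is a placeholder):
ansible-playbook -i inventory/loadbalancer loadbalancer.yaml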
Summary
The loadbalancer Terraform project and Ansible role create the VMs and then install and configure keepalived and haproxy. When the haproxy configuration needs to change, updating the template and re-running the Ansible role keeps everything in sync between the VMs. In addition, the role lives in a git repository for version control.
Next Steps
In my next post, I will share the Ansible playbook and roles that install and initialize the k3s cluster, including ensuring that the Kubernetes API is always available using keepalived.