Balancing the Load
This part of the project was only necessary because I already had an existing Kubernetes cluster running production workloads. In other words, port 80 and port 443 were already forwarded on my router to the other cluster. I needed a way to selectively send app1.domain.tld to cluster A and app2.domain.tld to cluster B. I accomplished this with HAProxy and keepalived.
Traffic arriving on port 80 and port 443 of my external IP is now forwarded to the load balancer and then to the correct Kubernetes cluster based on the domain of the request.
Enter HAProxy. HAProxy is a high-performance reverse proxy and load balancer. It's used on high-traffic sites and by most cloud providers because of its capabilities. It can proxy both HTTP(S) and TCP-based applications and make decisions about how to route traffic to the backend servers.
Keepalived provides high availability for the HAProxy VMs. It uses the Virtual Router Redundancy Protocol (VRRP) to create a floating IP and ensures that one of the machines running keepalived takes over the IP if the other stops responding.
I used Terraform and Ansible to create the two VMs following the same pattern I used for the k3s VMs. The Terraform project creates the VMs and the Ansible inventory file, while the Ansible loadbalancer role does the heavy lifting of installing and configuring haproxy and keepalived.
Terraform
The Terraform main.tf looks like this:
variable "num_loadbalancers" {
default = 2
}
variable "pve_node" {
description = "The PVE node to target"
type = string
sensitive = false
default = "proxmox1"
}
variable "loadbalancer_ip_addresses" {
description = "List of IP addresses for loadbalancer(s)"
type = list(string)
default = ["xxx.xxx.xxx.250/24", "xxx.xxx.xxx.251/24"]
}
variable "loadbalancer_gateway" {
type = string
default = "xxx.xxx.xxx.1"
}
variable "tamplate_vm_name" {
default = "debian-bullseye-cloudinit-template"
}
variable "loadbalancer_mem" {
default = "4096"
}
variable "loadbalancer_cores" {
default = "2"
}
variable "loadbalancer_user" {
default = "ansible"
}
variable "loadbalancer_disk" {
default = "8G"
}
variable "loadbalancer_vlan" {
default = 33
}
variable "loadbalancer_nameserver_domain" {
type = string
default = "domain.tld"
}
variable "loadbalancer_nameserver" {
type = string
default = "xxx.xxx.xxx.1"
}
variable "loadbalancer_hagroup" {
type = string
default = "main"
}
terraform {
  required_providers {
    proxmox = {
      source = "telmate/proxmox"
      version = ">=2.8.0"
    }
  }
}
provider "proxmox" {
  pm_api_url = var.pve_api_url
  pm_api_token_id = var.pve_token_id
  pm_api_token_secret = var.pve_token_secret
  pm_log_enable = false
  pm_log_file = "terraform-plugin-proxmox.log"
  pm_parallel = 1
  pm_timeout = 600
  pm_log_levels = {
    _default = "debug"
    _capturelog = ""
  }
}
resource "proxmox_vm_qemu" "proxmox_vm_loadbalancer" {
count = var.num_loadbalancers
desc = "HAProxy Loadbalancer ${count.index}"
name = "lb-${count.index}"
ipconfig0 = "gw=${var.loadbalancer_gateway},ip=${var.loadbalancer_ip_addresses[count.index]}"
target_node = var.pve_node
hastate = "started"
# Same CPU as the Physical host, possible to add cpu flags
# Ex: "host,flags=+md-clear;+pcid;+spec-ctrl;+ssbd;+pdpe1gb"
cpu = "host"
numa = false
clone = var.tamplate_vm_name
onboot = true
os_type = "cloud-init"
agent = 1
hagroup = var.loadbalancer_hagroup
ciuser = var.loadbalancer_user
memory = var.loadbalancer_mem
cores = var.loadbalancer_cores
nameserver = var.loadbalancer_nameserver
searchdomain = var.loadbalancer_nameserver_domain
network {
model = "virtio"
bridge = "vmbr0"
tag = var.loadbalancer_vlan
}
serial {
id = 0
type = "socket"
}
vga {
type = "serial0"
}
disk {
size = var.loadbalancer_disk
storage = "vmdata"
type = "scsi"
format = "qcow2"
}
lifecycle {
ignore_changes = [
network,
]
}
}
data "template_file" "loadbalancer" {
template = file("./templates/loadbalancer.tpl")
vars = {
loadbalancer_ip = "${join("\n", [for instance in proxmox_vm_qemu.proxmox_vm_loadbalancer : join("", [instance.name, " ansible_host=", instance.default_ipv4_address])])}"
}
}
resource "local_file" "loadbalancer_file" {
content = data.template_file.loadbalancer.rendered
filename = "../../inventory/loadbalancer"
}
output "Loadbalancers-IPS" {
value = ["${proxmox_vm_qemu.proxmox_vm_loadbalancer.*.default_ipv4_address}"]
}
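As an aside, newer Terraform releases can render the same inventory with the built-in templatefile() function instead of the template provider, which is now deprecated. A rough equivalent using the same template would be:
resource "local_file" "loadbalancer_file" {
  # Render the inventory template directly, no template_file data source needed
  content = templatefile("./templates/loadbalancer.tpl", {
    loadbalancer_ip = join("\n", [
      for instance in proxmox_vm_qemu.proxmox_vm_loadbalancer :
      "${instance.name} ansible_host=${instance.default_ipv4_address}"
    ])
  })
  filename = "../../inventory/loadbalancer"
}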
templates/loadbalancer.tpl:
[loadbalancer]
${loadbalancer_ip}
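With two load balancer VMs, the rendered inventory file ends up looking something like this (addresses redacted the same way as above):
[loadbalancer]
lb-0 ansible_host=xxx.xxx.xxx.250
lb-1 ansible_host=xxx.xxx.xxx.251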
Loadbalancer Role
The Ansible loadbalancer role is pretty straightforward. It installs haproxy and keepalived using apt, copies over the configuration files, and restarts the services when their configuration changes.
roles/loadbalancer/tasks/main.yaml:
- name: Configure Floating IP
  ansible.builtin.template:
    src: templates/lb-floating-ip.j2
    dest: /etc/network/interfaces.d/60-lb-floating-ip
    owner: root
    group: root
    mode: "0644"
  notify:
    - network_restart
- name: Install keepalived
  ansible.builtin.apt:
    name: keepalived
    state: latest
- name: Configure Keepalived
  ansible.builtin.template:
    src: templates/keepalived-conf.j2
    dest: /etc/keepalived/keepalived.conf
    owner: root
    group: root
    mode: "0644"
  notify:
    - keepalived_restart
- name: Install haproxy
  ansible.builtin.apt:
    name: haproxy
    state: latest
- name: Configure haproxy
  ansible.builtin.template:
    src: templates/haproxy-cfg.j2
    dest: /etc/haproxy/haproxy.cfg
    owner: root
    group: root
    mode: "0644"
  notify:
    - haproxy_restart
The floating IP is defined in an inventory variable.
inventory/group_vars/loadbalancer:
lb_floating_ip: "xxx.xxx.xxx.254"
Network Interface
Create a new interface alias with the floating IP by adding a file to /etc/network/interfaces.d.
roles/loadbalancer/templates/lb-floating-ip.j2:
auto eth0:1
iface eth0:1 inet static
    address {{ lb_floating_ip }}/32
Keepalived Configuration
Create the keepalived configuration file in /etc/keepalived/keepalived.conf. This tells keepalived to check that the haproxy process is running and configures the heartbeat and priority of each host in the VRRP cluster.
roles/loadbalancer/templates/keepalived-conf.j2:
global_defs {
  script_user root
  enable_script_security
}
vrrp_script chk_haproxy {
  script "/usr/bin/pgrep haproxy"
  interval 2
}
vrrp_instance VI_1 {
  state BACKUP
  interface eth0
  virtual_router_id 33
  priority {{ lb_priority }}
  advert_int 1
  authentication {
    auth_type PASS
    auth_pass 9999
  }
  track_script {
    chk_haproxy
  }
  virtual_ipaddress {
    {{ lb_floating_ip }}
  }
}
The lb_priority variable is defined in the Ansible host variables:
inventory/host_vars/lb-0:
lb_priority: "100"
inventory/host_vars/lb-1:
lb_priority: "101"
HAProxy Configuration
The haproxy configuration goes in /etc/haproxy/haproxy.cfg and I’ll explain each section.
roles/loadbalancer/templates/haproxy-cfg.j2:
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
    ssl-server-verify none
    maxconn 10000
    # Default SSL material locations
    ca-base /etc/ssl/certs
    crt-base /etc/ssl/private
    # intermediate configuration See https://ssl-config.mozilla.org/#server=haproxy&server-version=2.0.3&config=intermediate
    ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
    ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
    ssl-default-bind-options prefer-client-ciphers no-sslv3 no-tlsv10 no-tlsv11 no-tls-tickets
    ssl-default-server-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
    ssl-default-server-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
    ssl-default-server-options no-sslv3 no-tlsv10 no-tlsv11 no-tls-tickets
defaults
    log global
    mode http
    option dontlognull
    timeout connect 5000
    timeout client 3600s
    timeout server 3600s
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http
frontend http
    mode http
    bind :80
    option httplog
    option dontlognull
    option forwardfor
    timeout client 3600s
    acl whoami hdr(host) -i whoami.domain.tld
    acl staging hdr(host) -i test.domain.tld
    acl blog hdr(host) -i blog.domain.tld
    use_backend k8s-http-ingress if whoami
    use_backend k3s-http-ingress if staging
    use_backend k3s-http-ingress if blog
    default_backend k8s-http-ingress
frontend https
    mode tcp
    bind :443
    option tcplog
    acl clienthello req_ssl_hello_type 1
    acl whoami req_ssl_sni -i whoami.domain.tld
    acl clienthello req_ssl_hello_type 1
    acl staging req_ssl_sni -i test.domain.tld
    acl clienthello req_ssl_hello_type 1
    acl blog req_ssl_sni -i blog.domain.tld
    tcp-request inspect-delay 10s
    tcp-request content accept if clienthello
    tcp-request content reject if WAIT_END
    use_backend k3s-https-ingress if whoami
    use_backend k3s-https-ingress if staging
    use_backend k3s-https-ingress if blog
    default_backend k8s-https-ingress
backend k8s-http-ingress
    balance roundrobin
    mode http
    timeout connect 1s
    timeout queue 5s
    timeout server 3600s
    server ingress1 xxx.xxx.xxx.5:80 maxconn 1000 weight 10 send-proxy
backend k8s-https-ingress
    mode tcp
    option redispatch
    balance roundrobin
    timeout connect 1s
    timeout queue 5s
    timeout server 3600s
    # maximum SSL session ID length is 32 bytes.
    stick-table type binary len 32 size 30k expire 30m
    acl clienthello req_ssl_hello_type 1
    # use tcp content accepts to detects ssl client and server hello.
    tcp-request inspect-delay 5s
    tcp-request content accept if clienthello
    # no timeout on response inspect delay by default.
    stick on payload_lv(43,1) if clienthello
    option ssl-hello-chk
    server ingress1 xxx.xxx.xxx.5:443 maxconn 1000 weight 10 send-proxy
backend k3s-http-ingress
    balance roundrobin
    option redispatch
    mode http
    timeout connect 1s
    timeout queue 5s
    timeout server 3600s
    server trfk-ingress xxx.xxx.xxx.240:80 maxconn 1000 weight 10 send-proxy-v2
backend k3s-https-ingress
    mode tcp
    option redispatch
    balance roundrobin
    timeout connect 1s
    timeout queue 5s
    timeout server 3600s
    # maximum SSL session ID length is 32 bytes.
    stick-table type binary len 32 size 30k expire 30m
    acl clienthello req_ssl_hello_type 1
    # use tcp content accepts to detects ssl client and server hello.
    tcp-request inspect-delay 5s
    tcp-request content accept if clienthello
    # no timeout on response inspect delay by default.
    stick on payload_lv(43,1) if clienthello
    option ssl-hello-chk
    server trfk-ingress xxx.xxx.xxx.240:443 maxconn 1000 weight 10 send-proxy
Global and defaults sections
The global section is pretty self-explanatory. It sets up the basic haproxy daemon with its own chroot jail, logging, and defaults for SSL ciphers.
Frontends
The frontend sections define the ports haproxy listens on and the rules that decide which backend receives the traffic. There are two frontends: one for HTTP and one for HTTPS. The HTTP frontend is simpler because haproxy can directly inspect the Host header coming from the browser and route the request. If a request doesn't match one of the defined rules, it goes to the default backend. This let me route all requests to the old Rancher-based k8s cluster unless a rule specifically sent them to the k3s cluster.
The HTTPS frontend is more complicated. Since HTTPS requests are encrypted, only the initial TLS handshake is in the clear. If the request is a TLS client hello, haproxy checks the Server Name Indication (SNI) against the subdomains and routes the connection to the correct HTTPS backend.
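Before pointing DNS at the floating IP, both rule sets can be tested directly against it. A rough sketch, using the hostnames from the ACLs above and the floating IP from the inventory variable:
# HTTP: haproxy routes on the Host header
curl -v --resolve whoami.domain.tld:80:xxx.xxx.xxx.254 http://whoami.domain.tld/
# HTTPS: haproxy routes on the SNI in the TLS client hello
openssl s_client -connect xxx.xxx.xxx.254:443 -servername blog.domain.tld </dev/null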
Backends
The backend sections define how the traffic is routed to the cluster itself. In this case, there is only one server directive per backend because the traffic is sent to the IP of the cluster ingress, which is itself a load-balanced IP created by MetalLB. The HTTPS backends also have directives that define "stickiness" so that once a balancing decision is made, the traffic keeps going to the same server. This isn't necessary with just one server, but it's there in case something changes in the future.
The send-proxy options ensure that the PROXY protocol header is added to the connection that is forwarded to the backend server. The header carries the IP address of the original client, since the traffic would otherwise appear to come from the IP of the load balancer, and knowing the original requester's IP address matters for logging and security. Both the Traefik and nginx ingresses have a configuration parameter that lets them trust proxy headers only from a specific IP or range of IP addresses.
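As an example of the receiving side, a Traefik static configuration that trusts the PROXY protocol only from the two load balancer VMs might look roughly like this (the entry point names and exact install method are assumptions; the IPs are the load balancer VMs):
entryPoints:
  web:
    address: ":80"
    proxyProtocol:
      trustedIPs:
        - "xxx.xxx.xxx.250/32"
        - "xxx.xxx.xxx.251/32"
  websecure:
    address: ":443"
    proxyProtocol:
      trustedIPs:
        - "xxx.xxx.xxx.250/32"
        - "xxx.xxx.xxx.251/32"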
Handlers
The last part of the Ansible role is the handlers. When a task notifies them via the notify directive that a configuration file has changed, they restart the corresponding service.
roles/loadbalancer/handlers/main.yml:
- name: network_restart
  service:
    name: "networking"
    state: restarted
- name: keepalived_restart
  service:
    name: "keepalived"
    state: restarted
- name: haproxy_restart
  service:
    name: "haproxy"
    state: restarted
Playbooks
The playbooks used to execute the Terraform plan and destroy are very similar to the ones for the k3s VMs.
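As a rough sketch, an apply playbook along those lines, using the community.general.terraform module (the project path here is a placeholder), looks like:
- hosts: localhost
  connection: local
  gather_facts: no
  tasks:
    # Runs terraform init/apply against the loadbalancer project
    - name: Apply the loadbalancer Terraform project
      community.general.terraform:
        project_path: "../terraform/loadbalancer"
        state: present
        force_init: yes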
The playbook that executes the loadbalancer role looks like this:
---
- hosts: loadbalancer
  gather_facts: yes
  become: yes
  pre_tasks:
  roles:
    - role: debian-cloud
    - role: loadbalancer
The debian-cloud role does basic configuration and customization of the Debian cloud image used as the template for the VMs. It includes things like setting the locale, configuring sshd, and updating the qemu-guest-tools package.
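With the inventory file generated by Terraform, applying both roles is a single command (the playbook filename here is a placeholder):
ansible-playbook -i inventory/loadbalancer loadbalancer.yaml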
Summary
The loadbalancer Terraform project and Ansible role create the VMs and then install and configure keepalived and haproxy. When the haproxy configuration needs to change, updating the template and re-running the Ansible role keeps everything in sync between the VMs. In addition, the role lives in a git repository for version control.
Next Steps
In my next post, I will share the Ansible playbook and roles that install and initialize the k3s cluster, including ensuring that the Kubernetes API is always available using keepalived.