Towards High Availability

To make my new two-node Proxmox cluster highly available, I need shared storage for the VMs and quorum in the cluster.

Shared storage is available now as an NFS mount from the QNAP, but my goal is to retire the QNAP and move two TB disks into the first Proxmox node.

There are a number of ways to do this, but I chose to use GlusterFS volumes backed by ZFS. ZFS provides the datasets on which the Gluster bricks reside, and the Gluster volumes are replicated across both nodes. PVE can use Gluster volumes directly as storage.

The Proxmox Ansible playbook I created for this does these tasks:

  • Install the GlusterFS packages
  • Build a node list from the hosts in the playbook run
  • Start and enable the Gluster service
  • Configure the Gluster peers
  • Create the parent ZFS dataset
  • Create a ZFS dataset for each brick
  • Create the replicated volumes on those bricks
- name: Proxmox
  hosts:
    - node1
    - node2
  tasks:
    - name: Install GlusterFS tools
      apt:
        name: glusterfs-server
        state: latest
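    # Collect the address of every host in this play for the peer and volume tasks
    - name: Determine Gluster nodes
      set_fact:
        nodelist: "{{ ( nodelist | default([]) ) + [ hostvars[item].ansible_host ] }}"
      loop: "{{ ansible_play_hosts }}"
    - name: Show the node list
      debug:
        var: nodelist | join(',')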
    - name: Enable Glusterd service
      ansible.builtin.systemd:
        name: glusterd
        state: started
        enabled: yes
    - name: Create a Gluster storage pool
      gluster.gluster.gluster_peer:
        state: present
        nodes: "{{ nodelist }}"
      run_once: true
    - name: Create bricks parent zfs dataset
      community.general.zfs:
        name: "{{ gluster_zpool }}/bricks"
        extra_zfs_properties:
          mountpoint: /mnt/bricks
        state: present
    - name: Create brick - b1
      community.general.zfs:
        name: "{{ gluster_zpool }}/bricks/b1"
        state: present
    - name: Create gluster test volume
      gluster.gluster.gluster_volume:
        state: present
        name: test
        bricks: /mnt/bricks/b1/test
        replicas: 2
        cluster: "{{ nodelist }}"
      run_once: true
    - name: Create ISO volume
      gluster.gluster.gluster_volume:
        state: present
        name: iso
        bricks: /mnt/bricks/b1/iso12
        replicas: 2
        cluster: "{{ nodelist }}"
      run_once: true
    - name: Create brick - b2
      community.general.zfs:
        name: "{{ gluster_zpool }}/bricks/b2"
        state: present
    - name: Create replicated VM data volume
      gluster.gluster.gluster_volume:
        state: present
        name: vmdata-replicated
        bricks: /mnt/bricks/b2/vmdata-replicated
        replicas: 2
        cluster: "{{ nodelist }}"
      run_once: true
proxmox.yml
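
The brick and volume tasks in the playbook repeat the same pattern, so they could be collapsed into loops. The following is only a sketch of that idea, not the playbook I actually ran: the `brick_datasets` and `gluster_volumes` variables are names I made up for illustration, and it assumes the packages, the peers, and the parent `bricks` dataset are already in place and that `gluster_zpool` and `ansible_host` are defined in the inventory.

```yaml
- name: Proxmox Gluster volumes (loop sketch)
  hosts:
    - node1
    - node2
  vars:
    # Hypothetical variables; the values mirror the playbook above
    brick_datasets:
      - b1
      - b2
    gluster_volumes:
      - name: test
        brick: /mnt/bricks/b1/test
      - name: iso
        brick: /mnt/bricks/b1/iso12
      - name: vmdata-replicated
        brick: /mnt/bricks/b2/vmdata-replicated
  tasks:
    # Same result as the looped set_fact, written as a single expression
    - name: Determine Gluster nodes
      ansible.builtin.set_fact:
        nodelist: "{{ ansible_play_hosts | map('extract', hostvars, 'ansible_host') | list }}"

    - name: Create brick datasets
      community.general.zfs:
        name: "{{ gluster_zpool }}/bricks/{{ item }}"
        state: present
      loop: "{{ brick_datasets }}"

    - name: Create replicated Gluster volumes
      gluster.gluster.gluster_volume:
        state: present
        name: "{{ item.name }}"
        bricks: "{{ item.brick }}"
        replicas: 2
        cluster: "{{ nodelist }}"
      loop: "{{ gluster_volumes }}"
      run_once: true
```

Adding a volume then means adding one entry to `gluster_volumes` rather than copying a task.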

Once the Gluster volumes are created, they can be added to the Proxmox Storage Manager. In my case, I created a test volume, a volume for storing ISO images, and a volume for storing VM disks and container templates. Both nodes in the cluster can access these replicated volumes, so the VMs can run on either node.
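
Adding the storage can also be scripted with `pvesm add glusterfs`. This is a rough sketch rather than what I ran: the storage IDs (`gluster-iso`, `gluster-vmdata`) and the `pve_gluster_storage` variable are made up for illustration, the content types are my guess at a sensible mapping, and the "already exists" check is a crude substring match on the `pvesm status` output. Storage configuration is cluster-wide, so one node is enough.

```yaml
- name: Register Gluster volumes as Proxmox storage (sketch)
  hosts: node1
  vars:
    # Hypothetical storage IDs and content types; volume names come from the playbook
    pve_gluster_storage:
      - id: gluster-iso
        volume: iso
        content: iso
      - id: gluster-vmdata
        volume: vmdata-replicated
        content: images
  tasks:
    - name: List the storages Proxmox already knows about
      ansible.builtin.command: pvesm status
      register: pvesm_status
      changed_when: false

    # Crude idempotence check: skip when the storage ID already appears in the listing
    - name: Add each Gluster volume as a storage
      ansible.builtin.command:
        cmd: >-
          pvesm add glusterfs {{ item.id }}
          --server {{ hostvars['node1'].ansible_host }}
          --server2 {{ hostvars['node2'].ansible_host }}
          --volume {{ item.volume }}
          --content {{ item.content }}
      loop: "{{ pve_gluster_storage }}"
      when: item.id not in pvesm_status.stdout
```

Listing both nodes as `--server` and `--server2` lets Proxmox fall back to the second node if the first is unreachable.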

I'm in the process of moving everything stored in local storage to one of these shared volumes.

QDevice

To gain the third vote for quorum, I decided to run the Corosync QDevice daemon on a Raspberry Pi I had sitting around doing nothing.

Once the Pi was available via SSH and root could log in with a password, I needed to install the corosync-qnetd package on the Pi and the corosync-qdevice package on all Proxmox nodes.

    - name: Install Corosync Qdevice tools
      apt:
        name: corosync-qdevice
        state: latest
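
The Proxmox documentation pairs corosync-qdevice on the cluster nodes with the corosync-qnetd daemon on the external host, so the Pi needs a task of its own. A minimal sketch, assuming the Pi is in the inventory under the made-up name `qdevice-pi` and that Ansible connects to it as root:

```yaml
- name: QDevice host
  hosts: qdevice-pi   # hypothetical inventory name for the Raspberry Pi
  tasks:
    - name: Install the Corosync QNet daemon
      ansible.builtin.apt:
        name: corosync-qnetd
        state: present
        update_cache: true
```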

Then from one of the nodes, execute:

# pvecm qdevice setup 192.168.1.x

The command prompts for the root password on the QDevice host and installs the cluster certificates. Now check the cluster status:

# pvecm status
Cluster information
-------------------
Name:             home
Config Version:   3
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Sat Jul 17 10:14:07 2021
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1.30
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW node1 (local)
0x00000002          1    A,V,NMW node2
0x00000000          1            Qdevice

Should one of the nodes go down, the surviving node and the QDevice still hold 2 of the 3 expected votes, which is enough to maintain quorum.
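
With 3 expected votes, quorum is floor(3/2) + 1 = 2, which matches the `Quorum: 2` line in the output. The same check could be automated; the play below is only a sketch that parses the text of `pvecm status` as shown above, so it would need adjusting if the output format differs.

```yaml
- name: Check cluster quorum
  hosts: node1
  tasks:
    - name: Read the cluster status
      ansible.builtin.command: pvecm status
      register: pvecm_status
      changed_when: false

    - name: Assert the cluster is quorate and the QDevice is counted
      ansible.builtin.assert:
        that:
          - "pvecm_status.stdout_lines | select('match', '^Quorate: +Yes') | list | length == 1"
          - "pvecm_status.stdout_lines | select('match', '^Flags: .*Qdevice') | list | length == 1"
```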