Cluster Storage

Now that the new host is up and functioning in the cluster, I need to start thinking about allocating the storage.

The new host is showing the following storage:

  • local
  • local-lvm
  • qnap-iso
  • qnap-zfs

The original Proxmox host also has local and local-lvm storage, but I confirmed that these are not shared between the hosts (which should be obvious). Each is local to its own host and resides on the disk that was selected for the Proxmox installation (a 500GB SSD). The local storage is configured for container templates and ISOs stored in a local directory. The local-lvm storage is configured for disk images and containers and is thinly provisioned.

The qnap-iso storage is an NFS share on the QNAP which I set up on the original host to hold ISOs and container templates. Once I fixed the ACLs on the QNAP, the contents could be used by both hosts.

The qnap-zfs storage is a ZFS filesystem I created using an iSCSI LUN from the QNAP so I could learn a little about how iSCSI and ZFS work. While I was able to make the LUN visible from both hosts using the original steps, I'm not going to do anything with it for now. It only contains the disk for the Win 10 gaming VM, which will only ever run on the host with the Nvidia card used for GPU passthrough. Besides, the QNAP is going away eventually, so I will migrate that disk image to different storage later and delete the pool.

At this point, here are the block devices the new host sees:

NAME                         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                            8:0    0   7.3T  0 disk
sdb                            8:16   0   7.3T  0 disk
sdc                            8:32   0   7.3T  0 disk
sdd                            8:48   0   7.3T  0 disk
sde                            8:64   0     1T  0 disk
nvme1n1                      259:0    0 465.8G  0 disk
nvme0n1                      259:1    0 465.8G  0 disk
├─nvme0n1p1                  259:2    0  1007K  0 part
├─nvme0n1p2                  259:3    0   512M  0 part /boot/efi
└─nvme0n1p3                  259:4    0 465.3G  0 part
  ├─pve-swap                 253:0    0     8G  0 lvm  [SWAP]
  ├─pve-root                 253:1    0    96G  0 lvm  /
  ├─pve-data_tmeta           253:2    0   3.5G  0 lvm
  │ └─pve-data-tpool         253:4    0 338.4G  0 lvm
  │   ├─pve-data             253:5    0 338.4G  0 lvm
  │   └─pve-vm--101--disk--0 253:6    0    32G  0 lvm
  └─pve-data_tdata           253:3    0 338.4G  0 lvm
    └─pve-data-tpool         253:4    0 338.4G  0 lvm
      ├─pve-data             253:5    0 338.4G  0 lvm
      └─pve-vm--101--disk--0 253:6    0    32G  0 lvm

The sda, sdb, sdc, and sdd devices are the 8TB IronWolf drives. The sde device is the QNAP LUN. The nvme1n1 device is the second SSD, which will be used for caching. I have partitioned this second NVMe drive with a small partition for the SLOG (separate intent log) and the rest of the drive for the L2ARC:

Model: Samsung SSD 970 EVO Plus 500GB (nvme)
Disk /dev/nvme1n1: 500GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name   Flags
 1      1049kB  50.0GB  50.0GB               slog
 2      50.0GB  500GB   450GB                cache

This guide recommends using /dev/disk/by-id/ names rather than the /dev/sdX devices so that the device names don't change in the future. It is also the recommended method for pools of fewer than 10 disks.

To do this, I need to set up aliases in /etc/zfs/vdev_id.conf:

#     by-vdev
#     name     fully qualified or base name of device link
alias d1       /dev/disk/by-id/ata-ST8000VN004-2M2101_WSD2EGG7
alias d2       /dev/disk/by-id/ata-ST8000VN004-2M2101_WSD2EGCR
alias d3       /dev/disk/by-id/ata-ST8000VN004-2M2101_WSD1LF0S
alias d4       /dev/disk/by-id/ata-ST8000VN004-2M2101_WSD0WTXP
alias s1       /dev/disk/by-id/nvme-Samsung_SSD_970_EVO_Plus_500GB_S58SNM0R521818P

Then run udevadm trigger to read the configuration file and create the /dev/disk/by-vdev paths:

# ls -l /dev/disk/by-vdev/
total 0
lrwxrwxrwx 1 root root  9 Jul  5 15:56 d1 -> ../../sda
lrwxrwxrwx 1 root root  9 Jul  5 15:56 d2 -> ../../sdb
lrwxrwxrwx 1 root root  9 Jul  5 15:56 d3 -> ../../sdc
lrwxrwxrwx 1 root root  9 Jul  5 15:56 d4 -> ../../sdd
lrwxrwxrwx 1 root root  9 Jul  5 15:56 i1 -> ../../sde
lrwxrwxrwx 1 root root 13 Jul  5 15:56 s1 -> ../../nvme1n1
lrwxrwxrwx 1 root root 15 Jul  5 15:56 s1-part1 -> ../../nvme1n1p1
lrwxrwxrwx 1 root root 15 Jul  5 15:56 s1-part2 -> ../../nvme1n1p2
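
The alias lines in vdev_id.conf can also be generated instead of typed by hand. The following is only a minimal sketch: gen_vdev_aliases is a hypothetical helper (not part of ZFS or Proxmox), and it numbers the disks in glob sort order, which may not match the order I chose for d1 through d4 above. Review the output before copying anything into the real file.

```shell
# Hypothetical helper: print vdev_id.conf alias lines for the whole-disk links
# in a by-id style directory. Links for partitions (names containing -partN)
# are skipped. Numbering follows the shell's glob sort order.
# Usage: gen_vdev_aliases DIR GLOB PREFIX
gen_vdev_aliases() (
    dir=$1
    pattern=$2
    prefix=$3
    n=0
    for link in "$dir"/$pattern; do
        [ -e "$link" ] || continue        # the glob matched nothing
        case "$link" in
            *-part[0-9]*) continue ;;     # partition link, not a whole disk
        esac
        n=$((n + 1))
        printf 'alias %s%d       %s\n' "$prefix" "$n" "$link"
    done
)

# Example: list candidate aliases for the IronWolf drives.
gen_vdev_aliases /dev/disk/by-id 'ata-ST8000VN004-*' d
```

Nothing is written to /etc/zfs/vdev_id.conf by this sketch; it only prints the lines so they can be checked against the drive serial numbers first.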

I've decided to create a RAIDZ-1 pool, which will give me a decent balance of fault tolerance (one disk) and usable space (about 24TB from the four 8TB drives).
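
The 24TB figure is just the number of data disks times the disk size. As a rough sketch of that arithmetic (raidz_usable_tb is a hypothetical helper name, and the estimate ignores RAIDZ padding, metadata overhead, and the TB versus TiB difference, so the space ZFS actually reports will be somewhat lower, closer to 3 x 7.3T):

```shell
# Rough RAIDZ capacity estimate: (disks - parity) * size of one disk.
# Usage: raidz_usable_tb DISKS SIZE_TB PARITY
raidz_usable_tb() (
    disks=$1
    size_tb=$2
    parity=$3
    echo $(( (disks - parity) * size_tb ))
)

raidz_usable_tb 4 8 1   # four 8TB disks in RAIDZ-1, prints 24
```

The same helper shows the trade-off against RAIDZ-2: with parity set to 2 the estimate drops to 16TB in exchange for tolerating a second disk failure.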

# zpool create tank raidz1 d1 d2 d3 d4 log s1-part1 cache s1-part2

Then check the status of the newly created pool:

# zpool status
  pool: tank
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            d1      ONLINE       0     0     0
            d2      ONLINE       0     0     0
            d3      ONLINE       0     0     0
            d4      ONLINE       0     0     0
        logs
          s1-part1  ONLINE       0     0     0
        cache
          s1-part2  ONLINE       0     0     0

errors: No known data errors
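
For a scripted check, the config table in that output can be reduced to the devices that are not ONLINE. This is a minimal sketch, assuming the column layout shown above; pool_not_online is a hypothetical helper, and it would also flag rows such as hot spares, whose state is not ONLINE by design.

```shell
# Hypothetical helper: read `zpool status` text on stdin and print every row
# of the config table whose STATE column is not ONLINE.
# Exit status is 0 when all rows are ONLINE, 1 otherwise.
pool_not_online() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_cfg = 1; next }  # table header
        /^errors:/ { in_cfg = 0 }                           # end of the table
        # Section labels such as "logs" and "cache" have one field; skip them.
        in_cfg && NF >= 2 && $2 != "ONLINE" { print $1, $2; bad = 1 }
        END { exit bad + 0 }
    '
}

# Only runs where the zpool command is available.
if command -v zpool >/dev/null 2>&1; then
    zpool status tank | pool_not_online && echo "tank: all devices ONLINE"
fi
```

For a quick manual check, zpool status -x gives a similar summary without any parsing.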

The last task is to create a vmdata file system:

# zfs create tank/vmdata
# zfs set compression=on tank/vmdata
# pvesm zfsscan

and add it to /etc/pve/storage.cfg:

zfspool: vmdata
        pool tank/vmdata
        content rootdir,images
        sparse
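
The same entry can be created with pvesm instead of editing the file by hand. This is a sketch rather than what I actually ran: add_vmdata_storage and the DRY_RUN variable are hypothetical names for illustration, and the wrapper only prints the command unless DRY_RUN is set to 0.

```shell
# Sketch: register the dataset through pvesm instead of editing storage.cfg.
# With DRY_RUN=1 (the default here) the command is printed, not executed.
add_vmdata_storage() {
    set -- pvesm add zfspool vmdata --pool tank/vmdata \
        --content rootdir,images --sparse 1
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

add_vmdata_storage
```

Running it as DRY_RUN=0 add_vmdata_storage on a Proxmox node would execute the command; pvesm status should then list vmdata alongside the other storages.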

Next, I will migrate some VMs over to the new host.