## Posts Tagged ‘zfs’

### qemu-kvm: file driver requires to be a regular file for ZFS volume

Thursday, June 23rd, 2022

After one of the recent updates, a few KVM-based systems could not boot anymore.

I am using ZFS volumes as the block devices of my emulated VMs, and I was happy with that until recently. Now the VMs won’t start, failing with the error message in this post’s title.

The source of the problem is rather nasty: the qemu update to version 6 changed compatibility (see here), and libvirt has not followed through.

An ugly workaround is to edit the VM’s XML directly (either in Virt-Manager or in a text editor, for example via ‘virsh edit’) and change the disk definition: set the disk type from ‘file’ to ‘block’, and change the source attribute from ‘file’ to ‘dev’.

An example:

<disk type="block" device="disk">
<driver name="qemu" type="raw"/>
<source dev="/dev/share/VMs/hassos-updated.lun"/>
<target dev="vda" bus="virtio"/>
<address type="pci" domain="0x0000" bus="0x06" slot="0x00" function="0x0"/>
</disk>
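If you have several VMs to fix, the edit can be scripted. Below is a minimal sketch: a hypothetical `convert_disk_to_block` shell function (not part of libvirt) that assumes you feed it a single `<disk>` element, for example copied from `virsh dumpxml <vm>`. It only touches disks whose source path lives under /dev, so file-backed images are left alone, and it accepts both single and double quotes, since virsh prints single-quoted attributes.

```shell
#!/bin/sh
# convert_disk_to_block: hypothetical helper, not part of libvirt.
# Reads ONE libvirt <disk> element on stdin. If its source is a path under
# /dev (such as a ZFS volume), it rewrites type="file" to type="block" and
# <source file=...> to <source dev=...>. Other disks pass through untouched.
convert_disk_to_block() {
	xml=$(cat)
	if printf '%s\n' "$xml" | grep -Eq "<source file=['\"]/dev/"; then
		printf '%s\n' "$xml" | sed -E \
			-e "s#<disk type=(['\"])file(['\"])#<disk type=\1block\2#" \
			-e "s#<source file=#<source dev=#"
	else
		printf '%s\n' "$xml"
	fi
}
```

Paste the result back with `virsh edit <vm>`. This is plain text substitution, not XML parsing, so review the output before applying it.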


### Better iostat visibility of ZFS vdevs

Sunday, July 4th, 2021

All ZFS users are familiar with the ‘zpool iostat’ command; however, its output does not translate easily into the Linux ‘iostat’ command. Large pools with many disks result in a mess, where it is hard to identify which disk is which, and you keep going back to a translation table to identify a suspect slow disk.

The Linux ‘iostat’ command can show persistent device names (its ‘-j’ option), so if you define aliases in the vdev_id.conf file and use these ZFS alias names, you can reuse the same naming in your ‘iostat’ command. See my command example below. Note that in this setup I do not use multipath or other DM devices, but access the /dev/sd devices directly. Also, in this case I have only a few disks (slightly more than a dozen), so there is no need to address /dev/sd[a-z][a-z] devices:

iostat -kt 5 -j vdev -x /dev/sd? /dev/nvme0n1

You can chain more devices to the end of this line. The result should be something like this:

07/04/2021 10:36:54 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
0.26    0.00    0.87    1.49    0.00   97.39

r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00 nvme0n1
42.80   15.40   5461.60   1938.40     0.00     0.00   0.00   0.00    0.50    0.87   0.01   127.61   125.87   1.12   6.50 sata500g2
52.20    0.00   6631.20      0.00     0.00     0.00   0.00   0.00    0.43    0.00   0.00   127.03     0.00   1.40   7.30 sata500g1
40.60    0.40   5196.80     51.20     0.00     0.00   0.00   0.00    0.44    0.50   0.00   128.00   128.00   1.41   5.80 sata500g4
71.60    6.20   9148.00    776.80     0.00     0.00   0.00   0.00    0.45    0.48   0.00   127.77   125.29   1.34  10.46 sata500g3
35.00    6.00   4463.20    768.00     0.00     0.00   0.00   0.00    0.44    0.53   0.00   127.52   128.00   1.24   5.08 sata1t
0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00 mid1
28.80   10.60    748.00     84.00     0.00     0.00   0.00   0.00    1.55    0.53   0.04    25.97     7.92   1.11   4.36 top4
5.00   18.00    122.40    106.40     0.00     0.00   0.00   0.00   13.32   18.32   0.39    24.48     5.91   1.18   2.72 top5
4.60   27.40    124.00    160.80     0.00     0.00   0.00   0.00   10.35   15.71   0.46    26.96     5.87   1.09   3.48 top2
26.40   12.20    676.80     88.80     0.00     0.00   0.00   0.00    2.14    0.52   0.05    25.64     7.28   1.07   4.12 bot3
4.60   25.40    104.80    137.60     0.00     0.00   0.00   0.00    5.26    0.64   0.04    22.78     5.42   0.31   0.94 mid4
5.40   19.00    130.40    119.20     0.00     0.00   0.00   0.00    3.81    0.52   0.02    24.15     6.27   0.57   1.38 mid5
25.00   12.00    596.80     80.00     0.00     0.20   0.00   1.64    3.61    0.13   0.08    23.87     6.67   1.03   3.80 mid2
28.00   11.20    678.40     81.60     0.00     0.00   0.00   0.00    3.67    0.59   0.10    24.23     7.29   1.23   4.84 bot2
5.00   23.40    114.40    140.80     0.00     0.00   0.00   0.00    9.28    0.43   0.05    22.88     6.02   0.51   1.44 bot4
5.00   27.00    120.80    151.20     0.00     0.00   0.00   0.00    4.04    0.75   0.03    24.16     5.60   0.49   1.56 bot5
0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00 mid3
27.40    8.60    661.60     77.60     0.00     0.00   0.00   0.00    1.77    0.49   0.04    24.15     9.02   1.16   4.18 bot1
27.00    9.60    692.00     84.80     0.00     0.00   0.00   0.00    2.10    0.52   0.05    25.63     8.83   1.15   4.20 top1
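The aliases make the output readable, but spotting the slow disk in a long list still takes some squinting. Below is a small sketch: a hypothetical `slow_disks` shell function (not part of sysstat) that reads `iostat -x` output and prints only the devices whose read or write await exceeds a threshold in milliseconds. It looks the columns up by header name, assuming the header line contains both `r_await` and `Device`, as it does in the output above.

```shell
#!/bin/sh
# slow_disks LIMIT_MS: hypothetical helper. Reads `iostat -x` output on stdin
# and prints each device whose r_await or w_await (ms) is above LIMIT_MS.
# Columns are located by header name, because their order differs between
# sysstat versions (older ones print "Device:" first, newer ones last).
slow_disks() {
	awk -v limit="$1" '
		/r_await/ && /Device/ {
			# Device header line: remember the position of every column
			ncol = NF
			for (i = 1; i <= NF; i++) {
				name = $i
				sub(/:$/, "", name)        # "Device:" -> "Device"
				col[name] = i
			}
			next
		}
		ncol && NF == ncol {
			# A device line has exactly as many fields as the header
			r = $(col["r_await"]); w = $(col["w_await"])
			if (r + 0 > limit + 0 || w + 0 > limit + 0)
				printf "%s r_await=%s w_await=%s\n", $(col["Device"]), r, w
		}
	'
}
```

For example, `iostat -x -y -j vdev 5 1 /dev/sd? /dev/nvme0n1 | slow_disks 10` takes a single 5-second sample (the `-y` flag skips the since-boot report) and, on numbers like the ones above, would flag top5 and top2.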


### ZFS clone script

Sunday, March 28th, 2021

ZFS has some magical features, comparable to NetApp’s WAFL capabilities. One of the less-used ones is ZFS send/receive, which can serve as the engine underneath something much like NetApp’s SnapMirror or SnapVault.

The idea, if you are not familiar with NetApp’s products, is to take a snapshot of a dataset on the source and clone it to remote storage. Then take another snapshot and clone only the delta between the two snapshots, and so on. Only block-level changes are transferred, which reduces both the payload and the time required to clone it.
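The key decision in each incremental round is the baseline: the newest snapshot index that exists on both sides, which becomes the origin of the incremental send. A minimal sketch of that selection as a hypothetical standalone shell function, `common_index`; the script below does the equivalent job in its find_matching_snapshot function, walking its local list from the newest entry backwards.

```shell
#!/bin/sh
# common_index LOCAL REMOTE: hypothetical helper. Prints the highest snapshot
# index present in both space-separated lists, or returns 1 if there is none
# (in which case no incremental send is possible and a full resync is needed).
common_index() {
	best=""
	for l in $1; do
		for r in $2; do
			# Keep the largest index seen on both sides
			if [ "$l" -eq "$r" ] && { [ -z "$best" ] || [ "$l" -gt "$best" ]; }; then
				best=$l
			fi
		done
	done
	[ -n "$best" ] || return 1
	echo "$best"
}
```

With a common index of 3 and a new local snapshot 4, the transfer would be `zfs send -I pool/fs@HASH_3 pool/fs@HASH_4`, where HASH stands for the script’s short destination hash.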

Save this script as clone_zfs_snapshots.sh and make it executable (chmod +x clone_zfs_snapshots.sh).

#!/bin/bash
# This script will clone ZFS snapshots incrementally over SSH to a target server
# Snapshot name structure: ${SRC_FS}@${TGT_HASH}_INT ; where INT is an increment number
# Written by Etzion. Feel free to use. See more stuff in my blog at https://run.tournament.org.il
# Arguments:
# $1: ZFS filesystem name
# $2: (target ZFS system):(target ZFS filesystem)
IAM=$0
ZFS=/sbin/zfs
LOCKDIR=/dev/shm
LOCAL_SNAPS_TO_LEAVE=3
RESUME_LIMIT=3

### FUNCTIONS ###

# Sanity and usage
function usage() {
	echo "Usage: $IAM SRC REMOTE_SERVER:ZFS_TARGET (port=SSH_PORT)"
	echo "ZFS_TARGET is the parent of filesystems which will be created with the original source names"
	echo "Example: $IAM share/test backupsrv:backup"
	echo "It will create a filesystem 'test' under the pool 'backup' on 'backupsrv' with clone"
	echo "of the current share/test ZFS filesystem"
	echo "This script is (on purpose) not a recursive script"
	echo "For the script to work correctly, it *must* have SSH key exchanged from source to target"
	exit 0
}

function abort() {
	# Exit with an error message
	echo "$@"
	pkill -P $$
	remove_lock
	exit 1
}

function parse_parameters() {
	# Parses command line parameters
	# called with "$@"
	SRC_FS=$1
	shift
	TGT=$1
	shift
	for i in "$@"
	do
		case ${i} in
			port=*) PORT=${i##*=} ;;
			hash=*) HASH=${i##*=} ;;
		esac
	done
	TGT_SYS=${TGT%%:*}
	TGT_FS=${TGT##*:}
	# Use a short substring of MD5sum of the target name for later unique identification
	SRC_DIRNAME_FS=${SRC_FS#*/}
	if [ -z "$HASH" ]
	then
		TGT_FULLHASH=$(echo "$TGT_FS/${SRC_DIRNAME_FS}" | md5sum -)
		TGT_HASH=${TGT_FULLHASH:1:7}
	else
		TGT_HASH=${HASH}
	fi
}

function sanity() {
	# Verify we have all details
	[ -z "$SRC_FS" ] && usage
	[ -z "$TGT_FS" ] && usage
	[ -z "$TGT_SYS" ] && usage
	$ZFS list -H -o name "$SRC_FS" > /dev/null 2>&1 || abort "Source filesystem $SRC_FS does not exist"
	# check_target_fs || abort "Target ZFS filesystem $TGT_FS on $TGT_SYS does not exist, or not imported"
}

function remove_lock() {
	# Removes the lock file (if its name is already known)
	[ -n "$SRC_LOCK" ] && \rm -f "${LOCKDIR}/$SRC_LOCK"
}

function construct_ssh_cmd() {
	# Construct the remote SSH command
	# Here is a good place to put atomic parameters used for the SSH
	[ -z "${PORT}" ] && PORT=22
	SSH="ssh -p $PORT $TGT_SYS -o ConnectTimeout=3"
	CONTROL_SSH="$SSH -f"
}

function get_last_remote_snapshots() {
	# Gets the snapshot names on the remote system, to match them to our snapshots
	remoteSnapTmpObj=$($SSH "$ZFS list -H -t snapshot -r -o name ${TGT_FS}/${SRC_DIRNAME_FS}" | grep "${SRC_DIRNAME_FS}@" | grep "${TGT_HASH}")
	# Create a list of all snapshot indexes. Empty means it is the first one
	remoteSnaps=""
	for snapIter in ${remoteSnapTmpObj}
	do
		remoteSnaps="$remoteSnaps ${snapIter##*@${TGT_HASH}_}"
	done
}

function check_if_remote_snapshot_exists() {
	# Checks if the snapshot with index $newLocalIndex exists on the remote node
	$SSH "$ZFS list -H -t snapshot -r -o name ${TGT_FS}/${SRC_DIRNAME_FS}@${TGT_HASH}_${newLocalIndex}"
	return $?
}

function get_last_local_snapshots() {
	# Builds an array of the existing local snapshot indexes for the current TGT_HASH
	localSnapTmpObj=$($ZFS list -H -t snapshot -r -o name "$SRC_FS" | grep "${SRC_FS}@" | grep "$TGT_HASH")
	# Remove the HASH and everything before it. We should have a clear list of indexes
	localSnapList=""
	for snapIter in ${localSnapTmpObj}
	do
		localSnapList="$localSnapList ${snapIter##*@${TGT_HASH}_}"
	done
	# Convert object to array
	localSnapList=( $localSnapList )
	# Get the last object
	let localSnapArrayObj=${#localSnapList[@]}-1
}

function delete_snapshot() {
	# This function will delete a snapshot
	# arguments: $1 -> snapshot name
	[ -z "$1" ] && abort "Cleanup snapshot got no arguments"
	$ZFS destroy "$1"
}

function find_matching_snapshot() {
	# This function will attempt to find a matching snapshot as a replication baseline
	# Gets the latest local snapshot index
	localRecentIndex=${localSnapList[$localSnapArrayObj]}
	# Gets the latest mutual snapshot index
	while [ $localSnapArrayObj -ge 0 ]
	do
		# Check if the current counter already exists on the remote side
		if echo "$remoteSnaps" | grep -w "${localSnapList[$localSnapArrayObj]}" > /dev/null 2>&1
		then
			# We know the mutual index.
			commonIndex=${localSnapList[$localSnapArrayObj]}
			return 0
		fi
		let localSnapArrayObj--
	done
	# If we've reached here - there is no mutual index!
	abort "There is no mutual snapshot index, you will have to resync"
}

function cleanup_snapshots() {
	# Creates a list of snapshots to delete and then calls delete_snapshot function
	# We are using the most recent common index, localSnapArrayObj, as the latest reference for deletion
	let deleteArrayObj=$localSnapArrayObj-${LOCAL_SNAPS_TO_LEAVE}
	snapsToDelete=""
	# Construct a list of snapshots to delete, and delete it in reverse order
	while [ $deleteArrayObj -ge 0 ]
	do
		# Construct snapshot name
		snapsToDelete="$snapsToDelete ${SRC_FS}@${TGT_HASH}_${localSnapList[$deleteArrayObj]}"
		let deleteArrayObj--
	done
	snapsToDelete=( $snapsToDelete )
	snapDelete=0
	while [ $snapDelete -lt ${#snapsToDelete[@]} ]
	do
		delete_snapshot "${snapsToDelete[$snapDelete]}"
		let snapDelete++
	done
}

function initialize() {
	# This is a unique case where we initialize the first sync
	# We will call this procedure when remoteSnaps is empty (meaning that there was no snapshot whatsoever)
	echo "Going to perform an initialization replication. It might wipe the target $TGT_FS completely"
	echo "Press Enter to proceed, or Ctrl+C to abort"
	read -r abc
	### Decided to remove this check
	### [ -n "$LOCSNAP_LIST" ] && abort "No target snapshots while local history snapshots exists. Clean up history and try again"
	RECEIVE_FLAGS="-sFdvu"
	newLocalIndex=1
	create_local_snapshot $newLocalIndex
	open_remote_socket
	sleep 1
	$ZFS send -ce "${SRC_FS}@${TGT_HASH}_${newLocalIndex}" | nc "$TGT_SYS" "$NC_PORT" 2>&1
	if [ "$?" -ne "0" ]
	then
		# Do not clean up the current snapshot
		abort "Failed to send initial snapshot to target system"
	fi
	sleep 1
	# Set target to RO
	$SSH "$ZFS set readonly=on $TGT_FS"
	[ "$?" -ne "0" ] && abort "Failed to set remote filesystem $TGT_FS to read-only"
	# No need to remove local snapshot
}

function create_local_snapshot() {
	# Creates snapshot on local storage, using index $1
	[ -z "$1" ] && abort "Failed to get new snapshot index"
	$ZFS snapshot "${SRC_FS}@${TGT_HASH}_${1}"
	[ "$?" -ne "0" ] && abort "Failed to create local snapshot. Check error message"
}

function open_remote_socket() {
	# Starts remote socket via SSH (as the control operation)
	# port is 3000 + random number between 0 and 999
	let NC_PORT=3000+$RANDOM%1000
	$CONTROL_SSH "nc -l -i 90 $NC_PORT | $ZFS receive ${RECEIVE_FLAGS} $TGT_FS > /tmp/output 2>&1 ; sync"
	#$CONTROL_SSH "socat tcp4-listen:${NC_PORT} - | $ZFS receive ${RECEIVE_FLAGS} $TGT_FS > /tmp/output 2>&1 ; sync"
}

function send_zfs() {
	# Do the heavy lifting of opening remote socket and starting ZFS send/receive
	open_remote_socket
	sleep 1
	$ZFS send -ce -I "${SRC_FS}@${TGT_HASH}_${commonIndex}" "${SRC_FS}@${TGT_HASH}_${newLocalIndex}" | nc -i 90 "$TGT_SYS" "$NC_PORT"
	#$ZFS send -ce -I ${SRC_FS}@${TGT_HASH}_${commonIndex} ${SRC_FS}@${TGT_HASH}_${newLocalIndex} | socat tcp4-connect:${TGT_SYS}:${NC_PORT} -
	sleep 20
}

function increment() {
	# Create a new snapshot with the index localRecentIndex+1, and replicate it to the remote system
	# Baseline is the most recent common snapshot index commonIndex
	RECEIVE_FLAGS="-Fsdvu"
	# Handle the case of latest snapshot in DR is newer than current latest snapshot, due to mistaken deletion
	remoteSnaps=( $remoteSnaps )
	let remoteIndex=${#remoteSnaps[@]}
	# Get last snapshot on DR
	if [ ${localRecentIndex} -lt ${remoteIndex} ]
	then
		let newLocalIndex=${remoteIndex}+1
	else
		let newLocalIndex=$localRecentIndex+1
	fi
	create_local_snapshot $newLocalIndex
	send_zfs
	if ! verify_correctness
	then
		if ! loop_resume # If we can
		then
			# We either could not resume operation or failed to run with the required amount of iterations
			# For now we abort.
			echo "Deleting local snapshot"
			delete_snapshot "${SRC_FS}@${TGT_HASH}_${newLocalIndex}"
			abort "Remote snapshot should have the index of the latest snapshot, but it is not. The current remote snapshot index is ${commonIndex}"
		fi
	fi
}

function loop_resume() {
	# Attempts to loop over resuming until limit attempt has been reached
	REMOTE_TOKEN=$($SSH "$ZFS get -Ho value receive_resume_token ${TGT_FS}/${SRC_DIRNAME_FS}")
	if [ -z "$REMOTE_TOKEN" ] || [ "$REMOTE_TOKEN" == "-" ]
	then
		return 1
	fi
	# We have a valid resume token. We will retry
	COUNT=1
	while [ "$COUNT" -le "$RESUME_LIMIT" ]
	do
		# For ease of handling - for each iteration, we will request the token again
		echo "Attempting resume operation"
		REMOTE_TOKEN=$($SSH "$ZFS get -Ho value receive_resume_token ${TGT_FS}/${SRC_DIRNAME_FS}")
		let COUNT++
		open_remote_socket
		$ZFS send -e -t "$REMOTE_TOKEN" | nc -i 90 "$TGT_SYS" "$NC_PORT"
		#$ZFS send -e -t $REMOTE_TOKEN | socat tcp4-connect:${TGT_SYS}:${NC_PORT} -
		sleep 20
		if verify_correctness
		then
			echo "Done"
			return 0
		fi
	done
	# If we've reached here, we have failed to run the required iterations
	return 1
}

function verify_correctness() {
	# Check remote index, and verify it is correct with the current, latest snapshot
	if check_if_remote_snapshot_exists
	then
		echo "Replication Successful"
		return 0
	else
		echo "Replication failed"
		return 1
	fi
}

### MAIN ###
[ "$(whoami)" != "root" ] && abort "This script has to be called by the root user"
[ -z "$1" ] && usage
parse_parameters "$@"
SRC_LOCK=$(echo "$SRC_FS" | tr / _)
if [ -f "${LOCKDIR}/$SRC_LOCK" ]
then
	echo "Already locked. If this should not be the case - remove ${LOCKDIR}/$SRC_LOCK"
	exit 1
fi
sanity
touch "${LOCKDIR}/$SRC_LOCK"
construct_ssh_cmd
get_last_remote_snapshots # Have a string list of remoteSnaps
# If we don't have a remote snapshot, it should be initialization
if [ -z "$remoteSnaps" ]
then
	initialize
	echo "completed initialization. Done"
	remove_lock
	exit 0
fi
# We can get here only if it is not initialization
get_last_local_snapshots # Have a list (array) of localSnaps
find_matching_snapshot # Get the latest local index and the latest common index available
increment # Creates a new snapshot and sends/receives it
cleanup_snapshots # Cleans up old local snapshots
pkill -P $$
remove_lock
echo "Done"


The initial run must be started manually, because the script asks for confirmation before the first full replication. If you expect a very long initial sync, run it inside tmux or screen, so that a dropped session does not make it fail in the middle.

To run the command, run it like this:

./clone_zfs_snapshots.sh share/my-data backuphost:share


This will create, under the pool ‘share’ on the host ‘backuphost’, a filesystem matching the source (in this case: share/my-data) and set it to read-only. The script creates a snapshot with a unique name, based on a shortened hash of the destination plus an incrementing number suffix, and starts cloning it to the remote host. When called again, it creates a snapshot with the same name but the next index, and clones only the delta to the remote host. In case of a disconnection, it retries the transfer (using the ZFS resume token) up to RESUME_LIMIT times before failing.

Note that the receiving side does not remove snapshots, so handling (too) old snapshots on the backup host remains up to you.
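One way to handle this is a small pruning step on the backup host. Below is a minimal sketch: a hypothetical `snaps_to_prune` shell function that reads snapshot names, oldest first, and prints all but the newest KEEP of them, so the actual deletion stays a separate, reviewable step. Keep at least one, because the newest received snapshot is the baseline for the next incremental send.

```shell
#!/bin/sh
# snaps_to_prune KEEP: hypothetical helper. Reads snapshot names on stdin,
# ordered oldest first, and prints every name except the newest KEEP.
snaps_to_prune() {
	awk -v keep="$1" '
		{ name[NR] = $0 }
		END { for (i = 1; i <= NR - keep; i++) print name[i] }
	'
}
```

A possible invocation on the backup host, where the dataset name and the hash abc1234 are placeholders: `zfs list -H -t snapshot -o name -s creation -d 1 share/my-data | grep '@abc1234_' | snaps_to_prune 10`, then, once the list looks right, pipe it into `xargs -n 1 zfs destroy`.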

### targetcli extend fileio backend

Friday, April 3rd, 2020

I am working on an article which will describe the procedures required to extend a LUN on Linux storage clients, with and without multipath (device-mapper-multipath) and with and without partitioning (I tend to partition storage disks, even when this is not strictly required). It will also deal with migrating from an MBR to a GPT partition layout as part of this process.

For my lab experiments, I created a dedicated Linux storage machine. This is not my first, of course, and not likely my last either; however, one of the challenges I had to confront was how to extend (or, in general, resize) an iSCSI LUN from the storage point of view. This is not as straightforward as one might expect.

My initial setup:

• CentOS 7 or later is used.
• The targetcli command line is used (meaning: the LIO target).
• I am using ZFS to easily allocate block devices, and files on filesystems. This is not a must; LVM would do just as well.
• targetcli uses automatic saveconfig (the default configuration).

I will not go over the whole process of setting up and running an iSCSI target server. You can find it in many guides around the web, such as this and that, as well as many more. So, skipping that: we have one Linux machine providing three LUNs to another Linux machine over iSCSI, currently using a single network link.

Now comes the interesting part: if I want to expand or resize my LUN on the storage, there are several possible paths.

If we are using the ‘block’ backstore, there is nothing complicated about it: just extend the logical volume or the ZFS volume, and you are done. Here is an example:

LVM:

lvextend -L +1G /dev/storageVG/lun1

ZFS:

zfs set volsize=11G storage/lun1 # volsize should be the final size

Extremely simple. From this point on, LIO knows the updated size and will notify any relevant party. The clients, of course, will need to rescan the iSCSI storage and adapt, depending on the methods in use (see my comment at the beginning of this post about my project).

It is just as simple when using the ‘fileio’ backstore on top of a block device. Although this is not the recommended setup, it allows for a more aggressive write-back cache (enabled by default), and might reduce disk load. If this is how your backstore is defined (fileio + block device), the same procedure applies: extend the block device, and everyone is notified about it.

It becomes harder when using a regular file as the ‘fileio’ backstore. When defined, fileio creates a new file or uses an existing one, and by default the file is thin-provisioned, which means the LUN size is taken from the target configuration rather than from the file’s actual size. Extending or shrinking the file would therefore have no effect, apart from the possibility of data corruption.

Documentation about how to do this is non-existent. I investigated it, and came to the following conclusions:

• It is a dangerous procedure, so do it at your own risk!
• It will result in a short IO failure, because we will need to restart target.service

This is how it goes. Follow this short list and you shall win:

• Calculate the desired size in bytes.
• Back up the file /etc/target/saveconfig.json
• Edit the file and locate the desired LUN; you can identify it by its backing file name/path
• Change its size from the current value to the desired size (in bytes)
• Restart the target.service service
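The first four steps can be scripted. Below is a minimal sketch with two hypothetical shell functions: `size_bytes` converts a size such as 11G into bytes (binary units, so 11G is 11 × 1024³ = 11811160064 bytes), and `patch_fileio_size` backs up saveconfig.json and sets the new size with jq. It assumes jq is installed and that each fileio entry under `storage_objects` carries `dev` (the backing file) and `size` keys; check your own saveconfig.json before trusting it. It does not restart target.service, which remains a manual step.

```shell
#!/bin/sh
# size_bytes SIZE: convert a plain byte count, or a size with a K/M/G/T
# suffix (binary units, 1K = 1024 bytes), into bytes.
size_bytes() {
	case $1 in
		*[Kk]) echo $(( ${1%?} * 1024 )) ;;
		*[Mm]) echo $(( ${1%?} * 1024 * 1024 )) ;;
		*[Gg]) echo $(( ${1%?} * 1024 * 1024 * 1024 )) ;;
		*[Tt]) echo $(( ${1%?} * 1024 * 1024 * 1024 * 1024 )) ;;
		*)     echo $(( $1 )) ;;
	esac
}

# patch_fileio_size BACKING_FILE SIZE: hypothetical helper, requires jq.
# Backs up saveconfig.json, then sets "size" on the storage object whose
# "dev" equals BACKING_FILE. Restart target.service yourself afterwards.
patch_fileio_size() {
	cfg=/etc/target/saveconfig.json
	bak="$cfg.$(date +%Y%m%d%H%M%S).bak"
	bytes=$(size_bytes "$2") || return 1
	cp -a "$cfg" "$bak" || return 1
	if ! jq --arg dev "$1" --argjson size "$bytes" \
		'(.storage_objects[] | select(.dev == $dev) | .size) = $size' \
		"$bak" > "$cfg"; then
		cp -a "$bak" "$cfg"    # jq failed: restore the original file
		return 1
	fi
}
```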

During the service restart all IO will fail, and client applications might get IO errors. The restart should be faster than the default iSCSI retransmission timeout, but this is not guaranteed. If you use multipath (especially with the queue_if_no_path flag), the likelihood of this affecting your iSCSI clients is nearly zero. Make sure you test this in a non-production environment first, of course.

Hope it helps.

### HA ZFS NFS Storage

Tuesday, January 29th, 2019

I have described in this post how to set up RHCS (Red Hat Cluster Suite) for ZFS services; however, that is rather outdated, and works with RHEL/CentOS 6 but not with version 7. RHEL/CentOS 7 uses Pacemaker as its cluster infrastructure, which behaves, and is configured, entirely differently.

This is something I’ve done several times; however, in this particular case I wanted to see whether there was a more “common” way of doing this task, with a path already laid out, or whether I needed to create my own agents, much like I did for RHCS 6 in the post mentioned above. The quick answer is that this has been done, and I found some very good documentation here, so I need to thank Edmund White and his wiki.

I had to make several changes, though, because I wanted to use IPMI as the fencing mechanism before falling back to SCSI reservation (which I trust less), and because my hardware was different, without multipathing (single path, so there was no point in adding complexity for no apparent reason).

The hardware I’m using in this case is a SuperMicro SBB with 15x 3.5″ shared disks (in our model), plus some small internal storage, which we will ignore except for installing the Linux OS on it.

For now, I will only give a high-level view of the procedure. Edmund gave a wonderful explanation, and my modifications were minor at best. So this is a fast-paced procedure for installing everything, from a minimal CentOS 7 system to a running cluster. The main differences between Edmund’s version and mine are as follows:

• I used /etc/zfs/vdev_id.conf rather than multipathing for disk name aliases (names based on the disk slot number, which makes things easier for me later on).
• I have disabled SELinux. It is not required here, and would only increase complexity.
• I have used Stonith levels, a method of creating a fencing hierarchy, where you attempt a single (or multiple) fencing method(s) before going for the next level. A good example would be power fencing by switching off two PDU sockets (both must be disconnected at the same time, or else the target server, fed by both, would remain on), and, if that fails, moving to SCSI fencing. In my case, I used IPMI fencing as the first level, and SCSI fencing as the second.
• This was created as a cluster for XenServer. While XenServer supports both NFSv3 and NFSv4, it appears that the NFSv4 server does not release file handles immediately when performing an ‘unexport’ operation. This prevents the cluster from failing over, and results in a node reset and other bad things happening. So I prevented the system from exporting NFSv4 at all.
• The ZFS agent recommended by Edmund has two bugs, which I noticed and fixed. You can get my version here; it is a pull request against the version Edmund suggested.
yum groupinstall "high availability"
yum install epel-release
# Edit ZFS to use dkms, and then
yum install kernel-devel zfs
wget -O /usr/lib/ocf/resource.d/heartbeat/ZFS https://raw.githubusercontent.com/skiselkov/stmf-ha/e74e20bf8432dcc6bc31031d9136cf50e09e6daa/heartbeat/ZFS
chmod +x /usr/lib/ocf/resource.d/heartbeat/ZFS
systemctl disable firewalld
systemctl stop firewalld
systemctl disable NetworkManager
systemctl stop NetworkManager
# disable SELinux -> Edit /etc/selinux/config
systemctl enable corosync
systemctl enable pacemaker
yum install kernel-devel zfs
systemctl enable pcsd
systemctl start pcsd
# edit /etc/zfs/vdev_id.conf  -> Setup device aliases
zpool create storage -o ashift=12 -o autoexpand=on -o autoreplace=on -o cachefile=none mirror d03 d04 mirror d05 d06 mirror d07 d08 mirror d09 d10 mirror d11 d12 mirror d13 d14 spare d15 cache s02
zfs set compression=lz4 storage
zfs set atime=off storage
zfs set acltype=posixacl  storage
zfs set xattr=sa storage
# edit /etc/sysconfig/nfs and add to RPCNFSDARGS  "-N 4.1 -N 4"
systemctl enable nfs-server
systemctl start nfs-server
zfs create storage/vm01
zfs set sharenfs="rw=@<client-network>/24,async,no_root_squash,no_wdelay" storage/vm01 # use your NFS clients' subnet
passwd hacluster # Setup a known password
systemctl start pcsd
pcs cluster auth storagenode1 storagenode2
pcs cluster setup --start --name zfs-cluster storagenode1,storagenode1-storage storagenode2,storagenode2-storage
pcs property set no-quorum-policy=ignore
pcs stonith create fence-scsi fence_scsi pcmk_monitor_action="metadata" pcmk_host_list="storagenode1,storagenode2" devices="/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg,/dev/sdh,/dev/sdi,/dev/sdj,/dev/sdk,/dev/sdl,/dev/sdm,/dev/sdn,/dev/sdo,/dev/sdp" meta provides=unfencing
pcs stonith level add 1 storagenode1 storagenode1-ipmi
pcs stonith level add 1 storagenode2 storagenode2-ipmi
pcs stonith level add 2 storagenode1 fence-scsi
pcs stonith level add 2 storagenode2 fence-scsi
pcs resource defaults resource-stickiness=100
pcs resource create storage ZFS pool="storage" op start timeout="90" op stop timeout="90" --group=group-storage
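The first bullet above and the ‘zpool create’ line rely on slot-based aliases (d03 to d15) defined in /etc/zfs/vdev_id.conf, whose content is not shown. Below is a minimal sketch of generating such lines with a hypothetical `make_vdev_aliases` shell function, assuming you already know which /dev/disk/by-path (or by-id) link sits in which slot; it emits the documented `alias <name> <devlink>` form, numbering slots consecutively.

```shell
#!/bin/sh
# make_vdev_aliases FIRST_SLOT DEVLINK...: hypothetical helper. Prints one
# vdev_id.conf "alias" line per device link, named d<slot> with a two-digit
# slot number, in the order the links are given.
make_vdev_aliases() {
	slot=$1
	shift
	for link in "$@"; do
		printf 'alias d%02d %s\n' "$slot" "$link"
		slot=$((slot + 1))
	done
}
```

For example, `make_vdev_aliases 3 <slot-3-link> <slot-4-link> ... >> /etc/zfs/vdev_id.conf`, with the real by-path links in place of the placeholders, followed by `udevadm trigger`, so the names show up under /dev/disk/by-vdev and can be used by ‘zpool create’.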