Posts Tagged ‘zfs’

targetcli extend fileio backend

Friday, April 3rd, 2020

I am working on an article which will describe the procedures required to extend LUN on Linux storage clients, with and without use of multipath (device-mapper-multipath) and with and without partitioning (I tend to partition storage disks, even when this is not exactly required). Also – it will deal with migration from MBR to GPT partition layout, as part of this process.

During my lab experiments, I have created a dedicated Linux storage machine for this purpose. This is not my first, of course, and not likely my last either, however, one of the challenges I’ve had to confront was how to extend or resize in general an iSCSI LUN from the storage point of view. This is not as straight-forward as one might have expected.

My initial setup:

  • Centos 7 or later is used.
  • Using targetcli command-line (meaning – using LIO mechanism).
  • I am using ZFS for the purpose of easily allocating block devices and files on filesystems. This is not a must – LVM can do just right.
  • targetcli is using automatic saveconfig (default configuration).

I will not go over the whole process of setting up and running iSCSI target server. You can find this in so many guides around the web, such as this and that, as well as so many more. So, skipping that – we have a Linux providing three LUNs to another Linux over iSCSI. Currently – using a single network link.

Now comes the interesting part – if I want to expand/resize my LUN on the storage, there are several branches of possibilities.

Assuming we are using the ‘block’ backstore – there is nothing complicated about it – just extend the logical volume, or the ZFS volume, and you’re done with that. Here is an example:


lvextend -L +1G /dev/storageVG/lun1


zfs set volsize=11G storage/lun1 # volsize should be the final size

Extremely simple. Starting at this point, LIO will know of the updated sizes, and will just notify any relevant party. The clients, of course, will need to rescan the iSCSI storage, and adept according to the methods in use (see my comment at the beginning of this post about my project).

It is as simple as that if using ‘fileio’ backstore with a block device. Although this is not the best recommended setup, it allows for (default) more aggressive write-back cache, and might reduce disk load. If this is how your backstore is defined (fileio + block device) – same procedure applies as before – extend the block device, and everyone is notified about it.

It becomes harder when using a real file as the ‘fileio’ backstore. By default, fileio will create a new file when defined, or use an existing one. It will use thin provisioning by default, which means it will not have the exact knowledge of the file’s size. Extending or shrinking the file, except for the possibility of data corruption, would have no impact.

Documentation about how to do is is non-existing. I have investigated it, and came to the following conclusion:

  • It is a dangerous procedure, so do it at your own risk!
  • It will result in a short IO failure because we will need to restart the service target.service

This is how it goes. Follow this short list and you shall win:

  • Calculate the desired size in bytes.
  • Copy to a backup the file /etc/target/saveconfig.json
  • Edit the file, and identify the desired LUN – you can identify the file name/path
  • Change the size from the specified size to the desired size
  • Restart the target.service service

During the service restart all IO would fail, and client applications might get IO errors. It should be faster than the default iSCSI retransmission timeout, but this is not guaranteed. If using multipath (especially with queue_if_no_path flag) the likeness of this to affect your iSCSI clients is nearly zero. Make sure you test this on a non-production environment first, of course.

Hope it helps.

HA ZFS NFS Storage

Tuesday, January 29th, 2019

I have described in this post how to setup RHCS (Redhat Cluster Suite) for ZFS services, however – this is rather outdated, and would work with RHEL/Centos version 6, but not version 7. RHEL/Centos 7 use Pacemaker as a cluster infrastructure, and it behaves, and configures, entirely differently.

This is something I’ve done several times, however, in this particular case, I wanted to see if there was a more “common” way of doing this task, if there was a path already there, or did I need to create my own agents, much like I’ve done before for RHCS 6, in the post mentioned above. The quick answer is that this has been done, and I’ve found some very good documentation here, so I need to thank Edmund White and his wiki.

I was required to perform several changes, though, because I wanted to use IPMI as the fencing mechanism before using SCSI reservation (which I trust less), and because my hardware was different, without multipathing enabled (single path, so there was no point in adding complexity for no apparent reason).

The hardware I’m using in this case is SuperMicro SBB, with 15x 3.5″ shared disks (for our model), and with some small internal storage, which we will ignore, except for placing the Linux OS on.

For now, I will only give a high-level view of the procedure. Edmund gave a wonderful explanation, and my modifications were minor, at best. So – this is a fast-paced procedure of installing everything, from a thin minimal Centos 7 system to a running cluster. The main changes between Edmund version and mine is as follows:

  • I used /etc/zfs/vdev_id.conf and not multipathing for disk names aliases (used names with the disk slot number. Makes it easier for me later on)
  • I have disabled SElinux. It is not required here, and would only increase complexity.
  • I have used Stonith levels – a method of creating fencing hierarchy, where you attempt to use a single (or multiple) fencing method(s) before going for the next level. A good example would be to power fence, by disabling two APU sockets (both must be disconnected in parallel, or else the target server would remain on), and if it failed, then move to SCSI fencing. In my case, I’ve used IPMI fencing as the first layer, and SCSI fencing as the 2nd.
  • This was created as a cluster for XenServer. While XenServer supports both NFSv3 and NFSv4, it appears that the NFSD for version 4 does not remove file handles immediately when performing ‘unexport’ operation. This prevents the cluster from failing over, and results in a node reset and bad things happening. So, prevented the system from exporting NFSv4 at all.
  • The ZFS agent recommended by Edmund has two bugs I’ve noticed, and fixed. You can get my version here – which is a pull request on the suggested-by-Edmund version.

Here is the list:

yum groupinstall “high availability”
yum install epel-release
# Edit ZFS to use dkms, and then
yum install kernel-devel zfs
Download ZFS agent
wget -O /usr/lib/ocf/resource.d/heartbeat/ZFS
chmod +x /usr/lib/ocf/resource.d/heartbeat/ZFS
systemctl disable firewalld
systemctl stop firewalld
systemctl disable NetworkManager
systemctl stop NetworkManager
# disable SELinux -> Edit /etc/selinux/config
systemctl enable corosync
systemctl enable pacemaker
yum install kernel-devel zfs
systemctl enable pcsd
systemctl start pcsd
# edit /etc/zfs/vdev_id.conf -> Setup device aliases
zpool create storage -o ashift=12 -o autoexpand=on -o autoreplace=on -o cachefile=none mirror d03 d04 mirror d05 d06 mirror d07 d08 mirror d09 d10 mirror d11 d12 mirror d13 d14 spare d15 cache s02
zfs set compression=lz4 storage
zfs set atime=off storage
zfs set acltype=posixacl storage
zfs set xattr=sa storage

# edit /etc/sysconfig/nfs and add to RPCNFSDARGS “-N 4.1 -N 4”
systemctl enable nfs-server
systemctl start nfs-server
zfs create storage/vm01
zfs set [email protected]/24,async,no_root_squash,no_wdelay storage/vm01
passwd hacluster # Setup a known password
systemctl start pcsd
pcs cluster auth storagenode1 storagenode2
pcs cluster setup –start –name zfs-cluster storagenode1,storagenode1-storage storagenode2,storagenode2-storage
pcs property set no-quorum-policy=ignore
pcs stonith create storagenode1-ipmi fence_ipmilan ipaddr=”storagenode1-ipmi” lanplus=”1″ passwd=”ipmiPassword” login=”cluster” pcmk_host_list=”storagenode1″
pcs stonith create storagenode2-ipmi fence_ipmilan ipaddr=”storagenode2-ipmi” lanplus=”1″ passwd=”ipmiPassword” login=”cluster” pcmk_host_list=”storagenode2″
pcs stonith create fence-scsi fence_scsi pcmk_monitor_action=”metadata” pcmk_host_list=”storagenode1,storagenode2″ devices=”/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg,/dev/sdh,/dev/sdi,/dev/sdj,/dev/sdk,/dev/sdl,/dev/sdm,/dev/sdn,/dev/sdo,/dev/sdp” meta provides=unfencing
pcs stonith level add 1 storagenode1 storagenode1-ipmi
pcs stonith level add 1 storagenode2 storagenode2-ipmi
pcs stonith level add 2 storagenode1 fence-scsi
pcs stonith level add 2 storagenode2 fence-scsi
pcs resource defaults resource-stickiness=100
pcs resource create storage ZFS pool=”storage” op start timeout=”90″ op stop timeout=”90″ –group=group-storage
pcs resource create storage-ip IPaddr2 ip= cidr_netmask=24 –group group-storage

# It might be required to unfence SCSI disks, so this is how:
fence_scsi -d /dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg,/dev/sdh,/dev/sdi,/dev/sdj,/dev/sdk,/dev/sdl,/dev/sdm,/dev/sdn,/dev/sdo,/dev/sdp -n storagenode1 -o on
# Checking if the node has reservation on disks – to know if we need to unfence
sg_persist –in –report-capabilities -v /dev/sdc