HA ZFS NFS Storage

I have described in this post how to setup RHCS (Redhat Cluster Suite) for ZFS services, however – this is rather outdated, and would work with RHEL/Centos version 6, but not version 7. RHEL/Centos 7 use Pacemaker as a cluster infrastructure, and it behaves, and configures, entirely differently.

This is something I’ve done several times, however, in this particular case, I wanted to see if there was a more “common” way of doing this task, if there was a path already there, or did I need to create my own agents, much like I’ve done before for RHCS 6, in the post mentioned above. The quick answer is that this has been done, and I’ve found some very good documentation here, so I need to thank Edmund White and his wiki.

I was required to perform several changes, though, because I wanted to use IPMI as the fencing mechanism before using SCSI reservation (which I trust less), and because my hardware was different, without multipathing enabled (single path, so there was no point in adding complexity for no apparent reason).

The hardware I’m using in this case is SuperMicro SBB, with 15x 3.5″ shared disks (for our model), and with some small internal storage, which we will ignore, except for placing the Linux OS on.

For now, I will only give a high-level view of the procedure. Edmund gave a wonderful explanation, and my modifications were minor, at best. So – this is a fast-paced procedure of installing everything, from a thin minimal Centos 7 system to a running cluster. The main changes between Edmund version and mine is as follows:

  • I used /etc/zfs/vdev_id.conf and not multipathing for disk names aliases (used names with the disk slot number. Makes it easier for me later on)
  • I have disabled SElinux. It is not required here, and would only increase complexity.
  • I have used Stonith levels – a method of creating fencing hierarchy, where you attempt to use a single (or multiple) fencing method(s) before going for the next level. A good example would be to power fence, by disabling two APU sockets (both must be disconnected in parallel, or else the target server would remain on), and if it failed, then move to SCSI fencing. In my case, I’ve used IPMI fencing as the first layer, and SCSI fencing as the 2nd.
  • This was created as a cluster for XenServer. While XenServer supports both NFSv3 and NFSv4, it appears that the NFSD for version 4 does not remove file handles immediately when performing ‘unexport’ operation. This prevents the cluster from failing over, and results in a node reset and bad things happening. So, prevented the system from exporting NFSv4 at all.
  • The ZFS agent recommended by Edmund has two bugs I’ve noticed, and fixed. You can get my version here – which is a pull request on the suggested-by-Edmund version.

Here is the list:

yum groupinstall “high availability”
yum install epel-release
# Edit ZFS to use dkms, and then
yum install kernel-devel zfs
Download ZFS agent
wget -O /usr/lib/ocf/resource.d/heartbeat/ZFS https://raw.githubusercontent.com/skiselkov/stmf-ha/e74e20bf8432dcc6bc31031d9136cf50e09e6daa/heartbeat/ZFS
chmod +x /usr/lib/ocf/resource.d/heartbeat/ZFS
systemctl disable firewalld
systemctl stop firewalld
systemctl disable NetworkManager
systemctl stop NetworkManager
# disable SELinux -> Edit /etc/selinux/config
systemctl enable corosync
systemctl enable pacemaker
yum install kernel-devel zfs
systemctl enable pcsd
systemctl start pcsd
# edit /etc/zfs/vdev_id.conf -> Setup device aliases
zpool create storage -o ashift=12 -o autoexpand=on -o autoreplace=on -o cachefile=none mirror d03 d04 mirror d05 d06 mirror d07 d08 mirror d09 d10 mirror d11 d12 mirror d13 d14 spare d15 cache s02
zfs set compression=lz4 storage
zfs set atime=off storage
zfs set acltype=posixacl storage
zfs set xattr=sa storage

# edit /etc/sysconfig/nfs and add to RPCNFSDARGS “-N 4.1 -N 4”
systemctl enable nfs-server
systemctl start nfs-server
zfs create storage/vm01
zfs set [email protected]/24,async,no_root_squash,no_wdelay storage/vm01
passwd hacluster # Setup a known password
systemctl start pcsd
pcs cluster auth storagenode1 storagenode2
pcs cluster setup –start –name zfs-cluster storagenode1,storagenode1-storage storagenode2,storagenode2-storage
pcs property set no-quorum-policy=ignore
pcs stonith create storagenode1-ipmi fence_ipmilan ipaddr=”storagenode1-ipmi” lanplus=”1″ passwd=”ipmiPassword” login=”cluster” pcmk_host_list=”storagenode1″
pcs stonith create storagenode2-ipmi fence_ipmilan ipaddr=”storagenode2-ipmi” lanplus=”1″ passwd=”ipmiPassword” login=”cluster” pcmk_host_list=”storagenode2″
pcs stonith create fence-scsi fence_scsi pcmk_monitor_action=”metadata” pcmk_host_list=”storagenode1,storagenode2″ devices=”/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg,/dev/sdh,/dev/sdi,/dev/sdj,/dev/sdk,/dev/sdl,/dev/sdm,/dev/sdn,/dev/sdo,/dev/sdp” meta provides=unfencing
pcs stonith level add 1 storagenode1 storagenode1-ipmi
pcs stonith level add 1 storagenode2 storagenode2-ipmi
pcs stonith level add 2 storagenode1 fence-scsi
pcs stonith level add 2 storagenode2 fence-scsi
pcs resource defaults resource-stickiness=100
pcs resource create storage ZFS pool=”storage” op start timeout=”90″ op stop timeout=”90″ –group=group-storage
pcs resource create storage-ip IPaddr2 ip=1.1.1.7 cidr_netmask=24 –group group-storage

# It might be required to unfence SCSI disks, so this is how:
fence_scsi -d /dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg,/dev/sdh,/dev/sdi,/dev/sdj,/dev/sdk,/dev/sdl,/dev/sdm,/dev/sdn,/dev/sdo,/dev/sdp -n storagenode1 -o on
# Checking if the node has reservation on disks – to know if we need to unfence
sg_persist –in –report-capabilities -v /dev/sdc

Oracle ACFS autostart on Oracle RAC stand alone (Oracle Restart)

I would like to start with a declaration – I would prefer not to use ACFS for a stand-alone system. It binds the “normal” order of startup and mounts to the cluster. Not only that – but while until version 12.1, RAC stand alone had a built-in service for ACFS, this is no longer the case for 12.2 and above. This resource/service exists only for a two (and above) node clusters.

If you have upgraded from a 12.1 (or 11.2) stand alone RAC to 12.2 or above, you will no longer be able to automatically mount your ACFS disks. This (and some minor bugs I’ve found with ACFS) is part of the reason I would recommend against using ACFS for stand alone system. HOWEVER – there are cases where you have no choice – either because you are using ACFS replication/snapshots, or because you are using Oracle ASM redundancy model (using either “normal” or “high” redundancy) over a JBoD – which forces you to use ADVM and with it – ACFS is only a small addition.

As I’ve written before – ACFS won’t auto start on 12.2 stand alone GI. A possible solution I thought of (but did not apply, and thus – cannot show it here) is to use a method of creating a 3rd party application service (as described in a document called “TWP-Oracle-Clusterware-3rd-party” to implement a custom service which will actually mount your ACFS for you, when the cluster is ready to do so. I would have done it like that in a recent project, however, a nice person called Pierre has done it for me, slightly differently – he used a systemd services to run custom scripts which attempted to run in loop until the cluster was ready to perform the required actions. I have tested it, and it works well. My only comment about it, which you will be able to see in his blog post, was that if your ORACLE_HOME resides on a dedicated mount point (which is my case, usually), you should force your systemd unit to require this mount as well as its prerequisites. Other than that – his solution worked well, and I thank him for his time and efforts. Kudos Pierre!

USB Auto mapping to Windows VM under KVM does not work

Let me first say, that it does work for Linux guest. It doesn’t work on Windows guest because there is a know bug (/issue) with the default hardware layout – made of i440FX BIOS. VirtManager would not allow us to replace the settings, so we need to create the VM ourselves using XML. You can export your XML settings (of an existing VM) using the command

virsh dumpxml  > /tmp/VM_NAME.xml

There are relevant fields there which you might want to save for later, like MAC addresses, network settings, and so on.

You can use this XML file to build your VM anew. Note that you will want to modify the network settings, the name and the UUID. Also – you will need a newer QEMU command (through the package qemu-system-x86), you can find in the Centos updates repository, . It has been providing me with /usr/bin/qemu-system-x86_64 command, which I am using, instead of the default qemu command used by default by VirtManager.

My Windows VM XML file (as a reference you can copy and use) is provided below. Major modifications are required to the hardware settings of the Windows VM – moving from PCI to PCIE, changing from IDE to SATA or VirtIO – and the provided XML gives a good reference of how this file should look like. This was taken from a machine tested to allow USB hot-add/remove via the method provided in my previous post.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
<domain type='kvm' id='1'>
  <name>Windows</name>                             <!-- Change the name to match your settings -->
  <uuid>9a10dc43-5c39-411d-8dc9-f6c6b849d212</uuid> <!-- Make sure you change the UUID. Dont have duplicates -->
  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <vcpu placement='static'>2</vcpu>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-q35-2.0'>hvm</type> <!-- This is the key part - using different machine type -->
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
    </hyperv>
  </features>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
    <timer name='hypervclock' present='yes'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator> <!-- This is important! -->
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/images/Windows-test1.qcow2'/>  <!-- Point to the right file. It can be raw or qcow2, so change the line above accordingly -->
      <backingStore/>
      <target dev='sda' bus='sata'/> <!-- I have used SATA and not VirtIO, because the later requires drivers on Windows. I will need to handle that in the future, but for our example, it should work -->
      <alias name='sata0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <backingStore/>
      <target dev='sdb' bus='sata'/>
      <readonly/>
      <boot order='1'/>
      <alias name='sata0-0-1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x05' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <alias name='usb'/>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x05' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <alias name='usb'/>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x05' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <alias name='usb'/>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x05' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'>
      <alias name='pcie.0'/>
    </controller>
    <controller type='pci' index='1' model='dmi-to-pci-bridge'>
      <model name='i82801b11-bridge'/>
      <alias name='pci.1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1e' function='0x0'/>
    </controller>
    <controller type='pci' index='2' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='2'/>
      <alias name='pci.2'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='ioh3420'/>
      <target chassis='3' port='0x10'/>
      <alias name='pci.3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x06' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <interface type='bridge'>
      <mac address='11:12:14:12:a1:03'/>  <!-- Change the MAC address! -->
      <source bridge='bridge'/>
      <model type='rtl8139'/>    <!-- Also wanted VirtIO, but needs drivers -->
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/10'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/10'>
      <source path='/dev/pts/10'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'>
      <alias name='input1'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input2'/>
    </input>
    <graphics type='spice' port='5900' autoport='yes' listen='127.0.0.1'>
      <listen type='address' address='127.0.0.1'/>
      <image compression='off'/>
    </graphics>
    <sound model='ich6'>
      <alias name='sound0'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x04' function='0x0'/>
    </sound>
    <video>
      <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </video>
    <redirdev bus='usb' type='spicevmc'>
      <alias name='redir0'/>
      <address type='usb' bus='0' port='2'/>
    </redirdev>
    <redirdev bus='usb' type='spicevmc'>
      <alias name='redir1'/>
      <address type='usb' bus='0' port='3'/>
    </redirdev>
    <memballoon model='virtio'>
      <stats period='5'/>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x07' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='none' model='none'/>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+107:+107</label>
    <imagelabel>+107:+107</imagelabel>
  </seclabel>
</domain>

Now just import the VM using

virsh define /tmp/NEW_MACHINE.xml

and your changes are available via VirtManager, where you can edit (some of them).

Auto mapping USB Disk on Key to KVM VM using libvirt and udev

I was required to auto map a USB DoK to a KVM VM (specific VM, mind you!), as a result of connecting this device to the host. I’ve looked it up on the Internet, and the closest I could get there was this link. It was almost a complete solution, but it had a few bugs, so I will re-describe the whole process, with the fixes I’ve added to the process and udev rules file. While this guide is rather old, it did solve my requirement, which was to map a specific set of devices (“known USB devices”) to the VM, and not any and every USB device (or even – USB DoK) connected to the system.

In my example, I’ve used SanDisk Corp. Ultra Fit, which its USB identifier is 0781:5583, as can be seen using ‘lsusb’ command:

[[email protected] ~]# lsusb
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 020: ID 0781:5583 SanDisk Corp. Ultra Fit

My VM is called “centos7.0” in this example. I am using integrated KVM+QEMU+LIBVIRT on a generic CentOS 7.5 system.

Preparation

You will need to prepare two files:

  • USB definitions file (for easier config of libvirt)
  • UDEV rules file (which will be triggered by add/remove operation, and will call the USB definitions file)
USB Definitions file

I’ve placed it in /opt/autousb/hostdev-0781:5583.xml , and it holds the following (mind the USB device identifiers!)

1
2
3
4
5
6
<hostdev mode='subsystem' type='usb'>
  <source>
    <vendor id='0x0781'/>
    <product id='0x5583'/>
  </source>
</hostdev>

I’ve created a file /etc/udev/rules.d/90-libvirt-usb.rules with the content below. Note that the device identifiers are there, but in the “remove” section they appear differently. Remove leading zero(s) and change the string. This is caused because on removal, the device does not report all its properties to the OS. Also – you cannot connect more than three (3) such devices to a VM, so when you fail to detach three devices (following a consecutive insert/remove operations, for example), you will not be able to attach a fourth time.

ACTION=="add", \
    SUBSYSTEM=="usb", \
    ENV{ID_VENDOR_ID}=="0781", \
    ENV{ID_MODEL_ID}=="5583", \
    RUN+="/usr/bin/virsh attach-device centos7.0 /opt/autousb/hostdev-0781:5583.xml"
ACTION=="remove", \
    SUBSYSTEM=="usb", \
    ENV{PRODUCT}=="781/5583/100", \
    RUN+="/usr/bin/virsh detach-device centos7.0 /opt/autousb/config/hostdev-0781:5583.xml"

Now, all that’s left to do is to reload udev using the following command:

udevadm trigger

To monitor the system behaviour, run either of these commands:

udevadm monitor --property --udev 

or

udevadm monitor --environment --udev 

Oracle 12.2 Grid Infrastructure installation tips

There are many sites explaining how to install Oracle GI 12.2, however, there are some special tricks which can simplify GI installation.

For once, when installing GI and then installing the huge PatchSet (which is usually around 1.4GB in size) – it takes time. A lot of time. A simple but not very well documented trick is to run the installer with a specific flag, pointing to the extracted PSU. You can obtain the PSU from Oracle document ID 2118136.2 (Oracle support plan required).

After extracting the contents of the oracle grid home to the destination directory, and after extracting the contents of the PSU to a known (other) location, run from the oracle grid home directory the following command:

./gridSetup.sh -applyPSU /path/to/PSU

(case sensitive). You will need a working $DISPLAY because following the patch apply phase, an installation window will pop up.

That said, make sure you remove (rpm -e –nodeps stix-fonts) if you are on RHEL/OEL/Centos version newer than 7.4. These fonts will prevent the GUI installer from starting (java would crash) and will cause great frustration. You can later on restore this package if you feel the urge.

Additional trick I’ve seen, but yet to try, is how to run the GI installer unattended. This can be done like this:

./gridSetup.sh -silent -responseFile /path/to/response/file.rsp
< run root.sh as directed >
./gridSetup.sh -executeConfigTools -all -silent -responseFile /path/to/response/file.rsp

Hope this helps.