Archive for the ‘Disk Storage’ Category

Ubuntu 18.04 and TPM2 encrypted system disk

Monday, May 27th, 2019

When you encrypt your entire disk, you are required to enter your passphrase every time you boot your computer. The TPM device has a purpose – keeping your secrets secure (available only to your running system). Combined with SecureBoot, which prevents any unknown kernel/disk from booting, and with a BIOS password, you should be fully protected against data theft, even when an attacker has physical access to your computer in the comfort of her home.

It is easier said than done. For the passphrase to work, you need to make sure your initramfs (the initial RAM disk) has the means to extract the passphrase from the TPM, and hand it to the LUKS (cryptsetup) unlocking mechanism.

I am assuming you are installing an Ubuntu 18 system (tested on 18.04) from scratch, that you have a TPM2 device (Dell Latitude 7490, in my case), and that you know your way around Linux a bit. I will not dwell on explaining how to edit a file in a restricted location and so on. Also – these steps do not include hardening of your BIOS settings – passwords and the like.

Install an Ubuntu 18.04 System

Install an Ubuntu system, and make sure to select ‘encrypt disk’. We aim at full disk encryption. The installer might ask if it should enable SecureBoot – it should, so let it do so. Make sure you remember the passphrase you’ve used in the disk encryption process. We will generate a more complex one later on, but this should do for now. Reboot the system, enter the passphrase, and let’s get to work.

Make sure TPM2 works

The TPM2 device is /dev/tpm0, in most cases. I did not go through the TPM resource manager, because it felt like overkill for this task. However, you will need to install (using ‘apt’) the tpm2-tools package. Use this opportunity to install ‘perl’ as well.

You can, following that, check that your TPM is working by running the command:

sudo tpm2_nvdefine -x 0x1500016 -a 0x40000001 -s 64 -t 0x2000A -T device

This command will define a 64-byte space at address 0x1500016, and will not go through the resource manager, but directly to the device.
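
If you want a quick sanity check that the NV area was defined, the tpm2-tools version shipped with Ubuntu 18.04 includes a listing command for NV indices (the tool name differs between tpm2-tools versions, so adjust if yours does not have it):

sudo tpm2_nvlist -T device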

Generate secret passphrase

To generate your 64-character secret passphrase, you can use the following command (taken from here):

cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 64 | head -n 1 > root.key

Add this key to the LUKS disk

To add this passphrase, you will need to identify the device. Take a look at /etc/crypttab (we will edit it later) and identify the first field – the label – which relates to the underlying device. Since you have to be in EFI mode, it is most likely /dev/sda3. Note that you will be required to enter the current (and known) passphrase. You can later (when everything’s working fine) remove this passphrase.

sudo cryptsetup luksAddKey /dev/sda3 root.key
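
If you want to verify that the key was added, you can dump the LUKS header and confirm that an additional key slot is now marked as enabled:

sudo cryptsetup luksDump /dev/sda3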

Save the key inside your TPM slot

Now, we have an additional passphrase, which we will save with the TPM device. Run the following command:

sudo tpm2_nvwrite -x 0x1500016 -a 0x40000001 -f root.key -T device

This will save the key to the slot we have defined earlier. If the command succeeded, remove the root.key file from your system, to prevent easy access to the decryption key.

Create a key recovery script

To read the key from the TPM device, we will need to run a script. The following script will do the trick. Save it to /usr/local/sbin/key and give it execution permissions for root (I used 750, which worked well and did not cause errors later on).

#!/bin/sh
# Read our 64-byte secret back from the TPM NV area defined earlier (output is a hex dump)
key=$(tpm2_nvread -x 0x1500016 -a 0x40000001 -s 64 -o 0 -T device | tail -n 1)
# Strip the spaces between the hex pairs
key=$(echo $key | tr -d ' ')
# Convert the hex pairs back into the original characters
key=$(echo $key | /usr/bin/perl -ne 's/([0-9a-f]{2})/print chr hex $1/gie')
# Print the key without a trailing newline - this is what cryptsetup will receive
printf '%s' "$key"

Run the script, and you should get your key printed back, directly from the TPM device.

Adding required commands to initramfs

For the system to be capable of running the script at boot time, the initramfs needs to contain the required commands, along with their libraries and so on. For that, we will use a hook file called /etc/initramfs-tools/hooks/decryptkey

#!/bin/sh
PREREQ=""
prereqs()
{
    echo "$PREREQ"
}
case $1 in
prereqs)
    prereqs
    exit 0
    ;;
esac
. /usr/share/initramfs-tools/hook-functions
# Copy the binaries (and the libraries they depend on) into the initramfs
copy_exec /usr/sbin/tpm2_nvread
copy_exec /usr/bin/perl
exit 0

The file should have 755 permissions, and be owned by root:root.

Crypttab

To combine all of this together, we will need to edit /etc/crypttab and modify the options (fourth) field of our relevant LUKS entry by appending the directive 'keyscript=/usr/local/sbin/key'

My file looks like this:

sda3_crypt UUID=d4a5a9a4-a2da-4c2e-a24c-1c1f764a66d2 none luks,discard,keyscript=/usr/local/sbin/key

Of course – make sure to save a copy of this file before changing it.

Combining everything together

We should create a new initramfs file; however, we should keep the existing one for fault handling, if it is ever required. I suggest you run the following command before anything else, to keep the original initramfs file:

sudo cp /boot/initrd.img-`uname -r` /boot/initrd.img-`uname -r`.orig

If we are required to use the regular mechanism, we can edit our GRUB entry, change the initrd line, and append ‘.orig’ to its name.

Now comes the time to create the new initramfs and make ourselves ready for the big change. Do not do it if the TPM read script failed! If it did, your system will not boot using this initramfs, and you will have to use the alternate (.orig) one. You should continue only if your whole work so far has been error-free. Be warned!

sudo mkinitramfs -o /boot/initrd.img-`uname -r` `uname -r`
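
Before rebooting, it is worth checking that the hook actually copied the required binaries into the new initramfs; ‘lsinitramfs’ (part of initramfs-tools) should list them:

sudo lsinitramfs /boot/initrd.img-`uname -r` | grep -E 'tpm2_nvread|perl'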

You are now ready to reboot and to test your new automated key pull.

Troubleshooting

Things might not work well. There are a few methods to debug.

Boot the system with the original initramfs

If you discover your system cannot boot, and you want to boot your system as it were, use your original initramfs file, by changing GRUB during the initial menu (you might need to press the ‘Esc’ key at a critical point). Change initrd.img-<your kernel here> to initrd.img-<your kernel here>.orig and boot the system. It should prompt for the passphrase, where you can enter the passphrase used during installation.

Debug the boot process

By editing your GRUB menu and appending the directive ‘debug=vc’ (without the quotes) to the kernel line, as well as removing the ‘quiet’, ‘splash’ and ‘$vt_handoff’ directives, you will be able to view the boot process on-screen. It can help a lot.

Stop the boot process at a certain stage

The initramfs goes through a series of “steps”, and you can stop it wherever you want. A reasonable “step” to stop at is right before the ‘premount’ stage, which should be just right for you. Add the directive ‘break’ (without the quotes) to the kernel line in the GRUB menu, and remove the excessive directives as mentioned above. This flag can work together with the ‘debug’ flag mentioned above, and might give you a lot of insight into your problems. You can read more here about initramfs and how to debug it.
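
For illustration only (keep whatever parameters your entry already has; the kernel and root device below are placeholders), the edited kernel line in the GRUB editor could end up looking roughly like this:

linux /vmlinuz-<your kernel here> root=<your root device> ro debug=vc break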

Conclusion

These directives should work well for you. TPM2 enabled, SecureBoot enabled, fresh Ubuntu installation – you should be just fine. Please let me know how it works for you, and I hope it helps!

HA ZFS NFS Storage

Tuesday, January 29th, 2019

I have described in this post how to set up RHCS (RedHat Cluster Suite) for ZFS services; however, that post is rather outdated, and applies to RHEL/CentOS version 6, but not version 7. RHEL/CentOS 7 uses Pacemaker as its cluster infrastructure, and it behaves, and is configured, entirely differently.

This is something I’ve done several times; however, in this particular case, I wanted to see if there was a more “common” way of doing this task – whether a path already existed, or whether I needed to create my own agents, much like I did before for RHCS 6, in the post mentioned above. The quick answer is that this has been done, and I’ve found some very good documentation here, so I need to thank Edmund White and his wiki.

I was required to perform several changes, though, because I wanted to use IPMI as the fencing mechanism before using SCSI reservation (which I trust less), and because my hardware was different, without multipathing enabled (single path, so there was no point in adding complexity for no apparent reason).

The hardware I’m using in this case is SuperMicro SBB, with 15x 3.5″ shared disks (for our model), and with some small internal storage, which we will ignore, except for placing the Linux OS on.

For now, I will only give a high-level view of the procedure. Edmund gave a wonderful explanation, and my modifications were minor, at best. So – this is a fast-paced procedure for installing everything, from a thin minimal CentOS 7 system to a running cluster. The main changes between Edmund’s version and mine are as follows:

  • I used /etc/zfs/vdev_id.conf, and not multipathing, for disk name aliases (I used names based on the disk slot number, which makes it easier for me later on; see the short example after this list)
  • I have disabled SELinux. It is not required here, and would only increase complexity.
  • I have used Stonith levels – a method of creating a fencing hierarchy, where you attempt a single (or multiple) fencing method(s) before moving to the next level. A good example would be power fencing, by disabling two PDU sockets (both must be disconnected in parallel, or else the target server would remain on), and if that failed, moving on to SCSI fencing. In my case, I’ve used IPMI fencing as the first layer, and SCSI fencing as the second.
  • This was created as a cluster for XenServer. While XenServer supports both NFSv3 and NFSv4, it appears that the NFSv4 server does not remove file handles immediately when performing the ‘unexport’ operation. This prevents the cluster from failing over, and results in a node reset and bad things happening. So I prevented the system from exporting NFSv4 at all.
  • The ZFS agent recommended by Edmund has two bugs I’ve noticed, and fixed. You can get my version here – it is a pull request on the version suggested by Edmund.
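
For illustration, slot-based aliases in /etc/zfs/vdev_id.conf are defined with simple ‘alias’ lines; the device links below are placeholders – use the by-path/by-id names of your own enclosure:

# /etc/zfs/vdev_id.conf - example only, device links are placeholders
alias d03 /dev/disk/by-path/pci-0000:02:00.0-sas-phy2-lun-0
alias d04 /dev/disk/by-path/pci-0000:02:00.0-sas-phy3-lun-0
alias s02 /dev/disk/by-id/wwn-0x5000c50012345678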

Here is the list:

yum groupinstall "High Availability"
yum install epel-release
# Edit ZFS to use dkms, and then
yum install kernel-devel zfs
# Download the ZFS resource agent
wget -O /usr/lib/ocf/resource.d/heartbeat/ZFS https://raw.githubusercontent.com/skiselkov/stmf-ha/e74e20bf8432dcc6bc31031d9136cf50e09e6daa/heartbeat/ZFS
chmod +x /usr/lib/ocf/resource.d/heartbeat/ZFS
systemctl disable firewalld
systemctl stop firewalld
systemctl disable NetworkManager
systemctl stop NetworkManager
# disable SELinux -> Edit /etc/selinux/config
systemctl enable corosync
systemctl enable pacemaker
systemctl enable pcsd
systemctl start pcsd
# edit /etc/zfs/vdev_id.conf -> Setup device aliases
zpool create storage -o ashift=12 -o autoexpand=on -o autoreplace=on -o cachefile=none mirror d03 d04 mirror d05 d06 mirror d07 d08 mirror d09 d10 mirror d11 d12 mirror d13 d14 spare d15 cache s02
zfs set compression=lz4 storage
zfs set atime=off storage
zfs set acltype=posixacl storage
zfs set xattr=sa storage

# edit /etc/sysconfig/nfs and add to RPCNFSDARGS "-N 4.1 -N 4"
systemctl enable nfs-server
systemctl start nfs-server
zfs create storage/vm01
zfs set sharenfs=rw=@1.1.1.0/24,async,no_root_squash,no_wdelay storage/vm01
passwd hacluster # Setup a known password
systemctl start pcsd
pcs cluster auth storagenode1 storagenode2
pcs cluster setup --start --name zfs-cluster storagenode1,storagenode1-storage storagenode2,storagenode2-storage
pcs property set no-quorum-policy=ignore
pcs stonith create storagenode1-ipmi fence_ipmilan ipaddr="storagenode1-ipmi" lanplus="1" passwd="ipmiPassword" login="cluster" pcmk_host_list="storagenode1"
pcs stonith create storagenode2-ipmi fence_ipmilan ipaddr="storagenode2-ipmi" lanplus="1" passwd="ipmiPassword" login="cluster" pcmk_host_list="storagenode2"
pcs stonith create fence-scsi fence_scsi pcmk_monitor_action="metadata" pcmk_host_list="storagenode1,storagenode2" devices="/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg,/dev/sdh,/dev/sdi,/dev/sdj,/dev/sdk,/dev/sdl,/dev/sdm,/dev/sdn,/dev/sdo,/dev/sdp" meta provides=unfencing
pcs stonith level add 1 storagenode1 storagenode1-ipmi
pcs stonith level add 1 storagenode2 storagenode2-ipmi
pcs stonith level add 2 storagenode1 fence-scsi
pcs stonith level add 2 storagenode2 fence-scsi
pcs resource defaults resource-stickiness=100
pcs resource create storage ZFS pool="storage" op start timeout="90" op stop timeout="90" --group=group-storage
pcs resource create storage-ip IPaddr2 ip=1.1.1.7 cidr_netmask=24 --group group-storage

# It might be required to unfence SCSI disks, so this is how:
fence_scsi -d /dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg,/dev/sdh,/dev/sdi,/dev/sdj,/dev/sdk,/dev/sdl,/dev/sdm,/dev/sdn,/dev/sdo,/dev/sdp -n storagenode1 -o on
# Checking if the node has reservation on disks – to know if we need to unfence
sg_persist --in --report-capabilities -v /dev/sdc
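
When both nodes are configured and the cluster is up, a quick sanity check of the resources and the fencing levels can be done with:

pcs status
pcs stonith level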

Oracle ACFS autostart on Oracle RAC stand alone (Oracle Restart)

Thursday, January 3rd, 2019

I would like to start with a declaration – I would prefer not to use ACFS for a stand-alone system. It ties the “normal” order of startup and mounting to the cluster. Not only that – while up to version 12.1 stand-alone RAC had a built-in service for ACFS, this is no longer the case for 12.2 and above. This resource/service exists only for clusters of two nodes and above.

If you have upgraded from a 12.1 (or 11.2) stand-alone RAC to 12.2 or above, you will no longer be able to automatically mount your ACFS disks. This (and some minor bugs I’ve found with ACFS) is part of the reason I would recommend against using ACFS for a stand-alone system. HOWEVER – there are cases where you have no choice – either because you are using ACFS replication/snapshots, or because you are using the Oracle ASM redundancy model (either “normal” or “high” redundancy) over a JBoD – which forces you to use ADVM, and with it, ACFS is only a small addition.

As I’ve written before – ACFS won’t auto-start on a 12.2 stand-alone GI. A possible solution I thought of (but did not apply, and thus – cannot show here) is to use the method of creating a 3rd-party application service (as described in a document called “TWP-Oracle-Clusterware-3rd-party”) to implement a custom service which will actually mount your ACFS for you, when the cluster is ready to do so. I would have done it like that in a recent project; however, a nice person called Pierre has done it for me, slightly differently – he used a systemd service to run custom scripts which attempt to run in a loop until the cluster is ready to perform the required actions. I have tested it, and it works well. My only comment about it, which you will be able to see in his blog post, was that if your ORACLE_HOME resides on a dedicated mount point (which is my case, usually), you should force your systemd unit to require this mount as well as its prerequisites. Other than that – his solution worked well, and I thank him for his time and efforts. Kudos Pierre!
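
I will not reproduce Pierre’s exact unit here, but a rough sketch of the idea (assuming ORACLE_HOME lives under a dedicated /u01 mount, that a hypothetical /usr/local/bin/mount_acfs.sh script is the one looping until the cluster answers, and that the oracle-ohasd service name matches your GI installation) could look something like this:

# /etc/systemd/system/acfs-mount.service - illustrative sketch only
[Unit]
Description=Mount ACFS filesystems once Grid Infrastructure is ready
After=network.target oracle-ohasd.service
RequiresMountsFor=/u01

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/bin/mount_acfs.sh

[Install]
WantedBy=multi-user.target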

Auto mapping USB Disk on Key to KVM VM using libvirt and udev

Monday, November 19th, 2018

I was required to auto-map a USB DoK to a KVM VM (a specific VM, mind you!) as a result of connecting this device to the host. I’ve looked it up on the Internet, and the closest I could get was this link. It was almost a complete solution, but it had a few bugs, so I will re-describe the whole process, with the fixes I’ve added to the process and the udev rules file. While that guide is rather old, it did solve my requirement, which was to map a specific set of devices (“known USB devices”) to the VM, and not any and every USB device (or even USB DoK) connected to the system.

In my example, I’ve used a SanDisk Corp. Ultra Fit, whose USB identifier is 0781:5583, as can be seen using the ‘lsusb’ command:

# lsusb
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 020: ID 0781:5583 SanDisk Corp. Ultra Fit

My VM is called “centos7.0” in this example. I am using integrated KVM+QEMU+LIBVIRT on a generic CentOS 7.5 system.

Preparation

You will need to prepare two files:

  • USB definitions file (for easier config of libvirt)
  • UDEV rules file (which will be triggered by add/remove operation, and will call the USB definitions file)
USB Definitions file

I’ve placed it in /opt/autousb/hostdev-0781:5583.xml, and it holds the following (mind the USB device identifiers!)

<hostdev mode='subsystem' type='usb'>
  <source>
    <vendor id='0x0781'/>
    <product id='0x5583'/>
  </source>
</hostdev>
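
Before wiring this to udev, you can test the definition manually, with the device plugged in and the VM running:

virsh attach-device centos7.0 /opt/autousb/hostdev-0781:5583.xml
virsh detach-device centos7.0 /opt/autousb/hostdev-0781:5583.xml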

I’ve created a file /etc/udev/rules.d/90-libvirt-usb.rules with the content below. Note that the device identifiers are there, but in the “remove” section they appear differently: the leading zero(s) are dropped and the string format changes. This is because, on removal, the device does not report all its properties to the OS. Also – you cannot connect more than three (3) such devices to a VM, so if you fail to detach three devices (following consecutive insert/remove operations, for example), you will not be able to attach a fourth one.

ACTION=="add", \
    SUBSYSTEM=="usb", \
    ENV{ID_VENDOR_ID}=="0781", \
    ENV{ID_MODEL_ID}=="5583", \
    RUN+="/usr/bin/virsh attach-device centos7.0 /opt/autousb/hostdev-0781:5583.xml"
ACTION=="remove", \
    SUBSYSTEM=="usb", \
    ENV{PRODUCT}=="781/5583/100", \
    RUN+="/usr/bin/virsh detach-device centos7.0 /opt/autousb/config/hostdev-0781:5583.xml"

Now, all that’s left to do is to have udev pick up and apply the new rules, using the following command:

udevadm trigger

To monitor the system behaviour, run either of these commands:

udevadm monitor --property --udev 

or

udevadm monitor --environment --udev 

Multiple iSCSI interface on the same subnet

Friday, April 14th, 2017

As far as routing goes, it is a very bad idea to place multiple network interfaces with IPs of a single network (subnet). The routing table, which decides which interface to route the data through, is read line by line, thus all traffic goes through a single interface of the batch (of the interfaces “living” on the same network). A common way of working around it is by using network teaming (in Linux – bonding) to handle round-robin or active/passive use of multiple interfaces on a single network; the interfaces are then “bonded” into a single logical interface, which has no routing issues whatsoever.

Having multiple network interfaces with IPs of the same network does not help with throughput, but does it increase redundancy? The simple answer is “no” – it does not. In Linux, when an interface is disconnected, it does not go down automatically (unlike in Windows, for example), meaning that the routing table does not get updated to use the remaining interfaces. The traffic would still be targeted at that “dead” interface. Moreover – even if Linux did do that, it would only solve a single type of problem. What if someone moved one of the switch ports (out of the two or more in use) to a different VLAN? The link remains up, but communication doesn’t get anywhere.

These problems are more noticeable when that single network is an iSCSI network: the system needs multipath access to the iSCSI storage, and redundancy becomes a real issue, because an iSCSI network is commonly a critical network.

It is common to connect multiple iSCSI networks, and not multiple ports on a single network, maintaining fabric-like separation of networks and devices where possible. However – this article points at what to do if the network layout is not under your control and you need to place multiple network interfaces on the same physical network.

As mentioned before, the routing table will make sure that all your communication to that particular network goes through the first routed network interface. However – the iSCSI daemon (open-iscsi) has the capability to bind to specific interfaces. To do so, you need to define an interface in iSCSI. Do so by running the command:

iscsiadm -m iface -I NIC_NAME --op=new

The name is user defined. I recommend using the same names used by the OS, like ‘eth5’ or ‘ib0’ or ‘eno1’ – to keep it simple. This is a declarative definition only. To actually bind the ‘iface’ to the real interface device, you need to run the following command:

iscsiadm -m iface -I NIC_NAME --op=update -n iface.hwaddress -v 00:AA:BB:CC:DD:EE

Use the real interface MAC address. Then you can discover and log in to targets with the defined interfaces only:

iscsiadm -m discovery -t st -p TARGET_IP -I NIC1 -I NIC2

iscsiadm -m node -L all -I NIC1 -I NIC2
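
To confirm that each session actually went through the interface you intended, you can print the session details (print level 1 includes the bound iface name):

iscsiadm -m session -P 1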

According to the documentation, this will require setting the sysctl net.ipv4.conf.default.rp_filter to 0.
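
A minimal example of doing that (and keeping it across reboots), assuming you apply it system-wide via /etc/sysctl.conf:

sysctl -w net.ipv4.conf.default.rp_filter=0
echo "net.ipv4.conf.default.rp_filter = 0" >> /etc/sysctl.conf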

This should do the trick, and multipathing should now show the right number of paths.