Enable blk_mq on Redhat (OEL/Centos) 7 and 8

The IO scheduler blk_mq was designed to increase disk performance – better IOps and lower latency – on flash disk systems, with emphasis on NVME devices. A nice by-product, is that it increases performance on the “older” SCSI/SAS/SATA layer as well, when the disks in the back are SSD, and even when they are rotational disks, in some cases.

When attempting to increase system disk performance (where and when this is the bottleneck, such as in the case of databases), this is a valid and relevant configuration option. If you have a baseline performance metrics to compare to – great ; But even if you do not have other, previous performance details – in most cases – for a modern system – your performance will improve after this change.

NVME devices get this scheduler enabled by default, so no worry there. However, your system might not be aware of the backend storage type of devices, nor of the backend capabilities when dealing with SCSI subsystem, so – by default – these settings are not enabled.

To enable these settings on RHEL/OEL/Centos version 7 and 8, you should do the following:

Edit /etc/default/grub and append to your GRUB_CMDLINE_LINUX the following string:

scsi_mod.use_blk_mq=1 dm_mod.use_blk_mq=y

This will enable blk_mq both on the SCSI subsystem, and the device mapper (DM), which includes software RAID, LVM and device-mapper-multipath (aka – multipath).

Following this, run:

grub2-mkconfig -o /boot/grub2/grub.cfg  # for legacy BIOS

or

grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg  # for EFI

Of course – a reboot is required for these changes to take effect. Following the reboot, you will be able to see that the contents of /sys/block/sda/queue/scheduler (for /dev/sda, in my example. Check your relevant disks, of course) shows “[mq-deadline]”

If you want to read more about blk_mq – you can find it in great detail in this blog post.

Reduce traverse time of large directory trees on Linux

Every Linux admin is familiar with the long time running through a large directory tree (with hundred of thousands of files and more) can take. Most are aware that if you re-run the same run-through, it will be shorter.

This is caused by a short-valid filesystem cache, where the memory is allocated to other tasks, or the metadata required cache exceeds the available for this task.

If the system is focused on files, meaning that its prime task is holding files (like NFS server, for example) and the memory is largely available, a certain tunable can reduce recurring directory dives (like the ‘find’ or ‘rsync’ commands, which run huge amounts of attribute queries):

sysctl vm.vfs_cache_pressure=10

The default value is 100. Lower values will cause the system to prefer keeping this cache. A quote from kernel’s memory tunables page:

vfs_cache_pressure
——————————–
This percentage value controls the tendency of the kernel to reclaim the memory which is used for caching of directory and inode objects.

At the default value of vfs_cache_pressure=100 the kernel will attempt to reclaim dentries and inodes at a “fair” rate with respect to pagecache and swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer to retain dentry and inode caches. When vfs_cache_pressure=0, the kernel will never reclaim dentries and inodes due to memory pressure and this can easily lead to out-of-memory conditions. Increasing vfs_cache_pressure beyond 100 causes the kernel to prefer to reclaim dentries and inodes.

Increasing vfs_cache_pressure significantly beyond 100 may have negative performance impact. Reclaim code needs to take various locks to find freeable directory and inode objects. With vfs_cache_pressure=1000, it will look for ten times more freeable objects than there are.

Installation boot of RHEL8 (network settings)

This blog is my extended memory, and as such, its task is to remind me things I tend to forget, saving me the time required to search them again. So here is another one of these things.

The network settings syntax for RHEL8/OEL8 or any of their compatible systems, when you want to pass these to Anaconda, as can be found here, are

inst.ks=http://url-to-ks.com ip=10.10.10.2::10.10.10.254:255.255.255.0:testsrv1:em1:none dns=10.10.10.253

These network settings works for static IP addresses, and would be constructed by these arguments:

ip=|IP address|::|gateway|:|netmask|:|hostname|:|interface|:|bootproto|

I find this syntax confusing, and so – I’ve kept it here to help me remember it.

Hope it helps.

Selective dnsmasq logging (split dnsmasq logging)

My system provides DNS services, using dnsmasq, to several different subnets. I wish to log specific queries to different files – as I want to identify, and maybe even respond to certain DNS queries of the IoT network.

The (excellent) utility dnsmasq is unable to split the logging into multiple log files, or filter logging by expressions, so we need to combine the power of dnsmasq’s logging with rsyslogd’s expression matching.

Let’s assume I have two networks. One is 192.168.1.x – the home LAN, and the other is 172.16.1.x – the IoT network.

I have added to my /etc/dnsmasq.conf file the following lines:

log-facility=DAEMON
log-async
log-queries=extra

I have created a file called /etc/rsyslog.d/dnsmasq.conf with the following contents:

if $programname == 'dnsmasq' and $msg contains ' 192.168.1.' then /var/log/dnsmasq/dnsmasq-lan.log
if $programname == 'dnsmasq' and $msg contains ' 172.16.1.' then /var/log/dnsmasq/dnsmasq-iot.log
if $programname == 'dnsmasq-dhcp' then /var/log/dnsmasq/dnsmasq-dhcp.log
if $programname == 'dnsmasq' then stop
if $programname == 'dnsmasq-dhcp' then stop

Of course – I need to create the directory /var/log/dnsmasq, and create a logrotate entry /etc/logrotate.d/dnsmasq as follows:

/var/log/dnsmasq/dnsmasq-iot.log /var/log/dnsmasq/dnsmasq-lan.log /var/log/dnsmasq/dnsmasq-dhcp.log {
  monthly
  missingok
  notifempty
  maxsize 5M
  rotate 14
  delaycompress
  # create 0640 dnsmasq root
  sharedscripts
  postrotate
    /usr/bin/systemctl kill -s HUP rsyslog.service >/dev/null 2>&1 || true
  endscript
}

Note that the DNS queries of the networks are kept in a dedicated per-network file (dnsmasq-lan.log and dnsmasq-iot.log) and all general (non IP specific) messages are kept in dnsmasq-dhcp.log file. Logrotate makes sure I do not overfill my directory, and I can later on identify which IoT (or home, for that matter) DNS query is sent and by whom.

Quick items about repackaging Linux ISO

There are two topics I would like to describe here, for later reference (by myself, of course. This blog is my extended memory). The first is about how to create a bootable ISO out of RHEL extracted ISO, and the other is about how to download only specific update, or make your own RHEL updates on-prem mirror.

Bootable ISO

From within the (modified?) extracted ISO of RHEL7.x (in this example. Match settings to your needs), in order to be able to boot the ISO both in legacy and uEFI BIOS – you can run this command:

genisoimage -J -T -o ../RHEL-7.9_`date +%F_%H-%M-%S`.iso -b isolinux/isolinux.bin -J -R -l -c isolinux/boot.cat -no-emul-boot -boot-load-size 4 -boot-info-table -eltorito-alt-boot -e images/efiboot.img -no-emul-boot -graft-points -V "RHEL-7.9 Server.x86_64" .

Create a local mirror of RHEL packages

This is a long one, so I will leave only a link to RedHat’s article about it. I hope you have access (you should, if you want to mirror their repository). If you don’t – it’s easy and free to open an account (even without subscribed systems), so you’ll have access to their articles. The article can be found here