Archive for July, 2021

RaspberryPi Zero loses connectivity

Friday, July 9th, 2021

I have had a problem with RPI Zero. The system was working fine, and then it did not. I am using Raspbery Linux (Debian-based) with kernel 5.10.17+. Once a while (usually with network load) the system loses connectivity. Everything seems to be fine, if you have a serial/USB console there, but the wireless network fails. This problem was also mentioned here.

My workaround was to create a script with a cron scheduling. I have identified that the fault lies with the wlan driver, and it needs to get reloaded. So cron calls this script every minute, like this:

*/1 * * * * /usr/local/sbin/check_connection.sh

And the script (/usr/local/sbin/check_connection.sh) has this in it:

1
2
3
4
5
6
7
8
9
#!/bin/bash
# DST is the network gateway
DST=192.168.230.1
if ! ping -c 5 -t 5 $DST > /dev/null
then
  #/usr/sbin/reboot
  /usr/bin/logger "Restarting wlan0 network driver"
  /usr/sbin/rmmod brcmfmac && /usr/sbin/modprobe brcmfmac roamoff=1
fi

Set this script to be executable, and your RPI Zero should work just fine. This is not a solution, but a workaround, of course, but it works well.

Better iostat visibility of ZFS vdevs

Sunday, July 4th, 2021

All ZFS users are familiar with ‘zpool iostat’ command, however, it is not easily translated into Linux ‘iostat’ command. Using large pools with many disks will result in a mess, where it’s hard to identify which disk is which, and going to a translation table from time to time, to identify a suspect slow disk.

Linux ‘iostat’ command allows to use aliases, and if you’r using vdev_id.conf file, and you are using ZFS aliased names, you can harness the same naming to your ‘iostat’ command. See my command example below – note that in this setup I do not use multipath or other DM devices, but a direct approach to /dev/sd devices. Also – in this case – I have a few (slightly more than a dozen) disks, so no need to address /dev/sd[a-z][a-z] devices:

iostat -kt 5 -j vdev -x /dev/sd? /dev/nvme0n1

You can chain more devices to the end of this line. The result should be something like this:

07/04/2021 10:36:54 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.26    0.00    0.87    1.49    0.00   97.39

     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
    0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00 nvme0n1
   42.80   15.40   5461.60   1938.40     0.00     0.00   0.00   0.00    0.50    0.87   0.01   127.61   125.87   1.12   6.50 sata500g2
   52.20    0.00   6631.20      0.00     0.00     0.00   0.00   0.00    0.43    0.00   0.00   127.03     0.00   1.40   7.30 sata500g1
   40.60    0.40   5196.80     51.20     0.00     0.00   0.00   0.00    0.44    0.50   0.00   128.00   128.00   1.41   5.80 sata500g4
   71.60    6.20   9148.00    776.80     0.00     0.00   0.00   0.00    0.45    0.48   0.00   127.77   125.29   1.34  10.46 sata500g3
   35.00    6.00   4463.20    768.00     0.00     0.00   0.00   0.00    0.44    0.53   0.00   127.52   128.00   1.24   5.08 sata1t
    0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00 mid1
   28.80   10.60    748.00     84.00     0.00     0.00   0.00   0.00    1.55    0.53   0.04    25.97     7.92   1.11   4.36 top4
    5.00   18.00    122.40    106.40     0.00     0.00   0.00   0.00   13.32   18.32   0.39    24.48     5.91   1.18   2.72 top5
    4.60   27.40    124.00    160.80     0.00     0.00   0.00   0.00   10.35   15.71   0.46    26.96     5.87   1.09   3.48 top2
   26.40   12.20    676.80     88.80     0.00     0.00   0.00   0.00    2.14    0.52   0.05    25.64     7.28   1.07   4.12 bot3
    4.60   25.40    104.80    137.60     0.00     0.00   0.00   0.00    5.26    0.64   0.04    22.78     5.42   0.31   0.94 mid4
    5.40   19.00    130.40    119.20     0.00     0.00   0.00   0.00    3.81    0.52   0.02    24.15     6.27   0.57   1.38 mid5
   25.00   12.00    596.80     80.00     0.00     0.20   0.00   1.64    3.61    0.13   0.08    23.87     6.67   1.03   3.80 mid2
   28.00   11.20    678.40     81.60     0.00     0.00   0.00   0.00    3.67    0.59   0.10    24.23     7.29   1.23   4.84 bot2
    5.00   23.40    114.40    140.80     0.00     0.00   0.00   0.00    9.28    0.43   0.05    22.88     6.02   0.51   1.44 bot4
    5.00   27.00    120.80    151.20     0.00     0.00   0.00   0.00    4.04    0.75   0.03    24.16     5.60   0.49   1.56 bot5
    0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00 mid3
   27.40    8.60    661.60     77.60     0.00     0.00   0.00   0.00    1.77    0.49   0.04    24.15     9.02   1.16   4.18 bot1
   27.00    9.60    692.00     84.80     0.00     0.00   0.00   0.00    2.10    0.52   0.05    25.63     8.83   1.15   4.20 top1

Max your CPU performance on Linux servers

Friday, July 2nd, 2021

I have had an interesting experience with HPE servers, where the BIOS was defined to allow max performance (in contrast to ‘balanced’ mode), but still – the CPU was not at max all the time.

While we generally, strive to a greener computing, when having low-latency workload, we expect a deterministic performance. We want our database to provide results in a timely, and more important – a consistent way. We want to expect a certain query/job to take the same time, whenever we run it. CPU throttling and frequency manipulation, which are part of CPU power-management are not our friends. A certain query can take longer, if the CPU is currently throttled, and can switch to another, slower, core mid-task. This can result in a very complex performance tuning and server behaviour troubleshooting.

The goal of setting the BIOS to ‘max-performance’ is obvious, then, however – it does not have the desired effect. The CPU keeps on throttling, and, while some power-reducing features are disabled, not all. This is not our goal.

Adding to the boot options of the server (via the GRUB/GRUB2/GRUB-EFI/Whatever) the following parameters, should disable Intel CPU throttling, and all (i)relevant C_states. I have not tested, nor investigated AMD behaviour:

nosoftlockup intel_idle.max_cstate=0 mce=ignore_ce

Of course – a reboot is required for these settings to take effect.

If you want to control power management afterwards, you can manually disable a certain core/CPU by running a command such as this:

echo 0 > /sys/devices/system/cpu/cpu<number>/online

It will disable the CPU (and place it in deep power save) until woken by another CPU, by echoing ‘1’ to that same ‘file’.