Archive for the ‘Linux’ Category

RaspberryPi Zero loses connectivity

Friday, July 9th, 2021

I have had a problem with RPI Zero. The system was working fine, and then it did not. I am using Raspbery Linux (Debian-based) with kernel 5.10.17+. Once a while (usually with network load) the system loses connectivity. Everything seems to be fine, if you have a serial/USB console there, but the wireless network fails. This problem was also mentioned here.

My workaround was to create a script with a cron scheduling. I have identified that the fault lies with the wlan driver, and it needs to get reloaded. So cron calls this script every minute, like this:

*/1 * * * * /usr/local/sbin/check_connection.sh

And the script (/usr/local/sbin/check_connection.sh) has this in it:

1
2
3
4
5
6
7
8
9
#!/bin/bash
# DST is the network gateway
DST=192.168.230.1
if ! ping -c 5 -t 5 $DST > /dev/null
then
  #/usr/sbin/reboot
  /usr/bin/logger "Restarting wlan0 network driver"
  /usr/sbin/rmmod brcmfmac && /usr/sbin/modprobe brcmfmac roamoff=1
fi

Set this script to be executable, and your RPI Zero should work just fine. This is not a solution, but a workaround, of course, but it works well.

Better iostat visibility of ZFS vdevs

Sunday, July 4th, 2021

All ZFS users are familiar with ‘zpool iostat’ command, however, it is not easily translated into Linux ‘iostat’ command. Using large pools with many disks will result in a mess, where it’s hard to identify which disk is which, and going to a translation table from time to time, to identify a suspect slow disk.

Linux ‘iostat’ command allows to use aliases, and if you’r using vdev_id.conf file, and you are using ZFS aliased names, you can harness the same naming to your ‘iostat’ command. See my command example below – note that in this setup I do not use multipath or other DM devices, but a direct approach to /dev/sd devices. Also – in this case – I have a few (slightly more than a dozen) disks, so no need to address /dev/sd[a-z][a-z] devices:

iostat -kt 5 -j vdev -x /dev/sd? /dev/nvme0n1

You can chain more devices to the end of this line. The result should be something like this:

07/04/2021 10:36:54 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.26    0.00    0.87    1.49    0.00   97.39

     r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util Device
    0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00 nvme0n1
   42.80   15.40   5461.60   1938.40     0.00     0.00   0.00   0.00    0.50    0.87   0.01   127.61   125.87   1.12   6.50 sata500g2
   52.20    0.00   6631.20      0.00     0.00     0.00   0.00   0.00    0.43    0.00   0.00   127.03     0.00   1.40   7.30 sata500g1
   40.60    0.40   5196.80     51.20     0.00     0.00   0.00   0.00    0.44    0.50   0.00   128.00   128.00   1.41   5.80 sata500g4
   71.60    6.20   9148.00    776.80     0.00     0.00   0.00   0.00    0.45    0.48   0.00   127.77   125.29   1.34  10.46 sata500g3
   35.00    6.00   4463.20    768.00     0.00     0.00   0.00   0.00    0.44    0.53   0.00   127.52   128.00   1.24   5.08 sata1t
    0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00 mid1
   28.80   10.60    748.00     84.00     0.00     0.00   0.00   0.00    1.55    0.53   0.04    25.97     7.92   1.11   4.36 top4
    5.00   18.00    122.40    106.40     0.00     0.00   0.00   0.00   13.32   18.32   0.39    24.48     5.91   1.18   2.72 top5
    4.60   27.40    124.00    160.80     0.00     0.00   0.00   0.00   10.35   15.71   0.46    26.96     5.87   1.09   3.48 top2
   26.40   12.20    676.80     88.80     0.00     0.00   0.00   0.00    2.14    0.52   0.05    25.64     7.28   1.07   4.12 bot3
    4.60   25.40    104.80    137.60     0.00     0.00   0.00   0.00    5.26    0.64   0.04    22.78     5.42   0.31   0.94 mid4
    5.40   19.00    130.40    119.20     0.00     0.00   0.00   0.00    3.81    0.52   0.02    24.15     6.27   0.57   1.38 mid5
   25.00   12.00    596.80     80.00     0.00     0.20   0.00   1.64    3.61    0.13   0.08    23.87     6.67   1.03   3.80 mid2
   28.00   11.20    678.40     81.60     0.00     0.00   0.00   0.00    3.67    0.59   0.10    24.23     7.29   1.23   4.84 bot2
    5.00   23.40    114.40    140.80     0.00     0.00   0.00   0.00    9.28    0.43   0.05    22.88     6.02   0.51   1.44 bot4
    5.00   27.00    120.80    151.20     0.00     0.00   0.00   0.00    4.04    0.75   0.03    24.16     5.60   0.49   1.56 bot5
    0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00 mid3
   27.40    8.60    661.60     77.60     0.00     0.00   0.00   0.00    1.77    0.49   0.04    24.15     9.02   1.16   4.18 bot1
   27.00    9.60    692.00     84.80     0.00     0.00   0.00   0.00    2.10    0.52   0.05    25.63     8.83   1.15   4.20 top1

Max your CPU performance on Linux servers

Friday, July 2nd, 2021

I have had an interesting experience with HPE servers, where the BIOS was defined to allow max performance (in contrast to ‘balanced’ mode), but still – the CPU was not at max all the time.

While we generally, strive to a greener computing, when having low-latency workload, we expect a deterministic performance. We want our database to provide results in a timely, and more important – a consistent way. We want to expect a certain query/job to take the same time, whenever we run it. CPU throttling and frequency manipulation, which are part of CPU power-management are not our friends. A certain query can take longer, if the CPU is currently throttled, and can switch to another, slower, core mid-task. This can result in a very complex performance tuning and server behaviour troubleshooting.

The goal of setting the BIOS to ‘max-performance’ is obvious, then, however – it does not have the desired effect. The CPU keeps on throttling, and, while some power-reducing features are disabled, not all. This is not our goal.

Adding to the boot options of the server (via the GRUB/GRUB2/GRUB-EFI/Whatever) the following parameters, should disable Intel CPU throttling, and all (i)relevant C_states. I have not tested, nor investigated AMD behaviour:

nosoftlockup intel_idle.max_cstate=0 mce=ignore_ce

Of course – a reboot is required for these settings to take effect.

If you want to control power management afterwards, you can manually disable a certain core/CPU by running a command such as this:

echo 0 > /sys/devices/system/cpu/cpu<number>/online

It will disable the CPU (and place it in deep power save) until woken by another CPU, by echoing ‘1’ to that same ‘file’.

VNC minimal server on RHEL8 / CentOS8 / OEL8

Tuesday, June 29th, 2021

With the update to version 8 (or, if to be accurate, according to common documentations, 8.3 and above), VNC configuration has had a major overhaul, and it is entirely different. Most documentations will refer to using ‘gnome’ as the desktop manager, however – if you want something lightweight with minimal impact (both package-wise and performance-wise), using the minimal ‘xinit-compat’ could work well for you.

To make it work, you will need to follow these steps:

Install the required packages

The required packages are xorg-x11-xinit-session, xorg-x11-xinit, tigervnc-server, metacity, xterm. All dependant packages need to be installed too, of course.

Configuration

The initial (and minimal) configuration required is to edit /etc/tigervnc/vncserver.users and add the required user with a display port. In this example, I have used ‘:1’. The content of the file should look something like this:

# TigerVNC User assignment
#
# This file assigns users to specific VNC display numbers.
# The syntax is <display>=<username>. E.g.:
#
# :2=andrew
# :3=lisa
:1=myuser

Then, set a vnc password for the user, using ‘vncpasswd’ as the user.

Then, create a file in the directory .vnc in the user’s home directory, called config, with contents such as this:

session=xinit-compat
securitytypes=tlsvnc
desktop=sandbox
geometry=1800x1000

Now comes the tricky part (and not well documented) of setting the xinit startup script. Create a file in the user’s home directory called ‘.xsession’ with execution permissions, containing the following code:

xterm &
xterm &
metacity

This file is being read by xinit-compat, and will allow running a minimal desktop with two xterm windows, and a window manager (metacity).

Starting the service

Now, all that is required is to start the vnc service,. Use the command ‘systemctl start [email protected]:1’ . Note that the number ‘:1’ is in correlation to the file mentioned earlier ‘vncserver.users’.

This should do the trick. Let me know how it worked for you.

sign_and_send_pubkey: signing failed for RSA from agent: agent refused operation

Tuesday, June 1st, 2021

This is a problem which bugs the Internet. Many solutions were suggested. All of them work sometimes (like making sure that the permissions on the private/public SSH key files are correct).

One solution which never gets mentioned is about DNS. When attempting login to a remote server, the server attempts to resolve the connecting address (yours) using the DNS server defined to the server. This is the default for SSHD, and I seldom see any reason to change it. However, if the DNS servers are inaccessible, then the login takes very long (actually – waiting for the DNS query attempt to timeout). If this thing happens when using SSH-agent (I use KeePass2 with KeeAgent, as a very secure means of protecting my private keys. I suggest you check it out), the timeout might be just long enough to timeout on the agent’s side, and thus, produce the dreaded ‘sign_and_send_pubkey: signing failed for RSA from agent: agent refused operation’ error message.

The easy two possible solutions (pick one)L

  • Disable SSH DNS query (Remove the comment before ‘UseDNS no’ directive in /etc/ssh/sshd_config)
  • Disable DNS servers – comment out the defined servers in /etc/resolv.conf

This solves the timeout, and with it – this message, and your connection problems.