Tips and tricks for Red Hat Cluster

Red Hat Cluster is a nice HA product. I have been implementing it for a while now, lecturing about it, and yes, I like it. But like any other software product, it has a few flaws and issues which you should take into consideration, especially when you create custom “agents”: plugins that control (start/stop/status) your 3rd party application.

I want to list several tips and good practices which will help you create your own agent or custom script, and will help you sleep better at night.

Nighty Night: Sleeping is easier when your cluster is quiet. That usually means you don’t want the cluster to fail over unexpectedly during the night or, for that matter, at any other hour.
Below are some tips to help you sleep better, or to perform an easier postmortem of any cluster failure.

Chop the Logs: Since Red Hat Cluster logging can be hard to find and is filled with lots of irrelevant information, you want your agents to keep their own logs. Have them write the result of every “status”, “stop” and “start” call to a known file. Of course, either recycle the output logs or rotate them away. You could use

exec &>/tmp/my_script_name.out

much like HACMP does (or at least behaves as if it does). You can also use a specific logging facility for each subsystem of the cluster (cman, rg, qdiskd).
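The redirect above truncates the file on every run, which recycles the log but loses the history you want for a postmortem. Here is a minimal sketch of an alternative that appends and keeps one previous generation. The helper names, the LOG_MAX_BYTES variable and the size threshold are my own assumptions, not something Red Hat Cluster provides.

```shell
#!/bin/bash
# Hypothetical helpers for a custom agent (names are illustrative only).
# Keep one older generation of the log, then send all further
# stdout/stderr of the calling script to the log file.

LOG_MAX_BYTES=${LOG_MAX_BYTES:-1048576}   # assumed threshold: about 1 MiB

setup_agent_log() {
    local logfile=$1
    local size=0

    if [ -f "$logfile" ]; then
        # Arithmetic expansion strips any padding that wc may print.
        size=$(( $(wc -c < "$logfile") ))
        if [ "$size" -gt "$LOG_MAX_BYTES" ]; then
            mv -f "$logfile" "$logfile.1"
        fi
    fi

    # Append instead of truncating, so earlier runs stay available.
    exec >>"$logfile" 2>&1
}

log_action() {
    # $1 = action (start/stop/status), $2 = its exit code
    echo "$(date '+%Y-%m-%d %H:%M:%S') action=$1 rc=$2"
}
```

An agent would call setup_agent_log with a path of its choice near the top of the script, and log_action after each start, stop or status call.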

Mind the Gap: Don’t trust the return codes of unknown scripts or applications. Your cluster will fail miserably if a script or a file you expect to run is not there. Do not automatically assume that the vmware script, for example, will return normal values. Check the return codes and decide how to respond accordingly.
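One way to do that, as a rough sketch: wrap every external script in a small function that first checks the file is really there, and then turns whatever exit code comes back into a value you chose on purpose. The function name run_checked and the choice to map every non-zero code to 1 are my own assumptions; your agent may need a finer mapping.

```shell
#!/bin/bash
# Hypothetical wrapper (the name run_checked is illustrative only).
# A missing script or an unexpected exit code becomes a deliberate
# decision instead of an accident.

run_checked() {
    local script=$1
    local rc
    shift

    if [ ! -x "$script" ]; then
        echo "ERROR: $script is missing or not executable" >&2
        return 1
    fi

    "$script" "$@"
    rc=$?

    case $rc in
        0)  return 0 ;;
        *)  # Anything else is reported, then flattened to a plain failure.
            echo "WARN: $script $* returned $rc" >&2
            return 1 ;;
    esac
}
```

The caller then only ever sees 0 or 1, no matter what the 3rd party script decided to return.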

Speak the Language: A service in Red Hat Cluster is a combination of one or more resources. This can be somewhat confusing, as we tend to call a resource a (system) service. Use the correct lingo. I will try to do just that in this document, so heed the difference between the terms “service” and “system service”, the latter of which can be a cluster resource.

Divide and Conquer: Split your services into the smallest possible sets of resources. If your service consists of hundreds of resources, the failure of one of them could cause the entire service to restart, taking down all the other working resources. If you keep each service to the minimum, you actually protect yourself.

Trust No One: To stress the “Mind the Gap” point above: don’t trust 3rd party scripts or applications to return a correct error code. Don’t trust their configuration files, and don’t trust the users to “do it right”. They will not. Try to make your own services as fault-tolerant as possible. Don’t crash because some stupid user (or a stupid administrator; for a cluster implementer both are the same, right?) used incorrect input parameters, or because he has kept an important configuration file under a different name than was required.
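A small sketch of what that can look like in an agent: reject bad input and a missing configuration file before doing anything else. The helper names validate_action and require_config, and the return values, are my own assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical input checks for a custom agent (names are illustrative).

validate_action() {
    # Accept only the actions this agent actually implements.
    case ${1:-} in
        start|stop|status) return 0 ;;
        *)  echo "ERROR: unsupported action '${1:-}'" >&2
            return 2 ;;
    esac
}

require_config() {
    local cfg=${1:-}

    # The file must be named, readable and non-empty.
    if [ -z "$cfg" ] || [ ! -r "$cfg" ] || [ ! -s "$cfg" ]; then
        echo "ERROR: config file '$cfg' is missing, unreadable or empty" >&2
        return 1
    fi
    return 0
}
```

Calling both at the top of the agent means a typo in a parameter, or a renamed configuration file, produces a clear error message instead of a half-started application.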

I have some special things I want to do with regard to Red Hat Cluster Suite. Stay tuned 🙂
