
VMware Fencing in RedHat Cluster 5 (RHCS5)

Cluster fencing: contrary to common belief, high availability is not the highest priority of a high-availability cluster, but only the second one. The highest priority of a high-availability cluster is maintaining data integrity by preventing multiple nodes from accessing the shared disk concurrently.

Different clusters, depending on the vendor, achieve this in different ways: by preventing access based on the status of the cluster (for example, Microsoft Cluster, which will not allow access to the disks without cluster management and coordination), by panicking the node in question (Oracle RAC or IBM HACMP, for example), or by preventing failover unless the status of the other node, as well as all heartbeat links, was OK up to the exact moment of failure (VCS, for example).

Another method is based on a fence, or STONITH ("Shoot The Other Node In The Head"). This "fence" is usually a hardware device that does not depend on the node's OS and is capable of shutting the node down, often brutally, on request. A good fencing device can be a manageable UPS that feeds the other node. The whole idea is that in a case of uncertainty, either node can attempt to 'kill' the other, regardless of any connectivity issues either of them might experience. The result of this race is obvious: one node remains alive and capable of taking over the resource groups, while the other node is off and unable to access the disk in an uncontrolled manner.

Linux-based clusters will not force you to use fencing of any sort; however, for production environments, a setup without any fencing device is unsupported, as the cluster cannot handle cases of split-brain or uncertainty. These hardware devices can be, as said before, a manageable UPS, a remote-controlled power switch, the server's own IPMI (or any other independent management system such as HP iLO or an IBM HMC), or even the fibre switch, as long as it can prevent the node in question from accessing the disks. They are quite expensive, but compared to hours of restoring from backup, they certainly justify their price.

At many sites there is a demand for a "test" setup that is as similar to the production setup as possible. This test setup can be used to test upgrades, configuration changes, and so on. Using fencing in this environment is important for two reasons:

1. The behavior of the production system is simulated with a setup that is as similar as possible, and fencing plays an important part in the cluster and its logic.

2. A replicated production environment contains data that might have some importance; even if it does not, re-replicating it from the production environment after a faulty node accesses the disk in an uncontrolled manner (and a test cluster is at higher risk, by the very nature of its role), or restoring it from tapes, is unpleasant and time consuming.

So we agree that the test cluster should have some sort of fencing device, even if it is not the same as production's, for the sake of the cluster logic.

Some sites demand more than one test environment. Both setups, a single test environment and multiple test environments, can be defined to run as guests on a virtualization server. Virtualization saves hardware (and power, and cooling) costs and allows easy duplication and replication, so it is ideal for the task. That said, it brings up a problem: fencing the virtualization server itself would kill all guest systems in one go, and we wouldn't want that to happen. Luckily for us, RedHat Cluster has a fencing agent for VMware which, although not recommended for a production environment, will suffice for a test environment. These are the steps required to set up such a VMware fencing device in RHCS5:

1. Download the latest CVS fence_vmware from here. You can use this direct link (use with "save target as"). Save it in your /sbin directory under the name fence_vmware, and give it execution permissions (a consolidated command sketch for steps 1-3 appears after the configuration example below).

2. Edit fence_vmware. In line 249 change the string “port” to “vmname”.

3. Install the VMware Perl API on both cluster nodes. You will need gcc and openssl-devel installed on your system to be able to do so.

4. Change your fencing configuration in cluster.conf based on this example:

<?xml version="1.0"?>
<cluster alias="Gfs-test" config_version="39" name="Gfs-test">
        <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="cent2" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="man2"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="cent1" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="man1"/>
                                </method>
                                <method name="2">
                                        <device domain="22 " name="11 "/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_vmware" name="man2"
                          ipaddr="192.168.88.1" login="user" passwd="password"
                          vmname="c:vmwarevirt2rhel5.vmx"/>
                <fencedevice agent="fence_vmware" name="man1"
                          ipaddr="192.168.88.1" login="user" passwd="password"
                          vmname="c:vmwarevirt1rhel5.vmx"/>
        </fencedevices>
        <rm>
                <failoverdomains/>
                <resources>
                        <fs device="/dev/sda" force_fsck="0" force_unmount="0"
				fsid="5" fstype="ext3" mountpoint="/data"
                                name="sda" options="" self_fence="0"/>
                </resources>
                <service autostart="1" name="smartd">
                        <ip address="192.168.88.201" monitor_link="1"/>
                </service>
                <service autostart="1" name="disk1">
                        <fs ref="sda"/>
                </service>
        </rm>
</cluster>

Change the VMware username and password to your own.
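
For reference, the shell work behind steps 1 through 3 might look roughly like the sketch below. Treat it as a sketch only: the agent itself comes from the link in step 1, the sed edit assumes the string appears once on line 249 of the CVS revision mentioned in step 2 (verify this against your copy), and the packages are the ones noted in step 3.

# step 1: after saving the downloaded agent as /sbin/fence_vmware, make it executable
chmod +x /sbin/fence_vmware

# step 2: switch the option name from "port" to "vmname" on line 249
sed -i '249s/port/vmname/' /sbin/fence_vmware

# step 3: build prerequisites for the VMware Perl API installer
yum install -y gcc openssl-devel

Once the agent is in place and cluster.conf is updated, it is worth exercising the fencing path before trusting it. The fence daemon feeds its agents name=value pairs on standard input, so a manual test can do the same; the vmname below is the example path from the configuration above and must match your actual .vmx file, and some agent versions expect action= rather than option=. Power the guest back on from the VMware console afterwards.

/sbin/fence_vmware <<'EOF'
ipaddr=192.168.88.1
login=user
passwd=password
vmname=c:\vmware\virt2\rhel5.vmx
option=off
EOF

# or let the cluster drive the whole path once cman and fenced are running:
fence_node cent2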

If you have a CentOS system, you will also need to perform these three steps:

1. ln -s /usr/sbin/cman_tool /sbin/cman_tool

2. cp /etc/redhat-release /etc/redhat-release.orig

3. echo "Red Hat Enterprise Linux Server release 5 (Tikanga)" > /etc/redhat-release
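
With those in place and cman started on both nodes, a quick sanity check with the standard RHCS5 tools (a sketch; node and service names come from your own cluster.conf) would be:

# the cluster should report quorum and list both nodes as members
cman_tool status
cman_tool nodes

# once rgmanager is running, the services defined in cluster.conf should show up
clustat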

This should do the trick. Good luck, and thanks again to Yoni who brought and fought the configuration steps.

***UPDATE***

Following the comments (and, a bit late, common sense), I have broken the long lines in the XML quoted for cluster.conf. In case these line breaks break something in RedHat Cluster, I have added the original XML file here: cluster.conf


Comments

  1. Yup, VMware fencing works with a simple code modification

    adding these 4 lines after line 264:

    17a18
    > #use strict;
    264,268d264
    < elsif ($name eq "nodename" )
    < {
    < $opt_ZZv = $val;
    < }
    <

  2. During an attempt to use the VMware Perl SDK, I have encountered the following error: VMControl Panic: SSLLoadSharedLibrary: Failed to load library /usr/bin/libcrypto.so.0.9.7:/usr/bin/libcrypto.so.0.9.7: cannot open shared object file: No such file or dir

  3. I have been struggling with RH Cluster 4 with a VMware fencing device. This was also a good experience with qdiskd, the Disk Quorum directive and utilization. I have several conclusions out of this experience. First, the configuration, as is:<?xml versio

  4. Hello,

    it should be said that this only works for ESX 2.x, because VI 3 uses another API (http://www.vmware.com/download/download.do?downloadGroup=VI-PT-1-5)

    Has someone seen a fencing agent which uses the new VI3 API?

    Regards

    Falko

  5. Hi Falko.

    I have never tested it on ESX. I use almost exclusively VMware Server, so there was no chance to.

    You can hack it to run a script which uses ssh to the ESX server (although for VI you might want to query which physical server the guest is on) and powers off the guest. Keep the same command-line switches and parameters, and you should be good to go on RHCS.

    Ez
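
    A very rough, hypothetical sketch of the kind of wrapper described above (untested; it assumes key-based ssh to the ESX service console, that vmware-cmd is available there, and it simply mirrors the name=value stdin interface and the ipaddr/login/vmname attribute names from the cluster.conf example in the post):

    #!/bin/bash
    # hypothetical fence wrapper: read the name=value pairs that fenced writes
    # to an agent's stdin, then power the guest off over ssh on the ESX host
    action="off"
    while IFS='=' read -r name value; do
        case "$name" in
            ipaddr)        host="$value" ;;
            login)         user="$value" ;;
            vmname)        vmx="$value"  ;;
            option|action) action="$value" ;;
        esac
    done

    case "$action" in
        off|reboot) ssh "$user@$host" vmware-cmd \""$vmx"\" stop hard ;;
        on)         ssh "$user@$host" vmware-cmd \""$vmx"\" start ;;
        *)          echo "unsupported action: $action" >&2 ; exit 1 ;;
    esac
    # fenced treats exit code 0 as success, so the ssh/vmware-cmd exit status
    # is propagated as this script's exit status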

  6. I’m using the VI Perl Toolkit plus some parts of the fence_vmware script to fence virtual machines from RHCS. The script connects to the VI server and resets or powers off the VM if needed.

  7. Can you attach your script here? I would love to have it online (for the next time I get to use VI)

    Ez

  8. Hello ez.

    One small question :

    Why is the vmname configured as a “Windows”-style path (C: bla bla)?

    Thanks

    1. This is because this particular system was on VMware Workstation on Windows, and thus the locations of the vmx files were ‘Windows’ paths…
      This custom agent is no longer needed with VMware ESXi 4 and above, as the new agent makes use of the http(s) and API access methods, and is thus probably more resilient to changes (no one is thrilled to perform API changes just like that).

      Ez

  9. Oracle RAC, which you have cited (or IBM GPFS), doesn’t need a fence device, as it’s a clustered filesystem.
    On a clustered filesystem all nodes can access the data at the same time, therefore there is no need to kick a node out of the cluster.

    1. You are mistaken, of course. All clustered filesystems enforce an internal fencing mechanism. Oracle OCFS2 reacts badly to communication failures, by freezing all IO for a defined period of time (can you imagine a 60-second IO freeze on a running DB? And what happens if it’s on the voting disk?) until a node ejects out of the cluster. The eject process usually involves self-fencing of the node, a.k.a. a reboot. Very common for self-fencing. By the way, lack of access to the voting disk for too long a period of time can result, just the same, in a forced node reboot.

      Oracle ACFS makes use of the ASM locking mechanism. This locking mechanism makes use of the Oracle Cluster voting disk, and again, in a split, some of the nodes might reset.

      Veritas Clusters, when using a shared filesystem such as the shared option of VxFS, use DG-based locking and reservation as a mechanism of fencing. A node can block the disk access of another node (or nodes) using a DG lock or SCSI reservation, and this, in turn, triggers the internal cluster logic regarding access to the witness disks.

      Nodes do get kicked out of a cluster. Each and every cluster, as far as I know. However, while the underlying methods might differ, the basic logic remains quite the same. RHCS does hardware fencing. Oracle RAC today performs self-fencing, but (and this is important!) later versions (11.2.0.2 and above, if I’m not mistaken) allow you to define an IPMI port for the cluster nodes. Why would the cluster require the addresses of the IPMI ports, you might wonder? The answer is that the cluster might use them to complete a fencing operation.

      Cheers!
      Ez

  10. Hello.

    Errors during installation:

    Running Mkbootstrap for VMware::VmPerl ()
    vmcontrol.o: could not read symbols: File in wrong format
    collect2: ld returned 1 exit status

    I have RHEL 6.3 x86_64; maybe it is because of the x64 architecture?

  11. Hi ,

    I was configuring RHEL5 clustering on VMware Workstation (for testing) and following your steps for the fencing device configuration. But when I was trying to install the VMware Perl API, the installer gave me the error:
    “Unable to compile the VMware VmPerl Scripting API.

    ********
    The VMware VmPerl Scripting API was not installed. Errors encountered during
    compilation and installation of the module can be found here: /tmp/api-config3

    You will not be able to use the “vmware-cmd” program.”

    My setup is like this:

    Host machine: Oracle Linux 5.
    Then VMware Workstation 7.1.
    On VMware Workstation I have two RHEL 5.6 nodes on which I was trying to create a cluster.

    Can you suggest ,
