RedHat 4 working cluster (on VMware) config
I have been struggling with RH Cluster 4 and a VMware fencing device. It was also a good experience with qdiskd, the disk quorum daemon, its directives and its use. I have drawn several conclusions from this exercise. First, the configuration, as is:
<?xml version="1.0"?>
<cluster alias="alpha_cluster" config_version="17" name="alpha_cluster">
  <quorumd interval="1" label="Qdisk1" min_score="3" tko="10" votes="3">
    <heuristic interval="2" program="ping vm-server -c1 -t1" score="10"/>
  </quorumd>
  <fence_daemon post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="clusnode1" nodeid="1" votes="1">
      <multicast addr="224.0.0.10" interface="eth0"/>
      <fence>
        <method name="1">
          <device name="vmware" port="/vmware/CLUSTER/Node1/Node1.vmx"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="clusnode2" nodeid="2" votes="1">
      <multicast addr="224.0.0.10" interface="eth0"/>
      <fence>
        <method name="1">
          <device name="vmware" port="/vmware/CLUSTER/Node2/Node2.vmx"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman>
    <multicast addr="224.0.0.10"/>
  </cman>
  <fencedevices>
    <fencedevice agent="fence_vmware" ipaddr="vm-server" login="cluster" name="vmware" passwd="clusterpwd"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="cluster_domain" ordered="1" restricted="1">
        <failoverdomainnode name="clusnode1" priority="1"/>
        <failoverdomainnode name="clusnode2" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <fs device="/dev/sdb2" force_fsck="1" force_unmount="1" fsid="62307" fstype="ext3" mountpoint="/mnt/sdb1" name="data" options="" self_fence="1"/>
      <ip address="10.100.1.8" monitor_link="1"/>
      <script file="/usr/local/script.sh" name="My_Script"/>
    </resources>
    <service autostart="1" domain="cluster_domain" name="Test_srv">
      <fs ref="data">
        <ip ref="10.100.1.8">
          <script ref="My_Script"/>
        </ip>
      </fs>
    </service>
  </rm>
</cluster>
Several notes:
- You should run mkqdisk -c /dev/sdb1 -l Qdisk1 (or whichever device holds your quorum disk)
- qdiskd should be added to the chkconfig database (chkconfig --add qdiskd)
- The qdiskd start order should be changed from 22 to 20, so that it precedes cman (see the sketch after this list)
- Apply the changes to fence_vmware described in the earlier directives, including Yoni's comment for RH4
- A change in structure: instead of using two fence devices, I use a single fence device with different "ports". The port attribute is translated to "-n" when fenced calls fence_vmware, just as it is translated to "-n" for fence_brocade (see the test command after this list)
- lock_gulmd should be turned off using chkconfig
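For reference, a minimal sketch of the quorum-disk and service-registration steps on one node. It assumes /dev/sdb1 is the shared quorum device and that the qdiskd init script lives at /etc/init.d/qdiskd; paths may differ on your installation:

# create and label the quorum disk, then list the labels to verify
mkqdisk -c /dev/sdb1 -l Qdisk1
mkqdisk -L

# register qdiskd so it starts at boot; to change its start order from 22
# to 20 (so it precedes cman), edit the "# chkconfig:" header line in
# /etc/init.d/qdiskd and re-run chkconfig --del/--add afterwards
chkconfig --add qdiskd
chkconfig qdiskd on

# turn off lock_gulmd, since this cluster uses cman/DLM locking
chkconfig lock_gulmd off

To test fencing by hand, fence_vmware can be called with roughly the same arguments fenced would pass it. This is only a sketch following the common fence-agent flag conventions (-a for the address, -l/-p for the credentials, -n for the port); check fence_vmware -h for the exact options of your version:

fence_vmware -a vm-server -l cluster -p clusterpwd -n /vmware/CLUSTER/Node1/Node1.vmx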
A little about changing the configuration version from the command line:
When you update cluster.conf, it is not enough to push it to ccsd with "ccs_tool update /etc/cluster/cluster.conf"; you also need to understand that cman still reports the older configuration version. Running "cman_tool version -r <new version>" forces cman onto the new version, so that nodes coming back after a reboot with the latest config version are allowed to join. If you fail to do this, those nodes might be rejected.
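For example, a sketch assuming the config_version in the updated cluster.conf is 18 (substitute your own version number):

# propagate the new cluster.conf to the other nodes via ccsd
ccs_tool update /etc/cluster/cluster.conf

# tell cman about the new configuration version
cman_tool version -r 18

# verify that cman now reports the new version
cman_tool version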
I will add additional information as I move along.