Archive for November, 2007

RedHat 4 working cluster (on VMware) config

Sunday, November 11th, 2007

I have been struggling with RH Cluster 4 and a VMware fencing device. It was also a good experience with qdiskd, the quorum disk directive, and its use. I came away from this with several conclusions. First, the configuration, as is:

<?xml version="1.0"?>
<cluster alias="alpha_cluster" config_version="17" name="alpha_cluster">
  <quorumd interval="1" label="Qdisk1" min_score="3" tko="10" votes="3">
    <heuristic interval="2" program="ping vm-server -c1 -t1" score="10"/>
  </quorumd>
  <fence_daemon post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="clusnode1" nodeid="1" votes="1">
      <multicast addr="224.0.0.10" interface="eth0"/>
      <fence>
        <method name="1">
          <device name="vmware"
                  port="/vmware/CLUSTER/Node1/Node1.vmx"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="clusnode2" nodeid="2" votes="1">
      <multicast addr="224.0.0.10" interface="eth0"/>
      <fence>
        <method name="1">
          <device name="vmware"
                  port="/vmware/CLUSTER/Node2/Node2.vmx"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman>
    <multicast addr="224.0.0.10"/>
  </cman>
  <fencedevices>
    <fencedevice agent="fence_vmware" ipaddr="vm-server" login="cluster"
                 name="vmware" passwd="clusterpwd"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="cluster_domain" ordered="1" restricted="1">
        <failoverdomainnode name="clusnode1" priority="1"/>
        <failoverdomainnode name="clusnode2" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <fs device="/dev/sdb2" force_fsck="1" force_unmount="1" fsid="62307"
          fstype="ext3" mountpoint="/mnt/sdb1" name="data"
          options="" self_fence="1"/>
      <ip address="10.100.1.8" monitor_link="1"/>
      <script file="/usr/local/script.sh" name="My_Script"/>
    </resources>
    <service autostart="1" domain="cluster_domain" name="Test_srv">
      <fs ref="data">
        <ip ref="10.100.1.8">
          <script ref="My_Script"/>
        </ip>
      </fs>
    </service>
  </rm>
</cluster>
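
As a quick sanity check on the vote counts in this configuration, the arithmetic can be sketched in shell. This assumes the usual majority rule for quorum, floor(expected votes / 2) + 1; the function name is made up for illustration and is not part of the cluster suite.

```shell
#!/bin/sh
# Quorum threshold under the assumed majority rule: floor(expected / 2) + 1.
quorum_for() {
    echo $(( $1 / 2 + 1 ))
}

node_votes=1    # votes="1" on each clusternode
nodes=2         # clusnode1 and clusnode2
qdisk_votes=3   # votes="3" on quorumd

expected=$(( node_votes * nodes + qdisk_votes ))
echo "expected votes:   $expected"
echo "quorum:           $(quorum_for "$expected")"
echo "one node + qdisk: $(( node_votes + qdisk_votes ))"
```

Under that assumption, a single surviving node that still holds the quorum disk keeps enough votes to stay quorate, while the two nodes together without the quorum disk do not.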

Several notes:

  1. You should run mkqdisk -c /dev/sdb1 -l Qdisk1 (or whatever device is for your quorum disk)
  2. qdiskd should be added to the chkconfig db (chkconfig --add qdiskd)
  3. The qdiskd start order should be changed from 22 to 20, so that it starts before cman
  4. fence_vmware should be changed according to the earlier directives, including Yoni’s comment for RH4
  5. Changes in structure. Instead of using two fence devices, I use only one fence device with a different “port” for each node. fenced translates a port to “-n” for fence_vmware, just as it does for fence_brocade
  6. lock_gulmd should be turned off using chkconfig
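
Notes 1, 2 and 6 above can be sketched as a short script. It defaults to a dry run that only prints the commands, since mkqdisk overwrites the target device. The DRY_RUN variable and the run helper are illustrative additions, and /dev/sdb1 is simply the device from this example. The start-order change in note 3 is left as a manual step.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}
QDISK_DEV=${QDISK_DEV:-/dev/sdb1}    # quorum partition from this example
QDISK_LABEL=${QDISK_LABEL:-Qdisk1}   # must match label="..." in cluster.conf

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

prepare_qdisk() {
    run mkqdisk -c "$QDISK_DEV" -l "$QDISK_LABEL"   # note 1 (destroys data on the device)
    run chkconfig --add qdiskd                      # note 2
    run chkconfig lock_gulmd off                    # note 6
}

prepare_qdisk
```

Set DRY_RUN=0 and run it as root to actually execute the commands.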

A little about changing the config version from the command line:

When you update the cluster.conf file, it is not enough to update ccsd using “ccs_tool update /etc/cluster/cluster.conf”, because cman is still on the older version. Running “cman_tool version -r <new version>” forces cman to the new version, so that other nodes using the latest config version are allowed to join after a reboot. If you skip this step, those nodes might be rejected.
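
The update sequence could be scripted roughly as follows. The helper names are hypothetical. The script only reads config_version from the file and prints the two commands to run, on the assumption that the version number was already incremented by hand while editing cluster.conf and that the attribute sits on the same line as the opening cluster tag.

```shell
#!/bin/sh
# Print the config_version attribute of the <cluster> element.
# Assumes the attribute is on the same line as "<cluster".
config_version() {
    sed -n 's/.*<cluster[^>]*config_version="\([0-9][0-9]*\)".*/\1/p' "$1" | head -n 1
}

# Print (not run) the commands that push a new cluster.conf to ccsd and cman.
show_update_commands() {
    conf=$1
    version=$(config_version "$conf")
    if [ -z "$version" ]; then
        echo "no config_version found in $conf" >&2
        return 1
    fi
    echo "ccs_tool update $conf"
    echo "cman_tool version -r $version"
}

CONF=${CONF:-/etc/cluster/cluster.conf}
if [ -r "$CONF" ]; then
    show_update_commands "$CONF"
fi
```

Printing the commands instead of running them leaves the final decision to whoever is at the console, which seems safer on a live cluster.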

I will add additional information as I move along.

A note about VMware-Server machine security

Saturday, November 10th, 2007

VMware allows setting a virtual machine as a private machine. Doing so adds an entry to “/etc/vmware/vm-list-private” stating who the owner of the machine is. For example:

cat /etc/vmware/vm-list-private
# This file is automatically generated.
# Hand-editing this file is not recommended.
config "/vmware/Centos4-01/Centos4-01.vmx|root"
config "/vmware/Centos4-02/Centos4-02.vmx|user"

This is very effective in VMware-Console (the nice GUI), where you cannot see machines that are not owned by your own user (in our example, “user”). However, it has nothing to do with the actual permissions on the machine.

Using vmware-cmd you can control machines which are not yours, and are supposed to be private. For example, using

vmware-cmd /vmware/Centos4-01/Centos4-01.vmx stop

as the user “user”, you might be able to turn it off, bypassing the permission scheme that VMware appears to set up through the “private guest” setting described above.

This actually has to do with the permissions and ownership of the vmx file itself. Those are what you need to restrict in order to prevent an unauthorized user from controlling your machines, or even listing them, with vmware-cmd.

The best practice I can suggest is to set up a directory for each user or purpose (for example: /vmware for production, /qa for QA machines, /user1 for user1 machines, etc.), and to grant permissions on this directory, recursively, only to the user or group who should be able to control these machines. That way, even “vmware-cmd -l”, which lists the available guests on a host, will not show guests that the invoking user has no permissions on.

To sum things up, private guests are all about how the GUI decides if and when to display them. eXecute permissions on the vmx files will set who can actually control a guest machine.
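
A minimal sketch of that layout, assuming a per-user directory such as /user1 and a hypothetical helper name. It hands the tree to one owner and group and strips all access for everyone else, so other users can neither read nor execute the vmx files inside.

```shell
#!/bin/sh
# Give a guest directory to one owner/group and remove access for others.
# Usage: restrict_vm_dir DIR OWNER GROUP   (changing ownership usually needs root)
restrict_vm_dir() {
    dir=$1
    owner=$2
    group=$3
    chown -R "$owner:$group" "$dir" &&
    chmod -R o-rwx "$dir"
}

# Example (not run here):
#   restrict_vm_dir /user1 user1 user1
```

Only the permissions for "other" are removed; whatever the owner and group already had on the vmx files stays in place.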

VMware Perl SDK bug and workaround

Saturday, November 10th, 2007

During an attempt to use the VMware Perl SDK, I encountered the following error:

VMControl Panic: SSLLoadSharedLibrary: Failed to load library /usr/bin/libcrypto.so.0.9.7:/usr/bin/libcrypto.so.0.9.7: cannot open shared object file: No such file or directory

This is weird, as it was compiled successfully on my system (Centos4), but still…

The workaround was to create two symlinks:

ln -s /usr/lib/libcrypto.so /usr/bin/libcrypto.so.0.9.7

ln -s /usr/lib/libssl.so /usr/bin/libssl.so.0.9.7
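
The same workaround can be written as a small guarded helper, so that a missing source library or an already existing link is reported instead of producing a dangling symlink or an ln error. The function name is hypothetical, and the paths in the example are the ones from this CentOS 4 system; they may differ elsewhere.

```shell
#!/bin/sh
# Create LINK pointing at TARGET only if TARGET exists and LINK does not.
link_lib() {
    target=$1
    link=$2
    if [ ! -e "$target" ]; then
        echo "missing: $target" >&2
        return 1
    fi
    if [ -e "$link" ] || [ -L "$link" ]; then
        echo "already present: $link" >&2
        return 0
    fi
    ln -s "$target" "$link"
}

# Example (as root):
#   link_lib /usr/lib/libcrypto.so /usr/bin/libcrypto.so.0.9.7
#   link_lib /usr/lib/libssl.so    /usr/bin/libssl.so.0.9.7
```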

This was related to an attempt to set up VMware fencing in RH Cluster on VMware Server.