Posts Tagged ‘cluster’

Protect Vmware guest under RedHat Cluster

Monday, November 17th, 2008

Most documentation on the net is about how to run a cluster-in-a-box under VMware. Very few seem to care about protecting VMware guests under a real RedHat cluster with shared storage.

This article is about exactly that. While I would not recommend running VMware in such a setup, that is what I had to work with, and the VMware guest actually resides on the shared storage. Relocating it was out of the question, so migrating it together with the other cluster resources was the only valid option.

To do so, I have created a simple script which accepts start/stop/status arguments. The path to the VMware guest's VMX file is hard-coded into the script, but in an easy-to-change format. The script first attempts to suspend the VMware guest, and only if that does not complete in time does it shut the guest down. Mind you that the blog's HTML formatting might alter quotation marks into UTF-8 typographic marks which the shell will not understand.

#!/bin/bash
# This script will start/stop/status VMware machine
# Written by Ez-Aton
# http://www.tournament.org.il/run

# Hardcoded. Change to match your own settings!
VMWARE="/export/vmware/hosts/Windows_XP_Professional/Windows XP Professional.vmx"
VMRUN="/usr/bin/vmrun"
TIMEOUT=60

function status () {
  # This function will return success if the VM is up
  $VMRUN list | grep "$VMWARE" &>/dev/null
  if [[ "$?" -eq "0" ]]
  then
    echo "VM is up"
    return 0
  else
    echo "VM is down"
    return 1
  fi
}

function start () {
  # This function will start the VM
  $VMRUN start "$VMWARE"
  if [[ "$?" -eq "0" ]]
  then
    echo "VM is starting"
    return 0
  else
    echo "VM failed"
    return 1
  fi
}

function stop () {
  # This function will suspend the VM, and power it off if suspend does not finish in time
  $VMRUN suspend "$VMWARE"
  for i in `seq 1 $TIMEOUT`
  do
    if ! status
    then
      echo "VM Stopped"
      return 0
    fi
    sleep 1
  done
  # Suspend did not complete within $TIMEOUT seconds - stop the guest instead
  $VMRUN stop "$VMWARE" soft
  return $?
}

case "$1" in
start)     start
        ;;
stop)      stop
        ;;
status)   status
        ;;
esac
RET=$?

exit $RET

Since the formatting is killed by the blog, you can find the script here: vmware1

I intend to build a "real" RedHat Cluster agent script, but this should do for the time being.
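
For reference, a quick manual test of the script outside the cluster might look like this (assuming it was saved as, say, /usr/local/bin/vmware1 and made executable; that path is just an example, not part of the original setup):

chmod +x /usr/local/bin/vmware1
/usr/local/bin/vmware1 start     # prints "VM is starting" and exits 0 on success
/usr/local/bin/vmware1 status    # prints "VM is up" or "VM is down"
/usr/local/bin/vmware1 stop      # suspends the guest, or powers it off after TIMEOUT seconds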

Enjoy!

Raw devices for Oracle on RedHat (RHEL) 5

Tuesday, October 21st, 2008

There is major confusion among DBAs regarding how to set up raw devices for Oracle RAC or Oracle Clusterware. This confusion is caused by the change RedHat made in how raw devices are defined.

Raw devices are actually character devices pointing to block devices. Character devices are unbuffered, so I/O goes straight through with no OS cache, which is why Oracle likes them so much for the Clusterware OCR (CRS) and voting disks.

On other Unix flavors there are commonly two device nodes for each disk: a block device (e.g. /dev/dsk/c0d0t0s1) and a character device (e.g. /dev/rdsk/c0d0t0s1). This is not the case on Linux, so a special "raw" (character) device has to be defined for each partition we want to use in the cluster, whether as an OCR or a voting disk.
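
The difference is easy to see once a raw device is bound (the udev rules that do the binding follow below):

ls -l /dev/sdb1 /dev/raw/raw1
# the permissions column starts with "b" for the block device and "c" for the raw (character) device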

On RHEL4, raw devices were set up easily using the simple and coherent file /etc/sysconfig/rawdevices, which included an internal example. On RHEL5 this is no longer the case, and you have to customize the udev subsystem instead, in a rather poorly documented way.
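
For contrast, a binding on RHEL4 was a single line in /etc/sysconfig/rawdevices (quoted from memory, so treat it as a sketch), applied with "service rawdevices restart":

# /etc/sysconfig/rawdevices (RHEL4): raw device on the left, block device on the right
/dev/raw/raw1 /dev/sdb1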

Check out the source of this information, at this entry about raw devices. I will add it here, anyhow, with a slight explanation:

1. Add to /etc/udev/rules.d/60-raw.rules:

ACTION=="add", KERNEL=="sdb1", RUN+="/bin/raw /dev/raw/raw1 %N"

2. To set permission (optional, but required for Oracle RAC!), create a new /etc/udev/rules.d/99-raw-perms.rules containing lines such as:

KERNEL=="raw[1-2]", MODE="0640", GROUP="oinstall", OWNER="oracle"

Notice this:

  1. The 99-raw-perms.rules file name has to begin with the number 99, which sets its order when udev applies the rules, so that it runs after all the other rules. Using a lower number might leave the permissions incorrect.
  2. The following permissions have to apply (a quick verification sketch follows this list):
  • OCR device(s): root:oinstall, mode 0640
  • Voting device(s): oracle:oinstall, mode 0666
  • You don't have to use raw devices for ASM volumes on Linux, as the ASMLib library is very effective and easier to manage.
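
To verify that the binding and the permissions came out right (a sketch; command names are the RHEL5 ones, and the device names match the examples above):

# reload the udev rules and re-trigger them
udevcontrol reload_rules
start_udev
# query the raw bindings and check ownership and mode
raw -qa
ls -l /dev/raw/raw1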

    RedHat 4 working cluster (on VMware) config

    Sunday, November 11th, 2007

    I have been struggling with RedHat Cluster 4 using a VMware fencing device. This was also a good experience with qdiskd, the disk quorum directive, and its use. I have drawn several conclusions from this experience. First, the configuration, as is:

    <?xml version="1.0"?>
    <cluster alias="alpha_cluster" config_version="17" name="alpha_cluster">
    <quorumd interval="1" label="Qdisk1" min_score="3" tko="10" votes="3">
    <heuristic interval="2" program="ping vm-server -c1 -t1" score="10"/>
    </quorumd>
    <fence_daemon post_fail_delay="0" post_join_delay="3"/>
    <clusternodes>
    <clusternode name="clusnode1" nodeid="1" votes="1">
    <multicast addr="224.0.0.10" interface="eth0"/>
    <fence>
    <method name="1">
    <device name="vmware"
    port="/vmware/CLUSTER/Node1/Node1.vmx"/>
    </method>
    </fence>
    </clusternode>
    <clusternode name="clusnode2" nodeid="2" votes="1">
    <multicast addr="224.0.0.10" interface="eth0"/>
    <fence>
    <method name="1">
    <device name="vmware"
    port="/vmware/CLUSTER/Node2/Node2.vmx"/>
    </method>
    </fence>
    </clusternode>
    </clusternodes>
    <cman>
    <multicast addr="224.0.0.10"/>
    </cman>
    <fencedevices>
    <fencedevice agent="fence_vmware" ipaddr="vm-server" login="cluster"
    name="vmware" passwd="clusterpwd"/>
    </fencedevices>
    <rm>
    <failoverdomains>
    <failoverdomain name="cluster_domain" ordered="1" restricted="1">
    <failoverdomainnode name="clusnode1" priority="1"/>
    <failoverdomainnode name="clusnode2" priority="1"/>
    </failoverdomain>
    </failoverdomains>
    <resources>
    <fs device="/dev/sdb2" force_fsck="1" force_unmount="1" fsid="62307"
    fstype="ext3" mountpoint="/mnt/sdb1" name="data"
    options="" self_fence="1"/>
    <ip address="10.100.1.8" monitor_link="1"/>
    <script file="/usr/local/script.sh" name="My_Script"/>
    </resources>
    <service autostart="1" domain="cluster_domain" name="Test_srv">
    <fs ref="data">
    <ip ref="10.100.1.8">
    <script ref="My_Script"/>
    </ip>
    </fs>
    </service>
    </rm>
    </cluster>

    Several notes (the quorum-disk steps are also sketched as plain commands right after this list):

    1. You should run mkqdisk -c /dev/sdb1 -l Qdisk1 (or whatever device your quorum disk is on)
    2. qdiskd should be added to the chkconfig db (chkconfig --add qdiskd)
    3. qdiskd's start order should be changed from 22 to 20, so that it precedes cman
    4. Changes to fence_vmware according to the past directives, including Yoni's comment for RH4
    5. Changes in structure: instead of using two fence devices, I use only one fence device with different "ports". A port is translated to "-n" in fence_vmware, just as it is translated to "-n" in fence_brocade - fenced does the translation
    6. lock_gulmd should be turned off using chkconfig
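
    As plain commands, the quorum-disk notes above would look something like this (a sketch; adjust the device and runlevel to your own setup):

    mkqdisk -c /dev/sdb1 -l Qdisk1               # the label must match the quorumd "label" in cluster.conf
    chkconfig --add qdiskd
    chkconfig qdiskd on
    # one way to move qdiskd from S22 to S20 so it starts before cman (runlevel 3 shown here):
    mv /etc/rc.d/rc3.d/S22qdiskd /etc/rc.d/rc3.d/S20qdiskd
    chkconfig lock_gulmd off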

    A little about command-line version change:

    When you update the cluster.conf file, it is not enough to push the new file to ccsd using "ccs_tool update /etc/cluster/cluster.conf"; cman itself is still running with the older configuration version. Using "cman_tool version -r <new version>", you force cman onto the new version, so that other nodes can join after a reboot when they are already using the latest config version. If you fail to do so, other nodes might be rejected.
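
    For example, after bumping config_version in the XML above from 17 to 18, the sequence would be:

    ccs_tool update /etc/cluster/cluster.conf    # push the new configuration through ccsd
    cman_tool version -r 18                      # tell cman the new config version number
    cman_tool version                            # verify which version cman now runs with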

    I will add additional information as I move along.

    Single-Node Linux Heartbeat Cluster with DRBD on Centos

    Monday, October 23rd, 2006

    The trick is simple, and many of those who deal with HA clusters run into such a setup at least once: an HA cluster without the HA.

    Yep. Single node, just to make sure you know how to get this system to play.

    I have just completed such a setup with Linux Heartbeat, and wish to share this example of a single-node cluster with DRBD.

    First – get the packages.

    It took me some time, but following the Linux-HA suggested download link (funny enough, it was the last place I searched) gave me exactly what I needed. I have downloaded the following RPMs:

    heartbeat-2.0.7-1.c4.i386.rpm

    heartbeat-ldirectord-2.0.7-1.c4.i386.rpm

    heartbeat-pils-2.0.7-1.c4.i386.rpm

    heartbeat-stonith-2.0.7-1.c4.i386.rpm

    perl-Mail-POP3Client-2.17-1.c4.noarch.rpm

    perl-MailTools-1.74-1.c4.noarch.rpm

    perl-Net-IMAP-Simple-1.16-1.c4.noarch.rpm

    perl-Net-IMAP-Simple-SSL-1.3-1.c4.noarch.rpm

    I also had to add the following RPMs:

    perl-IO-Socket-SSL-1.01-1.c4.noarch.rpm

    perl-Net-SSLeay-1.25-3.rf.i386.rpm

    perl-TimeDate-1.16-1.c4.noarch.rpm

    I have added DRBD RPMS, obtained from YUM:

    drbd-0.7.21-1.c4.i386.rpm

    kernel-module-drbd-2.6.9-42.EL-0.7.21-1.c4.i686.rpm (Note: Make sure the module version fits your kernel!)

    As soon as I finished searching for dependent RPMS, I was able to install them all in one go, and so I did.

    Configuring DRBD:

    DRBD was the tricky part of the setup. It would not accept a missing peer node, and required me to actually lie. My /etc/drbd.conf looks as follows (thanks to the great assistance of linux-ha.org):

    resource web {
    protocol C;
    incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f"; #Replace later with halt -f
    startup { wfc-timeout 0; degr-wfc-timeout 120; }
    disk { on-io-error detach; } # or panic, …
    syncer {
    group 0;
    rate 80M; #1Gb/s network!
    }
    on p800old {
    device /dev/drbd0;
    disk /dev/VolGroup00/drbd-src;
    address 1.2.3.4:7788; #eth0 network address!
    meta-disk /dev/VolGroup00/drbd-meta[0];
    }
    on node2 {
    device /dev/drbd0;
    disk /dev/sda1;
    address 192.168.99.2:7788; #eth0 network address!
    meta-disk /dev/sdb1[0];
    }
    }

    I have had two major problems with this setup:

    1. I had no second node, so I left a "default" second node in place. I never did expect to use it.

    2. I had no free (non-partitioned) space on my disk. Luckily, I tend to install Centos/RH using the installation defaults unless some special need arises, so I could use the power of LVM: I disabled swap (swapoff -a), decreased the swap LV's size (lvresize -L -500M /dev/VolGroup00/LogVol01), created two logical volumes for the DRBD meta and source devices (lvcreate -n drbd-meta -L +128M VolGroup00 && lvcreate -n drbd-src -L +300M VolGroup00), reformatted the swap (mkswap /dev/VolGroup00/LogVol01), activated it (swapon -a) and formatted /dev/VolGroup00/drbd-src (mke2fs -j /dev/VolGroup00/drbd-src). I thus had the two additional volumes (the required minimum) and could operate this setup.
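
    The same steps as plain commands (exactly as described above; the volume group and LV names are from my default Centos install):

    swapoff -a                                   # stop using swap so its LV can be shrunk
    lvresize -L -500M /dev/VolGroup00/LogVol01   # shrink the swap LV by 500MB
    lvcreate -n drbd-meta -L +128M VolGroup00    # LV for the DRBD meta-data
    lvcreate -n drbd-src -L +300M VolGroup00     # LV backing the DRBD resource
    mkswap /dev/VolGroup00/LogVol01              # re-create the (smaller) swap
    swapon -a                                    # re-enable swap
    mke2fs -j /dev/VolGroup00/drbd-src           # ext3 on the DRBD source volume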

    With the space issue solved, I had to start DRBD for the first time. Per the Linux-HA DRBD manual, this is done by running the following commands:

    modprobe drbd

    drbdadm up all

    drbdadm -- --do-what-I-say primary all

    This brought DRBD up for the first time. Now I had to drop it back to secondary and concentrate on Heartbeat:

    drbdadm secondary all
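
    At any point, the resource state can be checked from /proc/drbd (on this single-node setup the connection state simply stays in WFConnection, since the peer never shows up):

    cat /proc/drbd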

    Heartbeat settings were as follows:

    /etc/ha.d/ha.cf:

    use_logd on #?Or should it be used?
    udpport 694
    keepalive 1 # 1 second
    deadtime 10
    initdead 120
    bcast eth0
    node p800old #`uname -n` name
    crm yes
    auto_failback off #?Or no
    compression bz2
    compression_threshold 2
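
    One more file Heartbeat needs before it will start is /etc/ha.d/authkeys, which must be mode 600. A minimal example (the shared secret here is just a placeholder):

    cat > /etc/ha.d/authkeys << 'EOF'
    auth 1
    1 sha1 SomeSharedSecret
    EOF
    chmod 600 /etc/ha.d/authkeys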

    I have also created a relevant /etc/ha.d/haresources, although I have never used it (this file has no effect when "crm yes" is set in ha.cf). I did, however, use it as a source for /usr/lib/heartbeat/haresources2cib.py:

    p800old IPaddr::1.2.3.10/8/1.255.255.255 drbddisk::web Filesystem::/dev/drbd0::/mnt::ext3 httpd

    It is clear that the virtual IP will be 1.2.3.10 in my class A network, and that DRBD has to come up before the storage is mounted. After all this, the application kicks in and brings up my web page. The application, Apache, was modified beforehand to listen on 1.2.3.10:80 and to use /mnt as its DocumentRoot.
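
    The relevant httpd.conf lines would look roughly like this (a sketch matching the description above):

    Listen 1.2.3.10:80
    DocumentRoot "/mnt"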

    I ran /usr/lib/heartbeat/haresources2cib.py on the file (no need to redirect the output, as it is written directly to /var/lib/heartbeat/crm/cib.xml), and I was ready to go.

    /etc/init.d/heartbeat start (with another terminal open on tail -f /var/log/messages), and Heartbeat is up. It took a few minutes to bring the resources up; however, I was more than happy to see it all work. Cool.

    The logic is quite simple, the idea is very basic, and as long as the system is managed correctly, there is no reason for it to reach a dangerous state. Moreover, since we're using DRBD, split brain cannot actually endanger the data, which compensates for the performance price we might pay in a real two-node HA environment built along these same guidelines.

    I cannot express enough gratitude to http://www.linux-ha.org, which is the source of all this (topped up with some common sense). Their documents are more than sufficient to set up a fully working HA environment.

    Setting up an AIX HA-CMP High Availability test Cluster

    Tuesday, July 4th, 2006

    This post is divided into this common-view part and (a first for this blog) a "click here for more" part.

    The main reason I created this blog was to document, both for myself and for other technical people, the steps required to perform given tasks.

    My first idea was to document how to install HACMP on AIX. For those of you who do not know what I’m talking about, HACMP is a general-purpose high availability cluster made by IBM, which can work on AIX, and if I’m not mistaken, on other platforms as well. It is based, actually, on a large set of "event" scripts, which run in a predefined order.

    Unlike other HA clusters, this is a fixed-order cluster. You can bring your application up after the disk is up, and after the IP is up. You cannot change this predefined order, and you have no freedom to set your own. Unless.

    Unless you create custom scripts, and use them as pre-event and post-event, naming them correctly and putting them in the right directories.

    This is not an easy cluster to manage. It has no flashy features, and it is not versatile like other HA clusters are (VCS being the best one, in my opinion, and MSCS, despite its tendency to reach race conditions, is quite versatile itself).

    It is a hard HA cluster, for hard-working people. It is meant for a single method of operation, and for a single track of mind. It is rather stable, if you don't go around adding volumes to VGs (know what you want before you do it).

    Below is a step-by-step list of actions, based on my work experience. I brought up two clusters (four nodes) while copy-pasting into a text document every action I performed, every package I installed, etc.

    It is meant for test purposes. It is not as highly available as it could be (it uses the same network infrastructure throughout), and it does not employ all the heartbeat connections it might have had; it is meant for lab purposes, and has done quite well there.

    It was installed on P5 machines, P510, if I’m not mistaken, using FastT200 (single port) for shared storage (single logical drive, small size – about 10GB), with Storage Manager 8.2 and adequate firmware.