Archive for the ‘Clusters’ Category

Oracle Clusterware as a 3rd party HA framework

Friday, June 12th, 2009

Oracle begin to push their Clusterware as a 3rd party HA framework. In this article we will review a quick example of how to do it. I will refer to this post as a quick-guide, as this is by no means any full-scale guide.

This article assumes you have installed Oracle Clusterware following one of the few links and guides available on the net. This quick-guide applies to both Clusterware 10 and Clusterware 11.

We will discuss the method of adding an additional NFS service on Linux.

In order to do so, you will need a shared storage – assuming the goal of the exercise is to supply the clients with a consistent storage services based on NFS. I, for myself, prefer to use OCFS2 as the choice file system for shared disks. This goes well with Oracle Clusterware, as this cluster framework does not handle disk mounts very well, and unless you are to write/search an agent which will make sure that every mount and umount behave correctly (you wouldn’t want to get a file system corruption, would you?), you will probably prefer to do the same. The lack of need to manage the disk mount actions will both save time on planned failover, and will guarantee storage safety. If you have not placed your CRS and Vote on OCFS2, you will need to install OCFS2 from here and here, and then to configure it. We will not discuss OCFS2 configuration in this post.

We will need to assume the following prerequisites:

  • Service-related IP address: 1.2.3.4. Netmask 255.255.255.248. You need this IP to be member of the same class as your public network card is.
  • Shared Storage: Formatted to OCFS2, and mounted on both nodes on /shared
  • Oracle Clusterware installed and working
  • Cluster nodes names are “node1” and “node2”
  • Have $CRS_HOME point to your CRS installation
  • Have $CRS_HOME/bin in your $PATH

We need to create the service-related IP resource first. I would recommend to have an entry in /etc/hosts for this IP address on both nodes. Assuming the public NIC is eth0, The command would be

crs_profile -create nfs_ip -t application -a $CRS_HOME/bin/usrvip -o oi=eth0,ov=1.2.3.4,on=255.255.255.248

Now you will need to set running permissions for the oracle user. In my case, the user name is actually “oracle”:

crs_setperm nfs_ip -o root
crs_serperm nfs_ip -u user:oracle:r-x

Test that you can start the service as the oracle user:

crs_start nfs_ip

Now we need to setup NFS. For this to work, we need to setup the NFS daemon first. Edit /etc/exports and add a line such as this:

/shared *(rw,no_root_sqush,sync)

Make sure that nfs service is disabled during startup:

chkconfig nfs off
chkconfig nfslock off

Now is the time to setup Oracle Clusterware for the task:

crs_profile -create share_nfs -t application -B /etc/init.d/nfs -d “Shared NFS” -r nfs_ip -a sharenfs.scr -p favored -h “node1 node2” -o ci=30,ft=3,fi=12,ra=5
crs_register share_nfs

Deal with permissions:

crs_setperms share_nfs -o root
crs_setperms share_nfs -u user:oracle:r-x

Fix the “sharenfs.scr” script. First, find it. It should reside in $CRS_HOME/crs/scripts if everything is OK. If not, you will be able to find it in $CRS_HOME using find.

Edit the “sharenfs.scr” script and modify the following variables which are defined relatively in the beginning of the script:

PROBE_PROCS=”nfsd”
START_APPCMD=”/etc/init.d/nfs start
START_APPCMD2=”/etc/init.d/nfslock start”
STOP_APPCMD=”/etc/init.d/nfs stop”
STOP_APPCMD2=”/etc/init.d/nfslock stop”

Copy the modified script file to the other node. Verify this script has execution permissions on both nodes.

Start the service as the oracle user:

crs_start sharenfs

Test the service. The following command should return the export path:

showmount -e 1.2.3.4

Relocate the service and test again:

crs_relocate -f sharenfs
showmount -e 1.2.3.4

Done. You now have HA NFS service above Oracle Clusterware framework.

I used this web page as a reference. I thank him for his great work!

RedHat Cluster custom Oracle “Agent”/script V1.0

Friday, April 24th, 2009

Working with RH Cluster quite a lot, I have decided to create an online store of customer agents/scripts.

I have not, so far, invested the effort of making these agents accept settings from the cluster.conf file, but this might happen.

Let the library be!

Oracle DB script/agent:

Although I discovered (a bit late) that RH Cluster for Oracle Ent. Linux 5.2 does include oracle DB agent, this script should be good enough for RHEL4 RH Cluster versions as well.

This script only checks that the ‘smon’ process is up. Nothing fancy. This script can include, in the future, the ability to check that Oracle responses to SQL queries (meaning – actually working).

#!/bin/bash
#Service script for Oracle DB under RH Cluster
#Written by Ez-Aton
#http://run.tournament.org.il

# Global variables
ORACLE_USER=oracle
HOMEDIR=/home/$ORACLE_USER
OVERRIDE_FILE=/var/tmp/oracle_override
REC_LIST="[email protected]"

function override () {
	if [ -f $OVERRIDE_FILE ]
	then
		exit 0
	fi
}

function start () {
	su - $ORACLE_USER -c ". $HOMEDIR/.bash_profile ; sqlplus / as sysdba << EOF
startup
EOF
"
	status
}

function stop () {
	su - $ORACLE_USER -c ". $HOMEDIR/.bash_profile ; sqlplus / as sysdba << EOF
shutdown immediate
EOF
"
	status && return 1 || return 0
}

function status () {
	ps -afu $ORACLE_USER | grep -v grep | grep smon
	return $?
}

function notify () {
	mail -s "$1 oracle on `hostname`" $REC_LIST < /dev/null
}

override
case "$1" in
start)	start
	notify $1
	;;
stop)	stop
#	notify $1
	;;
status)	status
	;;
*)	echo "Usage: $0 start|stop|status"
	;;
esac

I usually place this script (with execution permissions, of course) in /usr/local/sbin and call it as a “script” from the cluster configuration. You will probably be required to alter the first few variable lines to match to your environment.

Listener Agent/script:

The tnslsnr should be started/stopped as well, if we want the $ORACLE_HOME to migrate as well. This is its agent/script:

#!/bin/bash
#Service script for Oracle DB under RH Cluster
#Written by Ez-Aton
#http://run.tournament.org.il

ORACLE_USER=oracle
HOMEDIR=/home/$ORACLE_USER
OVERRIDE_FILE=/var/tmp/oracle_override

function override () {
if [ -f $OVERRIDE_FILE ]
then
exit 0
fi
}

function start () {
su - $ORACLE_USER -c ". $HOMEDIR/.bash_profile ; lsnrctl start"
status
}

function stop () {
su - $ORACLE_USER -c ". $HOMEDIR/.bash_profile ; lsnrctl stop"
status && return 1 || return 0
}

function status () {
su - $ORACLE_USER -c ". $HOMEDIR/.bash_profile ; lsnrctl status"
}

override
case "$1" in
start)    start
;;
stop)    stop
;;
status)    status
;;
*)    echo "Usage: $0 start|stop|status"
;;
esac

Again – place it in /usr/local/sbin and call it from the cluster configuration file as type “script”.

I will add more agents and more resources for RedHat Cluster in the future.

Tips and tricks for Redhat Cluster

Saturday, January 31st, 2009

Redhat Cluster is a nice HA product. I have been implementing it for a while now, lecturing about it, and yes – I like it. But like any other software product, it has few flaws and issues which you should take under consideration – especially when you create custom “agents” – plugins to control (start/stop/status) your 3rd party application.

I want to list several tips and good practices which will help you create your own agent or custom script, and will help you sleep better at night.

Nighty Night: Sleeping is easier when your cluster is quiet. It usually means that  you don’t want the cluster to suddenly failover during night time, or – for that matter, during any hour, unexpectedly.
Below are some tips to help you sleep better, or to perform an easier postmortem of any cluster failure.

Chop the Logs: Since RedHat Cluster logging might be hidden and filled with lots of irrelevant information, you want your agents to be nice about it. Let them log out somewhere the result of running “status” or “stop” or even “start”. Of course – either recycle the output logs, or rotate them away. You could use

exec &>/tmp/my_script_name.out

much like HACMP does (or at least – behaves as if it does). You can also use specific logging facility for different subsystems of the cluster (cman, rg, qdiskd)

Mind the Gap: Don’t trust unknown scripts or applications’ return codes. Your cluster will fail miserably if a script or a file you expect to run will not be there. Do not automatically assume that the vmware script, for example, will return normal values. Check the return codes and decide how to respond accordingly.

Speak the Language: A service in RedHat Cluster is a combination of one or more resources. This can be somewhat confusing as we tend to refer to resources as a (system) service. Use the correct lingo.  I will try to do just that in this document, so heed the difference between the terms “service” and “system service”, which can be a cluster resource.

Divide and Conquer: Split your services to the minimal set of resources possible. If your service consists of hundreds of resources failure to one of them could cause the entire service to restart, taking down all other working resources. If you keep it to the minimum, you actually protect yourself.

Trust No One: To stress out the “Mind the Gap” point above – don’t trust 3rd party scripts or applications to return a correct error code. Don’t trust their configuration files, and don’t trust the users to “do it right”. They will not. Try to create your own services as fault-protected as possible. Don’t crash because some stupid user (or a stupid administrator – for a cluster implementer both are the same, right?) used incorrect input parameters, or because he has kept  an important configuration file in a different name than was required.

I have some special things I want to do with regard to RedHat Cluster Suite. Stay tuned 🙂

Redhat Cluster NFS client service – things to notice

Friday, January 16th, 2009

I encountered an interesting bug/feature of RHCS on RHEL4.

A snip of my configuration looks like this:

<resources>
    <fs device="/dev/mapper/mpath6p1" force_fsck="1" force_umount="1" fstype="ext3" name="share_prd" mountpint="/share_prd" options="" self_fence="0" fsid="02001"/>
    <nfsexport name="nfs export4"/>
    <nfsclient name="all ro" target="192.168.0.0/255.255.255.0" options="ro,no_root_sqush,sync"/>
    <nfsclient name="app1" target="app1" options="rw,no_root_squash,sync"/>
</resources>

<service autostart="1" domain="prd" name="prd" nfslock="1">
    <fs ref="share_prd">
       <nfsexport ref="nfs export 4">
          <nfsclient ref="all ro"/>
          <nfsclient ref="app1"/>
       </nfsexport>
    </fs>
</service>

This setup was working just fine, until a glitch in the DNS occurred.This glitch resulted in inability to resolve names (which were not present inside /etc/hosts at this time), and lead to a failover with the following error:

clurgmgrd: [7941]: <err> nfsclient:app1 is missing!

All range-based nfsclient agents seemed to function correctly. I could manage to look into it only a while later (after setting simple range-based allow-all access), and through some googling, I found out this explanation – it was a change of how the agent responds to “status” command.

I should have looked inside /var/lib/nfs/etab and see that app1 server appeared with its full name. I changed the resource settings to reflect it:

<nfsclient name="app1" target="app1.mydomain.org" options="rw,no_root_squash,sync"/>

and it seems to work just fine now.

Persistent raw devices for Oracle RAC with iSCSI

Saturday, December 6th, 2008

If you’re into Oracle RAC over iSCSI, you should be rather content – this configuration is a simple and supported. However, working with some iSCSI target devices, order and naming is not consistent between both Oracle nodes.

The simple solutions are by using OCFS2 labels, or by using ASM, however, if you decide to place your voting disks and cluster registry disks on raw devices, you are to face a problem.

iSCSI on RHEL5:

There are few guides, but the simple method is this:

  1. Configure mapping in your iSCSI target device
  2. Start the iscsid and iscsi services on your Linux
    • service iscsi start
    • service iscsid start
    • chkconfig iscsi on
    • chkconfig iscsid on
  3. Run “iscsiadm -m discovery -t st -p target-IP
  4. Run “iscsiadm -m node -L all
  5. Edit /etc/iscsi/send_targets and add to it the IP address of the target for automatic login on restart

You need to configure partitioning according to the requirements.

If you are to setup OCFS2 volumes for the voting and for the cluster registry, there should not be a problem as long as you use labels, however, if you require raw volumes, you need to change udev to create your raw devices for you.

On a system with persistent disk naming, follow this process, however, on a system with changing disk names (every reboot names are different), the process can become a bit more complex.

First, detect your scsi_id for each device. While names might change upon reboots, scsi_ids do not.

scsi_id -g -u -s /block/sdc

Replace sda with the device name you are looking for. Notice that /block/sda is a reference to /sys/block/sdc

Use the scsi_id generated by that to create the raw devices. Edit /etc/udev/rules.d/50-udev.rules and find line 298. Add a line below with the following contents:

KERNEL==”sd*[0-9]”, ENV{ID_SERIAL}==”14f70656e66696c000000000004000000010f00000e000000″, SYMLINK+=”disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}-part%n” ACTION==”add” RUN+=”/bin/raw /dev/raw/raw%n %N”

Things to notice:

  1. The ENV{ID_SERIAL} is the same scsi_id obtained earlier
  2. This line will create a raw device in the name of raw and number in /dev/raw for each partition
  3. If you want to differtiate between two (or more) disks, change the name from raw to an aduqate name, like “crsa”, “crsb”, etc, for example:

KERNEL==”sd*[0-9]”, ENV{ID_SERIAL}==”14f70656e66696c000000000005000000010f00000e000000″, SYMLINK+=”disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}-part%n” ACTION==”add” RUN+=”/bin/raw /dev/raw/crs%n %N”

Following these changes, run “udevtrigger” to reload the rules. Be advised that “udevtrigger” might reset network connection.