Archive for June, 2009

Oracle Clusterware as a 3rd party HA framework

Friday, June 12th, 2009

Oracle begin to push their Clusterware as a 3rd party HA framework. In this article we will review a quick example of how to do it. I will refer to this post as a quick-guide, as this is by no means any full-scale guide.

This article assumes you have installed Oracle Clusterware following one of the few links and guides available on the net. This quick-guide applies to both Clusterware 10 and Clusterware 11.

We will discuss the method of adding an additional NFS service on Linux.

In order to do so, you will need a shared storage – assuming the goal of the exercise is to supply the clients with a consistent storage services based on NFS. I, for myself, prefer to use OCFS2 as the choice file system for shared disks. This goes well with Oracle Clusterware, as this cluster framework does not handle disk mounts very well, and unless you are to write/search an agent which will make sure that every mount and umount behave correctly (you wouldn’t want to get a file system corruption, would you?), you will probably prefer to do the same. The lack of need to manage the disk mount actions will both save time on planned failover, and will guarantee storage safety. If you have not placed your CRS and Vote on OCFS2, you will need to install OCFS2 from here and here, and then to configure it. We will not discuss OCFS2 configuration in this post.

We will need to assume the following prerequisites:

  • Service-related IP address: 1.2.3.4. Netmask 255.255.255.248. You need this IP to be member of the same class as your public network card is.
  • Shared Storage: Formatted to OCFS2, and mounted on both nodes on /shared
  • Oracle Clusterware installed and working
  • Cluster nodes names are “node1” and “node2”
  • Have $CRS_HOME point to your CRS installation
  • Have $CRS_HOME/bin in your $PATH

We need to create the service-related IP resource first. I would recommend to have an entry in /etc/hosts for this IP address on both nodes. Assuming the public NIC is eth0, The command would be

crs_profile -create nfs_ip -t application -a $CRS_HOME/bin/usrvip -o oi=eth0,ov=1.2.3.4,on=255.255.255.248

Now you will need to set running permissions for the oracle user. In my case, the user name is actually “oracle”:

crs_setperm nfs_ip -o root
crs_serperm nfs_ip -u user:oracle:r-x

Test that you can start the service as the oracle user:

crs_start nfs_ip

Now we need to setup NFS. For this to work, we need to setup the NFS daemon first. Edit /etc/exports and add a line such as this:

/shared *(rw,no_root_sqush,sync)

Make sure that nfs service is disabled during startup:

chkconfig nfs off
chkconfig nfslock off

Now is the time to setup Oracle Clusterware for the task:

crs_profile -create share_nfs -t application -B /etc/init.d/nfs -d “Shared NFS” -r nfs_ip -a sharenfs.scr -p favored -h “node1 node2” -o ci=30,ft=3,fi=12,ra=5
crs_register share_nfs

Deal with permissions:

crs_setperms share_nfs -o root
crs_setperms share_nfs -u user:oracle:r-x

Fix the “sharenfs.scr” script. First, find it. It should reside in $CRS_HOME/crs/scripts if everything is OK. If not, you will be able to find it in $CRS_HOME using find.

Edit the “sharenfs.scr” script and modify the following variables which are defined relatively in the beginning of the script:

PROBE_PROCS=”nfsd”
START_APPCMD=”/etc/init.d/nfs start
START_APPCMD2=”/etc/init.d/nfslock start”
STOP_APPCMD=”/etc/init.d/nfs stop”
STOP_APPCMD2=”/etc/init.d/nfslock stop”

Copy the modified script file to the other node. Verify this script has execution permissions on both nodes.

Start the service as the oracle user:

crs_start sharenfs

Test the service. The following command should return the export path:

showmount -e 1.2.3.4

Relocate the service and test again:

crs_relocate -f sharenfs
showmount -e 1.2.3.4

Done. You now have HA NFS service above Oracle Clusterware framework.

I used this web page as a reference. I thank him for his great work!

RHEL5 100% CPU with LDAP client for Active Directory

Friday, June 5th, 2009

ADS integration has been available natively since Windows 2003 R2, and in heterogeneous sites this has become the preferred method of integrating login information, as well as utilizing the added security of using Kerberos wherever possible.

The following guide is a very good one, and was the source of information I have used throughout my work integrating Linux into ADS. So far it has worked quite well for RHEL4.

RHEL5, on the other hand, is a different story. While it can work, and ldap queries return sensible results, it is too common for a process to utilize 100% CPU while doing absolutely nothing.

My research brought me to the following conclusions:

  • The high CPU utilization is being caused by something RHEL5 specific (tested to work correctly for RHEL4)
  • High CPU utilization is caused by nss_ldap module.
  • Yes, it does happen to every nss related service. NSCD does not help, and gets to 100% CPU also.
  • Tracing to nss_ldap modules return after a very long time (if ever) that the session to the ADS server has somehow hanged.

You can see an example of this bug in this specific bugzilla entry.

A quick and effective workaround was used after examining the differences between configuration directives for RHEL4 and RHEL5. Forcing LDAP version 2 instead of 3 (which is the default for RHEL5 ldap client, as it attempts the highest version possible) results in a correct behavior.The line in /etc/ldap.conf is:

ldap_version 2

FYI