Posts Tagged ‘rhel4’

Redhat Cluster NFS client service – things to notice

Friday, January 16th, 2009

I encountered an interesting bug/feature of RHCS on RHEL4.

A snip of my configuration looks like this:

<resources>
    <fs device="/dev/mapper/mpath6p1" force_fsck="1" force_umount="1" fstype="ext3" name="share_prd" mountpoint="/share_prd" options="" self_fence="0" fsid="02001"/>
    <nfsexport name="nfs export4"/>
    <nfsclient name="all ro" target="192.168.0.0/255.255.255.0" options="ro,no_root_squash,sync"/>
    <nfsclient name="app1" target="app1" options="rw,no_root_squash,sync"/>
</resources>

<service autostart="1" domain="prd" name="prd" nfslock="1">
    <fs ref="share_prd">
       <nfsexport ref="nfs export4">
          <nfsclient ref="all ro"/>
          <nfsclient ref="app1"/>
       </nfsexport>
    </fs>
</service>

This setup was working just fine until a glitch in the DNS occurred. The glitch made it impossible to resolve names (which were not present in /etc/hosts at the time), and led to a failover with the following error:

clurgmgrd: [7941]: <err> nfsclient:app1 is missing!

All range-based nfsclient agents kept functioning correctly. I only managed to look into it some time later (after setting up a simple range-based allow-all export as a workaround), and some googling turned up the explanation: the way the agent responds to the “status” command had changed.

I should have looked inside /var/lib/nfs/etab, where I would have seen that the app1 server appears under its full name. I changed the resource settings to reflect it:

<nfsclient name="app1" target="app1.mydomain.org" options="rw,no_root_squash,sync"/>

and it seems to work just fine now.
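
The check I skipped can be sketched as a small shell helper. This is only an illustration: it assumes the usual etab layout of one export per line (path, whitespace, then client(options)), and the names ETAB, exported_clients and target_is_exported are made up for this example, not part of RHCS.

```shell
# Path to the kernel NFS export table; overridable (assumption, for testing).
ETAB="${ETAB:-/var/lib/nfs/etab}"

# Print the client part of every etab entry, without the "(options)" suffix.
exported_clients() {
    awk '{ sub(/\(.*/, "", $2); print $2 }' "$ETAB"
}

# Succeed only if the given nfsclient target appears in etab exactly as written.
target_is_exported() {
    exported_clients | grep -Fxq -- "$1"
}

# Example: target_is_exported app1 || echo "app1 is not listed under that name"
```

If the short name fails this check while the fully qualified name passes, the target attribute of the nfsclient resource is the place to look.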

Oracle RAC with EMC iSCSI Storage Panics

Tuesday, October 14th, 2008

I have had a system panic when running the following configuration:

  • RedHat RHEL 4 Update 6 (4.6) 64bit (x86_64)
  • Dell PowerEdge servers
  • Oracle RAC 11g with Clusterware 11g
  • EMC iSCSI storage
  • EMC PowerPath
  • Vote and Registry LUNs are accessible as raw devices
  • Data files are accessible through ASM with libASM

During reboots or shutdowns, the system would panic just before the actual power cycle. Unfortunately, I do not have a screen capture of the panic…

Tracing the problem, it seems that iSCSI, PowerIscsi (EMC PowerPath for iSCSI) and networking services are being brought down before “killall” service stops the CRS.

The init.crs service script was never actually executed with the “stop” argument during shutdown, because it never left a lock file (for example, in /var/lock/subsys) and the rc script only stops services whose lock file exists. Its presence in /etc/rc.d/rc6.d and /etc/rc.d/rc0.d was therefore merely for show.

I have solved it by changing /etc/init.d/init.crs script a bit:

  • On “Start” action, touch a file called /var/lock/subsys/init.crs
  • On “Stop” action, remove a file called /var/lock/subsys/init.crs
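
The two changes above could look roughly like the following fragment. It is a sketch, not the actual Oracle script: the function name crs_lock and the overridable LOCKFILE variable are my own, and in the real init.crs the touch and rm would simply sit inside the existing start and stop branches.

```shell
# Lock file that tells the rc script this service is running.
# Overridable only so the fragment can be tried outside /var/lock (assumption).
LOCKFILE="${LOCKFILE:-/var/lock/subsys/init.crs}"

crs_lock() {
    case "$1" in
        start) touch "$LOCKFILE" ;;    # mark the service as started
        stop)  rm -f "$LOCKFILE" ;;    # clear the mark once stopped
        *)     echo "usage: crs_lock {start|stop}" >&2; return 1 ;;
    esac
}
```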

Also, although I’m not sure it is necessary, I have changed the SysV execution order of the init.crs script in /etc/rc.d/rc0.d and /etc/rc.d/rc6.d from wherever it was (K96 in one case and K76 in another) to K01, so that it is executed with the “stop” parameter early in the shutdown or reboot cycle.
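
Renaming the kill links by hand is easy enough, but as a sketch it could be scripted like this. The function name prioritize_crs_stop is made up for this example, and it assumes the links follow the usual K<two digits>init.crs naming.

```shell
# Rename any K??init.crs kill link in the given runlevel directory to K01init.crs.
prioritize_crs_stop() {
    dir="$1"
    for link in "$dir"/K[0-9][0-9]init.crs; do
        # Skip the unexpanded pattern when nothing matched.
        if [ ! -e "$link" ] && [ ! -L "$link" ]; then
            continue
        fi
        # Nothing to do if the link is already first in line.
        if [ "$link" = "$dir/K01init.crs" ]; then
            continue
        fi
        mv "$link" "$dir/K01init.crs"
    done
}

# Example (as root):
# for d in /etc/rc.d/rc0.d /etc/rc.d/rc6.d; do prioritize_crs_stop "$d"; done
```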

This solved the problem, although anyone upgrading Oracle Clusterware in the future needs to be aware of this change.

Hot adding Qlogic LUNs – the new method

Friday, August 8th, 2008

I have previously demonstrated how to hot-add LUNs to a Linux system with a Qlogic HBA. That approach has been made irrelevant by a newer method, available on RHEL4 Update 3 and above.

The new method is as follows:

echo 1 > /sys/class/fc_host/host<ID>/issue_lip
echo "- - -" > /sys/class/scsi_host/host<ID>/scan

Replace “<ID>” with your relevant HBA ID.

Notice: due to the blog formatting, the 2nd line might appear incorrect. The quoted string is three plain ASCII hyphens separated by spaces (wildcards for channel, target and LUN), and not some specially formatted Unicode dash.
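
On a machine with several HBAs, the same two writes can be repeated per host. The loop below is a sketch under a few assumptions: it must run as root, the function name rescan_fc_hosts is made up here, and SYSFS exists only so the loop can be pointed at a scratch directory instead of the real /sys.

```shell
# Root of sysfs; overridable for a dry run (assumption, for testing).
SYSFS="${SYSFS:-/sys}"

# Issue a LIP and a full wildcard scan on every fibre channel host found.
rescan_fc_hosts() {
    for fc in "$SYSFS"/class/fc_host/host*; do
        [ -d "$fc" ] || continue          # no FC hosts present
        host=${fc##*/}                    # e.g. "host3"
        echo 1 > "$fc/issue_lip"
        echo "- - -" > "$SYSFS/class/scsi_host/$host/scan"
    done
}
```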

Bonding in RedHat RHEL4

Sunday, July 20th, 2008

It is rather common knowledge by now that on RHEL4 you need to add the following line to /etc/modprobe.conf when you want more than one bonding interface:

options bonding max_bonds=2

Then you attempt to use a trick to address the different bonding devices by name (that is, bond0 and bond1, and maybe bond2, etc.), using an option as follows in your /etc/modprobe.conf:

options bond1 -o bond1 miimon=100

It works perfectly fine, until you try to set different parameters for your bonding devices, such as in this example (again, from /etc/modprobe.conf):

options bond0 -o bond0 mode=1 miimon=100
options bond1 -o bond1 mode=1 arp_validate=1 arp_ip_target=1.2.3.4 arp_interval=1000

These different options will not work. The 2nd (and every subsequent) bonding device will use bond0’s settings.

A note about this can be found in /usr/share/doc/kernel-doc-2.6.9/Documentation/networking/bonding.txt (requires the package “kernel-doc”):

NOTE: It has been observed that some Red Hat supplied kernels are apparently unable to rename modules at load time (the “-o bond1” part).  Attempts to pass that option to modprobe will produce an “Operation not permitted” error.  This has been reported on some Fedora Core kernels, and has been seen on RHEL 4 as well.  On kernels exhibiting this problem, it will be impossible to configure multiple bonds with differing parameters.

Without the ability to rename modules, we cannot set any bond-specific options through /etc/modprobe.conf.

An option that is not documented in /usr/share/doc/initscripts-7.93.31.EL/sysconfig.txt (part of the “initscripts” package) is to remove any bond-specific parameters from /etc/modprobe.conf and to add a line such as the following to /etc/sysconfig/network-scripts/ifcfg-bondX:

BONDING_MODULE_OPTS='miimon=100 primary=eth0'

Here you can state your bonding options, and when you restart your networking (provided the “bonding” module is actually unloaded during that process), your bonds will behave as you expect them to.
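
To see whether a bond really picked up its own settings, the driver's status file under /proc/net/bonding can be read. The helper below is a sketch: bond_setting and the overridable PROC_BONDING variable are names I made up, and the key must be given exactly as it appears before the colon in that file (for example “Bonding Mode” or “MII Polling Interval (ms)”).

```shell
# Directory holding one status file per bond; overridable (assumption, for testing).
PROC_BONDING="${PROC_BONDING:-/proc/net/bonding}"

# Print the value of the first "Key: value" line matching the given key.
bond_setting() {
    bond="$1"
    key="$2"
    awk -F': ' -v key="$key" '$1 == key { print $2; exit }' "$PROC_BONDING/$bond"
}

# Example: bond_setting bond1 "Bonding Mode"
```

Comparing the output for bond0 and bond1 shows quickly whether the second bond silently inherited the first one's settings.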

One small thing I have yet to confirm is how the bonding device behaves if settings are changed without unloading the “bonding” module between the ifdown and ifup commands.

Vlan Tagging with bonding network interface on RHEL4

Saturday, May 17th, 2008

This is not a simple task, as there are a few things which need to happen for it to work.

First – the switch port should support vlan tagging (of course, right?)

I have used vlan2 for “external” network, and vlan3 for “internal” network.

My configuration looks like this:

ifcfg-eth0:

DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
ISALIAS=no

ifcfg-eth1:

DEVICE=eth1
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
ISALIAS=no

ifcfg-bond0:

DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes

ifcfg-bond0.2:

DEVICE=bond0.2
BOOTPROTO=static
IPADDR=1.2.3.4
NETMASK=255.255.255.0
ONBOOT=yes
VLAN=yes

ifcfg-bond0.3:

DEVICE=bond0.3
BOOTPROTO=static
IPADDR=192.168.0.1
NETMASK=255.255.255.0
ONBOOT=yes
VLAN=yes
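
Since the two VLAN files differ only in the VLAN number and the address, they can be generated. The function below is a sketch that writes the same keys as the files above; the name write_vlan_ifcfg and its argument order are made up for this example.

```shell
# Write an ifcfg file for a tagged VLAN on top of a parent device.
# Arguments: target directory, parent device, VLAN ID, IP address, netmask.
write_vlan_ifcfg() {
    dir="$1"; parent="$2"; vlan="$3"; ip="$4"; mask="$5"
    cat > "$dir/ifcfg-$parent.$vlan" <<EOF
DEVICE=$parent.$vlan
BOOTPROTO=static
IPADDR=$ip
NETMASK=$mask
ONBOOT=yes
VLAN=yes
EOF
}

# Example (as root):
# write_vlan_ifcfg /etc/sysconfig/network-scripts bond0 3 192.168.0.1 255.255.255.0
```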

I hope it helps anyone who is into vlan tagging over bonding interfaces.