Posts Tagged ‘Red Hat’

Oracle RAC with EMC iSCSI Storage Panics

Tuesday, October 14th, 2008

I had a system that kept panicking when running the following configuration:

  • RedHat RHEL 4 Update 6 (4.6) 64bit (x86_64)
  • Dell PowerEdge servers
  • Oracle RAC 11g with Clusterware 11g
  • EMC iSCSI storage
  • EMC PowerPath
  • Vote and Registry LUNs are accessible as raw devices
  • Data files are accessible through ASM with libASM

During reboots or shutdowns, the system used to panic just before the actual power cycle. Unfortunately, I do not have a screen capture of the panic…

Tracing the problem, it seems that iSCSI, PowerIscsi (EMC PowerPath for iSCSI) and networking services are being brought down before “killall” service stops the CRS.

The init.crs service script was never executed with the “stop” argument by the regular service shutdown sequence, because it never left a lock file (for example, in /var/lock/subsys). Its links in /etc/rc.d/rc6.d and /etc/rc.d/rc0.d therefore had no real effect.

I solved it by changing the /etc/init.d/init.crs script a bit:

  • On “Start” action, touch a file called /var/lock/subsys/init.crs
  • On “Stop” action, remove a file called /var/lock/subsys/init.crs
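
The two changes can be sketched roughly as follows. This is a minimal illustration rather than the actual Oracle script: the function name crs_lock and the overridable LOCKFILE variable are hypothetical, and in the real init.crs the touch and rm lines would simply sit inside the existing start and stop branches.

```shell
#!/bin/sh
# Sketch of the lock-file handling added to init.crs.
# LOCKFILE defaults to the path from the post; it is a variable only so
# that the sketch can be tried outside /var/lock/subsys.
LOCKFILE="${LOCKFILE:-/var/lock/subsys/init.crs}"

# crs_lock is a hypothetical helper name used for illustration.
crs_lock() {
    case "$1" in
        start)
            # Leave a lock file so the shutdown sequence knows the
            # service is running and calls the script with "stop".
            touch "$LOCKFILE"
            ;;
        stop)
            # Remove the lock file once the service has been stopped.
            rm -f "$LOCKFILE"
            ;;
        *)
            echo "usage: crs_lock {start|stop}" >&2
            return 1
            ;;
    esac
}
```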

Also, although I’m not sure it is necessary, I changed the SysV execution order of the init.crs script in /etc/rc.d/rc0.d and /etc/rc.d/rc6.d from wherever it was (K96 in one case and K76 in another) to K01, so that it is executed with the “stop” parameter early in the shutdown or reboot cycle.
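
A rough sketch of that renaming, assuming the links are named KNNinit.crs as in the post. The helper name promote_kill_link is made up for illustration; on a real system it would be run as root against /etc/rc.d/rc0.d and /etc/rc.d/rc6.d.

```shell
#!/bin/sh
# Rename any K??init.crs link in the given rc directory to K01init.crs.
# promote_kill_link is a hypothetical helper name.
promote_kill_link() {
    dir="$1"
    for link in "$dir"/K[0-9][0-9]init.crs; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$link" ] || [ -L "$link" ] || continue
        # Nothing to do if the link is already K01.
        [ "$link" = "$dir/K01init.crs" ] && continue
        mv "$link" "$dir/K01init.crs"
    done
}
```

For example, promote_kill_link /etc/rc.d/rc0.d followed by promote_kill_link /etc/rc.d/rc6.d.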

This solved the problem, although anyone upgrading Oracle Clusterware in the future will need to be aware of this change.

Bonding in RedHat RHEL4

Sunday, July 20th, 2008

It is fairly common knowledge by now that on RHEL4 you need the following line in /etc/modprobe.conf when you want more than one bonding interface:

options bonding max_bonds=2

Then you attempt to use a trick to address the different bonding devices by name (i.e. bond0 and bond1, and maybe bond2, etc.), using an option such as the following in your /etc/modprobe.conf:

options bond1 -o bond1 miimon=100

It works perfectly fine, until you try to set different parameters for your bonding devices, such as in this example (again, from /etc/modprobe.conf):

options bond0 -o bond0 mode=1 miimon=100
options bond1 -o bond1 mode=1 arp_validate=1 arp_ip_target=1.2.3.4 arp_interval=1000

These per-device options will not take effect. The second (and every subsequent) bonding device will use bond0’s settings.

A note about this can be found in /usr/share/doc/kernel-doc-2.6.9/Documentation/networking/bonding.txt (requires the package “kernel-doc”):

NOTE: It has been observed that some Red Hat supplied kernels are apparently unable to rename modules at load time (the “-o bond1” part).  Attempts to pass that option to modprobe will produce an “Operation not permitted” error.  This has been reported on some Fedora Core kernels, and has been seen on RHEL 4 as well.  On kernels exhibiting this problem, it will be impossible to configure multiple bonds with differing parameters.

Without the ability to rename modules, we are unable to set any bond-specific options through /etc/modprobe.conf.

An option that is not documented in /usr/share/doc/initscripts-7.93.31.EL/sysconfig.txt (part of the “initscripts” package) is to remove any bond-specific parameters from /etc/modprobe.conf and to add a line such as the following to /etc/sysconfig/network-scripts/ifcfg-bondX:

BONDING_MODULE_OPTS='miimon=100 primary=eth0'

Here you can state your bonding options, and when you restart networking (provided you actually unload the “bonding” module during that process), your bonds will behave as you expect them to.
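
As an illustration, a per-bond file might look like the fragment below. It is only a sketch: BONDING_MODULE_OPTS is the variable name given in this post, the bonding options are the bond1 settings from the earlier modprobe.conf example, and the remaining keys are ordinary ifcfg settings with placeholder values (192.0.2.10 is a documentation address, not a real one).

```shell
# /etc/sysconfig/network-scripts/ifcfg-bond1 (illustrative sketch)
DEVICE=bond1
ONBOOT=yes
BOOTPROTO=none
IPADDR=192.0.2.10        # placeholder address, not from the post
NETMASK=255.255.255.0    # placeholder netmask
# Per-bond driver options, instead of "options bond1 ..." in /etc/modprobe.conf
BONDING_MODULE_OPTS='mode=1 arp_validate=1 arp_ip_target=1.2.3.4 arp_interval=1000'
```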

One small thing I have yet to confirm is how the bonding device behaves if settings are changed without unloading the “bonding” module between the ifdown and ifup commands.

RHEL4 tends to change network interfaces names

Friday, July 6th, 2007

RHEL4 tends to change the names of network cards when there is more than one. If you had a NIC called eth0 at install time, that does not mean it will keep that name after the first reboot. It could swap names with its neighbor and now be called eth1, while the former eth1 is now eth0.

A solution using udev was posted on HP’s forums, and can be reached directly through here. I will quote it:

Device persistence can also be enabled to ensure that the NICs identifying themselves as eth1, eth2, etc… always remain on the same hardware ports in case of a failure of a single NIC port. You don’t want your eth names to shift.

Upgrading to udev-095 from udev-039 that ships with RHEL4 is the smoothest solution, but that wasn’t an option for me. Using names other than eth0 – eth3 also wasn’t an option for me. Here is what we ended up using to get around udev-039’s inability to re-use eth0-ethx names.

Create a udev rule using YOUR MACS
Create an /etc/mactab file using YOUR MACS
Modify /etc/init.d/network to run nameif

/etc/udev/rules.d/20-net.rules
------------------------------
KERNEL="eth*", SYSFS{address}="00:0b:cd:69:c3:66", NAME="NIC1"
KERNEL="eth*", SYSFS{address}="00:0b:cd:69:c3:65", NAME="NIC2"
KERNEL="eth*", SYSFS{address}="00:11:0a:17:66:26", NAME="NIC3"
KERNEL="eth*", SYSFS{address}="00:11:0a:17:66:27", NAME="NIC4"

/etc/mactab
-----------
eth0 00:0b:cd:69:c3:66
eth1 00:0b:cd:69:c3:65
eth2 00:11:0a:17:66:26
eth3 00:11:0a:17:66:27

/etc/init.d/network
-------------------
(add right after
>>
# Check that networking is up.
[ "${NETWORKING}" = "no" ] && exit 0
<<)

# RDD: add 'nameif' usage; uses /etc/mactab
nameif || echo "nameif: reports error"
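
Since the udev rules and the /etc/mactab entries in the quoted solution are built from the same MAC list, both can be generated from it in one go. The sketch below assumes the same naming scheme as the quote (NIC1, NIC2, … in the udev rules and eth0, eth1, … in mactab); the function names gen_udev_rules and gen_mactab are made up for illustration.

```shell
#!/bin/sh
# Print udev-039 style rules (NIC1, NIC2, ...) for the MACs given as arguments.
gen_udev_rules() {
    i=1
    for mac in "$@"; do
        printf 'KERNEL="eth*", SYSFS{address}="%s", NAME="NIC%d"\n' "$mac" "$i"
        i=$((i + 1))
    done
}

# Print the matching /etc/mactab lines (eth0, eth1, ...) for the same MACs.
gen_mactab() {
    i=0
    for mac in "$@"; do
        printf 'eth%d %s\n' "$i" "$mac"
        i=$((i + 1))
    done
}
```

For example, gen_mactab 00:0b:cd:69:c3:66 00:0b:cd:69:c3:65 prints the first two mactab lines shown above; redirect the output to the relevant file after reviewing it.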

Hot-Adding SAN lun to Linux (RH with Qlogic drivers)

Monday, September 18th, 2006

If you run “cat /proc/scsi/qla2xxx/$Z”, where $Z represents the SCSI interface the Qlogic has taken for itself, you’ll get something like this:

<Snip>
.
.
</Snip>

SCSI LUN Information:
(Id:Lun) * - indicates lun is not registered with the OS.
( 0: 0): Total reqs 63185608, Pending reqs 0, flags 0x2, 0:0:81 00

Assuming you’ve just added the next LUN, in our case, LUN1, after reboot you would get an additional line below such as:

( 0: 1): Total reqs 1923, Pending reqs 0, flags 0x2, 0:0:81 00

However, on a production server we want to add this line without a reboot.

To achieve this, run the following command, where $Z represents, as before, the SCSI interface:

echo "scsi-qlascan" > /proc/scsi/qla2xxx/$Z
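
If the server has more than one Qlogic HBA, the same write can be repeated for every entry under /proc/scsi/qla2xxx. The loop below is a sketch under that assumption; the function name qla_rescan_all is hypothetical, and the directory is a parameter only so the loop can be tried against a scratch directory first.

```shell
#!/bin/sh
# Write the rescan keyword to every HBA entry under the given directory
# (defaults to /proc/scsi/qla2xxx, the path used in the post).
# qla_rescan_all is a hypothetical helper name.
qla_rescan_all() {
    dir="${1:-/proc/scsi/qla2xxx}"
    for hba in "$dir"/*; do
        # Skip the unexpanded pattern when the directory is empty.
        [ -e "$hba" ] || continue
        echo "scsi-qlascan" > "$hba"
    done
}
```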

The additional line(s) then appear in the file. Now you need to help your Linux see them (and attach a module to them). You can do so using the following convention (taken from here). Using “dmesg”, you can obtain the details required for the next stage: Controller, Channel, Target and LUN. Example:

Host: scsi2 Channel: 00 Id: 00 Lun: 01
Vendor: IBM Model: 1742 Rev: 0520
Type: Direct-Access ANSI SCSI revision: 03

Obtain the following details:

Controller=2 (scsi2)

Channel=0 (Channel: 00)

Target=0 (Id: 00)

LUN=1 (Lun: 01)

We will ask Linux nicely to reattach the device. Replace the descriptors with the numeric values:

echo "scsi add-single-device Controller Channel Target LUN" > /proc/scsi/scsi

In our example:

echo "scsi add-single-device 2 0 0 1" > /proc/scsi/scsi
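
The translation from the dmesg “Host:” line to the four numbers can also be scripted. This is a sketch that assumes the line has exactly the layout shown in the example above; the function name scsi_add_cmd is made up for illustration.

```shell
#!/bin/sh
# Turn a "Host: scsiN Channel: CC Id: II Lun: LL" line into the matching
# add-single-device command string (leading zeros are dropped).
# scsi_add_cmd is a hypothetical helper name.
scsi_add_cmd() {
    echo "$1" | awk '{
        host = $2
        sub(/^scsi/, "", host)   # "scsi2" -> "2"
        printf "scsi add-single-device %d %d %d %d\n", host, $4, $6, $8
    }'
}
```

As root, the result could then be written with scsi_add_cmd "Host: scsi2 Channel: 00 Id: 00 Lun: 01" > /proc/scsi/scsi.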

To remove a device prior to unmapping it from the SAN, replace add-single-device with “remove-single-device”.

This post’s Qlogic discovery was the insight of a friend of mine, and the credit is his 🙂