Oracle RAC with EMC iSCSI Storage Panics

Tuesday, October 14th, 2008

I had a system panic when running the following configuration:

  • RedHat RHEL 4 Update 6 (4.6) 64bit (x86_64)
  • Dell PowerEdge servers
  • Oracle RAC 11g with Clusterware 11g
  • EMC iSCSI storage
  • EMC PowerPath
  • Vote and Registry LUNs are accessible as raw devices
  • Data files are accessible through ASM with libASM

During reboots or shutdowns, the system would panic just before the actual power cycle. Unfortunately, I do not have a screen capture of the panic…

Tracing the problem, it seems that the iSCSI, PowerIscsi (EMC PowerPath for iSCSI) and networking services were being brought down before the “killall” service stopped the CRS.

The service script was never executed with a “stop” argument by the SysV start/stop mechanism, because it never left a lock file (for example, in /var/lock/subsys), and on RHEL the rc script calls a kill script only when the matching lock file exists. Its presence in /etc/rc.d/rc6.d and /etc/rc.d/rc0.d was therefore just for show.

I solved it by changing the /etc/init.d/ script a bit:

  • On “Start” action, touch a file called /var/lock/subsys/
  • On “Stop” action, remove a file called /var/lock/subsys/
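The two changes above can be sketched roughly as follows. This is a minimal illustration and not the actual script: the service name "example-crs" and the helper function names are hypothetical, and the real start and stop logic of the script is left out.

```shell
#!/bin/sh
# Sketch only: "example-crs" is a hypothetical placeholder for the real
# service name. The lock file must carry the same name as the K/S links
# so that the rc script recognizes the service as running.

lock_file() {
    # Resolved at call time, so both values can be overridden for testing.
    echo "${LOCK_DIR:-/var/lock/subsys}/${SERVICE_NAME:-example-crs}"
}

service_start() {
    # ... the script's original "start" logic would run here ...
    touch "$(lock_file)"
}

service_stop() {
    # ... the script's original "stop" logic would run here ...
    rm -f "$(lock_file)"
}

case "${1:-}" in
    start) service_start ;;
    stop)  service_stop ;;
esac
```

With the lock file in place, the kill link in rc0.d and rc6.d is actually called with “stop” during shutdown.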

Also, although I’m not sure it was necessary, I changed the script’s SysV execution order in /etc/rc.d/rc0.d and /etc/rc.d/rc6.d from wherever it was (K96 in one case and K76 in another) to K01, so that it is executed with the “stop” parameter early in the shutdown or reboot cycle.
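A small helper along these lines could do the renaming. It is only a sketch: the function name and the service name in the usage line are hypothetical, and it assumes the usual K<NN><name> link naming.

```shell
# Rename K<NN><name> to K01<name> inside one runlevel directory.
# Hypothetical helper, shown for illustration only.
promote_kill_link() {
    rc_dir=$1
    name=$2
    for link in "$rc_dir"/K[0-9][0-9]"$name"; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$link" ] || [ -L "$link" ] || continue
        # Nothing to do if the link is already at K01.
        [ "$link" = "$rc_dir/K01$name" ] && continue
        mv "$link" "$rc_dir/K01$name"
    done
}
```

Usage would look like `promote_kill_link /etc/rc.d/rc0.d example-crs`, repeated for /etc/rc.d/rc6.d.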

This solved the problem, although anyone upgrading Oracle Clusterware in the future will need to be aware of this change.

Hot adding Qlogic LUNs – the new method

Friday, August 8th, 2008

I have previously demonstrated how to hot-add LUNs to a Linux system with a QLogic HBA. That method has become irrelevant with the newer one, available for RHEL4 Update 3 and above.

The new method is as follows:

echo 1 > /sys/class/fc_host/host<ID>/issue_lip
echo "- - -" > /sys/class/scsi_host/host<ID>/scan

Replace “<ID>” with your relevant HBA ID.

Note: due to the blog formatting, the second line might appear incorrect. It contains three plain ASCII dashes separated by spaces, not a specially formatted Unicode dash.
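If there are several HBAs, the same two writes can be repeated for every FC host. The following is a sketch under assumptions: the function name and the optional sysfs root argument are my own additions (the argument exists only so the loop can be tried against a fake directory tree), and it must be run as root on a real system.

```shell
# Issue a LIP and a full SCSI rescan on every FC host found in sysfs.
# rescan_fc_hosts is a hypothetical helper name.
rescan_fc_hosts() {
    sysfs_root=${1:-/sys}
    for fc in "$sysfs_root"/class/fc_host/host*; do
        # Skip the unexpanded pattern when no FC hosts exist.
        [ -d "$fc" ] || continue
        host=${fc##*/}
        echo 1 > "$fc/issue_lip"
        # Three ASCII dashes: wildcard for channel, target and LUN.
        echo "- - -" > "$sysfs_root/class/scsi_host/$host/scan"
    done
}
```

Calling `rescan_fc_hosts` with no argument works on /sys itself.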

HP EVA bug – Snapshot removed through sssu is still there

Friday, May 2nd, 2008

This is an interesting bug I have encountered:

The output of an sssu command should look like this:



It still leaves the snapshot (SNAP_ORACLE in this case) visible until someone presses “OK” in the web interface.

This happened to me on HP EVA with HP StorageWorks Command View EVA 7.0 build 17.

When a second, sequential delete command is given, it looks like this:


Error: Error cannot get object properties. [ Deletion completed]


When this command is given for a non-existing snapshot, it looks like this:


Error: Virtual DisksLinuxoracleSNAP_ORACLE not found

So I run the removal command twice (scripted) in an sssu session without “halt_on_errors”. This removes the snapshots correctly.

Fabric Mess, or how to do things right

Tuesday, May 29th, 2007

When a company is about to relocate to a new floor or a new building, that is the time when the little piles of dirt swept under the rug come back to haunt you.

In several companies I have been to, I have stressed the need for an ordered environment. This applies to networking and hardware serial numbers as much as it does to FC hardware, but when it comes to FC, things always get somewhat more complicated. When the fit hits the shan, the time you spend tracking down a single faulty cable, or a link LED that has turned off, is time you need for other things.

I can sum it up in a single sentence – Keep your SAN tidy.

Unless you have planned your entire future SAN deployment ahead (and this can mean planning years ahead), your SAN environment will grow. In a network, short cable disconnections have only a small effect on the overall status of a server, which lets you tidy up network cables on the fly, after rush hours when traffic is low, without downtime. Tidying up FC cables, by contrast, is a matter for planned downtime, and show me the high-level manager who would approve at least 30 minutes of downtime, with some risk (as there always is when changing cables), for "tidying up"…

So your SAN looks like this, and this can even be considered quite good:

Cable length is always an issue

You can track down cables, but it takes time, and time is an issue when there is a problem, or when changes are about to take place. Wait – which server is connected to switch1 port 12? Don’t know?

The magic, like most magic tricks, is not magical at all. Keep track of every cable, every path, every detail. You will not be sorry.

I have found that a spreadsheet with the following columns does the job for me, and I’ve been to some quite large SAN sites:

1. Switch name and port

2. Server name (if the server has multiple FC ports, append 0, 1, etc. Pick a fixed convention for direction; for example, 0 is the leftmost port when you look directly at the back of the server). The same goes for storage devices.

3. SAN zone or VSAN, if applicable.

4. Patch panel port. If you go through several patch panels, write down all of them, one after the other.

5. Server’s port PWWN

On another spreadsheet I have the following information:

1. Server name

2. Storage ports accessible to it (using the same convention as mentioned above)

3. LUN ID on the storage described above.

If you keep these two spreadsheets up to date, you will be able to find your way around your own SAN anytime, anywhere. Maintaining the spreadsheets is the actual wizardry in all this.
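If the first spreadsheet is exported as a CSV file, the “which server is connected to switch1 port 12?” question becomes a one-line lookup. The sketch below assumes a hypothetical column layout of switch,port,server_port,zone,patch_panels,pwwn (I split “switch name and port” into two columns for convenience), a header row, and a made-up function name.

```shell
# Print the server port recorded for a given switch and switch port.
# Assumed CSV columns: switch,port,server_port,zone,patch_panels,pwwn
find_server_on_port() {
    csv=$1
    switch=$2
    port=$3
    # NR > 1 skips the header row; $3 is the server_port column.
    awk -F, -v sw="$switch" -v p="$port" \
        'NR > 1 && $1 == sw && $2 == p { print $3 }' "$csv"
}
```

For example, `find_server_on_port san.csv switch1 12` prints whatever server port the sheet lists for that switch port, and prints nothing if the port is not documented, which is itself worth knowing.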

Additional tip – If your company relocates to another building, and your SAN is pretty much fixed and known, hiring a person to manufacture cut-to-length FC cables per device can be one of the greatest things you could do. If every cable has its own exact length, your SAN environment will look much better. This is a tip for the lucky ones. Most of us are not lucky enough to either afford such a person, or to relocate with a never-changing SAN environment.

HP MSA1000 controller failover

Tuesday, March 27th, 2007

HP MSA1000 is an entry-level disk storage device capable of communicating via different types of interfaces, such as SCSI and FC, and it can allow FC failover. This FC failover, however, is controller failover and not path failover. It means that if the primary controller fails entirely, the backup controller will “kick in”. However, if a multipath-capable client loses its primary interface, there is no guarantee that communication with the disks through the backup controller will work.

The symptom I encountered was that the secondary path, while exposing the disks to the server (while the primary path was down for one of the servers), did not allow any SCSI I/O operations. This prevented the Linux server’s SCSI layer from accessing the disks. The disks did appear in “cat /proc/scsi/scsi”, but they were not detected by, for example, “fdisk -l”, and the system logs filled up with “SCSI Error” messages.

About a month ago, after almost two years, a new firmware update was released (can be found here). Two versions exist – Active/Passive and Active/Active.

I have upgraded the MSA1000 storage device.

After installing the Active/Active firmware upgrade (Linux users, note: you must have X to run the “msa1500flash” utility), and after power cycling the MSA1000 device, things started to look good.

I have tested performance with a person on-site disconnecting fiber connections on-demand, and it worked great. About 2-5 seconds failover time.

Since this system runs Oracle RAC, and it uses OCFS2, I had to update the failed-node timeout to 31 seconds (per Oracle’s OCFS site, which includes some really good tips).

So real High Availability can be achieved after upgrading the MSA1000 firmware.