HP MSA1000 controller failover

By Etzion 27/03/2007

HP MSA1000 is an entry-level disk storage array that can communicate over different types of interfaces, such as SCSI and FC, and supports FC failover. This FC failover, however, is controller failover, not path failover. This means that if the primary controller fails entirely, the backup controller will “kick in”. However, if a multipath-capable client loses its primary interface, there is no guarantee that it can communicate with the disks through the backup controller.

The symptom I encountered was that the secondary path, while exposing the disks to the server (the primary path was down for one of the servers), did not allow any SCSI I/O operations. This prevented the Linux server’s SCSI layer from accessing the disks. They did appear when running “cat /proc/scsi/scsi”, but they were not detected by, for example, “fdisk -l”, and the system logs filled up with “SCSI Error” messages.
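The symptom above (disks listed but not readable) can be checked with a small script. This is only a minimal Python sketch, not something I ran at the time: the sample text is a made-up illustration of the /proc/scsi/scsi layout, and the function names `parse_proc_scsi` and `can_read` are hypothetical helpers.

```python
import re

# Illustrative sample of /proc/scsi/scsi contents; the values are made up.
SAMPLE = """\
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: COMPAQ   Model: MSA1000 VOLUME   Rev: 4.48
  Type:   Direct-Access                    ANSI  SCSI revision: 04
Host: scsi1 Channel: 00 Id: 00 Lun: 01
  Vendor: COMPAQ   Model: MSA1000 VOLUME   Rev: 4.48
  Type:   Direct-Access                    ANSI  SCSI revision: 04
"""

HOST_RE = re.compile(
    r"Host:\s*(\S+)\s+Channel:\s*(\d+)\s+Id:\s*(\d+)\s+Lun:\s*(\d+)"
)
VENDOR_RE = re.compile(r"Vendor:\s*(.*?)\s+Model:\s*(.*?)\s+Rev:\s*(\S*)")


def parse_proc_scsi(text):
    """Return one dict per device listed in /proc/scsi/scsi style text."""
    devices = []
    for line in text.splitlines():
        host = HOST_RE.search(line)
        if host:
            devices.append({
                "host": host.group(1),
                "channel": int(host.group(2)),
                "id": int(host.group(3)),
                "lun": int(host.group(4)),
            })
            continue
        vendor = VENDOR_RE.search(line)
        if vendor and devices:
            # Vendor/Model lines belong to the most recent Host line.
            devices[-1]["vendor"] = vendor.group(1)
            devices[-1]["model"] = vendor.group(2)
    return devices


def can_read(path, nbytes=512):
    """Return True only if the first nbytes of the device can be read."""
    try:
        with open(path, "rb") as dev:
            return len(dev.read(nbytes)) == nbytes
    except OSError:
        # Missing device, permission problem, or an I/O error on the path.
        return False
```

On a real server you would pass the contents of /proc/scsi/scsi to `parse_proc_scsi` and then call `can_read` (as root) on each block device, for example `/dev/sdb`. A device that is listed but fails the read matches the behaviour described above.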

About a month ago, after almost two years, a new firmware update was released (can be found here). Two versions exist: Active/Passive and Active/Active.

I have upgraded the MSA1000 storage device.

After installing the Active/Active firmware upgrade (note to Linux users: you must have X to run the “msa1500flash” utility) and power cycling the MSA1000 device, things started to look good.

I tested failover with a person on-site disconnecting fiber connections on demand, and it worked great. Failover time was about 2-5 seconds.

Since this system runs Oracle RAC and uses OCFS2, I had to update the failed-node timeout to 31 seconds (per Oracle’s OCFS site, which includes some really good tips).
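A note on the number itself. In the O2CB cluster stack the timeout is usually set through O2CB_HEARTBEAT_THRESHOLD (in /etc/sysconfig/o2cb), which counts 2-second heartbeat iterations rather than seconds. If the 31 above is that threshold value (an assumption, the post does not say), the effective timeout is about 60 seconds. A minimal Python sketch of the conversion, with hypothetical function names:

```python
# O2CB writes a disk heartbeat every 2 seconds.
HEARTBEAT_INTERVAL_SECONDS = 2


def threshold_to_timeout(threshold):
    """Seconds without a heartbeat before a node is declared dead."""
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECONDS


def timeout_to_threshold(timeout_seconds):
    """O2CB_HEARTBEAT_THRESHOLD value for a desired timeout in seconds."""
    return timeout_seconds // HEARTBEAT_INTERVAL_SECONDS + 1


if __name__ == "__main__":
    for value in (7, 31, 61):
        print(value, "->", threshold_to_timeout(value), "seconds")
```

Either way, the timeout has to be comfortably longer than the 2-5 second path failover measured above, so that a node is not fenced while its I/O is still moving to the other path.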

So real High Availability can be achieved after upgrading the MSA1000 firmware.

3 Comments

Ian Harper says:

24/05/2007 at 4:41 pm

Shalom,

I have been looking at your blog and also your entries on the Redhat Certified forum.

We also have the MSA1000, and recently two disks (which were a mirror of each other) came up with fail lights on, and the Oracle database (on ASM on the MSA1000) died and wouldn’t recover. We had to get Oracle in to recover the data with their DUL utility. Have you experienced anything like this?

Also, have you any experience of RHEL on the DL145 G3 servers?

Finally, how easy is it to get work in Israel if you’re not Israeli or Jewish?

Toda raba
Ian

Ez-Aton says:

24/05/2007 at 8:56 pm

Hi.
Answers to your questions:

1. I have successfully avoided using ASM, due to the complicated procedure required to recover data from it. I know a person who created a generic application using the tnslsnr just to allow access to this data, and he is one of the better Oracle DBAs I know.

I know that hot-backup (which cannot happen in ASM env) or archive log backup can do a decent job of recovering a DB, although I don’t know the way to do it (I could search for it on Google, but still – haven’t had to do it yet).

It seems odd to me that two disks failed at the same time. Maybe they failed at different times, and you didn’t monitor the storage, and therefore didn’t get a warning in time?

I have never used Oracle’s DUL utility, and have no experience with it.

I have experience with DL145. What is the problem?

sandrar says:

11/09/2009 at 12:48 am

Hi! I was surfing and found your blog post… nice! I love your blog. 🙂 Cheers! Sandra. R.

