HP MSA1000 controller failover

By etzion 27/03/2007

The HP MSA1000 is an entry-level disk storage array that can communicate over different types of interfaces, such as SCSI and FC, and supports FC failover. This FC failover, however, is controller failover, not path failover. If the primary controller fails entirely, the backup controller will “kick in”. However, if a multipath-capable client loses its primary interface, there is no guarantee that communication with the disks through the backup controller will work.

The symptom I encountered was that the secondary path exposed the disks to the server (while the primary path was down for one of the servers) but did not allow any SCSI I/O operations. This prevented the Linux server’s SCSI layer from accessing the disks. They did appear in “cat /proc/scsi/scsi”, but they were not detected by, for example, “fdisk -l”, and the system logs filled up with “SCSI Error” messages.
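The mismatch described above (a device listed by the SCSI layer but not readable) can be checked with a short script. This is a minimal sketch, not what was used at the time: `parse_proc_scsi` and `can_read` are hypothetical helper names, the parser assumes the usual three-line-per-device layout of /proc/scsi/scsi, and mapping each listed LUN to its /dev/sdX node is left to the reader.

```python
import re

# Patterns for the usual three-line-per-device layout of /proc/scsi/scsi.
HOST_RE = re.compile(r"Host:\s*(\S+)\s+Channel:\s*(\d+)\s+Id:\s*(\d+)\s+Lun:\s*(\d+)")
VENDOR_RE = re.compile(r"Vendor:\s*(.*?)\s+Model:\s*(.*?)\s+Rev:\s*(.*)")
TYPE_RE = re.compile(r"Type:\s*(.*?)\s+ANSI")


def parse_proc_scsi(text):
    """Return one dict per device found in the text of /proc/scsi/scsi."""
    devices = []
    for line in text.splitlines():
        match = HOST_RE.search(line)
        if match:
            # A "Host:" line starts a new device entry.
            devices.append({
                "host": match.group(1),
                "channel": int(match.group(2)),
                "id": int(match.group(3)),
                "lun": int(match.group(4)),
            })
            continue
        if not devices:
            continue  # skip the "Attached devices:" header
        match = VENDOR_RE.search(line)
        if match:
            devices[-1].update(
                vendor=match.group(1).strip(),
                model=match.group(2).strip(),
                rev=match.group(3).strip(),
            )
            continue
        match = TYPE_RE.search(line)
        if match:
            devices[-1]["type"] = match.group(1).strip()
    return devices


def can_read(path, nbytes=512):
    """Try to read the first bytes of a device node; False on any OS error."""
    try:
        with open(path, "rb") as handle:
            handle.read(nbytes)
        return True
    except OSError:
        return False
```

Feeding the contents of /proc/scsi/scsi to `parse_proc_scsi` gives the list of LUNs the SCSI layer knows about, and calling `can_read` (as root) on the matching block device shows whether I/O actually goes through. A read may be served from cache, so a failure is more telling than a success.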

About a month ago, after almost two years, a new firmware update was released (can be found here). Two versions exist: Active/Passive and Active/Active.

I have upgraded the MSA1000 storage device.

After installing the Active/Active firmware upgrade (note to Linux users: you must have X to run the “msa1500flash” utility) and power cycling the MSA1000 device, things started to look good.

I tested it with a person on-site disconnecting fiber connections on demand, and it worked great, with a failover time of about 2-5 seconds.

Since this system runs Oracle RAC on OCFS2, I had to update the failed-node timeout to 31 seconds (per Oracle’s OCFS site, which includes some really good tips).
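For reference, the OCFS2 documentation expresses the disk heartbeat timeout through the O2CB_HEARTBEAT_THRESHOLD setting rather than directly in seconds, using threshold = (timeout in seconds / 2) + 1. The sketch below only does that arithmetic; the function names are hypothetical. The post does not say whether 31 is the threshold value or the number of seconds, so treat the mapping as an assumption: a threshold of 31 corresponds to a 60-second timeout.

```python
def threshold_for_timeout(timeout_seconds):
    """O2CB_HEARTBEAT_THRESHOLD for a desired timeout, per the OCFS2 formula."""
    return timeout_seconds // 2 + 1


def timeout_for_threshold(threshold):
    """Heartbeat timeout in seconds implied by a given threshold value."""
    return (threshold - 1) * 2
```

On RHEL-type systems the value is normally set in the o2cb configuration (commonly /etc/sysconfig/o2cb) and must be identical on every node in the cluster. Whatever value is chosen should comfortably exceed the 2-5 second path failover time measured above.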

So real high availability can be achieved after upgrading the MSA1000 firmware.


3 Comments

Ian Harper says:

24/05/2007 at 4:41 pm

Shalom,

I have been looking at your blog and also your entries on the Redhat Certified forum.

We also have the MSA1000, and recently two disks (which were a mirror of each other) came up with fail lights on, and the Oracle database (on ASM on the MSA1000) died and wouldn’t recover. We had to get Oracle in to recover the data with their DUL utility. Have you experienced anything like this?

Also, have you any experience of RHEL on the DL145 G3 servers?

Finally, how easy is it to get work in Israel if you’re not Israeli or Jewish?

Toda raba
Ian

Ez-Aton says:

24/05/2007 at 8:56 pm

Hi.
Answers to your questions:

1. I have successfully avoided using ASM because of the complicated procedure required to recover data. I know a person who created a generic application using the tnslsnr just to allow access to this data, and he is one of the better Oracle DBAs I know.

I know that hot backup (which cannot happen in an ASM environment) or archive log backup can do a decent job of recovering a DB, although I don’t know how to do it (I could search for it on Google, but I haven’t had to do it yet).

It seems odd to me that two disks failed at the same time. Maybe they failed at different times, and because you didn’t monitor the storage, you didn’t get a warning in time?

I have never used Oracle’s DUL utility, and have no experience with it.

I have experience with DL145. What is the problem?

sandrar says:

11/09/2009 at 12:48 am

Hi! I was surfing and found your blog post… nice! I love your blog. 🙂 Cheers! Sandra. R.

