Posts Tagged ‘multipath’

iSCSI persistent configurations agains us all

Thursday, November 19th, 2009

Using iSCSI with dm-multipath is rather common setup. With iSCSI running over Ethernet cables, which are too easy to disconnect (either on purpose or by mistake), being cheap and common technology – multipath becomes a must. If you have multiple network links, this is only expected that you use multipath for your iSCSI configuration. It’s cheap, it’s easy and it works.

This, however, comes with a price tag. Not money – the components are cheap and common, but there are configuration acts which should take place.

It is easy to find info either in the open-iscsi documentations, the Internet, whatever, and I will go over them just below, but there are some catches which one should be aware of.

Per the common documentation, unlike regular iSCSI communication, when dealing with multipath, you would like iSCSI to fail rather quickly and let the SCSI layer handle the errors, thus letting dm-multipath handle the errors and do its work.

The configuration directives are rather simple. In the iscsid.conf file (on RHEL5 located in /etc/iscsi/iscsid.conf ), you need to change the value

node.session.timeo.replacement_timeout =

To a very short period of time. By default, it is set to 120 seconds, which are two minutes before anyone will notify the SCSI subsystem of any disk IO errors. A good value would be 5 seconds, which should allow for very short network disconnection (which could happen) and still – let the dm-multipath manage errors fast enough so that applications would not fail on disk timeouts.

Another two parameters which should be defined are the following (read the comments above them in the config file):

node.conn[0].timeo.noop_out_interval =
node.conn[0].timeo.noop_out_timeout =

These values control the interval in which the iSCSI layer tests communication to the targets.

Also, in multipath.conf you will need to set the following feature, so that IOPs will not be lost:

features		"1 queue_if_no_path"

These configuration directives can be found in these two pages from RedHat:

iscsi-modifying-link-loss-behavior-dmmultipath

iscsi-replacements_timeout

This is nice and pretty. However, if you have failed to do so at start, and defined your iSCSI targets based on the default configurations, you will notice that it still takes very long for iSCSI to notify the SCSI subsystem of the errors. You could check the values used by iSCSI through running the command:

iscsiadm -m node -T <target name>

Check out especially the line called node.session.timeo.replacement_timeout. Its value is the one which decides the actual behavior of iSCSI.

To change it, there are several methods. One of them is to clean up the iSCSI persistent configurations, located in /var/lib/iscsi and then re-login to the iSCSI targets. Only then you will have the new target configuration.

Check again with iscsiadm as described above, and check that this value matches.

dm-multipath and loss of all paths

Tuesday, May 13th, 2008

dm-multipath is a great tool. Its abilities were proven to me on many occasions, and I’m sure I’m not the only one. NetApp, for example, use it. HP use it as well (a slightly modified version, and still), and it works.

A problem I have encountered is as follow – if a single path fails, the device-mapper continue to work correctly (as expected) and the remaining path becomes active. However – if the last link fails, all processes which require disk access become stale. It means that many tests which search for a given process pass correctly even when this process becomes stale through delayed (forever) access to the filesystem. Also – tests which attempt to write/read a file to/from such a stale filesystem, become stale themselves, which can bring down an entire system (assume we have a cron which creates a file every minute. Every new process becomes stale immediately, so after an hour, we’ll have 60 more processes, and after a day – 1440 additional processes – all stale (D) and waiting for the disk to come back).

Certain detection systems actually fail to auto-detect cases of stale filesystems when using dm-multipath. This is caused by a (default) option called “1 queue_if_no_path”. I discovered that when this option is omitted, such as in the configuration below (only the “device” section):

device
{
vendor “NETAPP”
product “LUN”
getuid_callout “/sbin/scsi_id -g -u -s /block/%n”
prio_callout “/sbin/mpath_prio_ontap /dev/%n”
# features “1 queue_if_no_path”
hardware_handler “0″
path_grouping_policy group_by_prio
failback immediate
rr_weight uniform
rr_min_io 128
path_checker readsector0
}

multiple disk failures will actually result in the filesystem layout reporting I/O errors (which is good). A disk mounted through these options can be mounted with special parameters, such as (for ext2/3): errors=continue ; errors=read-only or errors=panic – my favorite, as it ensured data integrity through self-fencing mechanism.

HP-UX – allowed shells, and connecting FC Multipath to NetApp

Thursday, August 10th, 2006

When adding a certain shell to an HP-UX system, for example, /usr/bin/tcsh, each user set to use this shell will not be able to FTP to the machine, until there is entry in /etc/shells. The trick is that even if the file doesn’t exist, you have to create it. By default, HP-UX allows only /sbin/sh and /bin/sh shells, but as soon as you setup this file, you can allow more shells. Mind you that you have to include /sbin/sh and /bin/sh in /etc/shells, else other things might not work correctly. Taken from here.

Connecting HP-UX to SAN storage is never too simple. The actual list of actions is:

1. Install HP-UX drivers for the FC adapter

2. Map the PWWN obtained from (reading the sticker at the back of the machine, or querying the storage/SAN switch) the machine to the relevant LUNs.

3. Run “/usr/sbin/ioscan -fnC disk” and see that the new disk devices are detected.

4. Run “/usr/sbin/ioinit -i” to create the relevant device files.

A note – HP-UX might require a reboot after the initial connection. On several cases I’ve noticed that if the server was running for a while with disconnected fiber, only being connected during before startup would result in link and in SAN registration. Of course, the driver must be installed then.

If you are to connect your HP-UX to NetApp device, as we did, take a day (or more) notice and open “now” account in http://now.netapp.com. You can find documentation about HP-UX (including step-by-step), you can find the “SAN Attach Kit for HP-UX” which will make your life easier, and set of best-practice guides. Just follow these guides, and you will find it easy and simple task to do.

Hitachi HW100 limitations

Sunday, March 19th, 2006

Not too long ago I have purchased a brand new Hitachi HW100 Workgroup Storage. 14+1 400GB SATA disks, dual-interface, each holding two fiber ports (2Gb/s LC). A nice machine. However, two limitations I have discovered are clouding my day:

1. It supports only Raid0 and Raid5. It will not allow me no-parity Raid, aka, Raid1. It is a pity, because my storage is meant for the lab, and not for production, and I wish to decide my own Raid level. That is why it’s 14+1 disks, and not 15. You must have a spare disk.

2. While not connected directly to hosts (Private Loop), only one port out of two ports in an interface works. It means I bought 4 ports, and actually got two. Why would I need four? I need to simulate multi-path, load balancing and utilize maximum flexibility when connecting this storage to my fiber network. After all, this network is all lab, no production network, and I need to maximize my output, with as little redundancy as possible.

These two limitations are somewhat hidden, and, especially the second one, was discovered only after I’ve questioned HDS’ support person. Pity. I will want to return it, and get a more capable one. I need to see what I can do with it.