Archive for the ‘Disk Storage’ Category

Multiple iSCSI interface on the same subnet

Friday, April 14th, 2017

As far as routing goes, it is a very bad idea to place multiple network interfaces with IPs of a single network (subnet). The routing table, which decides which interface to route the data through, reads the table line by line, thus – all traffic goes through a single interface of the batch (-> of the interfaces “living” on the same network). A common way of working around it is by using network teaming (Linux = bonding) to handle round-robin or active/passive of multiple interfaces on a single network, and that means that the interfaces are “bonded” into a single logical interface, which, then, has no routing issues whatsoever.

Having multiple network interfaces with IPs of the same network does not help with throughput, but does it increase redundancy? The simple answer is “no” – It does not. In Linux, when an interface is disconnected, it does not go down automatically (unlike Windows, for example), meaning that the routing table does not get updated with remaining interfaces. The traffic would still be targeted at that “dead” interface. Moreover – even if Linux did do that, this is a solution for a single type of a problem. What if someone changed the switch to have a different network VLAN on that particular (one of two or more) port? Link remains up, but communication doesn’t get anywhere.

These problems are more noticeable when the single network is an iSCSI network, and the system needs both multiple path access to the iSCSI storage, and redundancy becomes an issue, because iSCSI network is commonly – a critical network.

It is common to connect multiple iSCSI networks and not multiple ports on a single network, maintaining fabric-like separation of networks, and devices, where possible. However – this article will point at what to do if the network layout is not under your control and you need to place multiple network interfaces on the same physical network.

As mentioned before, the routing table will make sure that all your communication to that particular network go by the first routed network interface. However – the iSCSI daemon (open-iscsi) has the capability to bind to specific interfaces. To do so, you need to define an interface to iSCSI. Do so by running the command:

iscsiadm -m iface -I NIC_NAME –op=new

The name is user defined. I recommend using the same names used by the OS, like ‘eth5’ or ‘ib0’ or ‘eno1’ – to keep it simple. This is a declarative definition only. To actually bind the ‘iface’ to the real interface device, you need to run the following command:

iscsiadm -m iface -I NIC_NAME  –op=update -n iface.hwaddress -v 00:AA:BB:CC:DD:EE

Use the real interface MAC address. Then you can discover and login to targets with the defined interfaces only:

iscsiadm -m discovery -t st -p TARGET_IP -I NIC1 -I NIC2

iscsiadm -m node -L all -I NIC1 -I NIC2

According to the documentations, this will require setting the sysctl net.ipv4.conf.default.rp_filter to 0.

This should do the trick, so now multipathing should show the right amount of paths.

Smartmontools 6.5 for RHEL 6

Monday, September 26th, 2016

I have created an RPM package, and SRPM package, which I will share here, for smartmontools version 6.5 on RHEL 6. Note that the official version is 5.43 which is clueless with many modern SSD disks. I have yet to test it correctly, and in general – use at your own risk.

smartmontools-6-5-1-el6-src

XenServer – Map all VMs disks to Storage Repositories

Tuesday, November 10th, 2015

(more…)

Migration of Oracle GI quorum disk to another diskgroup

Sunday, June 28th, 2015

When installing Oracle RAC (or in its more modern name – GI) version 11.2.0.1 and above, you can use Oracle ASM DiskGroup as your CRS+Voting file location.

It is fairly simple changing the disk membership in Oracle ASM DiskGroup, however, when you face some unknown bugs which prevent you from doing just that, or when you are required to modify the ASM DiskGroup on which the CRS+Voting files are placed, the article below is the one for you. You would have to remember, in addition, the ASM spfile.

So, as a reminder (which is one of the purposes of this blog), here’s a link to a very extensive article about how to migrate the CRS, Vote and spfile from one ASM DiskGroup to another: Migrate OCR to another DiskGroup

 

Connecting EMC/NetApp shelves as JBOD to a Linux machine

Wednesday, April 29th, 2015

Let’s say you have old shelves of either EMC or NetApp with SAS or SATA disks in them. And let’s say you want to connect them via FC to a Linux machine and have some nice ZFS machine/cluster, or whatever else. There are few things to know, and to attend in order for it to work.

The first one is the sector size. For NetApp – this applies only to non SATA disks (I don’t know about SSDs, though), and for EMC this could apply, as far as I noticed, to all disks – sector size is not 512 bytes, but 520 – the additional 8 bytes are used for block checksum. Linux does not handle well 520 blocks – the following error message will appear in the logs:

Unsupported sector size 520.

To solve it, we will need to identify the disks – using sg3_utils (in Centos-like – yum install sg3_utils) and then modify them to block size of 512 bytes. To identify the disks, run:

sg_scan -i
/dev/sg0: scsi0 channel=3 id=0 lun=0
HP P410i 3.66 [rmb=0 cmdq=1 pqual=0 pdev=0xc]
/dev/sg1: scsi0 channel=0 id=0 lun=0
HP LOGICAL VOLUME 3.66 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg2: scsi3 channel=0 id=0 lun=0 [em]
hp DVD A DS8A5LH 1HE3 [rmb=1 cmdq=0 pqual=0 pdev=0x5]
/dev/sg3: scsi1 channel=0 id=0 lun=0
SEAGATE SX3500071FC DA04 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg4: scsi1 channel=0 id=1 lun=0
SEAGATE SX3500071FC DA04 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg5: scsi1 channel=0 id=2 lun=0
SEAGATE SX3500071FC DA04 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg6: scsi1 channel=0 id=3 lun=0
SEAGATE SX3500071FC DA04 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg7: scsi1 channel=0 id=4 lun=0
SEAGATE SX3500071FC DA04 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg8: scsi1 channel=0 id=5 lun=0
SEAGATE SX3500071FC DA04 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg9: scsi1 channel=0 id=6 lun=0
SEAGATE SX3500071FC DA04 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg10: scsi1 channel=0 id=7 lun=0
SEAGATE SX3500071FC DA04 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg11: scsi1 channel=0 id=8 lun=0
FUJITSU MXW3300FE 0906 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg12: scsi1 channel=0 id=9 lun=0
FUJITSU MXW3300FE 0906 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg13: scsi1 channel=0 id=10 lun=0
SEAGATE SX3300007FC D41B [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg14: scsi1 channel=0 id=11 lun=0
SEAGATE SX3300007FC D41B [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg15: scsi1 channel=0 id=12 lun=0
SEAGATE SX3300007FC D41B [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg16: scsi1 channel=0 id=13 lun=0
SEAGATE SX3300007FC D41B [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg17: scsi1 channel=0 id=14 lun=0
SEAGATE SX3300007FC D41B [rmb=0 cmdq=1 pqual=0 pdev=0x0]

So, for each sg device (member of our batch of disks) we need to modify the sector size.

Two ways to do so – the first suggested by this post here, is by using sg_format in the following manner:

sg_format –format –size=512 /dev/sg2

Another post suggested using a dedicated program called ‘setblocksize’. I followed this one, and it worked fine. I had to power cycle the disks before the Linux could use them.

I did notice that disk performance were not bright. I got about 45MB/s write, and about 65-70 MB/s read for large sequential operations, using something like:

dd bs=1M if=/dev/sdf of=/dev/null bs=1M count=10000
dd bs=1M if=/dev/null of=/dev/sdf oflag=direct count=10000 # WARNING – this writes on the disk. Do not use for disks with data!

Fairly disappointing. Also, using multipath, when the shelf is connected to one FC port, and then back to another, showed me that with the setting:

path_grouping_policy multibus

I got about 10MB/s less compared to using “failover” flag (the default for Centos 6). Whatever modification I did to the multipathd.conf, I was unable to exceed this number when using multiple access. These results were consistent when using multibus or group_by_serial, however, when a single path was active and the other was passive, It clearly showed better. I did modify rr_min_io and rr_min_io_rq, but with no effect.

The low disk performance could suggest I need to flush the original disk firmware, however, I am not sure I will do so. If anyone is reading this and had different results – I would love to hear about it.