Posts Tagged ‘netapp’

Connecting EMC/NetApp shelves as JBOD to a Linux machine

Wednesday, April 29th, 2015

Let’s say you have old shelves of either EMC or NetApp with SAS or SATA disks in them. And let’s say you want to connect them via FC to a Linux machine and have some nice ZFS machine/cluster, or whatever else. There are a few things to know and to take care of in order for it to work.

The first one is the sector size. For NetApp this applies only to non-SATA disks (I don’t know about SSDs, though), and for EMC, as far as I noticed, it can apply to all disks: the sector size is not 512 bytes but 520 – the additional 8 bytes are used for a block checksum. Linux does not handle 520-byte sectors well, and the following error message will appear in the logs:

Unsupported sector size 520.

To solve it, we need to identify the disks using sg3_utils (on CentOS-like systems: yum install sg3_utils) and then reformat them to a block size of 512 bytes. To identify the disks, run:

sg_scan -i
/dev/sg0: scsi0 channel=3 id=0 lun=0
HP P410i 3.66 [rmb=0 cmdq=1 pqual=0 pdev=0xc]
/dev/sg1: scsi0 channel=0 id=0 lun=0
HP LOGICAL VOLUME 3.66 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg2: scsi3 channel=0 id=0 lun=0 [em]
hp DVD A DS8A5LH 1HE3 [rmb=1 cmdq=0 pqual=0 pdev=0x5]
/dev/sg3: scsi1 channel=0 id=0 lun=0
SEAGATE SX3500071FC DA04 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg4: scsi1 channel=0 id=1 lun=0
SEAGATE SX3500071FC DA04 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg5: scsi1 channel=0 id=2 lun=0
SEAGATE SX3500071FC DA04 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg6: scsi1 channel=0 id=3 lun=0
SEAGATE SX3500071FC DA04 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg7: scsi1 channel=0 id=4 lun=0
SEAGATE SX3500071FC DA04 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg8: scsi1 channel=0 id=5 lun=0
SEAGATE SX3500071FC DA04 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg9: scsi1 channel=0 id=6 lun=0
SEAGATE SX3500071FC DA04 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg10: scsi1 channel=0 id=7 lun=0
SEAGATE SX3500071FC DA04 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg11: scsi1 channel=0 id=8 lun=0
FUJITSU MXW3300FE 0906 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg12: scsi1 channel=0 id=9 lun=0
FUJITSU MXW3300FE 0906 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg13: scsi1 channel=0 id=10 lun=0
SEAGATE SX3300007FC D41B [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg14: scsi1 channel=0 id=11 lun=0
SEAGATE SX3300007FC D41B [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg15: scsi1 channel=0 id=12 lun=0
SEAGATE SX3300007FC D41B [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg16: scsi1 channel=0 id=13 lun=0
SEAGATE SX3300007FC D41B [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg17: scsi1 channel=0 id=14 lun=0
SEAGATE SX3300007FC D41B [rmb=0 cmdq=1 pqual=0 pdev=0x0]

So, for each sg device that is a member of our batch of shelf disks, we need to change the sector size.

There are two ways to do so. The first, suggested by this post, is to use sg_format in the following manner:

sg_format --format --size=512 /dev/sg2

Another post suggested using a dedicated program called ‘setblocksize’. I followed this one, and it worked fine. I had to power-cycle the disks before Linux could use them.
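Either way, a full shelf means many invocations, so a small loop helps. Below is a minimal sketch assuming the shelf disks are /dev/sg3 through /dev/sg17, as in the sg_scan output above – formatting destroys any data on the disks, and each format can take a long while:

# reformat every shelf disk to 512-byte sectors - DESTROYS all data on them
for dev in /dev/sg{3..17}; do
    echo "Formatting $dev ..."
    sg_format --format --size=512 "$dev"
done

After the format (and the power-cycle mentioned above), sg_readcap /dev/sg3 should report a logical block length of 512 bytes.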

I did notice that disk performance was not great: about 45 MB/s write and about 65-70 MB/s read for large sequential operations, using something like:

dd if=/dev/sdf of=/dev/null bs=1M count=10000
dd if=/dev/zero of=/dev/sdf bs=1M oflag=direct count=10000 # WARNING – this writes to the disk. Do not use on disks with data!

Fairly disappointing. Also, when using multipath, with the shelf connected to one FC port and looped back to another, I saw that with the setting:

path_grouping_policy multibus

I got about 10 MB/s less compared to the “failover” policy (the default on CentOS 6). Whatever modification I made to multipath.conf, I was unable to exceed this number when both paths were active. The results were consistently lower with multibus or group_by_serial, while with a single active path and the other passive, throughput was clearly better. I also modified rr_min_io and rr_min_io_rq, but with no effect.
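For reference, the relevant fragment of my /etc/multipath.conf looked roughly like the sketch below (device sections and blacklist omitted; the commented-out values are only illustrative):

defaults {
    # one active path, the other passive - performed best in my tests
    path_grouping_policy    failover
    # path_grouping_policy  multibus    # was ~10 MB/s slower here
    # rr_min_io             100         # changing these made no
    # rr_min_io_rq          10          # difference in my case
}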

The low disk performance could suggest I need to replace the original (OEM) disk firmware; however, I am not sure I will do so. If anyone reading this had different results – I would love to hear about it.

NetApp LUN Serial and SCSI Word 83

Sunday, November 2nd, 2014

I was wondering for a long while about the connection between NetApp’s LUN Serial and the identifier the host sees, aka “Word 83”. There obviously was a connection, but I figured it out only today.

The LUN Serial is the ASCII representation of the hexadecimal Word 83 – to be exact, of its last 24 hex characters.
See:

lun serial /vol/volume/qtree/lun
Serial#:  7S1PW?By1m7B

When querying the multipath device represented there, we get:

360a9800037533150573f4279316d3742 dm-7 NETAPP,LUN
[size=4.0T][features=0][hwhandler=0][rw]
_ round-robin 0 [prio=4][active]
 _ 1:0:0:30 sdm  8:192  [active][ready]
 _ 2:0:0:30 sdz  65:144 [active][ready]
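For reference, output like the above is what ‘multipath -ll’ typically shows on CentOS/RHEL. The same Word 83 identifier can also be read from any single path with scsi_id – a quick sketch against one of the paths listed above (on CentOS 6 the binary lives in /lib/udev), which should print the very same WWID:

/lib/udev/scsi_id --whitelisted --page=0x83 --device=/dev/sdm
360a9800037533150573f4279316d3742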

Using a simple hex-to-text web calculator (or the shell commands shown further down), we can see that 7S1PW?By1m7B translates to 37533150573f4279316d3742, which is exactly the last 24 hex characters of the reported Word 83. I assume that the leftmost nine hex characters (360a98000) identify the storage device. So, easy to identify.

An additional nice trick is to ask the NetApp to represent the LUN Serial in hex:

lun serial -x /vol/volume/qtree/lun
Serial (hex)#: 0x37533150573f4279316d3742

which is the same Word 83 we’ve seen before. However, NetApp will not allow you to set (under priv mode) the LUN Serial directly to a hex value – hence the importance of hex-to-text conversion tools.
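For the record, the same conversion can be done in both directions on any Linux host with xxd – a small sketch using the values from the example above:

# serial (ASCII) to hex - compare against the Word 83 / WWID
printf '%s' '7S1PW?By1m7B' | xxd -p
# hex to serial (ASCII) - derive the serial to set on a target LUN
echo 37533150573f4279316d3742 | xxd -r -p; echo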

NetApp internals – how to add SSH keys without C$ or NFS shares

Thursday, April 3rd, 2014

This post describes the process of placing SSH keys using NetApp’s internal ‘systemshell’ command. As always – when doing something the vendor did not intend you to do, do it very carefully. This information was obtained from the NetApp forums, and while I do not have the original post to link to (I usually link to the original, as a courtesy to the original author), this is the content, as is.

First, set to advanced mode:
filer> priv set advanced

Then, unlock and set a password to diag account:
filer*> useradmin diaguser unlock
filer*> useradmin diaguser password

Start the systemshell, create the directory you need and put the pubkey generated in the authorized_keys file:
filer*> systemshell

login: diag
Password: the same you set in the previous step

filer% mkdir -p /mroot/etc/sshd/root/.ssh
filer% vi /mroot/etc/sshd/root/.ssh/authorized_keys
filer% sudo chown -R root:wheel /mroot/etc/sshd/root
filer% sudo chmod -R 0600 /mroot/etc/sshd/root

Last, exit systemshell, lock diag account and exit advanced mode:
filer% exit
filer*> useradmin diaguser lock
filer*> priv set admin

If you want to do it for any other user, just replace the word ‘root’ with the said user.
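To verify, from the host whose public key you placed in authorized_keys, a key-based login should now work without a password – a quick check (the filer hostname is, of course, illustrative):

ssh root@filer version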

An additional note – I had to create a user that can perform ‘df’ operations only. The purpose was to be able to collect data over SSH without exposing the keys used for root SSH access, by using a very limited user designed just for that.

So the commands to create such a user are as follows:

useradmin role add df -a cli-df*,login-ssh
useradmin group add df_users -r df
useradmin user add df -g df_users
(here you will be asked to enter the user’s password)
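With that in place (and, optionally, a key placed for the ‘df’ user using the same systemshell procedure above), a monitoring host can pull the data with something like:

ssh df@filer df -g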

Hope it helps!

NetApp “Broken disk label”

Thursday, August 1st, 2013

When using ‘disk show -v’ on a NetApp filer running OnTap 7.3.x, following replacement or addition of disk(s), you might see the above-mentioned message. It is caused by an incorrect disk label – a label written by OnTap version 8 on a disk now sitting in an OnTap 7.3.x system. The system cannot handle the newer label and therefore ignores the disk.

A set of actions is required to clean the label and allow the NetApp to use this specific disk. The easiest method (not described here) would be to place the disk back in an OnTap 8 NetApp device and clean the label from there; however, that is not always possible.

On your OnTap 7.3.x system, do the following (assuming you know the address of the disk, right?) – taken from NetApp’s forums here.

disk assign <diskid>
priv set diag
labelmaint isolate <diskid>
label wipe <diskid>
label wipev1 <diskid>
label makespare <diskid>
labelmaint unisolate
priv set

The fifth or sixth command (‘label wipev1’ or ‘label makespare’) might fail to run, but the process as a whole will still succeed.
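Once done, the disk should appear as a spare. A quick sanity check (standard 7-mode commands):

disk show -v
aggr status -s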

Cacti NetApp Ontap API data query

Sunday, June 23rd, 2013

I have been using the excellent template and scripts from this forum post; however, when the NetApp device is loaded with LUNs and volumes, the script causes Cacti to time out, consuming CPU the whole time. The original cause of this problem was a workaround for a NetApp Perl API bug the original author found, which forced him to query the entire data set for each sub-query. This is fine for five, or even ten, volumes, but at around 400 volumes, things just look bad.

Due to that, I have taken it upon myself to make this script more scalable, by forcing a single data query to the NetApp for each data type (volume, LUN, system, etc.) and query type (get, index, etc.). A unique cache file is created, named storage_device_name.data_type.query_type, and subsequent queries for any subset of this data read from that file instead of hitting the remote NetApp device – which previously hammered the network and CPU, tended to time out, and left huge blank parts in the graphs.
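The idea, expressed as a rough shell-level sketch (the actual change lives in the Perl query script; the cache path and the query_netapp_api helper below are purely illustrative):

# one query per device/data-type/query-type; later sub-queries read the cache
CACHE="/tmp/${STORAGE_NAME}.${DATA_TYPE}.${QUERY_TYPE}"
if [ ! -s "$CACHE" ]; then
    query_netapp_api "$STORAGE_NAME" "$DATA_TYPE" "$QUERY_TYPE" > "$CACHE"
fi
cat "$CACHE"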

I will post my modified template in the forum as well, but I am also placing it here, so that it is available both for me and for any interested reader.

Get it here: NetApp_OnTap-SDK_cacti-20130623.tar.gz