Posts Tagged ‘SCSI’

Hot adding Qlogic LUNs – the new method

Friday, August 8th, 2008

I have demonstrated how to hot-add LUNs to a Linux system with Qlogic HBA. This has become irrelevant with the newer method, available for RHEL4 Update 3 and above.

The new method is as follow:

echo 1 > /sys/class/fc_host/host<ID>/issue_lip
echo “—” > /sys/class/scsi_host/host<ID>/scan

Replace “<ID>” with your relevant HBA ID.

Notice – due to the blog formatting, the 2nd line might appear incorrect – these are three dashes, and not some Unicode specialy formatted dash.

Oracle ASM and EMC PowerPath

Wednesday, May 28th, 2008

Setting up an Oracle ASM disks is rather simple, and the procedure can be easily obtained from here, for example. This is nice and pretty, and works well for most environments.

EMC PowerPath creates meta devices which utilize the underlying paths, as mod_scsi sees them in Linux, without hiding them (unlike IBM’s RDAC, for example). This results in the ability to view and access each LUN either through the PowerPath meta device (/dev/emcpower*) or through the underlying SCSI disk device (/dev/sd*). You can obtain the existing paths of a single meta devices through running the command

powermt display dev=emcpowera

where ‘emcpowera’ is an example. It can be any of your power meta devices. You will see the underlying SCSI devices.

During startup, Oracle ASM (startup script: /etc/init.d/oracleasm) scans all block devices for ASM headers. On a system with many LUNs, this can take a while (half an hour, and sometimes much more). Not only that, but since ASM scans the available block devices in a semi-random order, the chances are very high that the /dev/sd* will be used instead of the /dev/emcpower* block device. This results in degraded performance, where active-active configuration has been set for PowerPath (because it will not be used), and moreover – a failure of that specific link will result in failure to access the specific LUN through that path, with disregard to any other existing paths to the LUN.

To "set things right", you need to edit /etc/sysconfig/oracleasm, and exclude all ’sd’ devices from ASM scan.

To verify that you’re actually using the right block device:

/etc/init.d/oracleasm listdisks

Select any one of the DG disks, and then

/etc/init.d/oracleasm querydisk DATA1
Disk “DATA1″ is a valid ASM disk on device [120, 6]

The numbers are the major and minor of the block device. You can easily find the device through this command:

ls -la /dev/ | grep MAJOR | grep MINOR

In our example, the MAJOR will be 120, and the MINOR will be 6. The result would look like a single block device.

If you’re using EMC PowerPath, your block device major would be 120 and around that number. If you’re (mistakenly) using one of the underlying paths, your major would be 8 and nearby numbers. If you’re using Linux LVM, your major would be around the number 253. The expected result, when using EMC PowerPath is always with major of 120 – always using the /dev/emcpower* devices.

This also decreases the boot time rather dramatically.

dm-multipath and loss of all paths

Tuesday, May 13th, 2008

dm-multipath is a great tool. Its abilities were proven to me on many occasions, and I’m sure I’m not the only one. NetApp, for example, use it. HP use it as well (a slightly modified version, and still), and it works.

A problem I have encountered is as follow – if a single path fails, the device-mapper continue to work correctly (as expected) and the remaining path becomes active. However – if the last link fails, all processes which require disk access become stale. It means that many tests which search for a given process pass correctly even when this process becomes stale through delayed (forever) access to the filesystem. Also – tests which attempt to write/read a file to/from such a stale filesystem, become stale themselves, which can bring down an entire system (assume we have a cron which creates a file every minute. Every new process becomes stale immediately, so after an hour, we’ll have 60 more processes, and after a day – 1440 additional processes – all stale (D) and waiting for the disk to come back).

Certain detection systems actually fail to auto-detect cases of stale filesystems when using dm-multipath. This is caused by a (default) option called “1 queue_if_no_path”. I discovered that when this option is omitted, such as in the configuration below (only the “device” section):

device
{
vendor “NETAPP”
product “LUN”
getuid_callout “/sbin/scsi_id -g -u -s /block/%n”
prio_callout “/sbin/mpath_prio_ontap /dev/%n”
# features “1 queue_if_no_path”
hardware_handler “0″
path_grouping_policy group_by_prio
failback immediate
rr_weight uniform
rr_min_io 128
path_checker readsector0
}

multiple disk failures will actually result in the filesystem layout reporting I/O errors (which is good). A disk mounted through these options can be mounted with special parameters, such as (for ext2/3): errors=continue ; errors=read-only or errors=panic – my favorite, as it ensured data integrity through self-fencing mechanism.

iSCSI target/client for Linux in 5 whole minutes

Tuesday, December 4th, 2007

I was playing a bit with iSCSI initiator (client) and decided to see how complicated it is to setup a shared storage (for my purposes) through iSCSI. This proves to be quite easy…

On the server:

1. Download iSCSI Enterprise Target from here, or you can install scsi-target-utils from Centos5 repository

2. Compile (if required) and install on your server. Notice – you will need kernel-devel packages

3. Create a test Logical Volume:

lvcreate -L 1G -n iscsi1 /dev/VolGroup00

4. Edit your /etc/ietd.conf file to look something like this:

Target iqn.2001-04.il.org.tournament:diskserv.disk1
Lun 0 Path=/dev/VolGroup00/iscsi1,Type=fileio
InitialR2T Yes
ImmediateData No
MaxRecvDataSegmentLength 8192
MaxXmitDataSegmentLength 8192
MaxBurstLength 262144
FirstBurstLength 65536
DefaultTime2Wait 2
DefaultTime2Retain 20
MaxOutstandingR2T 8
DataPDUInOrder Yes
DataSequenceInOrder Yes
ErrorRecoveryLevel 0
HeaderDigest CRC32C,None
DataDigest CRC32C,None
# various target parameters
Wthreads 8

5. Start iscsi-target service:

/etc/init.d/iscsi-target start

On the client:

1. Install open-iscsi package. It will be called iscsi-initiator-utils for RHEL5 and Centos5

2. Run detection command:

iscsiadm -m discovery -t sendtargets -p <server IP address>

3. You should get a nice reply. Something like this. <IP> refers to the server’s IP

<IP>:3260,1 iqn.2001-04.il.org.tournament:diskserv.disk1

4. Login to the devices using the following command:

iscsiadm -m node -T iqn.2001-04.il.org.tournament:diskserv.disk1 -p <IP>:3260,1 -l

5. Run fdisk to view your new disk

fdisk -l

6. To disconnect the iSCSI device, run the following command:

iscsiadm -m node -T iqn.2001-04.il.org.tournament:diskserv.disk1 -p <IP>:3260,1 -u

This will not allow you to set the iSCSI initiator during boot time. You will have to google your own distro and its bolts and nuts, but this will allow you a proof of concept of a working iSCSI

Good luck!

Aquiring and exporting external disk software RAID and LVM

Wednesday, August 22nd, 2007

I had one of my computers die a short while ago. I wanted to get the data inside its disk into another computer.

Using the magical and rather cheap USB2SATA I was able to connect the disk, however, the disk was part of a software mirror (md device) and had LVM on it. Gets a little complicated? Not really:

(connect the device to the system)

Now we need to query which device it is:

dmesg

It is quite easy. In my case it was /dev/sdk (don’t ask). It shown something like this:

usb 1-6: new high speed USB device using address 2
Initializing USB Mass Storage driver…
scsi5 : SCSI emulation for USB Mass Storage devices
Vendor: WDC WD80 Model: WD-WMAM92757594 Rev: 1C05
Type: Direct-Access ANSI SCSI revision: 02
SCSI device sdk: 156250000 512-byte hdwr sectors (80000 MB)
sdk: assuming drive cache: write through
SCSI device sdk: 156250000 512-byte hdwr sectors (80000 MB)
sdk: assuming drive cache: write through
sdk: sdk1 sdk2 sdk3
Attached scsi disk sdk at scsi5, channel 0, id 0, lun 0

This is good. The original system was RH4, so the standard structure is /boot on the first partition, swap and then one large md device containing LVM (at least – my standard).

Lets list the partitions, just to be sure:

# fdisk -l /dev/sdk

Disk /dev/sdk: 80.0 GB, 80000000000 bytes
255 heads, 63 sectors/track, 9726 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdk1 * 1 13 104391 fd Linux raid autodetect
/dev/sdk2 14 144 1052257+ 82 Linux swap
/dev/sdk3 145 9726 76967415 fd Linux raid autodetect

Good. As expected. Let’s activate the md device:

# mdadm –assemble /dev/md2 /dev/sdk3
mdadm: /dev/md2 has been started with 1 drive (out of 2).

It’s going well. Now we have the md device active, and we can try to scan for LVM:

# pvscan

PV /dev/md2 VG SVNVG lvm2 [73.38 GB / 55.53 GB free]

Activating the VG is a desired action. Notice the name – SVNVG (a note at the bottom):

# vgchange -a y /dev/SVNVG
3 logical volume(s) in volume group “SVNVG” now active

Now we can list the LVs and mount them on our desired location:

]# lvs
LV VG Attr LSize Origin Snap% Move Log Copy%
LogVol00 SVNVG -wi-a- 2.94G
LogVol01 SVNVG -wi-a- 4.91G
VarVol SVNVG -wi-a- 10.00G

Mounting:

mount /dev/SVNVG/VarVol /mnt/

and it’s all ours.

To remove this connected the disk, we need to reverse the above process.

First, we will umount the volume:

umount /mnt

Now we need to disable the Volume Group:

# vgchange -a n /dev/SVNVG
0 logical volume(s) in volume group “SVNVG” now active

0 logical volumes active means we were able to disable the whole VG.

Disable the MD device:

# mdadm –manage -S /dev/md2

Now we can disconnect the physical disk (actually, the USB) and continue with out life.

A note: RedHat systems name their logical volumes using a default name VolGroup00. You cannot have two VGs with the same name! If you activate a VG which originated from RH system and used a default name, and your current system uses the same defaults, you need to connect the disk to an external system (non RH would do fine) and change the VG name using vgrename before you can proceed.

Installing RHEL4 on HP DL140 G3 with the embedded RAID enabled

Friday, July 6th, 2007

While DL140 G3 is quite a new piece of hardware, RHEL4, even with the later updates, is rather old.

When you decide to install RHEL4 on a DL140 G3 server, my first recommendation is this: if you decide to use the embedded SATA-II RAID controller – don’t. This is a driver-based RAID, much like the past win-modem devices. Some major parts of its operations are based on calculations done through the driver, directly on the host CPU. It has no advantages comparing to software RAID, and its major disadvantage is its immobile state – unlike software based mirror, this array cannot “migrate” to another server, unless this server is of the same type of hardware. Not sure about it, but it might also require close enough version of firmware as well.

It happened that your boss believes in win-RAID devices (despite the note above), or for some other reason you decide to use this win-RAID, here are the steps to install the system.

1. Download the latest driver disk image for RHEL4 from HP site.

2. If you have the privilage of having an NFS server, uncompress the image and put it on it, where it can be accessed through network.

3. Test that you can mount it from another server. Verify you can reach the image file. Debugging incorrect NFS issues can waste lots of time.

4. If you don’t have the privilege, I hope you have a USB floppy. Put the image on a floppy disk:

gzip -dc /path/to/compressed/image/file.gz | dd of=/dev/fd0

(/dev/fd0 assuming this is not /dev/sdX, as it tends to be with USB floppies)

5. Boot the server with the first RHEL4 CD in the drive, or with PXE, or whatever is your favorite method. In the initial boot prompt type:

linux text dd=nfs:server:/path/to/nfs/disk/disk_image.dd

This assuming that the name of the file (including its full path) is /path/to/nfs/disk/disk_image.dd. For floppy users, type dd=floppy instead.

6. RHEL will boot, loading “ahci” module (which is bad) during its startup. It will ask you to select through which network card the system is to reach the NFS server. I assume you have a working DHCP in your site.

7. As soon as you are able to use the virtual terminal (Ctrl+F2) switch to it.

8. Run the following commands:

cd /tmp
mkdir temp
cd temp
gzip -S .cgz -dc /tmp/ramfs/DD-0/modules.czf | cpio -id

cd to the modules directory, and look at the modules. Know which is the module which fits your running kernel. You can do this by using ‘uname’ command.

9. Run the following commands

rmmod ahci
rmmod adpahci
insmod KERNEL_VER/ARCH/adpahci.ko

Replace KERNEL_VER with your running(!!!) single-CPU kernel version, and replace ARCH with your architecture, either i386 or x86_64

10. Using Ctrl+F1 return to your running installer. Continue installation until the end but do not reboot the system when done.

11. When installation is done, before the reboot, return to the virtual console using Ctrl+F2.

12. Run the following commands to prepare your system for a happy reboot:

cp /tmp/temp/KERNEL_VER/ARCH/adpachi.ko /mnt/sysimage/lib/modules/KERNEL_VER/kernel/drivers/scsi/

cp /tmp/temp/KERNEL_VERsmp/ARCH/adpachi.ko /mnt/sysimage/lib/modules/KERNEL_VERsmp/kernel/drivers/scsi

Notice that we’ve copied both the single CPU (UP) and the SMP versions.

13. Edit modprobe.conf of the system-to-be and remove the line containing “alias scsi_hostadapter ahci” from the file.

14. Chroot into the system-to-be, and build your initrd:

chroot /mnt/sysimage
cd /boot
mv initrd-KERNEL_VER.img initrd-KERNEL_VER.img.orig
mv initrd-KERNEL_VERsmp.img initrd-KERNEL_VERsmp.img
mkinitrd /boot/initrd-KERNEL_VER.img KERNEL_VER
mkinitrd /boot/initrd-KERNEL_VERsmp.img KERNEL_VERsmp
exit

If things went fine so far, you are now ready to reboot. Use Ctrl+F1 to return to the installation (anaconda) console, and reboot the system.

Notes:

1. You need to download the “Driver Diskette” from HP site.

2. The latest Driver Diskette will support only Update3 and Update4 based systems. At this time, Update5 has no modules by HP yet. You can compile your own, but this is not in our scope.

3. Avoid using floppies at all cost.

4. Do not install the system in full GUI mode. In the model I have installed the VGA (Matrox device) had a bug and did not allow to reach the virtual text consoles. It disconnected the VGA. If you use GUI installation, you will be required to reboot the system into rescue mode and do steps 7 to 14 then.

5. Underlining the word smp is meant to help you not forget it. This is the more important one.

6. On the system itself, using Xorg, I was able to reach max resolution of 640×480 even with the display drivers supplied by HP. I was able to reach 1024×768 only when using 256 colors.

HP MSA1000 controller failover

Tuesday, March 27th, 2007

HP MSA1000 is an entry-level disk storage capable of communicating via different types of interfaces, such as SCSI and FC, and can allow FC failover. This FC failover, however, is controller failover and not path failover. It means that if the primary controller fails entirely, the backup controller will “kick in”. However, if a multi-path capable client will fail its primary interface, there is no guarantee that communication with the disks through the backup controller.

The symptom I have encountered was that the secondary path, while exposing the disks (while the primary path was down for one of the servers) to the server, did not allow any SCSI I/O operations. This prevented the Linux server’s SCSI layer from accessing the disks. So they did appear when doing “cat /proc/scsi/scsi“, however, they were not detected using, for example, “fdisk -l“, and the system logs got filled with “SCSI Error” messages.

About a month ago, after almost two years, a new firmware update has been released (can be found here). Two versions exist – Active/Passive and Active/Active.

I have upgraded the MSA1000 storage device.

After installing the Active/Active firmware upgrade (Notice Linux users – You must have X to run the “msa1500flash” utility), and after power cycling the MSA1000 device, things start to look good.

I have tested performance with a person on-site disconnecting fiber connections on-demand, and it worked great. About 2-5 seconds failover time.

Since this system run Oracle RAC, and it uses OCFS2, I had to update the failed-node timeout to be 31 seconds (per this Oracle’s OCFS site, which includes some really good tips).

So real High Availability can be archived after upgrading MSA1000 firmware.

Compaq Proliant 360/370/380 G1 cpqarray problems with Ubuntu

Saturday, March 24th, 2007

Or, for that matter, any other Linux distribution that:

a. uses kernel 2.6.x up to 2.6.18

b. Does not dynamically create the initrd as part of the installation

Ubuntu, for that matter, is an example of not doing both. While it does create the initrd, it doesn’t create it dynamically per the output of ‘lspci‘, which results in inclusion of every SCSI module which exists.

The symptoms – you can install the system, however, you are unable to boot it afterwards. You might get into your Busybox initrd. The cpqarray module doesn’t detect any arrays. Error is "cpqarray: error sending ID controller" . You will notice that the module sym53c8xx is loaded.

I’ve searched for a solution and found an initial hint in this blog, however, the entry was not completely accurate. Following the tips given in this page, I was able to understand that there was a bug in the kernel which caused sym53c8xx modules to take-over the cpqarray. I was required to remove the modules from the initrd. I booted into rescue mode from the Ubuntu Server CD, and from there did the following:

1. mount /boot

2. add the following modules list to your /etc/initramfs-tools/modules – modules-proliantG1.txt

3. Edit /etc/initramfs-tools/initramfs.conf to change "MODULES=most" to "MODULES=list"

4. Run "update-initramfs -k 2.6.17-11-server -c" (this is relevant in my case – up-to-date Ubuntu server 6.10. For other versions, check what is the latest version of installed kernel. This can be found by a mere ls on /lib/modules/)

After reboot I was pleased to discover that my system was able to boot correctly, and I know it will do so for updated versions of the kernel

Installing SuSE 8 SP3 on HP DL380 G4

Thursday, January 4th, 2007

Installing older Linux systems on new hardware is always fun. SuSE 8 especially might be tricky, due to several assumptions during install:

1. The install kernel is the Single CPU version. However, on an SMP system, the Single CPU kernel would not be actually installed.

2. HP assume you have a floppy with your server. You probably don’t.

The procedure went like this:

1. Download from HP site the required drivers RPM.

2. Extract the RPM file using rpm2cpio (for example: "rpm2cpio drivers.rpm | cpio -id" ). Copy the drivers, per the kernel versions which will be used to a Disk-on-Key. Also, for later use, put the RPM file on the Disk-on-Key.

3. Boot the server with the SP3 CD. It will ask for the first SuSE CD. Insert it, and it will claim it cannot detect disks.

4. Insert USB Disk-on-Key to USB slot. Go to command shell (Ctrl+Alt+F2).

5. run "modprobe usb-storage"

6. Using "dmesg" verify the USB device is detected and attached to a SCSI device. Mount it to some temporary location.

7. Insert the cciss.o module manually, per the correct kernel version. When done, umount the USB volume and disconnect it.

8. Using "Ctrl+Alt+F6" go back to the installation GUI. Select partitioning, and using the manual (advanced) interface, rescan the disks. You will see your CCISS disks.

9. Install the system. When done, you will have to reboot it back to stage 6.

10. Insert the cciss.o module manually, per the correct kernel version.

11. Mount your system volume to a temporary location.

12. Copy the drivers RPM to the /root of the system volume.

13. Chroot into the mounted system volume.

14. edit /etc/sysconfig/kernel and add the module "cciss" to the modules list in it.

15. Install the drivers RPM.

16. Exit from the chrooted shell, and restart the server. Eject the CD when done.

Now you should have a running SuSE 8 system on an HP DL380 G4.

I assume such stages can be performed on a G5 server as well, however, with HP not supplying CCISS drivers on their site for SuSE 8, the only option seems valid (and I didn’t try it yet) is to compile the CCISS module based on sources obtained from the CCISS SF site on another SuSE 8 server, and using them in a way similar to the one described above.

Hot-Adding SAN lun to Linux (RH with Qlogic drivers)

Monday, September 18th, 2006

cat /proc/scsi/qla2xxx/$Z” where Z represents the SCSI interface the Qlogic has taken for itself, you’ll get something like this:

<Snip>

.

.

</Snip>

SCSI LUN Information:
(Id:Lun) * – indicates lun is not registered with the OS.
( 0: 0): Total reqs 63185608, Pending reqs 0, flags 0×2, 0:0:81 00

Assuming you’ve just added the next LUN, in our case, LUN1, after reboot you would get an additional line below such as:

( 0: 1): Total reqs 1923, Pending reqs 0, flags 0×2, 0:0:81 00

However, on a production server we want to add this line without a reboot.

To achieve this goal, we need to run the following command:

“echo “scsi-qlascan” > /proc/scsi/qla2xxx/$Z” where $Z represents, like before, the SCSI interface.

Then you get the additional line(s) in the file. Now you should help your Linux see them (and attach a module to them). You can do it by using the following convention (taken from here): Using “dmesg“.

you can obtain the required details for the next stage: Controller, Channel, Target and LUN. Example:

Host: scsi2 Channel: 00 Id: 00 Lun: 01
Vendor: IBM Model: 1742 Rev: 0520
Type: Direct-Access ANSI SCSI revision: 03

Obtain the following details:

Controller=2 (scsi2)

Channel=0 (Channel: 00)

Target=0 (Id: 00)

LUN=1 (Lun: 01)

We will ask Linux nicely to reattach the device. Replace the descriptors with the numeric values

echo “scsi add-single-device Controller Channel Target LUN” > /proc/scsi/scsi”

In our example: “echo “scsi add-single-device 2 0 0 1″ > /proc/scsi/scsi

To remove a device prior to unmapping it from the SAN, replace add-single-device with “remove-single-device”.

This post’s Qlogic discovery was the insight of a friend of mine, and the credit is his :-)