Posts Tagged ‘RAID’

HP ML110 G3 and Linux Centos 4.3 / RHEL 4 Update 3

Tuesday, May 30th, 2006

Using the same installation server as before, my laptop, I was able to install Linux Centos 4.3, with the addition of HP’s drivers for Adaptec SATA raid controller, on my new HP ML110 G3.

Using just the same method as before, when I’ve installed Centos 4.3 on IBM x306, but with HP drivers, I was able to do the job easily.

To remind you the process of preparing the setup:

(A note – When I say "replace it with it" I always recommend you keep the older one aside for rainy days)

1. Obtain the floppy image of the drivers, and put it somewhere accessible, such as some easily accessible NFS share.

2. Obtain the PXE image of the kernel of Centos4.1 or RHEL 4 Update 1, and replace your PXE kernel with it (downgrade it)

3. Prepare the driver’s RPM and Centos 4.1 / RHEL 4 Update 1 kernel RPM handy on your NFS share.

4. Do the same for the PXE initrd.img file.

5. Obtain the /Centos/base/stage2.img file from Centos 4.1 or RHEL 4 Update 1 (depends on the installation distribution, of course), and replace your existing one with it.

6. I assume your installation media is actually NFS, so your boot command should be something like: linux dd=nfs:NAME_OF_SERVER:/path/to/NFS/Directory

Should and would work like charm. Notice you need to use the 64bit kernel with the 64bit driver, and same for the 32bit. Won’t work otherwise, of course.

After you’ve finished the installation, *before the reboot*, press Ctrl+Alt+F2 to switch to text console, and do the following:

1. Copy your kernel RPM to the new system /root directory: cp /mnt/source/prepared_dir/kernel….rpm /mnt/sysimage/root/

2. Do the same for HP drivers RPM

3. Chroot into the new system: chroot /mnt/sysimage

4. Install (with –force if required, but *never* try it first) the RPMs you’ve put in /root. First the kernel and then HP driver.

5. HP Driver RPM will fail the post install. It’s OK. rename /boot/initrd-2.6.9-11.ELsmp (or non SMP, depends on your installed kernel)

6. Verify you have alias for the new storage device in your /etc/modprobe.conf

7. run mkinitrd /boot/initrd-2.6.9-11.ELsmp 2.6.9-11.ELsmp (or non SMP, depending on your kernel)

8. Edit manually your /etc/grub.conf to your needs.

Note – I do not like Grub. Actually, I find it lacking in many ways, so I install Lilo from the i386 (not the 64bit, since it’s not there) version of the distro. Later on, you can rename /etc/lilo.conf.anaconda to /etc/lilo.conf, and work with it. Don’t forget to run /sbin/lilo after changes to this file.

Ontap Simulator, and some insights about NetApp

Tuesday, May 9th, 2006

First and foremost – the Ontap simulator, a great tool which surely can assist in learning NetApp interface and utilization, lacks in performance. It has some built-in limitations – No FCP, no disks (virtual disks) larger than 1GB (per my trial-and-error. I might find out I was wrong somehow, and put in on this website), and low performance. I’ve got about 300KB/s transfer rate both on iSCSI and on NFS. To make sure it was not due to some network hog hiding somewhere on my net(s), I’ve even tried it from the host of the simulator itself, but to no avail. Low performance. Don’t try to use it as your own home iSCSI Target. Better just use Linux for this purpose, with the drivers obtained from here (It’s one of my next steps into “shared storage(s) for all”).

Another issue – After much reading through NetApp documentation, I’ve reached the following concepts of the product. Please correct me if you see fit:

The older method was to create a volume (vol create) directly from disks. Either using raid_dp or raid4.

The current method is to create aggregations (aggr create) from disks. Each aggregate consists of raid groups. A raid group (rg) can be made up of up to eight physical disks. Each group of disks (an rg) has one or two parity disks, depending on the type of raid (raid 4 uses one parity, and raid_dp uses “double parity”, as its name can suggest).

Actually, I can assume that each aggregation is formatted using the WAFL filesystem, which leads to the conclusion that modern (flex) volumes are logical “chunks” of this whole WAFL layout. In the past, each volume was a separated WAFL formatted unit, and each size change required adding disks.

This separation of the flex volume from the aggregation suggests to me the possibility of multiple-root capable WAFL. It can explain the lack of requirement for a continuous space on the aggregation. This eases the space management, and allows for fast and easy “cloning” of volumes.

I believe that the new “clone” method is based on the WAFL built-in snapshot capabilities. Although WAFL Snapshots are supposed to be space conservatives, they require a guaranteed space on the aggregation prior to committing the clone itself. If the aggregation is too crowded, they will fail with the error message “not enough space”. If there is enough for snapshots, but not enough to guarantee a full clone, you’ll get a message saying “space not guaranteed”.

I see the flex volumes as some combination between filesystem (WAFL) and LVM, living together on the same level.

LUNs on NetApp: iSCSI and/or Fibre LUNs are actually managed as a single (per-LUN) large file contained within a volume. This file has special permissions (I was not able to copy it or modify it while it was online and I had root permissions. However, I am rather new to NetApp technology), and it is being exported as a disk outside. Much like an ISO image (which is a large file containing a whole filesystem layout) these files contain a whole disk layout, including partition tables, LVM headers, etc – just like a real disk.

Thinking about it, it’s neither impossible nor very surprising. A disk is no more than a container of data, of blocks, and if you can utilize the required communication protocol used for accessing it and managing its blocks (aka, the transport layer on which filesystem can access the block data), you can, with just a little translation interface, set up a virtual disk which will behave just like any regular disk.

This brings us to the advantages of NetApp’s WAFL – the ability to minimize I/O while maintaining a set of snapshots for the system – a list of per-block modification history. It means you can “snapshot” your LUN, being physically no more than a file on a WAFL-based volume, and you can go back with your data to a previous date – an hour, a day, a week. Time travel for your data.

There are, unfortunately, some major side effects. If you’ve read the WAFL description from NetApp, my summary will be inaccurate at best. If you haven’t, it will be enough, but still you are most encouraged to read it. The idea is that this filesystem is made out of multi-layers of pointers, and of blocks. A pointer can point to more than one block. When you commit a snapshot, you do not change the pointers, you do not move data, you just modify the set of pointers. When there is any change in the data (meaning a block is changed), the pointer points to the alternate block instead of the previous (historical) block, but keeps reference of the older block’s location. This way, only modified blocks are actually recreated, while any unmodified data remains on the same spot on the physical disk. An additional claim of NetApp is that their WAFL is optimized for the raid4 and raid_dp they use, and utilizes it in a smart manner.

The problem with WAFL, as can be easily seen, is fragmentation. For CIFS and NFS, it does not cause much of a problem, as the system is very capable of read-ahead just to solve this issue. However, A LUN (which is supposed to act as a continuous layout, just like any hard-drive or raid-array in the world and on which various file-system related operations occur) gets fragmented.

Unlike CIFS or NFS, LUN read-ahead is harder to predict, as the client tries to do just the same. Unlike real disks, NetApp LUNs do not behave, performance-wise, like the hard-drive layout any DB or FS has learned to expect and was best optimized for. It means, for my example, that on a DB with lots of small changes, that the DB itself would have tried to commit changes in large write operations, committed every so and so interval, and would thrive to commit them as close to each other, as continuous as possible. On NetApp LUN this will cause fragmentation, and will result in lower write (and later read) performance.

That’s all for today.

Hitachi HW100 limitations

Sunday, March 19th, 2006

Not too long ago I have purchased a brand new Hitachi HW100 Workgroup Storage. 14+1 400GB SATA disks, dual-interface, each holding two fiber ports (2Gb/s LC). A nice machine. However, two limitations I have discovered are clouding my day:

1. It supports only Raid0 and Raid5. It will not allow me no-parity Raid, aka, Raid1. It is a pity, because my storage is meant for the lab, and not for production, and I wish to decide my own Raid level. That is why it’s 14+1 disks, and not 15. You must have a spare disk.

2. While not connected directly to hosts (Private Loop), only one port out of two ports in an interface works. It means I bought 4 ports, and actually got two. Why would I need four? I need to simulate multi-path, load balancing and utilize maximum flexibility when connecting this storage to my fiber network. After all, this network is all lab, no production network, and I need to maximize my output, with as little redundancy as possible.

These two limitations are somewhat hidden, and, especially the second one, was discovered only after I’ve questioned HDS’ support person. Pity. I will want to return it, and get a more capable one. I need to see what I can do with it.

Dell PowerEdge 1800 and Linux – Part 1

Tuesday, September 27th, 2005

As part of my voluntary actions, I manage and support Israeli Amateur Radio Committee Internet server. This machine is an old poor machine, custom made about 5-7 years ago, containing Pentium2 300MHz, and 256MB RAM. It serves few hundreds of users, and you can guess for yourself how slow and miserable this machine is.

long wait, the committee has decided to purchase a new server. This server, Dell PE 1800, has landed in my own personal lab just two days ago. It’s a nice machine, cheap, considering its abilities, and it’s waiting just to be installed. Or so it was, just up to now.

Mind you that brand PC servers containing more than one CPU can cost around
3K$ and above. This baby has come with a minimal, yet scalable setup, containing only one CPU out of two, 1GB RAM, our of 8GB possible max, and two SCSI HotSwap HDDs, using 2 out of 6 slots. Nice machine. And it was cheap too. Couldn’t complain about it.

At first, I’ve tried using Dell’s CDs. The "Server Setup" CD is supposed to help me prepare the machine to OS installation, either it be Windows, Linux, Novell, etc. I’ve tried using it, preparing it to a new Centos install, when I’ve noticed it didn’t partition quite as I’ve expected. Well, the "Server Setup" tool has decided I would not use Mirrors, and that I would not use LVM, but would use a predefined permanent setup, and that’s all. This machine did not come with a RAID controller, so I’ve had to configure Software RAID. What better time is there than during the install? Dell’s people think otherwise, so I’ve had to boot into a bootable media of Centos 4.1 (my whole installation tree resides on NFS share). The installation was smooth, and worked just like expected. Fast, sleek, smooth. All I’ve ever expected out of Linux installation on a server class PC. Just like it should have been.

I’ve partitioned the system using the following guidelines:

1) Software mirror Raid /dev/md0, containing /boot (150MB)

2) Two stand alone SWAP partitions, 1GB each, one on each HDD. I do not need mirror for the SWAP.

3) Software mirror Raid /dev/md1, containing LVM, expanding all over what’s left of the disk.

4) Logical Volumes "rootvol" (5GB) holding / and "varvol" (6GB) holding /var. Both can be expanded, so I don’t need to worry now about their final sizes.

As said, the installation went great. However, I was not able to boot the system… I just got each time to a hidden maintenance system partition, and it seems my GRUB failed to install itself. Darn.

I’ve booted into rescue mode, and tried to install GRUB manually. Failed. I think (and it’s not the first time I’ve had such problems with GRUB) GRUB is not as good as everyone say it is. It can’t boot into software mirror, and it means it’s not ready for production, as far as I are.

I’ve used YUM to download and install Lilo, and managed easily to convert /etc/lilo.conf.anaconda to
the correct file for Lilo (/etc/lilo.conf), and to run it. Worked great, and the system was able to boot.