Posts Tagged ‘SCSI’

RHEL 4 / Centos 4 on VMware ESX 2.1

Sunday, July 9th, 2006

While trying to install RHEL 4 on VMware ESX 2.1, I had the pleasure of watching a Linux installation that was unable to detect any disks.

A quick search of VMware’s web site turned up a download page for RHEL 4 Update 2, which, unfortunately, wasn’t the version I was using.

Before running around looking for the nearest download of RHEL 4 Update 2, I noticed that the VMware configuration for each virtual machine defines a vmxbuslogic SCSI adapter, which apparently requires a special driver on RHEL 4. Changing the adapter to vmxlsilogic solved the issue, and required no special Linux driver.
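The adapter change can also be made by hand in the guest's configuration file. Below is a minimal sketch, assuming the adapter is selected by a `scsiN.virtualDev` line in the guest's .vmx file and that the two values are spelled vmxbuslogic and vmxlsilogic there. The function name and the file layout are assumptions of mine, not something taken from VMware's documentation, so check your own file first and edit it only while the guest is powered off.

```shell
#!/bin/sh
# Sketch: switch a guest's virtual SCSI adapter from BusLogic to LSI Logic.
# Assumption: the guest's .vmx file contains a line such as
#   scsi0.virtualDev = "vmxbuslogic"
# switch_scsi_adapter is a hypothetical helper name.
switch_scsi_adapter() {
    vmx_file="$1"
    # Keep a .bak copy, then rewrite only the scsiN.virtualDev lines.
    sed -i.bak -e '/^scsi[0-9]*\.virtualDev/s/vmxbuslogic/vmxlsilogic/' "$vmx_file"
}
```

Usage would be something like `switch_scsi_adapter /path/to/guest.vmx`, where the path is whatever your own guest uses. Other lines that merely mention the old adapter name are left alone.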

Power Supply failure – the wrong type of failure

Sunday, March 5th, 2006

I lost an external DAS (Direct Attached Storage) today. Not lost as in could not find, but lost to a power supply failure. I was at home when I got a phone call saying that all of the company’s Unix storage, which sits on a DAS holding two 72GB HDDs in a mirror (DiskSuite on Solaris 8), was not available. Remotely, I could not reach the storage: every request (ls, cd, etc.) returned an I/O error. I had to go there in person. On site I noticed lots of error messages generated by the kernel whenever it tried to access the disks. After lots of games (I will not describe the whole procedure, but it included replacing the external SCSI cable, disconnecting one of the disks, etc.), I replaced the DAS module and put the old disks in it, making sure they kept the same LUN they had before (for DiskSuite’s sake).

Conclusion – a power supply failure, but not an absolute one. The lights kept working and the disks could spin up, but under load the power supply failed to deliver all the power the disks required, resulting in read/write errors. Working, but just not quite.

Tracking the problem, and hacking a different SCSI DAS module required almost two hours of my life. I hope never to encounter such a problem again.

Veritas (Symantec) Cluster Server on Centos 3.6

Wednesday, February 15th, 2006

I’m in the process of installing VCS on two guest (virtual) Linux nodes, each using the following setup:

256MB RAM

One local (Virtual) HDD, SCSI channel 0

Two shared (Virtual) HDDs, SCSI channel 1

Two NICs, one bridged, and one using /dev/vmnet2, a personal Virtual switch.

The host (carrier) is a Pentium 3 800MHz with 630MB RAM. I don’t expect great performance out of it, but I do expect a slow, yet working, test environment.

Common mistake – never forget to change your /etc/redhat-release to state that you are running a RedHat Enterprise system. Failing to do so will cause the installation of VRTSllt to fail, which will force you either to install it manually after you’ve fixed the file, or to remove and reinstall everything. In my case, Centos 3.6 (equivalent to RedHat Enterprise Server 3 Update 6), the file /etc/redhat-release should have contained the string:

Red Hat Enterprise Linux AS release 3 (Taroon)
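One way to make that change while keeping the original file around is sketched below. The function name and the `.orig` backup suffix are my own choices for illustration; the release string is the one quoted above.

```shell
#!/bin/sh
# Sketch: make the system identify itself as RHEL AS 3 for the installer.
# fake_redhat_release and the ".orig" suffix are hypothetical choices.
fake_redhat_release() {
    release_file="${1:-/etc/redhat-release}"
    # Keep the original contents so they can be restored later.
    if [ -f "$release_file" ]; then
        cp -p "$release_file" "$release_file.orig"
    fi
    printf '%s\n' 'Red Hat Enterprise Linux AS release 3 (Taroon)' > "$release_file"
}
```

Run as root with no argument, it would rewrite /etc/redhat-release; copying the `.orig` file back restores the Centos string.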

Veritas has advanced a great deal in the last few years, regarding ease of installation, even on multiple servers. Installing Cluster software usually involves more than one server, and installing it all, from a single point, is an advancement. My two nodes are named vcs01-rhel and vcs02-rhel, and the installation was done completely from vcs01-rhel.

The installer assumes you can log in using ssh (the default) from one node to the other without a password prompt. In my case, that wasn’t true. I found it quicker (and dirty, mind you!) to use rsh for the duration of the installation and configuration process. It’s not safe, it’s not good, but if it’s just for the short and limited time required for the installation, I’d hack it so it would work. How did I do it?

On node vcs02-rhel I installed (using yum, of course) the package rsh-server. The syntax is yum install rsh-server. Afterwards, I changed its xinetd file, /etc/xinetd.d/rsh, to set the flag "disable = no" and restarted xinetd. Following that, I commented out (hashed) two lines in /etc/pam.d/rsh:

auth required pam_securetty.so

auth required pam_rhosts_auth.so

As said, quick and dirty. It allowed rsh from vcs01-rhel as root, without a password. Don’t try it in an insecure environment, as it actually allows not only vcs01-rhel, but any and every computer on the net full, password-free rsh access to the server. Better think it over, right?
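The two file edits described above could be scripted roughly as follows. This is only a sketch: the function name is made up, it assumes the xinetd file currently says "disable = yes" and that the two PAM lines start with "auth", and it deliberately reproduces the insecure setup, so the same warning applies.

```shell
#!/bin/sh
# Sketch: temporarily open up rsh for the VCS installer (insecure!).
# enable_rsh_temporarily is a hypothetical helper; both edits keep a .bak copy
# so they can be undone once the installation is finished.
enable_rsh_temporarily() {
    xinetd_rsh="${1:-/etc/xinetd.d/rsh}"
    pam_rsh="${2:-/etc/pam.d/rsh}"
    # Flip "disable = yes" to "disable = no" in the xinetd service file.
    sed -i.bak -e 's/^\([[:space:]]*disable[[:space:]]*=[[:space:]]*\)yes/\1no/' "$xinetd_rsh"
    # Comment out the securetty and rhosts_auth lines in the PAM file.
    sed -i.bak \
        -e '/pam_securetty\.so/s/^[[:space:]]*auth/#&/' \
        -e '/pam_rhosts_auth\.so/s/^[[:space:]]*auth/#&/' \
        "$pam_rsh"
    # xinetd still has to be restarted by hand afterwards.
}
```

Restoring the two `.bak` files and restarting xinetd again puts things back the way they were.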

The first thing I did once I had finished installing the software was to undo my pam.d changes and disable the rsh service. Later, I will remove it.

So, we need to run the installer, which can be done by cding to /mnt/cdrom/rhel3_i686 and running ./installer -usersh

I’m asked about all the machines I need to install VCS on, I’m asked if I want to configure the cluster (which I do), and I set a name and a cluster ID (a number between 0 and 255). This cluster ID is especially important when several Veritas clusters run on the same infrastructure. If you have two clusters with the same cluster ID, you get one extra-large cluster, and a mess out of it. Mind it, if you ever run several clusters on one network.

I decide to carry on with the setup. I decide to enable the web management GUI, and I decide to set an IP for the cluster. This IP will be used by the resource group (called ClusterService by default), and will be a resource in it. When/if I have more resource groups, I should consider adding more IP addresses for them, at least one for each. That way, the cluster serves client requests without the clients being aware of any "special" setup on the server side, such as the fact that it has already switched over twice.

I define the heartbeat networks. I’ve used eth1 as the private heartbeat, and eth0 as both the public network and the "slow heartbeat". Later I will add some more virtual NICs to both nodes, and define them as private heartbeats as well.

Installing packages – I decide to install all optional packages. It’s not that I’m going to lack space. I did not install, mind you, VxVM (Veritas Volume Manager), because I want to simulate a system without a volume manager. Just pure basic simple partitions.

Installation went fine, and I was happy as a puppy.

One thing to note – I wanted to install the Maintenance Pack I have, but I was unable to eject my CD. Running lsof | grep /mnt/cdrom revealed that CmdServer, some part of VCS, was using the cdrom, probably because, as root, I had started the service from that location. I shut down the vcs service and started it again from another path, and then I was able to eject my CD.

Installing the MP wasn’t that easy. The installer, much smarter this time, required the package redhat-release, which is a mandatory package on RedHat systems, but I, running Centos, had the package centos-release, which just wouldn’t do the trick. I decided to rebuild the centos-release package under a different internal name, redhat-release, and to do that I had to download the SRPM of centos-release. You need to change the name and version so that your RPM ends up as redhat-release-3ES-13.6.2. I’ve done it with this SRPM centos-release.spec file. Replace the spec file of your centos-release SRPM with this one, and you should be just fine. Remove your current centos-release package, and you’ll be able to install your newly built (using rpmbuild -bb centos-release.spec) redhat-release RPM (faked, of course). Mind you – it will overwrite your /etc/redhat-release, so you’d better back it up, just in case. I took that precaution, and restored the file to its fake RedHat contents. You can never know…
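For readers without the modified spec file at hand, the renaming step might look roughly like the sketch below. It assumes the spec file carries plain `Name:`, `Version:` and `Release:` header lines (a real spec may use macros or refer to the package name elsewhere, in which case more edits are needed), and the helper name is hypothetical. It is not the spec file mentioned above, only an illustration of the idea.

```shell
#!/bin/sh
# Sketch: retag a centos-release spec file so it builds redhat-release-3ES-13.6.2.
# Assumption: Name, Version and Release appear as simple header lines.
# retag_release_spec is a hypothetical helper name.
retag_release_spec() {
    spec_file="$1"
    # Keep a .bak copy and rewrite the three header tags only.
    sed -i.bak \
        -e 's/^Name:.*/Name: redhat-release/' \
        -e 's/^Version:.*/Version: 3ES/' \
        -e 's/^Release:.*/Release: 13.6.2/' \
        "$spec_file"
    # Building is then done separately: rpmbuild -bb "$spec_file"
}
```

After the rebuild, the resulting RPM is installed in place of centos-release as described above.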

You could wonder why I haven’t used RHEL itself, but a clone, namely Centos. Although it’s for home usage only, the ease of updates, the availability of packages (using yum) and the fact that I do not want to steal software together lead me to install Centos for all my home uses. In production environments, however, it will be an official RedHat, I can guarantee that.

So, it’s installing MP2, which means removing some packages and then installing newer versions. Why they do not use the "upgrade" option of RPM is beyond me, and so is their nastiness about the redhat-release version. So, if you’ve followed all the rules given here, you should now have VCS 4.0 MP2 installed on your Linux nodes. Good luck. Our next chapter will be installing and configuring an Oracle database on this setup. Stay tuned :-)

VMware experience – lots of it

Monday, January 16th, 2006

During the past few days/weeks, I’ve had the pleasure (and will have in the future as well) of playing with VMware ESX (2.5.2) and GSX (3.2.1), as well as Workstation in my long forgotten past, and here I try to describe my own personal impressions of the product.

First – it is a good product. I enjoyed working with it. It is not too complicated; however, it is not documented well enough, and finding solutions to specific problems was not easy, and was not made any easier by the online documentation and the web site.

I will start with GSX. It is a modern, easily usable product. It lets you run virtual systems on a running Windows or Linux system, and it allows remote management of such systems. It has a good remote GUI (VMware Console), which allows some cool stunts such as installing a guest (virtual, but we’ll keep to VMware’s lingo here) OS directly from your own CDROM, on your own personal desktop. If you don’t get it – install a Windows server, call it Server1. Install VMware GSX on it, and then run the VMware-Console software on your desktop. Using this software you can define a whole guest system on Server1, control it, and view its "physically attached" keyboard, mouse and screen. So, you can map your own desktop’s CDROM to a guest system on Server1, and install the guest from there. It’s a stunt which allows you never to leave your own chair! It doesn’t exist on the more expensive and advanced ESX, and that’s a pity.

You can define, using the VMware-Console or even the web-based management interface, a larger variety of hardware on a guest system with GSX than you can with ESX. The ESX console and web interface did not allow for serial ports on a guest. They did not allow for sound, or for USB. So it appears that although ESX is the more advanced version, it is limited compared to the lesser GSX.

I’ve discovered, during such an effort, that I could manually define a serial port on an ESX guest system. I believe other devices can be defined as well, but I wouldn’t want to try that, nor would I be able to do so without a GSX guest’s configuration file as a good example. I’ve come to a resolution here, and it was working, for the time being.

The ESX version is more like a mainframe style system – it allows for an embedded system slicing and partitioning for consolidation of numerous virtual machines. Lots of buzz-words, but all they mean is that you can have one stronger PC hardware running several virtual configurations (guests), easier to manage, and with better utilization of your actual resources, as physical servers tend to lie idle for a noticeable part of the day in most cases.

It adds, however, a few more complicated considerations into the soup. If I had 3 servers doing nothing most of the day, and at 4 AM all of them started to index local files, I couldn’t care less. On a consolidated setup, however, I would care: for better utilization, I would measure (or estimate) the time each one requires for its own task, and try to spread the tasks better around the clock – this one will start a bit earlier, and that one a bit later, so they would not hog my system. This brings us to the major problem of such a setup – I/O. Every computer system ever built has had problems with its I/O. I/O, and especially disk access, is the slowest mechanism in a computer. You can compute millions and tens of millions of instructions per second, but you would need a few minutes to put the results on the disk. You could say that the I/O problem can be identified at two levels:

1) General disk access – Reading and writing to disks is rather slow.

2) Small files – Most files on the average system are small. Very small. Disk layout, as hard as any FS might try, results in random and spread layout, which leads to high seek-time when reading and writing small files, which is, actually, the main occupation of any OS I/O subsystem.

Virtual and consolidated solutions are no different. Each virtual OS requires its own share of the physical hardware’s disk I/O, which might lead, in some cases, to poor performance of all guest OSes just because of a disk hog, which, by the way, is the hardest thing to measure and detect. Moreover, it is the hardest to solve. You can always pour in some more hard drives, but the host (container) I/O subsystem remains the same single system, and the load generated by large amounts of small, random reads and writes remains the same. So, unless you use some QoS mechanism, a single machine can hog your entire virtual construction. This is one of the biggest downsides of such consolidation solutions.
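The idea of spreading the guests’ heavy jobs around the clock, mentioned earlier with the 4 AM indexing example, can be sketched as simple arithmetic. The helper below is purely illustrative: the function name, the guest names and the 45-minute gap are made-up values, and in practice the gap should come from measuring how long each guest’s job really takes.

```shell
#!/bin/sh
# Sketch: print staggered start times so the guests' I/O-heavy jobs do not overlap.
# Usage: stagger_starts FIRST_HOUR GAP_MINUTES guest...
# FIRST_HOUR and GAP_MINUTES are plain integers (no leading zeros).
stagger_starts() {
    first_hour="$1"
    gap="$2"
    shift 2
    offset=0
    for guest in "$@"; do
        # Minutes since midnight for this guest's job.
        total=$(( first_hour * 60 + offset ))
        printf '%s %02d:%02d\n' "$guest" $(( (total / 60) % 24 )) $(( total % 60 ))
        offset=$(( offset + gap ))
    done
}
```

For example, `stagger_starts 3 45 guest1 guest2 guest3` prints 03:00, 03:45 and 04:30 for the three guests, which could then be copied into each guest’s crontab.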

With P-Series, by the way, they can consolidate the hardware into several I/O-separated virtual machines (Logical Partitions, or LPARs, as IBM calls them. They call everything "Partitions"). VMware ESX supports such a setup as well, but since it is not really a hardware-bound setup (as LPAR is), I wonder how well they manage to keep the load on one I/O channel from degrading the performance of the others.

I guess that for low-I/O systems, or for lab usage, ESX could do the trick. You can run a full OS cluster (Windows or Linux) on it, and it will work correctly, and nicely. Unless you’re up to disconnecting physical (or virtual) disks from guest servers, it is a good solution for you.

So, to sum things up, I can say that I enjoy "playing" with VMware products. I enjoy them because they’re innovative, sophisticated, and they look sexy, but I am well aware of the way the market chooses its current solutions, and I am aware of the fact that many use VMware products for the sake of consolidation and ease of management, without proper consideration or understanding of the well-expected performance loss which can come with it (but does not have to, if you calculate things correctly). A friend has told me about an ESX setup he encountered, where they had a quad-CPU system with 16GB RAM running 16 guest OSes, among them MS Exchange, MSSQL2005, MS-SMS and more, using a single shelf of RAID5-based storage, connected via two 2Gb/s fibre connections set up as failover (only one active link at a time). It was overloaded, and was performing badly. Nice server, though :-)

One last thing about ESX is that it will not install on purely IDE systems. It requires SCSI (and maybe SATA?) for the space holding the guests’ virtual hard drives.

So, enough about VMware today. I wonder if there’s some easy matrix for "tell me what servers will do, and we’ll calculate I/O, CPU and memory for your future server", instead of the poor way of "I’ve discovered my server is too weak for the task, half a year after deployment", which we see too much of today.

Dell PowerEdge 1800 and Linux – Part 1

Tuesday, September 27th, 2005

As part of my voluntary activities, I manage and support the Israeli Amateur Radio Committee’s Internet server. This is a poor old machine, custom made about 5-7 years ago, with a Pentium 2 300MHz and 256MB RAM. It serves a few hundred users, and you can guess for yourself how slow and miserable it is.

After a long wait, the committee has decided to purchase a new server. This server, a Dell PE 1800, landed in my own personal lab just two days ago. It’s a nice machine, cheap considering its abilities, and it’s just waiting to be installed. Or so it was, up to now.

Mind you that brand-name PC servers containing more than one CPU can cost around 3K$ and above. This baby came with a minimal, yet scalable setup: only one CPU out of two, 1GB RAM out of a possible 8GB max, and two SCSI hot-swap HDDs, using 2 out of 6 slots. Nice machine. And it was cheap too. Couldn’t complain about it.

At first, I tried using Dell’s CDs. The "Server Setup" CD is supposed to help me prepare the machine for OS installation, whether it be Windows, Linux, Novell, etc. I tried using it to prepare for a new Centos install, and noticed it didn’t partition quite as I expected. Well, the "Server Setup" tool has decided that I would not use mirrors, and that I would not use LVM, but would use a predefined permanent setup, and that’s all. This machine did not come with a RAID controller, so I had to configure software RAID. What better time is there than during the install? Dell’s people think otherwise, so I had to boot from a bootable Centos 4.1 media (my whole installation tree resides on an NFS share). The installation was smooth, and worked just as expected. Fast, sleek, smooth. All I’ve ever expected out of a Linux installation on a server class PC. Just like it should have been.

I’ve partitioned the system using the following guidelines:

1) Software mirror Raid /dev/md0, containing /boot (150MB)

2) Two stand alone SWAP partitions, 1GB each, one on each HDD. I do not need mirror for the SWAP.

3) Software mirror Raid /dev/md1, containing LVM, expanding all over what’s left of the disk.

4) Logical Volumes "rootvol" (5GB) holding / and "varvol" (6GB) holding /var. Both can be expanded, so I don’t need to worry now about their final sizes.
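I set this layout up in the Centos installer, but for reference, a rough command-line equivalent is sketched below as a dry run that only prints the commands. The device names (/dev/sda, /dev/sdb), the partition numbers and the volume group name vg00 are assumptions for illustration; the partitions themselves are expected to exist already, and file system creation is left out.

```shell
#!/bin/sh
# Sketch: print (not execute) commands roughly equivalent to the layout above.
# Device names, partition numbers and the VG name are hypothetical.
plan() { printf '%s\n' "$*"; }

print_layout_plan() {
    disk_a="${1:-/dev/sda}"
    disk_b="${2:-/dev/sdb}"
    vg="${3:-vg00}"
    # 1) Mirrored /boot on /dev/md0.
    plan mdadm --create /dev/md0 --level=1 --raid-devices=2 "${disk_a}1" "${disk_b}1"
    # 2) Two independent swap partitions, one per disk.
    plan mkswap "${disk_a}2"
    plan mkswap "${disk_b}2"
    # 3) Mirror /dev/md1 over the rest of the disks, used as the LVM physical volume.
    plan mdadm --create /dev/md1 --level=1 --raid-devices=2 "${disk_a}3" "${disk_b}3"
    plan pvcreate /dev/md1
    plan vgcreate "$vg" /dev/md1
    # 4) Logical volumes for / and /var; both can be extended later.
    plan lvcreate -L 5G -n rootvol "$vg"
    plan lvcreate -L 6G -n varvol "$vg"
}
```

Calling `print_layout_plan` with no arguments prints the eight commands for the default device names, which makes it easy to review the plan before typing anything for real.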

As said, the installation went great. However, I was not able to boot the system… Each time I just ended up in a hidden maintenance system partition, and it seems my GRUB had failed to install itself. Darn.

I booted into rescue mode and tried to install GRUB manually. Failed. I think (and it’s not the first time I’ve had such problems with GRUB) that GRUB is not as good as everyone says it is. It can’t boot from a software mirror, which means it’s not ready for production, as far as I care.

I used YUM to download and install Lilo, easily managed to convert /etc/lilo.conf.anaconda into the correct file for Lilo (/etc/lilo.conf), and ran it. It worked great, and the system was able to boot.