Archive for February, 2006

A bug in restore in Centos4.1 and probably RHEL 4 update 1

Sunday, February 26th, 2006

I’ve been to Hostopia today, the land of hosting servers. I’ve had an emergency job on one Linux server, due to a mistake I’ve made. It appears that the performance penalty of using raid0 instead of raid1 for the root partition is terrible (the Centos/RH default raid setup is raid0 and not raid1, which is what led me to this mistake).

I tend to setup servers in the following way:

Small (100MB) raid1 partition (/dev/sda1 and /dev/sdb1, usually) for /boot.

Two separate partitions for swap (/dev/sda2 and /dev/sdb2), each holding half the required total swap.

One large raid1 (/dev/sda3 and /dev/sdb3) containing LVM, which, in turn, holds the “/” and the rest of the data partitions, if required.
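
A minimal sketch of how such a layout can be built by hand with mdadm and LVM (the installer normally does this for you; sizes and volume names below are purely illustrative):

# RAID1 for /boot
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
# Plain swap on each disk, no raid
mkswap /dev/sda2 ; mkswap /dev/sdb2
# RAID1 holding the LVM physical volume for "/" and the data volumes
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda3 /dev/sdb3
pvcreate /dev/md1
vgcreate vg00 /dev/md1
lvcreate -n root -L 10G vg00
mkfs.ext3 /dev/vg00/root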

In this specific case, I’ve made a mistake and was not aware of it in time: I had set up the large LVM over a stripe (raid0) by mistake. I’ve had degraded performance on the server, and all disk access was slow. Very slow. Since it is impossible to break such a raid array without losing data, I’ve had to back up the data currently there, and make sure I would be able to restore it. It’s an old habit of mine to use dump and restore. Both ends of the procedure have so far worked perfectly on every *nix operating system I’ve had experience with. I’ve dumped the data, using one of the swap partitions as a container (formatted as ext3, of course), and was ready to continue.

I’ve reached the server farm, where all the hosting servers stood in long rows (I’m so sorry I did not take a picture; some of those so-called “servers” had colored LEDs in their fans!), and got busy on this specific server. I had to back everything up from the start, as the previous attempt had failed to complete (and this time I did it to my laptop through the 2nd NIC), and then I booted into rescue mode, destroyed the LVM, destroyed the raid (md device), and recreated them. It went fine, except that restore failed to work. The claim was “. is not the root” or something similar. Checking restore via my laptop worked fine, but the server itself failed to work. Eventually, after a long waste of time, I installed a minimal Centos4.1 setup on the server, and tried to restore with overwrite from within a chroot environment. It failed as well, with the same error message. I suddenly decided to check whether there was an update to the dump package, and there was. Installing it solved the issue. I was able to restore the volume (using the “u” flag, to overwrite files), and all was fine.
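
For the record, the dump and restore pair looked more or less like this (paths and mount points are illustrative; the target device is one of the swap partitions mentioned above, temporarily formatted as ext3):

# Turn one swap partition into a temporary ext3 container and dump "/" into it
mkfs.ext3 /dev/sda2
mount /dev/sda2 /mnt/backup
dump -0 -f /mnt/backup/root.dump /
# Later, from rescue mode, with the recreated md/LVM layout mounted:
cd /mnt/sysimage
restore -r -u -f /mnt/backup/root.dump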

I’ve wasted over an hour on this stupid bug. Pity.

I’m keeping a static copy of the up-to-date restore binary, so I will not have these problems again. I hope 🙂

Veritas (Symantec) Cluster Server on Centos 3.6

Wednesday, February 15th, 2006

I’m in the process of installing VCS on two guest (virtual) Linux nodes, each using the following setup:

256MB RAM

One local (Virtual) HDD, SCSI channel 0

Two shared (Virtual) HDDs, SCSI channel 1

Two NICs, one bridged, and one using /dev/vmnet2, a personal Virtual switch.

The host (carrier) is a Pentium 3 800MHz with 630MB RAM. I don’t expect great performance out of it, but I do expect a slow, yet working, testing environment.

Common mistake – never forget to change your /etc/redhat-release to state that you are running a RedHat Enterprise system. Failure to do so will result in a failure to install VRTSllt, which will force you either to install it manually after you’ve fixed the file, or to remove everything and reinstall. In my case, Centos 3.6 (equivalent to RedHat Enterprise Server 3 Update 6), the file /etc/redhat-release should contain the string:

Red Hat Enterprise Linux AS release 3 (Taroon)
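
A sketch of the fix (back the original up first, since a centos-release package update may rewrite the file):

cp /etc/redhat-release /etc/redhat-release.centos
echo "Red Hat Enterprise Linux AS release 3 (Taroon)" > /etc/redhat-release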

Veritas has advanced a great deal in the last few years regarding ease of installation, even on multiple servers. Installing cluster software usually involves more than one server, and being able to install it all from a single point is a real improvement. My two nodes are named vcs01-rhel and vcs02-rhel, and the installation was done completely from vcs01-rhel.

The installer assumes you can log in using ssh (by default) without a password prompt from one node to the other. In my case, that wasn’t true. I found it quicker (and dirty, mind you!) to allow rsh, just for the sake of the installation and configuration process. It’s not safe, it’s not good, but since it’s only for the short and limited time required for the installation, I’d hack it so it would work. How did I do it?

On node vcs02-rhel I’ve installed (using yum, of course) the package rsh-server. The syntax is yum install rsh-server. Afterwards, I’ve changed its relevant xinetd file, /etc/xinetd.d/rsh, to set the flag "disable = no" and restarted xinetd. Following that, I’ve hashed (commented out) two lines in /etc/pam.d/rsh:

auth required pam_securetty.so

auth required pam_rhosts_auth.so
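
Put together, the quick-and-dirty sequence on vcs02-rhel looks roughly like this (chkconfig is just a shortcut for flipping the disable flag in /etc/xinetd.d/rsh by hand):

yum install rsh-server
chkconfig rsh on          # same effect as setting "disable = no" in /etc/xinetd.d/rsh
service xinetd restart
# and in /etc/pam.d/rsh, the two lines above end up hashed out:
#auth       required     pam_securetty.so
#auth       required     pam_rhosts_auth.so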

As said, quick and dirty. It allowed rsh from vcs01-rhel as root, without a password. Don’t try it in an unsecured environment, as it actually allows not just vcs01-rhel, but any and every computer on the net, full rsh, password-free access to the server. Better think it over, right?

The first thing I’ve done after I finished installing the software was to undo my pam.d changes and disable the rsh service. Later, I will remove it altogether.

So, we need to run the installer, which can be done by cd-ing to /mnt/cdrom/rhel3_i686 and running ./installer -usersh

I’m asked about all the machines I need to install VCS on, I’m asked if I want to configure the cluster (which I do), and I set a name and a cluster ID (a number between 0 and 255). This Cluster ID is especially important when dealing with several Veritas clusters running on the same infrastructure. If you have two clusters with the same Cluster ID, you get one extra-large cluster, and a mess out of it. Mind it, if you ever run a few clusters on one network.

I decide to continue with the setup. I decide to enable the web management GUI, and I decide to set an IP for the cluster. This IP will be used for the resource group (called ClusterService by default), and will be a resource in it. When/if I have more resource groups, I should consider adding more IP addresses for them, at least one for each. That way, the cluster serves clients’ requests without them being aware of any "special" setup on the server side, for example, that it has already switched over two times.

I define heartbeat networks. I’ve used eth1 as the private heartbeat, and eth0 as both the public network and a "slow" (low priority) heartbeat. I will later add some more virtual NICs to both nodes, and define them as additional private heartbeats.
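
For the curious, the installer translates these choices into LLT’s configuration. If memory serves, the generated /etc/llttab ends up looking roughly like this (node name and cluster ID are whatever you entered; the exact syntax may differ between VCS versions, so treat this as an assumption):

set-node vcs01-rhel
set-cluster 101
# eth1 - dedicated (private) heartbeat link
link eth1 eth1 - ether - -
# eth0 - public network, used as a low priority ("slow") heartbeat
link-lowpri eth0 eth0 - ether - -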

Installing packages – I decide to install all optional packages; it’s not as if I’m going to run out of space. I did not install, mind you, VVM, because I want to simulate a system without any volume manager. Just pure, basic, simple partitions.

Installation went fine, and I was happy as a puppy.

One thing to note – I wanted to install the Maintenance Pack I have, but I was unable to eject my CD. Running lsof | grep /mnt/cdrom revealed that CmdServer, some part of VCS, was using the cdrom, probably because, as root, I had started the service from that location. I shut down the VCS service, started it again from another path, and was then able to eject my CD.
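
In shell terms, the workaround was along these lines (the init script name is from memory, so treat it as an assumption):

lsof | grep /mnt/cdrom          # revealed CmdServer keeping the mount busy
/etc/init.d/vcs stop            # shut the VCS service down on this node
eject /mnt/cdrom
cd / && /etc/init.d/vcs start   # start it again from a path outside the CD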

Installing the MP wasn’t that easy. The installer, much smarter this time, requires the package redhat-release, which is a mandatory package on RedHat systems; but I, running Centos, had the package centos-release, which just wouldn’t do the trick. I’ve decided to rebuild the centos-release package under a different internal name, redhat-release, and to do that, I’ve had to download the srpm of centos-release. You need to change the name and version so that in your RPM you’ll have redhat-release-3ES-13.6.2. I’ve done it with this SRPM centos-release.spec file. Replace your centos-release srpm spec file with this one, and you should be just fine. Remove your current centos-release package, and you’ll be able to install your newly built (using rpmbuild -bb centos-release.spec) redhat-release RPM (faked, of course). Mind you – it will overwrite your /etc/redhat-release, so you had better back it up, just in case. I’ve taken precautions, and restored the file to its fake RedHat contents. You can never know…
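
The rebuild itself boils down to something like the following (the default rpmbuild tree on these systems is /usr/src/redhat; the spec file is the modified one mentioned above):

rpm -ivh centos-release-*.src.rpm                 # unpack the SRPM into /usr/src/redhat
cp centos-release.spec /usr/src/redhat/SPECS/     # the spec edited to produce redhat-release
rpmbuild -bb /usr/src/redhat/SPECS/centos-release.spec
rpm -e centos-release
rpm -ivh /usr/src/redhat/RPMS/i386/redhat-release-3ES-13.6.2.i386.rpm
# the package overwrites /etc/redhat-release, so put the fake RedHat string back:
echo "Red Hat Enterprise Linux AS release 3 (Taroon)" > /etc/redhat-release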

You may wonder why I haven’t used RHEL itself, but a clone, namely Centos. Although this is for home usage only, the ease of updates, the availability of packages (using yum) and the fact that I do not want to steal software all bring me to install Centos for all my home usage. Production environments, however, will run an official RedHat, I can guarantee that.

So, it’s installing MP2, which means removing some packages, and then installing newer versions. Why they do not use the "upgrade" option of RPM is beyond me, but so is their nastiness about the redhat-release version. So, if you’ve followed all the rules given here, you should have VCS 4.0 MP2 installed on your Linux nodes. Good luck. Our next chapter will be installing and configuring an Oracle database on this setup. Stay tuned 🙂

RedHat Cluster, and some more

Sunday, February 12th, 2006

It’s been a long while since I’ve written. Once in a while I get to have a period of time dedicated to laziness. I’ve had just one of these for the last few weeks, in which I’ve been almost completely idle. Usually, waking up from such idle time is a time dedicated to self study and hard work, so I don’t fight my idle periods too hard. This time, I’ve had the pleasure of testing and playing, for personal reasons, both with VMWare GSX, in a “Cluster-In-a-Box” setup, based on a recommendation regarding MSCS, altered for Linux (and later, Veritas Cluster Service), and with RedHat Cluster Server, with the notion of also playing with RedHat’s GFS, but, regrettably, without the latter.

First, VMware. In their latest rivalry with Microsoft over the virtualization of servers and desktops, MS has gained an advantage lately. Due to the lower price of “Virtual Server 2005” compared with “VMware GSX Server”, and due to their excellent marketing system (from which we should all learn, if I may say!), not a few servers and virtual server farms, especially the ones running Windows/Windows setups, have moved to this MS solution, which is as capable as VMware GSX Server. Judging by the history of such rivalries, MS would have won. They always have. However, VMware, in an excellent move, has announced that the next generation of their GSX, simply called “Server”, will be free. Free for everyone. By this they probably mean to invest more in their more robust ESX server, and give the GSX away as a taste of their abilities. Since MS does not have any product more advanced than their Virtual Server, this could be a death blow to their effort in this direction. It could even mean they will just give away their own product! While this happens, we, the customers, gain a selection of free, advanced and reliable products designed for virtualization. Could it be any better than that?

One more advantage of this “Virtualization for the People” is that community-based virtual images, even of the most complicated-to-install setups, can and will be widely available, shortening installation time and giving everyone a quick working system. It will require, however, better knowledge and understanding of the products themselves, as merely installing them will not be enough. To survive the future market, you won’t be able to just sell an installation of a product; you will have to be able to support an out-of-the-box setup of it. That’s for the freelancers, and the part-time freelancers, among us…

So, I’ve reinstalled my GSX and started playing with it. The original goal was to actually run a working setup of RHEL, VCS and Oracle 10g. Unfortunately, VCS supports only RH3 (update 2?), and not RH4, which was a shame. At that point, I considered using RH Cluster Server for the task at hand. It grew into the task of learning this cluster server, and nothing more, which I did, and I can and will share my impressions of it here.

First – names. I’ve had the pleasure of working with numerous cluster solutions, and each time I get to play with another one I’m “thrilled” by the naming conventions and the name changes vendors make, just to keep themselves unique. I hate it. So here’s a little explanation:
All clusters contain groups of resources (a Resource Group, as most vendors call it). Such a group contains a set of resources and, in some cases, relations (order of startup, dependencies, etc.). Each resource can be any single element required for an application. For example, a resource could be an IP address, without which you won’t be able to contact the application. A resource could be a disk device containing the application’s data. It could be an application start/stop script, and it could be a sub-application, an application required for the whole group to be up, such as a DB for a DB-driven web server. The order you would ask them to start, in our case, would be IP, disk, DB, web server. You’d ask for the IP to be brought up first because some cluster servers can trick IP-based clients into a short delay, so the client hardly feels the downtime of an application failover. But that is for later.

So, in a resource group, we have resources. If resources have no required dependency between them, it is always better to separate them into different groups. In our previous example, let’s say our web server uses the DB, but contacts it using an IP address or a hostname. In this case, we won’t need the DB to run on the same physical machine the web server is using, and, assuming the physical disk holding the DB and the one holding the rest of the web application are not the same disk, we could separate them.

The idea, if I can try to sum it up, is to split your application into the smallest self-maintained structures. Each structure is called a resource group, and each component in this structure is a resource. On some cluster servers, one can group and set dependencies between resource groups, which allows for even more scalability, but that is not our case.

So we have resource groups containing resources. Each computer that is a member of the cluster is called a node. Now, let’s assume our cluster contains three nodes, but we want our application (our resource group) to be able to run on only two specific ones. In this case, we need to define, for our resource group, which nodes are to be associated with it. In RH Cluster Server, a thing called a “Domain” is designed for this. The Domain contains a list of nodes. The Domain can be associated with a Resource Group, and thus set the priority of failover and the group of nodes allowed to deal with the resource group.

All clusters have a single point of error (unlike failure). The whole purpose of the cluster is to give a non-cluster-aware application the high availability you could expect for a (relatively) low price. We’re great: we know how to bring an application up, we know how to bring it down. We can assume when the other node(s) is/are down. We cannot be sure of it. We try. We demand a few means of communication, so that one link failure won’t cause us to corrupt our shared volumes (by multiple nodes accessing them at once). We set up a whole system of logic, a heartbeat, just name it, to avoid, at almost all cost, a split-brain state: two cluster nodes each believing it is the only one up. You can guess what that means, right?

In RH, there is a heartbeat, sure. However, it is based on bonding in the case of more than one NIC, and not on separate infrastructures. It is a simple network-based HB, with nothing special about it. In case of a loss of connection, it will reset the inactive node, if it sees fit, using a mechanism they call a “Fence”. A “Fence” is a system by which the cluster can *know* for sure (or almost for sure) that a node is down, or by which the cluster can physically take a node down (power it off if needed), such as controlling the UPS the node is connected to, or its power switch, or an alternate monitoring infrastructure, such as the Fibre Channel switch, etc. In such an event, the cluster can know for sure, or can at least assume, that the hung node has been reset, or it can force it to reset, to release some hung application.

Naming – a Resource Group is called a Service. A resource remains a resource, but an application resource *must* be defined by an rc-like script, which accepts start/stop (/restart?). Nothing complicated about it, really. The service contains all the required resources.
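
A minimal example of such an rc-like script (purely illustrative; a stock init script like /etc/init.d/httpd already fits the bill, and "myapp" below is a hypothetical binary):

#!/bin/sh
# Minimal rc-style wrapper the cluster can call as an application resource.
# "myapp" is a hypothetical binary, used only for illustration.
case "$1" in
  start)
    /usr/local/bin/myapp &
    ;;
  stop)
    killall myapp
    ;;
  restart)
    $0 stop
    sleep 2
    $0 start
    ;;
  status)
    # some cluster versions poll "status" as well
    pgrep myapp > /dev/null
    ;;
  *)
    echo "Usage: $0 {start|stop|restart|status}"
    exit 1
    ;;
esac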

I was not happy with the cluster, if I can sum up my issues with it. Monitoring machines (nodes) it did correctly, but in the simple enough example I chose to set up, using apache as a resource (only afterwards did I notice it is the example RedHat uses in their documentation), it failed miserably to take the correct action when an application failed (unlike a failure of a node). I defined my “Service” to contain the following three items:

1) IP Address – Unique for my testing purposes.

2) Shared partition (in my case, and thanks to VMware, /dev/sdb1, mounted at /var/www/html)

3) The Apache application – “/etc/init.d/httpd”

All in all, it was brought up correctly, and switch-over went just fine, including in cases of correct and incorrect resets of the active/passive node. However, when I killed my apache (killall httpd), the cluster detected the failure in the application, but was helpless with it. I was unable to bring down the “Service”, as it failed to turn off Apache (duh!), so it released neither the IP address nor the shared volume. In the end, I had to restart the rgmanager service on both nodes, after manual removal of the remains of the “Service”. I didn’t like it. I expect the cluster to notice a failure in the application, which it did, but I expect it to either try to restart the application (/etc/init.d/httpd stop && /etc/init.d/httpd start) before it fails completely, or to set a flag saying it is down, remove the remains of the “Service” from the node in question (release the IP address and the shared storage), and try to bring it up on the other node(s). It did nothing of the sort. It just failed, completely, and required manual intervention.
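
For completeness, the fault injection and the manual cleanup went more or less like this (the service IP below is made up for the example):

killall httpd                         # simulate an application failure
# the cluster notices the failure, but leaves the half-dead "Service" behind, so:
/etc/init.d/httpd stop                # make sure apache is really down
umount /var/www/html                  # release the shared volume by hand
ip addr del 192.168.0.10/24 dev eth0  # release the service IP by hand
service rgmanager restart             # on both nodes, to clear the stale "Service"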

I expect an HA cluster to be able to react to an application or resource failure, and not just to a node failure. Since HA clusters are meant for the non-ideal world, a place where computers crash, where hardware failures occur, and where applications just die while servers remain working, I expect the cluster server to be able to handle the full variety of problems, but maybe I was expecting too much. I believe it will be better in future versions, and I believe it could have been done quite easily right now, since detection of the failed application occurred, but it’s not for me to define the cluster’s abilities. This cluster is not mature enough for real-life production sites, if only because of its failure to react correctly to a resource failure without demanding manual intervention. A year from now, I’ll probably recommend it as a cheap and reliable solution for most common HA-related tasks, but not today.

That leaves me with VCS and Oracle, which I’ll deal with in the future, whether I like it or not 🙂

A long while, and something of interest

Thursday, February 2nd, 2006

Well, more like something of delusion. I’ve noticed this comic advertisement, and I just had to put it here online. It’s, I think, the most delusional, drug-induced advertisement I’ve seen in a while.

Just open and enjoy