Archive for October, 2006

Upgrading Ubuntu 6.06 to 6.10 with a non-standard software RAID configuration

Sunday, October 29th, 2006

A copy of a post made in the Ubuntu forums, in the hope it will help others who find themselves in the situation I was in. Available through here

In retrospect, I think the title should say non-common rather than non-standard…

Hi all.

I came across this forum yesterday, searching for a possible cause of a problem I had.
The theme is almost well known – after the upgrade from 6.06 to 6.10, the system won't boot. Lilo (I use Lilo, not Grub) would load the kernel, and after a while, never reaching "init", the system would wait for something, I didn't know what.

I tried disconnecting USB, setting different boot parameters, and even verified (using the kernel messages) that the disks had not changed their locations (they had not, although I have an additional IDE controller). Alas, nothing seemed to help.

The weird thing is that during this wait (there was keyboard response, but the system didn't move on…), the disk LED flashed at a fixed rate. I thought it might have to do with a RAID rebuild; however, from the live-cd, there were no signs of such a thing.

Then, on one of the "live-cd" boots, accessing the system via "chroot" again, I decided to open the initrd used by the system, in an effort to dig into the problem.
The following set of commands did it:
cp /boot/initrd.img-2.6.17-10-generic /tmp/initrd.gz
cd /tmp && mkdir initrd.out && cd initrd.out
gzip -dc ../initrd.gz | cpio -id

Following this, I had the initrd image extracted and could look into it. Just to be on the safe side, I looked into the image's /etc, and found an mdadm/mdadm.conf file there.
This file had different (wrong!) UUIDs for my software RAID setup (compared with "mdadm --detail /dev/md0" for md0, etc.).
I traced the origin of this file to the system's real /etc/mdadm/mdadm.conf, which had been generated a while ago, before I made many manual modifications (changed disks, destroyed some of the md devices, etc). I fixed the system's real /etc/mdadm/mdadm.conf to reflect the current layout, and recreated the initrd.img file for the system (now with the up-to-date mdadm.conf). I updated Lilo, and this time the system was able to boot correctly.
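
For reference, a minimal sketch of that fix, assuming the stock mdadm and initramfs-tools of a 6.10 system and running from within the chroot (the kernel version string is the one from my system, adjust to yours):

mdadm --detail --scan >> /etc/mdadm/mdadm.conf
# then edit /etc/mdadm/mdadm.conf and drop the stale ARRAY lines with the old UUIDs
update-initramfs -u -k 2.6.17-10-generic
# make Lilo pick up the new initrd
lilo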

The funny thing is that even the previous kernel, whose initrd.img had been built long ago and which had worked fine for a long while, failed to complete the boot process on the upgraded system.

My system's relevant details:

/dev/hda+/dev/hdc -> /dev/md2 (/boot), /dev/md3

/dev/hde+/dev/hdg -> /dev/md1
/dev/md1+/dev/md1 -> LVM2, including the / on it

Lilo is installed on both /dev/hda and /dev/hdc.

Single-Node Linux Heartbeat Cluster with DRBD on Centos

Monday, October 23rd, 2006

The trick is simple, and many of those who deal with HA clusters come across such a setup at least once – an HA cluster without HA.

Yep. Single node, just to make sure you know how to get this system to play.

I have just completed one with Linux Heartbeat, and wish to share an example of such a setup: a single-node cluster with DRBD.

First – get the packages.

It took me some time, but following the Linux-HA suggested download link (funny enough, it was the last place I searched) gave me exactly what I needed. I downloaded the following RPMs:

heartbeat-2.0.7-1.c4.i386.rpm

heartbeat-ldirectord-2.0.7-1.c4.i386.rpm

heartbeat-pils-2.0.7-1.c4.i386.rpm

heartbeat-stonith-2.0.7-1.c4.i386.rpm

perl-Mail-POP3Client-2.17-1.c4.noarch.rpm

perl-MailTools-1.74-1.c4.noarch.rpm

perl-Net-IMAP-Simple-1.16-1.c4.noarch.rpm

perl-Net-IMAP-Simple-SSL-1.3-1.c4.noarch.rpm

I was required to add the following RPMs on top of that:

perl-IO-Socket-SSL-1.01-1.c4.noarch.rpm

perl-Net-SSLeay-1.25-3.rf.i386.rpm

perl-TimeDate-1.16-1.c4.noarch.rpm

I added the DRBD RPMs, obtained via YUM:

drbd-0.7.21-1.c4.i386.rpm

kernel-module-drbd-2.6.9-42.EL-0.7.21-1.c4.i686.rpm (Note: Make sure the module version fits your kernel!)

As soon as I finished chasing down the dependent RPMs, I was able to install them all in one go, and so I did.
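
With all the packages collected in one directory, the installation itself is a single transaction, something along these lines:

rpm -Uvh *.rpm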

Configuring DRBD:

DRBD was a tricky setup. It would not accept a missing destination node, and required me to actually lie. My /etc/drbd.conf looks as follows (thanks to the great assistance of linux-ha.org):

resource web {
  protocol C;
  incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f"; # Replace later with halt -f
  startup { wfc-timeout 0; degr-wfc-timeout 120; }
  disk { on-io-error detach; } # or panic, ...
  syncer {
    group 0;
    rate 80M; # 1Gb/s network!
  }
  on p800old {
    device /dev/drbd0;
    disk /dev/VolGroup00/drbd-src;
    address 1.2.3.4:7788; # eth0 network address!
    meta-disk /dev/VolGroup00/drbd-meta[0];
  }
  on node2 {
    device /dev/drbd0;
    disk /dev/sda1;
    address 192.168.99.2:7788; # eth0 network address!
    meta-disk /dev/sdb1[0];
  }
}

I have had two major problems with this setup:

1. I had no second node, so I left a "default" entry as the 2nd node. I never expected to use it.

2. I had no free (non-partitioned) space on my disk. Luckily, I tend to install Centos/RH using the installation defaults unless some special need arises, so using the power of LVM, I disabled swap (swapoff -a), decreased its size (lvresize -L -500M /dev/VolGroup00/LogVol01), created two logical volumes for the DRBD meta and source (lvcreate -n drbd-meta -L +128M VolGroup00 && lvcreate -n drbd-src -L +300M VolGroup00), recreated the swap (mkswap /dev/VolGroup00/LogVol01), activated it (swapon -a) and formatted /dev/VolGroup00/drbd-src (mke2fs -j /dev/VolGroup00/drbd-src). Thus I now have two additional volumes (the required minimum) and can operate this setup. The whole sequence is sketched right after this list.
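
For clarity, here is the same sequence as one block (assuming the default Centos VolGroup00/LogVol01 layout described above; the sizes are the ones I used):

swapoff -a                                    # stop using swap so its LV can be shrunk
lvresize -L -500M /dev/VolGroup00/LogVol01    # shrink the swap LV by 500MB
lvcreate -n drbd-meta -L 128M VolGroup00      # LV for the DRBD metadata
lvcreate -n drbd-src -L 300M VolGroup00       # LV for the replicated data
mkswap /dev/VolGroup00/LogVol01               # recreate swap on the smaller LV
swapon -a                                     # re-enable swap
mke2fs -j /dev/VolGroup00/drbd-src            # ext3 on the DRBD source volume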

Having solved the space issue, I had to start DRBD for the first time. Per the Linux-HA DRBD manual, it had to be done by running the following commands:

modprobe drbd

drbdadm up all

drbdadm -- --do-what-I-say primary all

This brought DRBD up for the first time. Now I had to turn it off and concentrate on Heartbeat:

drbdadm secondary all

Heartbeat settings were as follows:

/etc/ha.d/ha.cf:

use_logd on # Or should it be used?
udpport 694
keepalive 1 # 1 second
deadtime 10
initdead 120
bcast eth0
node p800old #`uname -n` name
crm yes
auto_failback off # Or no?
compression bz2
compression_threshold 2

I also created the relevant /etc/ha.d/haresources, although I never used it directly (this file has no importance when using "crm yes" in ha.cf). I did, however, use it as the input for /usr/lib/heartbeat/haresources2cib.py:

p800old IPaddr::1.2.3.10/8/1.255.255.255 drbddisk::web Filesystem::/dev/drbd0::/mnt::ext3 httpd

It is clear that the virtual IP will be 1.2.3.10 in my class A network, and that DRBD has to come up before the storage is mounted. After all that, the application kicks in and brings up my web page. The application, Apache, was modified beforehand to use the IP 1.2.3.10:80, and to look for its DocumentRoot in /mnt.
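
For illustration only, the relevant httpd.conf lines would look roughly like this (the IP and DocumentRoot are the ones mentioned above; everything else stays stock):

Listen 1.2.3.10:80
DocumentRoot "/mnt"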

I ran /usr/lib/heartbeat/haresources2cib.py on the file (no need to redirect the output, as it is written directly to /var/lib/heartbeat/crm/cib.xml), and I was ready to go.
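
If memory serves, the conversion was a single invocation along these lines (treat the exact argument handling as an assumption, as it changed between Heartbeat releases):

/usr/lib/heartbeat/haresources2cib.py /etc/ha.d/haresources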

/etc/init.d/heartbeat start (while another terminal is open with tail -f /var/log/messages), and Heartbeat is up. It took it a few minutes to bring the resources up; however, I was more than happy to see it all work. Cool.

The logic is quite simple, the idea is very basic, and as long as the system is managed correctly, there is no reason for it to reach a dangerous state. Moreover, since we're using DRBD, split brain cannot actually endanger the data, so we are compensated for the price we might pay, performance-wise, in a real two-node HA environment built along these same guidelines.

I cannot express enough gratitude to http://www.linux-ha.org, which is the source of all this (with some common sense added). Their documentation is more than enough to set up a fully working HA environment.

Poor Man’s DRP + Snapshots – Linux only

Friday, October 6th, 2006

When you own data storage, one of your major considerations is how to back up your data. Several solutions exist to answer this question.

When your data grows to a certain size, you encounter an additional issue – how to back up the data with minimal performance impact.

It is quite obvious that backup devices have a given speed and capacity. It is just as obvious that if you have more data than you can stream to your tape device during the night, your backup will probably continue into working hours.

Several solutions exist to deal with this problem, among them faster backup tapes and broader bandwidth between your storage container and your backup devices. The setup I will demonstrate has to do with a third option – create a real-time replica on another server, and back up only the replica.

When it comes to Linux, I have always felt that the backup/restore software companies were rather slow to supply solutions fit for Linux, especially compared to the widening usage of Linux-based systems in the market.

One of the more intriguing solutions to grow out of the OpenSource community is called DRBD – Distributed Replicated Block Device. It allows the creation of a logical block device which overlays two physical block devices – one local and one remotely accessible via the network. It can easily be described as a network RAID-1 solution.

The wonders of real-time volume replication between two servers need not be discussed here. The advantages are well known, as are the disadvantages, the largest of which is the heavy performance toll such a system takes.

The wonders of snapshots are also well known. NetApp gains much of its capital from its sophisticated snapshot technology (WAFL, etc). Other storage vendors have added the ability to take snapshots with higher or lower efficiency; however, one of the newer players in this under-the-spotlight area is the OpenSource LVM2 for Linux, with its snapshot capabilities. Although still not perfect, it does show promise, which I will soon demonstrate, combined with DRBD, described above.

The combined wonders of volume replication and scheduled snapshots offer the ability to back up consistent snapshot data, to roll back to a desired point-in-time of a volume, and to reduce the backup load on mission-critical datacenters. All this, at the price of an internet connection which will allow you to download the latest DRBD software.

I have tested it on a home-made setup – two virtual Linux servers running on a single VMware-Server machine.

The host is a Pentium4 1.8GHz, with 1GB RDRAM and three IDE hard drives, running Centos 4.4.

The guests are two Centos 4.4 machines, with 160MB RAM each, two virtual NICs – one public and one private, a minimal installation, and Dag Wieers' YUM repositories added to them.

The guests will be called DRBD-test1 and DRBD-test2. The first will act as the mission-critical server, and the second as the replica (target) server.

Both guests were updated to the latest updates available at this time. Both are running kernel version 2.6.9-42.0.2.EL, DRBD version 0.7.21-1.c4, and kernel-module-drbd-2.6.9-42.EL-0.7.21-1.c4

Installing the kernel-module package put the drbd.ko module under /lib/modules/2.6.9-42.EL instead of under my running kernel (2.6.9-42.0.2.EL), so after verifying that the module could be loaded into my running kernel, I moved it to the kernel/drivers/block directory inside the running kernel's modules tree, and ran 'depmod -a'.
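
Roughly, the relocation looked like this (the source path under the wrong tree is an assumption from memory; adjust to wherever the RPM actually dropped the module):

# verify the module loads into the running kernel at all
insmod /lib/modules/2.6.9-42.EL/kernel/drivers/block/drbd.ko
rmmod drbd
# move it under the running kernel's tree and refresh module dependencies
mv /lib/modules/2.6.9-42.EL/kernel/drivers/block/drbd.ko /lib/modules/2.6.9-42.0.2.EL/kernel/drivers/block/
depmod -a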

I decided on a consistent configuration, and defined the storage to replicate in a similar manner on both guests:

On /dev/sdb I created a PV (pvcreate /dev/sdb), assigned this PV to a VG named vg00, and created two LVs on it: meta (256MB) and source (2GB) on the guest acting as the mission-critical server, and meta (256MB) and target (2GB) on the one acting as the replica.
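
As a sketch (names and sizes as above; the vgcreate step is implied by the text):

# on both guests
pvcreate /dev/sdb
vgcreate vg00 /dev/sdb
lvcreate -n meta -L 256M vg00
# on DRBD-test1 (the mission-critical server)
lvcreate -n source -L 2G vg00
# on DRBD-test2 (the replica)
lvcreate -n target -L 2G vg00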

I created the device /dev/drbd0 per DRBD's Howto, built the drbd.conf configuration file, and loaded the module.

I forced the source guest to act as primary, and replication began.
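
The commands for this step were essentially the same as in the Heartbeat post above (the mknod major/minor values are DRBD's usual ones, but treat them as an assumption and verify against the Howto):

mknod /dev/drbd0 b 147 0    # DRBD block devices use major number 147
modprobe drbd
drbdadm up all
# on DRBD-test1 only: force it to become primary and start the initial sync
drbdadm -- --do-what-I-say primary all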

When the replication finished, I created a snapshot of the target LV and mounted it: "lvcreate -L 200M -s -n snap /dev/vg00/target && mount /dev/vg00/snap /mnt"

I was able to access the data inside the volume without changing the Primary/Secondary roles of the servers. I created a script which used dd to stress the I/O of the DRBD volume on the source server, and another which took scheduled (every minute) snapshots of the target volume (a sketch of it follows the list below). I learned the following:

1. It works, but

2. The size limitation imposed on the snapshot (200MB in my case) should never be filled up. When running dd on the source volume (creating 50MB empty files), the space consumed by the snapshot increases, and if/when a snapshot reaches 100% utilization, it becomes inaccessible. To view the current usage of a snapshot, run "lvdisplay /dev/vg00/snap" (in my example).
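
The scheduled-snapshot script was nothing fancy. A minimal sketch, run from cron every minute (the naming scheme here is illustrative, not necessarily the exact script I used):

#!/bin/bash
# take a new snapshot of the target LV, named by the current time
NAME=snap-$(date +%H%M)
lvcreate -L 200M -s -n $NAME /dev/vg00/target
# report how full the new snapshot is
lvdisplay /dev/vg00/$NAME | grep -i "allocated to snapshot"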

During that evaluation, one of my virtual servers crashed due to an LVM2 snapshot problem. LVM2 is not yet perfect on RH-based systems…

Performance will have to wait for another time. I wasn't too happy with it; however, in this experiment my goal was to find out whether such a setup could be built at all, rather than to measure the performance impact.

Generally speaking, I was rather happy with the results – they showed that this setup can actually work. It proved to me again that OSS innovations elevate Linux to the enterprise.

Now that I know such a setup can be built, all that is left is to fine-tune it for minimal performance impact, and test again to see whether it can actually be a well-suited answer to the questions I started with.

Converting crt to PEM

Wednesday, October 4th, 2006

Took two steps:

openssl x509 -in input.crt -out input.der -outform DER

openssl x509 -in input.der -inform DER -out output.pem -outform PEM

IBM FastT200 serial console wiring diagram

Tuesday, October 3rd, 2006

Well, not exactly a diagram, but I will describe the IBM FastT200 RJ25 to DE9 wiring. This information was given to me by a friend; I hope it is accurate, and I hope it will help whoever needs it.

RJ25 pin    DE9 pin
1           8
2           7
3           3
4           2
5           5

Another friend noted that the diagram might be upside-down, which may well be the case. You might need to reverse one, or both, of the columns. However, even this is better than nothing to begin with.