Posts Tagged ‘Lilo’

Preperation of recovery server for RPM based systems

Monday, July 23rd, 2007

On most cases, when preparing a recovery server, you can just ‘tar’ the entire server’s contents and just move it, along with a short recipe on how to rebuild the original partition layout (software Raid? LVM? flat partition tables?), how to mount volumes in-place, how to extract the tar files into the right locations, and how to install your favorite boot loader, either Lilo or Grub. Also, beforehand, you deal with taking a nice snapshot (or capturing the system in a single-user phase), and life is good, yada yada yada.

Most of us, however, never prepare for a rainy day. It’s not that we don’t want to, it’s not that we don’t plan, it’s just that we never seem to get to it, and after all, the hardware is rather new, and there should be no reason for failure. I can guess some of you heard this before – maybe in their own voices.

So, backup is a tiring job, and I will not deal with the things you need to do to maintain a replica of your data, but I will deal with how to prepare, quickly and easily, a system-recovery server (or a postmortem server) with just a little thought beforehand. This might be a bit too late for you if you’re reading it now, but, well, for the next time…

This little trick worked (as part of a large-scale process) when I migrated a server from 64bit server to a 32bit server (yeah, I know – the other way around).

It assumes you use RPM as your tool to install applications, and that if you do not, you have a method of knowing which piece of software you installed from source, and which package was installed from an external source (not your day-to-day RPM repository).

On the source server, run ‘rpm -qa > /tmp/rpmlist.long‘. Keep this file. It is important. Also, try to keep your yum.repos.d directory, or at least know which rpm repositories you use (I always use rpmforge, so I see no problem with that).

Install your target server – Same version as the source, minimal package selection. Copy the file rpmlist.long to /tmp. Make sure yum is configured (I will deal here with YUM, but you can replace it with any other repository client of your choice). Run the two following lines:

cat /tmp/rpmlist.long | sed s/-[0-9].*$/”/g > /tmp/rpmlist-short

for i in `cat /tmp/rpmlist-short`; do

yum install -y $i

done

This will add the missing RPMs with their dependencies, and will bring your system to a similar status. At least, this is a good place to start recovering.

On future chapters:

- Fully migrating from 64 to 32 bit and vice versa

- Using LVM snapshots for a smart backup, and for a smart recovery

DL140 Generation 1 (G1) and LILO do not work

Wednesday, February 14th, 2007

I add to this blog all pieces of information which I might think could help other people.

One of the things I have encountered short while ago had to do with DL140 G1 system cannot boot Linux.

It’s Linux system (RedHat 4 32bit) was deployed by an external system and the system could not boot. However, when installed from CD, the system booted just fine. The symptom for this was that after booting, the screen showed:

LILO: Loading Linux…..

and that’s all. The system could have booted (and did so once) after several days, however, this is not really a desired status.

It seems to be an issue of Lilo and this specific hardware. Other systems (non DL140) were able to boot just fine using Lilo, and this same kernel version was bootable on that system through other means.

Replacing Lilo with GRUB during installation/deployment solved the isse. FYI.

Upgrading Ubuntu 6.06 to 6.10 with software non-standard RAID configuraion

Sunday, October 29th, 2006

A copy of a post made in Ubuntu forms, with the hope it would help others in status such as I’ve been in. Available through here

In afterthought, I think the header should be non-common, rahter than non-standard…

Hi all.

I have passed through this forum yesterday, searching for a possible cause for a problem I have had.
The theme was almost well known – after upgrade from 6.06 to 6.10 the system won’t boot. Lilo (I use Lilo, not Grub) would load the kernel, and after a while, never reachine "init" the system would wait for something, I didn’t know what.

I tried disconnecting USB, setting different boot parameters, even virified (using the kernel messages) that disks didn’t replace their locations (and they did not, although I have additional IDE controller). Alas, it didn’t seem good.

The weird thing is that the during this wait (there was keyboard response, but the system didn’t move on…), the disk LED flashed in a fixed rate. I though it might have to do with RAID rebuild, however, from live-cd, there was no signs of such a case.

Then, on one of the "live-cd" boots, accessing the system via "chroot" again, I have decied to open the initrd used by the system, in an effort to dig into the problem.
Using the following set of commands did it:
cp /boot/initrd.img-2.6.17-10-generic /tmp/initrd.gz
cd /tmp && mkdir initrd.out && cd initrd.out
gzip -dc ../initrd.gz | cpio -od

Following this, I’ve had the initrd image extracted, and I could look into it. Just to be on the safe side, I have looked into the image’s /etc, and found there mdadm/mdadm.conf file.
This file had different (wrong!) UUIDs for my software RAID setup (compared with "mdadm –detail /dev/md0" for md0, etc).
I have located the origin of this file to be the system’s real /etc/mdadm/mdadm.conf, which was originated a while ago, before I’ve made many manual modifications (changed disks, destroyed some of the md devices, etc). I have fixed the system’s real /etc/mdadm/mdadm.conf file to reflect the correct system, and recreated the initrd.img file for the system (now with the up-to-date mdadm.conf). Updated Lilo, and the system was able to boot correctly this time.

The funny thing is that even using the previous kernel, which had its initrd.img built long ago, and which worked fine for a long while failed to complete the boot process altogether using the upgraded system.

My system relevant details:

/dev/hda+/dev/hdc -> /dev/md2 (/boot), /dev/md3

/dev/hde+/dev/hdg -> /dev/md1
/dev/md1+/dev/md1 -> LVM2, including the / on it

Lilo is installed on /dev/hda and /dev/hdc altogether.

Why I don’t like GRUB – RHEL4 on system with local and external storage

Sunday, July 16th, 2006

Installed RHEL4 on a system with both internal storage (HP SmartAray 5i – cciss) and external disks through Qlogic FCS HBA.

During install, the local disks were detected as /dev/cciss/c0d0 while the external disks were detected as /dev/sda and /dev/sdb

After installation was done, Grub started with incorrect mapping. For no apparent reason, Grub searched for its stage2 and date in /dev/sda1 and not in /dev/cciss/c0d0p1.

The quickest way for me to solve it was to replace grub with Lilo (available on the fourth RHEL CD), correct /etc/lilo.conf.anaconda, copy this file to /etc/lilo.conf and run Lilo (with "-v" flag, for safety). It worked like a charm.

HP ML110 G3 and Linux Centos 4.3 / RHEL 4 Update 3

Tuesday, May 30th, 2006

Using the same installation server as before, my laptop, I was able to install Linux Centos 4.3, with the addition of HP’s drivers for Adaptec SATA raid controller, on my new HP ML110 G3.

Using just the same method as before, when I’ve installed Centos 4.3 on IBM x306, but with HP drivers, I was able to do the job easily.

To remind you the process of preparing the setup:

(A note – When I say "replace it with it" I always recommend you keep the older one aside for rainy days)

1. Obtain the floppy image of the drivers, and put it somewhere accessible, such as some easily accessible NFS share.

2. Obtain the PXE image of the kernel of Centos4.1 or RHEL 4 Update 1, and replace your PXE kernel with it (downgrade it)

3. Prepare the driver’s RPM and Centos 4.1 / RHEL 4 Update 1 kernel RPM handy on your NFS share.

4. Do the same for the PXE initrd.img file.

5. Obtain the /Centos/base/stage2.img file from Centos 4.1 or RHEL 4 Update 1 (depends on the installation distribution, of course), and replace your existing one with it.

6. I assume your installation media is actually NFS, so your boot command should be something like: linux dd=nfs:NAME_OF_SERVER:/path/to/NFS/Directory

Should and would work like charm. Notice you need to use the 64bit kernel with the 64bit driver, and same for the 32bit. Won’t work otherwise, of course.

After you’ve finished the installation, *before the reboot*, press Ctrl+Alt+F2 to switch to text console, and do the following:

1. Copy your kernel RPM to the new system /root directory: cp /mnt/source/prepared_dir/kernel….rpm /mnt/sysimage/root/

2. Do the same for HP drivers RPM

3. Chroot into the new system: chroot /mnt/sysimage

4. Install (with –force if required, but *never* try it first) the RPMs you’ve put in /root. First the kernel and then HP driver.

5. HP Driver RPM will fail the post install. It’s OK. rename /boot/initrd-2.6.9-11.ELsmp (or non SMP, depends on your installed kernel)

6. Verify you have alias for the new storage device in your /etc/modprobe.conf

7. run mkinitrd /boot/initrd-2.6.9-11.ELsmp 2.6.9-11.ELsmp (or non SMP, depending on your kernel)

8. Edit manually your /etc/grub.conf to your needs.

Note – I do not like Grub. Actually, I find it lacking in many ways, so I install Lilo from the i386 (not the 64bit, since it’s not there) version of the distro. Later on, you can rename /etc/lilo.conf.anaconda to /etc/lilo.conf, and work with it. Don’t forget to run /sbin/lilo after changes to this file.

Dell PowerEdge 1800 and Linux – Part 2

Saturday, October 1st, 2005

In this part – Me installing Centos 4.1 64bit, the 86_x64 version.

Installation and boot from net (another chapter, soon to come), all works great. I’m using the same partition table / Raid / LVM I’ve created in my previous
install on this server. All looks great, and I expect to find the same GRUB problem. I’m not disappointed.

However – a tricky part! This time, in yum repositories, no Lilo! This time I have to think of something, and finally I reach a simple decision – If I can’t get a 64bit Lilo version, I would use the 32bit version. I obtain it from my Centos4.1 32bit installation storage, install it (worked like a charm), and reconf / run it. It works like charm. The system is up and running now.

Running "dd if=/dev/sda of=/dev/null bs=1M" gave me results around the 70MB/sec. Nice… :-)

Next to do: Install ISPMan on this system, and migrate whole lot of user accounts directly into it.

Dell PowerEdge 1800 and Linux – Part 1

Tuesday, September 27th, 2005

As part of my voluntary actions, I manage and support Israeli Amateur Radio Committee Internet server. This machine is an old poor machine, custom made about 5-7 years ago, containing Pentium2 300MHz, and 256MB RAM. It serves few hundreds of users, and you can guess for yourself how slow and miserable this machine is.

After
long wait, the committee has decided to purchase a new server. This server, Dell PE 1800, has landed in my own personal lab just two days ago. It’s a nice machine, cheap, considering its abilities, and it’s waiting just to be installed. Or so it was, just up to now.

Mind you that brand PC servers containing more than one CPU can cost around
3K$ and above. This baby has come with a minimal, yet scalable setup, containing only one CPU out of two, 1GB RAM, our of 8GB possible max, and two SCSI HotSwap HDDs, using 2 out of 6 slots. Nice machine. And it was cheap too. Couldn’t complain about it.

At first, I’ve tried using Dell’s CDs. The "Server Setup" CD is supposed to help me prepare the machine to OS installation, either it be Windows, Linux, Novell, etc. I’ve tried using it, preparing it to a new Centos install, when I’ve noticed it didn’t partition quite as I’ve expected. Well, the "Server Setup" tool has decided I would not use Mirrors, and that I would not use LVM, but would use a predefined permanent setup, and that’s all. This machine did not come with a RAID controller, so I’ve had to configure Software RAID. What better time is there than during the install? Dell’s people think otherwise, so I’ve had to boot into a bootable media of Centos 4.1 (my whole installation tree resides on NFS share). The installation was smooth, and worked just like expected. Fast, sleek, smooth. All I’ve ever expected out of Linux installation on a server class PC. Just like it should have been.

I’ve partitioned the system using the following guidelines:

1) Software mirror Raid /dev/md0, containing /boot (150MB)

2) Two stand alone SWAP partitions, 1GB each, one on each HDD. I do not need mirror for the SWAP.

3) Software mirror Raid /dev/md1, containing LVM, expanding all over what’s left of the disk.

4) Logical Volumes "rootvol" (5GB) holding / and "varvol" (6GB) holding /var. Both can be expanded, so I don’t need to worry now about their final sizes.

As said, the installation went great. However, I was not able to boot the system… I just got each time to a hidden maintenance system partition, and it seems my GRUB failed to install itself. Darn.

I’ve booted into rescue mode, and tried to install GRUB manually. Failed. I think (and it’s not the first time I’ve had such problems with GRUB) GRUB is not as good as everyone say it is. It can’t boot into software mirror, and it means it’s not ready for production, as far as I are.

I’ve used YUM to download and install Lilo, and managed easily to convert /etc/lilo.conf.anaconda to
the correct file for Lilo (/etc/lilo.conf), and to run it. Worked great, and the system was able to boot.