Posts Tagged ‘grub’

XenServer “Internal error: Failure… no loader found”

Saturday, October 24th, 2009

It has been long since I had the time to write here. I have recently been involved more and more with XenServer virtualization, as you might see in the blogs, and following a solution to a rather common problem, I have decided to post it here.

The problem: When attempting to boot a Linux VM on XenServer (5.0 and 5.5), you get the following error message:

Error: Starting VM ‘Cacti’ – Internal error: Failure(“Error from xenguesthelper: caught exception: Failure(\”Subprocess failure: Failure(\\\”xc_dom_linux_build: [2] xc_dom_find_loader: no loader found\\\\n\\\”)\”)”)

This is very common with Linux VMs which were converted from physical (or other, non-PV virtualization) to XenServer.

This will probably either happen during the P2V process, or after a successful update to the Linux VM.

The cause is that the original kernel, non PV-aware one, has not been removed, and GRUB likes to load from it. XenServer will use the GRUB menu, but will not display it to us to select our desired kernel.

With no chance to intervene, XenServer will attempt to load a PV-enabled machine using non-PV kernel, and will fail.

Preventing the problem is quite simple – remove your non-PV kernel (non-xen) so that future updates will not attempt to update it as well and set it to be the default kernel. Very simple.

Solving the problem in less than two minutes is a bit more tricky. Let’s see how to solve it.

All operations are performed from within the control domain. This guide does not apply to StorageLink or NetApp/Equalogic devices, as they behave differently. This applies only to LVM-over-something, whatever it may be.

First, we will need to find the name of the VDI we are to work on. Use xe in the following manner, using the VM’s name:

xe vbd-list vm-name-label=Cacti

uuid ( RO)             : 128f29dc-4a14-1a2d-75d1-8674d3d2403b
vm-uuid ( RO): eae053de-4a20-28a5-f335-f5a18dd79993
vm-name-label ( RO): Cacti
vdi-uuid ( RO): 90524af4-5b20-4412-9bfe-f1fe27f220b1
empty ( RO): false
device ( RO): xvda

uuid ( RO)             : de177727-b28a-8b79-e73e-d08366d56277
vm-uuid ( RO): eae053de-4a20-28a5-f335-f5a18dd79993
vm-name-label ( RO): Cacti
vdi-uuid ( RO): <not in database>
empty ( RO): true
device ( RO): xvdd

It is very common that xvdd is used for CDROM, so we can safely ignore the second section. The first section is the more interesting one. There is a correlation between the name of the VDI and the name of the LVM on the disk. We can find this specific LV using the following command. Notice that the name of the VDI is used here as the argument for the ‘grep’ command:

lvs | grep 90524af4-5b20-4412-9bfe-f1fe27f220b1

LV-90524af4-5b20-4412-9bfe-f1fe27f220b1 VG_XenStorage-4aa20fc2-fd92-20c2-c549-bed2597c622b -wi-a-  10.00G

We now have our LV path! As you can see, its status is offline. We need to set it to online state. Using both the LV and the VG name, we can do it like that:

lvchange -ay /dev/VG_XenStorage-4aa20fc2-fd92-20c2-c549-bed2597c622b/LV-90524af4-5b20-4412-9bfe-f1fe27f220b1

Now we can access the volume. We can actually check that the problem is the one we look for, using pygrub:

pygrub /dev/VG_XenStorage-4aa20fc2-fd92-20c2-c549-bed2597c622b/LV-90524af4-5b20-4412-9bfe-f1fe27f220b1

We should now see the GRUB menu of the VM at question. If you don’t see any menu, either you have missed a step or used the wrong disk.

The menu should show you all the list of kernels. The default one is the one highlighted, and if it doesn’t include the word “xen” with it, most likely that we have found the problem.

We now need to change to a PV-capable kernel. We will need to access the “/boot” partition of the Linux VM, and change GRUB’s options there.

First we map the disk to a loop device, so we can access its partitions:

losetup /dev/loop1 /dev/VG_XenStorage-4aa20fc2-fd92-20c2-c549-bed2597c622b/LV-90524af4-5b20-4412-9bfe-f1fe27f220b1

Notice that you need to use the entire path to the LV, that the LV is online, and that loop1 is not in use. If it is, you will have a message saying something like “LOOP_SET_FD: Device or resource busy”

Now we need to access its partitions. We will map them using ‘kpartx’ to /dev/mapper/ devices. Notice we’re using the same loop device name:

kaprtx -a /dev/loop1

Now, new files present themselves in /dev/mapper:

ls -la /dev/mapper/
total 0
drwxr-xr-x  2 root root     220 Oct 24 12:39 .
drwxr-xr-x 14 root root   16560 Oct 24 12:31 ..
crw——-  1 root root  10, 62 Sep 29 10:15 control
brw-rw—-  1 root disk 252,  5 Oct 24 12:39 loop1p1
brw-rw—-  1 root disk 252,  6 Oct 24 12:39 loop1p2
brw-rw—-  1 root disk 252,  7 Oct 24 12:39 loop1p3

Usually, the first partition represents /boot, so we can now mount it and work on it:

mount /dev/mapper/loop1p1 /mnt

All we need to do is edit /mnt/grub/menu.lst to match our requirements, and then wrap everything back up:

umount /mnt

kpartx -u /dev/loop1

losetup -d /dev/loop1

We don’t have to change the LV to offline, because the XenServer will activate it if it’s not, however, we could do it, to be on the safe side:

lvchange -an /dev/VG_XenStorage-4aa20fc2-fd92-20c2-c549-bed2597c622b/LV-90524af4-5b20-4412-9bfe-f1fe27f220b1

Now we can activate the VM, and see it boot successfully.

This whole process takes several minutes the first time, and even less later.

I hope this helps.

Preperation of recovery server for RPM based systems

Monday, July 23rd, 2007

On most cases, when preparing a recovery server, you can just ‘tar’ the entire server’s contents and just move it, along with a short recipe on how to rebuild the original partition layout (software Raid? LVM? flat partition tables?), how to mount volumes in-place, how to extract the tar files into the right locations, and how to install your favorite boot loader, either Lilo or Grub. Also, beforehand, you deal with taking a nice snapshot (or capturing the system in a single-user phase), and life is good, yada yada yada.

Most of us, however, never prepare for a rainy day. It’s not that we don’t want to, it’s not that we don’t plan, it’s just that we never seem to get to it, and after all, the hardware is rather new, and there should be no reason for failure. I can guess some of you heard this before – maybe in their own voices.

So, backup is a tiring job, and I will not deal with the things you need to do to maintain a replica of your data, but I will deal with how to prepare, quickly and easily, a system-recovery server (or a postmortem server) with just a little thought beforehand. This might be a bit too late for you if you’re reading it now, but, well, for the next time…

This little trick worked (as part of a large-scale process) when I migrated a server from 64bit server to a 32bit server (yeah, I know – the other way around).

It assumes you use RPM as your tool to install applications, and that if you do not, you have a method of knowing which piece of software you installed from source, and which package was installed from an external source (not your day-to-day RPM repository).

On the source server, run ‘rpm -qa > /tmp/rpmlist.long‘. Keep this file. It is important. Also, try to keep your yum.repos.d directory, or at least know which rpm repositories you use (I always use rpmforge, so I see no problem with that).

Install your target server – Same version as the source, minimal package selection. Copy the file rpmlist.long to /tmp. Make sure yum is configured (I will deal here with YUM, but you can replace it with any other repository client of your choice). Run the two following lines:

cat /tmp/rpmlist.long | sed s/-[0-9].*$/”/g > /tmp/rpmlist-short

for i in `cat /tmp/rpmlist-short`; do

yum install -y $i

done

This will add the missing RPMs with their dependencies, and will bring your system to a similar status. At least, this is a good place to start recovering.

On future chapters:

– Fully migrating from 64 to 32 bit and vice versa

– Using LVM snapshots for a smart backup, and for a smart recovery