Recovery of a StorageRepository (SR) in XenServer, part one

In this part I will discuss a possible solution to a problem I have encountered several times already, one that stems from a failure to understand how XenServer uses LVM. First, though, a little background on the topic.

XenServer makes extensive use of LVM to support the storage requirements of virtual disks. LVM is used in two kinds of SR: LVMoISCSI/LVMoHBA and ext. In both cases, XenServer builds the initial layout as an LVM structure. Except on the system disk, the LVM structure is placed directly on the whole disk, not on a first partition. I imagine the idea was to avoid dealing with GPT/Basic/other partitioning schemes.

While this does solve the partitioning-scheme problem, it creates a different one: a PEBKAC problem (Problem Exists Between Keyboard and Chair). Failing to realize that there is no partition on the disk, and that the data is structured directly on it, leads relatively often to the LVM structure being overwritten by a partition layout. There are two common causes. The first is that the LUN/disk is exposed directly to a Windows machine, which joyfully asks whether one would like to ‘sign the partition’. If one does so, a basic partitioning structure is created, and it overwrites the LVM data structure. The second, a little less common, is performing disk tasks as the root user directly on the XenServer host without understanding the LVM structure XenServer employs. In this case the user, unaware of the data structure, might be tempted to partition the disk and, God forbid, even format the resulting partition. The result would be a total loss of the SR.
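To make "LVM directly on the disk" concrete, here is a minimal, read-only sketch that looks for the LVM2 label signature (the string LABELONE) in the first four 512-byte sectors of a device or image file. The function name has_lvm_label is my own, not a XenServer or LVM tool. LVM2 normally writes the label in the second sector, which a partition table written to the first sector would not touch; that this is why such disks remain recoverable is my assumption, not something verified here.

```shell
# has_lvm_label TARGET
# Hypothetical helper: succeeds if the LVM2 label signature "LABELONE" is
# found within the first four 512-byte sectors of TARGET (a block device or
# an image file). It only reads; nothing is written to TARGET.
has_lvm_label() {
    dd if="$1" bs=512 count=4 2>/dev/null | grep -qa 'LABELONE'
}

# Example, run as root on the host (device name taken from the article):
#   has_lvm_label /dev/sdb && echo "LVM label found directly on the disk"
```

A positive result only says the label sector survived; it says nothing about the state of the rest of the metadata or the data itself.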

So much for how the data is structured and how it gets erased or damaged.

I was surprised to discover an ‘easy’ method of recovering from a partition table written over the LVM metadata. I assume that no one has attempted to format the resulting partition(s), and that things stopped at creating the partition layout and trying to understand why the disk no longer works in XenServer.

The easy way, discussed here, is the subject of the first of two articles I intend to write about LVM recovery. If this ‘easy’ method works for you, there is no need to try your luck with the more complex one.

So, to work. If someone has created a partition layout, overwriting the LVM metadata structure as explained earlier, the symptom is that the disk will have one or more partitions. For example, the output of ‘cat /proc/partitions’ would look like this (irrelevant parts snipped):

   8    16  156290904 sdb
   8    17  156288321 sdb1

As is clearly visible, the second line (sdb1) should not be there. The output of ‘fdisk -l /dev/sdb’ showed (again, irrelevant parts snipped):

/dev/sdb1               1       19457   156288321   83  Linux

This proves that someone manually attempted to partition the disk. Had a mount command worked (for example, ‘mount /dev/sdb1 /mnt’), my reply e-mail would have read: “Sorry. The data was overwritten. Can’t do anything about it”. However, this was not the case. Not this time.

The magic trick I used was to remove the partition entirely, freeing the disk to be identified as LVM again, if it could be (I wasn’t sure it would), and then to take some recovery actions.
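Before changing the partition table, it may be prudent to save a copy of the start of the disk so that the change can be undone. This was not part of the original procedure; the sketch below is a suggestion, and the function name and output path are placeholders of my own. It copies the first 1 MiB, which covers the partition table and the area where the LVM label normally lives.

```shell
# backup_disk_head DISK OUTFILE
# Hypothetical helper: copy the first 1 MiB (2048 sectors of 512 bytes) of
# DISK to OUTFILE. DISK is only read, never written.
backup_disk_head() {
    dd if="$1" of="$2" bs=512 count=2048 2>/dev/null
}

# Example, run as root on the host (output path is a placeholder):
#   backup_disk_head /dev/sdb /root/sdb-head.img
```

Keep the copy on a different disk from the one being repaired.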

First – fdisk to remove the partition:

fdisk /dev/sdb << EOF
d
w
EOF

Now pvscan could be run. The following command returned the correct value: a PV ID that wasn’t there before, meaning that the PV information was still intact:

pvscan
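To tie the rediscovered PV back to a XenServer SR, the volume group name can help. The sketch below assumes the naming convention XenServer uses for LVM-based SRs, VG_XenStorage-<SR UUID>; the helper name sr_uuid_from_vg is my own, and other SR types may name their volume groups differently.

```shell
# sr_uuid_from_vg VGNAME
# Hypothetical helper: print the SR UUID embedded in a volume group name of
# the form VG_XenStorage-<uuid>; return non-zero for any other name.
sr_uuid_from_vg() {
    case "$1" in
        VG_XenStorage-*) printf '%s\n' "${1#VG_XenStorage-}" ;;
        *) return 1 ;;
    esac
}

# Example, run as root on the host:
#   pvs --noheadings -o pv_name,vg_name
#   for vg in $(vgs --noheadings -o vg_name); do sr_uuid_from_vg "$vg"; done
```

The UUID printed this way should match one of the SRs listed by ‘xe sr-list’.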

Now, a simple ‘SR Repair’ operation could take place.
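For those who prefer the command line to XenCenter, a rough equivalent can be sketched with the xe CLI. My understanding is that a repair mostly amounts to re-plugging the SR's detached PBDs and rescanning; treat that equivalence as an assumption. The function name repair_sr is my own.

```shell
# repair_sr SR_UUID
# Hypothetical helper: plug every PBD of the SR that is not currently
# attached, then rescan the SR. Needs the xe CLI on a XenServer host.
repair_sr() {
    sr_uuid=$1
    # With --minimal, xe prints the matching UUIDs as one comma-separated line.
    pbds=$(xe pbd-list sr-uuid="$sr_uuid" currently-attached=false \
              params=uuid --minimal) || return 1
    for pbd in $(printf '%s' "$pbds" | tr ',' ' '); do
        xe pbd-plug uuid="$pbd" || return 1
    done
    xe sr-scan uuid="$sr_uuid"
}

# Example (take the UUID from 'xe sr-list'):
#   repair_sr <SR UUID>
```

If a PBD refuses to plug, the error message from xe is usually the best starting point for the more involved recovery.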

Easy.
My next article in this series will show a more complex method of recovery to employ when this ‘easy’ one doesn’t work.

4 Responses to “Recovery of a StorageRepository (SR) in XenServer, part one”

  1. AJ Says:

    Hello,

    We have a critical scenario with one of our XenServer hosts. The LVM metadata on the local storage was somehow lost, and the VM disks were not showing in XenCenter. When trying to reattach the SR, one of the techs mistakenly used the ‘sr-create’ command instead of ‘sr-introduce’ for the SR holding the VM disks, and all the data on the SR was lost. We restored the LVM metadata from a backup, and the VG and LVs are now showing with the correct disk sizes. We then tried to manually mount the VM disks on the host after creating device mappings with kpartx, but it’s not working. I just wonder if there is any way to recover the data from the SR.

    Any help would be greatly appreciated.

    Thanks.

  2. ez-aton Says:

    This is a hard one. If you were able to recover the LVM metadata, you might be able to re-introduce the SR. Forget the previous one (sr-forget), then run sr-introduce, and you may be able to recover it.
    A (bad and slow, however…) alternative is to map the disks using kpartx and, using a new SR (on other disks, of course), create virtual disks of the same size. You can then ‘dd’ from the source block device to the target disk (you will have to map it to the control domain using vbd-create). You can save your data that way: you will have a virtual disk, and you will have your data. However, first: sr-introduce!

    Ez

  3. ricrc Says:

    This … server fixed, partition recovered. thank you.

  4. etzion Says:

    With pleasure 🙂
