Clone corrupted disk in XenServer
Following some unknown problems, I recently had several XenServer machines (different clusters, different sites and customers, and even different versions) with VDI end-of-file issues. This means that while you can start the VM correctly and perform XenMotion to another server, you are unable to do any storage-migration task: neither Storage XenMotion, nor VDI copy, nor VM-move commands work. In some cases, snapshots taken from the “ill” disks misbehaved in just the same way. This is rather frustrating, because the way to solve it is to clone the disk into a new one, and with every copy operation failing your hands are tied.
The method I have devised for the task is rather simple: create a new VDI (on the target storage), map the original VDI and the new VDI to a domain0 machine, and copy block-by-block using the ‘dd’ command. This is slow, and the result is thick (every block gets written), but it works.
How to do it? The steps, in general are:
- Create a new VDI of the same size or larger than the original VDI.
- Find the UUIDs of the old and new VDIs
- Find the UUID of the control domain you intend to use for this task (it has to be one that has access to both VDIs)
- Turn off the ‘ill’ VM, mark the ‘ill’ VDI in a way you will be able to identify it easily (unique name label, for example), and unmap it from the VM
- Create VBDs for both VDIs on the control domain, and plug them
- Create Linux device files for the VBDs on the control domain
- Perform ‘dd’ between the old and new disks (do not get confused with the direction, or you will overwrite your data!)
- Unmap VBDs, destroy VBDs
- Map the new VDI to the VM
- Start the VM
I won’t go over how to create a VDI. Use the XenCenter GUI to do it. Place it on the desired SR, and give it a noticeable name, so you will be able to recognise it.
Get the UUID of the new VDI:
xe vdi-list name-label="The name label I used" | grep ^uuid | awk '{print $NF}'
Do the same for the source VDI. Use its name label, or use xe vbd-list to obtain its VDI UUID.
Get the UUID of the control domain you want to use:
xe vm-list is-control-domain=true
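The grep/awk pipeline above works, but xe can also print just the value with params=uuid --minimal. With --minimal, several matches come back as one comma-separated line, so it is worth checking that exactly one UUID was returned before going on. The same caution applies to the control domain list, since a pool has one control domain per host. A minimal sketch; the helper names is_single_uuid and vdi_uuid_by_label are made up for this example:

```shell
# Succeeds only if the argument is exactly one UUID (8-4-4-4-12 hex digits).
# An empty string or a comma-separated list fails the check.
is_single_uuid() {
    printf '%s\n' "$1" |
        grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

# Print the UUID of the VDI with the given name label, or fail if the
# label matches no VDI or more than one.
vdi_uuid_by_label() {
    uuid=$(xe vdi-list name-label="$1" params=uuid --minimal) || return 1
    if ! is_single_uuid "$uuid"; then
        echo "expected exactly one VDI named '$1', got: '$uuid'" >&2
        return 1
    fi
    printf '%s\n' "$uuid"
}

# Example (run on the XenServer host):
#   NEW_VDI=$(vdi_uuid_by_label "The name label I used")
```

This is one more reason to give both VDIs unique name labels: a duplicate label makes the lookup fail loudly instead of silently picking the wrong disk.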
Unmap the ‘ill’ VDI from the VM (after giving it a very noticeable name, and noting the disk number/ID it had on the VM).
On the control domain, run:
xe vbd-create vdi-uuid=<'Ill' VDI UUID> vm-uuid=<Control domain UUID> device=xvda
This command prints a UUID. Note it as the source VBD UUID.
Run the command again for the target VDI. This time, use device=xvdb.
Note this UUID as well. This is the target VBD UUID.
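Instead of noting the two VBD UUIDs by hand, they can be captured in shell variables. The sketch below uses the same xe vbd-create calls as above and adds a guard against passing the same VDI as both source and target. The function name attach_pair and the variable names SRC_VBD and DST_VBD are made up for this example:

```shell
# Attach the 'ill' VDI as xvda and the new VDI as xvdb on the control domain.
# Arguments: ill VDI UUID, new VDI UUID, control domain UUID.
# On success the VBD UUIDs are left in SRC_VBD and DST_VBD.
attach_pair() {
    if [ "$1" = "$2" ]; then
        echo "source and target VDI are the same" >&2
        return 1
    fi
    SRC_VBD=$(xe vbd-create vdi-uuid="$1" vm-uuid="$3" device=xvda) || return 1
    DST_VBD=$(xe vbd-create vdi-uuid="$2" vm-uuid="$3" device=xvdb) || return 1
    echo "source VBD: $SRC_VBD"
    echo "target VBD: $DST_VBD"
}

# Example, with the three UUIDs gathered earlier:
#   attach_pair "<'Ill' VDI UUID>" "<New VDI UUID>" "<Control domain UUID>"
```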
We need to connect the VBDs and create a device node for them:
xe vbd-plug uuid=<UUID of source VBD created above>
There is now a new block device available to the XenServer host’s control domain. To identify it, run:
tail -1 /proc/partitions
The resulting line would look something like this:
253 10 40960000 tdk
The interesting fields are the first (the major number), the second (the minor number) and the last (the device name). We will use them to create a block device using the command ‘mknod’:
mknod /dev/tdk b 253 10
The result will be a block device file called /dev/tdk with the major 253 and minor 10.
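Relying on tail -1 assumes the new device is always the last line of /proc/partitions. A slightly more defensive variant, sketched here under the assumption that the device shows up in /proc/partitions by the time vbd-plug returns, is to snapshot the file before and after the plug and look at the difference. The function names and the /tmp file names are made up for this example, and the mknod commands are only printed so they can be reviewed before being run:

```shell
# Print "major minor name" for every device that is listed in the AFTER
# snapshot of /proc/partitions but not in the BEFORE snapshot.
# Arguments: before-file, after-file.
new_devices() {
    awk 'FILENAME == ARGV[1] { seen[$4] = 1; next }
         $1 ~ /^[0-9]+$/ && !($4 in seen) { print $1, $2, $4 }' "$1" "$2"
}

# Turn "major minor name" lines on stdin into the matching mknod commands.
mknod_commands() {
    while read -r major minor name; do
        echo "mknod /dev/$name b $major $minor"
    done
}

# Example on the control domain:
#   cat /proc/partitions > /tmp/parts.before
#   xe vbd-plug uuid=<UUID of source VBD created above>
#   cat /proc/partitions > /tmp/parts.after
#   new_devices /tmp/parts.before /tmp/parts.after | mknod_commands
```

Seeing exactly one line of output per plug is also a quick sanity check that the right device was identified.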
Repeat the process (plug the VBD, find the new line in /proc/partitions, run mknod) for the target VBD. The control domain now has two additional disks.
We can now copy using dd from the source to the target (don’t mix them up!). Assuming /dev/tdk is the source, and /dev/tdl is the target, it would look like this:
dd if=/dev/tdk of=/dev/tdl bs=1M oflag=direct
We are using oflag=direct to enforce direct writes and not to saturate the control domain’s caches.
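Two cheap safety checks can be wrapped around the dd run: confirm the target is at least as large as the source before copying, and compare the two devices afterwards. The sketch below assumes blockdev (from util-linux) and GNU cmp are available on the control domain; the function names are made up for this example:

```shell
# Succeeds if a target of DST bytes can hold a source of SRC bytes.
# Arguments: SRC bytes, DST bytes.
target_fits() {
    [ "$2" -ge "$1" ]
}

# Compare the first N bytes of two devices or files; succeeds if they match.
# N should be the source size, because a larger target has extra space
# past the end of the copied data.
# Arguments: source path, target path, N.
verify_copy() {
    cmp -n "$3" "$1" "$2"
}

# Example, assuming /dev/tdk is the source and /dev/tdl is the target:
#   SRC_BYTES=$(blockdev --getsize64 /dev/tdk)
#   DST_BYTES=$(blockdev --getsize64 /dev/tdl)
#   target_fits "$SRC_BYTES" "$DST_BYTES" || { echo "target too small" >&2; exit 1; }
#   dd if=/dev/tdk of=/dev/tdl bs=1M oflag=direct
#   verify_copy /dev/tdk /dev/tdl "$SRC_BYTES"
```

The verification pass reads both disks in full, so it takes roughly as long as the copy itself.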
Following the operation, to release the disks and get back to business, we do:
- xe vbd-unplug uuid=<SOURCE VBD UUID>
- xe vbd-destroy uuid=<SOURCE VBD UUID>
- xe vbd-unplug uuid=<TARGET VBD UUID>
- xe vbd-destroy uuid=<TARGET VBD UUID>
- Map the new disk to the VM, to the correct device number
- Start the VM
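The release and re-attach steps above can be wrapped in two small helpers. This is only a sketch: the function names are made up for this example, and the bootable flag should be true only if the disk being replaced was the VM's boot disk:

```shell
# Unplug a VBD from the control domain and destroy it.
# The destroy only runs if the unplug succeeded.
# Argument: VBD UUID.
release_vbd() {
    xe vbd-unplug uuid="$1" && xe vbd-destroy uuid="$1"
}

# Attach the new VDI to the VM at the device number the old disk had.
# Arguments: new VDI UUID, VM UUID, device number, bootable (true/false).
attach_to_vm() {
    xe vbd-create vdi-uuid="$1" vm-uuid="$2" device="$3" \
        bootable="$4" mode=RW type=Disk
}

# Example:
#   release_vbd "<SOURCE VBD UUID>"
#   release_vbd "<TARGET VBD UUID>"
#   attach_to_vm "<New VDI UUID>" "<VM UUID>" 0 true
```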
If it starts OK, we can destroy the old VDI and have a ball. If it doesn’t, we can always map the previous (source) VDI back to the VM, and start it again.
I hope it helps.