Posts Tagged ‘restore’

LVM Recovery

Friday, May 29th, 2009

A friend of mine made a grave mistake – he partitioned a disk that had a Linux LVM PV created directly on it, without any partition table. Oops.

When dealing with multi-terabyte disks, one gets to encounter limitations not known on smaller scales – the 2TB limit. A regular (MBR) partition table can only map about 2TB, meaning that to create larger partitions, or even smaller partitions which extend past that boundary, you have to take one of two actions (a short sketch of both follows the list):

  • Use a GPT partition table, which is meant for large disks, and partition the disk to the sizes you desire
  • Define the LVM PV directly on the block device (the command would look like ‘pvcreate /dev/sdb’ – see? No partitions)
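
A minimal sketch of both options, assuming a hypothetical empty disk /dev/sdb (the device name is a placeholder, not my friend’s actual setup):

# Option 1 – GPT partition table, then a PV on the resulting partition
parted /dev/sdb mklabel gpt
parted /dev/sdb mkpart primary 0% 100%
pvcreate /dev/sdb1

# Option 2 – PV directly on the raw block device, no partition table at all
pvcreate /dev/sdb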

“Surprisingly”, and for no good reason, it appears that the disk which had been used entirely as the LVM PV suddenly had a single GPT partition on it. Hmmmm.

This is/was a single disk in a two-PV VG containing a single LV spanning the entire VG. Following the “mysterious” actions, the VG refused to activate, claiming that it could not find the PV with PVID <some UID>.

This is a step where you should stop and call a professional if you don’t know for sure how to continue. The following actions are very risky to your data, and could result in you either recovering from tapes (if they exist) or seeking a new job, if this is/was mission-critical data.

First – go to /etc/lvm/archive and find the latest file named after the VG which has been destroyed. Look into it – you should see the PV in there. Locate the PV entry whose UUID matches the one reported missing in the logs.
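
For example, a quick way to locate the matching archive file and entry (the UUID below is just a placeholder for the one reported in your logs):

grep -l '<some UID>' /etc/lvm/archive/*
grep -B2 -A4 '<some UID>' /etc/lvm/archive/VG_TEST_*.vg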

Second – you need to remove the GPT partition from the disk, so the PV can be recreated exactly as it was supposed to be before. Replace /dev/some_disk with your own device file.

fdisk /dev/some_disk

d (deletes the single partition)

w (writes the change to disk and exits fdisk)

Third – re-read the VG archive file, to be on the safe side. Verify again that the PV you are about to recreate is indeed the one you mean to. When done, run the following command:

pvcreate -u <UID> /dev/some_disk

Again – the name of the device file has been changed in this example to prevent copy-paste incidents from happening.
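
A side note from newer versions, not part of the original procedure: recent LVM2 releases may refuse --uuid on its own and insist on --restorefile (or --norestorefile) as well. In that case the command would look roughly like this, pointing at the archive file located earlier (the path is a placeholder):

pvcreate -u <UID> --restorefile /etc/lvm/archive/<archive file> /dev/some_disk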

Fourth – run vgcfgrestore with the name of the VG as a parameter. This command restores your metadata to the PV and VG.

vgcfgrestore VG_TEST

Fifth – Activate the VG:

vgchange -ay VG_TEST

Now the volumes should be up, and you can attempt to mount them.

Notice that the data might be corrupted in some way. Running fsck is recommended, although time-consuming.
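
As a rough sketch of that last part, assuming a hypothetical LV named lv_data in VG_TEST with an ext3 filesystem (the LV name, filesystem type and mount point are placeholders):

lvs VG_TEST
fsck.ext3 -f /dev/VG_TEST/lv_data
mkdir -p /mnt/recovered
mount /dev/VG_TEST/lv_data /mnt/recovered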

Good luck!

Moving Exchange Data

Thursday, April 6th, 2006

Let’s assume you have a method for taking a point-in-time copy of a Microsoft Exchange DB and its logs, while the system is running, to an alternate server. Let’s also assume, while we’re at it, that this point-in-time copy is consistent, and that you can mount this store (depending on using a similar directory structure, etc.) on an alternate server, and that it works correctly, i.e., it mounts without a problem. The scenario can be like this:

Server A: Microsoft Exchange, with a storage group containing a few mailbox stores, each on a different drive letter (E:, F:, G:, in our example), and the storage group’s logs on a separate drive, L:.

On Server B, we create a similar setup – a few mailbox stores with the same names, on E:, F:, G:, and we create (or move) the logs to reside on L:. We make sure this server’s patch level (updates and versions) matches Server A.

We dismount the whole storage group, mark it to be overwritten by a restore, and replace the currently existing stores with our point-in-time copy from Server A. Great. Mounting the store – and, on a wider point of view, mounting the whole storage group’s components – should be easy and painless. Our point-in-time copy is consistent, so it’s just like bringing up a storage group after an unexpected shutdown.

Let’s assume we were able to do so; we’re still not finished. Each user’s attributes contain information pointing to the location of his/her mailbox, including the name of the store and the name of the server. We need to change AD attributes, per user, for this point-in-time replication/DRP to work.
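
To give an idea of what such a change looks like (this is not Guy’s script – just a rough sketch of the kind of per-user rewrite involved, with entirely made-up DNs, store and organization names), the relevant attributes are homeMDB and msExchHomeServerName, which can be rewritten with an LDIF file imported via ldifde:

dn: CN=Some User,OU=MovedUsers,DC=example,DC=local
changetype: modify
replace: homeMDB
homeMDB: CN=Mailbox Store (SERVERB),CN=First Storage Group,CN=InformationStore,CN=SERVERB,CN=Servers,CN=First Administrative Group,CN=Administrative Groups,CN=ExampleOrg,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=example,DC=local
-
replace: msExchHomeServerName
msExchHomeServerName: /o=ExampleOrg/ou=First Administrative Group/cn=Configuration/cn=Servers/cn=SERVERB
-

ldifde -i -f fix-mailbox-location.ldf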

A friend of mine, Guy, has created such a script, just to solve this specific issue. It still has some minor issues, but if you are aware of them, you can handle them quite easily. They are:

1. To run the script, make sure it is accessible via the same path on each computer running ADU&C (required only on the computers which actually run it). You can put it on a share – I think that will work (haven’t tested it) – or you can put it in a local directory, but make sure every other computer from which you may want to run this option has the script in the same directory (same path).

2. The script / GUI does not understand the "Cancel" option, although it’s there. If you pick "Cancel", you actually end up selecting "0". Be aware of it.

3. The script works per OU. This means it’s easier to move the users sharing the same mailbox store into the same OU, at least for the purpose of running the script. You could create an OU under an existing OU and move only the users sharing the same mailbox store into it, so it still inherits the GPOs and settings propagated from above.

4. There is no "uninstall" option. Don’t want it? Don’t use it. Can’t remove it unless you know what you’re doing.

I tend to believe these flaws/bugs/issues will be dealt with someday, but for the minor use I had for it, it was more than enough.

By the way – so far, this trick cannot be used for Public Folders, as their information is hidden too deep. Maybe someday.

A bug in restore in CentOS 4.1 and probably RHEL 4 update 1

Sunday, February 26th, 2006

I’ve been to Hostopia today – the land of hosting servers. I had an emergency job on one Linux server, due to a mistake I had made. It appears that the performance penalty of using raid0 instead of raid1 (the CentOS/RH default raid setup is raid0 and not raid1, which led me to this mistake) for the root partition is terrible.

I tend to set up servers in the following way (a rough sketch of the commands follows the list):

  • A small (100MB) raid1 partition (/dev/sda1 and /dev/sdb1, usually) for /boot.
  • Two separate partitions for swap (/dev/sda2 and /dev/sdb2), each holding half the required total swap.
  • One large raid1 (/dev/sda3 and /dev/sdb3) containing LVM, which, in turn, holds the “/” and the rest of the data partitions, if required.
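
As a sketch only (the md device numbers, VG/LV names and sizes below are placeholders, not the actual server’s), that layout is built roughly like this:

# small raid1 for /boot
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mkfs.ext3 /dev/md0

# two plain swap partitions, no raid
mkswap /dev/sda2
mkswap /dev/sdb2

# one large raid1 holding the LVM PV, the VG and the “/” LV
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda3 /dev/sdb3
pvcreate /dev/md1
vgcreate VolGroup00 /dev/md1
lvcreate -n LogVol00 -L 20G VolGroup00
mkfs.ext3 /dev/VolGroup00/LogVol00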

In this specific case, I made a mistake and was not aware of it in time: I had set up the large LVM over a stripe (raid0). I had degraded performance on the server, and all disk access was slow. Very slow. Since it is impossible to break such a raid array without losing data, I had to back up the data that was on it and make sure I would be able to restore it. It’s an old habit of mine to use dump and restore; both ends of the procedure have so far worked perfectly on all *nix operating systems I’ve had experience with. I dumped the data, using one of the swap partitions as a container (formatted as ext3, of course), and was ready to continue.
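
Roughly, that backup step looks like the sketch below (the mount point and dump file name are placeholders, assuming /dev/sda2 is the swap partition being sacrificed as a temporary container):

swapoff /dev/sda2
mkfs.ext3 /dev/sda2
mkdir -p /mnt/container
mount /dev/sda2 /mnt/container
dump -0 -f /mnt/container/root.dump /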

I reached the server farm, where all the hosting servers stood in long rows (I’m so sorry I did not take a picture – some of those so-called “servers” had colored LEDs in their fans!), and got busy on this specific server. I had to run the backup all over again, as it had failed to complete before (and this time I dumped to my laptop through the 2nd NIC), and then I booted into rescue mode, destroyed the LVM, destroyed the raid (md device), and recreated them. It all went fine, except that restore failed to work, claiming “. is not the root” or something similar.

Checking restore via my laptop worked fine, but the server itself failed to work. Eventually, after a long waste of time, I installed a minimal CentOS 4.1 setup on the server and tried to restore with overwrite from within a chroot environment. It failed as well, with the same error message. I suddenly decided to check whether there was an update to the dump package, and there was. Installing it solved the issue. I was able to restore the volume (using the “u” flag, to overwrite files), and all was fine.
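
For reference, the restore end of it looked roughly like this (the paths are placeholders; the important part is the “u” flag – and, of course, the updated restore binary):

# run from the root of the target filesystem
cd /mnt/newroot
restore -r -u -f /mnt/container/root.dump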

I wasted over an hour on this stupid bug. Pity.

I’m keeping a static copy of the up-to-date restore binary. Now I will not have these problems again. I hope 🙂