Posts Tagged ‘snapshot’

Citrix XenServer 5.0 cannot cooperate with NetApp SnapMirror

Tuesday, September 8th, 2009

It has been a long while, I know. I was busy with life, work and everything around it. Not much worth mentioning.

This, however, is something else.

I have discovered an issue with Citrix XenServer 5.0 (probably the case with 5.5 too, but I have other issues with that release) when using NetApp through the NetApp API SR: any snapshot not generated by XenServer is deleted as soon as any snapshot-related action is performed on that volume. This means that if I manually create a snapshot called “1111” (short and easy to recognize, especially among all the UUID-based volume, LUN and snapshot names XenServer uses…), then the next time anyone creates a snapshot of a machine which has a disk (VDI) on this specific volume, my snapshot “1111” is removed from that volume. The message in /var/log/SMlog looks like this:

Removing unused snap (1111)

Under normal operation this does not matter much, as non-XenServer snapshots have little value. With NetApp SnapMirror technology, however, the mechanism works a bit differently.

It appears that the SnapMirror system takes snapshots with predefined names (not of the XenServer UUID type, luckily for us all). These snapshots contain all the changes made since the last SnapMirror snapshot, and are used for replication. Unfortunately, XenServer deletes them. No SnapMirror snapshots means, quite obviously, no SnapMirror…
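To check whether this is happening on a given host, one can scan SMlog for removal messages that name a snapshot you care about. This is only a sketch: it assumes the log line contains the exact text shown above, "Removing unused snap (NAME)", and the function name removed_snaps_matching is mine, not part of XenServer.

```shell
# Print the names of removed snapshots that contain a given string.
# $1: log file (for example /var/log/SMlog)
# $2: string to look for in the snapshot name
removed_snaps_matching () {
	# The greedy match keeps names that themselves contain parentheses intact
	sed -n 's/.*Removing unused snap (\(.*\)).*/\1/p' "$1" | grep -- "$2"
}
```

Called as removed_snaps_matching /var/log/SMlog 1111, it prints every removed snapshot whose name contains 1111, and returns non-zero when there is none.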

We did not detect this problem immediately, and I should take the blame for that. I should have defined a set of simple trial-and-error tests, as described above, instead of battling with a system I did not quite follow at the time: NetApp SnapMirror. Now I do, however, and I have an insight which can make your life better if you had issues with SnapMirror and XenServer and did not know how to make them work together. This solution cannot be an official one, due to its nature, which you will understand shortly. It is a personal patch, based on the hard fact that SnapMirror uses a predefined name for its snapshots. This name, in my case, is the name of the DR storage device. You must figure out which name is used as part of the snapshot naming convention on your own site. Search for my ‘storagedr’ phrase, and replace it with yours.

This is the diff file for /opt/xensource/sm/NETAPPSR.py. Of course, back up your original file first. Also, this is not an official patch. It was tested to function correctly on XenServer 5.0, and it will not work on XenServer 5.5 (since NETAPPSR.py is different there). Last warning: it might break on the next update or upgrade of your XenServer environment, and if that happens, you had better monitor your SnapMirror status closely.

400,403c400,404
<                     util.SMlog("Removing unused snap (%s)" % val)
<                     out = netapplib.fvol_snapdelete_wrapper(self.sv, val, volname)
<                     if not na_test_result(out):
<                         pass
---
> 		    if 'storagedr' not in val:
>                     	util.SMlog("Removing unused snap (%s)" % val)
>                     	out = netapplib.fvol_snapdelete_wrapper(self.sv, val, volname)
>                     	if not na_test_result(out):
>                         	pass

Hope it helps!

XenServer create snapshots for all machines

Friday, August 7th, 2009

XenServer is a wonderful tool. One of the better parts of it is its powerful scripting language, powered by the ‘xe’ command.

In order to capture a mass of snapshots, you can either do it manually from the GUI, or script it. The script supplied below includes shell functions to capture quiesced snapshots and, if that fails, normal snapshots of every running VM on the system.

Reason: NetApp SnapMirror, or other backup (maybe for later export) scheduled actions.

#!/bin/bash
# This script will supply functions for snapshotting and snapshot destroy including disks
# Written by Ez-Aton
# Visit my web blog for more stuff, at http://run.tournament.org.il
 
# Global variables:
UUID_LIST_FILE=/tmp/SNAP_UUIDS.txt
 
# Function
function assign_all_uuids () {
	# Construct artificial non-indexed list with name (removing annoying characters) and UUID
	LIST=""
	for UUID in `xe vm-list power-state=running is-control-domain=false | grep uuid | awk '{print $NF}'`
	do
		NAME=`xe vm-param-get param-name=name-label uuid=$UUID | tr ' ' _ | tr -d '(' | tr -d ')'`
		LIST="$LIST $NAME:$UUID"
	done
	echo $LIST
}
 
function take_snap_quiesce () {
	# We attempt to take a snapshot with quiesce
	# Arguments: $1 name ; $2 uuid
	# On success, print the snapshot UUID on stdout so the caller can capture it.
	# Return 1 if failed (the error message goes to stderr, not into the captured value)
 
	if SNAP_UUID=`xe vm-snapshot-with-quiesce vm=$2 new-name-label=${1}_snapshot`
	then
		echo $SNAP_UUID
		return 0
	else
		echo "Snapshot-with-quiesce for $1 failed" >&2
		return 1
	fi
}
 
function take_snap () {
	# We attempt to take a snapshot
	# Arguments: $1 name ; $2 uuid
	# On success, print the snapshot UUID on stdout so the caller can capture it.
	# Return 1 if failed (the error message goes to stderr, not into the captured value)
 
	if SNAP_UUID=`xe vm-snapshot vm=$2 new-name-label=${1}_snapshot`
	then
		echo $SNAP_UUID
		return 0
	else
		echo "Snapshot for $1 failed" >&2
		return 1
	fi
}
 
function stop_ha_template () {
	# Templates inherit their settings from the origin
	# We need to turn off HA
	# $1 : Template UUID
	if [ -z "$1" ]
	then
		echo "Missing template UUID"
		return 1
	fi
	xe template-param-set ha-always-run=false uuid=$1
}
 
function get_vdi () {
	# This function will get a space delimited list of VDI UUIDs of a given snapshot/template UUID
	# Arguments: $1 template UUID
	# It will also verify that each VBD is an actual snapshot
	if [ -z "$1" ]
	then
		echo "No arguments? We need the template UUID"
		return 1
	fi
	VDIS=""
	for VBD in `xe vbd-list vm-uuid=$1 | grep ^uuid | awk '{print $NF}'`
	do
		echo "VBD: $VBD"
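		# Quote the command output so an empty reply does not break the test
		if [ "`xe vbd-param-get param-name=type uuid=$VBD`" != "CD" ]
		then
			CUR_VDI=`xe vdi-list vbd-uuids=$VBD | grep ^uuid | awk '{print $NF}'`
			# Compare the reply explicitly instead of executing it as a command
			if [ "`xe vdi-param-get uuid=$CUR_VDI param-name=is-a-snapshot`" = "true" ]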
			then
				VDIS="$VDIS $CUR_VDI"
			else
				echo "VDI is not a snapshot!"
				return 1
			fi
			CUR_VDI=""
		fi
	done
	echo $VDIS
}
 
function remove_vdi () {
	# This function will get a list of VDIs and remove them
	# Careful!
	for VDI in $@
	do
		if xe vdi-destroy uuid=$VDI
		then
			echo "Success in removing VDI $VDI"
		else
			echo "Failure in removing VDI $VDI"
			return 1
		fi
	done
}
 
function remove_template () {
	# This function will remove a template
	# $1 template UUID
	if [ -z "$1" ]
	then
		echo "Required UUID"
		return 1
	fi
	xe template-param-set is-a-template=false uuid=$1
	if ! xe vm-uninstall force=true uuid=$1
	then
		echo "Failure to remove VM/Template"
		return 1
	fi
}
 
function remove_all_template () {
	# This function will completely remove a template
	# The steps are as follow:
	# $1 is the UUID of the template
	# Calculate its VDIs
	# Remove the template
	# Remove the VDIs
	if [ -z "$1" ]
	then
		echo "No Template UUID was supplied"
		return 1
	fi
	# We now collect the value of $VDIS
	get_vdi $1
	if [ "$?" -ne "0" ]
	then
		echo "Failed to get VDIs for Template $1"
		return 1
	fi
	if ! remove_template $1
	then
		echo "Failure to remove template $1"
		return 1
	fi
	if ! remove_vdi $VDIS
	then
		return 1
	fi
}
 
function create_all_snapshots () {
	# In this function we will run all over $LIST and create snapshots of each machine, keeping the UUID of it inside a file
	# $@ - list of machines in the $LIST format
	if [ -f $UUID_LIST_FILE ]
	then
		mv $UUID_LIST_FILE $UUID_LIST_FILE.$$
	fi
	for i in $@
	do
		SNAP_UUID=`take_snap_quiesce ${i%%:*} ${i##*:}`
		if [ "$?" -ne "0" ]
		then
			echo "Problem taking snapshot with quiesce for ${i%%:*}"
			echo "Attempting normal snapshot"
			SNAP_UUID=`take_snap ${i%%:*} ${i##*:}`
			if [ "$?" -ne "0" ]
			then
				echo "Problem taking snapshot for ${i%%:*}"
				SNAP_UUID=""
			fi
		fi
		stop_ha_template $SNAP_UUID
		echo $SNAP_UUID >> $UUID_LIST_FILE
	done
}

Possible use will be like this:

. /usr/local/bin/xen_functions.sh

create_all_snapshots `assign_all_uuids` &> /tmp/snap_create.log
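The list produced by assign_all_uuids is made of NAME:UUID entries, and create_all_snapshots splits each one with shell parameter expansion. A minimal sketch of that split, using a made-up entry rather than real xe output (split_entry is a hypothetical helper, not part of the script above):

```shell
# Split one NAME:UUID entry the way create_all_snapshots does.
# The entry used below is a made-up example, not real xe output.
split_entry () {
	entry=$1
	name=${entry%%:*}   # drop the longest suffix that starts at a ':'
	uuid=${entry##*:}   # drop the longest prefix that ends at a ':'
	echo "$name $uuid"
}

split_entry "web_server_1:3f2a9c1e-0000-4000-8000-123456789abc"
```

A VM name that itself contains a colon would be split incorrectly, since the name is cut at the first colon while the UUID is cut at the last one.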

HP EVA bug – Snapshot removed through sssu is still there

Friday, May 2nd, 2008

This is an interesting bug I have encountered:

The output of an sssu command should look like this:

EVA> DELETE STORAGE "\Virtual Disks\Linux\oracle\SNAP_ORACLE"

EVA>

However, the snapshot (SNAP_ORACLE in this case) remains visible until someone presses “Ok” in the web interface.

This happened to me on HP EVA with HP StorageWorks Command View EVA 7.0 build 17.

When the same delete command is given a second time, it looks like this:

EVA> DELETE STORAGE "\Virtual Disks\Linux\oracle\SNAP_ORACLE"

Error: Error cannot get object properties. [ Deletion completed]

EVA>

When this command is given for a non-existing snapshot, it looks like this:

EVA> DELETE STORAGE "\Virtual Disks\Linux\oracle\SNAP_ORACLE"

Error: \Virtual Disks\Linux\oracle\SNAP_ORACLE not found

So I run the removal command twice (scripted) in an sssu session without “halt_on_errors”, so that the error from the second command does not stop the session. This removes the snapshots correctly.
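The workaround can be sketched as a small generator that writes each delete command twice. This is only an illustration: emit_double_delete is a hypothetical helper, and how its output is fed to sssu (and how the session is opened and told to continue on errors) is left out, since that depends on the site.

```shell
# Emit the same DELETE STORAGE command twice for each snapshot path given.
# The second delete is expected to report an error once the first has worked.
emit_double_delete () {
	for path in "$@"
	do
		printf 'DELETE STORAGE "%s"\n' "$path"
		printf 'DELETE STORAGE "%s"\n' "$path"
	done
}

emit_double_delete '\Virtual Disks\Linux\oracle\SNAP_ORACLE'
```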

Quick provisioning of virtual machines

Friday, February 1st, 2008

When one wants to achieve fast provisioning of virtual machines, several solutions come to mind. The one I prefer uses Linux LVM snapshot capabilities to duplicate one working machine into several.

This can happen, of course, only if the host running VMware-Server is Linux.

LVM snapshots have one major disadvantage: performance. When a block on the source of the snapshot is changed for the first time, the original block is copied to the COW space of each and every snapshot. This means that creating a 1GB file on a volume which has ten snapshots results in a total copy of 10GB of data across your disks. You cannot ignore this performance impact.

LVM2 has support for read/write snapshots. I have come up with a nice way of utilizing this capability to my benefit. An R/W snapshot which is being changed does not replicate its changes to any other snapshot. All changes are considered local to this snapshot, and are maintained only in its own COW space. So adding a 1GB file to a snapshot has zero impact on the rest of the snapshots or volumes.
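The difference between the two cases is plain arithmetic, sketched below. cow_copied_gb is a hypothetical helper that only restates the reasoning above; it does not measure anything.

```shell
# GB copied into other snapshots' COW space by a first-time write.
# $1: where the write lands (origin or snapshot)
# $2: GB written, $3: number of snapshots of the origin
cow_copied_gb () {
	case "$1" in
		origin)   echo $(( $2 * $3 )) ;;  # each snapshot receives the old blocks
		snapshot) echo 0 ;;               # the change stays local to that snapshot
		*)        echo "unknown target: $1" >&2 ; return 1 ;;
	esac
}

cow_copied_gb origin 1 10      # prints 10
cow_copied_gb snapshot 1 10    # prints 0
```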

The idea is quite simple, and it works like this:

1. Create adequate logical volume with a given size (I used 9GB for my own purposes). The name of the LV in my case will be /dev/VGVM3/centos-base

2. Mount this LV on a directory, and create a VM inside it. In my case, it’s in /vmware/centos-base

3. Install the VM as the baseline for all your future VMs. If you might not want Apache on some of them, don’t install it on the baseline.

4. Install vmware-tools on the baseline.

5. Disable the service “kudzu”

6. Update as required

7. In my case I always use DHCP. You can set it to obtain its IP once from a given location, or whatever you feel like.

8. Shut down the VM.

9. In the VM’s .vmx file add a line like this:

uuid.action = "create"
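Step 9 can be done by hand, or with something like the sketch below. It assumes the .vmx file is plain text with one setting per line; set_uuid_action is a hypothetical helper and is not part of the scripts that follow.

```shell
# Append uuid.action = "create" to a .vmx file unless a uuid.action line exists.
# $1: path to the .vmx file
set_uuid_action () {
	if ! grep -q '^uuid\.action' "$1"
	then
		echo 'uuid.action = "create"' >> "$1"
	fi
}
```

Running it a second time on the same file changes nothing, so it is safe to call from a preparation script.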

Below are two scripts: one creates the snapshot, mounts it and registers it, including a new MAC and UUID, and the other removes it.

create-replica.sh:

#!/bin/sh
# This script will replicate vms from a given (predefined) source to a new system
# Written by Ez-Aton, http://www.tournament.org.il/run
# Arguments: name

# FUNCTIONS BE HERE
test_can_do () {
	# To be able to snapshot, we need a set of things to happen
	if [ -d $DIR/$TARGET ] ; then
		echo "Directory already exists. You don't want to do it..."
		exit 1
	fi
	# The LV shows up as a device node, not a regular file, so test with -e
	if [ -e $VG/$TARGET ] ; then
		echo "Target snapshot exists"
		exit 1
	fi
	if [ `vmrun list | grep -c $DIR/$SRC/$SRC.vmx` -gt "0" ] ; then
		echo "Source VM is still running. Shut it down before proceeding"
		exit 1
	fi
	if [ `vmware-cmd -l | grep -c $DIR/$TARGET/$SRC.vmx` -ne "0" ] ; then
		echo "VM already registered. Unregister first"
		exit 1
	fi
}

do_snapshot () {
	# Take the snapshot
	lvcreate -s -n $TARGET -L $SNAPSIZE $VG/$SRC
	RET=$?
	if [ "$RET" -ne "0" ]; then
		echo "Failed to create snapshot"
		exit 1
	fi
}

mount_snapshot () {
	# This function creates the required directories and mounts the snapshot there
	mkdir $DIR/$TARGET
	mount $VG/$TARGET $DIR/$TARGET
	RET=$?
	if [ "$RET" -ne "0" ]; then
		echo "Failed to mount snapshot"
		exit 1
	fi
}

alter_snap_vmx () {
	# This function will alter the name in the VMX and make it the $TARGET name
	grep -v "displayName" $DIR/$TARGET/$SRC.vmx > $DIR/$TARGET/$TARGET.vmx
	echo "displayName = \"$TARGET\"" >> $DIR/$TARGET/$TARGET.vmx
	cat $DIR/$TARGET/$TARGET.vmx > $DIR/$TARGET/$SRC.vmx
	\rm $DIR/$TARGET/$TARGET.vmx
}

register_vm () {
	# This function will register the VM to VMWARE
	vmware-cmd -s register $DIR/$TARGET/$SRC.vmx
}

# MAIN
if [ -z "$1" ]; then
	echo "Arguments: The target name"
	exit 1
fi

# Parameters:
SRC=centos-base         #The name of the source image, and the source dir
PREFIX=centos             #All targets will be created in the name centos-$NAME
DIR=/vmware               #My VMware VMs default dir
SNAPSIZE=6G              #My COW space
VG=/dev/VGVM3           #The name of the VG
TARGET="$PREFIX-$1"

test_can_do
do_snapshot
mount_snapshot
alter_snap_vmx
register_vm
exit 0

remove-replica.sh:

#!/bin/sh
# This script will remove a snapshot machine
# Written by Ez-Aton, http://www.tournament.org.il/run
# Arguments: machine name

#FUNCTIONS
does_it_exist () {
	# Check if the described VM exists
	if [ `vmware-cmd -l | grep -c $DIR/$TARGET/$SRC.vmx` -eq "0" ]; then
		echo "No such VM"
		exit 1
	fi
	if [ ! -e $VG/$TARGET ]; then
		echo "There is no matching snapshot volume"
		exit 1
	fi
	# The fifth column of the lvs output is expected to hold the origin LV
	if [ `lvs $VG/$TARGET | awk '{print $5}' | grep -c $SRC` -eq "0" ]; then
		echo "This is not a snapshot, or a snapshot of the wrong LV"
		exit 1
	fi
}

ask_a_thousand_times () {
	# This function verifies that the right thing is actually done
	echo "You are about to remove a virtual machine and an LVM. Details:"
	echo "Machine name: $TARGET"
	echo "Logical Volume: $VG/$TARGET"
	echo -n "Are you sure? (y/N): "
	read RES
	if [ "$RES" != "Y" ] && [ "$RES" != "y" ]; then
		echo "Decided not to do it"
		exit 0
	fi
	echo ""
	echo "You have asked to remove this machine"
	echo -n "Again: Are you sure? (y/N): "
	read RES
	if [ "$RES" != "Y" ] && [ "$RES" != "y" ]; then
		echo "Decided not to do it"
		exit 0
	fi
	echo "Removing VM and snapshot"
}

shut_down_vm () {
	# Shut down the VM and unregister it
	vmware-cmd $DIR/$TARGET/$SRC.vmx stop hard
	vmware-cmd -s unregister $DIR/$TARGET/$SRC.vmx
}

remove_snapshot () {
	# Umount and remove the snapshot
	umount $DIR/$TARGET
	RET=$?
	if [ "$RET" -ne "0" ]; then
		echo "Cannot umount $DIR/$TARGET"
		exit 1
	fi
	lvremove -f $VG/$TARGET
	RET=$?
	if [ "$RET" -ne "0" ]; then
		echo "Cannot remove snapshot LV"
		exit 1
	fi
}

remove_dir () {
	# Removes the mount point
	rmdir $DIR/$TARGET
}

#MAIN
if [ -z "$1" ]; then
	echo "No machine name. Exiting"
	exit 1
fi

#PARAMETERS:
DIR=/vmware                #VMware default VMs location
VG=/dev/VGVM3            #The name of the VG
PREFIX=centos              #Prefix to the name. All these VMs will be called centos-$NAME
TARGET="$PREFIX-$1"
SRC=centos-base           #The name of the baseline image, LVM, etc. All are the same

does_it_exist
ask_a_thousand_times
shut_down_vm
remove_snapshot
remove_dir

exit 0

Pros:

1. Very fast provisioning. It takes almost five seconds, and that’s because my server is somewhat loaded.

2. Dependable: KISS at its best.

3. Conservative on space

4. Conservative on I/O load (unlike the traditional use of LVM snapshot, as explained in the beginning of this section).

Cons:

1. Cannot merge the contents of a snapshot back into the main image (the LVM team will implement it in the future, I think)

2. Cannot take a snapshot of a snapshot (same as above)

3. If the COW space of any of the snapshots is full (viewable through the command ‘lvs’) then on boot, the source LV might not become active (confirmed RH4 bug, and this is the system I have used)

4. My script does not edit/alter /etc/fstab (I decided it was rather risky, and not worth the effort at this time)

5. My script does not check if there is enough available space in the VG. This is not required, as the script fails if the creation of the LV fails
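If you do want the free-space check from the last point, it could look like the sketch below. enough_free_space is a hypothetical helper which only compares whole numbers; the commented usage assumes that vgs on your system accepts the options shown, so verify them before relying on it.

```shell
# Succeed if the free space (in whole GB) covers the requested snapshot size.
# $1: free GB in the VG, $2: GB requested for the snapshot
enough_free_space () {
	[ "$1" -ge "$2" ]
}

# Possible use on the host (not run here). The vgs options are an assumption:
#   FREE=`vgs --noheadings --nosuffix --units g -o vg_free VGVM3 | awk '{print int($1)}'`
#   enough_free_space "$FREE" 6 || echo "Not enough free space in VG"
```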

You are most welcome to contribute any further changes done to this script. Please maintain my URL in the script if you decide to use it.

Thanks!

Linux LVM performance measurement

Sunday, June 10th, 2007

Modern Linux LVM offers great abilities to maintain snapshots of existing logical volumes. Unlike NetApp “Write Anywhere File Layout” (WAFL), Linux LVM uses “Copy-on-Write” (COW) to allow snapshots. The process, in general, is described in this PDF document.

I have run several small tests, just to get a real-life estimate of the actual performance impact such a COW method can cause.

Server details:

1. CPU: 2x Xeon 2.8GHz

2. Disks: /dev/sda – system disk. Did not touch it; /dev/sdb – used for the LVM; /dev/sdc – used for the LVM

3. Mount: LV is mounted (and remains mounted) on /vmware

Results:

1. No snapshot, Using VG on /dev/sdb only:

# time dd if=/dev/zero of=/vmware/test.2GB bs=1M count=2048
2048+0 records in
2048+0 records out

real 0m16.088s
user 0m0.009s
sys 0m8.756s

2. With snapshot on the same disk (/dev/sdb):

# time dd if=/dev/zero of=/vmware/test.2GB bs=1M count=2048
2048+0 records in
2048+0 records out

real 6m5.185s
user 0m0.008s
sys 0m11.754s

3. With snapshot on 2nd disk (/dev/sdc):

# time dd if=/dev/zero of=/vmware/test.2GB bs=1M count=2048
2048+0 records in
2048+0 records out

real 5m17.604s
user 0m0.004s
sys 0m11.265s

4. Same as before, creating a new empty file on the disk:

# time dd if=/dev/zero of=/vmware/test2.2GB bs=1M count=2048
2048+0 records in
2048+0 records out

real 3m24.804s
user 0m0.006s
sys 0m11.907s

5. Removed the snapshot. Created a 3rd file:
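For tests 1 to 4 above, the wall-clock times can be turned into throughput figures. The sketch below takes the "real" values exactly as listed and assumes each run wrote the full 2048MB; throughput_mbps is a hypothetical helper.

```shell
# Convert a dd run into MB/s.
# $1: megabytes written, $2: minutes, $3: seconds (taken from the "real" line)
throughput_mbps () {
	awk -v mb="$1" -v m="$2" -v s="$3" 'BEGIN { printf "%.1f\n", mb / (m * 60 + s) }'
}

throughput_mbps 2048 0 16.088   # test 1: no snapshot
throughput_mbps 2048 6 5.185    # test 2: snapshot on the same disk
throughput_mbps 2048 5 17.604   # test 3: snapshot on the 2nd disk
throughput_mbps 2048 3 24.804   # test 4: same as test 3, new file
```

This gives about 127.3 MB/s without a snapshot, against 5.6, 6.4 and 10.0 MB/s with one, so the writes were roughly 13 to 23 times slower while a snapshot existed.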

LVM Snapshots with MySQL

Saturday, December 2nd, 2006

Nowadays, when LVM2 is common and is actually the default in installation of RedHat based distributions, using its snapshot capabilities can save lots of grief when files are deleted or when you need to revert to a day in the past – both for your files and for your MySQL DB.

I have created a script which is based on the following assumptions:

1. Inside /etc/samba/smb.conf there is a directive such as: include /etc/samba/smb.conf

2. There is a single LV containing all the system’s data. It doesn’t occupy all the physical disk (or, for that matter, the entire VG space). Free space is 10-20% of disk size

3. Specific share directives are located inside /etc/samba/smb.conf.snapshot.full. An empty file /etc/samba/smb.conf.snapshot.empty exists.

4. I do not trust all places to hold a password for their MySQL (although it is advised!). This script assumes such a password doesn’t always exist

5. The script mounts the snapshot read-only just after creating an empty file with the date of the snapshot inside its root.

The script is attached here. take-snapshot.txt

Ontap Simulator, and some insights about NetApp

Tuesday, May 9th, 2006

First and foremost – the Ontap simulator, a great tool which can surely assist in learning the NetApp interface and its use, lacks in performance. It has some built-in limitations – no FCP, no disks (virtual disks) larger than 1GB (per my trial-and-error. I might find out I was wrong somehow, and put it on this website), and low performance. I’ve got about 300KB/s transfer rate both on iSCSI and on NFS. To make sure it was not due to some network hog hiding somewhere on my net(s), I’ve even tried it from the host of the simulator itself, but to no avail. Low performance. Don’t try to use it as your own home iSCSI Target. Better just use Linux for this purpose, with the drivers obtained from here (It’s one of my next steps into “shared storage(s) for all”).

Another issue – After much reading through NetApp documentation, I’ve reached the following concepts of the product. Please correct me if you see fit:

The older method was to create a volume (vol create) directly from disks. Either using raid_dp or raid4.

The current method is to create aggregates (aggr create) from disks. Each aggregate consists of raid groups. A raid group (rg) can be made up of up to eight physical disks. Each group of disks (an rg) has one or two parity disks, depending on the type of raid (raid4 uses one parity disk, and raid_dp uses “double parity”, as its name suggests).
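As a quick worked example of the parity overhead, under my understanding as stated above (one parity disk for raid4, two for raid_dp), the number of disks left for data in a single raid group is as follows. data_disks is a hypothetical helper, not a NetApp command.

```shell
# Data disks left in one raid group after parity is set aside.
# $1: disks in the raid group, $2: raid4 or raid_dp
data_disks () {
	case "$2" in
		raid4)   echo $(( $1 - 1 )) ;;
		raid_dp) echo $(( $1 - 2 )) ;;
		*)       echo "unknown raid type: $2" >&2 ; return 1 ;;
	esac
}

data_disks 8 raid4      # prints 7
data_disks 8 raid_dp    # prints 6
```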

Actually, I can assume that each aggregate is formatted using the WAFL filesystem, which leads to the conclusion that modern (flex) volumes are logical “chunks” of this whole WAFL layout. In the past, each volume was a separate WAFL-formatted unit, and each size change required adding disks.

This separation of the flex volume from the aggregate suggests to me the possibility of a multiple-root capable WAFL. It can explain why no contiguous space is required on the aggregate. This eases the space management, and allows for fast and easy “cloning” of volumes.

I believe that the new “clone” method is based on the WAFL built-in snapshot capabilities. Although WAFL snapshots are supposed to be space-conservative, they require guaranteed space on the aggregate prior to committing the clone itself. If the aggregate is too crowded, they will fail with the error message “not enough space”. If there is enough for snapshots, but not enough to guarantee a full clone, you’ll get a message saying “space not guaranteed”.

I see the flex volumes as some combination between filesystem (WAFL) and LVM, living together on the same level.

LUNs on NetApp: iSCSI and/or Fibre LUNs are actually managed as a single (per-LUN) large file contained within a volume. This file has special permissions (I was not able to copy it or modify it while it was online and I had root permissions. However, I am rather new to NetApp technology), and it is being exported as a disk outside. Much like an ISO image (which is a large file containing a whole filesystem layout) these files contain a whole disk layout, including partition tables, LVM headers, etc – just like a real disk.

Thinking about it, it’s neither impossible nor very surprising. A disk is no more than a container of data, of blocks, and if you can utilize the required communication protocol used for accessing it and managing its blocks (aka, the transport layer on which filesystem can access the block data), you can, with just a little translation interface, set up a virtual disk which will behave just like any regular disk.

This brings us to the advantages of NetApp’s WAFL – the ability to minimize I/O while maintaining a set of snapshots for the system – a list of per-block modification history. It means you can “snapshot” your LUN, being physically no more than a file on a WAFL-based volume, and you can go back with your data to a previous date – an hour, a day, a week. Time travel for your data.

There are, unfortunately, some major side effects. If you’ve read the WAFL description from NetApp, my summary will be inaccurate at best. If you haven’t, it will be enough, but still you are most encouraged to read it. The idea is that this filesystem is made out of multiple layers of pointers, and of blocks. A pointer can point to more than one block. When you commit a snapshot, you do not change the blocks and you do not move data; you just keep the current set of pointers. When there is any change in the data (meaning a block is changed), the pointer points to the alternate block instead of the previous (historical) block, but keeps reference of the older block’s location. This way, only modified blocks are actually recreated, while any unmodified data remains on the same spot on the physical disk. An additional claim of NetApp is that their WAFL is optimized for the raid4 and raid_dp they use, and utilizes it in a smart manner.

The problem with WAFL, as can be easily seen, is fragmentation. For CIFS and NFS, it does not cause much of a problem, as the system is very capable of read-ahead just to solve this issue. However, a LUN (which is supposed to act as a contiguous layout, just like any hard drive or raid array in the world, and on which various filesystem-related operations occur) gets fragmented.

Unlike CIFS or NFS, LUN read-ahead is harder to predict, as the client tries to do just the same. Unlike real disks, NetApp LUNs do not behave, performance-wise, like the hard-drive layout any DB or FS has learned to expect and was best optimized for. For example, a DB with lots of small changes would try to commit them in large write operations at regular intervals, and would strive to place them as close to each other, as contiguously, as possible. On a NetApp LUN this will cause fragmentation, and will result in lower write (and later read) performance.

That’s all for today.