Posts Tagged ‘snapshots’

ZFS clone script

Sunday, March 28th, 2021

ZFS has some magical features, comparable to NetApp’s WAFL capabilities. One of the less-used ones is ZFS send/receive, which can be utilised as the engine behind something much like NetApp’s SnapMirror or SnapVault.

The idea, if you are not familiar with NetApp’s products, is to take a snapshot of a dataset on the source, and clone it to a remote storage. Then, take another snapshot, and clone only the delta between both snapshots, and so on. This allows for cloning block-level changes only, which reduces clone payload and the time required to clone it.
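Done by hand, the cycle described above comes down to a handful of commands. The following is only a minimal dry-run sketch: it prints the commands instead of running them, and the dataset, snapshot and host names (share/test, snap1, snap2, backupsrv, backup) are placeholders of mine rather than the naming used by the script further down.

```shell
#!/bin/bash
# Dry-run sketch: print the commands for a full and an incremental replication.
# All names used here are placeholders, nothing is executed against ZFS.

print_full_send() {
	# $1: source dataset, $2: snapshot name, $3: target host, $4: target parent dataset
	echo "zfs snapshot $1@$2"
	echo "zfs send $1@$2 | ssh $3 zfs receive -Fdu $4"
}

print_incremental_send() {
	# $1: source dataset, $2: previous snapshot, $3: new snapshot,
	# $4: target host, $5: target parent dataset
	echo "zfs snapshot $1@$3"
	# -i sends only the delta between the two snapshots
	echo "zfs send -i $1@$2 $1@$3 | ssh $4 zfs receive -Fdu $5"
}

print_full_send share/test snap1 backupsrv backup
print_incremental_send share/test snap1 snap2 backupsrv backup
```

The first pair of commands seeds the target with the whole dataset; every later pair ships only the blocks which changed since the previous snapshot.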

Copy the script below and save it as clone_zfs_snapshots.sh. Give it execution permissions.

#!/bin/bash
# This script will clone ZFS snapshots incrementally over SSH to a target server
# Snapshot name structure: ${SRC_FS}@${TGT_HASH}_INT ; where INT is an increment number
# Written by Etzion. Feel free to use. See more stuff in my blog at https://run.tournament.org.il
# Arguments:
# $1: ZFS filesystem name
# $2: (target ZFS system):(target ZFS filesystem)
 
IAM=$0
ZFS=/sbin/zfs
LOCKDIR=/dev/shm
LOCAL_SNAPS_TO_LEAVE=3
RESUME_LIMIT=3
 
### FUNCTIONS ###
 
# Sanity and usage
function usage() {
	echo "Usage: $IAM SRC REMOTE_SERVER:ZFS_TARGET (port=SSH_PORT)"
	echo "ZFS_TARGET is the parent of filesystems which will be created with the original source names"
	echo "Example: $IAM share/test backupsrv:backup"
	echo "It will create a filesystem 'test' under the pool 'backup' on 'backupsrv' with clone"
	echo "of the current share/test ZFS filesystem"
	echo "This script is (on purpose) not a recursive script"
	echo "For the script to work correctly, it *must* have SSH key exchanged from source to target"
	exit 0
}
 
function abort() {
	# Exit with an error and a message
	echo "$@"
	pkill -P $$
	remove_lock
	exit 1
}
 
function parse_parameters() {
	# Parses command line parameters
	# called with $*
	SRC_FS=$1
	shift
	TGT=$1
	shift
	for i in $*
	do
		case ${i} in
			port=*)	PORT=${i##*=}
			;;
			hash=*)	HASH=${i##*=}
			;;
		esac
	done
	TGT_SYS=${TGT%%:*}
	TGT_FS=${TGT##*:}
	# Use a short substring of MD5sum of the target name for later unique identification
	SRC_DIRNAME_FS=${SRC_FS#*/}
	# The hash=... argument is stored in HASH (upper case) by the loop above
	if [ -z "$HASH" ]
	then
		TGT_FULLHASH="`echo $TGT_FS/${SRC_DIRNAME_FS} | md5sum -`"
		TGT_HASH=${TGT_FULLHASH:1:7}
	else
		TGT_HASH=${HASH}
	fi
 
}
 
function sanity() {
	# Verify we have all details
	[ -z "$SRC_FS" ] && usage
	[ -z "$TGT_FS" ] && usage
	[ -z "$TGT_SYS" ] && usage
	$ZFS list -H -o name $SRC_FS > /dev/null 2>&1 || abort "Source filesystem $SRC_FS does not exist"
	# check_target_fs || abort "Target ZFS filesystem $TGT_FS on $TGT_SYS does not exist, or not imported"
}
 
function remove_lock() {
	# Removes the lock file
	\rm -f ${LOCKDIR}/$SRC_LOCK
}
 
function construct_ssh_cmd() {
	# Construct the remote SSH command
	# Here is a good place to put atomic parameters used for the SSH
	[ -z "${PORT}" ] && PORT=22
	SSH="ssh -p $PORT $TGT_SYS -o ConnectTimeout=3"
	CONTROL_SSH="$SSH -f"
}
 
function get_last_remote_snapshots() {
	# Gets the last snapshot name on a remote system, to match it to our snapshots
	remoteSnapTmpObj=`$SSH "$ZFS list -H -t snapshot -r -o name ${TGT_FS}/${SRC_DIRNAME_FS}" | grep ${SRC_DIRNAME_FS}@ | grep ${TGT_HASH}`
	# Create a list of all snapshot indexes. Empty means its the first one
	remoteSnaps=""
	for snapIter in ${remoteSnapTmpObj}
	do
	  remoteSnaps="$remoteSnaps ${snapIter##*@${TGT_HASH}_}"
	done
}
 
function check_if_remote_snapshot_exists() {
	# Argument: $1 -> Name of snapshot
	# Checks if this snapshot exists on remote node
	$SSH "$ZFS list -H -t snapshot -r -o name ${TGT_FS}/${SRC_DIRNAME_FS}@${TGT_HASH}_${newLocalIndex}"
	return $?
}
 
function get_last_local_snapshots() {
	# This function will return an array of local existing snapshots using the existing TGT_HASH
    localSnapTmpObj=`$ZFS list -H -t snapshot -r -o name $SRC_FS | grep $SRC_FS@ | grep $TGT_HASH `
    # Convert into a list and remove the HASH and everything before it. We should have clear list of indexes
    localSnapList=""
    for snapIter in ${localSnapTmpObj}
    do
    	localSnapList="$localSnapList ${snapIter##*@${TGT_HASH}_}"
    done
    # Convert object to array
    localSnapList=( $localSnapList )
    # Get the last object
    let localSnapArrayObj=${#localSnapList[@]}-1
}
 
function delete_snapshot() {
	# This function will delete a snapshot
	# arguments: $1 -> snapshot name
	[ -z "$1" ] && abort "Cleanup snapshot got no arguments"
	$ZFS destroy $1
	#$ZFS destroy ${SRC_FS}@${TGT_HASH}_${newLocalIndex}
}
 
function find_matching_snapshot() {
	# This function will attempt to find a matching snapshot as a replication baseline
	# Gets the latest local snapshot index
	localRecentIndex=${localSnapList[$localSnapArrayObj]}
    # Gets the latest mutual snapshot index
    while [ $localSnapArrayObj -ge 0 ]
    do
    	# Check if the current counter already exists
    	if echo "$remoteSnaps" | grep -w ${localSnapList[$localSnapArrayObj]} > /dev/null 2>&1
    	then
    		# We know the mutual index.
    		commonIndex=${localSnapList[$localSnapArrayObj]}
    		return 0
    	fi
    	let localSnapArrayObj--
    done
    # If we've reached here - there is no mutual index!
    abort "There is no mutual snapshot index, you will have to resync"
}
 
function cleanup_snapshots() {
	# Creates a list of snapshots to delete and then calls delete_snapshot function
	# We are using the most recent common index, $localSnapArrayObj as the latest reference for deletion
	let deleteArrayObj=$localSnapArrayObj-${LOCAL_SNAPS_TO_LEAVE}
	snapsToDelete=""
	# Construct a list of snapshots to delete, and delete it in reverse order
	while [ $deleteArrayObj -ge 0 ]
	do
		# Construct snapshot name
		snapsToDelete="$snapsToDelete ${SRC_FS}@${TGT_HASH}_${localSnapList[$deleteArrayObj]}"
		let deleteArrayObj--
	done
	snapsToDelete=( $snapsToDelete )
 
	snapDelete=0
 
	while [ $snapDelete -lt ${#snapsToDelete[@]} ]
	do
		# Delete snapshot
		delete_snapshot ${snapsToDelete[$snapDelete]}
		let snapDelete++
	done
}
 
function initialize() {
	# This is a unique case where we initialize the first sync
	# We will call this procedure when $remoteSnaps is empty (meaning that there was no snapshot whatsoever)
	# We have to verify that the target has no existing old snapshots here
	# is it empty?
	echo "Going to perform an initialization replication. It might wipe the target $TGT_FS completely"
	echo "Press Enter to proceed, or Ctrl+C to abort"
	read "abc"
	### Decided to remove this check
	### [ -n "$LOCSNAP_LIST" ] && abort "No target snapshots while local history snapshots exists. Clean up history and try again"
	RECEIVE_FLAGS="-sFdvu"
	newLocalIndex=1
	# NEW_LOC_INDEX=1
	create_local_snapshot $newLocalIndex
	open_remote_socket
	sleep 1
	$ZFS send -ce ${SRC_FS}@${TGT_HASH}_${newLocalIndex} | nc $TGT_SYS $NC_PORT 2>&1
	if [ "$?" -ne "0" ]
	then
		# Do no cleanup current snapshot
		# delete_snapshot ${SRC_FS}@${TGT_HASH}_${newLocalIndex}
		abort "Failed to send initial snapshot to target system"
	fi
	sleep 1
	# Set target to RO
	$SSH $ZFS set readonly=on $TGT_FS
	[ "$?" -ne "0" ] && abort "Failed to set remote filesystem $TGT_FS to read-only" # No need to remove local snapshot
}
 
function create_local_snapshot() {
	# Creates snapshot on local storage
	# uses argument $1
	[ -z "$1" ] && abort "Failed to get new snapshot index"
	$ZFS snapshot ${SRC_FS}@${TGT_HASH}_${1}
	[ "$?" -ne "0" ] && abort "Failed to create local snapshot. Check error message"
}
 
function open_remote_socket() {
	# Starts remote socket via SSH (as the control operation)
	# port is 3000 + three-digit random number
	let NC_PORT=3000+$RANDOM%1000
	$CONTROL_SSH "nc -l -i 90 $NC_PORT | $ZFS receive ${RECEIVE_FLAGS} $TGT_FS > /tmp/output 2>&1 ; sync"
	#$CONTROL_SSH "socat tcp4-listen:${NC_PORT} - | $ZFS receive ${RECEIVE_FLAGS} $TGT_FS > /tmp/output 2>&1 ; sync"
	#zfs send -R [email protected] | zfs receive -Fdvu zpnew
}
 
function send_zfs() {
	# Do the heavy lifting of opening remote socket and starting ZFS send/receive
	open_remote_socket
	sleep 1
	$ZFS send -ce -I ${SRC_FS}@${TGT_HASH}_${commonIndex} ${SRC_FS}@${TGT_HASH}_${newLocalIndex} | nc -i 90 $TGT_SYS $NC_PORT 
	#$ZFS send -ce -I ${SRC_FS}@${TGT_HASH}_${commonIndex} ${SRC_FS}@${TGT_HASH}_${newLocalIndex} | socat tcp4-connect:${TGT_SYS}:${NC_PORT} -
	sleep 20
 
}
 
function increment() {
	# Create a new snapshot with the index $localRecentIndex+1, and replicate it to the remote system
	# Baseline is the most recent common snapshot index $commonIndex
	RECEIVE_FLAGS="-Fsdvu" # With an 'F' flag maybe?
	# Handle the case of latest snapshot in DR is newer than current latest snapshot, due to mistaken deletion
	remoteSnaps=( $remoteSnaps )
	let remoteIndex=${#remoteSnaps[@]} # Get last snapshot on DR
	if [ ${localRecentIndex} -lt ${remoteIndex} ]
	then
		let newLocalIndex=${remoteIndex}+1
	else
		let newLocalIndex=localRecentIndex+1
	fi
	create_local_snapshot $newLocalIndex
 
	send_zfs
 
	# if [ "$?" -ne "0" ]
	# then
 
		# Cleanup current snapshot
		#delete_snapshot ${SRC_FS}@${TGT_HASH}_${newLocalIndex}
		#abort "Failed to send incremental snapshot to target system"
	# fi
	if ! verify_correctness
	then
 
		if ! loop_resume # If we can
		then
			# We either could not resume operation or failed to run with the required amount of iterations
			# For now we abort. 
			echo "Deleting local snapshot"
			delete_snapshot ${SRC_FS}@${TGT_HASH}_${newLocalIndex}
			abort "Remote snapshot should have the index of the latest snapshot, but it is not. The current remote snapshot index is ${commonIndex}"
		fi
	fi
}
 
function loop_resume() {
	# Attempts to loop over resuming until limit attempt has been reached
	REMOTE_TOKEN=$($SSH "$ZFS get -Ho value receive_resume_token ${TGT_FS}/${SRC_DIRNAME_FS}")
	if [ "$REMOTE_TOKEN" == "-" ]
	then
		return 1
	fi
	# We have a valid resume token. We will retry
	COUNT=1
	while [ "$COUNT" -le "$RESUME_LIMIT" ]
	do
		# For ease of handling - for each iteration, we will request the token again
		echo "Attempting resume operation" 
		REMOTE_TOKEN=$($SSH "$ZFS get -Ho value receive_resume_token ${TGT_FS}/${SRC_DIRNAME_FS}")
		let COUNT++
		open_remote_socket
		$ZFS send -e -t $REMOTE_TOKEN | nc -i 90 $TGT_SYS $NC_PORT
		#$ZFS send -e -t $REMOTE_TOKEN | socat tcp4-connect:${TGT_SYS}:${NC_PORT} -
		sleep 20
		if verify_correctness
		then
			echo "Done"
			return 0
		fi
	done
	# If we've reached here, we have failed to run the required iterations. Lets just verify again
	return 1
}
 
function verify_correctness() {
	# Check remote index, and verify it is correct with the current, latest snapshot
 
    if check_if_remote_snapshot_exists
    then
    	echo "Replication Successful"
    	return 0
    else
    	echo "Replication failed"
    	return 1
    fi
}
 
### MAIN ###
[ `whoami` != "root" ] && abort "This script has to be called by the root user"
[ -z "$1" ] && usage
parse_parameters $*
SRC_LOCK=`echo $SRC_FS | tr / _`
if [ -f ${LOCKDIR}/$SRC_LOCK ] 
then
	echo "Already locked. If this should not be the case - remove ${LOCKDIR}/$SRC_LOCK"
	exit 1
fi
sanity
touch ${LOCKDIR}/$SRC_LOCK
construct_ssh_cmd
get_last_remote_snapshots # Have a string list of remoteSnaps
# If we dont have remote snapshot it should be initialization
if [ -z "$remoteSnaps" ]
then
	initialize
	echo "completed initialization. Done"
	remove_lock
	exit 0
fi
 
# We can get here only if it is not initialization
get_last_local_snapshots # Have a list (array) of localSnaps
find_matching_snapshot # Get the latest local index and the latest common index available
increment # Creates a new snapshot and sends/receives it
cleanup_snapshots # Cleans up old local snapshots
pkill -P $$
remove_lock
echo "Done"

The initial run should be started manually. If you expect a very long initial sync, run it inside tmux or screen, so that a dropped terminal session does not kill it in the middle.

To run the command, run it like this:

./clone_zfs_snapshots.sh share/my-data backuphost:share

This will create under the pool ‘share’ in the host ‘backuphost’ a filesystem matching the source (in this case: share/my-data) and set it to read-only. The script will create a snapshot with a unique name based on a shortened hash of the destination, with a counting number suffix, and start cloning the snapshot to the remote host. When called again, it will create a snapshot with the same name, but different index, and clone the delta to the remote host. In case of a disconnection, the clone will retry a few times before failing.
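To make the naming scheme concrete, here is a small sketch which repeats the hash logic from parse_parameters in isolation. The function name snapshot_name is mine, and the sketch assumes GNU md5sum is available, as the script itself does.

```shell
#!/bin/bash
# Sketch of the naming scheme: a 7-character slice of the MD5 of
# "<target fs>/<source name without its pool>", followed by an index.

snapshot_name() {
	# $1: source filesystem, $2: target filesystem, $3: index
	local src_fs=$1 tgt_fs=$2 index=$3
	local src_dirname=${src_fs#*/}      # share/my-data -> my-data
	local fullhash
	fullhash=$(echo "$tgt_fs/$src_dirname" | md5sum -)
	# Same slice as the script: 7 characters, starting at offset 1
	echo "${src_fs}@${fullhash:1:7}_${index}"
}

snapshot_name share/my-data share 1
snapshot_name share/my-data share 2
```

Both lines share the same hash part and differ only in the trailing index, which is what lets the script find the latest snapshot common to both sides.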

Note that the receiving side does not remove snapshots, so handling (too) old snapshots on the backup host remains up to you.
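One possible way of handling this on the backup host is sketched below. It only decides which snapshots to prune: it reads names from stdin, oldest first, and prints all but the newest few. The function name, the retention count and the sample snapshot names are assumptions of mine. Keep enough recent snapshots on the target, or you might remove the common baseline the next increment relies on.

```shell
#!/bin/bash
# Sketch: decide which snapshots to prune on the backup host.
# Reads snapshot names from stdin, oldest first, and prints all but the last KEEP.

snapshots_to_prune() {
	# $1: how many of the newest snapshots to keep
	local keep=$1
	awk -v keep="$keep" '
		{ line[NR] = $0 }
		END { for (i = 1; i <= NR - keep; i++) print line[i] }'
}

# On the backup host the list could come from 'zfs list', sorted by creation time:
#   zfs list -H -t snapshot -o name -s creation -d 1 share/my-data \
#     | snapshots_to_prune 10 | xargs -r -n 1 zfs destroy
# Here it runs on sample names only, so nothing is destroyed.
printf '%s\n' share/my-data@abc1234_1 share/my-data@abc1234_2 share/my-data@abc1234_3 \
	| snapshots_to_prune 2
```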

Attach multiple Oracle ASM snapshots to the same host

Thursday, September 12th, 2013

The goal – connecting multiple Oracle ASM snapshots (same source LUNs, of course) to the same machine. The next process will demonstrate how to do it.

Problem: ASM disks use a disk label called ASMLib to maintain access even when the logical disk path might change (like adding a LUN with a lower ID and rebooting the server). This solves a major problem which was experienced with RAW devices, when order changed, and the ‘wrong’ disks took the place of others. ASM labels are a vital part in managing ASM disks and ASM DiskGroups. Also – the ASM DiskGroup name should be unique. You cannot have multiple DiskGroups with the same name.

Limitations – you cannot connect the snapshot LUNs to the same server which has access to the source LUNs.

Process:

  1. Take a snapshot of the source LUN. If the ASM DiskGroup spans across several LUNs, you must create a consistency group (each storage device has its own lingo for the task).
  2. Map the snapshots to the target server (EMC – prepare EMC Snapshot Mount Points (SMP) in advance. Other storage devices – depending)
  3. Perform partprobe on all target servers.
  4. Run ‘service oracleasm scandisks‘ to scan for the new ASM disk labels. We will need to change them now, so that the additional snapshot will not use duplicate ASM labels.
  5. For each of the new ASM disks, run ‘service oracleasm force-renamedisk SRC_NAME TGT_NAME‘. You will want to rename the source name (SRC_NAME) to a unique target name, with some correlation to the snapshot name/purpose. This is the reasonable way of making some sense of a possibly very messy setup.
  6. As the Oracle user, with the correct PATH variables ($ORACLE_HOME should point to the CRS_HOME) and the right ORACLE_SID (for example – +ASM1), run: ‘renamedg phase=both dgname=SRC_DG_NAME newdgname=NEW_DG_NAME verbose=true‘. The value ‘SRC_DG_NAME’ represents the original (on the source) DiskGroup name, and NEW_DG_NAME represents the new name. Much like when renaming the disks, the new name should have some relationship with the snapshot name or purpose, so you can find your hands and legs in this mess (again – imagine having six snapshots, each of a DiskGroup with four LUNs. Now – this is a mess).
  7. You can now mount the DiskGroup (named NEW_DG_NAME in my example) on both nodes
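
Steps 3 to 6 can be sketched as a dry run which prints the commands for one snapshot, in the syntax used above. The suffix SNAP1, the DiskGroup DATA and the disk labels DATA01/DATA02 are made-up examples, and deriving the new names by appending a suffix is just one possible naming convention.

```shell
#!/bin/bash
# Dry-run sketch of steps 3-6: prints the commands instead of running them.
# Suffix, DiskGroup name and disk labels are made-up examples.

print_rename_plan() {
	# $1: suffix identifying the snapshot, $2: source DiskGroup name,
	# remaining arguments: ASM disk labels found after scandisks
	local suffix=$1 dg=$2
	shift 2
	echo "partprobe"
	echo "service oracleasm scandisks"
	local label
	for label in "$@"
	do
		echo "service oracleasm force-renamedisk ${label} ${label}_${suffix}"
	done
	echo "renamedg phase=both dgname=${dg} newdgname=${dg}_${suffix} verbose=true"
}

print_rename_plan SNAP1 DATA DATA01 DATA02
```

Review the printed plan first, and run the renamedg line as the Oracle user, as described in step 6.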

Assumptions:

  1. Oracle GI is up and running all through this process
  2. I tested it with Oracle 11.2.0.3. Other versions of 11.2.0.x might work, I have no clue about previous 11.1.x versions, or any earlier versions.
  3. It was tested on Linux. My primary work platform. It was, to be exact, on RHEL 6.4, but it should work just the same on any RHEL-like platform. I believe it will work on other Linux platforms. I have no clue about running it on any other Unix/Windows platform.
  4. The DiskGroup should not be mounted (no reason for it to be mounted right on discovery). Do not manually mount it prior to performing this procedure.

Good luck, and post a comment if you find this explanation either unclear, or if you encounter any problem.

 

 

 

Citrix XenServer 5.0 cannot cooperate with NetApp SnapMirror

Tuesday, September 8th, 2009

It has been a long while, I know. I was busy with life, work and everything around it. Not much worth mentioning.

This, however, is something else.

I have discovered an issue with Citrix XenServer 5.0 (probably the case with 5.5 as well, but I have other issues with that release) when using NetApp through the NetApp API SR: any snapshot not generated by XenServer is deleted as soon as any snapshot-related action is performed on that volume. If I manually create a snapshot called “1111” (short and easy to recognize, especially among all the UUID-based volume, LUN and snapshot names XenServer uses…), then the next time anyone takes a snapshot of a machine which has a disk (VDI) on this specific volume, my snapshot “1111” is removed from that volume. The message seen in /var/log/SMlog looks like this:

Removing unused snap (1111)

Under normal operation this does not matter much, as non-XenServer snapshots have little value. With NetApp SnapMirror, however, things work a bit differently.

It appears that the SnapMirror system takes snapshots with predefined names (non-XenServer UUID type, luckily for us all). These snapshots include all the changes performed since the last SnapMirror snapshot, and are used for replication. Unfortunately, XenServer deletes them. No SnapMirror snapshots means, quite obviously, no SnapMirror…

We did not detect this problem immediately, and I should take the blame for that. I should have defined a set of simple trial-and-error tests, as described above, instead of battling with a system I did not quite follow at the time: NetApp SnapMirror. Now I do, however, and I have an insight which can make your life better if you had issues with SnapMirror and XenServer and did not know how to make them work together. This solution cannot be an official one, due to its nature, which you will understand shortly. It is a personal patch, based on the hard fact that SnapMirror uses a predefined name for its snapshots. This name, in my case, is the name of the DR storage device. You must figure out what name is being used as part of the snapshot naming convention on your own site. Search for my ‘storagedr’ phrase, and replace it with yours.

This is the diff file for /opt/xensource/sm/NETAPPSR.py . Of course – back up your original file. Also – this is not an official patch. It was tested to function correctly on XenServer 5.0, and it will not work on XenServer 5.5 (since NETAPPSR.py is different). Last warning – it might break on the next update or upgrade of your XenServer environment, and if that happens, you had better monitor your SnapMirror status closely.

400,403c400,404
<                     util.SMlog("Removing unused snap (%s)" % val)
<                     out = netapplib.fvol_snapdelete_wrapper(self.sv, val, volname)
<                     if not na_test_result(out):
<                         pass
---
>                     if 'storagedr' not in val:
>                     	util.SMlog("Removing unused snap (%s)" % val)
>                     	out = netapplib.fvol_snapdelete_wrapper(self.sv, val, volname)
>                     	if not na_test_result(out):
>                         	pass

Hope it helps!

MySQL permissions for LVM Snapshots

Thursday, October 23rd, 2008

Taking LVM snapshots as a means of backing up MySQL is rather simple, as described here. However, if you are into security, you would strive to grant the MySQL user only the minimal permissions for the action. Per the MySQL documentation, the required privilege is “RELOAD”. That should be enough, granted on *.*, of course.
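
As a small sketch, the grant could be generated like this. The account name 'lvmbackup'@'localhost' is a placeholder of mine, and the sketch assumes the account already exists.

```shell
#!/bin/bash
# Sketch: print the GRANT statement for a backup-only MySQL account.
# The account name used below is a placeholder.

grant_reload_sql() {
	# $1: user, $2: host
	# RELOAD is a global privilege, so it has to be granted on *.*
	echo "GRANT RELOAD ON *.* TO '$1'@'$2';"
}

grant_reload_sql lvmbackup localhost
# To apply it, pipe the statement into the client as an administrative user:
#   grant_reload_sql lvmbackup localhost | mysql -u root -p
```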

Quick provisioning of virtual machines

Friday, February 1st, 2008

When one wants to achieve fast provisioning of virtual machines, several solutions come to mind. The one I prefer uses Linux LVM snapshot capabilities to duplicate one working machine into several.

This can happen, of course, only if the host running VMware-Server is Linux.

LVM snapshots have one vast disadvantage – performance. When a block on the source of the snapshot is being changed for the first time, the original block is being replicated to each and every snapshot COOW space. It means that a creation of a 1GB file on a volume having ten snapshots means a total copy of 10GB of data across your disks. You cannot ignore this performance impact.

LVM2 has support for read/write snapshots. I have come up with a nice way of utilizing this capability to my benefit. An R/W snapshot which is being changed does not replicate its changes to any other snapshot. All changes are considered local to this snapshot, and are being maintained only in its COOW space. So adding a 1GB file to a snapshot has zero impact on the rest of the snapshots or volumes.

The idea is quite simple, and it works like this:

1. Create adequate logical volume with a given size (I used 9GB for my own purposes). The name of the LV in my case will be /dev/VGVM3/centos-base

2. Mount this LV on a directory, and create a VM inside it. In my case, it’s in /vmware/centos-base

3. Install the VM as the baseline for all your future VMs. If you might not want Apache on some of them, don’t install it on the baseline.

4. Install vmware-tools on the baseline.

5. Disable the service “kudzu”

6. Update as required

7. In my case I always use DHCP. You can set it to obtain its IP once from a given location, or whatever you feel like.

8. Shut down the VM.

9. In the VM’s .vmx file add a line like this:

uuid.action = "create"
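
If you prefer to script step 9, the following sketch makes sure the line appears exactly once in a .vmx file. The function name is mine, and the demonstration runs on a throwaway file rather than on a real VM.

```shell
#!/bin/bash
# Sketch: make sure a .vmx file carries the uuid.action line exactly once.

set_uuid_action() {
	# $1: path to the .vmx file
	local vmx=$1
	# Drop any existing uuid.action line, then append the wanted one
	grep -v '^uuid\.action' "$vmx" > "$vmx.tmp"
	echo 'uuid.action = "create"' >> "$vmx.tmp"
	mv "$vmx.tmp" "$vmx"
}

# Demonstration on a throwaway file; calling it twice must not duplicate the line
demo=$(mktemp)
echo 'displayName = "centos-base"' > "$demo"
set_uuid_action "$demo"
set_uuid_action "$demo"
cat "$demo"
rm -f "$demo"
```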

Below are the two scripts I have used to create and destroy VMs. The first creates the snapshot, mounts it and registers it, including new MAC and UUID. The second removes the VM and its snapshot.

create-replica.sh:

#!/bin/sh
# This script will replicate vms from a given (predefined) source to a new system
# Written by Ez-Aton, http://www.tournament.org.il/run
# Arguments: name

# FUNCTIONS BE HERE
test_can_do () {
	# To be able to snapshot, we need a set of things to happen
	if [ -d "$DIR/$TARGET" ] ; then
		echo "Directory already exists. You don't want to do it..."
		exit 1
	fi
	# The LV shows up as a device node, so test for existence (-e), not for a regular file
	if [ -e "$VG/$TARGET" ] ; then
		echo "Target snapshot exists"
		exit 1
	fi
	if [ `vmrun list | grep -c "$DIR/$SRC/$SRC.vmx"` -gt "0" ] ; then
		echo "Source VM is still running. Shut it down before proceeding"
		exit 1
	fi
	if [ `vmware-cmd -l | grep -c "$DIR/$TARGET/$SRC.vmx"` -ne "0" ] ; then
		echo "VM already registered. Unregister first"
		exit 1
	fi
}

do_snapshot () {
	# Take the snapshot
	lvcreate -s -n "$TARGET" -L "$SNAPSIZE" "$VG/$SRC"
	RET=$?
	if [ "$RET" -ne "0" ]; then
		echo "Failed to create snapshot"
		exit 1
	fi
}

mount_snapshot () {
	# This function creates the required directories and mounts the snapshot there
	mkdir "$DIR/$TARGET"
	mount "$VG/$TARGET" "$DIR/$TARGET"
	RET=$?
	if [ "$RET" -ne "0" ]; then
		echo "Failed to mount snapshot"
		exit 1
	fi
}

alter_snap_vmx () {
	# This function will alter the name in the VMX and make it the $TARGET name
	grep -v "displayName" "$DIR/$TARGET/$SRC.vmx" > "$DIR/$TARGET/$TARGET.vmx"
	# The value is written with literal double quotes around it
	echo "displayName = \"$TARGET\"" >> "$DIR/$TARGET/$TARGET.vmx"
	cat "$DIR/$TARGET/$TARGET.vmx" > "$DIR/$TARGET/$SRC.vmx"
	rm "$DIR/$TARGET/$TARGET.vmx"
}

register_vm () {
	# This function will register the VM to VMWARE
	vmware-cmd -s register "$DIR/$TARGET/$SRC.vmx"
}

# MAIN
if [ -z "$1" ]; then
	echo "Arguments: The target name"
	exit 1
fi

# Parameters:
SRC=centos-base         #The name of the source image, and the source dir
PREFIX=centos           #All targets will be created in the name centos-$NAME
DIR=/vmware             #My VMware VMs default dir
SNAPSIZE=6G             #My COOW space
VG=/dev/VGVM3           #The name of the VG
TARGET="$PREFIX-$1"

test_can_do
do_snapshot
mount_snapshot
alter_snap_vmx
register_vm
exit 0

remove-replica.sh:

#!/bin/sh
# This script will remove a snapshot machine
# Written by Ez-Aton, http://www.tournament.org.il/run
# Arguments: machine name

#FUNCTIONS
does_it_exist () {
	# Check if the described VM exists
	if [ `vmware-cmd -l | grep -c "$DIR/$TARGET/$SRC.vmx"` -eq "0" ]; then
		echo "No such VM"
		exit 1
	fi
	if [ ! -e "$VG/$TARGET" ]; then
		echo "There is no matching snapshot volume"
		exit 1
	fi
	# The fifth column of the lvs output is the origin LV of a snapshot
	if [ `lvs "$VG/$TARGET" | awk '{print $5}' | grep -c "$SRC"` -eq "0" ]; then
		echo "This is not a snapshot, or a snapshot of the wrong LV"
		exit 1
	fi
}

ask_a_thousand_times () {
	# This function verifies that the right thing is actually done
	echo "You are about to remove a virtual machine and an LVM. Details:"
	echo "Machine name: $TARGET"
	echo "Logical Volume: $VG/$TARGET"
	echo -n "Are you sure? (y/N): "
	read RES
	if [ "$RES" != "Y" ] && [ "$RES" != "y" ]; then
		echo "Decided not to do it"
		exit 0
	fi
	echo ""
	echo "You have asked to remove this machine"
	echo -n "Again: Are you sure? (y/N): "
	read RES
	if [ "$RES" != "Y" ] && [ "$RES" != "y" ]; then
		echo "Decided not to do it"
		exit 0
	fi
	echo "Removing VM and snapshot"
}

shut_down_vm () {
	# Shut down the VM and unregister it
	vmware-cmd "$DIR/$TARGET/$SRC.vmx" stop hard
	vmware-cmd -s unregister "$DIR/$TARGET/$SRC.vmx"
}

remove_snapshot () {
	# Umount and remove the snapshot
	umount "$DIR/$TARGET"
	RET=$?
	if [ "$RET" -ne "0" ]; then
		echo "Cannot umount $DIR/$TARGET"
		exit 1
	fi
	lvremove -f "$VG/$TARGET"
	RET=$?
	if [ "$RET" -ne "0" ]; then
		echo "Cannot remove snapshot LV"
		exit 1
	fi
}

remove_dir () {
	# Removes the mount point
	rmdir "$DIR/$TARGET"
}

#MAIN
if [ -z "$1" ]; then
	echo "No machine name. Exiting"
	exit 1
fi

#PARAMETERS:
DIR=/vmware               #VMware default VMs location
VG=/dev/VGVM3             #The name of the VG
PREFIX=centos             #Prefix to the name. All these VMs will be called centos-$NAME
TARGET="$PREFIX-$1"
SRC=centos-base           #The name of the baseline image, LVM, etc. All are the same

does_it_exist
ask_a_thousand_times
shut_down_vm
remove_snapshot
remove_dir

exit 0

Pros:

1. Very fast provisioning. It takes almost five seconds, and that’s because my server is somewhat loaded.

2. Dependable: KISS at its marvel.

3. Conservative on space

4. Conservative on I/O load (unlike the traditional use of LVM snapshot, as explained in the beginning of this section).

Cons:

1. Cannot streamline the contents of snapshot into the main image (LVM team will implement it in the future, I think)

2. Cannot take a snapshot of a snapshot (same as above)

3. If the COOW space of any of the snapshots is full (viewable through the command ‘lvs‘) then on boot, the source LV might not become active (confirmed RH4 bug, and this is the system I have used)

4. My script does not edit/alter /etc/fstab (I have decided it to be rather risky, and it was not worth the effort at this time)

5. My script does not check if there is enough available space in the VG. This is not required, as the script fails if the creation of the LV fails
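
Regarding the third item above, a periodic check of the snapshots' fill level can give an early warning. The sketch below is only a filter: it reads "name percent" pairs and prints the names at or above a limit. The 80% limit, the function name and the sample values are assumptions of mine, and so is the exact lvs field name, which differs between LVM2 releases.

```shell
#!/bin/bash
# Sketch: flag snapshots whose copy-on-write space is nearly full.
# Reads "name percent" pairs from stdin and prints names at or above the limit.

snapshots_over_limit() {
	# $1: usage limit in percent
	# Lines without a percentage (such as the origin LV) are skipped
	awk -v limit="$1" 'NF >= 2 && $2 + 0 >= limit { print $1 }'
}

# On the host this could be fed from lvs (the usage field is data_percent in
# current LVM2, older releases call it snap_percent):
#   lvs --noheadings -o lv_name,data_percent VGVM3 | snapshots_over_limit 80
# Sample input, so the sketch runs without LVM:
printf '%s\n' 'centos-web 12.50' 'centos-db 91.07' 'centos-base' \
	| snapshots_over_limit 80
```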

You are most welcome to contribute any further changes done to this script. Please maintain my URL in the script if you decide to use it.

Thanks!