ZFS clone script

ZFS has some magical features, comparable to NetApp’s WAFL capabilities. One of the less-used on is the ZFS send/receive, which can be utilised as an engine below something much like NetApp’s SnapMirror or SnapVault.

The idea, if you are not familiar with NetApp’s products, is to take a snapshot of a dataset on the source, and clone it to a remote storage. Then, take another snapshot, and clone only the delta between both snapshots, and so on. This allows for cloning block-level changes only, which reduces clone payload and the time required to clone it.

Copy and save this file as clone_zfs_snapshots.sh. Give it execution permissions.

#!/bin/bash
# This script will clone ZFS snapshots incrementally over SSH to a target server
# Snapshot name structure: [email protected]${TGT_HASH}_INT ; where INT is an increment number
# Written by Etzion. Feel free to use. See more stuff in my blog at https://run.tournament.org.il
# Arguments:
# $1: ZFS filesystem name
# $2: (target ZFS system):(target ZFS filesystem)

IAM=$0
ZFS=/sbin/zfs
LOCKDIR=/dev/shm
LOCAL_SNAPS_TO_LEAVE=3
RESUME_LIMIT=3

### FUNCTIONS ###

# Sanity and usage
function usage() {
	echo "Usage: $IAM SRC REMOTE_SERVER:ZFS_TARGET (port=SSH_PORT)"
	echo "ZFS_TARGET is the parent of filesystems which will be created with the original source names"
	echo "Example: $IAM share/test backupsrv:backup"
	echo "It will create a filesystem 'test' under the pool 'backup' on 'backupsrv' with clone"
	echo "of the current share/test ZFS filesystem"
	echo "This script is (on purpose) not a recursive script"
	echo "For the script to work correctly, it *must* have SSH key exchanged from source to target"
	exit 0
}

function abort() {
	# exit errorously with a message
	echo "[email protected]"
	pkill -P $$
	remove_lock
	exit 1
}

function parse_parameters() {
	# Parses command line parameters
	# called with $*
	SRC_FS=$1
	shift
	TGT=$1
	shift
	for i in $*
	do
		case ${i} in
			port=*)	PORT=${i##*=}
			;;
			hash=*)	HASH=${i##*=}
			;;
		esac
	done
	TGT_SYS=${TGT%%:*}
	TGT_FS=${TGT##*:}
	# Use a short substring of MD5sum of the target name for later unique identification
	SRC_DIRNAME_FS=${SRC_FS#*/}
	if [ -z "$hash" ]
	then
		TGT_FULLHASH="`echo $TGT_FS/${SRC_DIRNAME_FS} | md5sum -`"
		TGT_HASH=${TGT_FULLHASH:1:7}
	else
		TGT_HASH=${hash}
	fi

}

function sanity() {
	# Verify we have all details
	[ -z "$SRC_FS" ] && usage
	[ -z "$TGT_FS" ] && usage
	[ -z "$TGT_SYS" ] && usage
	$ZFS list -H -o name $SRC_FS > /dev/null 2>&1 || abort "Source filesystem $SRC_FS does not exist"
	# check_target_fs || abort "Target ZFS filesystem $TGT_FS on $TGT_SYS does not exist, or not imported"
}

function remove_lock() {
	# Removes the lock file
	\rm -f ${LOCKDIR}/$SRC_LOCK
}

function construct_ssh_cmd() {
	# Constract the remote SSH command
	# Here is a good place to put atomic parameters used for the SSH
	[ -z "${PORT}" ] && PORT=22
	SSH="ssh -p $PORT $TGT_SYS -o ConnectTimeout=3"
	CONTROL_SSH="$SSH -f"
}

function get_last_remote_snapshots() {
	# Gets the last snapshot name on a remote system, to match it to our snapshots
	remoteSnapTmpObj=`$SSH "$ZFS list -H -t snapshot -r -o name ${TGT_FS}/${SRC_DIRNAME_FS}" | grep ${SRC_DIRNAME_FS}@ | grep ${TGT_HASH}`
	# Create a list of all snapshot indexes. Empty means its the first one
	remoteSnaps=""
	for snapIter in ${remoteSnapTmpObj}
	do
	  remoteSnaps="$remoteSnaps ${snapIter##*@${TGT_HASH}_}"
	done
}

function check_if_remote_snapshot_exists() {
	# Argument: $1 ->; Name of snapshot
	# Checks if this snapshot exists on remote node
	$SSH "$ZFS list -H -t snapshot -r -o name ${TGT_FS}/${SRC_DIRNAME_FS}@${TGT_HASH}_${newLocalIndex}"
	return $?
}

function get_last_local_snapshots() {
	# This function will return an array of local existing snapshots using the existing TGT_HASH
    localSnapTmpObj=`$ZFS list -H -t snapshot -r -o name $SRC_FS | grep [email protected] | grep $TGT_HASH `
    # Convert into a list and remove the HASH and everything before it. We should have clear list of indexes
    localSnapList=""
    for snapIter in ${localSnapTmpObj}
    do
    	localSnapList="$localSnapList ${snapIter##*@${TGT_HASH}_}"
    done
    # Convert object to array
    localSnapList=( $localSnapList )
    # Get the last object
    let localSnapArrayObj=${#localSnapList[@]}-1
}

function delete_snapshot() {
	# This function will delete a snapshot
	# arguments: $1 -> snapshot name
	[ -z "$1" ] && abort "Cleanup snapshot got no arguments"
	$ZFS destroy $1
	#$ZFS destroy ${SRC_FS}@${TGT_HASH}_${newLocalIndex}
}

function find_matching_snapshot() {
	# This function will attempt to find a matching snapshot as a replication baseline
	# Gets the latest local snapshot index
	localRecentIndex=${localSnapList[$localSnapArrayObj]}
    # Gets the latest mutual snapshot index
    while [ $localSnapArrayObj -ge 0 ]
    do
    	# Check if the current counter already exists
    	if echo "$remoteSnaps" | grep -w ${localSnapList[$localSnapArrayObj]} > /dev/null 2>&1
    	then
    		# We know the mutual index.
    		commonIndex=${localSnapList[$localSnapArrayObj]}
    		return 0
    	fi
    	let localSnapArrayObj--
    done
    # If we've reached here - there is no mutual index!
    abort "There is no mutual snapshot index, you will have to resync"
}

function cleanup_snapshots() {
	# Creates a list of snapshots to delete and then calls delete_snapshot function
	# We are using the most recent common index, $localSnapArrayObj as the latest reference for deletion
	let deleteArrayObj=$localSnapArrayObj-${LOCAL_SNAPS_TO_LEAVE}
	snapsToDelete=""
	# Construct a list of snapshots to delete, and delete it in reverse order
	while [ $deleteArrayObj -ge 0 ]
	do
		# Construct snapshot name
		snapsToDelete="$snapsToDelete ${SRC_FS}@${TGT_HASH}_${localSnapList[$deleteArrayObj]}"
		let deleteArrayObj--
	done
	snapsToDelete=( $snapsToDelete )

	snapDelete=0
	
	while [ $snapDelete -lt ${#snapsToDelete[@]} ]
	do
		# Delete snapshot
		delete_snapshot ${snapsToDelete[$snapDelete]}
		let snapDelete++
	done
}

function initialize() {
	# This is a unique case where we initialize the first sync
	# We will call this procedure when $remoteSnaps is empty (meaning that there was no snapshot whatsoever)
	# We have to verify that the target has no existing old snapshots here
	# is it empty?
	echo "Going to perform an initialization replication. It might wipe the target $TGT_FS completely"
	echo "Press Enter to proceed, or Ctrl+C to abort"
	read "abc"
	### Decided to remove this check
	### [ -n "$LOCSNAP_LIST" ] && abort "No target snapshots while local history snapshots exists. Clean up history and try again"
	RECEIVE_FLAGS="-sFdvu"
	newLocalIndex=1
	# NEW_LOC_INDEX=1
	create_local_snapshot $newLocalIndex
	open_remote_socket
	sleep 1
	$ZFS send -ce ${SRC_FS}@${TGT_HASH}_${newLocalIndex} | nc $TGT_SYS $NC_PORT 2>&1
	if [ "$?" -ne "0" ]
	then
		# Do no cleanup current snapshot
		# delete_snapshot ${SRC_FS}@${TGT_HASH}_${newLocalIndex}
		abort "Failed to send initial snapshot to target system"
	fi
	sleep 1
	# Set target to RO
	$SSH $ZFS set readonly=on $TGT_FS
	[ "$?" -ne "0" ] && abort "Failed to set remote filesystem $TGT_FS to read-only" # No need to remove local snapshot
}

function create_local_snapshot() {
	# Creates snapshot on local storage
	# uses argument $1
	[ -z "$1" ] && abort "Failed to get new snapshot index"
	$ZFS snapshot ${SRC_FS}@${TGT_HASH}_${1}
	[ "$?" -ne "0" ] && abort "Failed to create local snapshot. Check error message"
}

function open_remote_socket() {
	# Starts remote socket via SSH (as the control operation)
	# port is 3000 + three-digit random number
	let NC_PORT=3000+$RANDOM%1000
	$CONTROL_SSH "nc -l -i 90 $NC_PORT | $ZFS receive ${RECEIVE_FLAGS} $TGT_FS > /tmp/output 2>&1 ; sync"
	#$CONTROL_SSH "socat tcp4-listen:${NC_PORT} - | $ZFS receive ${RECEIVE_FLAGS} $TGT_FS > /tmp/output 2>&1 ; sync"
	#zfs send -R [email protected] | zfs receive -Fdvu zpnew
}

function send_zfs() {
	# Do the heavy lifting of opening remote socket and starting ZFS send/receive
	open_remote_socket
	sleep 1
	$ZFS send -ce -I ${SRC_FS}@${TGT_HASH}_${commonIndex} ${SRC_FS}@${TGT_HASH}_${newLocalIndex} | nc -i 90 $TGT_SYS $NC_PORT 
	#$ZFS send -ce -I ${SRC_FS}@${TGT_HASH}_${commonIndex} ${SRC_FS}@${TGT_HASH}_${newLocalIndex} | socat tcp4-connect:${TGT_SYS}:${NC_PORT} -
	sleep 20

}

function increment() {
	# Create a new snapshot with the index $localRecentIndex+1, and replicate it to the remote system
	# Baseline is the most recent common snapshot index $commonIndex
	RECEIVE_FLAGS="-Fsdvu" # With an 'F' flag maybe?
	# Handle the case of latest snapshot in DR is newer than current latest snapshot, due to mistaken deletion
	remoteSnaps=( $remoteSnaps )
	let remoteIndex=${#remoteSnaps[@]} # Get last snapshot on DR
	if [ ${localRecentIndex} -lt ${remoteIndex} ]
	then
		let newLocalIndex=${remoteIndex}+1
	else
		let newLocalIndex=localRecentIndex+1
	fi
	create_local_snapshot $newLocalIndex

	send_zfs

	# if [ "$?" -ne "0" ]
	# then

		# Cleanup current snapshot
		#delete_snapshot ${SRC_FS}@${TGT_HASH}_${newLocalIndex}
		#abort "Failed to send incremental snapshot to target system"
	# fi
	if ! verify_correctness
	then

		if ! loop_resume # If we can
		then
			# We either could not resume operation or failed to run with the required amount of iterations
			# For now we abort. 
			echo "Deleting local snapshot"
			delete_snapshot ${SRC_FS}@${TGT_HASH}_${newLocalIndex}
			abort "Remote snapshot should have the index of the latest snapshot, but it is not. The current remote snapshot index is ${commonIndex}"
		fi
	fi
}

function loop_resume() {
	# Attempts to loop over resuming until limit attempt has been reached
	REMOTE_TOKEN=$($SSH "$ZFS get -Ho value receive_resume_token ${TGT_FS}/${SRC_DIRNAME_FS}")
	if [ "$REMOTE_TOKEN" == "-" ]
	then
		return 1
	fi
	# We have a valid resume token. We will retry
	COUNT=1
	while [ "$COUNT" -le "$RESUME_LIMIT" ]
	do
		# For ease of handline - for each iteration, we will request the token again
		echo "Attempting resume operation" 
		REMOTE_TOKEN=$($SSH "$ZFS get -Ho value receive_resume_token ${TGT_FS}/${SRC_DIRNAME_FS}")
		let COUNT++
		open_remote_socket
		$ZFS send -e -t $REMOTE_TOKEN | nc -i 90 $TGT_SYS $NC_PORT
		#$ZFS send -e -t $REMOTE_TOKEN | socat tcp4-connect:${TGT_SYS}:${NC_PORT} -
		sleep 20
		if verify_correctness
		then
			echo "Done"
			return 0
		fi
	done
	# If we've reached here, we have failed to run the required iterations. Lets just verify again
	return 1
}

function verify_correctness() {
	# Check remote index, and verify it is correct with the current, latest snapshot

    if check_if_remote_snapshot_exists
    then
    	echo "Replication Successful"
    	return 0
    else
    	echo "Replication failed"
    	return 1
    fi
}

### MAIN ###
[ `whoami` != "root" ] && abort "This script has to be called by the root user"
[ -z "$1" ] && usage
parse_parameters $*
SRC_LOCK=`echo $SRC_FS | tr / _`
if [ -f ${LOCKDIR}/$SRC_LOCK ] 
then
	echo "Already locked. If should not be the case - remove ${LOCKDIR}/$SRC_LOCK"
	exit 1
fi
sanity
touch ${LOCKDIR}/$SRC_LOCK
construct_ssh_cmd
get_last_remote_snapshots # Have a string list of remoteSnaps
# If we dont have remote snapshot it should be initialization
if [ -z "$remoteSnaps" ]
then
	initialize
	echo "completed initialization. Done"
	remove_lock
	exit 0
fi

# We can get here only if it is not initialization
get_last_local_snapshots # Have a list (array) of localSnaps
find_matching_snapshot # Get the latest local index and the latest common index available
increment # Creates a new snapshot and sends/receives it
cleanup_snapshots # Cleans up old local snapshots
pkill -P $$
remove_lock
echo "Done"

A manual initial run should be called manually. If you expect a very long initial sync, you should run it in tmux to screen, to avoid failing in the middle.

To run the command, run it like this:

./clone_zfs_snapshots.sh share/my-data backuphost:share

This will create under the pool ‘share’ in the host ‘backuphost’ a filesystem matching the source (in this case: share/my-data) and set it to read-only. The script will create a snapshot with a unique name based on a shortened hash of the destination, with a counting number suffix, and start cloning the snapshot to the remote host. When called again, it will create a snapshot with the same name, but different index, and clone the delta to the remote host. In case of a disconnection, the clone will retry a few times before failing.

Note that the receiving side does not remove snapshots, so handling (too) old snapshots on the backup host remains up to you.

Tags: , , , , , ,

Leave a Reply