ZFS clone script
ZFS has some magical features, comparable to NetApp’s WAFL capabilities. One of the less-used ones is ZFS send/receive, which can serve as the engine underneath something much like NetApp’s SnapMirror or SnapVault.
The idea, if you are not familiar with NetApp’s products, is to take a snapshot of a dataset on the source and clone it to a remote storage system. Then take another snapshot and clone only the delta between the two snapshots, and so on. Only block-level changes are transferred, which reduces both the clone payload and the time required to clone it.
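To make the cycle concrete, here is a minimal dry-run sketch that only prints the commands for one replication round. The names SRC, HOST, DST and the repl_ snapshot prefix are placeholders I made up for illustration, and the plain ssh pipe is a simplification; it is not how the script below moves the data.

```shell
#!/bin/bash
# Dry-run sketch: prints (does not run) the zfs commands for one replication round.
# SRC, HOST, DST and the "repl_" prefix are hypothetical placeholders.
SRC="share/test"
HOST="backupsrv"
DST="backup/test"

replication_cmds() {
    # $1: index of the previous snapshot (0 means there is no baseline yet)
    local prev="$1"
    local next="$(( prev + 1 ))"
    echo "zfs snapshot ${SRC}@repl_${next}"
    if [ "$prev" -eq 0 ]; then
        # First round: send the whole snapshot
        echo "zfs send ${SRC}@repl_${next} | ssh ${HOST} zfs receive -F ${DST}"
    else
        # Later rounds: send only the delta between the two snapshots
        echo "zfs send -i ${SRC}@repl_${prev} ${SRC}@repl_${next} | ssh ${HOST} zfs receive ${DST}"
    fi
}

replication_cmds 0   # initial full clone
replication_cmds 1   # first incremental clone
```

The only difference between the first round and the following ones is the -i flag with the baseline snapshot, which is what keeps the payload small.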
Copy the script below, save it as clone_zfs_snapshots.sh, and make it executable (chmod +x clone_zfs_snapshots.sh).
#!/bin/bash
# This script will clone ZFS snapshots incrementally over SSH to a target server
# Snapshot name structure: filesystem@${TGT_HASH}_INT ; where INT is an increment number
# Written by Etzion. Feel free to use. See more stuff in my blog at https://run.tournament.org.il
# Arguments:
# $1: ZFS filesystem name
# $2: (target ZFS system):(target ZFS filesystem)
# Optional, after $2: port=SSH_PORT and/or hash=SNAPSHOT_HASH
IAM=$0
ZFS=/sbin/zfs
LOCKDIR=/dev/shm
LOCAL_SNAPS_TO_LEAVE=3
RESUME_LIMIT=3
### FUNCTIONS ###
# Sanity and usage
function usage() {
echo "Usage: $IAM SRC REMOTE_SERVER:ZFS_TARGET (port=SSH_PORT)"
echo "ZFS_TARGET is the parent of filesystems which will be created with the original source names"
echo "Example: $IAM share/test backupsrv:backup"
echo "It will create a filesystem 'test' under the pool 'backup' on 'backupsrv' with clone"
echo "of the current share/test ZFS filesystem"
echo "This script is (on purpose) not a recursive script"
echo "For the script to work correctly, it *must* have SSH key exchanged from source to target"
exit 0
}
function abort() {
# Exit with an error message
echo "$@"
pkill -P $$
remove_lock
exit 1
}
function parse_parameters() {
# Parses command line parameters
# called with $*
SRC_FS=$1
shift
TGT=$1
shift
for i in $*
do
case ${i} in
port=*) PORT=${i##*=}
;;
hash=*) HASH=${i##*=}
;;
esac
done
TGT_SYS=${TGT%%:*}
TGT_FS=${TGT##*:}
# Use a short substring of MD5sum of the target name for later unique identification
SRC_DIRNAME_FS=${SRC_FS#*/}
# HASH is set by the optional hash= argument (see the case statement above)
if [ -z "$HASH" ]
then
TGT_FULLHASH="`echo $TGT_FS/${SRC_DIRNAME_FS} | md5sum -`"
TGT_HASH=${TGT_FULLHASH:1:7}
else
TGT_HASH=${HASH}
fi
}
function sanity() {
# Verify we have all details
[ -z "$SRC_FS" ] && usage
[ -z "$TGT_FS" ] && usage
[ -z "$TGT_SYS" ] && usage
$ZFS list -H -o name $SRC_FS > /dev/null 2>&1 || abort "Source filesystem $SRC_FS does not exist"
# check_target_fs || abort "Target ZFS filesystem $TGT_FS on $TGT_SYS does not exist, or not imported"
}
function remove_lock() {
# Removes the lock file
\rm -f ${LOCKDIR}/$SRC_LOCK
}
function construct_ssh_cmd() {
# Construct the remote SSH command
# Here is a good place to put atomic parameters used for the SSH
[ -z "${PORT}" ] && PORT=22
SSH="ssh -p $PORT $TGT_SYS -o ConnectTimeout=3"
CONTROL_SSH="$SSH -f"
}
function get_last_remote_snapshots() {
# Gets the last snapshot name on a remote system, to match it to our snapshots
remoteSnapTmpObj=`$SSH "$ZFS list -H -t snapshot -r -o name ${TGT_FS}/${SRC_DIRNAME_FS}" | grep ${SRC_DIRNAME_FS}@ | grep ${TGT_HASH}`
# Create a list of all snapshot indexes. Empty means it's the first one
remoteSnaps=""
for snapIter in ${remoteSnapTmpObj}
do
remoteSnaps="$remoteSnaps ${snapIter##*@${TGT_HASH}_}"
done
}
function check_if_remote_snapshot_exists() {
# Takes no arguments; uses the global $newLocalIndex
# Checks if the snapshot with this index exists on the remote node
$SSH "$ZFS list -H -t snapshot -r -o name ${TGT_FS}/${SRC_DIRNAME_FS}@${TGT_HASH}_${newLocalIndex}"
return $?
}
function get_last_local_snapshots() {
# This function will return an array of local existing snapshots using the existing TGT_HASH
localSnapTmpObj=`$ZFS list -H -t snapshot -r -o name $SRC_FS | grep $SRC_FS@ | grep $TGT_HASH `
# Convert into a list and remove the HASH and everything before it. We should have clear list of indexes
localSnapList=""
for snapIter in ${localSnapTmpObj}
do
localSnapList="$localSnapList ${snapIter##*@${TGT_HASH}_}"
done
# Convert object to array
localSnapList=( $localSnapList )
# Get the last object
let localSnapArrayObj=${#localSnapList[@]}-1
}
function delete_snapshot() {
# This function will delete a snapshot
# arguments: $1 -> snapshot name
[ -z "$1" ] && abort "Cleanup snapshot got no arguments"
$ZFS destroy $1
#$ZFS destroy ${SRC_FS}@${TGT_HASH}_${newLocalIndex}
}
function find_matching_snapshot() {
# This function will attempt to find a matching snapshot as a replication baseline
# Gets the latest local snapshot index
localRecentIndex=${localSnapList[$localSnapArrayObj]}
# Gets the latest mutual snapshot index
while [ $localSnapArrayObj -ge 0 ]
do
# Check if the current counter already exists
if echo "$remoteSnaps" | grep -w ${localSnapList[$localSnapArrayObj]} > /dev/null 2>&1
then
# We know the mutual index.
commonIndex=${localSnapList[$localSnapArrayObj]}
return 0
fi
let localSnapArrayObj--
done
# If we've reached here - there is no mutual index!
abort "There is no mutual snapshot index, you will have to resync"
}
function cleanup_snapshots() {
# Creates a list of snapshots to delete and then calls delete_snapshot function
# We are using the most recent common index, $localSnapArrayObj as the latest reference for deletion
let deleteArrayObj=$localSnapArrayObj-${LOCAL_SNAPS_TO_LEAVE}
snapsToDelete=""
# Construct a list of snapshots to delete, and delete it in reverse order
while [ $deleteArrayObj -ge 0 ]
do
# Construct snapshot name
snapsToDelete="$snapsToDelete ${SRC_FS}@${TGT_HASH}_${localSnapList[$deleteArrayObj]}"
let deleteArrayObj--
done
snapsToDelete=( $snapsToDelete )
snapDelete=0
while [ $snapDelete -lt ${#snapsToDelete[@]} ]
do
# Delete snapshot
delete_snapshot ${snapsToDelete[$snapDelete]}
let snapDelete++
done
}
function initialize() {
# This is a unique case where we initialize the first sync
# We will call this procedure when $remoteSnaps is empty (meaning that there was no snapshot whatsoever)
# We have to verify that the target has no existing old snapshots here
# is it empty?
echo "Going to perform an initialization replication. It might wipe the target $TGT_FS completely"
echo "Press Enter to proceed, or Ctrl+C to abort"
read "abc"
### Decided to remove this check
### [ -n "$LOCSNAP_LIST" ] && abort "No target snapshots while local history snapshots exists. Clean up history and try again"
RECEIVE_FLAGS="-sFdvu"
newLocalIndex=1
# NEW_LOC_INDEX=1
create_local_snapshot $newLocalIndex
open_remote_socket
sleep 1
$ZFS send -ce ${SRC_FS}@${TGT_HASH}_${newLocalIndex} | nc $TGT_SYS $NC_PORT 2>&1
if [ "$?" -ne "0" ]
then
# Do not clean up the current snapshot
# delete_snapshot ${SRC_FS}@${TGT_HASH}_${newLocalIndex}
abort "Failed to send initial snapshot to target system"
fi
sleep 1
# Set target to RO
$SSH $ZFS set readonly=on $TGT_FS
[ "$?" -ne "0" ] && abort "Failed to set remote filesystem $TGT_FS to read-only" # No need to remove local snapshot
}
function create_local_snapshot() {
# Creates snapshot on local storage
# uses argument $1
[ -z "$1" ] && abort "Failed to get new snapshot index"
$ZFS snapshot ${SRC_FS}@${TGT_HASH}_${1}
[ "$?" -ne "0" ] && abort "Failed to create local snapshot. Check error message"
}
function open_remote_socket() {
# Starts remote socket via SSH (as the control operation)
# port is 3000 + three-digit random number
let NC_PORT=3000+$RANDOM%1000
$CONTROL_SSH "nc -l -i 90 $NC_PORT | $ZFS receive ${RECEIVE_FLAGS} $TGT_FS > /tmp/output 2>&1 ; sync"
#$CONTROL_SSH "socat tcp4-listen:${NC_PORT} - | $ZFS receive ${RECEIVE_FLAGS} $TGT_FS > /tmp/output 2>&1 ; sync"
#zfs send -R zp03@01 | zfs receive -Fdvu zpnew
}
function send_zfs() {
# Do the heavy lifting of opening remote socket and starting ZFS send/receive
open_remote_socket
sleep 1
$ZFS send -ce -I ${SRC_FS}@${TGT_HASH}_${commonIndex} ${SRC_FS}@${TGT_HASH}_${newLocalIndex} | nc -i 90 $TGT_SYS $NC_PORT
#$ZFS send -ce -I ${SRC_FS}@${TGT_HASH}_${commonIndex} ${SRC_FS}@${TGT_HASH}_${newLocalIndex} | socat tcp4-connect:${TGT_SYS}:${NC_PORT} -
sleep 20
}
function increment() {
# Create a new snapshot with the index $localRecentIndex+1, and replicate it to the remote system
# Baseline is the most recent common snapshot index $commonIndex
RECEIVE_FLAGS="-Fsdvu" # With an 'F' flag maybe?
# Handle the case of latest snapshot in DR is newer than current latest snapshot, due to mistaken deletion
remoteSnaps=( $remoteSnaps )
# Take the index of the last remote snapshot (not the number of snapshots)
remoteIndex=${remoteSnaps[${#remoteSnaps[@]}-1]} # Get last snapshot on DR
if [ ${localRecentIndex} -lt ${remoteIndex} ]
then
let newLocalIndex=${remoteIndex}+1
else
let newLocalIndex=localRecentIndex+1
fi
create_local_snapshot $newLocalIndex
send_zfs
# if [ "$?" -ne "0" ]
# then
# Cleanup current snapshot
#delete_snapshot ${SRC_FS}@${TGT_HASH}_${newLocalIndex}
#abort "Failed to send incremental snapshot to target system"
# fi
if ! verify_correctness
then
if ! loop_resume # If we can
then
# We either could not resume operation or failed to run with the required amount of iterations
# For now we abort.
echo "Deleting local snapshot"
delete_snapshot ${SRC_FS}@${TGT_HASH}_${newLocalIndex}
abort "Remote snapshot should have the index of the latest snapshot, but it does not. The latest common snapshot index is ${commonIndex}"
fi
fi
}
function loop_resume() {
# Attempts to loop over resuming until limit attempt has been reached
REMOTE_TOKEN=$($SSH "$ZFS get -Ho value receive_resume_token ${TGT_FS}/${SRC_DIRNAME_FS}")
if [ "$REMOTE_TOKEN" == "-" ]
then
return 1
fi
# We have a valid resume token. We will retry
COUNT=1
while [ "$COUNT" -le "$RESUME_LIMIT" ]
do
# For ease of handling - for each iteration, we will request the token again
echo "Attempting resume operation"
REMOTE_TOKEN=$($SSH "$ZFS get -Ho value receive_resume_token ${TGT_FS}/${SRC_DIRNAME_FS}")
let COUNT++
open_remote_socket
$ZFS send -e -t $REMOTE_TOKEN | nc -i 90 $TGT_SYS $NC_PORT
#$ZFS send -e -t $REMOTE_TOKEN | socat tcp4-connect:${TGT_SYS}:${NC_PORT} -
sleep 20
if verify_correctness
then
echo "Done"
return 0
fi
done
# If we've reached here, all resume attempts have failed
return 1
}
function verify_correctness() {
# Check remote index, and verify it is correct with the current, latest snapshot
if check_if_remote_snapshot_exists
then
echo "Replication Successful"
return 0
else
echo "Replication failed"
return 1
fi
}
### MAIN ###
[ `whoami` != "root" ] && abort "This script has to be called by the root user"
[ -z "$1" ] && usage
parse_parameters $*
SRC_LOCK=`echo $SRC_FS | tr / _`
if [ -f ${LOCKDIR}/$SRC_LOCK ]
then
echo "Already locked. If this should not be the case - remove ${LOCKDIR}/$SRC_LOCK"
exit 1
fi
sanity
touch ${LOCKDIR}/$SRC_LOCK
construct_ssh_cmd
get_last_remote_snapshots # Have a string list of remoteSnaps
# If we don't have a remote snapshot, this should be an initialization
if [ -z "$remoteSnaps" ]
then
initialize
echo "completed initialization. Done"
remove_lock
exit 0
fi
# We can get here only if it is not initialization
get_last_local_snapshots # Have a list (array) of localSnaps
find_matching_snapshot # Get the latest local index and the latest common index available
increment # Creates a new snapshot and sends/receives it
cleanup_snapshots # Cleans up old local snapshots
pkill -P $$
remove_lock
echo "Done"
The initial run should be started manually, because it asks for confirmation before it proceeds. If you expect a very long initial sync, run it inside tmux or screen, so that a dropped terminal session does not kill it in the middle.
Run it like this:
./clone_zfs_snapshots.sh share/my-data backuphost:share
This will create, under the pool ‘share’ on the host ‘backuphost’, a filesystem matching the source (in this case: share/my-data) and set it to read-only. The script creates a snapshot with a unique name based on a shortened hash of the destination, with a counter as a suffix, and starts cloning the snapshot to the remote host. When called again, it creates a snapshot with the same hash but the next index, and clones only the delta to the remote host. In case of a disconnection, the clone will retry a few times before failing.
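If you want to see how a snapshot name comes out, here is a small sketch that mirrors the naming logic in parse_parameters. The function name snapshot_name is mine, for illustration only; the characters taken from the MD5 sum (positions 2 to 8) match the script's ${TGT_FULLHASH:1:7}.

```shell
#!/bin/bash
# Sketch of the snapshot naming scheme; snapshot_name is a hypothetical helper,
# not a function from the script itself.
snapshot_name() {
    # $1: source filesystem, $2: target parent filesystem, $3: index
    local src_fs="$1"
    local tgt_fs="$2"
    local index="$3"
    # Strip the pool name, as the script does for SRC_DIRNAME_FS
    local dirname="${src_fs#*/}"
    local hash
    # Characters 2-8 of the MD5 sum of "target/name"
    hash="$(echo "${tgt_fs}/${dirname}" | md5sum | cut -c2-8)"
    echo "${src_fs}@${hash}_${index}"
}

snapshot_name share/my-data share 1   # name of the initial snapshot
snapshot_name share/my-data share 2   # name of the next one: same hash, next index
```

Because the hash depends on the destination, the same source can be cloned to several targets without the snapshot series colliding.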
Note that the receiving side does not remove snapshots, so handling (too) old snapshots on the backup host remains up to you.
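One possible way to handle this, as a rough sketch only: list the snapshots oldest first and pick everything except the newest few. The helper snaps_to_prune, the dataset backup/test and the abc1234 hash below are made-up examples, and the sketch only prints names; nothing is destroyed.

```shell
#!/bin/bash
# Sketch: choose which snapshots to prune on the backup host.
# snaps_to_prune is a hypothetical helper; it only prints names.
snaps_to_prune() {
    # Reads snapshot names (oldest first) on stdin.
    # $1: how many of the newest snapshots to keep
    awk -v keep="$1" '
        { line[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print line[i] }
    '
}

# On the backup host, the input could come from something like
# (dataset name is an assumption):
#   zfs list -H -t snapshot -o name -s creation backup/test | snaps_to_prune 3
# Each printed name could then be reviewed and passed to "zfs destroy".

# Demo with made-up names: keeps the newest 3, prints the 2 oldest
printf '%s\n' \
    backup/test@abc1234_1 \
    backup/test@abc1234_2 \
    backup/test@abc1234_3 \
    backup/test@abc1234_4 \
    backup/test@abc1234_5 | snaps_to_prune 3
```

Make sure the snapshots you keep on the backup host still include the latest one that also exists on the source, otherwise the next run will find no mutual snapshot and ask for a resync.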