ZFS clone script
ZFS has some magical features, comparable to NetApp's WAFL capabilities. One of the less-used ones is ZFS send/receive, which can serve as the engine beneath something much like NetApp's SnapMirror or SnapVault.
The idea, if you are not familiar with NetApp's products, is to take a snapshot of a dataset on the source and clone it to remote storage. Then take another snapshot and clone only the delta between the two snapshots, and so on. This way only block-level changes are cloned, which reduces the clone payload and the time required to transfer it.
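To make the mechanism concrete, here is a minimal sketch of the manual equivalent: one full send followed by one incremental send over SSH (dataset, snapshot and host names are placeholders):

# Initial full replication: snapshot the source and send everything
zfs snapshot share/test@snap1
zfs send share/test@snap1 | ssh backupsrv zfs receive -F backup/test

# Incremental replication: snapshot again, send only the delta since snap1
zfs snapshot share/test@snap2
zfs send -i share/test@snap1 share/test@snap2 | ssh backupsrv zfs receive backup/test

The script below automates this loop, adding locking, snapshot bookkeeping and resume retries, and pushes the data stream through nc rather than the SSH channel itself, presumably to avoid SSH encryption overhead on the bulk transfer.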
Copy and save this file as clone_zfs_snapshots.sh. Give it execution permissions.
#!/bin/bash
# This script will clone ZFS snapshots incrementally over SSH to a target server
# Snapshot name structure: filesystem@${TGT_HASH}_INT ; where INT is an increment number
# Written by Etzion. Feel free to use. See more stuff in my blog at https://run.tournament.org.il
# Arguments:
# $1: ZFS filesystem name
# $2: (target ZFS system):(target ZFS filesystem)

IAM=$0
ZFS=/sbin/zfs
LOCKDIR=/dev/shm
LOCAL_SNAPS_TO_LEAVE=3
RESUME_LIMIT=3

### FUNCTIONS ###

# Sanity and usage
function usage() {
    echo "Usage: $IAM SRC REMOTE_SERVER:ZFS_TARGET (port=SSH_PORT)"
    echo "ZFS_TARGET is the parent of filesystems which will be created with the original source names"
    echo "Example: $IAM share/test backupsrv:backup"
    echo "It will create a filesystem 'test' under the pool 'backup' on 'backupsrv' with clone"
    echo "of the current share/test ZFS filesystem"
    echo "This script is (on purpose) not a recursive script"
    echo "For the script to work correctly, it *must* have SSH key exchanged from source to target"
    exit 0
}

function abort() {
    # Exit erroneously with a message
    echo "$@"
    pkill -P $$
    remove_lock
    exit 1
}

function parse_parameters() {
    # Parses command line parameters
    # called with $*
    SRC_FS=$1
    shift
    TGT=$1
    shift
    for i in $*
    do
        case ${i} in
            port=*) PORT=${i##*=}
            ;;
            hash=*) HASH=${i##*=}
            ;;
        esac
    done
    TGT_SYS=${TGT%%:*}
    TGT_FS=${TGT##*:}
    # Use a short substring of MD5sum of the target name for later unique identification
    SRC_DIRNAME_FS=${SRC_FS#*/}
    if [ -z "$HASH" ]
    then
        TGT_FULLHASH="`echo $TGT_FS/${SRC_DIRNAME_FS} | md5sum -`"
        TGT_HASH=${TGT_FULLHASH:1:7}
    else
        TGT_HASH=${HASH}
    fi
}

function sanity() {
    # Verify we have all details
    [ -z "$SRC_FS" ] && usage
    [ -z "$TGT_FS" ] && usage
    [ -z "$TGT_SYS" ] && usage
    $ZFS list -H -o name $SRC_FS > /dev/null 2>&1 || abort "Source filesystem $SRC_FS does not exist"
    # check_target_fs || abort "Target ZFS filesystem $TGT_FS on $TGT_SYS does not exist, or not imported"
}

function remove_lock() {
    # Removes the lock file
    \rm -f ${LOCKDIR}/$SRC_LOCK
}

function construct_ssh_cmd() {
    # Construct the remote SSH command
    # Here is a good place to put atomic parameters used for the SSH
    [ -z "${PORT}" ] && PORT=22
    SSH="ssh -p $PORT $TGT_SYS -o ConnectTimeout=3"
    CONTROL_SSH="$SSH -f"
}

function get_last_remote_snapshots() {
    # Gets the last snapshot name on a remote system, to match it to our snapshots
    remoteSnapTmpObj=`$SSH "$ZFS list -H -t snapshot -r -o name ${TGT_FS}/${SRC_DIRNAME_FS}" | grep ${SRC_DIRNAME_FS}@ | grep ${TGT_HASH}`
    # Create a list of all snapshot indexes. Empty means it is the first one
    remoteSnaps=""
    for snapIter in ${remoteSnapTmpObj}
    do
        remoteSnaps="$remoteSnaps ${snapIter##*@${TGT_HASH}_}"
    done
}

function check_if_remote_snapshot_exists() {
    # Checks if the snapshot with the current ${newLocalIndex} exists on the remote node
    $SSH "$ZFS list -H -t snapshot -r -o name ${TGT_FS}/${SRC_DIRNAME_FS}@${TGT_HASH}_${newLocalIndex}"
    return $?
}

function get_last_local_snapshots() {
    # This function will return an array of local existing snapshots using the existing TGT_HASH
    localSnapTmpObj=`$ZFS list -H -t snapshot -r -o name $SRC_FS | grep $SRC_FS@ | grep $TGT_HASH`
    # Convert into a list and remove the HASH and everything before it. We should have a clear list of indexes
    localSnapList=""
    for snapIter in ${localSnapTmpObj}
    do
        localSnapList="$localSnapList ${snapIter##*@${TGT_HASH}_}"
    done
    # Convert object to array
    localSnapList=( $localSnapList )
    # Get the last object
    let localSnapArrayObj=${#localSnapList[@]}-1
}

function delete_snapshot() {
    # This function will delete a snapshot
    # arguments: $1 -> snapshot name
    [ -z "$1" ] && abort "Cleanup snapshot got no arguments"
    $ZFS destroy $1
    #$ZFS destroy ${SRC_FS}@${TGT_HASH}_${newLocalIndex}
}

function find_matching_snapshot() {
    # This function will attempt to find a matching snapshot as a replication baseline
    # Gets the latest local snapshot index
    localRecentIndex=${localSnapList[$localSnapArrayObj]}
    # Gets the latest mutual snapshot index
    while [ $localSnapArrayObj -ge 0 ]
    do
        # Check if the current counter already exists
        if echo "$remoteSnaps" | grep -w ${localSnapList[$localSnapArrayObj]} > /dev/null 2>&1
        then
            # We know the mutual index.
            commonIndex=${localSnapList[$localSnapArrayObj]}
            return 0
        fi
        let localSnapArrayObj--
    done
    # If we've reached here - there is no mutual index!
    abort "There is no mutual snapshot index, you will have to resync"
}

function cleanup_snapshots() {
    # Creates a list of snapshots to delete and then calls delete_snapshot function
    # We are using the most recent common index, $localSnapArrayObj, as the latest reference for deletion
    let deleteArrayObj=$localSnapArrayObj-${LOCAL_SNAPS_TO_LEAVE}
    snapsToDelete=""
    # Construct a list of snapshots to delete, and delete it in reverse order
    while [ $deleteArrayObj -ge 0 ]
    do
        # Construct snapshot name
        snapsToDelete="$snapsToDelete ${SRC_FS}@${TGT_HASH}_${localSnapList[$deleteArrayObj]}"
        let deleteArrayObj--
    done
    snapsToDelete=( $snapsToDelete )
    snapDelete=0
    while [ $snapDelete -lt ${#snapsToDelete[@]} ]
    do
        # Delete snapshot
        delete_snapshot ${snapsToDelete[$snapDelete]}
        let snapDelete++
    done
}

function initialize() {
    # This is a unique case where we initialize the first sync
    # We will call this procedure when $remoteSnaps is empty (meaning that there was no snapshot whatsoever)
    # We have to verify that the target has no existing old snapshots here
    # is it empty?
    echo "Going to perform an initialization replication. It might wipe the target $TGT_FS completely"
    echo "Press Enter to proceed, or Ctrl+C to abort"
    read abc
    ### Decided to remove this check ### [ -n "$LOCSNAP_LIST" ] && abort "No target snapshots while local history snapshots exists. Clean up history and try again"
    RECEIVE_FLAGS="-sFdvu"
    newLocalIndex=1
    # NEW_LOC_INDEX=1
    create_local_snapshot $newLocalIndex
    open_remote_socket
    sleep 1
    $ZFS send -ce ${SRC_FS}@${TGT_HASH}_${newLocalIndex} | nc $TGT_SYS $NC_PORT 2>&1
    if [ "$?" -ne "0" ]
    then
        # Do not cleanup current snapshot
        # delete_snapshot ${SRC_FS}@${TGT_HASH}_${newLocalIndex}
        abort "Failed to send initial snapshot to target system"
    fi
    sleep 1
    # Set target to RO
    $SSH $ZFS set readonly=on $TGT_FS
    [ "$?" -ne "0" ] && abort "Failed to set remote filesystem $TGT_FS to read-only"
    # No need to remove local snapshot
}

function create_local_snapshot() {
    # Creates snapshot on local storage
    # uses argument $1
    [ -z "$1" ] && abort "Failed to get new snapshot index"
    $ZFS snapshot ${SRC_FS}@${TGT_HASH}_${1}
    [ "$?" -ne "0" ] && abort "Failed to create local snapshot. Check error message"
}

function open_remote_socket() {
    # Starts remote socket via SSH (as the control operation)
    # port is 3000 + three-digit random number
    let NC_PORT=3000+$RANDOM%1000
    $CONTROL_SSH "nc -l -i 90 $NC_PORT | $ZFS receive ${RECEIVE_FLAGS} $TGT_FS > /tmp/output 2>&1 ; sync"
    #$CONTROL_SSH "socat tcp4-listen:${NC_PORT} - | $ZFS receive ${RECEIVE_FLAGS} $TGT_FS > /tmp/output 2>&1 ; sync"
    #zfs send -R zp03@01 | zfs receive -Fdvu zpnew
}

function send_zfs() {
    # Do the heavy lifting of opening remote socket and starting ZFS send/receive
    open_remote_socket
    sleep 1
    $ZFS send -ce -I ${SRC_FS}@${TGT_HASH}_${commonIndex} ${SRC_FS}@${TGT_HASH}_${newLocalIndex} | nc -i 90 $TGT_SYS $NC_PORT
    #$ZFS send -ce -I ${SRC_FS}@${TGT_HASH}_${commonIndex} ${SRC_FS}@${TGT_HASH}_${newLocalIndex} | socat tcp4-connect:${TGT_SYS}:${NC_PORT} -
    sleep 20
}

function increment() {
    # Create a new snapshot with the index $localRecentIndex+1, and replicate it to the remote system
    # Baseline is the most recent common snapshot index $commonIndex
    RECEIVE_FLAGS="-Fsdvu" # With an 'F' flag maybe?
    # Handle the case where the latest snapshot in DR is newer than the current latest snapshot, due to mistaken deletion
    remoteSnaps=( $remoteSnaps )
    let remoteIndex=${#remoteSnaps[@]} # Get last snapshot on DR
    if [ ${localRecentIndex} -lt ${remoteIndex} ]
    then
        let newLocalIndex=${remoteIndex}+1
    else
        let newLocalIndex=localRecentIndex+1
    fi
    create_local_snapshot $newLocalIndex
    send_zfs
    # if [ "$?" -ne "0" ]
    # then
        # Cleanup current snapshot
        #delete_snapshot ${SRC_FS}@${TGT_HASH}_${newLocalIndex}
        #abort "Failed to send incremental snapshot to target system"
    # fi
    if ! verify_correctness
    then
        if ! loop_resume # If we can
        then
            # We either could not resume operation or failed to run with the required amount of iterations
            # For now we abort.
            echo "Deleting local snapshot"
            delete_snapshot ${SRC_FS}@${TGT_HASH}_${newLocalIndex}
            abort "Remote snapshot should have the index of the latest snapshot, but it does not. The current remote snapshot index is ${commonIndex}"
        fi
    fi
}

function loop_resume() {
    # Attempts to loop over resuming until the attempt limit has been reached
    REMOTE_TOKEN=$($SSH "$ZFS get -Ho value receive_resume_token ${TGT_FS}/${SRC_DIRNAME_FS}")
    if [ "$REMOTE_TOKEN" == "-" ]
    then
        return 1
    fi
    # We have a valid resume token. We will retry
    COUNT=1
    while [ "$COUNT" -le "$RESUME_LIMIT" ]
    do
        # For ease of handling - for each iteration, we will request the token again
        echo "Attempting resume operation"
        REMOTE_TOKEN=$($SSH "$ZFS get -Ho value receive_resume_token ${TGT_FS}/${SRC_DIRNAME_FS}")
        let COUNT++
        open_remote_socket
        $ZFS send -e -t $REMOTE_TOKEN | nc -i 90 $TGT_SYS $NC_PORT
        #$ZFS send -e -t $REMOTE_TOKEN | socat tcp4-connect:${TGT_SYS}:${NC_PORT} -
        sleep 20
        if verify_correctness
        then
            echo "Done"
            return 0
        fi
    done
    # If we've reached here, we have failed to run the required iterations. Lets just verify again
    return 1
}

function verify_correctness() {
    # Check remote index, and verify it is correct with the current, latest snapshot
    if check_if_remote_snapshot_exists
    then
        echo "Replication Successful"
        return 0
    else
        echo "Replication failed"
        return 1
    fi
}

### MAIN ###
[ `whoami` != "root" ] && abort "This script has to be called by the root user"
[ -z "$1" ] && usage
parse_parameters $*
SRC_LOCK=`echo $SRC_FS | tr / _`
if [ -f ${LOCKDIR}/$SRC_LOCK ]
then
    echo "Already locked. If this should not be the case - remove ${LOCKDIR}/$SRC_LOCK"
    exit 1
fi
sanity
touch ${LOCKDIR}/$SRC_LOCK
construct_ssh_cmd
get_last_remote_snapshots # Have a string list of remoteSnaps
# If we don't have a remote snapshot, it should be initialization
if [ -z "$remoteSnaps" ]
then
    initialize
    echo "Completed initialization. Done"
    remove_lock
    exit 0
fi
# We can get here only if it is not initialization
get_last_local_snapshots # Have a list (array) of localSnaps
find_matching_snapshot # Get the latest local index and the latest common index available
increment # Creates a new snapshot and sends/receives it
cleanup_snapshots # Cleans up old local snapshots
pkill -P $$
remove_lock
echo "Done"
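As the usage text says, the script must have an SSH key exchanged from source to target, since it drives the remote zfs receive over SSH (the data itself flows through nc, which must be installed on both ends). A typical one-time setup, assuming root on both hosts, would be along these lines:

# On the source host: create a key (if one does not exist yet) and push it to the target
ssh-keygen -t ed25519
ssh-copy-id root@backupsrv

# Verify passwordless login works before the first run
ssh root@backupsrv true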
The initial run should be called manually. If you expect a very long initial sync, run it inside tmux or screen, to avoid failing in the middle if your session disconnects.
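For example (the session name and arguments are illustrative):

tmux new-session -s zfs_init '/root/clone_zfs_snapshots.sh share/my-data backuphost:share'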
Run the script like this:
./clone_zfs_snapshots.sh share/my-data backuphost:share
This will create, under the pool 'share' on the host 'backuphost', a filesystem matching the source (in this case: share/my-data) and set it to read-only. The script will create a snapshot with a unique name based on a shortened hash of the destination, with an incrementing numeric suffix, and start cloning the snapshot to the remote host. When called again, it will create a snapshot with the same name but the next index, and clone only the delta to the remote host. In case of a disconnection, the clone will retry a few times before failing.
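Only the first (initialization) run prompts for confirmation; subsequent runs are non-interactive, so they can be scheduled. A hypothetical cron entry for hourly increments could look like this:

# /etc/cron.d/zfs_clone (illustrative paths): replicate share/my-data hourly
30 * * * * root /root/clone_zfs_snapshots.sh share/my-data backuphost:share >> /var/log/zfs_clone.log 2>&1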
Note that the receiving side does not remove snapshots, so handling (too) old snapshots on the backup host remains up to you.
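One possible cleanup sketch, run on the backup host, keeps the newest 10 snapshots of the replicated filesystem and destroys the rest (the filesystem name and retention count are examples). Keep enough recent snapshots on both sides: the latest common snapshot is the baseline for the next incremental run.

# On the backup host: list snapshots oldest-first, drop all but the newest 10
zfs list -H -t snapshot -o name -s creation -r share/my-data \
    | head -n -10 \
    | xargs -r -n 1 zfs destroy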