### ZFS clone script

Sunday, March 28th, 2021

ZFS has some magical features, comparable to NetApp’s WAFL capabilities. One of the less-used ones is ZFS send/receive, which can serve as the engine beneath something much like NetApp’s SnapMirror or SnapVault.

The idea, if you are not familiar with NetApp’s products, is to take a snapshot of a dataset on the source and clone it to remote storage. Then take another snapshot and clone only the delta between the two snapshots, and so on. Only block-level changes are cloned, which reduces both the clone payload and the time required to transfer it.
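The sequence described above can be sketched as a dry run. The function below only prints the commands instead of executing them. The dataset `share/test`, the host `backupsrv`, the target `backup/test` and the snapshot names `snap_1`/`snap_2` are placeholders, and plain SSH is assumed as the transport to keep the sketch short.

```shell
#!/bin/bash
# Dry-run sketch: prints the commands for one full send followed by one
# incremental send. Nothing is executed; all names are placeholders.
plan_replication() {
    local src=$1 host=$2 tgt=$3
    # First pass: snapshot the dataset and send the whole snapshot
    echo "zfs snapshot ${src}@snap_1"
    echo "zfs send ${src}@snap_1 | ssh ${host} zfs receive -F ${tgt}"
    # Later passes: snapshot again and send only the delta (-i)
    echo "zfs snapshot ${src}@snap_2"
    echo "zfs send -i ${src}@snap_1 ${src}@snap_2 | ssh ${host} zfs receive ${tgt}"
}

plan_replication share/test backupsrv backup/test
```

Each later pass repeats the last two commands with the previous snapshot as the baseline.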

Copy the script below and save it as clone_zfs_snapshots.sh, then make it executable.

#!/bin/bash
# This script will clone ZFS snapshots incrementally over SSH to a target server
# Snapshot name structure: ${SRC_FS}@${TGT_HASH}_INT ; where INT is an increment number
# Written by Etzion. Feel free to use. See more stuff in my blog at https://run.tournament.org.il
# Arguments:
# $1: ZFS filesystem name
# $2: (target ZFS system):(target ZFS filesystem)

IAM=$0
ZFS=/sbin/zfs
LOCKDIR=/dev/shm
LOCAL_SNAPS_TO_LEAVE=3
RESUME_LIMIT=3

### FUNCTIONS ###

# Sanity and usage
function usage() {
    echo "Usage: $IAM SRC REMOTE_SERVER:ZFS_TARGET (port=SSH_PORT)"
    echo "ZFS_TARGET is the parent of filesystems which will be created with the original source names"
    echo "Example: $IAM share/test backupsrv:backup"
    echo "It will create a filesystem 'test' under the pool 'backup' on 'backupsrv' with clone"
    echo "of the current share/test ZFS filesystem"
    echo "This script is (on purpose) not a recursive script"
    echo "For the script to work correctly, it *must* have SSH key exchanged from source to target"
    exit 0
}

function abort() {
    # Exit with an error and a message
    echo "$@"
    pkill -P $$
    remove_lock
    exit 1
}

function parse_parameters() {
    # Parses command line parameters
    # called with $*
    SRC_FS=$1
    shift
    TGT=$1
    shift
    for i in $*
    do
        case ${i} in
            port=*)
                PORT=${i##*=}
            ;;
            hash=*)
                HASH=${i##*=}
            ;;
        esac
    done
    TGT_SYS=${TGT%%:*}
    TGT_FS=${TGT##*:}
    # Use a short substring of MD5sum of the target name for later unique identification
    SRC_DIRNAME_FS=${SRC_FS#*/}
    # HASH is the variable set by the hash= parameter above
    if [ -z "$HASH" ]
    then
        TGT_FULLHASH="$(echo $TGT_FS/${SRC_DIRNAME_FS} | md5sum -)"
        TGT_HASH=${TGT_FULLHASH:1:7}
    else
        TGT_HASH=${HASH}
    fi
}

function sanity() {
    # Verify we have all details
    [ -z "$SRC_FS" ] && usage
    [ -z "$TGT_FS" ] && usage
    [ -z "$TGT_SYS" ] && usage
    $ZFS list -H -o name $SRC_FS > /dev/null 2>&1 || abort "Source filesystem $SRC_FS does not exist"
    # check_target_fs || abort "Target ZFS filesystem $TGT_FS on $TGT_SYS does not exist, or not imported"
}

function remove_lock() {
    # Removes the lock file
    \rm -f ${LOCKDIR}/$SRC_LOCK
}

function construct_ssh_cmd() {
    # Construct the remote SSH command
    # Here is a good place to put atomic parameters used for the SSH
    [ -z "${PORT}" ] && PORT=22
    SSH="ssh -p $PORT $TGT_SYS -o ConnectTimeout=3"
    CONTROL_SSH="$SSH -f"
}

function get_last_remote_snapshots() {
    # Gets the last snapshot name on a remote system, to match it to our snapshots
    remoteSnapTmpObj=$($SSH "$ZFS list -H -t snapshot -r -o name ${TGT_FS}/${SRC_DIRNAME_FS}" | grep ${SRC_DIRNAME_FS}@ | grep ${TGT_HASH})
    # Create a list of all snapshot indexes. Empty means it's the first one
    remoteSnaps=""
    for snapIter in ${remoteSnapTmpObj}
    do
        remoteSnaps="$remoteSnaps ${snapIter##*@${TGT_HASH}_}"
    done
}

function check_if_remote_snapshot_exists() {
    # Argument: $1 -> Name of snapshot
    # Checks if this snapshot exists on remote node
    $SSH "$ZFS list -H -t snapshot -r -o name ${TGT_FS}/${SRC_DIRNAME_FS}@${TGT_HASH}_${newLocalIndex}"
    return $?
}

function get_last_local_snapshots() {
    # This function will return an array of local existing snapshots using the existing TGT_HASH
    localSnapTmpObj=$($ZFS list -H -t snapshot -r -o name $SRC_FS | grep "${SRC_FS}@" | grep $TGT_HASH)
    # Convert into a list and remove the HASH and everything before it. We should have clear list of indexes
    localSnapList=""
    for snapIter in ${localSnapTmpObj}
    do
        localSnapList="$localSnapList ${snapIter##*@${TGT_HASH}_}"
    done
    # Convert object to array
    localSnapList=( $localSnapList )
    # Get the last object
    let localSnapArrayObj=${#localSnapList[@]}-1
}

function delete_snapshot() {
    # This function will delete a snapshot
    # arguments: $1 -> snapshot name
    [ -z "$1" ] && abort "Cleanup snapshot got no arguments"
    $ZFS destroy $1
    #$ZFS destroy ${SRC_FS}@${TGT_HASH}_${newLocalIndex}
}

function find_matching_snapshot() {
    # This function will attempt to find a matching snapshot as a replication baseline
    # Gets the latest local snapshot index
    localRecentIndex=${localSnapList[$localSnapArrayObj]}
    # Gets the latest mutual snapshot index
    while [ $localSnapArrayObj -ge 0 ]
    do
        # Check if the current counter already exists
        if echo "$remoteSnaps" | grep -w ${localSnapList[$localSnapArrayObj]} > /dev/null 2>&1
        then
            # We know the mutual index.
            commonIndex=${localSnapList[$localSnapArrayObj]}
            return 0
        fi
        let localSnapArrayObj--
    done
    # If we've reached here - there is no mutual index!
    abort "There is no mutual snapshot index, you will have to resync"
}

function cleanup_snapshots() {
    # Creates a list of snapshots to delete and then calls delete_snapshot function
    # We are using the most recent common index, localSnapArrayObj as the latest reference for deletion
    let deleteArrayObj=$localSnapArrayObj-${LOCAL_SNAPS_TO_LEAVE}
    snapsToDelete=""
    # Construct a list of snapshots to delete, and delete it in reverse order
    while [ $deleteArrayObj -ge 0 ]
    do
        # Construct snapshot name
        snapsToDelete="$snapsToDelete ${SRC_FS}@${TGT_HASH}_${localSnapList[$deleteArrayObj]}"
        let deleteArrayObj--
    done
    snapsToDelete=( $snapsToDelete )
    snapDelete=0
    while [ $snapDelete -lt ${#snapsToDelete[@]} ]
    do
        # Delete snapshot
        delete_snapshot ${snapsToDelete[$snapDelete]}
        let snapDelete++
    done
}

function initialize() {
    # This is a unique case where we initialize the first sync
    # We will call this procedure when $remoteSnaps is empty (meaning that there was no snapshot whatsoever)
    # We have to verify that the target has no existing old snapshots here
    # is it empty?
    echo "Going to perform an initialization replication. It might wipe the target $TGT_FS completely"
    echo "Press Enter to proceed, or Ctrl+C to abort"
    read "abc"
    ### Decided to remove this check
    ### [ -n "$LOCSNAP_LIST" ] && abort "No target snapshots while local history snapshots exists. Clean up history and try again"
    RECEIVE_FLAGS="-sFdvu"
    newLocalIndex=1
    # NEW_LOC_INDEX=1
    create_local_snapshot $newLocalIndex
    open_remote_socket
    sleep 1
    $ZFS send -ce ${SRC_FS}@${TGT_HASH}_${newLocalIndex} | nc $TGT_SYS $NC_PORT 2>&1
    if [ "$?" -ne "0" ]
    then
        # Do not clean up the current snapshot
        # delete_snapshot ${SRC_FS}@${TGT_HASH}_${newLocalIndex}
        abort "Failed to send initial snapshot to target system"
    fi
    sleep 1
    # Set target to RO
    $SSH $ZFS set readonly=on $TGT_FS
    [ "$?" -ne "0" ] && abort "Failed to set remote filesystem $TGT_FS to read-only"
    # No need to remove local snapshot
}

function create_local_snapshot() {
    # Creates snapshot on local storage
    # uses argument $1
    [ -z "$1" ] && abort "Failed to get new snapshot index"
    $ZFS snapshot ${SRC_FS}@${TGT_HASH}_${1}
    [ "$?" -ne "0" ] && abort "Failed to create local snapshot. Check error message"
}

function open_remote_socket() {
    # Starts remote socket via SSH (as the control operation)
    # port is 3000 + three-digit random number
    let NC_PORT=3000+$RANDOM%1000
    $CONTROL_SSH "nc -l -i 90 $NC_PORT | $ZFS receive ${RECEIVE_FLAGS} $TGT_FS > /tmp/output 2>&1 ; sync"
    #$CONTROL_SSH "socat tcp4-listen:${NC_PORT} - | $ZFS receive ${RECEIVE_FLAGS} $TGT_FS > /tmp/output 2>&1 ; sync"
    #zfs send -R [email protected] | zfs receive -Fdvu zpnew
}

function send_zfs() {
    # Do the heavy lifting of opening remote socket and starting ZFS send/receive
    open_remote_socket
    sleep 1
    $ZFS send -ce -I ${SRC_FS}@${TGT_HASH}_${commonIndex} ${SRC_FS}@${TGT_HASH}_${newLocalIndex} | nc -i 90 $TGT_SYS $NC_PORT
    #$ZFS send -ce -I ${SRC_FS}@${TGT_HASH}_${commonIndex} ${SRC_FS}@${TGT_HASH}_${newLocalIndex} | socat tcp4-connect:${TGT_SYS}:${NC_PORT} -
    sleep 20
}

function increment() {
    # Create a new snapshot with the index $localRecentIndex+1, and replicate it to the remote system
    # Baseline is the most recent common snapshot index $commonIndex
    RECEIVE_FLAGS="-Fsdvu" # With an 'F' flag maybe?
    # Handle the case of latest snapshot in DR is newer than current latest snapshot, due to mistaken deletion
    remoteSnaps=( $remoteSnaps )
    let remoteIndex=${#remoteSnaps[@]} # Get last snapshot on DR
    if [ ${localRecentIndex} -lt ${remoteIndex} ]
    then
        let newLocalIndex=${remoteIndex}+1
    else
        let newLocalIndex=$localRecentIndex+1
    fi
    create_local_snapshot $newLocalIndex
    send_zfs
    # if [ "$?" -ne "0" ]
    # then
    # Cleanup current snapshot
    #delete_snapshot ${SRC_FS}@${TGT_HASH}_${newLocalIndex}
    #abort "Failed to send incremental snapshot to target system"
    # fi
    if ! verify_correctness
    then
        if ! loop_resume # If we can
        then
            # We either could not resume operation or failed to run with the required amount of iterations
            # For now we abort.
            echo "Deleting local snapshot"
            delete_snapshot ${SRC_FS}@${TGT_HASH}_${newLocalIndex}
            abort "Remote snapshot should have the index of the latest snapshot, but it is not. The current remote snapshot index is ${commonIndex}"
        fi
    fi
}

function loop_resume() {
    # Attempts to loop over resuming until limit attempt has been reached
    REMOTE_TOKEN=$($SSH "$ZFS get -Ho value receive_resume_token ${TGT_FS}/${SRC_DIRNAME_FS}")
    if [ "$REMOTE_TOKEN" == "-" ]
    then
        return 1
    fi
    # We have a valid resume token. We will retry
    COUNT=1
    while [ "$COUNT" -le "$RESUME_LIMIT" ]
    do
        # For ease of handling - for each iteration, we will request the token again
        echo "Attempting resume operation"
        REMOTE_TOKEN=$($SSH "$ZFS get -Ho value receive_resume_token ${TGT_FS}/${SRC_DIRNAME_FS}")
        let COUNT++
        open_remote_socket
        $ZFS send -e -t $REMOTE_TOKEN | nc -i 90 $TGT_SYS $NC_PORT
        #$ZFS send -e -t $REMOTE_TOKEN | socat tcp4-connect:${TGT_SYS}:${NC_PORT} -
        sleep 20
        if verify_correctness
        then
            echo "Done"
            return 0
        fi
    done
    # If we've reached here, we have failed to run the required iterations. Let's just verify again
    return 1
}

function verify_correctness() {
    # Check remote index, and verify it is correct with the current, latest snapshot
    if check_if_remote_snapshot_exists
    then
        echo "Replication Successful"
        return 0
    else
        echo "Replication failed"
        return 1
    fi
}

### MAIN ###
[ "$(whoami)" != "root" ] && abort "This script has to be called by the root user"
[ -z "$1" ] && usage

parse_parameters $*
SRC_LOCK=$(echo $SRC_FS | tr / _)
if [ -f ${LOCKDIR}/$SRC_LOCK ]
then
    echo "Already locked. It should not be the case - remove ${LOCKDIR}/$SRC_LOCK"
    exit 1
fi
sanity
touch ${LOCKDIR}/$SRC_LOCK

construct_ssh_cmd
get_last_remote_snapshots # Have a string list of remoteSnaps

# If we don't have remote snapshot it should be initialization
if [ -z "$remoteSnaps" ]
then
    initialize
    echo "completed initialization. Done"
    remove_lock
    exit 0
fi

# We can get here only if it is not initialization
get_last_local_snapshots # Have a list (array) of localSnaps
find_matching_snapshot # Get the latest local index and the latest common index available
increment # Creates a new snapshot and sends/receives it
cleanup_snapshots # Cleans up old local snapshots
pkill -P $$
remove_lock
echo "Done"


The initial run should be started manually. If you expect a very long initial sync, run it inside tmux or screen, so that it does not fail in the middle if your session drops.

To run the command, run it like this:

./clone_zfs_snapshots.sh share/my-data backuphost:share


This will create under the pool ‘share’ in the host ‘backuphost’ a filesystem matching the source (in this case: share/my-data) and set it to read-only. The script will create a snapshot with a unique name based on a shortened hash of the destination, with a counting number suffix, and start cloning the snapshot to the remote host. When called again, it will create a snapshot with the same name, but different index, and clone the delta to the remote host. In case of a disconnection, the clone will retry a few times before failing.
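The naming scheme described above can be illustrated with a short sketch. It takes a seven-character slice of an MD5 sum over the target path, as the script does. The helper name `snapshot_name` is made up for this example, and `md5sum` from coreutils is assumed to be installed.

```shell
#!/bin/bash
# Sketch of the snapshot naming scheme: SRC@HASH_INDEX.
# snapshot_name is a hypothetical helper, not part of the script.
snapshot_name() {
    local src_fs=$1 tgt_fs=$2 index=$3
    local src_dirname=${src_fs#*/}      # share/my-data -> my-data
    local full_hash
    full_hash=$(echo "${tgt_fs}/${src_dirname}" | md5sum -)
    local short_hash=${full_hash:1:7}   # seven characters of the MD5 sum
    echo "${src_fs}@${short_hash}_${index}"
}

# Two consecutive runs share the hash and differ only in the index
snapshot_name share/my-data share 1
snapshot_name share/my-data share 2
```

Because the hash depends on the target, replicating the same source to a second backup pool would produce a separate, non-colliding series of snapshots.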

Note that the receiving side does not remove snapshots, so handling (too) old snapshots on the backup host remains up to you.
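One possible way to handle that is to prune by index on the backup host. The sketch below only decides which snapshots to drop. It assumes the names follow the `dataset@HASH_INDEX` pattern and that the list would come from `zfs list -H -t snapshot -o name` on the target. The function name and the sample snapshot names are invented for illustration, and the destroy step is left as an `echo`.

```shell
#!/bin/bash
# Sketch: given snapshot names ending in _INDEX, print all but the
# newest KEEP of them. snapshots_to_prune is a hypothetical helper.
snapshots_to_prune() {
    local keep=$1
    shift
    [ $# -eq 0 ] && return 0
    local sorted
    # Sort numerically on the index that follows the last underscore
    mapfile -t sorted < <(printf '%s\n' "$@" | awk -F_ '{print $NF, $0}' | sort -n | cut -d' ' -f2-)
    local count=$(( ${#sorted[@]} - keep ))
    local i
    for (( i = 0; i < count; i++ ))
    do
        echo "${sorted[i]}"
    done
}

# Keep the two newest snapshots; print a destroy command for the rest
for snap in $(snapshots_to_prune 2 backup/test@abcdef0_1 backup/test@abcdef0_2 backup/test@abcdef0_3 backup/test@abcdef0_10)
do
    echo "zfs destroy $snap"
done
```

Sorting on the numeric index rather than on the name keeps `_10` after `_3`. Keep at least the newest snapshot on the target, since the next incremental run needs it as the common baseline.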

### NetApp SnapMirror monitor script

Sunday, December 13th, 2009

I have been doing some work lately with NetApp SnapMirror. I have SnapMirrored some volumes and qtrees, and I wanted to monitor their use and behavior over the line.

As you can expect, site-to-site replication of data is a fragile thing, especially when done at the level of the storage device, which is agnostic to the data kept on it. When replicating volumes, I have to rely on the relevant employees to be responsible about what they place there, because the storage does not filter out the junk. If someone decides to drop a new DVD image on the DB storage space, well – the DB won’t care, as long as there is enough free space, but the storage will attempt to replicate the added data to the alternate site. If you are already near your bandwidth limit, which is never a good thing, this creates a lag you will hardly (if at all) be able to close.

For that, and since I don’t tend to trust people not to do stupid things, I have written this script.

What does it do?

This script will perform the following:

In alert mode (‘-m alert’), the script assumes SnapMirror is scheduled to run at a specific time, and alerts if a session is still active. With the flag ‘-a no’, it will not send an e-mail (e-mail is sent only when possible, see the configuration section below). With ‘-r yes’, it will react by setting a throttle on each non-idle session; in that case ‘-t VALUE’ must be specified as well, where VALUE is the numeric throttle in KB/s.

Limiting throttle to a SnapMirror session

Use with ‘-m throttle_limit’

The script will set a throttle for SnapMirror session(s). Set the limit with the flag ‘-t VALUE’, where VALUE is the numeric throttle in KB/s for each session.

Cancelling throttle limit

Use with ‘-m throttle_unlimit’

The script will set unlimited throttle for SnapMirror session(s).

Checking SnapMirror lag

Use with the ‘-m check_lag’

Since the purpose of replication is recovery, the lag of each SnapMirror session shows how far behind the replica is. Use with ‘-d VALUE’, VALUE being the alert threshold in minutes. The default threshold is one day (1440 minutes).
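The lag comparison boils down to converting an hh:mm:ss value into minutes. Here is a minimal sketch, assuming the lag string always has that three-field form. `lag_to_minutes` is an illustrative name, and the `10#` prefix stops fields such as `08` from being read as octal numbers.

```shell
#!/bin/bash
# Sketch: convert a lag in hh:mm:ss into whole minutes and compare it
# with the alert threshold. lag_to_minutes is a hypothetical helper.
lag_to_minutes() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    # Seconds are ignored; force base 10 so "08" is not parsed as octal
    echo $(( 10#$h * 60 + 10#$m ))
}

DELAY=1440                          # default threshold: one day
LAG=$(lag_to_minutes "25:08:09")    # sample value
if [ "$LAG" -gt "$DELAY" ]
then
    echo "Failure: lag is $LAG minutes"
else
    echo "Normal: lag is $LAG minutes"
fi
```

With the sample value, the lag works out to 1508 minutes, which is above the default threshold.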

Checking snapshots size

Use with the ‘-m check_size’

This reports the expected delta to transfer, which helps estimate whether a future sync of data (snapmirror update) will succeed before it begins. Use the ‘-l’ flag to log the date/time of the measurement and the expected size to a file. By default the file is /tmp/target_name.txt, where target_name is the SnapMirror target.
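As an illustration of the log file naming, the sketch below maps a target to its file name by replacing slashes with underscores. This is how the script appears to derive the name. `log_file_for` is a name invented here, and `/tmp` is the default prefix mentioned above.

```shell
#!/bin/bash
# Sketch: derive the per-target log file name.
# log_file_for is a hypothetical helper.
LOG_PREFIX=/tmp

log_file_for() {
    # Slashes in the target are replaced so the name is a valid file name
    echo "${LOG_PREFIX}/$(echo "$1" | tr / _).txt"
}

log_file_for storage2:/vol/volname/qtree
```

Each target gets its own file, so successive measurements for one relationship end up together.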

General Options

Use with ‘-c filename’ for alternate configuration file.

Use with ‘-h’ to get general help.

Pass a list of target names in the format storage:/vol/volname/qtree or storage:volname to ignore the targets in the configuration file and use your own.
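Both target forms carry the storage name before the colon. The short sketch below shows how such a target string splits, using ordinary bash parameter expansion. `split_target` is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: split a target into the storage (filer) name and the path.
# split_target is a hypothetical helper.
split_target() {
    local filer=${1%%:*}    # everything before the first colon
    local path=${1#*:}      # everything after it
    echo "filer=$filer path=$path"
}

split_target storage2:/vol/volname/qtree
split_target storage3:volname2
```

The filer part is the host the script would reach over SSH; the path part names the volume or qtree on it.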

Configuration File

The configuration file is rather simple. By default it should be called “/etc/snapmirror_monitor.conf”. It consists of two main variables for the system:

TGTS="storage2:/vol/volname/qtree
storage3:volname2
storage1:/vol/volnew/qtr2"
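Judging by the script, which reads `$EMAIL` when deciding whether to send mail, the second variable is `EMAIL`, the notification recipient. The sketch below writes a sample configuration to a temporary file and sources it. The address is a placeholder and the temporary file stands in for /etc/snapmirror_monitor.conf.

```shell
#!/bin/bash
# Sketch: a sample configuration and how a script can consume it.
# The e-mail address is a placeholder; the temp file stands in for
# /etc/snapmirror_monitor.conf.
conf=$(mktemp)
cat > "$conf" <<'EOF'
TGTS="storage2:/vol/volname/qtree
storage3:volname2
storage1:/vol/volnew/qtr2"
EMAIL="storage-admins@example.com"
EOF

# Source the file; unquoted $TGTS is split on whitespace, so each line is a target
. "$conf"
for target in $TGTS
do
    echo "target: $target"
done
echo "notify: $EMAIL"
rm -f "$conf"
```

Since the file is sourced as shell code, the multi-line value only needs its opening and closing quotes.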

Prerequisites

This script will run on any modern Linux machine. For it to communicate with the NetApp devices, you will need SSH enabled on the NetApps, and SSH keys exchanged so that the Linux machine can access the NetApp without a password.

The Script

Below is the script. You can download it and use it as you like.

#!/bin/bash
# This script will monitor snapmirror status
# Assumption: Access through ssh to root on all storage devices involved
# This will also attempt to detect the diff which is to sync

# Written by Ez-Aton. Check http://run.tournament.org.il for updates or

# Modes:
# alert -> Report (and optionally throttle) snapmirror sessions which are not idle
# throttle_limit -> Limit throttle to a given number (default or manually set)
# throttle_unlimit -> Open throttle limitation
# check_lag -> Report the snapmirror lag
# check_size -> Report the estimated data size to move

# Global variables
CONF=/etc/snapmirror_monitor.conf
LOG_PREFIX=/tmp

test_connection () {
    # Test to see that you can access the storage device
    # Arguments: NetApp name
    SSH_OPTS="-o ConnectTimeout=2"
    if ! ssh $SSH_OPTS $1 hostname &>/dev/null
    then
        echo "Cannot communicate via SSH to $1"
        exit 1
    fi
}

abort () {
    # Exit with a predefined error message
    echo $*
    exit 1
}

get_arguments () {
    # Get all arguments and define options
    # Argument: $@
    [ -z "$1" ] && set -- -h
    while [ -n "$1" ]
    do
        case "$1" in
        -m) shift
            case "$1" in
            alert|throttle_limit|throttle_unlimit|check_lag|check_size) MODE=$1
                ;;
            *)  abort "Mode is mandatory. Use -h flag to get list of available flags"
                ;;
            esac
            ;;
        -a) shift
            case "$1" in
            [nN][oO])   NOMAIL=1
                ;;
            *)          NOMAIL=0
                ;;
            esac
            ;;
        -r) shift
            case "$1" in
            [yY][eE][sS])   REACT=1
                ;;
            *)              REACT=0
                ;;
            esac
            ;;
        -d) shift
            declare -i DELAY_TMP
            DELAY_TMP=$1
            [ "$DELAY_TMP" != "$1" ] && abort "Delay needs to be a number in minutes"
            DELAY=$DELAY_TMP
            ;;
        -t) shift
            declare -i THROTTLE_TMP
            THROTTLE_TMP=$1
            [ "$THROTTLE_TMP" != "$1" ] && abort "Throttle needs to be a number"
            THROTTLE=$THROTTLE_TMP
            ;;
        -c) shift
            [ -f "$1" ] || abort "Cannot find specified conf file"
            CONF="$1"
            ;;
        -l) LOG=1
            ;;
        -h) echo "Usage: $0 -m [alert|throttle_limit|throttle_unlimit|check_lag|check_size] (-c CONF_FILE) [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
            echo "Alert if SnapMirror is still running: $0 -m alert [-a no] (-r yes) [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
            echo "Alert and throttle (react): $0 -m alert [-a no] -r yes -t [throttle_in_kb] [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
            echo "Throttle a running SnapMirror: $0 -m throttle_limit -t throttle_in_kb [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
            echo "Unlimit SnapMirror throttle: $0 -m throttle_unlimit [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
            echo "To check lag: $0 -m check_lag -d delay_in_minutes (-a no) [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
            echo "To check delta: $0 -m check_size [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
            exit 0
            ;;
        *)  [ -z "$MODE" ] && abort "$0 mode required"
            # Everything left on the command line is the list of targets
            TGTS="$*"
            break
            ;;
        esac
        shift
    done
}

notify () {
    # Send an e-mail notification
    # Arguments: $@ - the subject
    # Contents are empty
    # And yes - one e-mail per event
    mail -s "$@" $EMAIL < /dev/null
}

idle () {
    # Checks whether a snapmirror relationship is idle
    # Arguments: Target name (example: storage:/vol/volname/qtree)
    # Assumption: the opening lines of this function did not survive in this copy.
    # They are rebuilt after the sibling functions, matching on the "Idle" status
    # which 'snapmirror status' reports for a quiet relationship

    # Get the storage name out
    NETAPP=${1%%:*}
    test_connection $NETAPP #Verify this netapp is accessible
    ssh $NETAPP snapmirror status $1 | tail -1 | grep -w Idle &>/dev/null #Checks if the snapmirror is idle. If so, return true
    return $?
}

set_throttle () {
    # Sets throttle for target
    # Arguments: $1 Target name (example: storage:/vol/volname/qtree)
    # Arguments: $2 throttle value (number)

    # Get the storage name out
    NETAPP=${1%%:*}
    test_connection $NETAPP #Verify this netapp is accessible
    ssh $NETAPP snapmirror throttle $2 $1
}

get_lag () {
    # Gets the lag of snapmirror relationship in minutes
    # Arguments: Target name (example: storage:/vol/volname/qtree)

    # Get the storage name out
    NETAPP=${1%%:*}
    test_connection $NETAPP #Verify this netapp is accessible
    LAG=$(ssh $NETAPP snapmirror status $1 | tail -1 | awk '{print $4}')
    # LAG is in hh:mm:ss. We need to transfer it to minutes only
    H=$(echo $LAG | cut -f 1 -d :)
    M=$(echo $LAG | cut -f 2 -d :)
    # Force base 10, otherwise values such as 08 are treated as octal
    M=$(( 10#$M + 10#$H * 60 ))
    echo $M
}

check_size () {
    # Checks the size of the snapshot to copy (diff)
    # Arguments: Target name (example: storage:/vol/volname/qtree)

    # Get the storage name out
    NETAPP=${1%%:*}
    test_connection $NETAPP #Verify this netapp is accessible
    # Get source storage name and path
    SRC=$(ssh $NETAPP snapmirror status $1 | tail -1 | awk '{print $1}')
    # Get the source filer and vol name from that
    NETAPP=${SRC%%:*}
    SPATH=${SRC##*:}
    SPATH=$(echo $SPATH | sed 's|/vol/||')
    SPATH=${SPATH%%/*}
    test_connection $NETAPP # Verify the source NetApp is accessible
    SNAP=$(ssh $NETAPP snap list -n $SPATH | grep snapmirror | tail -1 | awk '{print $4}')
    DELTA=$(ssh $NETAPP snap delta $SPATH $SNAP | tail -2 | head -1 | awk '{print $5}')
    echo "Snap delta for $1 is $DELTA KB"
    LOG_TARGET=$(echo $1 | tr / _).txt
    [ -n "$LOG" ] && echo "$(date) $DELTA" >> $LOG_PREFIX/$LOG_TARGET
}

### MAIN ###
get_arguments "$@"
# Targets given on the command line take precedence over the configuration file
CLI_TGTS="$TGTS"
. $CONF &>/dev/null
[ -n "$CLI_TGTS" ] && TGTS="$CLI_TGTS"
# if e-mail is not set, don't try to send
[ -z "$EMAIL" ] && NOMAIL=1

[ -z "$TGTS" ] && abort "You need at least one snapmirror target"
case $MODE in
alert)  if [ "$REACT" == "1" ]
        then
            [ -z "$THROTTLE" ] && abort "When setting 'react' flag, you must specify throttle"
        fi
        for i in $TGTS
        do
            if ! idle $i
            then
                echo -n "$i is not idle. "
                [ "$NOMAIL" != "1" ] && notify "$i is not idle"
                if [ "$REACT" == "1" ]
                then
                    echo -n "We are set to react. Limiting throttle"
                    set_throttle $i $THROTTLE
                fi
                echo
            fi
        done
        ;;
throttle_limit) [ -z "$THROTTLE" ] && abort "Throttle requires throttle value"
        for i in $TGTS
        do
            echo "Setting throttle for $i to $THROTTLE"
            set_throttle $i $THROTTLE
        done
        ;;
throttle_unlimit)   for i in $TGTS
        do
            echo "Setting throttle for $i to unlimited"
            set_throttle $i 0
        done
        ;;
check_lag)  [ -z "$DELAY" ] && DELAY=1440
        for i in $TGTS
        do
            LAG=$(get_lag $i)
            if [ "$LAG" -gt "$DELAY" ]
            then
                echo "Failure: The delay for $i is $LAG minutes"
                [ "$NOMAIL" != "1" ] && notify "$i is lagged $LAG minutes, above the threshold $DELAY"
            else
                echo "Normal: The delay for $i is $LAG minutes"
            fi
        done
        ;;
check_size) for i in $TGTS
        do
            check_size $i
        done
        ;;
*)      echo "Option $MODE is not implemented yet"
        exit 0
        ;;
esac


### Microsoft Exchange, data replication

Sunday, November 27th, 2005

Here’s a little puzzle. If you had to replicate an MS Exchange DB from one machine to another, how would you go about it?

The scenario goes as follows: You have your own domain, and you use AD and MS Exchange as core services for the whole organization. AD supports built-in replication, so you could set up a second site, join a Windows server machine there to the domain, and, assuming you have some sort of link, keep a replica of your AD on the secondary site. If something happened to your primary site, your secondary site would hold a complete and correct replica of it.

Exchange doesn’t act that nice. Replicating MS Exchange is not simple, even assuming you have a method of transferring the data from one site to the other.

Two methods I can think of – The first would be to somehow proxy the information getting into it, and transport it (maybe rewrite the headers to point to a different recipient) to another site as well. In this case, you are actually agnostic to the store. You don’t care if you maintain your mailboxes on MS Exchange, Windows, Unix, or whatever system. You have two copies, and you’re fine with it. You should find some logical method to make your primary site’s mail server transport all mail, even internal mail, through this proxy mechanism.

The second method is a bit more interesting. If we could have a real-time replica of the primary site’s Exchange DB, we would be able to “mount” it on the secondary site. It is not trivial, but it can be done using some not-so-common software- or hardware-based solutions. However, the secondary site would require a few things, per this list:

1) Similarity:

I would strive for similarity between both sites. MS recommends similarity (when dealing with cold recovery), per this site, especially in the Exchange patch level, and to some extent in the Windows patch level. The more similar the sites are, the better our chances. So:

a. Similarity in patch level.

b. Similarity in Storage group settings, especially transaction logs’ location.

c. Similarity in the structure of each store under each storage group

d. Similarity in the absolute path of each store’s edb and stm location between source and target

2) Recoverability:

To make things recoverable, make sure that each store has, under the “Database” tab of its properties, the option which allows a restore to overwrite the DB enabled.

3) Flexibility:

You need to find some flexible script which will be able to change, as fast as possible, the pointers in your AD, to point to the new Exchange server, in the users’ Exchange attributes. I have such a script, but I cannot disclose it here.

Armed with these three, you can copy, transfer, replicate, or whatever, your DBs from one location to the other. Make sure you replicate the logs as well, else it’ll get messy, and will require tons of time for DB recovery.

To recover, you should only mount the stores. Assuming you have followed the prerequisites, you would be able to mount the stores in no time. Otherwise, you would need to run eseutil on the stores. It might get messy.

Afterwards, only one thing to do – mass change the attributes of the users in question to point to the alternate Exchange server, and the alternate Exchange store. Should work like a charm.