Posts Tagged ‘replication’

NetApp SnapMirror monitor script

Sunday, December 13th, 2009

I have had some work done lately with NetApp SnapMirror. I have snapped-mirrored some volumes and qtrees and I wanted to monitor their use and behavior over the line.

As you can expect, site-to-site replication of data is a fragile thing, especially when done on the level of the storage device, which is agnostic to the data kept on it. When replicating volumes, I should expect the relevant employees to be responsible regarding what’s placed there, because the storage does not filter out the junk. If someone had decided to add a new DVD image on the DB storage space, well – the DB won’t care, as long as there is enough free space, but the storage will attempt to replicate the added data to the alternate site, which means that if you are around your bandwidth limits, which is never a good thing, you will just create a delay gap you would hardly (if at all) be able to close.

For that, and since I don’t tend to trust people not to do stupid things, I have written this script.

What does it do?

This script will perform the following:

Alerting about non-idle SnapMirror session

Use with ‘-m alert’

Assuming SnapMirror is scheduled to a specific time, the script will alert if a session is active. With the flag ‘-a no’, it will not send an e-mail (if possible, see the configuration section below). With ‘-r yes’, it will react, setting throttle for each non-idle session, but then ‘-t VALUE’ should be specified, where VALUE is the numeric throttle in KB/s.

Limiting throttle to a SnapMirror session

Use with ‘-m throttle_limit’

The script will set a throttle for SnapMirror session(s). Setting limit by the flag ‘-t VALUE’, where VALUE is the numeric throttle in KB/s per each session.

Cancelling throttle limit

Use with ‘-m throttle_unlimit’

The script will set unlimited throttle for SnapMirror session(s).

Checking SnapMirror lag

Use with the ‘-m check_lag’

Since replication has a purpose of recovering, the lag of each SnapMirror session would show how far back we are. Use with ‘-d VALUE’, VALUE being numeric time in minutes to set alert threshold. The default threshold delay is one day (1440 minutes).

Checking snapshots size

Use with the ‘-m check_size’

This reports the expected delta to transfer. This can help estimate the success or failure of a future sync of data (snapmirror update) before it begins. Use with ‘-l’ flag to set it to log date/time of measure and the expected sizes into a file. By default, in /tmp/target_name.txt, where the target is the SnapMirror target.

General Options

Use with ‘-c filename’ for alternate configuration file.

Use with ‘-h’ to get general help.

Use with a list target names in the format of storage:/vol/volname/qtree or storage:volname to ignore targets in configuration file and use your own.

Configuration File

The configuration file is rather simple. By default it should be called “/etc/snapmirror_monitor.conf“. It consists of two main variables for the system:




EMAIL=”[email protected] [email protected]


This script will run on any modern Linux machine. For it to communicate with the NetApp devices, you will need SSH enabled on the NetApps, and ssh key exchange so that the Linux would be able to access the NetApp without using passwords.

The Script

Below is the script. You can download it and use it as you like.

# This script will monitor snapmirror status
# Assumption: Access through ssh to root on all storage devices involved
# This will also attempt to detect the diff which is to sync

# Written by Ez-Aton. Check for updates or
# additional information

# Modes: 
# alert -> alert if snapmirror is still active
# throttle_limit -> Limit throttle to a given number (default or manually set)
# throttle_unlimit -> Open throttle limitation
# check_lag -> Report the snapmirror lage
# check_size -> Report the estimated data size to move

# Global variables

test_connection () {
        # Test to see that you can access the storage device
        # Arguments: NetApp name
        SSH_OPTS="-o ConnectTimeout=2"
        if ! ssh $SSH_OPTS $1 hostname &>/dev/null
                echo "Cannot communicate via SSH to $1"
                exit 1

abort () {
        # Exit with a predefined error message
        echo $*
        exit 1

get_arguments () {
        # Get all arguments and define options
        # Argument: [email protected]
        [ -z "$1" ] && set -- -h
        while [ -n "$1" ]
                case "$1" in
                        -m)     shift
                                case "$1" in
                                        alert|throttle_limit|throttle_unlimit|check_lag|check_size)     MODE=$1
                                        *)      abort "Mode is mandatory. Use -h flag to get list of avialable flags"
                        -a)     shift
                                case "$1" in
                                        [nN][oO])       NOMAIL=1
                                        *)              NOMAIL=0
                        -r)     shift
                                case "$1" in
                                        [yY][eE][sS])   REACT=1
                                        *)              REACT=0
                        -d)     shift
                                declare -i DELAY_TMP
                                [ "$DELAY_TMP" != "$1" ] && abort "Delay needs to be a number in minutes"
                        -t)     shift
                                declare -i THROTTLE_TMP
                                [ "$THROTTLE_TMP" != "$1" ] && abort "Throttle needs to be a number"
                        -c)     shift
                                [ -f "$1" ] || abort "Cannot find specified conf file"
                        -l)     LOG=1
                        -h)     echo "Usage: $0 -m [alert|throttle_limit|throttle_unlimit|check_lag|check_size] (-c CONF_FILE) [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "Alert if SnapMirror is still running: $0 -m alert [-a no] (-r yes) [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "Alert and throttle (react): $0 -m alert [-a no] -r yes -t [throttle_in_kb] [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "Throttle a running SnapMirror: $0 -m throttle_limit -t throttle_in_kb [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "Unlimit SnapMirror throttle: $0 -m throttle_unlimit [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "To check lag: $0 -m check_lag -d delay_in_minutes (-a no) [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "To check delta: $0 -m check_size [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                exit 0
                        *)      [ -z "$MODE" ] && abort "$0 mode required"

notify () {
        # Send an e-mail notification
        # Arguments: [email protected] - the subject
        # Contents are empty
        # And yes - one e-mail per event
        mail -s "[email protected]" $EMAIL /dev/null #Checks if the snapmirror is idle. If so, return true
        return $?

set_throttle () {
        # Sets throttle for target
        # Arguments: $1 Target name (example: storage:/vol/volname/qtree)
        # Arguments: $2 throttle value (number)

        # Get the storage name out
        test_connection $NETAPP #Verify this netapp is accessible
        ssh $NETAPP snapmirror throttle $2 $1

get_lag () {
        # Gets the lag of snapmirror relationship in minutes
        # Arguments: Target name (example: storage:/vol/volname/qtree)

        # Get the storage name out
        test_connection $NETAPP #Verify this netapp is accessible
        LAG=`ssh $NETAPP snapmirror status $1 | tail -1 | awk '{print $4}'`
        # LAG is in hh:mm:ss. We need to transfer it to minutes only
        H=`echo $LAG | cut -f 1 -d :`
        M=`echo $LAG | cut -f 2 -d :`
        let M=$M+$H*60
        echo $M

check_size () {
        # Checks the size of the snapshot to copy (diff)
        # Arguments: Target name (example: storage:/vol/volname/qtree)

        # Get the storage name out
        test_connection $NETAPP #Verify this netapp is accessible
        # Get source storage name and path
        SRC=`ssh $NETAPP snapmirror status $1 | tail -1 | awk '{print $1}'`
        # Get the source filer and vol name from that
        SPATH=`echo $SPATH | sed s/'/vol/'//`

        test_connection $NETAPP # Verify the target NetApp is accessible
        SNAP=`ssh $NETAPP snap list -n $SPATH | grep snapmirror | tail -1 | awk '{print $4}'`
        DELTA=`ssh $NETAPP snap delta $SPATH $SNAP | tail -2 | head -1 | awk '{print $5}'`
        echo "Snap delta for $1 is $DELTA KB"  
        LOG_TARGET=`echo $1 | tr / _`.txt
        [ -n "$LOG" ] && echo "`date` $DELTA" >> $LOG_PREFIX/$LOG_TARGET

### MAIN ###
get_arguments [email protected]
. $CONF &>/dev/null
# if e-mail is not set, don't try to send
[ -z "$EMAIL" ] && NOMAIL=1

[ -z "$TGTS" ] && abort "You need at least one snapmirror target"

case $MODE in
        alert)  if [ "$REACT" == "1" ]
                        [ -z "$THROTTLE" ] && abort "When setting 'react' flag, you must specify throttle"
                for i in $TGTS
                        if ! idle $i
                                echo -n "$i is not idle. "
                                [ "$NOMAIL" != "1" ] && notify "$i is not idle"
                                if [ "$REACT" == "1" ]
                                        echo -n "We are set to react. Limiting throttle"
                                        set_throttle $i $THROTTLE
        throttle_limit) [ -z "$THROTTLE" ] && abort "Throttle requires throttle value"
                        for i in $TGTS
                                echo "Setting throttle for $i to $THROTTLE"
                                set_throttle $i $THROTTLE
        throttle_unlimit)       for i in $TGTS
                                        echo "Setting throttle for $i to unlimited"
                                        set_throttle $i 0
        check_lag)      [ -z "$DELAY" ] && DELAY=1440
                        for i in $TGTS
                                LAG=`get_lag $i`
                                if [ "$LAG" -gt "$DELAY" ]
                                        echo "Failure: The delay for $i is $LAG minutes"
                                        [ "$NOMAIL" != "1" ] && notify "$i is lagged $LAG minutes, above the threshold $DELAY"
                                        echo "Normal: The delay for $i is $LAG minutes"
        check_size)     for i in $TGTS
                                check_size $i
        *)      echo "Option $MODE is not implemented yet"
                exit 0

Microsoft Exchange, data replication

Sunday, November 27th, 2005

Here’s a little issue. If you were to replicate MS Exchange DB from one machine to another, how/what would you have done?

The scenario goes as follows: You have your own domain, and you use, for your own core services AD and MS Exchange for the whole organization. While AD supports some built-in replication, so you could hold a second
site and join a Windows server machine into this domain, and assuming you have some sort of link, you would have a replica of your AD on the secondary site. In case something happened to your primary site, your secondary site would have a complete and correct replica of your primary site.

Exchange doesn’t act that nice. Replicating MS Exchange is not simple, even assuming you have a method of transferring the data from one site to the other.

Two methods I can think of – The first would be to somehow proxy the information getting into it, and transport it (maybe rewrite the headers to point to a different recipient) to another site as well. In this case, you are actually agnostic to the store. You don’t care if you maintain your mailboxes on MS Exchange, Windows, Unix, or whatever system. You have two copies, and you’re fine with it. You should find some logical method to make your primary site’s mail server transport all mail, even internal mail, through this proxy mechanism.

The second method is a bit more interesting. If we could have had a real-time replica of the primary site’s Exchange DB, we would have been able to “mount” it on the secondary site. It is not trivial, but it can be done using some
not-so-common software or hardware based solutions. However, the secondary site would require few things, per this list:

1) Similarity:

I would strive to similarity between both sites. MS recommend similarity (when dealing with cold-recover), per this site, especially with the Exchange patch level, and somewhat with Windows patch level. The more similar, the better chances we’ll have. So:

a. Similarity in patch level.

b. Similarity in Storage group settings, especially transaction logs’ location.

c. Similarity in the structure of each store under each storage group

d. Similarity in the absolute path of each store’s edb and stm location between source and target

2) Recoverability:

To make things recoverable, you should make sure each store has, under the “Database” tab, under properties, the option which will allow restore to overwrite the DB.

3) Flexibility:

You need to find some flexible script which will be able to change, as fast as possible, the pointers in your AD, to point to the new Exchange server, in the users’ Exchange attributes. I have such a script, but I cannot disclose it here.

Armed with these three, you can copy, transfer, replicate, or whatever, your DBs from one location to the other. Make sure you replicate the logs as well, else it’ll get messy, and will require tons of time for DB recovery.

To recover, you should only mount the stores. Assuming you have followed the prerequisites, you would be able to mount the stores in no time. Otherwise, you would need to run eseutil on the stores. It might get messy.

Afterwards, only one thing to do – mass change the attributes of the users in question to point to the alternate Exchange server, and the alternate Exchange store. Should work like a charm.