NetApp SnapMirror monitor script
I have been working with NetApp SnapMirror lately. I have mirrored some volumes and qtrees with it, and I wanted to monitor their usage and behavior over the line.
As you can expect, site-to-site replication of data is a fragile thing, especially when it is done at the level of the storage device, which is agnostic to the data kept on it. When replicating volumes, I have to rely on the relevant employees to be responsible about what they place there, because the storage does not filter out the junk. If someone decides to put a new DVD image on the DB storage space, well, the DB won’t care as long as there is enough free space, but the storage will attempt to replicate the added data to the alternate site. If you are already close to your bandwidth limit, which is never a good thing, you will create a replication lag you will hardly (if at all) be able to close.
For that, and since I don’t tend to trust people not to do stupid things, I have written this script.
What does it do?
This script will perform the following:
Alerting about non-idle SnapMirror session
Use with ‘-m alert’
Assuming SnapMirror is scheduled to run at a specific time, the script will alert if a session is active when the script runs. By default it sends an e-mail (if an address is configured, see the configuration section below); with the flag ‘-a no’ it will not. With ‘-r yes’ it will also react by setting a throttle for each non-idle session, in which case ‘-t VALUE’ must be specified, where VALUE is the numeric throttle in KB/s.
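To make "non-idle" a bit more concrete, here is a minimal sketch of the kind of check involved. The sample line and the status_is_idle helper are illustrative assumptions, not output captured from a real filer or code taken from the script. The column positions follow the ones the script itself relies on (source in column 1, lag in column 4), and the status is assumed to start at column 5.

```shell
#!/bin/bash
# Hypothetical helper: decide whether one line of 'snapmirror status' output
# describes an idle relationship. Assumes the status text starts at column 5.
status_is_idle () {
    local line="$1"
    local status
    status=$(echo "$line" | awk '{print $5}')
    [ "$status" = "Idle" ]
}

# Illustrative line, made up for this example
SAMPLE="storage1:/vol/volname/qtree storage2:/vol/volname/qtree Snapmirrored 00:05:30 Idle"

if status_is_idle "$SAMPLE"
then
    echo "idle"
else
    echo "not idle"
fi
```

Anything other than "Idle" in that column (a running transfer, for example) would count as an active session.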
Limiting throttle to a SnapMirror session
Use with ‘-m throttle_limit’
The script will set a throttle for the SnapMirror session(s). Set the limit with the flag ‘-t VALUE’, where VALUE is the numeric throttle in KB/s for each session.
Cancelling throttle limit
Use with ‘-m throttle_unlimit’
The script will set unlimited throttle for SnapMirror session(s).
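Both throttle modes end up running the same "snapmirror throttle" command over SSH on the target filer, with 0 used for "unlimited", as the script does. The sketch below only prints the command instead of running it, so it can be tried without a filer; the throttle_command name is hypothetical.

```shell
#!/bin/bash
# Hypothetical dry-run helper: print the command that would set the throttle
# for one target instead of running it.
# $1 - target (example: storage2:/vol/volname/qtree)
# $2 - throttle in KB/s; 0 removes the limit
throttle_command () {
    local target="$1"
    local throttle="$2"
    local filer="${target%%:*}"    # everything before the first colon
    echo "ssh $filer snapmirror throttle $throttle $target"
}

throttle_command storage2:/vol/volname/qtree 512
throttle_command storage3:volname2 0
```

Printing the command first is a cheap way to check which filer and which value a given run would touch.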
Checking SnapMirror lag
Use with ‘-m check_lag’
Since the purpose of replication is recovery, the lag of each SnapMirror session shows how far behind the source the replica is. Use with ‘-d VALUE’, where VALUE is the alert threshold in minutes. The default threshold is one day (1440 minutes).
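The lag arrives as hh:mm:ss, so the comparison against the threshold needs a conversion to minutes first. Here is a minimal sketch of that arithmetic. The helper names are hypothetical and the lag value is made up.

```shell
#!/bin/bash
# Hypothetical helper: convert a lag given as hh:mm:ss into whole minutes.
# The 10# prefix forces base 10, so values such as 08 or 09 are not
# rejected as invalid octal numbers.
lag_to_minutes () {
    local h m s
    IFS=: read -r h m s <<< "$1"
    echo $(( 10#$h * 60 + 10#$m ))
}

# Hypothetical helper: succeed when the lag exceeds the threshold in minutes.
# The threshold defaults to one day (1440 minutes).
lag_exceeds () {
    local minutes
    minutes=$(lag_to_minutes "$1")
    [ "$minutes" -gt "${2:-1440}" ]
}

lag_to_minutes 26:08:09    # prints 1568
if lag_exceeds 26:08:09
then
    echo "above the default threshold"
fi
```

Seconds are dropped on purpose, since the threshold is given in minutes.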
Checking snapshots size
Use with ‘-m check_size’
This reports the expected delta to transfer, which can help estimate the success or failure of a future data sync (snapmirror update) before it begins. Use the ‘-l’ flag to log the date/time of the measurement and the expected size into a file. By default the file is /tmp/target_name.txt, where target_name is the SnapMirror target with its slashes replaced by underscores.
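As a small illustration of the logging, the sketch below shows how a target name maps to a log file and what a logged line looks like. The helper names and the optional directory argument are hypothetical; the naming rule itself (slashes turned into underscores, ".txt" appended) is the one the script uses.

```shell
#!/bin/bash
# Hypothetical helper: build the log file path for a target, replacing every
# slash in the target name with an underscore.
log_file_for () {
    local target="$1"
    local prefix="${2:-/tmp}"
    echo "$prefix/$(echo "$target" | tr / _).txt"
}

# Hypothetical helper: append one measurement (date/time and size in KB)
log_delta () {
    local target="$1" delta_kb="$2" prefix="${3:-/tmp}"
    echo "$(date) $delta_kb" >> "$(log_file_for "$target" "$prefix")"
}

log_file_for storage2:/vol/volname/qtree    # prints /tmp/storage2:_vol_volname_qtree.txt
```

Because each run appends one line, the file can later be used to see how the delta grows between syncs.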
General Options
Use with ‘-c filename’ for alternate configuration file.
Use with ‘-h’ to get general help.
Pass a list of target names in the format storage:/vol/volname/qtree or storage:volname to ignore the targets in the configuration file and use your own.
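A quick way to tell the two target formats apart is sketched below. The target_type helper is hypothetical and recognizes only the two forms mentioned above; anything else is reported as invalid.

```shell
#!/bin/bash
# Hypothetical helper: classify a target given on the command line.
# Prints "qtree", "volume" or "invalid".
target_type () {
    case "$1" in
        *:/vol/*/*)   echo qtree ;;
        *:/*|*:|:*)   echo invalid ;;
        *:*)          echo volume ;;
        *)            echo invalid ;;
    esac
}

for t in storage2:/vol/volname/qtree storage3:volname2 volname_without_filer
do
    echo "$t -> $(target_type "$t")"
done
```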
Configuration File
The configuration file is rather simple. By default it should be called “/etc/snapmirror_monitor.conf”. It defines two variables:
TGTS="storage2:/vol/volname/qtree
storage3:volname2
storage1:/vol/volnew/qtr2"
EMAIL="[email protected] [email protected]"
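Since the configuration file is plain shell, it is loaded by sourcing it, and the targets are then split on whitespace. The sketch below writes an illustrative file to a temporary location and loads it; the target names and the address admin@example.com are placeholders.

```shell
#!/bin/bash
# Write an illustrative configuration file (placeholder names and address)
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
TGTS="storage2:/vol/volname/qtree
storage3:volname2"
EMAIL="admin@example.com"
EOF

# The file is plain shell, so it is loaded by sourcing it
. "$CONF"

# Unquoted on purpose: word splitting yields one target per iteration
for target in $TGTS
do
    echo "target: $target"
done
echo "mail to: $EMAIL"
rm -f "$CONF"
```

Targets can be separated by newlines or spaces, as long as the whole list sits inside one pair of straight double quotes.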
Prerequisites
This script will run on any modern Linux machine. For it to communicate with the NetApp devices, you will need SSH enabled on the NetApps, and SSH key exchange so that the Linux machine can access the NetApp without a password.
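Before scheduling the script, it may be worth checking that a password-less login really works. The sketch below is one way to do it, under a few assumptions: the can_login name and the SSH_CMD variable are hypothetical (SSH_CMD only exists so the check can be pointed at another binary), and "hostname" is used as the remote command because the script's own connection test uses it.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if a non-interactive SSH login works.
# BatchMode=yes makes ssh fail instead of prompting for a password, which is
# what an unattended run needs. SSH_CMD defaults to plain ssh.
can_login () {
    ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=2 "$1" hostname >/dev/null 2>&1
}

# Usage (storage2 is a placeholder filer name):
#   can_login storage2 || echo "Key-based SSH to storage2 is not working"
```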
The Script
Below is the script. You can download it and use it as you like.
#!/bin/bash
# This script will monitor snapmirror status
# Assumption: Access through ssh to root on all storage devices involved
# This will also attempt to detect the diff which is to sync
# Written by Ez-Aton. Check http://run.tournament.org.il for updates or
# additional information

# Modes:
# alert -> alert if snapmirror is still active
# throttle_limit -> Limit throttle to a given number (default or manually set)
# throttle_unlimit -> Open throttle limitation
# check_lag -> Report the snapmirror lag
# check_size -> Report the estimated data size to move

# Global variables
CONF=/etc/snapmirror_monitor.conf
LOG_PREFIX=/tmp

test_connection () {
    # Test to see that you can access the storage device
    # Arguments: NetApp name
    SSH_OPTS="-o ConnectTimeout=2"
    if ! ssh $SSH_OPTS $1 hostname &>/dev/null
    then
        echo "Cannot communicate via SSH to $1"
        exit 1
    fi
}

abort () {
    # Exit with a predefined error message
    echo $*
    exit 1
}

get_arguments () {
    # Get all arguments and define options
    # Argument: $@
    [ -z "$1" ] && set -- -h
    while [ -n "$1" ]
    do
        case "$1" in
            -m) shift
                case "$1" in
                    alert|throttle_limit|throttle_unlimit|check_lag|check_size)
                        MODE=$1
                        ;;
                    *)  abort "Mode is mandatory. Use -h flag to get list of available flags"
                        ;;
                esac
                ;;
            -a) shift
                case "$1" in
                    [nN][oO]) NOMAIL=1 ;;
                    *) NOMAIL=0 ;;
                esac
                ;;
            -r) shift
                case "$1" in
                    [yY][eE][sS]) REACT=1 ;;
                    *) REACT=0 ;;
                esac
                ;;
            -d) shift
                declare -i DELAY_TMP
                DELAY_TMP=$1
                [ "$DELAY_TMP" != "$1" ] && abort "Delay needs to be a number in minutes"
                DELAY=$DELAY_TMP
                ;;
            -t) shift
                declare -i THROTTLE_TMP
                THROTTLE_TMP=$1
                [ "$THROTTLE_TMP" != "$1" ] && abort "Throttle needs to be a number"
                THROTTLE=$THROTTLE_TMP
                ;;
            -c) shift
                [ -f "$1" ] || abort "Cannot find specified conf file"
                CONF="$1"
                ;;
            -l) LOG=1
                ;;
            -h) echo "Usage: $0 -m [alert|throttle_limit|throttle_unlimit|check_lag|check_size] (-c CONF_FILE) [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                echo "Alert if SnapMirror is still running: $0 -m alert [-a no] (-r yes) [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                echo "Alert and throttle (react): $0 -m alert [-a no] -r yes -t [throttle_in_kb] [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                echo "Throttle a running SnapMirror: $0 -m throttle_limit -t throttle_in_kb [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                echo "Unlimit SnapMirror throttle: $0 -m throttle_unlimit [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                echo "To check lag: $0 -m check_lag -d delay_in_minutes (-a no) [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                echo "To check delta: $0 -m check_size [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                exit 0
                ;;
            *)  [ -z "$MODE" ] && abort "$0 mode required"
                # Everything left on the command line is a target.
                # Stop parsing here so the list is not shortened by further shifts
                CLI_TGTS="$*"
                break
                ;;
        esac
        shift
    done
}

notify () {
    # Send an e-mail notification
    # Arguments: $@ - the subject
    # Contents are empty
    # And yes - one e-mail per event
    mail -s "$*" $EMAIL < /dev/null
}

idle () {
    # Arguments: Target name (example: storage:/vol/volname/qtree)
    # NOTE: only the last lines of this function survived in the post; the
    # lines up to and including the ssh command are a reconstruction,
    # assumed to follow the same pattern as get_lag
    # Get the storage name out
    NETAPP=${1%%:*}
    test_connection $NETAPP #Verify this netapp is accessible
    ssh $NETAPP snapmirror status $1 | tail -1 | grep -i idle &>/dev/null
    #Checks if the snapmirror is idle. If so, return true
    return $?
}

set_throttle () {
    # Sets throttle for target
    # Arguments: $1 Target name (example: storage:/vol/volname/qtree)
    # Arguments: $2 throttle value (number)
    # Get the storage name out
    NETAPP=${1%%:*}
    test_connection $NETAPP #Verify this netapp is accessible
    ssh $NETAPP snapmirror throttle $2 $1
}

get_lag () {
    # Gets the lag of snapmirror relationship in minutes
    # Arguments: Target name (example: storage:/vol/volname/qtree)
    # Get the storage name out
    NETAPP=${1%%:*}
    test_connection $NETAPP #Verify this netapp is accessible
    LAG=`ssh $NETAPP snapmirror status $1 | tail -1 | awk '{print $4}'`
    # LAG is in hh:mm:ss. We need to transfer it to minutes only
    H=`echo $LAG | cut -f 1 -d :`
    M=`echo $LAG | cut -f 2 -d :`
    # Force base 10 so that values such as 08 are not read as octal
    M=$(( 10#$M + 10#$H * 60 ))
    echo $M
}

check_size () {
    # Checks the size of the snapshot to copy (diff)
    # Arguments: Target name (example: storage:/vol/volname/qtree)
    # Get the storage name out
    NETAPP=${1%%:*}
    test_connection $NETAPP #Verify this netapp is accessible
    # Get source storage name and path
    SRC=`ssh $NETAPP snapmirror status $1 | tail -1 | awk '{print $1}'`
    # Get the source filer and vol name from that
    NETAPP=${SRC%%:*}
    SPATH=${SRC##*:}
    SPATH=`echo $SPATH | sed 's|/vol/||'`
    SPATH=${SPATH%%/*}
    test_connection $NETAPP # Verify the source NetApp is accessible
    SNAP=`ssh $NETAPP snap list -n $SPATH | grep snapmirror | tail -1 | awk '{print $4}'`
    DELTA=`ssh $NETAPP snap delta $SPATH $SNAP | tail -2 | head -1 | awk '{print $5}'`
    echo "Snap delta for $1 is $DELTA KB"
    LOG_TARGET=`echo $1 | tr / _`.txt
    [ -n "$LOG" ] && echo "`date` $DELTA" >> $LOG_PREFIX/$LOG_TARGET
}

### MAIN ###
get_arguments "$@"
. $CONF &>/dev/null
# Targets given on the command line override those in the configuration file
[ -n "$CLI_TGTS" ] && TGTS="$CLI_TGTS"
# if e-mail is not set, don't try to send
[ -z "$EMAIL" ] && NOMAIL=1
[ -z "$TGTS" ] && abort "You need at least one snapmirror target"

case $MODE in
    alert)
        if [ "$REACT" == "1" ]
        then
            [ -z "$THROTTLE" ] && abort "When setting 'react' flag, you must specify throttle"
        fi
        for i in $TGTS
        do
            if ! idle $i
            then
                echo -n "$i is not idle. "
                [ "$NOMAIL" != "1" ] && notify "$i is not idle"
                if [ "$REACT" == "1" ]
                then
                    echo -n "We are set to react. Limiting throttle"
                    set_throttle $i $THROTTLE
                fi
                echo
            fi
        done
        ;;
    throttle_limit)
        [ -z "$THROTTLE" ] && abort "Throttle requires throttle value"
        for i in $TGTS
        do
            echo "Setting throttle for $i to $THROTTLE"
            set_throttle $i $THROTTLE
        done
        ;;
    throttle_unlimit)
        for i in $TGTS
        do
            echo "Setting throttle for $i to unlimited"
            set_throttle $i 0
        done
        ;;
    check_lag)
        [ -z "$DELAY" ] && DELAY=1440
        for i in $TGTS
        do
            LAG=`get_lag $i`
            if [ "$LAG" -gt "$DELAY" ]
            then
                echo "Failure: The delay for $i is $LAG minutes"
                [ "$NOMAIL" != "1" ] && notify "$i is lagged $LAG minutes, above the threshold $DELAY"
            else
                echo "Normal: The delay for $i is $LAG minutes"
            fi
        done
        ;;
    check_size)
        for i in $TGTS
        do
            check_size $i
        done
        ;;
    *)
        echo "Option $MODE is not implemented yet"
        exit 0
        ;;
esac
I am trying to download the snapshot_mirror.sh script, with no success.
Could you please update/provide the path to this script?
thanks,
alexf
Weird. Probably a problem with the plugin.
Use the little down-facing arrow to the right of the snapshot_mirror.sh header in the post, and you will see the entire script. Just copy/paste its content. I will try to see how to solve it.
Thanks for letting me know about this problem!
Ez
#!/bin/bash
##############################
# Written by Arkadi Landes
# 07/2013
##############################
# GLOBAL Variables
EMAIL="[email protected]"
MAX_DELAY_MINUTES=600
EHTRIX_VOL_NUMBER=17
SSH_SERVER=bzika
DEST_NETAPP=nalab01
EMAIL_TMP_FILE=/tmp/check_ethrix.tmp
# FUNCTIONS
get_lag_for_volume () {
    vol=$1
    #echo "Calculating lag for Volume $vol"
    snap_time=`grep $vol /tmp/ethrix.txt | awk '{print $2}'`
    H=`echo $snap_time | cut -f1 -d":"`
    M=`echo $snap_time | cut -f2 -d":"`
    # Force base 10 so that values such as 08 are not read as octal
    TOTAL_LAG_MINUTES=$(( 10#$H * 60 + 10#$M ))
    echo -e "The lag for $vol is $TOTAL_LAG_MINUTES minutes\n"
}
##############################
# MAIN
##############################
# Empty email tmp file
:>$EMAIL_TMP_FILE
:>/tmp/ethrix.txt
echo -e "Welcome\n"
echo "Checking snapmirror status in ${DEST_NETAPP} (thru ${SSH_SERVER})"
ssh ${SSH_SERVER} ssh ${DEST_NETAPP} snapmirror status | grep -i ethrix | awk -F" " '{print $1"\t"$4}' > /tmp/ethrix.txt
# Check if all the volumes are in the output
if [ `wc -l /tmp/ethrix.txt | awk '{print $1}'` -ne $EHTRIX_VOL_NUMBER ]; then
    echo "Some volumes are missing ..."
    exit 1
fi
for volume in `cat /tmp/ethrix.txt | awk '{print $1}'`
do
    get_lag_for_volume $volume
    if [[ $TOTAL_LAG_MINUTES -gt $MAX_DELAY_MINUTES ]]
    then # LAG is LARGER than the MAX value
        echo "ERROR in volume $volume"
        echo -e "Lag in volume $volume is $TOTAL_LAG_MINUTES. It is larger than the MAX value ($MAX_DELAY_MINUTES) \n" >> ${EMAIL_TMP_FILE}
    else # Lag is Smaller - Everything is OK
        echo "Lag for volume $volume is OK..."
    fi
done
# Send error report
if [ `wc -l ${EMAIL_TMP_FILE} | awk '{print $1}'` -gt 0 ]; then
    cat ${EMAIL_TMP_FILE} | mutt -s "Ethrix Snapmirror Error" $EMAIL
fi
Thanks Arkady! I will try to implement it and use it. Thanks for sharing!
Ez