NetApp SnapMirror monitor script

I have been working with NetApp SnapMirror lately. I have set up SnapMirror replication for some volumes and qtrees, and I wanted to monitor their usage and behavior over the line.

As you can expect, site-to-site replication of data is a fragile thing, especially when done at the level of the storage device, which is agnostic to the data kept on it. When replicating volumes, I have to rely on the relevant employees to be responsible about what is placed there, because the storage does not filter out the junk. If someone decides to put a new DVD image on the DB storage space, well – the DB won’t care, as long as there is enough free space, but the storage will attempt to replicate the added data to the alternate site. If you are already close to your bandwidth limit, which is never a good thing, you will create a lag you will hardly (if at all) be able to close.

For that, and since I don’t tend to trust people not to do stupid things, I have written this script.

What does it do?

This script will perform the following:

Alerting about non-idle SnapMirror session

Use with ‘-m alert’

Assuming SnapMirror is scheduled to run at a specific time, the script will alert if a session is still active when it runs. By default it sends an e-mail (if EMAIL is configured, see the configuration section below); with the flag ‘-a no’ it will not. With ‘-r yes’ it will react by setting a throttle for each non-idle session, in which case ‘-t VALUE’ must be specified, where VALUE is the numeric throttle in KB/s.

Limiting throttle to a SnapMirror session

Use with ‘-m throttle_limit’

The script will set a throttle for SnapMirror session(s). Set the limit with the flag ‘-t VALUE’, where VALUE is the numeric throttle in KB/s for each session.

Cancelling throttle limit

Use with ‘-m throttle_unlimit’

The script will set unlimited throttle for SnapMirror session(s).

Checking SnapMirror lag

Use with ‘-m check_lag’

Since the purpose of replication is recovery, the lag of each SnapMirror session shows how far behind the replica is. Use with ‘-d VALUE’, where VALUE is the alert threshold in minutes. The default threshold is one day (1440 minutes).
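As a rough illustration of the threshold check: SnapMirror reports the lag as hh:mm:ss, which has to be reduced to minutes before it can be compared with the ‘-d’ value. The sketch below is a standalone helper, not a copy of the script’s own code; the function name lag_to_minutes is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper (the name is illustrative): convert a SnapMirror lag
# string in hh:mm:ss format to whole minutes, dropping the seconds.
lag_to_minutes () {
        local lag=$1 h m
        h=${lag%%:*}            # everything before the first colon
        m=${lag#*:}             # drop the hours...
        m=${m%%:*}              # ...and keep only the minutes field
        # 10# forces base 10, so fields such as "08" are not read as octal
        echo $(( 10#$h * 60 + 10#$m ))
}

# Compare against a threshold, the way '-d VALUE' is meant to be used
DELAY=1440
LAG=$(lag_to_minutes "25:08:13")
if [ "$LAG" -gt "$DELAY" ]
then
        echo "Lag of $LAG minutes is above the threshold of $DELAY"
else
        echo "Lag of $LAG minutes is within the threshold of $DELAY"
fi
```

With the sample value, 25 hours and 8 minutes come out as 1508 minutes, which is above the default one-day threshold.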

Checking snapshots size

Use with ‘-m check_size’

This reports the expected delta to transfer. This can help estimate the success or failure of a future sync of data (snapmirror update) before it begins. Use the ‘-l’ flag to log the date/time of the measurement and the expected size to a file, by default /tmp/target_name.txt, where target_name is the SnapMirror target (with ‘/’ replaced by ‘_’).

General Options

Use with ‘-c filename’ for alternate configuration file.

Use with ‘-h’ to get general help.

Append a list of target names in the format storage:/vol/volname/qtree or storage:volname to ignore the targets in the configuration file and use your own.

Configuration File

The configuration file is rather simple. By default it should be called “/etc/snapmirror_monitor.conf”. It consists of two main variables for the system:

TGTS="storage2:/vol/volname/qtree
storage3:volname2
storage1:/vol/volnew/qtr2"

EMAIL="user@domain.com another_user@domain.com"
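To make the target format a bit more concrete, here is a small sketch of how an entry such as storage:/vol/volname/qtree breaks down into a filer name and a volume name. The helper names target_filer and target_volume are made up for this example; the script itself does the same kind of splitting inline.

```shell
#!/bin/bash
# Illustrative helpers (names are made up for this sketch) showing how a
# target such as storage:/vol/volname/qtree breaks down into its parts.
target_filer () {
        echo "${1%%:*}"                 # text before the first colon
}

target_volume () {
        local path=${1#*:}              # text after the first colon
        path=${path#/vol/}              # strip a leading /vol/ if present
        echo "${path%%/*}"              # first path component is the volume
}

TGTS="storage2:/vol/volname/qtree
storage3:volname2"
for t in $TGTS
do
        echo "$t -> filer $(target_filer "$t"), volume $(target_volume "$t")"
done
```

Both forms of target (qtree path and plain volume name) end up with the same two pieces of information: which NetApp to ssh into, and which volume to ask about.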

Prerequisites

This script will run on any modern Linux machine. For it to communicate with the NetApp devices, you will need SSH enabled on the NetApps, and an SSH key exchange so that the Linux machine can access the NetApp without a password.

The Script

Below is the script. You can download it and use it as you like.

#!/bin/bash
# This script will monitor snapmirror status
# Assumption: Access through ssh to root on all storage devices involved
# This will also attempt to detect the diff which is to sync
 
# Written by Ez-Aton. Check http://run.tournament.org.il for updates or
# additional information
 
# Modes: 
# alert -> alert if snapmirror is still active
# throttle_limit -> Limit throttle to a given number (default or manually set)
# throttle_unlimit -> Open throttle limitation
# check_lag -> Report the snapmirror lag
# check_size -> Report the estimated data size to move
 
# Global variables
CONF=/etc/snapmirror_monitor.conf
LOG_PREFIX=/tmp
 
test_connection () {
        # Test to see that you can access the storage device
        # Arguments: NetApp name
        SSH_OPTS="-o ConnectTimeout=2"
        if ! ssh $SSH_OPTS $1 hostname &>/dev/null
        then
                echo "Cannot communicate via SSH to $1"
                exit 1
        fi
}
 
abort () {
        # Exit with a predefined error message
        echo $*
        exit 1
}
 
get_arguments () {
        # Get all arguments and define options
        # Argument: $@
        [ -z "$1" ] && set -- -h
        while [ -n "$1" ]
        do
                case "$1" in
                        -m)     shift
                                case "$1" in
                                        alert|throttle_limit|throttle_unlimit|check_lag|check_size)     MODE=$1
                                        ;;
                                        *)      abort "Mode is mandatory. Use -h flag to get list of available flags"
                                        ;;
                                esac
                                ;;
                        -a)     shift
                                case "$1" in
                                        [nN][oO])       NOMAIL=1
                                                        ;;
                                        *)              NOMAIL=0
                                                        ;;
                                esac
                                ;;
                        -r)     shift
                                case "$1" in
                                        [yY][eE][sS])   REACT=1
                                                        ;;
                                        *)              REACT=0
                                                        ;;
                                esac
                                ;;
                        -d)     shift
                                declare -i DELAY_TMP
                                DELAY_TMP=$1
                                [ "$DELAY_TMP" != "$1" ] && abort "Delay needs to be a number in minutes"
                                DELAY=$DELAY_TMP
                                ;;
                        -t)     shift
                                declare -i THROTTLE_TMP
                                THROTTLE_TMP=$1
                                [ "$THROTTLE_TMP" != "$1" ] && abort "Throttle needs to be a number"
                                THROTTLE=$THROTTLE_TMP
                                ;;
                        -c)     shift
                                [ -f "$1" ] || abort "Cannot find specified conf file"
                                CONF="$1"
                                ;;
                        -l)     LOG=1
                                ;;
                        -h)     echo "Usage: $0 -m [alert|throttle_limit|throttle_unlimit|check_lag|check_size] (-c CONF_FILE) [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "Alert if SnapMirror is still running: $0 -m alert [-a no] (-r yes) [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "Alert and throttle (react): $0 -m alert [-a no] -r yes -t [throttle_in_kb] [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "Throttle a running SnapMirror: $0 -m throttle_limit -t throttle_in_kb [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "Unlimit SnapMirror throttle: $0 -m throttle_unlimit [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "To check lag: $0 -m check_lag -d delay_in_minutes (-a no) [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "To check delta: $0 -m check_size [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                exit 0
                                ;;
                        *)      [ -z "$MODE" ] && abort "$0 mode required"
                                # Everything left on the command line is the target list
                                TGTS="$*"
                                break
                                ;;
                esac
                shift
        done
}
 
notify () {
        # Send an e-mail notification
        # Arguments: $@ - the subject
        # Contents are empty
        # And yes - one e-mail per event
        mail -s "$@" $EMAIL < /dev/null
}
 
idle () {
        # Check if transaction is idle
        # Arguments: Target name (example: storage:/vol/volname/qtree)
 
        # Get the storage name out
        NETAPP=${1%%:*}
        test_connection $NETAPP #Verify this netapp is accessible
        ssh $NETAPP snapmirror status $1 | tail -1 | grep Idle$ &>/dev/null #Checks if the snapmirror is idle. If so, return true
        return $?
}
 
set_throttle () {
        # Sets throttle for target
        # Arguments: $1 Target name (example: storage:/vol/volname/qtree)
        # Arguments: $2 throttle value (number)
 
        # Get the storage name out
        NETAPP=${1%%:*}
        test_connection $NETAPP #Verify this netapp is accessible
        ssh $NETAPP snapmirror throttle $2 $1
}
 
get_lag () {
        # Gets the lag of snapmirror relationship in minutes
        # Arguments: Target name (example: storage:/vol/volname/qtree)
 
        # Get the storage name out
        NETAPP=${1%%:*}
        test_connection $NETAPP #Verify this netapp is accessible
        LAG=`ssh $NETAPP snapmirror status $1 | tail -1 | awk '{print $4}'`
        # LAG is in hh:mm:ss. We need to transfer it to minutes only
        H=`echo $LAG | cut -f 1 -d :`
        M=`echo $LAG | cut -f 2 -d :`
        # Force base 10, otherwise values such as 08 or 09 are rejected as invalid octal
        M=$(( 10#$M + 10#$H * 60 ))
        echo $M
}
 
check_size () {
        # Checks the size of the snapshot to copy (diff)
        # Arguments: Target name (example: storage:/vol/volname/qtree)
 
        # Get the storage name out
        NETAPP=${1%%:*}
        test_connection $NETAPP #Verify this netapp is accessible
        # Get source storage name and path
        SRC=`ssh $NETAPP snapmirror status $1 | tail -1 | awk '{print $1}'`
        # Get the source filer and vol name from that
        NETAPP=${SRC%%:*}
        SPATH=${SRC##*:}
        SPATH=`echo $SPATH | sed s/'\/vol\/'//`
        SPATH=${SPATH%%/*}
 
        test_connection $NETAPP # Verify the source NetApp is accessible
        SNAP=`ssh $NETAPP snap list -n $SPATH | grep snapmirror | tail -1 | awk '{print $4}'`
        DELTA=`ssh $NETAPP snap delta $SPATH $SNAP | tail -2 | head -1 | awk '{print $5}'`
        echo "Snap delta for $1 is $DELTA KB"  
        LOG_TARGET=`echo $1 | tr / _`.txt
        [ -n "$LOG" ] && echo "`date` $DELTA" >> $LOG_PREFIX/$LOG_TARGET
}
 
 
### MAIN ###
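get_arguments "$@"
# Targets given on the command line override those in the configuration file
CLI_TGTS="$TGTS"
. $CONF &>/dev/null
[ -n "$CLI_TGTS" ] && TGTS="$CLI_TGTS"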
# if e-mail is not set, don't try to send
[ -z "$EMAIL" ] && NOMAIL=1
 
[ -z "$TGTS" ] && abort "You need at least one snapmirror target"
 
case $MODE in
        alert)  if [ "$REACT" == "1" ]
                then
                        [ -z "$THROTTLE" ] && abort "When setting 'react' flag, you must specify throttle"
                fi
                for i in $TGTS
                do
                        if ! idle $i
                        then
                                echo -n "$i is not idle. "
                                [ "$NOMAIL" != "1" ] && notify "$i is not idle"
                                if [ "$REACT" == "1" ]
                                then
                                        echo -n "We are set to react. Limiting throttle"
                                        set_throttle $i $THROTTLE
                                fi
                                echo
                        fi
                done
                ;;
        throttle_limit) [ -z "$THROTTLE" ] && abort "Throttle requires throttle value"
                        for i in $TGTS
                        do
                                echo "Setting throttle for $i to $THROTTLE"
                                set_throttle $i $THROTTLE
                        done
                        ;;
        throttle_unlimit)       for i in $TGTS
                                do
                                        echo "Setting throttle for $i to unlimited"
                                        set_throttle $i 0
                                done
                        ;;
        check_lag)      [ -z "$DELAY" ] && DELAY=1440
                        for i in $TGTS
                        do
                                LAG=`get_lag $i`
                                if [ "$LAG" -gt "$DELAY" ]
                                then
                                        echo "Failure: The delay for $i is $LAG minutes"
                                        [ "$NOMAIL" != "1" ] && notify "$i is lagged $LAG minutes, above the threshold $DELAY"
                                else
                                        echo "Normal: The delay for $i is $LAG minutes"
                                fi
                        done
                        ;;
        check_size)     for i in $TGTS
                        do
                                check_size $i
                        done
                        ;;
        *)      echo "Option $MODE is not implemented yet"
                exit 0
                ;;
esac

Rapid-guide – Updating RedHat initrd

Warning: This is not the recommended method if you’re not sure you know what you’re doing.

Linux Initial Ram Disk (initrd) is a mechanism to perform disk-independent actions before attempting to mount the ‘/’ disk. These actions usually include loading disk drivers, setting up LVM or software RAID, etc.

The reason these actions are performed within initrd is that it is all based on Ram Disk loaded by the boot loader, and thus it breaks the loop of “how would I load storage drivers without storage access?”

It happens that, due to some special event, we need to modify it manually. To do so, we first need to open it, and then close it back up, replacing the previous one (back up the old one, will you?).

This is rather simple. The tools used by us will be ‘gzip’ and ‘cpio’.

Let’s begin.

First – create a temporary directory:

mkdir /tmp/initrd

Extracting

We have our temporary directory, so now we need to extract the initrd into it. I assume the name of the file is /boot/initrd.img. Replace it with the name of your own initrd file:

cd /tmp/initrd

cat /boot/initrd.img | gzip -dc |  cpio -id

This will extract the contents of the initrd into /tmp/initrd.

Now you can edit its contents directly.

Package

To package initrd back in, we will need to perform the following actions.

Warning – before you do it, make sure you have an available copy of your original initrd file, in case you have caused some damage.

cd /tmp/initrd

find . | cpio -o -H newc | gzip -9 > /boot/initrd.img

This line packages the initrd, and replaces the old one.
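If you would rather not overwrite the image in one shot, the packaging step can be wrapped along these lines. This is only a sketch under my own assumptions: the function name repack_initrd and the ‘.bak’ and ‘.new’ suffixes are arbitrary choices, not something the system expects.

```shell
#!/bin/bash
# Sketch: repack an extracted initrd tree, keeping a backup of the old image.
# Arguments: $1 directory holding the extracted tree, $2 initrd image to replace
repack_initrd () {
        local src=$1 target=$2
        [ -d "$src" ] || { echo "No such directory: $src"; return 1; }
        # Keep the previous image next to the new one (the .bak suffix is arbitrary)
        if [ -f "$target" ]
        then
                cp -p "$target" "$target.bak" || return 1
        fi
        # Build the archive in a temporary file first, so a failed run
        # does not leave a truncated image in place
        ( cd "$src" && find . | cpio -o -H newc 2>/dev/null | gzip -9 ) > "$target.new" || return 1
        # Sanity check: the new archive must be readable by gzip and cpio
        gzip -dc "$target.new" | cpio -it >/dev/null 2>&1 || return 1
        mv "$target.new" "$target"
}
```

It would be called as ‘repack_initrd /tmp/initrd /boot/initrd.img’, leaving the previous image in /boot/initrd.img.bak.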

That’s all for today :-)

Quickly install Xen Community Linux VM

On RHEL-type systems with virt-manager (libvirt), you can make use of virt-manager to ease your life. I, for myself, prefer to work with the ‘xm’ tools, but for the initial install, virt-manager is the quickest and simplest tool available.

To install a new Linux VM, all you need to do is follow this flow:

Create an LV for your VM (I use LVs because it’s easier to manage). If not LV, use a file. To create an LV, run the following command

lvcreate -L 10G -n new_vm1 VolGroup00

I assume that the name you wish to grant is ‘new_vm1’ (better maintain order there, else you will find yourself with hundreds of small LVs you have no idea what to do with), and that the name of the volume group is ‘VolGroup00’. Change these values to match your environment.

Next, make sure you have your ISO contents unpacked (you can use loop device) and exported via NFS (my favorite method).

To mount a CD/DVD ISO, you should use the ‘mount’ command with the ‘loop’ option. This would look like this:

mount -o loop my_iso.iso /mnt/temp

Again, I assume the name of the ISO is my_iso.iso and that the target directory /mnt/temp is available.

Now, export your newly created directory. If you have NFS already running, you can either add to /etc/exports the newly mounted directory /mnt/temp and restart the ‘nfs’ service, or you could use ‘exportfs’ to add it:

exportfs -o no_root_squash *:/mnt/temp

would probably do the trick. I added ‘no_root_squash’ to make sure no permission/access problems present themselves during the installation phase. Test your export to verify it’s working.

Now you could begin your installation. Run the following command:

virt-install -n new_vm1 -r 512 -p -f /dev/VolGroup00/new_vm1 --nographics nfs://nfs_server:/mnt/temp

The name follows the ‘-n’ flag. The amount of RAM to give is 512MB. The -p means it’s paravirtualized. The -f shows which device will be the block device, and the last argument is the source of the installation. Do not use local files, as the VM installer should be able to access the installation source.

Following that, you should have a very nice TUI installation experience.

Now – let’s make this machine ‘xm’ compatible.

Currently, the VM is virt-manager compatible. It means you need virt-manager to start/stop it correctly. Since I prefer ‘xm’ commands, I will show you how to convert this machine to ‘xm’.

First – export its XML file and convert it to the native ‘xm’ configuration format:

virsh dumpxml new_vm1 > /tmp/new_vm1.xml

virsh domxml-to-native xen-xm /tmp/new_vm1.xml > /etc/xen/new_vm1

This should do the trick.

Now you can turn the newly created VM off, and remove the VM from virt-manager using

virsh undefine new_vm1

and you’re back to ‘xm’-only interface.

iSCSI persistent configurations against us all

Using iSCSI with dm-multipath is a rather common setup. With iSCSI running over Ethernet cables, which are too easy to disconnect (either on purpose or by mistake), being cheap and common technology – multipath becomes a must. If you have multiple network links, it is only expected that you use multipath for your iSCSI configuration. It’s cheap, it’s easy and it works.

This, however, comes with a price tag. Not money – the components are cheap and common, but there are configuration steps which should take place.

It is easy to find info in the open-iscsi documentation, on the Internet, wherever, and I will go over it just below, but there are some catches which one should be aware of.

Per the common documentation, unlike regular iSCSI communication, when dealing with multipath, you would like iSCSI to fail rather quickly and let the SCSI layer handle the errors, thus letting dm-multipath handle the errors and do its work.

The configuration directives are rather simple. In the iscsid.conf file (on RHEL5 located in /etc/iscsi/iscsid.conf ), you need to change the value

node.session.timeo.replacement_timeout =

to a very short period of time. By default, it is set to 120 seconds, which means two minutes pass before anyone notifies the SCSI subsystem of any disk IO errors. A good value would be 5 seconds, which should allow for a very short network disconnection (which could happen) and still let dm-multipath manage errors fast enough so that applications would not fail on disk timeouts.

Another two parameters which should be defined are the following (read the comments above them in the config file):

node.conn[0].timeo.noop_out_interval =
node.conn[0].timeo.noop_out_timeout =

These values control the interval in which the iSCSI layer tests communication to the targets.
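If you maintain several hosts, editing iscsid.conf by hand gets old quickly. Below is a minimal sketch of a helper that sets one ‘key = value’ line in such a file. The function name set_iscsid_param is my own invention, and the only value used here is the 5 seconds suggested above for replacement_timeout; pick the noop values according to the comments in your own config file.

```shell
#!/bin/bash
# Sketch: set "key = value" in an iscsid.conf-style file.
# The function name is illustrative only. Commented-out lines are left alone.
# Arguments: $1 file, $2 parameter name, $3 value
set_iscsid_param () {
        local file=$1 key=$2 value=$3 tmp rc
        tmp=$(mktemp) || return 1
        awk -v k="$key" -v v="$value" '
                {
                        # The parameter name is whatever precedes the first "="
                        name = $0
                        sub(/=.*/, "", name)
                        gsub(/[ \t]/, "", name)
                        if (name == k) { print k " = " v; found = 1 }
                        else print
                }
                # Append the setting if the file did not contain it
                END { if (!found) print k " = " v }
        ' "$file" > "$tmp" && cat "$tmp" > "$file"
        rc=$?
        rm -f "$tmp"
        return $rc
}
```

A call would look like ‘set_iscsid_param /etc/iscsi/iscsid.conf node.session.timeo.replacement_timeout 5’. Back up the file first, as always.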

Also, in multipath.conf you will need to set the following feature, so that I/O requests are queued rather than lost while no path is available:

features		"1 queue_if_no_path"

These configuration directives can be found in these two pages from RedHat:

iscsi-modifying-link-loss-behavior-dmmultipath

iscsi-replacements_timeout

This is nice and pretty. However, if you have failed to do so from the start, and defined your iSCSI targets based on the default configuration, you will notice that it still takes very long for iSCSI to notify the SCSI subsystem of the errors. You can check the values used by iSCSI by running the command:

iscsiadm -m node -T <target name>

Check out especially the line called node.session.timeo.replacement_timeout. Its value is the one which decides the actual behavior of iSCSI.

To change it, there are several methods. One of them is to clean up the iSCSI persistent configurations, located in /var/lib/iscsi, and then log in to the iSCSI targets again. Only then will you have the new target configuration.

Check again with iscsiadm as described above, and check that this value matches.
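Another of those methods, if you prefer not to wipe the stored records, is to update the value in the existing node record with iscsiadm’s ‘-o update’ operation. The wrapper below is only a sketch: the function name and the RUN variable are made up, and by default it just prints the command so you can review it first. As far as I know, the new value is used from the next login to the target, so plan for a logout/login of the session.

```shell
#!/bin/bash
# Sketch: update replacement_timeout in the stored record of one target.
# Arguments: $1 target name (IQN), $2 timeout in seconds
# Set RUN=1 in the environment to execute; otherwise the command is only printed.
update_replacement_timeout () {
        local target=$1 timeout=$2
        local cmd=(iscsiadm -m node -T "$target" -o update
                   -n node.session.timeo.replacement_timeout -v "$timeout")
        if [ "${RUN:-0}" = "1" ]
        then
                "${cmd[@]}"
        else
                echo "${cmd[*]}"
        fi
}
```

For example, ‘update_replacement_timeout iqn.2001-04.com.example:storage 5’ prints the iscsiadm line for a (made-up) target name, and ‘RUN=1 update_replacement_timeout …’ actually runs it.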

XenServer “Internal error: Failure… no loader found”

It has been long since I had the time to write here. I have recently been involved more and more with XenServer virtualization, as you might see in the blogs, and following a solution to a rather common problem, I have decided to post it here.

The problem: When attempting to boot a Linux VM on XenServer (5.0 and 5.5), you get the following error message:

Error: Starting VM ‘Cacti’ – Internal error: Failure(“Error from xenguesthelper: caught exception: Failure(\\\”Subprocess failure: Failure(\\\\\\\”xc_dom_linux_build: [2] xc_dom_find_loader: no loader found\\\\\\\\n\\\\\\\”)\\\”)”)

This is very common with Linux VMs which were converted from physical (or other, non-PV virtualization) to XenServer.

This will probably happen either during the P2V process, or after a successful update to the Linux VM.

The cause is that the original kernel, non PV-aware one, has not been removed, and GRUB likes to load from it. XenServer will use the GRUB menu, but will not display it to us to select our desired kernel.

With no chance to intervene, XenServer will attempt to load a PV-enabled machine using non-PV kernel, and will fail.

Preventing the problem is quite simple – remove your non-PV kernel (non-xen) so that future updates will not attempt to update it as well and set it to be the default kernel. Very simple.

Solving the problem in less than two minutes is a bit more tricky. Let’s see how to solve it.

All operations are performed from within the control domain. This guide does not apply to StorageLink or NetApp/EqualLogic devices, as they behave differently. This applies only to LVM-over-something, whatever it may be.

First, we will need to find the name of the VDI we are to work on. Use xe in the following manner, using the VM’s name:

xe vbd-list vm-name-label=Cacti

uuid ( RO)             : 128f29dc-4a14-1a2d-75d1-8674d3d2403b
vm-uuid ( RO): eae053de-4a20-28a5-f335-f5a18dd79993
vm-name-label ( RO): Cacti
vdi-uuid ( RO): 90524af4-5b20-4412-9bfe-f1fe27f220b1
empty ( RO): false
device ( RO): xvda

uuid ( RO)             : de177727-b28a-8b79-e73e-d08366d56277
vm-uuid ( RO): eae053de-4a20-28a5-f335-f5a18dd79993
vm-name-label ( RO): Cacti
vdi-uuid ( RO): <not in database>
empty ( RO): true
device ( RO): xvdd

It is very common that xvdd is used for the CDROM, so we can safely ignore the second section. The first section is the more interesting one. There is a correlation between the UUID of the VDI and the name of the LV on the disk. We can find this specific LV using the following command. Notice that the UUID of the VDI is used here as the argument for the ‘grep’ command:

lvs | grep 90524af4-5b20-4412-9bfe-f1fe27f220b1

LV-90524af4-5b20-4412-9bfe-f1fe27f220b1 VG_XenStorage-4aa20fc2-fd92-20c2-c549-bed2597c622b -wi-a-  10.00G

We now have our LV path! Before the disk of a halted VM can be accessed, its LV has to be active (the fifth character of the attribute column is ‘a’ when it is, and ‘-’ for an inactive LV). Using both the LV and the VG name, we can activate it like this:

lvchange -ay /dev/VG_XenStorage-4aa20fc2-fd92-20c2-c549-bed2597c622b/LV-90524af4-5b20-4412-9bfe-f1fe27f220b1
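Typing these long paths by hand is error-prone, so here is a small sketch that builds the /dev/VG/LV path from a line of ‘lvs’ output. The function name lv_path_from_lvs is made up for this example; the sample line is the one shown in the output above.

```shell
#!/bin/bash
# Sketch: turn an lvs output line (LV name, VG name, attributes, size)
# into the /dev/VG/LV path used by lvchange. The function name is made up.
lv_path_from_lvs () {
        awk '{ print "/dev/" $2 "/" $1 }'
}

# Sample line copied from the lvs output; on a real host this would be:
#   lvs --noheadings | grep "$VDI_UUID" | lv_path_from_lvs
echo "LV-90524af4-5b20-4412-9bfe-f1fe27f220b1 VG_XenStorage-4aa20fc2-fd92-20c2-c549-bed2597c622b -wi-a-  10.00G" | lv_path_from_lvs
```

The result can be kept in a variable and reused for the lvchange, pygrub and losetup steps.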

Now we can access the volume. We can actually check that the problem is the one we look for, using pygrub:

pygrub /dev/VG_XenStorage-4aa20fc2-fd92-20c2-c549-bed2597c622b/LV-90524af4-5b20-4412-9bfe-f1fe27f220b1

We should now see the GRUB menu of the VM in question. If you don’t see any menu, either you have missed a step or used the wrong disk.

The menu should show you the list of all kernels. The default one is the one highlighted, and if its title doesn’t include the word “xen”, we have most likely found the problem.

We now need to change to a PV-capable kernel. We will need to access the “/boot” partition of the Linux VM, and change GRUB’s options there.

First we map the disk to a loop device, so we can access its partitions:

losetup /dev/loop1 /dev/VG_XenStorage-4aa20fc2-fd92-20c2-c549-bed2597c622b/LV-90524af4-5b20-4412-9bfe-f1fe27f220b1

Notice that you need to use the entire path to the LV, that the LV is online, and that loop1 is not in use. If it is, you will have a message saying something like “LOOP_SET_FD: Device or resource busy”

Now we need to access its partitions. We will map them using ‘kpartx’ to /dev/mapper/ devices. Notice we’re using the same loop device name:

kpartx -a /dev/loop1

Now, new files present themselves in /dev/mapper:

ls -la /dev/mapper/
total 0
drwxr-xr-x  2 root root     220 Oct 24 12:39 .
drwxr-xr-x 14 root root   16560 Oct 24 12:31 ..
crw-------  1 root root  10, 62 Sep 29 10:15 control
brw-rw----  1 root disk 252,  5 Oct 24 12:39 loop1p1
brw-rw----  1 root disk 252,  6 Oct 24 12:39 loop1p2
brw-rw----  1 root disk 252,  7 Oct 24 12:39 loop1p3

Usually, the first partition represents /boot, so we can now mount it and work on it:

mount /dev/mapper/loop1p1 /mnt

All we need to do is edit /mnt/grub/menu.lst to match our requirements, and then wrap everything back up:
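The edit itself usually comes down to pointing the ‘default’ line at an entry that boots a xen kernel. Below is a rough sketch of doing that without an editor, assuming a GRUB legacy menu.lst where the PV-capable kernel has “xen” in its title (as on RHEL/CentOS 5). Both function names are made up for this example.

```shell
#!/bin/bash
# Sketch for a GRUB legacy menu.lst; function names are illustrative.

# Print the zero-based index of the first "title" entry mentioning xen
first_xen_entry () {
        awk '/^[ \t]*title/ {
                if (tolower($0) ~ /xen/) { print n + 0; found = 1; exit }
                n++
        }
        END { if (!found) exit 1 }' "$1"
}

# Point the "default" line at the given entry index
# Arguments: $1 menu.lst file, $2 entry index
set_default_entry () {
        local file=$1 idx=$2 tmp rc
        tmp=$(mktemp) || return 1
        awk -v idx="$idx" '
                /^default[= \t]/ { sub(/[0-9]+.*$/, idx) }
                { print }
        ' "$file" > "$tmp" && cat "$tmp" > "$file"
        rc=$?
        rm -f "$tmp"
        return $rc
}
```

With the boot partition mounted on /mnt, this could be used as ‘idx=$(first_xen_entry /mnt/grub/menu.lst) && set_default_entry /mnt/grub/menu.lst "$idx"’. Keep a copy of menu.lst before touching it.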

umount /mnt

kpartx -d /dev/loop1

losetup -d /dev/loop1

We don’t have to deactivate the LV, because XenServer will activate it anyway if it is not active. However, we could do it, to be on the safe side:

lvchange -an /dev/VG_XenStorage-4aa20fc2-fd92-20c2-c549-bed2597c622b/LV-90524af4-5b20-4412-9bfe-f1fe27f220b1

Now we can activate the VM, and see it boot successfully.

This whole process takes several minutes the first time, and even less later.

I hope this helps.

Citrix XenServer 5.0 cannot cooperate with NetApp SnapMirror

It has been a long while, I know. I was busy with life, work and everything around it. Not much worth mentioning.

This, however, is something else.

I have discovered an issue with Citrix XenServer 5.0 (probably the case with 5.5 as well, but I have other issues with that release) when using NetApp through the NetApp API SR: any snapshot not generated by XenServer is deleted as soon as any snapshot-related action is performed on that volume. This means that if I manually create a snapshot called “1111” (short and easy to recognize, especially with all these UUID-based volumes, LUNs and snapshot names XenServer uses…), then the next time anyone creates a snapshot of a machine which has a disk (VDI) on this specific volume, my snapshot “1111” will be removed from that volume. The message seen in /var/log/SMlog would look like this:

Removing unused snap (1111)

While under normal operation, this does not matter much, as non-XenServer snapshots have little value, when using NetApp SnapMirror technology, the mechanism works a bit differently.

It appears that the SnapMirror system takes snapshots with predefined names (non-XenServer UUID type, luckily for us all). These snapshots include all the changes performed since the last SnapMirror snapshot, and are used for replication. Unfortunately, XenServer deletes them. No SnapMirror snapshots, well, this is quite obvious, is it not? No SnapMirror…

We did not detect this problem immediately, and I should take the blame for that. I should have defined a set of simple trial and error tests, as described above, instead of battling with a system I did not quite follow at that time – NetApp SnapMirror. Now I do, however, and I have this wonderful insight which can make your personal life better, if you had issues with SnapMirror and XenServer and did not know how to make it work. This solution cannot be an official one, due to its nature, which you will understand shortly. This is a personal patch for your pleasure, based on the hard fact that SnapMirror uses a predefined name for its snapshots. This name, in my case, is the name of the DR storage device. You must figure out what name is being used as part of the snapshot naming convention on your own site. Search for my ‘storagedr’ phrase, and replace it with yours.

This is the diff file for /opt/xensource/sm/NETAPPSR.py . Of course – back up your original file. Also – this is not an official patch. It was tested to function correctly on XenServer 5.0, and it will not work on XenServer 5.5 (since NETAPPSR.py is different). Last warning – it might break on the next update or upgrade of your XenServer environment, and if that happens, you had better monitor your SnapMirror status closely.

400,403c400,404
<                     util.SMlog("Removing unused snap (%s)" % val)
<                     out = netapplib.fvol_snapdelete_wrapper(self.sv, val, volname)
<                     if not na_test_result(out):
<                         pass
---
> 		    if 'storagedr' not in val:
>                     	util.SMlog("Removing unused snap (%s)" % val)
>                     	out = netapplib.fvol_snapdelete_wrapper(self.sv, val, volname)
>                     	if not na_test_result(out):
>                         	pass

Hope it helps!

XenServer create snapshots for all machines

XenServer is a wonderful tool. One of the better parts of it is its powerful scripting language, powered by the ‘xe’ command.

In order to capture a mass of snapshots, you can either do it manually from the GUI, or script it. The script supplied below includes shell functions to capture quiesced snapshots and, if that fails, normal snapshots of every running VM on the system.

Reason: NetApp SnapMirror, or other backup (maybe for later export) scheduled actions.

#!/bin/bash
# This script will supply functions for snapshotting and snapshot destroy including disks
# Written by Ez-Aton
# Visit my web blog for more stuff, at http://run.tournament.org.il
 
# Global variables:
UUID_LIST_FILE=/tmp/SNAP_UUIDS.txt
 
# Function
function assign_all_uuids () {
	# Construct artificial non-indexed list with name (removing annoying characters) and UUID
	LIST=""
	for UUID in `xe vm-list power-state=running is-control-domain=false | grep uuid | awk '{print $NF}'`
	do
		NAME=`xe vm-param-get param-name=name-label uuid=$UUID | tr ' ' _ | tr -d '(' | tr -d ')'`
		LIST="$LIST $NAME:$UUID"
	done
	echo $LIST
}
 
function take_snap_quiesce () {
	# We attempt to take a snapshot with quiesce
	# Arguments: $1 name ; $2 uuid
	# We attempt to snapshot the machine and set the value of snap_uuid to the snapshot uuid, if successful.
	# Return 1 if failed
 
	if SNAP_UUID=`xe vm-snapshot-with-quiesce vm=$2 new-name-label=${1}_snapshot`
	then
		# echo "Snapshot-with-quiesce for $1 successful"
		# Print the snapshot UUID so the caller can capture it, as take_snap does
		echo $SNAP_UUID
		return 0
	else
		echo "Snapshot-with-quiesce for $1 failed"
		return 1
	fi
}
 
function take_snap () {
	# We attempt to take a snapshot
	# Arguments: $1 name ; $2 uuid
	# We attempt to snapshot the machine and set the value of snap_uuid to the snapshot uuid, if successful.
	# Return 1 if failed
 
	if SNAP_UUID=`xe vm-snapshot vm=$2 new-name-label=${1}_snapshot`
	then
		#echo "Snapshot for $1 successful"
		echo $SNAP_UUID
		return 0
	else
		echo "Snapshot for $1 failed"
		return 1
	fi
}
 
function stop_ha_template () {
	# Templates inherit their settings from the origin
	# We need to turn off HA
	# $1 : Template UUID
	if [ -z "$1" ]
	then
		echo "Missing template UUID"
		return 1
	fi
	xe template-param-set ha-always-run=false uuid=$1
}
 
function get_vdi () {
	# This function will get a space delimited list of VDI UUIDs of a given snapshot/template UUID
	# Arguments: $1 template UUID
	# It will also verify that each VBD is an actual snapshot
	if [ -z "$1" ]
	then
		echo "No arguments? We need the template UUID"
		return 1
	fi
	VDIS=""
	for VBD in `xe vbd-list vm-uuid=$1 | grep ^uuid | awk '{print $NF}'`
	do
		echo "VBD: $VBD"
		# Quote the command output so an empty reply does not break the test
		if [ "`xe vbd-param-get param-name=type uuid=$VBD`" != "CD" ]
		then
			CUR_VDI=`xe vdi-list vbd-uuids=$VBD | grep ^uuid | awk '{print $NF}'`
			if [ "`xe vdi-param-get uuid=$CUR_VDI param-name=is-a-snapshot`" = "true" ]
			then
				VDIS="$VDIS $CUR_VDI"
			else
				echo "VDI is not a snapshot!"
				return 1
			fi
			CUR_VDI=""
		fi
	done
	echo $VDIS
}
 
function remove_vdi () {
	# This function will get a list of VDIs and remove them
	# Careful!
	for VDI in $@
	do
		if xe vdi-destroy uuid=$VDI
		then
			echo "Success in removing VDI $VDI"
		else
			echo "Failure in removing VDI $VDI"
			return 1
		fi
	done
}
 
function remove_template () {
	# This function will remove a template
	# $1 template UUID
	if [ -z "$1" ]
	then
		echo "Required UUID"
		return 1
	fi
	xe template-param-set is-a-template=false uuid=$1
	if ! xe vm-uninstall force=true uuid=$1
	then
		echo "Failure to remove VM/Template"
		return 1
	fi
}
 
function remove_all_template () {
	# This function will completely remove a template
	# The steps are as follows:
	# $1 is the UUID of the template
	# Calculate its VDIs
	# Remove the template
	# Remove the VDIs
	if [ -z "$1" ]
	then
		echo "No Template UUID was supplied"
		return 1
	fi
	# We now collect the value of $VDIS
	get_vdi $1
	if [ "$?" -ne "0" ]
	then
		echo "Failed to get VDIs for Template $1"
		return 1
	fi
	if ! remove_template $1
	then
		echo "Failure to remove template $1"
		return 1
	fi
	if ! remove_vdi $VDIS
	then
		return 1
	fi
}
 
function create_all_snapshots () {
	# In this function we will run all over $LIST and create snapshots of each machine, keeping the UUID of it inside a file
	# $@ - list of machines in the $LIST format
	if [ -f $UUID_LIST_FILE ]
	then
		mv $UUID_LIST_FILE $UUID_LIST_FILE.$$
	fi
	for i in $@
	do
		SNAP_UUID=`take_snap_quiesce ${i%%:*} ${i##*:}`
		if [ "$?" -ne "0" ]
		then
			echo "Problem taking snapshot with quiesce for ${i%%:*}"
			echo "Attempting normal snapshot"
			SNAP_UUID=`take_snap ${i%%:*} ${i##*:}`
			if [ "$?" -ne "0" ]
			then
				echo "Problem taking snapshot for ${i%%:*}"
				SNAP_UUID=""
			fi
		fi
		stop_ha_template $SNAP_UUID
		echo $SNAP_UUID >> $UUID_LIST_FILE
	done
}

Possible use will be like this:

. /usr/local/bin/xen_functions.sh

create_all_snapshots `assign_all_uuids` &> /tmp/snap_create.log
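
The snapshot UUIDs collected in $UUID_LIST_FILE can later be fed back to remove_all_template, once the snapshots have been copied elsewhere or are no longer needed. A minimal sketch, assuming the functions above are already sourced. The helper name remove_listed_snapshots and its optional second argument are additions for illustration only and are not part of the original script:

```shell
# Hypothetical helper: remove every snapshot whose UUID is listed in a file.
# $1 - file with one snapshot UUID per line (defaults to $UUID_LIST_FILE)
# $2 - command used to remove a single snapshot (defaults to remove_all_template)
remove_listed_snapshots () {
	local list_file=${1:-$UUID_LIST_FILE}
	local remover=${2:-remove_all_template}
	local uuid failed=0

	if [ ! -f "$list_file" ]
	then
		echo "No UUID list file: $list_file" >&2
		return 1
	fi
	# Read the list on descriptor 3 so the remover cannot swallow our input
	while read -r uuid <&3
	do
		# create_all_snapshots writes an empty line when a snapshot failed
		[ -z "$uuid" ] && continue
		if ! "$remover" "$uuid"
		then
			echo "Failed to remove snapshot $uuid" >&2
			failed=$((failed + 1))
		fi
	done 3< "$list_file"
	[ "$failed" -eq 0 ]
}

# Example: remove_listed_snapshots /tmp/SNAP_UUIDS.txt &> /tmp/snap_remove.log
```

The loop keeps going after a failure, so one stubborn snapshot does not leave all the others behind. The return value only tells you whether everything was removed.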

Ad-hoc remote backups to tape

I have a nice SCSI tape connected to a single server. This allows for on-demand backups, with the hope (and seldom, with the established knowledge) that I can recover the data I have there.

Old computers, decommissioned computers and systems I wish to erase and reuse are seldom backed-up, just because of the effort in doing it. I will need to manually run something or the other, and who wants this chore?

I know that there are many full-featured backup systems out there, open source and otherwise, capable of doing what I want. However, they commonly rely on backup agents, their own tape formats and more, and for a simple one-time backup (which is all I want) they looked too bloated for my needs.

Again – my needs are: take this machine, run a simple script which can be obtained from an NFS share, wait for X minutes doing something else, and be assured your system is backed up.

I have written the script below to satisfy these requirements. Hope it helps others. Notice the single SSH leading connection and its functionality. It leaves a raw text file on tape with a simple description of the backup process, and the next tracks are the contents of each mount point.

I was a bit spartan with comments, but in general, this script should be quite self-explanatory:

#!/bin/bash
# This script will backup local disk to remote tape
# Written by Ez-Aton - http://run.tournament.org.il/
 
SERVER=kruvi # The name of the server with the direct attached tape
SRV_USER=root
TAPE=/dev/nst0 # Non-rewinding tape. We need to be able to add more tracks and not overwrite our own track
SSH="ssh -o StrictHostKeyChecking=no -o ControlMaster=auto -o ControlPath=~/.ssh/socket-%r@%h:%p"
WORK_FILE=/tmp/work.$$
TAR_LOG=/tmp/backup.log
TAR_ARG="czf - --one-file-system"
 
MOUNTS=`df -TlP | grep -v tmpfs | tail -n +2 | awk '{print $7}'`
# Assume nobody is stupid enough to use white spaces in mount paths
NUM_MOUNTS=`echo $MOUNTS | wc -w`
SUM_FILE=/tmp/summary.txt
 
clean_log () {
        : > $TAR_LOG
}
 
first_disk () {
        # Assume first disk is the first entry in /proc/partitions
        DISK="/dev/`cat /proc/partitions | head -n 3 | tail -n 1 | awk '{print $4}'`"
}
 
create_sum () {
        echo "Creating summary"
        # Collect information and place it in the file. It will be the first track of the tape
        echo "Hostname: `hostname`" > $SUM_FILE
        echo >> $SUM_FILE
        date >> $SUM_FILE
        echo >> $SUM_FILE
        for i in $MOUNTS; do df -h $i | tail -n +2 >> $SUM_FILE ; done
        echo >> $SUM_FILE
        echo "There will be $(($NUM_MOUNTS + 1)) tracks in addition to the first one" >> $SUM_FILE
}
 
create_leading_ssh () {
        # Use a nice trick for giving password only once:
        $SSH -f $SRV_USER@$SERVER 'while true; do sleep 100; done'
        echo "post leading"
}
 
monitor_proc () {
        # Monitor SSH process
        # Run in the background
        touch $WORK_FILE
        PID=`ps aux | grep "$SSH" | grep -v grep | awk '{print $2}'`
        if [ -z "$PID" ]
        then
                echo "Done so soon?"
                return 1
        fi
        while [ -f $WORK_FILE ]
        do
                sleep 10
        done
        kill $PID
}
 
test_tape_cmd () {
        CMD="mt -f $TAPE status"
}
 
remote_tape_append () {
        CMD="cat > $TAPE"
}
 
test_tape () {
        test_tape_cmd
        if ! $SSH $SRV_USER@$SERVER $CMD
        then
                echo "Tape on $SERVER is not ready"
                exit 1
        fi
}
 
backup_mount () {
        # Backup the actual mount
        # $1 - the path of the mount
        remote_tape_append
        if [ -z "$1" ]
        then
                echo "Mount path is empty?"
                exit 1
        fi
        echo "Backing up $1"
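        cd "$1" || exit 1
        # Append to the log, so that the output of earlier mounts is not overwritten
        tar $TAR_ARG . 2>> $TAR_LOG | $SSH $SRV_USER@$SERVER "$CMD" >> $TAR_LOG 2>&1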
}
 
append_header () {
        remote_tape_append
        cat $SUM_FILE | $SSH $SRV_USER@$SERVER "$CMD"
}
 
add_mbr () {
        remote_tape_append
        first_disk
        if [ -z "$DISK" ]
        then
                echo "Can't decide on the first boot disk. Exiting now"
                echo "No MBR backup exists"
                exit 0
        fi
        echo "Backing MBR"
        dd if=$DISK bs=1M count=1 | $SSH $SRV_USER@$SERVER "$CMD"
}
 
clean_log
create_sum
create_leading_ssh
monitor_proc &
test_tape
append_header
for i in $MOUNTS
do
        backup_mount $i
done
add_mbr
rm $WORK_FILE
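
Since the tape is written through the non-rewinding device, the order of the tracks follows the order of the calls at the bottom of the script: the summary first, then one tar stream per mount point, then the MBR image. The sketch below prints that layout. The function name print_track_map is hypothetical, and tracks are counted from 0, the way mt counts tape files:

```shell
# Hypothetical helper: print which tape file ("track") holds what.
# Arguments: the mount points, in the order they were backed up.
print_track_map () {
	local track=0 mount
	echo "$track: summary text file"
	for mount in "$@"
	do
		track=$((track + 1))
		echo "$track: tar.gz of $mount"
	done
	track=$((track + 1))
	echo "$track: first 1MB of the boot disk (MBR)"
}

# Example: print_track_map $MOUNTS
```

On the tape server, a given track could then be reached by positioning the tape before reading from $TAPE, for example with "mt -f /dev/nst0 asf 2" for the second mount point. This assumes an mt which supports the asf operation, as the Linux mt-st does.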

Oracle Clusterware as a 3rd party HA framework

Oracle has begun to push their Clusterware as a 3rd party HA framework. In this article we will review a quick example of how to do it. I will refer to this post as a quick-guide, as it is by no means a full-scale guide.

This article assumes you have installed Oracle Clusterware following one of the few links and guides available on the net. This quick-guide applies to both Clusterware 10 and Clusterware 11.

We will discuss the method of adding an additional NFS service on Linux.

In order to do so, you will need shared storage, assuming the goal of the exercise is to supply the clients with a consistent storage service based on NFS. I prefer to use OCFS2 as the file system of choice for shared disks. This goes well with Oracle Clusterware, as this cluster framework does not handle disk mounts very well. Unless you are willing to write (or find) an agent which makes sure that every mount and umount behaves correctly (you wouldn’t want file system corruption, would you?), you will probably prefer to do the same. Not having to manage disk mount actions both saves time on planned failover and guarantees storage safety. If you have not placed your CRS and Vote on OCFS2, you will need to install OCFS2 from here and here, and then configure it. We will not discuss OCFS2 configuration in this post.

We will need to assume the following prerequisites:

  • Service-related IP address: 1.2.3.4. Netmask 255.255.255.248. This IP must be in the same subnet as your public network interface.
  • Shared Storage: Formatted to OCFS2, and mounted on both nodes on /shared
  • Oracle Clusterware installed and working
  • Cluster node names are “node1” and “node2”
  • Have $CRS_HOME point to your CRS installation
  • Have $CRS_HOME/bin in your $PATH
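
The first prerequisite can be checked with a bit of arithmetic: two addresses share a subnet when ANDing each of them with the netmask gives the same network address. A small sketch in bash. The function names, and the second address in the usage example, are made up for illustration:

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer
ip_to_int () {
	local a b c d
	IFS=. read -r a b c d <<< "$1"
	echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Return 0 if addresses $1 and $2 are in the same subnet under netmask $3
same_subnet () {
	local ip1 ip2 mask
	ip1=$(ip_to_int "$1")
	ip2=$(ip_to_int "$2")
	mask=$(ip_to_int "$3")
	[ $(( ip1 & mask )) -eq $(( ip2 & mask )) ]
}

# Example (1.2.3.2 stands in for the address configured on eth0):
# same_subnet 1.2.3.4 1.2.3.2 255.255.255.248 && echo "same subnet"
```

With the netmask above only the last three bits are host bits, so 1.2.3.4 shares a subnet with 1.2.3.1 through 1.2.3.6 and nothing else.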

We need to create the service-related IP resource first. I recommend having an entry in /etc/hosts for this IP address on both nodes. Assuming the public NIC is eth0, the commands to create and register the resource would be:

crs_profile -create nfs_ip -t application -a $CRS_HOME/bin/usrvip -o oi=eth0,ov=1.2.3.4,on=255.255.255.248
crs_register nfs_ip

Now you will need to set running permissions for the oracle user. In my case, the user name is actually “oracle”:

crs_setperm nfs_ip -o root
crs_setperm nfs_ip -u user:oracle:r-x

Test that you can start the service as the oracle user:

crs_start nfs_ip
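
To confirm that the resource actually came up, and on which node, the output of crs_stat can be checked. A sketch, assuming that "crs_stat RESOURCE" prints a line of the form STATE=ONLINE on node1 alongside its NAME=, TYPE= and TARGET= lines (verify this on your Clusterware version). Both function names are hypothetical:

```shell
# Hypothetical helper: print the STATE of a CRS resource, e.g. "ONLINE on node1"
# Assumes crs_stat prints a line of the form STATE=<state>
crs_resource_state () {
	crs_stat "$1" | awk -F= '$1 == "STATE" {print $2}'
}

# Return 0 only if the resource reports a state beginning with ONLINE
crs_resource_online () {
	case "$(crs_resource_state "$1")" in
		ONLINE*) return 0 ;;
		*) return 1 ;;
	esac
}

# Example: crs_resource_online nfs_ip && echo "nfs_ip is up"
```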

Now we need to set up NFS. For this to work, we need to configure the NFS daemon first. Edit /etc/exports and add a line such as this:

/shared *(rw,no_root_squash,sync)

Make sure that the NFS services are not started at boot, since Clusterware will be the one starting them:

chkconfig nfs off
chkconfig nfslock off

Now is the time to set up Oracle Clusterware for the task:

crs_profile -create share_nfs -t application -B /etc/init.d/nfs -d "Shared NFS" -r nfs_ip -a sharenfs.scr -p favored -h "node1 node2" -o ci=30,ft=3,fi=12,ra=5
crs_register share_nfs

Deal with permissions:

crs_setperm share_nfs -o root
crs_setperm share_nfs -u user:oracle:r-x

Fix the “sharenfs.scr” script. First, find it. It should reside in $CRS_HOME/crs/scripts if everything is OK. If not, you will be able to find it in $CRS_HOME using find.

Edit the “sharenfs.scr” script and modify the following variables, which are defined near the beginning of the script:

PROBE_PROCS="nfsd"
START_APPCMD="/etc/init.d/nfs start"
START_APPCMD2="/etc/init.d/nfslock start"
STOP_APPCMD="/etc/init.d/nfs stop"
STOP_APPCMD2="/etc/init.d/nfslock stop"

Copy the modified script file to the other node. Verify this script has execution permissions on both nodes.

Start the service as the oracle user:

crs_start share_nfs

Test the service. The following command should return the export path:

showmount -e 1.2.3.4

Relocate the service and test again:

crs_relocate -f share_nfs
showmount -e 1.2.3.4
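
Right after a relocation the export may take a moment to come back, so a retry loop is handy when testing. A minimal sketch. The function name wait_for_export and the retry defaults are assumptions for illustration, not something Clusterware provides:

```shell
# Hypothetical helper: poll showmount until the path is exported or we give up.
# $1 - service IP ; $2 - exported path
# $3 - number of attempts (default 10) ; $4 - seconds between attempts (default 3)
wait_for_export () {
	local ip=$1 path=$2 tries=${3:-10} delay=${4:-3}
	local n=1
	while [ "$n" -le "$tries" ]
	do
		# showmount -e prints a header line, followed by "<path> <clients>" lines
		if showmount -e "$ip" 2>/dev/null |
			awk -v p="$path" 'NR > 1 && $1 == p {found=1} END {exit !found}'
		then
			return 0
		fi
		n=$((n + 1))
		[ "$n" -le "$tries" ] && sleep "$delay"
	done
	return 1
}

# Example: wait_for_export 1.2.3.4 /shared || echo "NFS export did not come back"
```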

Done. You now have an HA NFS service on top of the Oracle Clusterware framework.

I used this web page as a reference. I thank its author for the great work!

RHEL5 100% CPU with LDAP client for Active Directory

ADS integration has been available natively since Windows 2003 R2, and in heterogeneous sites this has become the preferred method of integrating login information, as well as utilizing the added security of using Kerberos wherever possible.

The following guide is a very good one, and was the source of information I have used throughout my work integrating Linux into ADS. So far it has worked quite well for RHEL4.

RHEL5, on the other hand, is a different story. While it can work, and LDAP queries return sensible results, it is all too common for a process to utilize 100% CPU while doing absolutely nothing.

My research brought me to the following conclusions:

  • The high CPU utilization is caused by something RHEL5-specific (the same setup works correctly on RHEL4).
  • The high CPU utilization is caused by the nss_ldap module.
  • Yes, it does happen to every NSS-related service. NSCD does not help, and gets to 100% CPU as well.
  • Tracing the nss_ldap module shows, after a very long time (if ever), that the session to the ADS server has somehow hung.

You can see an example of this bug in this specific bugzilla entry.

A quick and effective workaround was found after examining the differences between the configuration directives for RHEL4 and RHEL5. Forcing LDAP version 2 instead of 3 (the default for the RHEL5 LDAP client, as it attempts the highest version possible) results in correct behavior. The line in /etc/ldap.conf is:

ldap_version 2
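
Applying the workaround on several machines is easier with a small edit which can safely be run more than once. A sketch, assuming the configuration file is /etc/ldap.conf as above. The function name is hypothetical:

```shell
# Hypothetical helper: make sure the file sets "ldap_version 2".
# $1 - path of the configuration file (default /etc/ldap.conf)
force_ldap_version_2 () {
	local conf=${1:-/etc/ldap.conf}
	local tmp rc
	if grep -q '^[[:space:]]*ldap_version[[:space:]]' "$conf"
	then
		# A directive exists: rewrite it. Copy the result back with cat
		# rather than mv, to keep the ownership and mode of the file.
		tmp=$(mktemp) || return 1
		sed 's/^[[:space:]]*ldap_version[[:space:]].*/ldap_version 2/' "$conf" > "$tmp" &&
			cat "$tmp" > "$conf"
		rc=$?
		rm -f "$tmp"
		return $rc
	else
		# No directive yet: append one
		echo "ldap_version 2" >> "$conf"
	fi
}

# Example (as root): force_ldap_version_2 /etc/ldap.conf
```

Commented-out directives are left alone, since only lines which begin with ldap_version are touched.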

FYI