Posts Tagged ‘netapp’

NetApp SnapMirror monitor script

Sunday, December 13th, 2009

I have had some work done lately with NetApp SnapMirror. I have snapped-mirrored some volumes and qtrees and I wanted to monitor their use and behavior over the line.

As you can expect, site-to-site replication of data is a fragile thing, especially when done on the level of the storage device, which is agnostic to the data kept on it. When replicating volumes, I should expect the relevant employees to be responsible regarding what’s placed there, because the storage does not filter out the junk. If someone had decided to add a new DVD image on the DB storage space, well – the DB won’t care, as long as there is enough free space, but the storage will attempt to replicate the added data to the alternate site, which means that if you are around your bandwidth limits, which is never a good thing, you will just create a delay gap you would hardly (if at all) be able to close.

For that, and since I don’t tend to trust people not to do stupid things, I have written this script.

What does it do?

This script will perform the following:

Alerting about non-idle SnapMirror session

Use with ‘-m alert’

Assuming SnapMirror is scheduled to a specific time, the script will alert if a session is active. With the flag ‘-a no’, it will not send an e-mail (if possible, see the configuration section below). With ‘-r yes’, it will react, setting throttle for each non-idle session, but then ‘-t VALUE’ should be specified, where VALUE is the numeric throttle in KB/s.

Limiting throttle to a SnapMirror session

Use with ‘-m throttle_limit’

The script will set a throttle for SnapMirror session(s). Setting limit by the flag ‘-t VALUE’, where VALUE is the numeric throttle in KB/s per each session.

Cancelling throttle limit

Use with ‘-m throttle_unlimit’

The script will set unlimited throttle for SnapMirror session(s).

Checking SnapMirror lag

Use with the ‘-m check_lag’

Since replication has a purpose of recovering, the lag of each SnapMirror session would show how far back we are. Use with ‘-d VALUE’, VALUE being numeric time in minutes to set alert threshold. The default threshold delay is one day (1440 minutes).

Checking snapshots size

Use with the ‘-m check_size’

This reports the expected delta to transfer. This can help estimate the success or failure of a future sync of data (snapmirror update) before it begins. Use with ‘-l’ flag to set it to log date/time of measure and the expected sizes into a file. By default, in /tmp/target_name.txt, where the target is the SnapMirror target.

General Options

Use with ‘-c filename’ for alternate configuration file.

Use with ‘-h’ to get general help.

Use with a list target names in the format of storage:/vol/volname/qtree or storage:volname to ignore targets in configuration file and use your own.

Configuration File

The configuration file is rather simple. By default it should be called “/etc/snapmirror_monitor.conf“. It consists of two main variables for the system:

TGTS=”storage2:/vol/volname/qtree

storage3:volname2

storage1:/vol/volnew/qtr2″

EMAIL=”user@domain.com another_user@domain.com”

Prerequisites

This script will run on any modern Linux machine. For it to communicate with the NetApp devices, you will need SSH enabled on the NetApps, and ssh key exchange so that the Linux would be able to access the NetApp without using passwords.

The Script

Below is the script. You can download it and use it as you like.

#!/bin/bash
# This script will monitor snapmirror status
# Assumption: Access through ssh to root on all storage devices involved
# This will also attempt to detect the diff which is to sync
 
# Written by Ez-Aton. Check http://run.tournament.org.il for updates or
# additional information
 
# Modes: 
# alert -> alert if snapmirror is still active
# throttle_limit -> Limit throttle to a given number (default or manually set)
# throttle_unlimit -> Open throttle limitation
# check_lag -> Report the snapmirror lage
# check_size -> Report the estimated data size to move
 
# Global variables
CONF=/etc/snapmirror_monitor.conf
LOG_PREFIX=/tmp
 
test_connection () {
        # Test to see that you can access the storage device
        # Arguments: NetApp name
        SSH_OPTS="-o ConnectTimeout=2"
        if ! ssh $SSH_OPTS $1 hostname &>/dev/null
        then
                echo "Cannot communicate via SSH to $1"
                exit 1
        fi
}
 
abort () {
        # Exit with a predefined error message
        echo $*
        exit 1
}
 
get_arguments () {
        # Get all arguments and define options
        # Argument: $@
        [ -z "$1" ] && set -- -h
        while [ -n "$1" ]
        do
                case "$1" in
                        -m)     shift
                                case "$1" in
                                        alert|throttle_limit|throttle_unlimit|check_lag|check_size)     MODE=$1
                                        ;;
                                        *)      abort "Mode is mandatory. Use -h flag to get list of avialable flags"
                                        ;;
                                esac
                                ;;
                        -a)     shift
                                case "$1" in
                                        [nN][oO])       NOMAIL=1
                                                        ;;
                                        *)              NOMAIL=0
                                                        ;;
                                esac
                                ;;
                        -r)     shift
                                case "$1" in
                                        [yY][eE][sS])   REACT=1
                                                        ;;
                                        *)              REACT=0
                                                        ;;
                                esac
                                ;;
                        -d)     shift
                                declare -i DELAY_TMP
                                DELAY_TMP=$1
                                [ "$DELAY_TMP" != "$1" ] && abort "Delay needs to be a number in minutes"
                                DELAY=$DELAY_TMP
                                ;;
                        -t)     shift
                                declare -i THROTTLE_TMP
                                THROTTLE_TMP=$1
                                [ "$THROTTLE_TMP" != "$1" ] && abort "Throttle needs to be a number"
                                THROTTLE=$THROTTLE_TMP
                                ;;
                        -c)     shift
                                [ -f "$1" ] || abort "Cannot find specified conf file"
                                CONF="$1"
                                ;;
                        -l)     LOG=1
                                ;;
                        -h)     echo "Usage: $0 -m [alert|throttle_limit|throttle_unlimit|check_lag|check_size] (-c CONF_FILE) [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "Alert if SnapMirror is still running: $0 -m alert [-a no] (-r yes) [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "Alert and throttle (react): $0 -m alert [-a no] -r yes -t [throttle_in_kb] [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "Throttle a running SnapMirror: $0 -m throttle_limit -t throttle_in_kb [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "Unlimit SnapMirror throttle: $0 -m throttle_unlimit [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "To check lag: $0 -m check_lag -d delay_in_minutes (-a no) [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "To check delta: $0 -m check_size [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                exit 0
                                ;;
                        *)      [ -z "$MODE" ] && abort "$0 mode required"
                                TGTS="$*"
                                ;;
                esac
                shift
        done
}
 
notify () {
        # Send an e-mail notification
        # Arguments: $@ - the subject
        # Contents are empty
        # And yes - one e-mail per event
        mail -s "$@" $EMAIL < /dev/null
}
 
idle () {
        # Check if transaction is idle
        # Arguments: Target name (example: storage:/vol/volname/qtree
 
        # Get the storage name out
        NETAPP=${1%%:*}
        test_connection $NETAPP #Verify this netapp is accessible
        ssh $NETAPP snapmirror status $1 | tail -1 | grep Idle$ &>/dev/null #Checks if the snapmirror is idle. If so, return true
        return $?
}
 
set_throttle () {
        # Sets throttle for target
        # Arguments: $1 Target name (example: storage:/vol/volname/qtree)
        # Arguments: $2 throttle value (number)
 
        # Get the storage name out
        NETAPP=${1%%:*}
        test_connection $NETAPP #Verify this netapp is accessible
        ssh $NETAPP snapmirror throttle $2 $1
}
 
get_lag () {
        # Gets the lag of snapmirror relationship in minutes
        # Arguments: Target name (example: storage:/vol/volname/qtree)
 
        # Get the storage name out
        NETAPP=${1%%:*}
        test_connection $NETAPP #Verify this netapp is accessible
        LAG=`ssh $NETAPP snapmirror status $1 | tail -1 | awk '{print $4}'`
        # LAG is in hh:mm:ss. We need to transfer it to minutes only
        H=`echo $LAG | cut -f 1 -d :`
        M=`echo $LAG | cut -f 2 -d :`
        let M=$M+$H*60
        echo $M
}
 
check_size () {
        # Checks the size of the snapshot to copy (diff)
        # Arguments: Target name (example: storage:/vol/volname/qtree)
 
        # Get the storage name out
        NETAPP=${1%%:*}
        test_connection $NETAPP #Verify this netapp is accessible
        # Get source storage name and path
        SRC=`ssh $NETAPP snapmirror status $1 | tail -1 | awk '{print $1}'`
        # Get the source filer and vol name from that
        NETAPP=${SRC%%:*}
        SPATH=${SRC##*:}
        SPATH=`echo $SPATH | sed s/'\/vol\/'//`
        SPATH=${SPATH%%/*}
 
        test_connection $NETAPP # Verify the target NetApp is accessible
        SNAP=`ssh $NETAPP snap list -n $SPATH | grep snapmirror | tail -1 | awk '{print $4}'`
        DELTA=`ssh $NETAPP snap delta $SPATH $SNAP | tail -2 | head -1 | awk '{print $5}'`
        echo "Snap delta for $1 is $DELTA KB"  
        LOG_TARGET=`echo $1 | tr / _`.txt
        [ -n "$LOG" ] && echo "`date` $DELTA" >> $LOG_PREFIX/$LOG_TARGET
}
 
 
### MAIN ###
get_arguments $@
. $CONF &>/dev/null
# if e-mail is not set, don't try to send
[ -z "$EMAIL" ] && NOMAIL=1
 
[ -z "$TGTS" ] && abort "You need at least one snapmirror target"
 
case $MODE in
        alert)  if [ "$REACT" == "1" ]
                then
                        [ -z "$THROTTLE" ] && abort "When setting 'react' flag, you must specify throttle"
                fi
                for i in $TGTS
                do
                        if ! idle $i
                        then
                                echo -n "$i is not idle. "
                                [ "$NOMAIL" != "1" ] && notify "$i is not idle"
                                if [ "$REACT" == "1" ]
                                then
                                        echo -n "We are set to react. Limiting throttle"
                                        set_throttle $i $THROTTLE
                                fi
                                echo
                        fi
                done
                ;;
        throttle_limit) [ -z "$THROTTLE" ] && abort "Throttle requires throttle value"
                        for i in $TGTS
                        do
                                echo "Setting throttle for $i to $THROTTLE"
                                set_throttle $i $THROTTLE
                        done
                        ;;
        throttle_unlimit)       for i in $TGTS
                                do
                                        echo "Setting throttle for $i to unlimited"
                                        set_throttle $i 0
                                done
                        ;;
        check_lag)      [ -z "$DELAY" ] && DELAY=1440
                        for i in $TGTS
                        do
                                LAG=`get_lag $i`
                                if [ "$LAG" -gt "$DELAY" ]
                                then
                                        echo "Failure: The delay for $i is $LAG minutes"
                                        [ "$NOMAIL" != "1" ] && notify "$i is lagged $LAG minutes, above the threshold $DELAY"
                                else
                                        echo "Normal: The delay for $i is $LAG minutes"
                                fi
                        done
                        ;;
        check_size)     for i in $TGTS
                        do
                                check_size $i
                        done
                        ;;
        *)      echo "Option $MODE is not implemented yet"
                exit 0
                ;;
esac

Citrix XenServer 5.0 cannot cooperate with NetApp SnapMirror

Tuesday, September 8th, 2009

It has been a long while, I know. I was busy with life, work and everything around it. Not much worth mentioning.

This, however, is something else.

I have discovered an issue with Citrix XenServer 5.0 (probably the case with 5.5, but I have other issues with that release) using NetApp through NetApp API SR – Any non XenServer-generated snapshot will be deleted as soon as any snapshot-related action would be performed on that volume. Meaning that if I had manually created a snapshot called “1111″ (short and easy to recognize, especially with all these UUID-based volumes, LUNs and snapshot names XenServer uses…), the next time anyone would create a snapshot of a machine which has a disk (VDI) on this specific volume, the snapshot, my snapshot, “1111″ will be removed under that specific volume. The message seen in /var/log/SMlog would look like this:

Removing unused snap (1111)

While under normal operation, this does not matter much, as non-XenServer snapshots have little value, when using NetApp SnapMirror technology, the mechanism works a bit differently.

It appears that the SnapMirror system takes snapshots with predefined names (non-XenServer UUID type, luckily for us all). These snapshots include the entire changes performed since the last SnapMirror snapshots, and are used for replication. Unfortunately, XenServer deletes them. No SnapMirror snapshots, well, this is quite obvious, is it not? No SnapMirror…

We did not detect this problem immediately, and I should take the blame for that. I had to define a set of simple trial and error tests, as described above, instead of battling with a system I did not quite follow at that time – NetApp SnapMirror. Now I do, however, and I have this wonderful insight which can make your personal life, if you had issues with SnapMirror and XenServer, and did not know how to make it work, better. This solution cannot be an official one, due to its nature, which you will understand shortly. This is a personal patch for your pleasure, based on the hard fact that SnapMirror uses a predefined name for its snapshots. This name, in my case, is the name of the DR storage device. You must figure out what name is being used as part of the snapshot naming convention on your own site. Search for my ’storagedr’ phrase, and replace it with yours.

This is the diff file for /opt/xensource/sm/NETAPPSR.py . Of course – back up your original file. Also – this is not an official patch. It was tested to function correctly on XenServer 5.0, and it will not work on XenServer 5.5 (since NETAPPSR.py is different). Last warning – it might break on the next update or upgrade you have for your XenServer environment, and if that happens, you better monitor your SnapMirror status closely then.

400,403c400,404
<                     util.SMlog("Removing unused snap (%s)" % val)
<                     out = netapplib.fvol_snapdelete_wrapper(self.sv, val, volname)
<                     if not na_test_result(out):
<                         pass
---
> 		    if 'storagedr' not in val:
>                     	util.SMlog("Removing unused snap (%s)" % val)
>                     	out = netapplib.fvol_snapdelete_wrapper(self.sv, val, volname)
>                     	if not na_test_result(out):
>                         	pass

Hope it helps!

XenServer create snapshots for all machines

Friday, August 7th, 2009

XenServer is a wonderful tool. One of the better parts of it is its powerful scripting language, powered by the ‘xe’ command.

In order to capture a mass of snapshots, you can either do it manually from the GUI, or scripted. The script supplied below will include shell functions to capture Quiesce snapshots, and it that fails, normal snapshots of every running VM on the system.

Reason: NetApp SnapMirror, or other backup (maybe for later export) scheduled actions.

#!/bin/bash
# This script will supply functions for snapshotting and snapshot destroy including disks
# Written by Ez-Aton
# Visit my web blog for more stuff, at http://run.tournament.org.il
 
# Global variables:
UUID_LIST_FILE=/tmp/SNAP_UUIDS.txt
 
# Function
function assign_all_uuids () {
	# Construct artificial non-indexed list with name (removing annoying characters) and UUID
	LIST=""
	for UUID in `xe vm-list power-state=running is-control-domain=false | grep uuid | awk '{print $NF}'`
	do
		NAME=`xe vm-param-get param-name=name-label uuid=$UUID | tr ' ' _ | tr -d '(' | tr -d ')'`
		LIST="$LIST $NAME:$UUID"
	done
	echo $LIST
}
 
function take_snap_quiesce () {
	# We attempt to take a snapshot with quench
	# Arguments: $1 name ; $2 uuid
	# We attempt to snapshot the machine and set the value of snap_uuid to the snapshot uuid, if successful.
	# Return 1 if failed
 
	if SNAP_UUID=`xe vm-snapshot-with-quiesce vm=$2 new-name-label=${1}_snapshot`
	then
		# echo "Snapshot-with-quiesce for $1 successful"
		return 0
	else
		echo "Snapshot-with-quiesce for $1 failed"
		return 1
	fi
}
 
function take_snap () {
	# We attempt to take a snapshot
	# Arguments: $1 name ; $2 uuid
	# We attempt to snapshot the machine and set the value of snap_uuid to the snapshot uuid, if successful.
	# Return 1 if failed
 
	if SNAP_UUID=`xe vm-snapshot vm=$2 new-name-label=${1}_snapshot`
	then
		#echo "Snapshot for $1 successful"
		echo $SNAP_UUID
		return 0
	else
		echo "Snapshot-with-quiesce for $1 failed"
		return 1
	fi
}
 
function stop_ha_template () {
	# Templates inherit their settings from the origin
	# We need to turn off HA
	# $1 : Template UUID
	if [ -z "$1" ]
	then
		echo "Missing template UUID"
		return 1
	fi
	xe template-param-set ha-always-run=false uuid=$1
}
 
function get_vdi () {
	# This function will get a space delimited list of VDI UUIDs of a given snapshot/template UUID
	# Arguments: $1 template UUID
	# It will also verify that each VBD is an actual snapshot
	if [ -z "$1" ]
	then
		echo "No arguments? We need the template UUID"
		return 1
	fi
	VDIS=""
	for VBD in `xe vbd-list vm-uuid=$1 | grep ^uuid | awk '{print $NF}'`
	do
		echo "VBD: $VBD"
		if [ ! `xe vbd-param-get param-name=type uuid=$VBD` = "CD" ]
		then
			CUR_VDI=`xe vdi-list vbd-uuids=$VBD | grep ^uuid | awk '{print $NF}'`
			if `xe vdi-param-get uuid=$CUR_VDI param-name=is-a-snapshot`
			then
				VDIS="$VDIS $CUR_VDI"
			else
				echo "VDI is not a snapshot!"
				return 1
			fi
			CUR_VDI=""
		fi
	done
	echo $VDIS
}
 
function remove_vdi () {
	# This function will get a list of VDIs and remove them
	# Carefull!
	for VDI in $@
	do
		if xe vdi-destroy uuid=$VDI
		then
			echo "Success in removing VDI $VDI"
		else
			echo "Failure in removing VDI $VDI"
			return 1
		fi
	done
}
 
function remove_template () {
	# This funciton will remove a template
	# $1 template UUID
	if [ -z "$1" ]
	then
		echo "Required UUID"
		return 1
	fi
	xe template-param-set is-a-template=false uuid=$1
	if ! xe vm-uninstall force=true uuid=$1
	then
		echo "Failure to remove VM/Template"
		return 1
	fi
}
 
function remove_all_template () {
	# This function will completely remove a template
	# The steps are as follow:
	# $1 is the UUID of the template
	# Calculate its VDIs
	# Remove the template
	# Remove the VDIs
	if [ -z "$1" ]
	then
		echo "No Template UUID was supplied"
		return 1
	fi
	# We now collect the value of $VDIS
	get_vdi $1
	if [ "$?" -ne "0" ]
	then
		echo "Failed to get VDIs for Template $1"
		return 1
	fi
	if ! remove_template $1
	then
		echo "Failure to remove template $1"
		return 1
	fi
	if ! remove_vdi $VDIS
	then
		return 1
	fi
}
 
function create_all_snapshots () {
	# In this function we will run all over $LIST and create snapshots of each machine, keeping the UUID of it inside a file
	# $@ - list of machines in the $LIST format
	if [ -f $UUID_LIST_FILE ]
	then
		mv $UUID_LIST_FILE $UUID_LIST_FILE.$$
	fi
	for i in $@
	do
		SNAP_UUID=`take_snap_quiesce ${i%%:*} ${i##*:}`
		if [ "$?" -ne "0" ]
		then
			echo "Problem taking snapshot with quiesce for ${i%%:*}"
			echo "Attempting normal snapshot"
			SNAP_UUID=`take_snap ${i%%:*} ${i##*:}`
			if [ "$?" -ne "0" ]
                	then
                        	echo "Problem taking snapshot for ${i%%:*}"
				SNAP_UUID=""
			fi
		fi
		stop_ha_template $SNAP_UUID
		echo $SNAP_UUID >> $UUID_LIST_FILE
	done
}

Possible use will be like this:

. /usr/local/bin/xen_functions.sh

create_all_snapshots `assign_all_uuids` &> /tmp/snap_create.log

dm-multipath and loss of all paths

Tuesday, May 13th, 2008

dm-multipath is a great tool. Its abilities were proven to me on many occasions, and I’m sure I’m not the only one. NetApp, for example, use it. HP use it as well (a slightly modified version, and still), and it works.

A problem I have encountered is as follow – if a single path fails, the device-mapper continue to work correctly (as expected) and the remaining path becomes active. However – if the last link fails, all processes which require disk access become stale. It means that many tests which search for a given process pass correctly even when this process becomes stale through delayed (forever) access to the filesystem. Also – tests which attempt to write/read a file to/from such a stale filesystem, become stale themselves, which can bring down an entire system (assume we have a cron which creates a file every minute. Every new process becomes stale immediately, so after an hour, we’ll have 60 more processes, and after a day – 1440 additional processes – all stale (D) and waiting for the disk to come back).

Certain detection systems actually fail to auto-detect cases of stale filesystems when using dm-multipath. This is caused by a (default) option called “1 queue_if_no_path”. I discovered that when this option is omitted, such as in the configuration below (only the “device” section):

device
{
vendor “NETAPP”
product “LUN”
getuid_callout “/sbin/scsi_id -g -u -s /block/%n”
prio_callout “/sbin/mpath_prio_ontap /dev/%n”
# features “1 queue_if_no_path”
hardware_handler “0″
path_grouping_policy group_by_prio
failback immediate
rr_weight uniform
rr_min_io 128
path_checker readsector0
}

multiple disk failures will actually result in the filesystem layout reporting I/O errors (which is good). A disk mounted through these options can be mounted with special parameters, such as (for ext2/3): errors=continue ; errors=read-only or errors=panic – my favorite, as it ensured data integrity through self-fencing mechanism.

Linux LVM performace measurement

Sunday, June 10th, 2007

Modern Linux LVM offers great abilities to maintain snapshots of existing logical volumes. Unlike NetApp “Write Anywhere File Layout” (WAFL), Linux LVM uses “Copy-on-Write” (COW) to allow snapshots. The process, in general, can be described in this pdf document.

I have issues several small tests, just to get real-life estimations of what is the actual performance impact such COW method can cause.

Server details:

1. CPU: 2x Xion 2.8GHz

2. Disks: /dev/sda – system disk. Did not touch it; /dev/sdb – used for the LVM; /dev/sdc – used for the LVM

3. Mount: LV is mounted (and remains mounted) on /vmware

Results:

1. No snapshot, Using VG on /dev/sdb only:

# time dd if=/dev/zero of=/vmware/test.2GB bs=1M count=2048
2048+0 records in
2048+0 records out

real 0m16.088s
user 0m0.009s
sys 0m8.756s

2. With snapshot on the same disk (/dev/sdb):

# time dd if=/dev/zero of=/vmware/test.2GB bs=1M count=2048
2048+0 records in
2048+0 records out

real 6m5.185s
user 0m0.008s
sys 0m11.754s

3. With snapshot on 2nd disk (/dev/sdc):

# time dd if=/dev/zero of=/vmware/test.2GB bs=1M count=2048
2048+0 records in
2048+0 records out

real 5m17.604s
user 0m0.004s
sys 0m11.265s

4. Same as before, creating a new empty file on the disk:

# time dd if=/dev/zero of=/vmware/test2.2GB bs=1M count=2048
2048+0 records in
2048+0 records out

real 3m24.804s
user 0m0.006s
sys 0m11.907s

5. Removed the snapshot. Created a 3rd file:

A note about Startup/Shutdown scripts – dbora

Thursday, January 11th, 2007

Per the last post in this thread, I have created a startup script to an Oracle setup I’ve had.

The script is rather simple – you “su -” to the Oracle user, and you just start the DB. Same goes for shutting it down. I have tested it and it worked well.

Due to the customer’s demands, we’ve relocated the DB data to an NFS on NetApp. Afterwards, reboots didn’t go quite will. It seemed that the DB never went down correctly, and that we’ve managed, somehow, to umount the NFS volumes without shutting it down.

After a short investigation, it was revealed that /etc/rc.d/rc, which controls the startup and shutdown of services on a system, checks to verify that there is a lock file with the same name as the startup service. In our case – we should have created an empty file in /var/lock/subsys/ named dbora during the startup of our dbora script.

When changing the script to do it (and to remove the file during shutdown), things started working correcly.

HP-UX – allowed shells, and connecting FC Multipath to NetApp

Thursday, August 10th, 2006

When adding a certain shell to an HP-UX system, for example, /usr/bin/tcsh, each user set to use this shell will not be able to FTP to the machine, until there is entry in /etc/shells. The trick is that even if the file doesn’t exist, you have to create it. By default, HP-UX allows only /sbin/sh and /bin/sh shells, but as soon as you setup this file, you can allow more shells. Mind you that you have to include /sbin/sh and /bin/sh in /etc/shells, else other things might not work correctly. Taken from here.

Connecting HP-UX to SAN storage is never too simple. The actual list of actions is:

1. Install HP-UX drivers for the FC adapter

2. Map the PWWN obtained from (reading the sticker at the back of the machine, or querying the storage/SAN switch) the machine to the relevant LUNs.

3. Run “/usr/sbin/ioscan -fnC disk” and see that the new disk devices are detected.

4. Run “/usr/sbin/ioinit -i” to create the relevant device files.

A note – HP-UX might require a reboot after the initial connection. On several cases I’ve noticed that if the server was running for a while with disconnected fiber, only being connected during before startup would result in link and in SAN registration. Of course, the driver must be installed then.

If you are to connect your HP-UX to NetApp device, as we did, take a day (or more) notice and open “now” account in http://now.netapp.com. You can find documentation about HP-UX (including step-by-step), you can find the “SAN Attach Kit for HP-UX” which will make your life easier, and set of best-practice guides. Just follow these guides, and you will find it easy and simple task to do.

Ontap Simulator, and some insights about NetApp

Tuesday, May 9th, 2006

First and foremost – the Ontap simulator, a great tool which surely can assist in learning NetApp interface and utilization, lacks in performance. It has some built-in limitations – No FCP, no disks (virtual disks) larger than 1GB (per my trial-and-error. I might find out I was wrong somehow, and put in on this website), and low performance. I’ve got about 300KB/s transfer rate both on iSCSI and on NFS. To make sure it was not due to some network hog hiding somewhere on my net(s), I’ve even tried it from the host of the simulator itself, but to no avail. Low performance. Don’t try to use it as your own home iSCSI Target. Better just use Linux for this purpose, with the drivers obtained from here (It’s one of my next steps into “shared storage(s) for all”).

Another issue – After much reading through NetApp documentation, I’ve reached the following concepts of the product. Please correct me if you see fit:

The older method was to create a volume (vol create) directly from disks. Either using raid_dp or raid4.

The current method is to create aggregations (aggr create) from disks. Each aggregate consists of raid groups. A raid group (rg) can be made up of up to eight physical disks. Each group of disks (an rg) has one or two parity disks, depending on the type of raid (raid 4 uses one parity, and raid_dp uses “double parity”, as its name can suggest).

Actually, I can assume that each aggregation is formatted using the WAFL filesystem, which leads to the conclusion that modern (flex) volumes are logical “chunks” of this whole WAFL layout. In the past, each volume was a separated WAFL formatted unit, and each size change required adding disks.

This separation of the flex volume from the aggregation suggests to me the possibility of multiple-root capable WAFL. It can explain the lack of requirement for a continuous space on the aggregation. This eases the space management, and allows for fast and easy “cloning” of volumes.

I believe that the new “clone” method is based on the WAFL built-in snapshot capabilities. Although WAFL Snapshots are supposed to be space conservatives, they require a guaranteed space on the aggregation prior to committing the clone itself. If the aggregation is too crowded, they will fail with the error message “not enough space”. If there is enough for snapshots, but not enough to guarantee a full clone, you’ll get a message saying “space not guaranteed”.

I see the flex volumes as some combination between filesystem (WAFL) and LVM, living together on the same level.

LUNs on NetApp: iSCSI and/or Fibre LUNs are actually managed as a single (per-LUN) large file contained within a volume. This file has special permissions (I was not able to copy it or modify it while it was online and I had root permissions. However, I am rather new to NetApp technology), and it is being exported as a disk outside. Much like an ISO image (which is a large file containing a whole filesystem layout) these files contain a whole disk layout, including partition tables, LVM headers, etc – just like a real disk.

Thinking about it, it’s neither impossible nor very surprising. A disk is no more than a container of data, of blocks, and if you can utilize the required communication protocol used for accessing it and managing its blocks (aka, the transport layer on which filesystem can access the block data), you can, with just a little translation interface, set up a virtual disk which will behave just like any regular disk.

This brings us to the advantages of NetApp’s WAFL – the ability to minimize I/O while maintaining a set of snapshots for the system – a list of per-block modification history. It means you can “snapshot” your LUN, being physically no more than a file on a WAFL-based volume, and you can go back with your data to a previous date – an hour, a day, a week. Time travel for your data.

There are, unfortunately, some major side effects. If you’ve read the WAFL description from NetApp, my summary will be inaccurate at best. If you haven’t, it will be enough, but still you are most encouraged to read it. The idea is that this filesystem is made out of multi-layers of pointers, and of blocks. A pointer can point to more than one block. When you commit a snapshot, you do not change the pointers, you do not move data, you just modify the set of pointers. When there is any change in the data (meaning a block is changed), the pointer points to the alternate block instead of the previous (historical) block, but keeps reference of the older block’s location. This way, only modified blocks are actually recreated, while any unmodified data remains on the same spot on the physical disk. An additional claim of NetApp is that their WAFL is optimized for the raid4 and raid_dp they use, and utilizes it in a smart manner.

The problem with WAFL, as can be easily seen, is fragmentation. For CIFS and NFS, it does not cause much of a problem, as the system is very capable of read-ahead just to solve this issue. However, A LUN (which is supposed to act as a continuous layout, just like any hard-drive or raid-array in the world and on which various file-system related operations occur) gets fragmented.

Unlike CIFS or NFS, LUN read-ahead is harder to predict, as the client tries to do just the same. Unlike real disks, NetApp LUNs do not behave, performance-wise, like the hard-drive layout any DB or FS has learned to expect and was best optimized for. It means, for my example, that on a DB with lots of small changes, that the DB itself would have tried to commit changes in large write operations, committed every so and so interval, and would thrive to commit them as close to each other, as continuous as possible. On NetApp LUN this will cause fragmentation, and will result in lower write (and later read) performance.

That’s all for today.

NetApp Ontap 7.1

Thursday, May 4th, 2006

I’ve had the pleasure of playing with Ontap Simulator. This is a marvelous tool, designed to simulate a real NetApp appliance, in an easy and affordable manner.

I’ve noticed a link to this simulator, in Oracle’s web site. I’m posting it here so you’ll get to know what I’m talking about. I’m playing with it on a Linux host. I’ve created virtual disks (small ones only. The simulator does not allow larger than 1G disks anyhow), and I’m playing with NFS, CIFS, Snapshots, etc.

If I have some surprising views on the matter, I will share them here.