Archive for the ‘bash’ Category

NetApp SnapMirror monitor script

Sunday, December 13th, 2009

I have been doing some work lately with NetApp SnapMirror. I have set up SnapMirror replication for some volumes and qtrees, and I wanted to monitor their use and behavior over the line.

As you can expect, site-to-site replication of data is a fragile thing, especially when done at the level of the storage device, which is agnostic to the data kept on it. When replicating volumes, I have to rely on the relevant employees to be responsible about what is placed there, because the storage does not filter out the junk. If someone decides to put a new DVD image on the DB storage space, the DB won’t care as long as there is enough free space, but the storage will attempt to replicate the added data to the alternate site. If you are already close to your bandwidth limit, which is never a good thing, you will create a replication lag that you would hardly (if at all) be able to close.

For that, and since I don’t tend to trust people not to do stupid things, I have written this script.

What does it do?

This script will perform the following:

Alerting about non-idle SnapMirror session

Use with ‘-m alert’

Assuming SnapMirror is scheduled to a specific time, the script will alert if a session is still active when it is run. By default it sends an e-mail alert, provided that recipients are defined (see the configuration section below); with the flag ‘-a no’ it does not. With ‘-r yes’, it will react by setting a throttle for each non-idle session, in which case ‘-t VALUE’ must be specified, where VALUE is the numeric throttle in KB/s.

Limiting throttle to a SnapMirror session

Use with ‘-m throttle_limit’

The script will set a throttle for SnapMirror session(s). Set the limit with the flag ‘-t VALUE’, where VALUE is the numeric throttle in KB/s for each session.

Cancelling throttle limit

Use with ‘-m throttle_unlimit’

The script will set unlimited throttle for SnapMirror session(s).

Checking SnapMirror lag

Use with ‘-m check_lag’

Since the purpose of replication is recovery, the lag of each SnapMirror session shows how far behind the replica is. Use with ‘-d VALUE’, VALUE being the alert threshold in minutes. The default threshold is one day (1440 minutes).
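The lag is reported by ‘snapmirror status’ as hh:mm:ss, while the threshold is given in minutes, so a conversion is needed. Below is a minimal sketch of that conversion and comparison. The helper names lag_to_minutes and lag_exceeded are mine, for illustration only, and are not part of the script. Note that Bash arithmetic reads numbers with a leading zero as octal, so 08 and 09 need to be forced to base 10.

```shell
#!/bin/bash
# Sketch: convert a SnapMirror lag string (hh:mm:ss) to whole minutes
# and compare it with a threshold. lag_to_minutes and lag_exceeded are
# hypothetical helper names, not part of the script.

lag_to_minutes () {
        # $1 - lag in hh:mm:ss
        local h m s
        IFS=: read -r h m s <<< "$1"
        # 10# forces base 10, so values such as 08 or 09 are not read as octal
        echo $(( 10#$h * 60 + 10#$m ))
}

lag_exceeded () {
        # $1 - lag in hh:mm:ss ; $2 - threshold in minutes (default 1440)
        [ "$(lag_to_minutes "$1")" -gt "${2:-1440}" ]
}

lag_to_minutes 26:09:30
if lag_exceeded 26:09:30 1440
then
        echo "lag is above the threshold"
fi
```

Seconds are dropped on purpose, since the threshold is expressed in whole minutes.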

Checking snapshots size

Use with ‘-m check_size’

This reports the expected delta to transfer. It can help estimate the success or failure of a future sync of data (snapmirror update) before it begins. Use the ‘-l’ flag to log the date/time of the measurement and the expected size to a file, by default /tmp/target_name.txt, where target_name is the SnapMirror target.

General Options

Use with ‘-c filename’ for alternate configuration file.

Use with ‘-h’ to get general help.

Supply a list of target names in the format storage:/vol/volname/qtree or storage:volname to ignore the targets in the configuration file and use your own.

Configuration File

The configuration file is rather simple. By default it should be called “/etc/snapmirror_monitor.conf”. It sets two variables:

TGTS="storage2:/vol/volname/qtree
storage3:volname2
storage1:/vol/volnew/qtr2"

EMAIL="user@domain.com another_user@domain.com"
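Since the script simply sources this file, it has to be valid shell syntax, with plain ASCII double quotes. The sketch below writes a sample file to a temporary location, sources it and walks through the targets, splitting each one into storage name and path. The storage names are the placeholders from the example, and nothing here contacts a real storage device.

```shell
#!/bin/bash
# Sketch: write a sample configuration to a temporary file, source it
# and walk through the targets. The storage names are placeholders;
# nothing here contacts a real storage device.

CONF=$(mktemp)
cat > "$CONF" << 'EOF'
TGTS="storage2:/vol/volname/qtree
storage3:volname2
storage1:/vol/volnew/qtr2"
EMAIL="user@domain.com another_user@domain.com"
EOF

# The configuration file is plain shell, so it is simply sourced
. "$CONF"
rm -f "$CONF"

for TGT in $TGTS
do
        # Everything before the first colon is the storage (filer) name
        echo "filer=${TGT%%:*} path=${TGT#*:}"
done
```

Because the targets are split on white space, new-lines and spaces between them are interchangeable.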

Prerequisites

This script will run on any modern Linux machine. For it to communicate with the NetApp devices, you will need SSH enabled on the NetApps, and ssh key exchange so that the Linux would be able to access the NetApp without using passwords.

The Script

Below is the script. You can download it and use it as you like.

#!/bin/bash
# This script will monitor snapmirror status
# Assumption: Access through ssh to root on all storage devices involved
# This will also attempt to detect the diff which is to sync
 
# Written by Ez-Aton. Check http://run.tournament.org.il for updates or
# additional information
 
# Modes: 
# alert -> alert if snapmirror is still active
# throttle_limit -> Limit throttle to a given number (default or manually set)
# throttle_unlimit -> Open throttle limitation
# check_lag -> Report the snapmirror lag
# check_size -> Report the estimated data size to move
 
# Global variables
CONF=/etc/snapmirror_monitor.conf
LOG_PREFIX=/tmp
 
test_connection () {
        # Test to see that you can access the storage device
        # Arguments: NetApp name
        SSH_OPTS="-o ConnectTimeout=2"
        if ! ssh $SSH_OPTS $1 hostname &>/dev/null
        then
                echo "Cannot communicate via SSH to $1"
                exit 1
        fi
}
 
abort () {
        # Exit with a predefined error message
        echo $*
        exit 1
}
 
get_arguments () {
        # Get all arguments and define options
        # Argument: $@
        [ -z "$1" ] && set -- -h
        while [ -n "$1" ]
        do
                case "$1" in
                        -m)     shift
                                case "$1" in
                                        alert|throttle_limit|throttle_unlimit|check_lag|check_size)     MODE=$1
                                        ;;
                                        *)      abort "Mode is mandatory. Use -h flag to get list of available flags"
                                        ;;
                                esac
                                ;;
                        -a)     shift
                                case "$1" in
                                        [nN][oO])       NOMAIL=1
                                                        ;;
                                        *)              NOMAIL=0
                                                        ;;
                                esac
                                ;;
                        -r)     shift
                                case "$1" in
                                        [yY][eE][sS])   REACT=1
                                                        ;;
                                        *)              REACT=0
                                                        ;;
                                esac
                                ;;
                        -d)     shift
                                declare -i DELAY_TMP
                                DELAY_TMP=$1
                                [ "$DELAY_TMP" != "$1" ] && abort "Delay needs to be a number in minutes"
                                DELAY=$DELAY_TMP
                                ;;
                        -t)     shift
                                declare -i THROTTLE_TMP
                                THROTTLE_TMP=$1
                                [ "$THROTTLE_TMP" != "$1" ] && abort "Throttle needs to be a number"
                                THROTTLE=$THROTTLE_TMP
                                ;;
                        -c)     shift
                                [ -f "$1" ] || abort "Cannot find specified conf file"
                                CONF="$1"
                                ;;
                        -l)     LOG=1
                                ;;
                        -h)     echo "Usage: $0 -m [alert|throttle_limit|throttle_unlimit|check_lag|check_size] (-c CONF_FILE) [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "Alert if SnapMirror is still running: $0 -m alert [-a no] (-r yes) [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "Alert and throttle (react): $0 -m alert [-a no] -r yes -t [throttle_in_kb] [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "Throttle a running SnapMirror: $0 -m throttle_limit -t throttle_in_kb [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "Unlimit SnapMirror throttle: $0 -m throttle_unlimit [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "To check lag: $0 -m check_lag -d delay_in_minutes (-a no) [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "To check delta: $0 -m check_size [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                exit 0
                                ;;
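                        *)      [ -z "$MODE" ] && abort "$0 mode required"
                                # All remaining arguments are targets. Stop parsing here,
                                # otherwise each pass would overwrite TGTS and keep only the last one
                                TGTS="$*"
                                break
                                ;;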
                esac
                shift
        done
}
 
notify () {
        # Send an e-mail notification
        # Arguments: $@ - the subject
        # Contents are empty
        # And yes - one e-mail per event
        mail -s "$@" $EMAIL < /dev/null
}
 
idle () {
        # Check if transaction is idle
        # Arguments: Target name (example: storage:/vol/volname/qtree
 
        # Get the storage name out
        NETAPP=${1%%:*}
        test_connection $NETAPP #Verify this netapp is accessible
        ssh $NETAPP snapmirror status $1 | tail -1 | grep Idle$ &>/dev/null #Checks if the snapmirror is idle. If so, return true
        return $?
}
 
set_throttle () {
        # Sets throttle for target
        # Arguments: $1 Target name (example: storage:/vol/volname/qtree)
        # Arguments: $2 throttle value (number)
 
        # Get the storage name out
        NETAPP=${1%%:*}
        test_connection $NETAPP #Verify this netapp is accessible
        ssh $NETAPP snapmirror throttle $2 $1
}
 
get_lag () {
        # Gets the lag of snapmirror relationship in minutes
        # Arguments: Target name (example: storage:/vol/volname/qtree)
 
        # Get the storage name out
        NETAPP=${1%%:*}
        test_connection $NETAPP #Verify this netapp is accessible
        LAG=`ssh $NETAPP snapmirror status $1 | tail -1 | awk '{print $4}'`
        # LAG is in hh:mm:ss. We need to transfer it to minutes only
        H=`echo $LAG | cut -f 1 -d :`
        M=`echo $LAG | cut -f 2 -d :`
        # Force base 10, so that values such as 08 or 09 are not parsed as octal
        M=$(( 10#$M + 10#$H * 60 ))
        echo $M
}
 
check_size () {
        # Checks the size of the snapshot to copy (diff)
        # Arguments: Target name (example: storage:/vol/volname/qtree)
 
        # Get the storage name out
        NETAPP=${1%%:*}
        test_connection $NETAPP #Verify this netapp is accessible
        # Get source storage name and path
        SRC=`ssh $NETAPP snapmirror status $1 | tail -1 | awk '{print $1}'`
        # Get the source filer and vol name from that
        NETAPP=${SRC%%:*}
        SPATH=${SRC##*:}
        SPATH=`echo $SPATH | sed s/'\/vol\/'//`
        SPATH=${SPATH%%/*}
 
        test_connection $NETAPP # Verify the source NetApp is accessible
        SNAP=`ssh $NETAPP snap list -n $SPATH | grep snapmirror | tail -1 | awk '{print $4}'`
        DELTA=`ssh $NETAPP snap delta $SPATH $SNAP | tail -2 | head -1 | awk '{print $5}'`
        echo "Snap delta for $1 is $DELTA KB"  
        LOG_TARGET=`echo $1 | tr / _`.txt
        [ -n "$LOG" ] && echo "`date` $DELTA" >> $LOG_PREFIX/$LOG_TARGET
}
 
 
### MAIN ###
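get_arguments "$@"
# Targets given on the command line take precedence over the configuration file
CLI_TGTS="$TGTS"
. $CONF &>/dev/null
[ -n "$CLI_TGTS" ] && TGTS="$CLI_TGTS"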
# if e-mail is not set, don't try to send
[ -z "$EMAIL" ] && NOMAIL=1
 
[ -z "$TGTS" ] && abort "You need at least one snapmirror target"
 
case $MODE in
        alert)  if [ "$REACT" == "1" ]
                then
                        [ -z "$THROTTLE" ] && abort "When setting 'react' flag, you must specify throttle"
                fi
                for i in $TGTS
                do
                        if ! idle $i
                        then
                                echo -n "$i is not idle. "
                                [ "$NOMAIL" != "1" ] && notify "$i is not idle"
                                if [ "$REACT" == "1" ]
                                then
                                        echo -n "We are set to react. Limiting throttle"
                                        set_throttle $i $THROTTLE
                                fi
                                echo
                        fi
                done
                ;;
        throttle_limit) [ -z "$THROTTLE" ] && abort "Throttle requires throttle value"
                        for i in $TGTS
                        do
                                echo "Setting throttle for $i to $THROTTLE"
                                set_throttle $i $THROTTLE
                        done
                        ;;
        throttle_unlimit)       for i in $TGTS
                                do
                                        echo "Setting throttle for $i to unlimited"
                                        set_throttle $i 0
                                done
                        ;;
        check_lag)      [ -z "$DELAY" ] && DELAY=1440
                        for i in $TGTS
                        do
                                LAG=`get_lag $i`
                                if [ "$LAG" -gt "$DELAY" ]
                                then
                                        echo "Failure: The delay for $i is $LAG minutes"
                                        [ "$NOMAIL" != "1" ] && notify "$i is lagged $LAG minutes, above the threshold $DELAY"
                                else
                                        echo "Normal: The delay for $i is $LAG minutes"
                                fi
                        done
                        ;;
        check_size)     for i in $TGTS
                        do
                                check_size $i
                        done
                        ;;
        *)      echo "Option $MODE is not implemented yet"
                exit 0
                ;;
esac

XenServer create snapshots for all machines

Friday, August 7th, 2009

XenServer is a wonderful tool. One of the better parts of it is its powerful scripting language, powered by the ‘xe’ command.

In order to capture a mass of snapshots, you can either do it manually from the GUI, or script it. The script supplied below includes shell functions to capture quiesced snapshots and, if that fails, normal snapshots of every running VM on the system.

Reason: NetApp SnapMirror, or other backup (maybe for later export) scheduled actions.

#!/bin/bash
# This script will supply functions for snapshotting and snapshot destroy including disks
# Written by Ez-Aton
# Visit my web blog for more stuff, at http://run.tournament.org.il
 
# Global variables:
UUID_LIST_FILE=/tmp/SNAP_UUIDS.txt
 
# Function
function assign_all_uuids () {
	# Construct artificial non-indexed list with name (removing annoying characters) and UUID
	LIST=""
	for UUID in `xe vm-list power-state=running is-control-domain=false | grep uuid | awk '{print $NF}'`
	do
		NAME=`xe vm-param-get param-name=name-label uuid=$UUID | tr ' ' _ | tr -d '(' | tr -d ')'`
		LIST="$LIST $NAME:$UUID"
	done
	echo $LIST
}
 
function take_snap_quiesce () {
	# We attempt to take a snapshot with quiesce
	# Arguments: $1 name ; $2 uuid
	# We attempt to snapshot the machine and set the value of snap_uuid to the snapshot uuid, if successful.
	# Return 1 if failed
 
	if SNAP_UUID=`xe vm-snapshot-with-quiesce vm=$2 new-name-label=${1}_snapshot`
	then
		# Print the snapshot UUID so that the caller can capture it, as take_snap does
		echo $SNAP_UUID
		return 0
	else
		echo "Snapshot-with-quiesce for $1 failed"
		return 1
	fi
}
 
function take_snap () {
	# We attempt to take a snapshot
	# Arguments: $1 name ; $2 uuid
	# We attempt to snapshot the machine and set the value of snap_uuid to the snapshot uuid, if successful.
	# Return 1 if failed
 
	if SNAP_UUID=`xe vm-snapshot vm=$2 new-name-label=${1}_snapshot`
	then
		#echo "Snapshot for $1 successful"
		echo $SNAP_UUID
		return 0
	else
		echo "Snapshot for $1 failed"
		return 1
	fi
}
 
function stop_ha_template () {
	# Templates inherit their settings from the origin
	# We need to turn off HA
	# $1 : Template UUID
	if [ -z "$1" ]
	then
		echo "Missing template UUID"
		return 1
	fi
	xe template-param-set ha-always-run=false uuid=$1
}
 
function get_vdi () {
	# This function will get a space delimited list of VDI UUIDs of a given snapshot/template UUID
	# Arguments: $1 template UUID
	# It will also verify that each VBD is an actual snapshot
	if [ -z "$1" ]
	then
		echo "No arguments? We need the template UUID"
		return 1
	fi
	VDIS=""
	for VBD in `xe vbd-list vm-uuid=$1 | grep ^uuid | awk '{print $NF}'`
	do
		echo "VBD: $VBD"
		if [ ! `xe vbd-param-get param-name=type uuid=$VBD` = "CD" ]
		then
			CUR_VDI=`xe vdi-list vbd-uuids=$VBD | grep ^uuid | awk '{print $NF}'`
			if `xe vdi-param-get uuid=$CUR_VDI param-name=is-a-snapshot`
			then
				VDIS="$VDIS $CUR_VDI"
			else
				echo "VDI is not a snapshot!"
				return 1
			fi
			CUR_VDI=""
		fi
	done
	echo $VDIS
}
 
function remove_vdi () {
	# This function will get a list of VDIs and remove them
	# Careful!
	for VDI in $@
	do
		if xe vdi-destroy uuid=$VDI
		then
			echo "Success in removing VDI $VDI"
		else
			echo "Failure in removing VDI $VDI"
			return 1
		fi
	done
}
 
function remove_template () {
	# This function will remove a template
	# $1 template UUID
	if [ -z "$1" ]
	then
		echo "Required UUID"
		return 1
	fi
	xe template-param-set is-a-template=false uuid=$1
	if ! xe vm-uninstall force=true uuid=$1
	then
		echo "Failure to remove VM/Template"
		return 1
	fi
}
 
function remove_all_template () {
	# This function will completely remove a template
	# The steps are as follows:
	# $1 is the UUID of the template
	# Calculate its VDIs
	# Remove the template
	# Remove the VDIs
	if [ -z "$1" ]
	then
		echo "No Template UUID was supplied"
		return 1
	fi
	# We now collect the value of $VDIS
	get_vdi $1
	if [ "$?" -ne "0" ]
	then
		echo "Failed to get VDIs for Template $1"
		return 1
	fi
	if ! remove_template $1
	then
		echo "Failure to remove template $1"
		return 1
	fi
	if ! remove_vdi $VDIS
	then
		return 1
	fi
}
 
function create_all_snapshots () {
	# In this function we will run all over $LIST and create snapshots of each machine, keeping the UUID of it inside a file
	# $@ - list of machines in the $LIST format
	if [ -f $UUID_LIST_FILE ]
	then
		mv $UUID_LIST_FILE $UUID_LIST_FILE.$$
	fi
	for i in $@
	do
		SNAP_UUID=`take_snap_quiesce ${i%%:*} ${i##*:}`
		if [ "$?" -ne "0" ]
		then
			echo "Problem taking snapshot with quiesce for ${i%%:*}"
			echo "Attempting normal snapshot"
			SNAP_UUID=`take_snap ${i%%:*} ${i##*:}`
			if [ "$?" -ne "0" ]
                	then
                        	echo "Problem taking snapshot for ${i%%:*}"
				SNAP_UUID=""
			fi
		fi
		stop_ha_template $SNAP_UUID
		echo $SNAP_UUID >> $UUID_LIST_FILE
	done
}

Possible use will be like this:

. /usr/local/bin/xen_functions.sh

create_all_snapshots `assign_all_uuids` &> /tmp/snap_create.log
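The functions pass machines around as NAME:UUID entries and take them apart with Bash parameter expansion. Here is a small sketch of that convention which can be tried without a XenServer host. The helper make_entry and the UUID are made up for the illustration.

```shell
#!/bin/bash
# Sketch of the NAME:UUID convention used by the functions above.
# make_entry is a hypothetical helper; the UUID is a made-up placeholder.

make_entry () {
        # $1 - VM name-label ; $2 - VM UUID
        # Replace spaces and drop parentheses so the entry survives word splitting
        local NAME
        NAME=$(echo "$1" | tr ' ' _ | tr -d '()')
        echo "$NAME:$2"
}

ENTRY=$(make_entry "Web Server (prod)" 11111111-2222-3333-4444-555555555555)
echo "entry: $ENTRY"
echo "name:  ${ENTRY%%:*}"   # strip the longest suffix starting at a colon
echo "uuid:  ${ENTRY##*:}"   # strip the longest prefix ending at a colon
```

A name-label that itself contains a colon would still split correctly for the UUID, but the name part would be cut at its first colon.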

Ad-hoc remote backups to tape

Sunday, July 19th, 2009

I have a nice SCSI tape connected to a single server. This allows for on-demand backups, with the hope (and seldom, with the established knowledge) that I can recover the data I have there.

Old computers, decommissioned computers and systems I wish to erase and reuse are seldom backed-up, just because of the effort in doing it. I will need to manually run something or the other, and who wants this chore?

I know that there are many full-featured backup systems out there, OSS and all, capable of doing what I want. However, these commonly require backup agents, their own tape formats and more, and for a simple one-time backup (which is what I want) they looked too bloated for my needs.

Again – my needs are: take this machine, run a simple script which can be obtained from an NFS share, wait for X minutes doing something else, and be assured your system is backed up.

I have written the script below to satisfy these requirements. Hope it helps others. Notice the single SSH leading connection and its functionality. It leaves a raw text file on tape with a simple description of the backup process, and the next tracks are the contents of each mount point.

I was a bit spartan with comments, but in general, this script should be quite self-explanatory:

#!/bin/bash
# This script will backup local disk to remote tape
# Written by Ez-Aton - http://run.tournament.org.il/
 
SERVER=kruvi # The name of the server with the direct attached tape
SRV_USER=root
TAPE=/dev/nst0 # Non-rewinding tape. We need to be able to add more tracks and not overwrite our own track
SSH="ssh -o StrictHostKeyChecking=no -o ControlMaster=auto -o ControlPath=~/.ssh/socket-%r@%h:%p"
WORK_FILE=/tmp/work.$$
TAR_LOG=/tmp/backup.log
TAR_ARG="czf - --one-file-system"
 
MOUNTS=`df -TlP | grep -v tmpfs | tail -n +2 | awk '{print $7}'`
# Assume nobody is stupid enough to use white spaces in mount paths
NUM_MOUNTS=`echo $MOUNTS | wc -w`
SUM_FILE=/tmp/summery.txt
 
clean_log () {
        : > $TAR_LOG
}
 
first_disk () {
        # Assume first disk is the first entry in /proc/partitions
        DISK="/dev/`cat /proc/partitions | head -n 3 | tail -n 1 | awk '{print $4}'`"
}
 
create_sum () {
        echo "Creating summary"
        # Collect information and place it in the file. It will be the first track of the tape
        echo "Hostname: `hostname`" > $SUM_FILE
        echo >> $SUM_FILE
        date >> $SUM_FILE
        echo >> $SUM_FILE
        for i in $MOUNTS; do df -h $i | tail -n +2 >> $SUM_FILE ; done
        echo >> $SUM_FILE
        echo "There will be $(($NUM_MOUNTS + 1)) tracks in addition to the first one" >> $SUM_FILE
}
 
create_leading_ssh () {
        # Use a nice trick for giving password only once:
        $SSH -f $SRV_USER@$SERVER 'while true; do sleep 100; done'
        echo "post leading"
}
 
monitor_proc () {
        # Monitor SSH process
        # Run in the background
        touch $WORK_FILE
        PID=`ps aux | grep "$SSH" | grep -v grep | awk '{print $2}'`
        if [ -z "$PID" ]
        then
                echo "Done so soon?"
                return 1
        fi
        while [ -f $WORK_FILE ]
        do
                sleep 10
        done
        kill $PID
}
 
test_tape_cmd () {
        CMD="mt -f $TAPE status"
}
 
remote_tape_append () {
        CMD="cat > $TAPE"
}
 
test_tape () {
        test_tape_cmd
        if ! $SSH $SRV_USER@$SERVER $CMD
        then
                echo "Tape on $SERVER is not ready"
                exit 1
        fi
}
 
backup_mount () {
        # Backup the actual mount
        # $1 - the path of the mount
        remote_tape_append
        if [ -z "$1" ]
        then
                echo "Mount path is empty?"
                exit 1
        fi
        echo "Backing up $1"
        cd "$1"
        tar $TAR_ARG . | $SSH $SRV_USER@$SERVER "$CMD" > $TAR_LOG 2>&1
}
 
append_header () {
        remote_tape_append
        cat $SUM_FILE | $SSH $SRV_USER@$SERVER "$CMD"
}
 
add_mbr () {
        remote_tape_append
        first_disk
        if [ -z "$DISK" ]
        then
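                echo "Can't decide on the first boot disk. Skipping MBR backup"
                echo "No MBR backup exists"
                # Return instead of exiting, so that the work file is removed
                # and the leading SSH connection is closed
                return 1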
        fi
        echo "Backing MBR"
        dd if=$DISK bs=1M count=1 | $SSH $SRV_USER@$SERVER "$CMD"
}
 
create_sum
create_leading_ssh
monitor_proc &
test_tape
append_header
for i in $MOUNTS
do
        backup_mount $i
done
add_mbr
rm $WORK_FILE
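The script keeps its leading connection alive with a remote sleep loop and later kills it by PID. As a variation (not what the script above does), OpenSSH can open and close the shared connection explicitly. The sketch below only defines two helpers, open_master and close_master, which are hypothetical names, and prints the resulting command line. It assumes the same server name and ControlPath as the script.

```shell
#!/bin/bash
# Sketch: open and close a shared SSH connection explicitly.
# SERVER and SRV_USER mirror the script's variables; open_master and
# close_master are hypothetical helper names.

SERVER=kruvi
SRV_USER=root
# ssh itself expands the tilde in ControlPath, so it is kept literal here
SSH_OPTS=(-o "ControlPath=~/.ssh/socket-%r@%h:%p")

open_master () {
        # -M: act as master, -N: run no remote command, -f: go to background
        # after authentication, so the password is asked only once
        ssh -M -N -f "${SSH_OPTS[@]}" "$SRV_USER@$SERVER"
}

close_master () {
        # Ask the running master to exit; no need to look for its PID
        ssh -O exit "${SSH_OPTS[@]}" "$SRV_USER@$SERVER"
}

# Typical flow (not executed here):
#   open_master
#   tar czf - . | ssh "${SSH_OPTS[@]}" "$SRV_USER@$SERVER" "cat > /dev/nst0"
#   close_master
echo "ssh ${SSH_OPTS[*]} $SRV_USER@$SERVER"
```

With this approach the background monitor function and the work file would not be needed, since the master is closed by a single command at the end.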

RedHat Cluster custom Oracle “Agent”/script V1.0

Friday, April 24th, 2009

Working with RH Cluster quite a lot, I have decided to create an online store of custom agents/scripts.

I have not, so far, invested the effort of making these agents accept settings from the cluster.conf file, but this might happen.

Let the library be!

Oracle DB script/agent:

Although I discovered (a bit late) that RH Cluster for Oracle Ent. Linux 5.2 does include oracle DB agent, this script should be good enough for RHEL4 RH Cluster versions as well.

This script only checks that the ‘smon’ process is up. Nothing fancy. In the future, it could also check that Oracle responds to SQL queries (meaning it is actually working).

#!/bin/bash
#Service script for Oracle DB under RH Cluster
#Written by Ez-Aton
#http://run.tournament.org.il
 
# Global variables
ORACLE_USER=oracle
HOMEDIR=/home/$ORACLE_USER
OVERRIDE_FILE=/var/tmp/oracle_override
REC_LIST="user@domain.com"
 
function override () {
	if [ -f $OVERRIDE_FILE ]
	then
		exit 0
	fi
}
 
function start () {
	su - $ORACLE_USER -c ". $HOMEDIR/.bash_profile ; sqlplus / as sysdba << EOF
startup
EOF
"
	status
}
 
function stop () {
	su - $ORACLE_USER -c ". $HOMEDIR/.bash_profile ; sqlplus / as sysdba << EOF
shutdown immediate
EOF
"
	status && return 1 || return 0
}
 
function status () {
	ps -afu $ORACLE_USER | grep -v grep | grep smon
	return $?
}
 
function notify () {
	mail -s "$1 oracle on `hostname`" $REC_LIST < /dev/null
}
 
override
case "$1" in
start)	start
	notify $1
	;;
stop)	stop
#	notify $1
	;;
status)	status
	;;
*)	echo "Usage: $0 start|stop|status"
	;;
esac

I usually place this script (with execution permissions, of course) in /usr/local/sbin and call it as a “script” from the cluster configuration. You will probably be required to alter the first few variable lines to match to your environment.
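As for checking that Oracle actually answers queries, a possible direction is sketched below. It is an untested assumption of how such a check could look, not part of the agent: sql_status and check_sql_output are hypothetical names, and the probe simply selects 1 from dual through the same ‘sqlplus / as sysdba’ login the script uses.

```shell
#!/bin/bash
# Sketch: a status check that also asks Oracle to answer a trivial query.
# sql_status and check_sql_output are hypothetical names; ORACLE_USER and
# HOMEDIR are the same variables the script above defines.

ORACLE_USER=oracle
HOMEDIR=/home/$ORACLE_USER

check_sql_output () {
        # $1 - output of the probe query
        # Succeed only if, whitespace aside, the answer is exactly "1"
        [ "$(echo "$1" | tr -d '[:space:]')" = "1" ]
}

sql_status () {
        # Run the probe as the oracle user; any SQL error makes sqlplus exit non-zero
        local OUT
        OUT=$(su - $ORACLE_USER -c ". $HOMEDIR/.bash_profile ; sqlplus -S / as sysdba << EOF
whenever sqlerror exit failure
set heading off feedback off pagesize 0
select 1 from dual;
EOF
") || return 1
        check_sql_output "$OUT"
}

# Only the output check is exercised here; sql_status needs a real database
check_sql_output "         1" && echo "probe output accepted"
```

In the agent, status could then become the smon check followed by sql_status, so that a hung instance is reported as down.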

Listener Agent/script:

The tnslsnr should be started/stopped as well, if we want the $ORACLE_HOME to migrate as well. This is its agent/script:

#!/bin/bash
#Service script for Oracle listener under RH Cluster
#Written by Ez-Aton
#http://run.tournament.org.il
 
ORACLE_USER=oracle
HOMEDIR=/home/$ORACLE_USER
OVERRIDE_FILE=/var/tmp/oracle_override
 
function override () {
	if [ -f $OVERRIDE_FILE ]
	then
		exit 0
	fi
}
 
function start () {
	su - $ORACLE_USER -c ". $HOMEDIR/.bash_profile ; lsnrctl start"
	status
}
 
function stop () {
	su - $ORACLE_USER -c ". $HOMEDIR/.bash_profile ; lsnrctl stop"
	status && return 1 || return 0
}
 
function status () {
	su - $ORACLE_USER -c ". $HOMEDIR/.bash_profile ; lsnrctl status"
}
 
override
case "$1" in
start)	start
	;;
stop)	stop
	;;
status)	status
	;;
*)	echo "Usage: $0 start|stop|status"
	;;
esac

Again – place it in /usr/local/sbin and call it from the cluster configuration file as type “script”.

I will add more agents and more resources for RedHat Cluster in the future.

Relocating LVs with snapshots

Monday, February 2nd, 2009

Linux LVM is a wonderful thing. It is scalable, flexible, and truly, almost enterprise-class in every detail. It falls short, of course, in IO performance for LVM snapshots, but this can be worked around in several creative ways (if I haven’t shown them here before, I will sometime).

What it can’t do is deal with a mixture of stripes, mirrors and snapshots in a single logical volume. It cannot mirror a striped LV (even if you can meet the requirements), and it cannot snapshot a mirrored or a striped volume. You get the idea. A volume you can protect, you cannot snapshot. A volume with snapshots cannot be mirrored or altered.

For the normal user, what you get is usually enough. For storage management per se, it is not. When I wanted to reduce a VG, that is, to remove a disk from an existing volume group, I had to evacuate every logical volume from it. The command for this action is ‘pvmove’, which relocates data from one PV to other PVs. This is done by mirroring each logical volume and then removing the origin.

Mirroring, however, cannot be performed on LVs with snapshots, or on an already mirrored LV, so these require different handling.

We can detect which LVs reside on our physical volume by issuing the following command

pvdisplay -m /dev/sdf1

/dev/sdf1 was only an example. You will see the contents of this PV. So next, performing

pvmove /dev/sdf1

would attempt to relocate every existing LV from this specific PV to any other available PV. We can use this command to change the disk balance and allocations on multi-disk volume groups. This will be discussed in a later post.

Following a ‘pvmove’ command, all linear volumes are relocated, if space permits, to other PVs. The remaining LVs are either mirrored or LVs with snapshots.

To relocate a mirrored LV, you need to un-mirror it first. To do so, first detect using ‘pvdisplay’ which LV it belongs to (the name should be easy to follow) and then change it to non-mirrored.

lvconvert -m0 /dev/VolGroup00/test-mirror

This will convert it to be a linear volume instead of a mirror, so you could move it, if it still resides on the PV you are to remove.

Snapshot volumes are more complicated, due to their nature. Since all my snapshots are of a filesystem, I could allow myself to use tar to perform the action.

The steps are as follows:

  1. tar the contents of the snapshot source to nowhere, but save an incremental file
  2. Copy the source incremental file to a new name, and tar the contents of a snapshot according to this copy.
  3. Repeat the previous step for each snapshot.
  4. Remove all snapshots
  5. Relocate the snapshot source using ‘pvmove’
  6. Build the snapshots and then recover the data into them

This is a script to do steps 1 to 3. It will not remove LVs, for obvious reasons. This script was not tested, but should work, of course :-)

None of the LVs should be mounted for it to function. It’s better to have harder requirements than to destroy data by double-mounting it, or accessing it while it is being changed.

#!/bin/bash
# Get: VG Base-LV, snapshot name, snapshot name, snapshot name...
# Example:
# ./backup VolGroup00 base snap1 snap2 snap3
# Written by Ez-Aton
 
TARGET=/tmp
# We need a VG, a base LV and at least one snapshot
if [ "$#" -lt 3 ]
then
   echo "Parameters: $0 VG base snap snap snap snap"
   exit 1
fi
VG=$1
BASE=$2
shift 2
 
function check_not_mounted () {
   # Return success only if the LV is NOT mounted
   if mount | grep -q /dev/mapper/${VG}-${1}
   then
      return 1
   else
      return 0
   fi
}
 
function create_base_diff () {
   # This function will create the diff file for the base
   mount /dev/${VG}/${BASE} $MNT
   if [ $? -ne 0 ]
   then
      echo "Failed to mount base"
      exit 1
   fi
   cd $MNT
   tar -g $TARGET/${BASE}.tar.gz.diff -czf - . > /dev/null
   cd -
   umount $MNT
}
 
function create_snap_diff () {
   mount /dev/${VG}/${1} $MNT
   if [ $? -ne 0 ]
   then
      echo "Failed to mount snapshot $1"
      exit 1
   fi
   cp $TARGET/${BASE}.tar.gz.diff $TARGET/$1.tar.gz.diff
   cd $MNT
   tar -g $TARGET/${1}.tar.gz.diff -czf $TARGET/${1}.tar.gz .
   cd -
   umount $MNT
}
 
function create_mount () {
   # Creates a temporary mount point
   if [ ! -d /mnt/$$ ]
   then
      mkdir /mnt/$$
   fi
   MNT=/mnt/$$
}
 
create_mount
if check_not_mounted $BASE
then
   create_base_diff
else
   echo "$BASE is mounted. Exiting now"
   exit 1
fi
for i in $@
do
   if check_not_mounted $i
   then
      create_snap_diff $i
   else
      echo "$i is mounted! I will not touch it!"
   fi
done

The remaining steps should be rather easy – just mount the newly created snapshots and restore the tar file on them.
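One caveat about the mount check in the script: device-mapper joins the VG and LV names with a single hyphen, so a hyphen inside either name is doubled in the /dev/mapper path. A volume such as test-mirror would therefore not match a pattern built from the plain names. The sketch below builds the path accordingly; dm_path is a hypothetical helper name. Depending on how the volume was mounted, ‘mount’ may also list it as /dev/VG/LV, so treat this as one of the forms to look for.

```shell
#!/bin/bash
# Sketch: build the /dev/mapper name of a logical volume.
# dm_path is a hypothetical helper name.

dm_path () {
        # $1 - volume group ; $2 - logical volume
        # Device-mapper joins VG and LV with a single hyphen, so hyphens
        # inside either name are doubled
        echo "/dev/mapper/${1//-/--}-${2//-/--}"
}

dm_path VolGroup00 base
dm_path VolGroup00 test-mirror
```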

Protect Vmware guest under RedHat Cluster

Monday, November 17th, 2008

Most documentation on the net is about how to run a cluster-in-a-box under Vmware. Very few seem to care about protecting Vmware guests under a real RedHat cluster with shared storage.

This article is about exactly that. While I would not recommend using Vmware in such a setup, this was the case here, and the Vmware guest actually resides on the shared storage. Relocating it is out of the question, so migrating it together with the other resources is the only valid option.

To do so, I have created a simple script which accepts start/stop/status arguments. The Vmware guest VMX is hard-coded into the script, but in an easy-to-change format. This script will attempt to suspend the Vmware guest, and only if that fails, shut it down. Mind you that the blog’s HTML formatting might turn quotation marks into UTF-8 typographic marks, which the shell will not understand.

#!/bin/bash
# This script will start/stop/status VMware machine
# Written by Ez-Aton
# http://www.tournament.org.il/run
 
# Hardcoded. Change to match your own settings!
VMWARE="/export/vmware/hosts/Windows_XP_Professional/Windows XP Professional.vmx"
VMRUN="/usr/bin/vmrun"
TIMEOUT=60
 
function status () {
  # This function will return success if the VM is up
  $VMRUN list | grep "$VMWARE" &>/dev/null
  if [[ "$?" -eq "0" ]]
  then
    echo "VM is up"
    return 0
  else
    echo "VM is down"
    return 1
  fi
}
 
function start () {
  # This function will start the VM
  $VMRUN start "$VMWARE"
  if [[ "$?" -eq "0" ]]
  then
    echo "VM is starting"
    return 0
  else
    echo "VM failed"
    return 1
  fi
}
 
function stop () {
  # This function will stop the VM
  $VMRUN suspend "$VMWARE"
  for i in `seq 1 $TIMEOUT`
  do
    if status
    then
      echo
    else
      echo "VM Stopped"
      return 0
    fi
    sleep 1
  done
  $VMRUN stop "$VMWARE" soft
}
 
case "$1" in
start)     start
        ;;
stop)      stop
        ;;
status)   status
        ;;
esac
RET=$?
 
exit $RET

Since the formatting is killed by the blog, you can find the script here: vmware1

I intend to build a “real” RedHat Cluster agent script, but this should do for the time being.

Enjoy!

Correction of a small but annoying error

Thursday, April 19th, 2007

For some reason (probably a typo) I’ve missed an important character in an example I gave here, but I have just recently fixed it. Anyhow, to clarify this, here is the extended description of the correction.

The $IFS Bash variable (Internal Field Separator) defines which characters the shell treats as separators when it splits a string into words. Changing it can help when dealing with, for example, file names with spaces in them, or values which should be considered one unit but are separated by semicolons.

To change the default string separator from "space or tab or new-line" to new-line only, you need to set the following in Bash:

IFS=$'\n'
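
The same variable covers the semicolon case mentioned above. Here is a minimal sketch, assuming a hypothetical `record` string made up for the illustration. Prefixing `read` with the assignment limits the IFS change to that single command, so the rest of the script keeps the default separators:

```bash
#!/bin/bash
# Hypothetical sample data: three fields separated by semicolons,
# two of them containing spaces.
record='John Smith;Tel Aviv;IL'

# IFS=';' applies only to this read command.
# -r keeps backslashes literal, -a stores the fields in an array.
IFS=';' read -r -a fields <<< "$record"

for f in "${fields[@]}"
do
   echo "$f"
done
```

Each field is printed on its own line, spaces included.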

Bash – Handling children and termination signals

Wednesday, March 21st, 2007

First and unrelated – this is my birthday. It reminds me that another year passed, and generally speaking, I do not take this too well…

Due to massive SPAM attacks, my commenting system is turned off for a while now, and I need to see how I can re-enable it safely.

Bash – here we go.

When you want a single script to spawn several commands in parallel, the simplest way is to put an ampersand at the end of each command, for example:

/usr/bin/find / -name 123 &

/bin/grep -r abc / &

etc.

If you do not want the output from these commands to mix together, you would probably wish to redirect it to a file, for example (redirecting all outputs):

/usr/bin/find / -name 123 &>/tmp/find.out &

/bin/grep -r abc / &>/tmp/grep.out &

You can later “cat” the two files in your own desired order.

This raises two interesting issues. The first is how you can tell that both commands have finished. There are several methods, such as collecting their PIDs and looping with “sleep” until they are no longer there. An alternative, more elegant method is to use “wait”. Without arguments, this command waits for all the commands you have forked to the background (both of them, in our example) to finish, and only then continues. So we can add, in our example, the following lines:

wait

cat /tmp/find.out

cat /tmp/grep.out

This will ensure that both outputs are not mixed together, and are readable.
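
If the exit status of each command matters, “wait” can also be given a specific PID, in which case it returns that child’s exit status. A minimal sketch, assuming two hypothetical subshells standing in for the real commands:

```bash
#!/bin/bash
# Hypothetical stand-ins for the long-running commands.
( sleep 1; exit 0 ) &
PID1=$!
( sleep 1; exit 3 ) &
PID2=$!

# "wait PID" returns the exit status of that specific child.
RET1=0; wait "$PID1" || RET1=$?
RET2=0; wait "$PID2" || RET2=$?

echo "first: $RET1, second: $RET2"
```

The PIDs are taken from `$!` right after each command is sent to the background, the same way the summary script further down collects them.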

The second issue, caused by running the commands in the background, is how to handle killing them. Let’s assume that our script is time-limited, and that if it exceeds its time limit, it gets killed. In this case the script itself will be killed, but its children will not die; they will be adopted by init (PID 1) and keep running. Suppose, for that matter, that we run the main script every 10 minutes, and that it is limited to those ten minutes. We might kill the system’s I/O performance, since we might reach a case where several “find” commands are running in parallel, each invoked by our main script at a different time.

To handle such a case, we can use the command “trap”. It allows us to handle signals in whatever way we desire. Notice that if you capture SIGTERM (kill -15, the default kill) and misuse it, the only method of stopping the main script will be by invoking SIGKILL (kill -9) on it, which bypasses all trap directives.

In our example, let’s add this (assume we are aware of each PID):

trap "kill $PID1 $PID2 ; exit 0" SIGTERM

So we can sum up our example script to be like this:

/usr/bin/find / -name 123 &>/tmp/find.out &

PID1=$!

/bin/grep -r abc / &>/tmp/grep.out &

PID2=$!

trap "kill $PID1 $PID2 ; exit 0" SIGTERM

wait

cat /tmp/find.out

cat /tmp/grep.out

This wraps it up. Hope it helps.
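
A possible variation, for scripts that fork many commands and where tracking every PID by hand gets tedious: the `jobs -p` builtin lists the PIDs of the shell’s background jobs. This is only a sketch, and the function name `kill_children` is my own, not part of the script above:

```bash
#!/bin/bash
kill_children () {
   # Ask the shell for the PIDs of all its background jobs
   # and send them SIGTERM (the default signal of kill).
   local pids
   pids=$(jobs -p)
   if [ -n "$pids" ]
   then
      kill $pids 2>/dev/null
   fi
}

# Single quotes: the handler text is evaluated when the signal
# arrives, so it sees the jobs running at that moment.
trap 'kill_children; exit 0' SIGTERM
```

Placed at the top of the script, before the commands are forked and before “wait”, this should terminate whatever is still running when the main script receives SIGTERM.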

Bash – Variable indirection – Using variable contents as a(nother) variable name

Tuesday, March 20th, 2007

This was a tricky action. Assume I have a list of variables, obtained by an external source:

var1=a

var2=b

var3=c

I cannot simply loop and use the phrase ${var$i} inside the loop (where i is the integer counter). It just doesn’t work; Bash rejects it as a bad substitution. I used this instead to assign the values to an array:

var[$i]=$(eval echo "\${var${i}}")

That way, I was able to loop through these values later easily.

So… we can use assigned var names inside a var if we do it right: $(eval echo "\${var${i}}")
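
Bash also has a built-in form of indirection, `${!name}`, which expands to the value of the variable whose name is stored in `name`, and avoids eval altogether. A minimal sketch using the same three variables; the helper names `name` and `values` are my own:

```bash
#!/bin/bash
var1=a
var2=b
var3=c

values=()
for i in 1 2 3
do
   name="var$i"           # build the variable name as a string
   values[$i]=${!name}    # indirect expansion: value of the variable named by $name
done

echo "${values[@]}"
```

Since no string is re-parsed as shell code, this form is also safer when the values come from an external source.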

Bash – strings with multiple words inside them

Thursday, December 21st, 2006

Let’s assume we have a file containing lines such as this:

first last

one two

three four five

If we write a simple script to deal with each line in turn, we would write something like this:

for i in `cat file`; do echo $i; done

This would, however, echo each word in turn. If we want the whole line in this echo, we need to set the Bash special variable IFS.

Example:

export IFS=$'\n' ; for i in `cat file`; do echo $i; done

This would do the trick. We define the Internal Field Separator to be newline only, and not newline, spaces and tabs, as the default goes.

The credit for this piece of information I can easily give to this site. Thanks, guys.
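
An alternative sketch that leaves IFS untouched for the rest of the script is to read the file line by line. The temporary file below is a hypothetical stand-in for the file from the example; `IFS=` applies only to the `read` command and keeps leading and trailing blanks, and `-r` keeps backslashes literal:

```bash
#!/bin/bash
# Hypothetical sample file holding the three lines used above.
file=$(mktemp)
printf '%s\n' 'first last' 'one two' 'three four five' > "$file"

count=0
last=""
while IFS= read -r line
do
   echo "$line"           # the whole line, spaces included
   last=$line
   count=$((count + 1))
done < "$file"

rm -f "$file"
```

Because the loop body receives exactly one line per iteration, there is no need to export a modified IFS, which would otherwise affect every command that follows.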