ZFS with Redhat Cluster Suite

This is a very nice project I have been working on. The hardware at hand – two servers, with a shared SAS bus containing several SAS disks. Since it’s a shared bus, no RAID solution would cut it, and as I don’t want to waste disks with ASM (“normal” redundancy meaning half the size…), I went to ZFS storage.

ZFS is a wonderful technology, with many advantages, but with some dangerous pitfalls. As I prefer Linux, I did not bother with any Sloaris solutions, and went directly to Centos 6. I will describe my cluster setup below.

I will disclose the entire setup, including hardware layout, Linux platform, ZFS module parameters, the Redhat Cluster Suite ZFS agent I wrote and the cluster.conf configuration file. I will also share my considerations regarding some of the choices I made. In addition, this system was designed to act as NFS storage for Citrix XenServer pool, so I will have to describe the changed I had to perform on the XenServer itself (which might make it unsupported, but I will have to live with it), to allow it to handle the timeouts resulting by server failover.

So first – the servers – each having a single CPU (quad core), 24GB RAM, and dual 1Gb/s NICs. Also – a tiny internal SATA disk is used for the OS. The shared disks – at the moment, 10 SAS disks, dual port (notice – older HP disks might mark in a very small letters that they are only a single-port SAS disks…), 72GB, 10K RPM. Zpool called ‘share’ with two 5 disks RaidZ1 vdevs. As I mentioned before – ZFS seemed like the best possible option allowing me to achieve my goals at minimal cost.

When I came to this project, I wanted to be able to use a native ZFS cluster agent, and not a ‘script’ agent, which takes a very long time to respond (30 seconds). Also – I wanted to be able to handle multiple storage pools concurrently – each floating on its own. While I have only one at the moment, I wanted the ability to have a fine-grained control over multiple pools. In addition – I am unable (or unwilling?) to handle the multiple filesystems introduced with each pool. I wanted to be able to import or export the pool silently, and with a clear head, thus I had to verify that the multiple filesystems are not in use as part of the export process.

As an agent, I wanted to comply with Redhat Cluster Suite (RHCS from now on) OCF syntax. I used the supplied fs.sh script as an inspiration for my agent script, so some of it might look familiar. All credit goes to the original authors, of course.

The operating system I selected was Centos 6. Centos is based on Redhat Linux, and I find it mature and stable, which is exactly what I want when I plan a production-ready, enterprise-class storage solution. The version had to be x86_64, due to ZFS requirements, and due to the amount of RAM in the server.

To handle ZFS options, I added a file called /etc/modprobe.d/zfs.conf, with the following content

install zfs /bin/rm -f /etc/zfs/zpool.cache && /sbin/modprobe –ignore-install zfs
options zfs zfs_arc_max=12593790976
options zfs zfs_arc_min=12593790976

I had to verify there is no zpool.cache file. Since my pool was rather small (planned for 24 disks max), I was not concerned by the longer import process caused by not having the zpool.cache file. I was more concerned with automatic import process which might happen, and had to prevent it at almost any cost. In addition, I learned from other systems that the arc memory should never exceed half the RAM, and it should be given just a little under that.

Of course, when changing such module settings, you need to recreate initrd (dracut -f) to be on the safe side later on.

The zfs.sh agent script was placed in /usr/share/cluster directory. You must have rgmanager installed for this directory to exist, and anyhow, without rgmanager, you will have no cluster whatsoever.

This is the contents of the zfs.sh file. Notice that it is not compatible with Luci, so if you’re using it – them kids won’t play well together.

#!/bin/bash
 
LC_ALL=C
LANG=C
PATH=/bin:/sbin:/usr/bin:/usr/sbin
export LC_ALL LANG PATH
# Private return codes
FAIL=2
NO=1
YES=0
YES_STR="yes"
 
. $(dirname $0)/ocf-shellfuncs
 
meta_data()
{
    cat <
 
    1.0
 
	This script will import and export ZFS storage pools
	It will make sure to mount and umount all child filesystems
 
        This is a ZFS pool
 
                Symbolic name for this zfs pool
 
                File System Name
 
		ZFS Pool name or ID
 
                ZFS pool name
 
		ZFS Pool alternate mount
 
                ZFS pool alternate mount
 
                If set, the cluster will kill all processes using 
                this file system when the resource group is 
                stopped.  Otherwise, the unmount will fail, and
                the resource group will be restarted.
 
                Force Unmount
 
                If set and unmounting the file system fails, the node will
                immediately reboot.  Generally, this is used in conjunction
                with force-unmount support, but it is not required.
 
                Seppuku Unmount
 
	<!-- Note: active monitoring is constant and supplants all              check depths -->
        <!-- Checks to see if we can read from the mountpoint -->
 
        <!-- Checks to see if we can write to the mountpoint (if !ROFS) -->
 
EOT
}
 
ocf_log()
{
        echo $*
}
 
verify_driver() {
	ocf_log info "Verifying ZFS driver"
	lsmod | grep -w zfs &gt; /dev/null 2&gt;&amp;1 &amp;&amp; return 0
	ocf_log err "ZFS driver is not loaded"
	return $OCF_ERR_ARGS
}
 
verify_poolname() {
	ocf_log info "Verify pool name "
	if [ -z "$OCF_RESKEY_pool" ]
	then
		ocf_log err "Missing pool name"
		return $OCF_ERR_ARGS
	fi
	zpool import | grep pool: | grep -w $OCF_RESKEY_pool &gt; /dev/null 2&gt;&amp;1 &amp;&amp; return 0
	ocf_log err "Cannot identify pool name"
	return $OCF_ERR_ARGS
}
 
verify_mounted_poolname() {
	ocf_log info "Verify pool name "
	if [ -z "$OCF_RESKEY_pool" ]
	then
		ocf_log err "Missing pool name"
		return $OCF_ERR_ARGS
	fi
	zpool list $OCF_RESKEY_pool &gt; /dev/null 2&gt;&amp;1 &amp;&amp; return 0
	ocf_log err "Cannot identify pool name"
	return $OCF_ERR_ARGS
}
 
verify_mountpath() {
	ocf_log info "Verifying alternate root mount path"
	[ -z "$OCF_RESKEY_mount" ] &amp;&amp; return 0
	declare mp="${OCF_RESKEY_mount}"
	case "$mp" in
		/*)    	# found it
                	;;
        	*)      # invalid format
			ocf_log err \
"verify_mountpath: Invalid mount point format (must begin with a '/'): \'$mp\'"
                return $OCF_ERR_ARGS
                ;;
        esac
}
 
pool_import() {
	ocf_log info "Importing pool"
	OPTS=""
	[ -n "$OCF_RESKEY_mount" ] &amp;&amp; OPTS="-R $OCF_RESKEY_mount"
	zpool import $OCF_RESKEY_pool $OPTS
	RET="$?"
	if [ "$RET" -ne "0" ]
	then
		ocf_log info "Cannot import without applying force"
		zpool import -f $OCF_RESKEY_pool $OPTS
		RET="$?"
	fi
	if [ "$RET" -ne "0" ]
	then
		ocf_log err "Pool import failed for $OCF_RESKEY_pool. error=$RET"
		return 1
	fi
	ocf_log info "Imported ZFS pool"
	return $RET
}
 
check_and_release_fs() {
	ocf_log info "Checking and releasing FS"
	FS=""
	case ${OCF_RESKEY_force_unmount} in
        $YES_STR|on|true|1)	force_umount=$YES ;;
        *)		        force_umount="" ;;
        esac
 
	RET=0
	for i in `zfs list -t filesystem | grep ^${OCF_RESKEY_pool} | awk '{print $NF}'`
	do
		# To be on the safe side. Why not?
		sleep 1
		# Is it mounted?
		if ! df -l | grep -w "$i" &gt; /dev/null 2&gt;&amp;1
		then
			ocf_log info "Filesystem $i is not mounted"
			continue
		fi 	
		if [ `lsof $i | wc -l` -gt "0" ]
		then
			ocf_log info "Filesystem $i is in use"
			if [ "$force_umount" ]
			then
				ocf_log info "Attempting to kill processes on $i filesystem"
				fuser -k $i
				sleep 2
				if [ `lsof $i | wc -l` -gt "0" ]
				then
					ocf_log err "Cannot umount filesystem $i - filesystem in use"
					return 1
				fi
			else
				ocf_log err "Cannot umount filesystem $i
 - filesystem in use"
                                return 1
			fi
		fi
	done
	return $RET	
}
 
self_fence() {
	ocf_log info "Should we validate and call self-fence?"
	case ${OCF_RESKEY_self_fence} in
		$YES_STR|on|true|1)       self_fence=$YES ;;
       		*)              self_fence="" ;;
        esac	
 
	if [ "$self_fence" ]; then
		ocf_log alert "umount failed - REBOOTING"
               	sync
                reboot -fn
	fi
	return $OCF_ERR_GENERIC
}
 
pool_export() {
	ocf_log info "Exporting zfs pool"
	check_and_release_fs || self_fence
	zpool export $OCF_RESKEY_pool
	RET="$?"
	if [ "$RET" -ne "0" ]
	then
		ocf_log err "Pool export failed for $OCF_RESKEY_pool. error=$RET"
		return 1
	fi
	return $RET
}
 
start() {
	ocf_log info "Starting ZFS"
	verify_driver || return $OCF_ERR_ARGS 
	verify_poolname || return $OCF_ERR_ARGS
	verify_mountpath || return $OCF_ERR_ARGS
	pool_import
	# Handle filesystem?
}
 
stop() {
	ocf_log info "Starting ZFS"
	verify_driver || return $OCF_ERR_ARGS 
	verify_mounted_poolname || return $OCF_ERR_ARGS
	verify_mountpath || return $OCF_ERR_ARGS
	# Handle filesystem?
	pool_export
}
 
is_imported() {
	ocf_log debug "Checking if $OCF_RESKEY_pool is imported"
	zpool list ${OCF_RESKEY_pool} &gt; /dev/null 2&gt;&amp;1
	return $?
}
 
is_alive() {
	ocf_log debug "Checking ZFS pool read/write"
	declare file=".writable_test.$(hostname)"
	declare TIMEOUT="10s"
	[ -z "$OCF_CHECK_LEVEL" ] &amp;&amp; export OCF_CHECK_LEVEL=0
	mount_point=`zfs list ${OCF_RESKEY_pool} | grep ${OCF_RESKEY_pool} | awk '{print $NF}'`
	test -d "$mount_point"
        if [ $? -ne 0 ]; then
                ocf_log err "${OCF_RESOURCE_INSTANCE}: is_alive: $mount_point is not a directory"
                return $FAIL
        fi
	[ $OCF_CHECK_LEVEL -lt 10 ] &amp;&amp; return $YES
 
        # depth 10 test (read test)
        timeout -s 9 $TIMEOUT ls "$mount_point" &gt; /dev/null 2&gt; /dev/null
        errcode=$?
        if [ $errcode -ne 0 ]; then
                ocf_log err "${OCF_RESOURCE_INSTANCE}: is_alive: failed read test on [$mount_point]. Return code: $errcode"
                return $NO
        fi
 
	[ $OCF_CHECK_LEVEL -lt 20 ] &amp;&amp; return $YES
 
        # depth 20 check (write test)
        rw=$YES
        for o in `echo $OCF_RESKEY_options | sed -e s/,/\ /g`; do
                if [ "$o" = "ro" ]; then
                        rw=$NO
                fi
        done
	if [ $rw -eq $YES ]; then
                file="$mount_point"/$file
                while true; do
                        if [ -e "$file" ]; then
                                file=${file}_tmp
                                continue
                        else
                                break
                        fi
                done
                timeout -s 9 $TIMEOUT touch $file &gt; /dev/null 2&gt; /dev/null
                errcode=$?
                if [ $errcode -ne 0 ]; then
                        ocf_log err "${OCF_RESOURCE_INSTANCE}: is_alive: failed write test on [$mount_point]. Return code: $errcode"
                        return $NO
                fi
                rm -f $file &gt; /dev/null 2&gt; /dev/null
        fi
 
	return $YES
}
 
monitor() {
	ocf_log debug "Checking ZFS pool $OCF_RESKEY_pool, Level $OCF_CHECK_LEVEL"
	verify_driver || return $OCF_ERR_ARGS 
	is_imported
	RET=$?
	if [ "$RET" -ne $YES ]; then
                ocf_log err "${OCF_RESOURCE_INSTANCE}: ${OCF_RESKEY_device} is not mounted on ${OCF_RESKEY_mountpoint}"
                return $OCF_NOT_RUNNING
        fi
	is_alive
	return $RET
}
 
if [ -z "$OCF_CHECK_LEVEL" ]; then
	OCF_CHECK_LEVEL=0
fi
 
case $1 in
start)
	ocf_log info "zfs start $OCF_RESKEY_pool\n"
	OCF_CHECK_LEVEL=0
	monitor
	[ "$?" -ne "0" ] &amp;&amp; start || ocf_log info "$OCF_RESKEY_pool is already mounted"
	exit $?
	;;
stop)
	ocf_log info "zfs stop $OCF_RESKEY_pool\n"
	OCF_CHECK_LEVEL=0
	monitor
	[ "$?" -eq "0" ] &amp;&amp; stop || ocf_log info "$OCF_RESKEY_pool is not mounted"
	exit $?
	;;
status|monitor)
	ocf_log debug "ZFS monitor $OCF_RESKEY_pool"
	monitor
	exit $?
	;;
meta-data)
	echo -e "zfs metadat $OCF_RESKEY_address\n" &gt;&gt;/tmp/out
	meta_data
	exit 0
	;;
validate-all)
	exit 0
	;;
*)
	echo "usage: $0 {start|stop|status|monitor|restart|meta-data|validate-all}"
	exit $OCF_ERR_UNIMPLEMENTED
	;;
esac

 

All I had to do now was to build the cluster.conf file.

This is a very nice project I have been working on. The hardware at hand – two servers, with a shared SAS bus containing several SAS disks. Since it’s a shared bus, no RAID solution would cut it, and as I don’t want to waste disks with ASM (“normal” redundancy meaning half the size…), I went to ZFS storage.

ZFS is a wonderful technology, with many advantages, but with some dangerous pitfalls. As I prefer Linux, I did not bother with any Sloaris solutions, and went directly to Centos 6. I will describe my cluster setup below.

I will disclose the entire setup, including hardware layout, Linux platform, ZFS module parameters, the Redhat Cluster Suite ZFS agent I wrote and the cluster.conf configuration file. I will also share my considerations regarding some of the choices I made. In addition, this system was designed to act as NFS storage for Citrix XenServer pool, so I will have to describe the changed I had to perform on the XenServer itself (which might make it unsupported, but I will have to live with it), to allow it to handle the timeouts resulting by server failover.

So first – the servers – each having a single CPU (quad core), 24GB RAM, and dual 1Gb/s NICs. Also – a tiny internal SATA disk is used for the OS. The shared disks – at the moment, 10 SAS disks, dual port (notice – older HP disks might mark in a very small letters that they are only a single-port SAS disks…), 72GB, 10K RPM. Zpool called ‘share’ with two 5 disks RaidZ1 vdevs. As I mentioned before – ZFS seemed like the best possible option allowing me to achieve my goals at minimal cost.

When I came to this project, I wanted to be able to use a native ZFS cluster agent, and not a ‘script’ agent, which takes a very long time to respond (30 seconds). Also – I wanted to be able to handle multiple storage pools concurrently – each floating on its own. While I have only one at the moment, I wanted the ability to have a fine-grained control over multiple pools. In addition – I am unable (or unwilling?) to handle the multiple filesystems introduced with each pool. I wanted to be able to import or export the pool silently, and with a clear head, thus I had to verify that the multiple filesystems are not in use as part of the export process.

As an agent, I wanted to comply with Redhat Cluster Suite (RHCS from now on) OCF syntax. I used the supplied fs.sh script as an inspiration for my agent script, so some of it might look familiar. All credit goes to the original authors, of course.

The operating system I selected was Centos 6. Centos is based on Redhat Linux, and I find it mature and stable, which is exactly what I want when I plan a production-ready, enterprise-class storage solution. The version had to be x86_64, due to ZFS requirements, and due to the amount of RAM in the server.

To handle ZFS options, I added a file called /etc/modprobe.d/zfs.conf, with the following content

install zfs /bin/rm -f /etc/zfs/zpool.cache && /sbin/modprobe –ignore-install zfs
options zfs zfs_arc_max=12593790976
options zfs zfs_arc_min=12593790975

I had to verify there is no zpool.cache file. Since my pool was rather small (planned for 24 disks max), I was not concerned by the longer import process caused by not having the zpool.cache file. I was more concerned with automatic import process which might happen, and had to prevent it at almost any cost. In addition, I learned from other systems that the arc memory should never exceed half the RAM, and it should be given just a little under that.

Of course, when changing such module settings, you need to recreate initrd (dracut -f) to be on the safe side later on.

The zfs.sh agent script was placed in /usr/share/cluster directory. You must have rgmanager installed for this directory to exist, and anyhow, without rgmanager, you will have no cluster whatsoever.

This is the contents of the zfs.sh file. Notice that it is not compatible with Luci, so if you’re using it – them kids won’t play well together.

All I had to do now was to build the cluster.conf file.

The reason I placed the IP address as the last to start and the first to stop was that the other way around, the NFS client would receive an ordered disconnection command, and would not bother to establish a connection with the remaining server. Abruptly taking away the clustered IP address causes the NFS clients to initiate a reconnection process, of which the systems are supposed to recover

I have left this article incomplete for a while now. It has some stuff I do like to share, so I am sharing it as-is. I will (some day) complete it.

Two advanced bash tricks

Well, tricks is not the right word to describe advanced shell scripting usage, however, it does make some sense. These two topics are relevant to Bash version 4.0 and above, which is common for all modern-enough Linux distributions. Yours probably.

These ‘tricks’ are for advanced Bash scripting, and will assume you know how to handle the other advanced Bash topics. I will not instruct the basics here.

Trick #1 – redirected variable

What it means is the following.

Let’s assume that I have a list of objects, say: ‘LIST=”a b c d”‘, and you want to create a set of new variables by these names, holding data. For example:

a=1
b=abc
c=3
d=$a

How can you iterate through the contents of $LIST, and do it right? If you’re having only four objects, you can live with stating them manually, however, for a dynamic list (example: the results of /dev/sd*1 in your system), you might find it a bit problematic.

A solution is to use redirected variables. Up until recently, the method involved a very complex ‘expr’ command which was unpleasant at best, and hard to figure at its worst. Now we can use normal redirected variables, using the exclamation mark. See here:

for OBJECT in $LIST
do
# Place data into the list
export $OBJECT=$RANDOM
done

for OBJECT in $LIST
do
# Read it!
echo ${!OBJECT}
done

Firstly – to assign value to the redirected variable, we must use ‘export’ prefix. $OBJECT=$RANDOM will not work.
Secondly – to show the content, we need to use exclamation mark inside the variable curly brackets, meaning we cannot call it $!OBJECT, but ${!OBJECT}.
We cannot dynamically create the variable name inside the curly brackets either, so ${!abc_$SUFFIX} won’t work either. We can create the name beforehand, and then use it, like this: DynName=abc_$SUFFIX ; echo ${!DynName}

Trick #2 – Using strings as an array index

It was impossible in the past, but now, one of the most useful features of having smart list is accessible in shell. We can now call an array with a label. For example:

for FILE in $( ls )
do
array["$FILE"]=$( ls -la $FILE | awk ‘{print $7}’ )
done

In this example we create array cells with the label being the name of the file, and populating them with the size (this is the result of ls -la 7th field) of this file.

This will work only if the array was declared beforehand using the following command (using the array name ‘array’ here):

declare -A array

Later on, it is easier to query data out of the array, as long as you know its index name. For example

FILE=ez-aton.txt
echo ${array[$FILE]}

Of course – assuming there is an entry for ez-aton.txt in this array.

The best use I found for this feature so far was for comparing large lists, without the need to reorder the objects in the array. I find it to boost the capabilities of arrays in Bash, and arrays, in general, are very powerful tools to handle long and complex lists, when you need to note the position.

That’s all fox. Note that the blog editor might change quites (single and double) and dashes to the UTF-8 versions, which will not go well in a copy/paste attempt to experiment with the code examples placed here. You might need to edit the contents and fix the quotes/dashes manually.

If you have any questions, comment here, I will be happy to elaborate. I hope to be able to add more complex Bash stuff I get into once a while :-)

NetApp internals – how to add SSH keys without C$ nor NFS shares

This post will describe the process of placing SSH keys using the internal ‘systemshell’ command of NetApp. As always – when doing something which the vendor did not intend you to do, do it very carefully. This data was obtained from NetApp forums, and while I do not have the original post to link (I usually link to the original, as a courtesy to the original author), this is the content, as is.

First, set to advanced mode:
filer> priv set advanced

Then, unlock and set a password to diag account:
filer*> useradmin diaguser unlock
filer*> useradmin diaguser password

Start the systemshell, create the directory you need and put the pubkey generated in the authorized_keys file:
filer*> systemshell

login: diag
Password: the same you set in the previous step

filer% mkdir -p /mroot/etc/sshd/root/.ssh
filer% vi /mroot/etc/sshd/root/.ssh/authorized_keys
filer% sudo chown -R root:wheel /mroot/etc/sshd/root
filer% sudo chmod -R 0600 /mroot/etc/sshd/root

Last, exit systemshell, lock diag account and exit advanced mode:
filer% exit
filer*> useradmin diaguser lock
filer*> priv set admin

If you want to do it for any other user, just replace the word ‘root’ with the said user.

An additional note – I had to create a user to perform ‘df’ operations only. The purpose was to be able to obtain data using ‘ssh’ without disclosing the keys used for root SSH access, by having a very limited user, designed to do that.

So the commands to create such a user are as follows:

useradmin role add df -a cli-df*,login-ssh
useradmin group add df_users -r df
useradmin user add df -g df_users
(here you will be asked to enter the user’s password)

Hope it helps!

 

 

Windows 7 hammering dnsmasq

I migrated to dnsmasq just yesterday, and discovered that a Windows 7 machine was hammering the server with messages like this:

Feb  1 11:06:07 dns dnsmasq-dhcp[1078]: DHCPINFORM(eth0) 192.168.1.77 91:de:87:7b:e5:a8
Feb  1 11:06:07 dns dnsmasq-dhcp[1078]: DHCPACK(eth0) 192.168.1.77 91:de:87:7b:e5:a8 winpc

Googling a bit, I found out this link (with an explanation). The solution is fairly simple. Add the following line to your dnsmasq.conf file to solve the problem:

dhcp-option=252,”\n”

 

XenServer and its damn too small system disks

I love XenServer. I love the product, I believe it to be a very good answer for SMBs, and enterprises. It lacks on external support, true, but the price tag for many of the ‘external capabilities’ on VMware, for instance, are very high, so many SMBs, especially, learn to live without them. XenServer gives a nice pack of features, at a very reasonable price.

One of the missing features is the management packs of hardware vendors, such as HP, Dell and IBM. Well, HP does have something, and its installation is always some sort of a challenge, but they do, so scratch that. Others, however, do not supply management packs. The bright side is that with Domain0 being a full featured i386 Centos 5 distribution, I can install the Centos/RHEL management packs, and have a ball. This brings us to another challenge there – the size of the system disk (root partition) by default is too small – 4GB, and while it works quite well without any external components, it tends to get filled very fast with external packages installed, like Dell tools, etc. Not only that, but on a system with many patches the patches backups take their toll, and consume valuable space. While my solution will not work for those who aim at the smallest possible space, such as SD or Disk-on-Key for the XenServer OS, it aims for the most of us all, where the system resides on several tenths of gigabytes at least, and is capable of sustaining the ‘loss’ of additional 4GB. This process modifies the install.img file, and authors the CD as a new one, your own privately-modified instance of XenServer installation. Mind you that this change will be effective only for new installations. I have not tested this as the upgrade path for existing systems, although I believe no harm will be done to those who upgrade. Also – it was performed and tested on XenServer 6.2, and not 6.2 SP1, or prior versions, although I believe that the process should look pretty similar in nature.

You will need a Linux machine to perform this operation, end to end. You could probably use some Windows applications on the way, but I have no idea as to which or what.

Step one: Open the ISO, and copy it to somewhere useful (assume /tmp is useful):

mkdir /tmp/ISO
mkdir /tmp/RW
mount -o loop /path/to/XenServer-6.2.0-install-cd.iso /tmp/ISO
cd /tmp/ISOtar cf – . | ( cd /tmp/RW ; tar xf – )

Step two: Extract the contents of the install.img file in the root of the CDROM:

mkdir /tmp/install
cd /tmp/install
cat /tmp/RW/install.img | gzip -dc | cpio -id

Step three: Edit the contents of the definitions file:

vi opt/xensource/installer/constants.py

Change the value of ‘root_size’ to something to your taste. Mind you that with 4GB it was tight, but still usable, even with additional 3rd party tools, so don’t become greedy. I defined it to be 6GB (6144)

Step four: Wrap it up:

cd /tmp/install ; find . | cpio -o -H newc | gzip -9 > /tmp/RW/install.img

Step five: Author the CD, and prepare it to be burned:

cd /tmp/RW
mkisofs -J -T -o /share/temp/XenServer-6.2-modified.iso -V “XenServer 6.2″ -volset “XenServer 6.2″ -A “XenServer 6.2″ \
-b boot/isolinux/isolinux.bin -no-emul-boot -boot-load-size 4 -boot-info-table -R -m TRANS.TBL .

You now have a file called ‘XenServer-6.2-modified.iso’ in your /tmp, which will install your XenServer with the disk partition size you have set it to install. Cheers.

BTW, and to make it entirely clear – I cannot be held responsible to any damage caused to any system you tweaked using this (or for that matter – any other) guide I published.

Enjoy your XenServer’s new apartment!

Extracting/Recreating RHEL/Centos6 initrd.img and install.img

A quick note about extracting and recreating RHEL6 or Centos6 (and their derivations) installation media components:

Initrd:

Extract:

mv initrd.img /tmp/initrd.img.xz
cd /tmp
xz –format=lzma initrd.img.xz –decompress
mkdir initrd
cd initrd
cpio -ivdum < ../initrd.img

Archive (after you applied your changes):

cd /tmp/initrd
find . | cpio -o -H newc | xz -9 –format=lzma > ../new-initrd.img

/images/install.img:

Extract:

mount -o loop install.img /mnt
mkdir /tmp/install.img.dir
cd /mnt ; tar cf – –one-file-system . | ( cd /tmp/install.img.dir ; tar xf – )
umount /mnt

Archive (after you applied your changes):

cd /tmp
mksquashfs install.img.dir/ install-new.img

Additional note for Anaconda installation parameters:

I did not test it, however there is a boot flag called stage2= which should lead to a new install.img file, other than the hardcoded one. I don’t if it will accept /images/install-new.img as its flag, but it can be a good start there.

One more thing:

Make sure that the vmlinuz and initrd used for any custom properties, in $CDROOT/isolinux do not exceed 8.3 format. Longer names didn’t work for me. I assume (without any further checks) that this is isolinux limitation.

Attach multiple Oracle ASM snapshots to the same host

The goal – connecting multiple Oracle ASM snapshots (same source LUNs, of course) to the same machine. The next process will demonstrate how to do it.

Problem: ASM disks use a disk label called ASMLib to maintain access even when the logical disk path might change (like adding a LUN with a lower ID and rebooting the server). This solves a major problem which was experienced with RAW devices, when order changed, and the ‘wrong’ disks took the place of others. ASM labels are a vital part in managing ASM disks and ASM DiskGroups. Also – the ASM DiskGroup name should be unique. You cannot have multiple DiskGroups with the same name.

Limitations – you cannot connect the snapshot LUNs to the same server which has access to the source LUNs.

Process:

  1. Take a snapshot of the source LUN. If the ASM DiskGroup spans across several LUNs, you must create a consistency group (each storage device has its own lingo for the task).
  2. Map the snapshots to the target server (EMC – prepare EMC Snapshot Mount Points (SMP) in advance. Other storage devices – depending)
  3. Perform partprobe on all target servers.
  4. Run ‘service oracleasm scandisks‘ to scan for the new ASM disk labels. We will need to change them now, so that the additional snapshot will not use duplicate ASM labels.
  5. For each of the new ASM disks, run ‘service oracleasm force-renamedisk SRC_NAME TGT_NAME‘. You will want to rename the source name (SRC_NAME) to a unique target name, with some correlation to the snapshot name/purpose. This is the reasonable way of making some sense of a possibly very messy setup.
  6. As the Oracle user, with the correct PATH variables ($ORACLE_HOME should point to the CRS_HOME) and the right ORACLE_SID (for example – +ASM1), run: ‘renamedg phase=both dgname=SRC_DG_NAME newdgname=NEW_DG_NAME verbose=true‘. The value ‘SRC_DG_NAME’ represents the original (on the source) DiskGroup name, and the NEW_DG_NAME represents the new name. Much like when renaming the disks – the name should have some relationship with either the snapshot name, so you can find your hands and legs in this mess (again – imagine having six snapshots, each of a DiskGroup with four LUNs. Now – this is a mess).
  7. You can now mount the DiskGroup (named NEW_DG_NAME in my example) on both nodes

Assumptions:

  1. Oracle GI is up and running all through this process
  2. I tested it with Oracle 11.2.0.3. Other versions of 11.2.0.x might work, I have no clue about previous 11.1.x versions, or any earlier versions.
  3. It was tested on Linux. My primary work platform. It was, to be exact, on RHEL 6.4, but it should work just the same on any RHEL-like platform. I believe it will work on other Linux platforms. I have no clue about running it on any other Unix/Windows platform.
  4. The DiskGroup should not be mounted (no reason for it to be mounted right on discovery). Do not manually mount it prior to performing this procedure.

Good luck, and post a comment if you find this explanation either unclear, or if you encounter any problem.

 

 

 

XenServer – increase LVM over iSCSI LUN size – online

The following procedure was tested by me, and was found to be working. The version of the XenServer I am using in this particular case is 6.1, however, I belive that this method is generic enough so that it could work for every version of XS, assuming you're using iSCSI and LVM (aka - not NetApp, CSLG, NFS and the likes). It might act as a general guideline for fiber channel communication, but this was not tested by me, and thus - I have no idea how it will work. It should work with some modifications when using Multipath, however, regarding multipath, you can find in this particular blog some notes on increasing multipath disks. Check the comments too - they might offer some better and simplified way of doing it.

So - let's begin.

First - increase the size of the LUN through the storage. For NetApp, it involves something like:

lun resize /vol/XenServer/luns/SR1.lun +1t

You should always make sure your storage volume, aggregate, raid group, pool or whatever is capable of holding the data, or - if using thin provisioning - that a well tested monitoring system is available to alert you when running low on storage disk space.

Now, we should identify the LUN. From now on - every action should be performed on all XS pool nodes, one after the other.

cat /proc/partitions

We should keep the output of this command somewhere. We will use it later on to identify the expanded LUN.

Now - let's scan for storage changes:

iscsiadm -m node -R

Now, running the previous command again will have a slightly different output. We can not identify the modified LUN

cat /proc/partitions

We should increase it in size. XenServer uses LVM, so we should harness it to our needs. Let's assume that the modified disk is /dev/sdd.

pvresize /dev/sdd

After completing this task on all pool hosts, we should run sr-scan command. Either by CLI, or through the GUI. When the scan operation completes, the new size would show.

Hope it helps!

BackupExec 2012 (14) on newer Linux

In particular – Oracle UEK, which “claims” to be 2.6.39-xxx, but is actually 3.0.x with a lower version number. Several misbehaviors (or differences) of version 3 can be found. One of them is related to BackupExec. The service would not start on OEL6 with UEK kernels. The cause of it is an incorrect use of a function – getIfAddrs. Everything can be seen in this amazing post. The described patch works, at least to allow the service to start. Check out the comments for some insights about how to identify the correct call.

I am re-posting it here, so it can be found for Oracle Universal Enterprise Kernel (UEK) as well.

NetApp “Broken disk label”

When using ‘disk show -v’ on a NetApp filer version 7.3.x, following replacement or addition of disk(s), you might see the above mentioned message. It is caused by incorrect disk label – of OnTap version 8, on an OnTap version 7.3.x system. The system cannot handle the incorrect label, and thus – ignores the disk.

A set of actions is required to clean the label and allow the NetApp to use this specific disk. The easiest method (although it will not be described here) would be to place the disk back in an OnTap 8 NetApp device, and clean the label from there, however, it is not always possible.

On your OnTap 7.3.x system, do the following (assuming you know the address of the disk, right?) – taken from NetApp’s forums here.

disk assign <diskid>
priv set diag
labelmaint isolate <diskid>
label wipe <diskid>
label wipev1 <diskid>
label makespare <diskid>
labelmaint unisolate
priv set

The fifth or sixth lines might fail to run, but still – the process will succeed as a whole.