Don’t try it at home – NetApp SnapMirror network-free Seeding

Well, this is rather common. Network-free seeding is being performed through using a tape device.
What happens if you do not have a tape device? We have the poor-man’s huge TB disks (SATA for the people) connected to simple PC systems, but no tape…
We perform network-free seeding into disk.

There is a utility called ‘lrep’ which is nice and effective. It forces the use of Qtree-based snapmirror (QSM), which has its own limitations. I advise you’ll read NetApp’s “SnapMirror Async Overview and Best Practices Guide” with id TR3446 for further details.

Limitations:

  • You have to have enough space on the source NetApp device to contain twice the volume you require to replicate
  • You can copy the output replica to an external Windows/Linux/Other system, of a movable type (could be a desktop), with large disks using CIFS or NFS.
  • HIgh CPU usage is guaranteed, as well as high disk usage.
  • You are aware to the fact that this is dangerous, and they will probably won’t like you just a little bit for doing this.

Concept:

Using the command ‘snapmirror store‘ you are able to store initial replica into a tape device for later seeding. The options (/man/Internet) describe how you should use your tape device. You don’t have one, or you don’t have one on each site.

Operation:

  • Create a volume or verify you have enough free space on some existing volume on the source NetApp device.
  • You will require the exact amount of used space on the source volume, give or give a little.
  • I recommend you’ll disable snapshots on the target volume for the duration of this operation, to save space.

Let’s assume that the source volume name is ‘vol2′, and that we have enough free space on /vol/vol1/free_space.

You need to perform the following:

Dangerous – change your privilege level:

priv set diag

Perform the initial SnapMirror

snapmirror store vol2 /vol/vol1/free_space/vol2

Remember – you have to be in ‘diag’ mode for it to work.

Reduce your privileges to normal:

priv set

Track the status through

snapmirror status

When the operation has completed, connect from your external Windows/Linux/Other machine to /vol/vol1/free_space and copy out the file ‘vol2′ which will probably be huge.

Transfer the Windows/Linux/Other system (or only its disk) to the alternate location, create a volume or verify you have enough free space on an existing volume to contain the entire file, and copy it to that location. I will assume it’s the same as on the source NetApp device – /vol/vol1/free_space/

Change your privileges level:

priv set diag

Create the target volume, and set it to be restricted (the values here are just an example):

vol create vol2 aggr0 100g

vol restrict vol2

Perform a ’snapmirror retrieve’ operation:

snapmirror retrieve vol2 /vol/vol1/free_space/vol2

Reduce your privileges to normal:

priv set

You can track the status through

snapmirror status

Following that, perform an update with the source NetApp real path (filer1:/vol/vol2) and you’re fine.

Remember – be very very careful when running under ‘diag’ privileges.

Oracle VM post-install check list

Following my experience with OracleVM, I am adding my post-install steps for your pleasure. These steps are not mandatory, by design, but will help you get up and running faster and easier. These steps are relevant to Oracle VM 2.2, but might work for older (and newer) versions as well.

Define bonding

You should read more about it in my past post.

Define storage multipathing

You can read about it here.

Define NTP

Define NTP servers for your Oracle VM host. Make sure the daemon ‘ntpd’ is running, and following an initial time update, via

ntpdate -u <server>

to set the clock right initially, perform a sync to the hardware clock, for good measures

hwclock –systohc

Make sure NTPD starts on boot:

chkconfig ntpd on

Install Linux VM

If the system is going to be stand-alone, you might like to run your VM Manager on it (we will deal with issues of it later). To do so, you will need to install your own Linux machine, since Oracle supplied image fails (or at least – failed for me!) for no apparent reason (kernel panic, to be exact, on a fully MD5 checked image). You could perform this action from the command line by running the command

virt-install -n linux_machine -r 1024 -p –nographics -l nfs://iso_server:/mount

This directive installs a VM called “linux_machine” from nfs iso_server:/mount, with 1GB RAM. You will be asked about where to place the VM disk, and you should place it in /OVS/running_pool/linux_machine , accordingly.

It assumes you have DHCP available for the install procedure, etc.

Install Oracle VM Manager on the virtual Linux machine

This should be performed if you select to manage your VMs from a VM. This is a bit tricky, as you are recommended not to do so if you designing HA-enabled server pool.

Define autostart to all your VMs

Or, at least, those you want to auto start. Create a link from /OVS/running_pool/<VM_NAME>/vm.cfg to /etc/xen/auto/

The order in which ‘ls’ command will see them in /etc/xen/auto/ is the order in which they will be called.

Disable or relocate auto-suspending

Auto-suspend is cool, but your default Oracle VM installation has shortage of space under /var/lib/xen/save/ directory, where persistent memory dumps are kept.  On a 16GB RAM system, this can get pretty high, which is far more than your space can contain.

Either increase the size (mount something else there, I assume), or edit /etc/sysconfig/xendomains and comment the line  with the directive XENDOMAINS_SAVE= . You could also change the desired path to somewhere you have enough space on.

Hashing this directive will force regular shutdown to your VMs following a power off/reboot command to the Oracle VM.

Make sure auto-start VMs actually start

This is an annoying bug. For auto-start of VMs, you need /OVS up and available. Since it’s OCFS2 file system, it takes a short while (being performed by ovs-agent).

Since ovs-agent takes a while, we need to implement a startup script after it and before xendomains. Since both are markes “S99″ (check /etc/rc3.d/ for details), we would add a script called “sleep”.

The script should be placed in /etc/init.d/

#!/bin/bash
#
# sleep     Workaround Oracle VM delay issues
#
# chkconfig: 2345 99 99
# description: Adds a predefined delay to the initialization process
#
 
DELAY=60
 
case "$1" in
start) sleep $DELAY
;;
esac
exit 0

Place the script as a file called “sleep” (omit the suffix I added in this post), set it to be executable, and then run

chkconfig –add sleep

This will solve VM startup problems.

Fix /etc/hosts file

If you are into multi-server pool, you will need that the host name would not be defined to 127.0.0.1 address. By default, Oracle VM defines it to match 127.0.0.1, which will result in a poor attempt to create multi-server pool.

This is all I have had in mind for now. It should solve most new-comer issues with Oracle VM, and allow you to make good use of it. It’s a nice system, albeit it’s ugly management.

Update the OracleVM

You could use Oracle’s unbreakable network, if you are a paying customer, or you could use the Public Yum Server for your system.

Updates to Oracle VM Manager

If you won’t use Oracle Grid Control (Enterprise Manager) to manage the pool, you will probably use Oracle VM Manager. You would need to update the ovs-console package, and you will probably want to add tightvnc-java package, so that IE users will be able to use the web-based VNC services. You would better grub these packages from here.

Citrix XenServer and DR

Assuming your storage is capable of replication, a bunch of VMs could be happily replicated to an alternate location, where you can start them on will (and on crisis, most likely).

This procedure, in theory, is rather simple. I have discovered that it is less so, especially if your system goes into testing once a while.

This leaves some “memories” with Citrix XenServer, as well as the underlayer OS.

The forums show examples of how to handle iSCSI re-attach, so I am adding here hot to handle fiber channel re-attach operation as well, especially if you’re using multipath.

Grab the SCSI ID

The SCSI ID is very important, as it is he basic identified of the LUN:

xe sr-proble type=lvmohba 2>&1

This will show a section (XML-like) under “SCSIid” containing the device’s SCSI ID. Also – grab the UUID of the device, as XenServer assigns. You could do this with:

xe sr-probe type=lvmohba device-config:SCSIid=<Enter SCSI ID here>

Introduce the Device

The system needs the device introduced. For that we’ll use the above captured UUID:

xe sr-introduce uuid=<Enter the UUID obtained before> shared=true typb=lvmohba namb-label=”Imported SR”

Keep the UUID returned by this command. I will reference to it as “SR UUID”

Create PBD for each host in the pool

This should be performed for each host, based on the UUID of all hosts:

xe pbd-create sr-uuid=<Enter the SR UUID obtained before> device-config=SCSIid=<Enter SCSI ID> host-uuid=<Enter UUID of host>

The result value, for each host, is the pbd UUID for that specific host. pbd is a physical Block Device, meaning – the “connection” of the Storage Repository (which is pool-wide) to the specific host.

Plug PBDs

For each of the pbd UUID obtained above, run the following command to plug it in:

xe pbd-plug uuid=<pbd UUID as obtained above>

That should be all.

For ease of repeat, below is a script to perform these exact actions automatically. It assumes one (1) unallocated LUN at the beginning of the process. It will probably behave differently for more than one.

#!/bin/bash
SCSIID=`xe sr-probe type=lvmohba 2>&1 | grep -A1 SCSIid | grep -v SCSIid | grep -v vendor | tail -1 | awk '{print $NF}'`
echo "scsi ID: $SCSIID"
UUID=`xe sr-probe type=lvmohba device-config:SCSIid=$SCSIID | grep -A1 UUID | grep -v UUID | grep -v Devlist | head -1 | awk '{print $NF}'`
echo "SR UUID $UUID"
SRID=`xe sr-introduce uuid=$UUID name-label="Imported SR" shared=true type=lvmohba`
echo "SR UID: $SRID"
HOSTID=`xe host-list  | grep uuid | awk '{print $NF}'`
PBDID=""
for i in $HOSTID
do 
   PBDID="$PBDID `xe pbd-create sr-uuid=$SRID device-config:SCSIid=$SCSIID host-uuid=$i`"
done
for i in $PBDID
do
   xe pbd-plug uuid=$i
done

Oracle VM and network bonding

Oracle VM, out of the box, does not allow network bonds. An excellent guide on how to enable bonding which I have partially followed, has convinced me that changing the relevant scripts would be better. That I have done, and reported in this wiki post.

To sum things up – configure bonding/VLAN tagging as you would have normally do on any RHEL-based Linux distro, and replace the script /etc/xen/scripts/network-bridges with the modified one mentioned below. This script will define bridge for each network interface node already defined as bridge, or slave of a bond connection, as part of the xend service.

#!/bin/bash 
# 
# Runs network-bridge against each ethernet card. 
# 
 
dir=$(dirname "$0") 
 
run_all_ethernets() 
{ 
    devnum=0
    for f in /sys/class/net/*; do 
        netdev=$(basename $f) 
        if [[ $netdev =~ "^bond[0-9.]+$" ]] || [[ $netdev =~ "^eth[0-9.]+$" ]]; then 
		# devnum=${netdev:3} 
		unset SLAVE MASTER BRIDGE
		. /etc/sysconfig/network-scripts/ifcfg-${netdev}
		if [[ `echo $SLAVE | egrep "yes|YES|1"` ]] || [[ -n "$BRIDGE" ]] ;then            
			echo "Interface ${netdev} is being defined and claimed by someone else"
		else	
			if [[ $netdev =~ "^((eth)|(bond))[0-9]+\." ]]; then
				# This is vlan tagging, and we want persistent vlan name!
				VLAN=$(echo $netdev | cut -f2 -d.)
				# Assume tags are unique on a server - no several bridges for a single tag.
				# If you intend on having eth2.3 and eth3.3 to your vms as bridges (and not deal with it
				# on the host level), this script is not for you
				$dir/network-bridge "$@" "netdev=${netdev}" "bridge=xenbrv${VLAN}"
			else
				$dir/network-bridge "$@" "netdev=${netdev}" "bridge=xenbr${devnum}" 
				let devnum++
			fi
	  	fi
        fi  
    done 
} 
 
run_all_ethernets "$@"

After a very long absence, Changing Linux HVM to PV on Xen

I have had a stressed time, and had no time to actually write down anything here. This is pity, since I have been doing so many things worth sharing.
I will start with a small one now – how to convert a physical machine into Xen-based VM. I assume you know the drill of how to do P2V in whatever method you like. My preferred method is of booting into a new system (virtual) and then manually building the partitions, LVM, etc, and using ‘tar’ with ‘nc’ to copy files from the source server to the target server. I might elaborate more about it some time, but this post is about the next phase – We have now previously physical server which used to use /dev/sdX or /dev/cciss/cXdXpX, or whatever else, and we need to make it Xen-friendly.
This procedure will apply to RedHat’s Xen-based virtualization, OracleVM or Citrix XenServer.
I assume that the target system maintains LVM settings and mount points as the original one.
These are the steps that need to be performed, manually.
I used CactiEZ image, installed as HVM on Citrix XenServer as my example. This procedure should apply to all Linux systems, with respect to their package manager.

Edit /etc/modprobe.conf

Change eth0,1,whatever alias to xennet

Change/add alias of scsi_hostadapter to xenblk

Edit /etc/blkid.tab

Run the command

sed -i ’s/hd/xvd/g’ /etc/blkid.tab

to change references if you use labels. These will be used by rc.sysinit later during boot sequence, and we need them configured correctly.

Get Paravirtualized Kernel

yum install kernel-xenU

Edit /etc/securetty

Add the line “xvc0″ to it. Notice, it’s zero and not ‘o’

Edit /etc/inittab

Replace tty1 with xvc0

Edit /boot/grub/menu.lst

Change the default entry to be the one with the xenU kernel

Replace ‘/dev/sda’ with ‘/dev/xvda’

Edit /boot/grub/device.map

Replace ‘/dev/sda’ with ‘/dev/xvda’

Edit /etc/fstab

Verify labels are used (and then – see changes to blkid.tab above), or devices are renamed to xvd[a-z]

Reboot into PV-enabled VM

This should do it. The VM will attempt to detect new hardware, network MAC changes, etc, but it will work fine.

NetApp SnapMirror monitor script

I have had some work done lately with NetApp SnapMirror. I have snapped-mirrored some volumes and qtrees and I wanted to monitor their use and behavior over the line.

As you can expect, site-to-site replication of data is a fragile thing, especially when done on the level of the storage device, which is agnostic to the data kept on it. When replicating volumes, I should expect the relevant employees to be responsible regarding what’s placed there, because the storage does not filter out the junk. If someone had decided to add a new DVD image on the DB storage space, well – the DB won’t care, as long as there is enough free space, but the storage will attempt to replicate the added data to the alternate site, which means that if you are around your bandwidth limits, which is never a good thing, you will just create a delay gap you would hardly (if at all) be able to close.

For that, and since I don’t tend to trust people not to do stupid things, I have written this script.

What does it do?

This script will perform the following:

Alerting about non-idle SnapMirror session

Use with ‘-m alert’

Assuming SnapMirror is scheduled to a specific time, the script will alert if a session is active. With the flag ‘-a no’, it will not send an e-mail (if possible, see the configuration section below). With ‘-r yes’, it will react, setting throttle for each non-idle session, but then ‘-t VALUE’ should be specified, where VALUE is the numeric throttle in KB/s.

Limiting throttle to a SnapMirror session

Use with ‘-m throttle_limit’

The script will set a throttle for SnapMirror session(s). Setting limit by the flag ‘-t VALUE’, where VALUE is the numeric throttle in KB/s per each session.

Cancelling throttle limit

Use with ‘-m throttle_unlimit’

The script will set unlimited throttle for SnapMirror session(s).

Checking SnapMirror lag

Use with the ‘-m check_lag’

Since replication has a purpose of recovering, the lag of each SnapMirror session would show how far back we are. Use with ‘-d VALUE’, VALUE being numeric time in minutes to set alert threshold. The default threshold delay is one day (1440 minutes).

Checking snapshots size

Use with the ‘-m check_size’

This reports the expected delta to transfer. This can help estimate the success or failure of a future sync of data (snapmirror update) before it begins. Use with ‘-l’ flag to set it to log date/time of measure and the expected sizes into a file. By default, in /tmp/target_name.txt, where the target is the SnapMirror target.

General Options

Use with ‘-c filename’ for alternate configuration file.

Use with ‘-h’ to get general help.

Use with a list target names in the format of storage:/vol/volname/qtree or storage:volname to ignore targets in configuration file and use your own.

Configuration File

The configuration file is rather simple. By default it should be called “/etc/snapmirror_monitor.conf“. It consists of two main variables for the system:

TGTS=”storage2:/vol/volname/qtree

storage3:volname2

storage1:/vol/volnew/qtr2″

EMAIL=”user@domain.com another_user@domain.com”

Prerequisites

This script will run on any modern Linux machine. For it to communicate with the NetApp devices, you will need SSH enabled on the NetApps, and ssh key exchange so that the Linux would be able to access the NetApp without using passwords.

The Script

Below is the script. You can download it and use it as you like.

#!/bin/bash
# This script will monitor snapmirror status
# Assumption: Access through ssh to root on all storage devices involved
# This will also attempt to detect the diff which is to sync
 
# Written by Ez-Aton. Check http://run.tournament.org.il for updates or
# additional information
 
# Modes: 
# alert -> alert if snapmirror is still active
# throttle_limit -> Limit throttle to a given number (default or manually set)
# throttle_unlimit -> Open throttle limitation
# check_lag -> Report the snapmirror lage
# check_size -> Report the estimated data size to move
 
# Global variables
CONF=/etc/snapmirror_monitor.conf
LOG_PREFIX=/tmp
 
test_connection () {
        # Test to see that you can access the storage device
        # Arguments: NetApp name
        SSH_OPTS="-o ConnectTimeout=2"
        if ! ssh $SSH_OPTS $1 hostname &>/dev/null
        then
                echo "Cannot communicate via SSH to $1"
                exit 1
        fi
}
 
abort () {
        # Exit with a predefined error message
        echo $*
        exit 1
}
 
get_arguments () {
        # Get all arguments and define options
        # Argument: $@
        [ -z "$1" ] && set -- -h
        while [ -n "$1" ]
        do
                case "$1" in
                        -m)     shift
                                case "$1" in
                                        alert|throttle_limit|throttle_unlimit|check_lag|check_size)     MODE=$1
                                        ;;
                                        *)      abort "Mode is mandatory. Use -h flag to get list of avialable flags"
                                        ;;
                                esac
                                ;;
                        -a)     shift
                                case "$1" in
                                        [nN][oO])       NOMAIL=1
                                                        ;;
                                        *)              NOMAIL=0
                                                        ;;
                                esac
                                ;;
                        -r)     shift
                                case "$1" in
                                        [yY][eE][sS])   REACT=1
                                                        ;;
                                        *)              REACT=0
                                                        ;;
                                esac
                                ;;
                        -d)     shift
                                declare -i DELAY_TMP
                                DELAY_TMP=$1
                                [ "$DELAY_TMP" != "$1" ] && abort "Delay needs to be a number in minutes"
                                DELAY=$DELAY_TMP
                                ;;
                        -t)     shift
                                declare -i THROTTLE_TMP
                                THROTTLE_TMP=$1
                                [ "$THROTTLE_TMP" != "$1" ] && abort "Throttle needs to be a number"
                                THROTTLE=$THROTTLE_TMP
                                ;;
                        -c)     shift
                                [ -f "$1" ] || abort "Cannot find specified conf file"
                                CONF="$1"
                                ;;
                        -l)     LOG=1
                                ;;
                        -h)     echo "Usage: $0 -m [alert|throttle_limit|throttle_unlimit|check_lag|check_size] (-c CONF_FILE) [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "Alert if SnapMirror is still running: $0 -m alert [-a no] (-r yes) [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "Alert and throttle (react): $0 -m alert [-a no] -r yes -t [throttle_in_kb] [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "Throttle a running SnapMirror: $0 -m throttle_limit -t throttle_in_kb [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "Unlimit SnapMirror throttle: $0 -m throttle_unlimit [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "To check lag: $0 -m check_lag -d delay_in_minutes (-a no) [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "To check delta: $0 -m check_size [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                exit 0
                                ;;
                        *)      [ -z "$MODE" ] && abort "$0 mode required"
                                TGTS="$*"
                                ;;
                esac
                shift
        done
}
 
notify () {
        # Send an e-mail notification
        # Arguments: $@ - the subject
        # Contents are empty
        # And yes - one e-mail per event
        mail -s "$@" $EMAIL < /dev/null
}
 
idle () {
        # Check if transaction is idle
        # Arguments: Target name (example: storage:/vol/volname/qtree
 
        # Get the storage name out
        NETAPP=${1%%:*}
        test_connection $NETAPP #Verify this netapp is accessible
        ssh $NETAPP snapmirror status $1 | tail -1 | grep Idle$ &>/dev/null #Checks if the snapmirror is idle. If so, return true
        return $?
}
 
set_throttle () {
        # Sets throttle for target
        # Arguments: $1 Target name (example: storage:/vol/volname/qtree)
        # Arguments: $2 throttle value (number)
 
        # Get the storage name out
        NETAPP=${1%%:*}
        test_connection $NETAPP #Verify this netapp is accessible
        ssh $NETAPP snapmirror throttle $2 $1
}
 
get_lag () {
        # Gets the lag of snapmirror relationship in minutes
        # Arguments: Target name (example: storage:/vol/volname/qtree)
 
        # Get the storage name out
        NETAPP=${1%%:*}
        test_connection $NETAPP #Verify this netapp is accessible
        LAG=`ssh $NETAPP snapmirror status $1 | tail -1 | awk '{print $4}'`
        # LAG is in hh:mm:ss. We need to transfer it to minutes only
        H=`echo $LAG | cut -f 1 -d :`
        M=`echo $LAG | cut -f 2 -d :`
        let M=$M+$H*60
        echo $M
}
 
check_size () {
        # Checks the size of the snapshot to copy (diff)
        # Arguments: Target name (example: storage:/vol/volname/qtree)
 
        # Get the storage name out
        NETAPP=${1%%:*}
        test_connection $NETAPP #Verify this netapp is accessible
        # Get source storage name and path
        SRC=`ssh $NETAPP snapmirror status $1 | tail -1 | awk '{print $1}'`
        # Get the source filer and vol name from that
        NETAPP=${SRC%%:*}
        SPATH=${SRC##*:}
        SPATH=`echo $SPATH | sed s/'\/vol\/'//`
        SPATH=${SPATH%%/*}
 
        test_connection $NETAPP # Verify the target NetApp is accessible
        SNAP=`ssh $NETAPP snap list -n $SPATH | grep snapmirror | tail -1 | awk '{print $4}'`
        DELTA=`ssh $NETAPP snap delta $SPATH $SNAP | tail -2 | head -1 | awk '{print $5}'`
        echo "Snap delta for $1 is $DELTA KB"  
        LOG_TARGET=`echo $1 | tr / _`.txt
        [ -n "$LOG" ] && echo "`date` $DELTA" >> $LOG_PREFIX/$LOG_TARGET
}
 
 
### MAIN ###
get_arguments $@
. $CONF &>/dev/null
# if e-mail is not set, don't try to send
[ -z "$EMAIL" ] && NOMAIL=1
 
[ -z "$TGTS" ] && abort "You need at least one snapmirror target"
 
case $MODE in
        alert)  if [ "$REACT" == "1" ]
                then
                        [ -z "$THROTTLE" ] && abort "When setting 'react' flag, you must specify throttle"
                fi
                for i in $TGTS
                do
                        if ! idle $i
                        then
                                echo -n "$i is not idle. "
                                [ "$NOMAIL" != "1" ] && notify "$i is not idle"
                                if [ "$REACT" == "1" ]
                                then
                                        echo -n "We are set to react. Limiting throttle"
                                        set_throttle $i $THROTTLE
                                fi
                                echo
                        fi
                done
                ;;
        throttle_limit) [ -z "$THROTTLE" ] && abort "Throttle requires throttle value"
                        for i in $TGTS
                        do
                                echo "Setting throttle for $i to $THROTTLE"
                                set_throttle $i $THROTTLE
                        done
                        ;;
        throttle_unlimit)       for i in $TGTS
                                do
                                        echo "Setting throttle for $i to unlimited"
                                        set_throttle $i 0
                                done
                        ;;
        check_lag)      [ -z "$DELAY" ] && DELAY=1440
                        for i in $TGTS
                        do
                                LAG=`get_lag $i`
                                if [ "$LAG" -gt "$DELAY" ]
                                then
                                        echo "Failure: The delay for $i is $LAG minutes"
                                        [ "$NOMAIL" != "1" ] && notify "$i is lagged $LAG minutes, above the threshold $DELAY"
                                else
                                        echo "Normal: The delay for $i is $LAG minutes"
                                fi
                        done
                        ;;
        check_size)     for i in $TGTS
                        do
                                check_size $i
                        done
                        ;;
        *)      echo "Option $MODE is not implemented yet"
                exit 0
                ;;
esac

Rapid-guide – Updating RedHat initrd

Warning: This is not the recommended method if you’re not sure you know what you’re doing.

Linux Initial Ram Disk (initrd) is a mechanism to perform disk-independent actions before attempting to mount the ‘/’ disk. These actions usually include loading disk drivers, setting up LVM or software RAID, etc.

The reason these actions are performed within initrd is that it is all based on Ram Disk loaded by the boot loader, and thus it breaks the loop of “how would I load storage drivers without storage access?”

It happens that due to some special even we need to modify it manually. To do so we need first to open it, and then to close it back in, replacing (backup the old one, will you?) the previous one.

This is rather simple. The tools used by us will be ‘gzip’ and ‘cpio’.

Lets begin.

First – create a temporary directory:

mkdir /tmp/initrd

Extracting

We have our temporary directory, so now, we need to extract the initrd into it. I assume the name of the file is /boot/initrd.img. You should replace my line with whatever the name of your initrd file:

cd /tmp/initrd

cat /boot/initrd.img | gzip -dc |  cpio -id

This will extract the contents of the initrd into /tmp/initrd.

Now you can edit its contents directly.

Package

To package initrd back in, we will need to perform the following actions.

Warning – before you do it, make sure you have an available copy of your original initrd file, in case you have created some damage.

cd /tmp/initrd

find . | cpio -o -H newc | gzip -9 > /boot/initrd.img

This line packages the initrd, and replaces the old one.

That’s all for today :-)

Quickly install Xen Community Linux VM

On RHEL-type of systems, with virt-manager (libvirt), you can make use of virt-manager to easy your life. I, for myself, prefer to work with ‘xm‘ tools, but for the initial install, virt-manager is the quickest and most simple available tool.

To install a new Linux VM, all you need to follow this flow

Create an LV for your VM (I use LVs because it’s easier to manage). If not LV, use a file. To create an LV, run the following command

lvcreate -L 10G -n new_vm1 VolGroup00

I assume that the name you wish to grant is ‘new_vm1′ (better maintain order there, else you will find yourself with hundreds of small LVs you have no idea what to do with), and that the name of the volume group is ‘VolGroup00′. Change to different values to match your environment.

Next, make sure you have your ISO contents unpacked (you can use loop device) and exported via NFS (my favorite method).

To mount a CD/DVD ISO, you should use ‘mount’ command with the ‘loop’ options. This would look like this:

mount -o loop my_iso.iso /mnt/temp

Again, I assume the name of the ISO is my_iso.iso and that the target directory /mnt/temp is available.

Now, export your newly created directory. If you have NFS already running, you can either add to /etc/exports the newly mounted directory /mnt/temp and restart the ‘nfs’ service, or you could use ‘exportfs’ to add it:

exportfs -o no_root_squash *:/mnt/temp

would probably do the trick. I added ‘no_root_squash’ to make sure no permission/access problems present themselves during the installation phase. Test your export to verify it’s working.

Now you could begin your installation. Run the following command:

virt-install -n new_vm1 -r 512 -p -f /dev/VolGroup00/new_vm1 –nographics nfs://nfs_server:/mnt/temp

The name follows the ‘-n’ flag. The amount of RAM to give is 512MB. The -p means it’s paravirtualized. The -f shows which device will be the block device, and the last argument is the source of the installation. Do not use local files, as the VM installer should be able to access the installation source.

Following that, you should have a very nice TUI installation experience.

Now – let’s make this machine ‘xm’ compatible.

Currently, the VM is virt-manager compatible. It means you need virt-manager to start/stop it correctly. Since I prefer ‘xm’ commands, I will show you how to convert this machine to VM.

First – export its XML file:

virsh dumpxml new_vm1 > /tmp/new_vm1.xml

virsh domxml-to-native xen-xm /tmp/new_vm1.xml > /etc/xen/new_vm1

This should do the trick.

Now you can turn the newly created VM off, and remove the VM from virt-manager using

virsh undefine new_vm1

and you’re back to ‘xm’-only interface.

iSCSI persistent configurations agains us all

Using iSCSI with dm-multipath is rather common setup. With iSCSI running over Ethernet cables, which are too easy to disconnect (either on purpose or by mistake), being cheap and common technology – multipath becomes a must. If you have multiple network links, this is only expected that you use multipath for your iSCSI configuration. It’s cheap, it’s easy and it works.

This, however, comes with a price tag. Not money – the components are cheap and common, but there are configuration acts which should take place.

It is easy to find info either in the open-iscsi documentations, the Internet, whatever, and I will go over them just below, but there are some catches which one should be aware of.

Per the common documentation, unlike regular iSCSI communication, when dealing with multipath, you would like iSCSI to fail rather quickly and let the SCSI layer handle the errors, thus letting dm-multipath handle the errors and do its work.

The configuration directives are rather simple. In the iscsid.conf file (on RHEL5 located in /etc/iscsi/iscsid.conf ), you need to change the value

node.session.timeo.replacement_timeout =

To a very short period of time. By default, it is set to 120 seconds, which are two minutes before anyone will notify the SCSI subsystem of any disk IO errors. A good value would be 5 seconds, which should allow for very short network disconnection (which could happen) and still – let the dm-multipath manage errors fast enough so that applications would not fail on disk timeouts.

Another two parameters which should be defined are the following (read the comments above them in the config file):

node.conn[0].timeo.noop_out_interval =
node.conn[0].timeo.noop_out_timeout =

These values control the interval in which the iSCSI layer tests communication to the targets.

Also, in multipath.conf you will need to set the following feature, so that IOPs will not be lost:

features		"1 queue_if_no_path"

These configuration directives can be found in these two pages from RedHat:

iscsi-modifying-link-loss-behavior-dmmultipath

iscsi-replacements_timeout

This is nice and pretty. However, if you have failed to do so at start, and defined your iSCSI targets based on the default configurations, you will notice that it still takes very long for iSCSI to notify the SCSI subsystem of the errors. You could check the values used by iSCSI through running the command:

iscsiadm -m node -T <target name>

Check out especially the line called node.session.timeo.replacement_timeout. Its value is the one which decides the actual behavior of iSCSI.

To change it, there are several methods. One of them is to clean up the iSCSI persistent configurations, located in /var/lib/iscsi and then re-login to the iSCSI targets. Only then you will have the new target configuration.

Check again with iscsiadm as described above, and check that this value matches.

XenServer “Internal error: Failure… no loader found”

It has been long since I had the time to write here. I have recently been involved more and more with XenServer virtualization, as you might see in the blogs, and following a solution to a rather common problem, I have decided to post it here.

The problem: When attempting to boot a Linux VM on XenServer (5.0 and 5.5), you get the following error message:

Error: Starting VM ‘Cacti’ – Internal error: Failure(“Error from xenguesthelper: caught exception: Failure(\\\”Subprocess failure: Failure(\\\\\\\”xc_dom_linux_build: [2] xc_dom_find_loader: no loader found\\\\\\\\n\\\\\\\”)\\\”)”)

This is very common with Linux VMs which were converted from physical (or other, non-PV virtualization) to XenServer.

This will probably either happen during the P2V process, or after a successful update to the Linux VM.

The cause is that the original kernel, non PV-aware one, has not been removed, and GRUB likes to load from it. XenServer will use the GRUB menu, but will not display it to us to select our desired kernel.

With no chance to intervene, XenServer will attempt to load a PV-enabled machine using non-PV kernel, and will fail.

Preventing the problem is quite simple – remove your non-PV kernel (non-xen) so that future updates will not attempt to update it as well and set it to be the default kernel. Very simple.

Solving the problem in less than two minutes is a bit more tricky. Let’s see how to solve it.

All operations are performed from within the control domain. This guide does not apply to StorageLink or NetApp/Equalogic devices, as they behave differently. This applies only to LVM-over-something, whatever it may be.

First, we will need to find the name of the VDI we are to work on. Use xe in the following manner, using the VM’s name:

xe vbd-list vm-name-label=Cacti

uuid ( RO)             : 128f29dc-4a14-1a2d-75d1-8674d3d2403b
vm-uuid ( RO): eae053de-4a20-28a5-f335-f5a18dd79993
vm-name-label ( RO): Cacti
vdi-uuid ( RO): 90524af4-5b20-4412-9bfe-f1fe27f220b1
empty ( RO): false
device ( RO): xvda

uuid ( RO)             : de177727-b28a-8b79-e73e-d08366d56277
vm-uuid ( RO): eae053de-4a20-28a5-f335-f5a18dd79993
vm-name-label ( RO): Cacti
vdi-uuid ( RO): <not in database>
empty ( RO): true
device ( RO): xvdd

It is very common that xvdd is used for CDROM, so we can safely ignore the second section. The first section is the more interesting one. There is a correlation between the name of the VDI and the name of the LVM on the disk. We can find this specific LV using the following command. Notice that the name of the VDI is used here as the argument for the ‘grep’ command:

lvs | grep 90524af4-5b20-4412-9bfe-f1fe27f220b1

LV-90524af4-5b20-4412-9bfe-f1fe27f220b1 VG_XenStorage-4aa20fc2-fd92-20c2-c549-bed2597c622b -wi-a-  10.00G

We now have our LV path! As you can see, its status is offline. We need to set it to online state. Using both the LV and the VG name, we can do it like that:

lvchange -ay /dev/VG_XenStorage-4aa20fc2-fd92-20c2-c549-bed2597c622b/LV-90524af4-5b20-4412-9bfe-f1fe27f220b1

Now we can access the volume. We can actually check that the problem is the one we look for, using pygrub:

pygrub /dev/VG_XenStorage-4aa20fc2-fd92-20c2-c549-bed2597c622b/LV-90524af4-5b20-4412-9bfe-f1fe27f220b1

We should now see the GRUB menu of the VM at question. If you don’t see any menu, either you have missed a step or used the wrong disk.

The menu should show you all the list of kernels. The default one is the one highlighted, and if it doesn’t include the word “xen” with it, most likely that we have found the problem.

We now need to change to a PV-capable kernel. We will need to access the “/boot” partition of the Linux VM, and change GRUB’s options there.

First we map the disk to a loop device, so we can access its partitions:

losetup /dev/loop1 /dev/VG_XenStorage-4aa20fc2-fd92-20c2-c549-bed2597c622b/LV-90524af4-5b20-4412-9bfe-f1fe27f220b1

Notice that you need to use the entire path to the LV, that the LV is online, and that loop1 is not in use. If it is, you will have a message saying something like “LOOP_SET_FD: Device or resource busy”

Now we need to access its partitions. We will map them using ‘kpartx’ to /dev/mapper/ devices. Notice we’re using the same loop device name:

kaprtx -a /dev/loop1

Now, new files present themselves in /dev/mapper:

ls -la /dev/mapper/
total 0
drwxr-xr-x  2 root root     220 Oct 24 12:39 .
drwxr-xr-x 14 root root   16560 Oct 24 12:31 ..
crw——-  1 root root  10, 62 Sep 29 10:15 control
brw-rw—-  1 root disk 252,  5 Oct 24 12:39 loop1p1
brw-rw—-  1 root disk 252,  6 Oct 24 12:39 loop1p2
brw-rw—-  1 root disk 252,  7 Oct 24 12:39 loop1p3

Usually, the first partition represents /boot, so we can now mount it and work on it:

mount /dev/mapper/loop1p1 /mnt

All we need to do is edit /mnt/grub/menu.lst to match our requirements, and then wrap everything back up:

umount /mnt

kpartx -u /dev/loop1

losetup -d /dev/loop1

We don’t have to change the LV to offline, because the XenServer will activate it if it’s not, however, we could do it, to be on the safe side:

lvchange -an /dev/VG_XenStorage-4aa20fc2-fd92-20c2-c549-bed2597c622b/LV-90524af4-5b20-4412-9bfe-f1fe27f220b1

Now we can activate the VM, and see it boot successfully.

This whole process takes several minutes the first time, and even less later.

I hope this helps.