Posts Tagged ‘Linux’

RedHat cluster on RHEL6 and KVM-based VMs

Wednesday, August 1st, 2012

The concept of running a virtual machine, KVM-based, in this case, under RHCS is acceptable and reasonable. The interesting part is that the <vm/> directive replaces the <service/> directive and acts as a high-level directive for VMs. This allows for things which cannot be performed with regular 'service', such as live migration. There are probably more, but this is not the current issue.

An example of how it can be done is shown in this excellent explanation. You can grab whatever parts of it are relevant to you, as it offers an excellent combination of DRBD, CLVM, GFS and of course, KVM-based VMs.

This whole guide assumes that the VMs reside on shared storage which is concurrently accessible by both (all?) hosts. This is not always the case, for example when the shared filesystem holding the virtual disk image file is ext3/4 and not GFS, so it can be mounted on only one host at a time. In this particular case, you would want to tie the VM to the mount. This cannot be done, however, when using <vm/> as a top-level directive (like <service/>), as it does not allow for child resources.

As the <vm/> directive can also be defined (with some limitations) as a child resource in a <service/> group, it inherits some properties from its parent (the <service/> directive), while some other properties are not mandatory and will be ignored. A sample configuration would be this:

<resources>
     <fs device="/dev/mapper/mpathap1" force_fsck="1" force_unmount="1" fstype="ext4" mountpoint="/images" name="vmfs" self_fence="0"/>
</resources>
<service autostart="1" domain="vm1_domain" max_restarts="2" name="vm1" recovery="restart">
     <fs ref="vmfs"/>
     <vm migrate="pause" name="vm1" restart_expire_time="600" use_virsh="1" xmlfile="/images/vm1.xml"/>
</service>

This would do the trick. The VM will not be able to live migrate, however; it will have to shut down and start up on each cluster takeover.

Attach USB disks to XenServer VM Guest

Saturday, May 5th, 2012

There is a very nice script for Windows dealing with attaching XenServer USB disk to a guest. It can be found here.

This script has several problems, as I see it. First, it is a Windows batch script, which is a very limited language. Second, it can handle only a single VDI disk in the SR group called “Removable Storage”.

As I am a *nix guy, and can hardly handle Windows batch scripts, I have rewritten this script to run from Linux CLI (focused on running from the XenServer Domain0), and allowed it to handle multiple USB disks. My assumption is that running this script will map/unmap *all* local USB disks to the VM.

After downloading this script, make sure it is executable, and run it with the argument “attach” or “detach”, per your needs.

And here it is:

#!/bin/bash
# This script will map USB devices to a specific VM
# Written by Ez-Aton, http://run.tournament.org.il , with the concepts
# taken from http://jamesscanlonitkb.wordpress.com/2012/03/11/xenserver-mount-usb-from-host/
# and http://support.citrix.com/article/CTX118198
 
# Variables
# Need to change them to match your own!
REMOVABLE_SR_UUID=d03f247d-6fc6-a396-e62b-a4e702aabcf0
VM_UUID=b69e9788-8cd2-0074-5bc1-63cf7870fa0d
DEVICE_NAMES="hdc hde" # Local disk mapping for the VM
XE=/opt/xensource/bin/xe
 
function attach() {
        # Here we attach the disks
        # Check if storage is attached to VBD
        VBDS=`$XE vdi-list sr-uuid=${REMOVABLE_SR_UUID} params=vbd-uuids --minimal | tr , ' '`
        if [ `echo $VBDS | wc -w` -ne 0 ]
        then
                echo "Disks are already attached. Check VBD $VBDS for details"
                exit 1
        fi
        # Get devices!
        VDIS=`$XE vdi-list sr-uuid=${REMOVABLE_SR_UUID} --minimal | tr , ' '`
        INDEX=0
        DEVICE_NAMES=( $DEVICE_NAMES )
        for i in $VDIS
        do
                VBD=`$XE vbd-create vm-uuid=${VM_UUID} device=${DEVICE_NAMES[$INDEX]} vdi-uuid=${i}`
                if [ $? -ne 0 ]
                then
                        echo "Failed to connect $i to ${DEVICE_NAMES[$INDEX]}"
                        exit 2
                fi
                $XE vbd-plug uuid=$VBD
                if [ $? -ne 0 ]
                then
                        echo "Failed to plug $VBD"
                        exit 3
                fi
                let INDEX++
        done
}
 
function detach() {
        # Here we detach the disks
        VBDS=`$XE vdi-list sr-uuid=${REMOVABLE_SR_UUID} params=vbd-uuids --minimal | tr , ' '`
        for i in $VBDS
        do
                $XE vbd-unplug uuid=${i}
                $XE vbd-destroy uuid=${i}
        done
        echo "Storage Detached from VM"
}
case "$1" in
        attach) attach
                ;;
        detach) detach
                ;;
        *)      echo "Usage: $0 [attach|detach]"
                exit 1
esac

 

Cheers!

Bonding + VLAN tagging + Bridge – updated

Wednesday, April 25th, 2012

In the past I hacked around a problem with the startup order (and with several bugs) of a network stack composed of network bonding (teaming) + VLAN tagging, topped with network bridging (aka Xen bridges). This kind of setup is very useful for introducing VLAN networks to guest VMs. It works well on Xen (community, Server); on RHEL/Centos 5 versions, however, the startup scripts (ifup and ifup-eth) are buggy, and do not handle this operation correctly. It means that, depending on the update release you use, results might vary from “everything works” to “I get bridges without VLANs” to “I get VLANs without bridges”.

I have hacked a solution in the past, modifying /etc/sysconfig/network-scripts/ifup-eth and fixing some bugs in it. However, maintaining the fix on every release of the ‘initscripts’ package has proven, well, not to happen…

So, instead, I present you with a smarter solution, better adapted to the updates supplied from time to time by RedHat or Centos, using predefined ‘hooks’ in the ifup scripts.

Create the file /sbin/ifup-pre-local with the following contents:

 

#!/bin/bash
# $1 is the config file
# $2 is not interesting
# We will start the vlan bonding before any bridge
 
DIR=/etc/sysconfig/network-scripts
 
[ -z "$1" ] && exit 0
. $1
 
if [ "${DEVICE%%[0-9]*}" == "xenbr" ]
then
    for device in $(LANG=C egrep -l "^[[:space:]]*BRIDGE=\"?${DEVICE}\"?" /etc/sysconfig/network-scripts/ifcfg-*) ; do
        /sbin/ifup $device
    done
fi

You can download this script. Don’t forget to make it executable. Whenever ifup is called for a xenbr* device, the script calls ifup for every parent device of that bridge. If the parent device is already up, no harm is done. If the parent device is not up, it will be brought up, and then the xenbr device can start normally.
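To see which interfaces the hook would bring up before touching a live host, the matching logic can be tried on its own. This is only a minimal sketch: the function name bridge_members and its optional directory argument are made up for illustration and are not part of the hook. Unlike the hook, the pattern here is anchored at the end of the line, so that xenbr1 does not also match xenbr10.

```shell
#!/bin/bash
# List the ifcfg files whose BRIDGE= line points at the given bridge.
# bridge_members is a hypothetical helper; the pattern is based on the hook,
# with an added end-of-line anchor.
bridge_members() {
    local bridge="$1"
    local dir="${2:-/etc/sysconfig/network-scripts}"
    LANG=C grep -El "^[[:space:]]*BRIDGE=\"?${bridge}\"?[[:space:]]*$" "$dir"/ifcfg-* 2>/dev/null
}
```

Running bridge_members xenbr0 on the host should print the ifcfg files that the hook would pass to ifup for that bridge.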

Things to remember…

Monday, October 24th, 2011

As my work takes me to various places (where technology is concerned), I collect lots of browser tabs of things I want to keep for later reference.
I have to admit, sadly, that I lack the time to sort them out, to make a real good and nice post about them. I do not want to lose them, however, so I am posting now those which I find or found in the past as more useful to me. I might expand either of them one day into a full post, or elaborate further on them. Either or none. For now – let’s clean up some tab space:
Reading IPMI sensors. Into Cacti, and into Nagios, with some minor modifications by myself (to be disclosed later, I believe):
Cacti
Nagios
This is some information about the plugin check_ipmi_sensor
And its wiki (in German. Use Google for translation)
XenServer checks:
check_xen_pool
Checking XenServer using NRPE
But I did not care about Dom0 performance parameters, as they meant very little regarding the hypervisor’s behavior. So I have combined into it the following XenServer License Check. Unfortunately, I could run it only on the XenServer domain0, due to python version limitations on my Cacti /Nagios server.
You can obtain XenServer SDK
This plugin looks interesting for various XenServer checks, but I have never tried it myself.
Backing up (exporting) XenServer VMs as a scheduled task. I have had it modified extensively to match my requirements, but I am allowed to, it has some of its sources based on my blog :-)
Installing Dell OpenManage on XenServer 5.6.1, and the nice thing is that it works fine on XenServer 6 as well.
Oracle ASM recovery tips . One day I will take it further, and investigate possible human errors and methods of fixing them. Experience, they say, has a value :-)
A guide dealing with changing from raw to block devices in Oracle ASM . This is only a small part of it, but it’s the thing that interests me.
Understanding Steal Time in Linux Xen-based VMs.
Because I always forget, and I’m too lazy to search again and again (and reach the same page again and again): Upgrading PHP to 5.2 on Centos 5
And last – a very nice remote-control software for my Android phone. Don’t leave home without it. Seriously.

Reduced to only 23 tabs is excellent. This was a very nice job, and these links will be useful. To me, for sure. I hope that to you as well.

Hot resize Multipath Disk – Linux

Friday, August 19th, 2011

This post is for the users of the great dm-multipath system in Linux, who encounter a major availability problem when attempting a resize of mpath devices (and their partitions), and find themselves scheduling a reboot.

This document is based on a document created by IBM called "Hot Resize Multipath Storage Volume on Linux with SVC", and its contents are good for any other storage. However, it does not cover the procedure required in case of a partition on the mpath device (for example, the mpath1p1 device).

I will demonstrate with only two paths, but, with understanding this process, it can be well used for any amount of paths for a device.

I do not explain how to reduce a LUN size, but the apt reader will be able to derive a method from this document. For myself, I try to avoid shrinking LUNs as much as I can. I prefer backup, LUN recreation, and then restore. In many cases it's just faster.

So - back to our topic - first - increase the size of your LUN on the storage.

Now, you need to collect the paths used for your mpath device. Check this example:

mpath1 (360a980005033644b424a6276516c4251) dm-2 NETAPP,LUN
[size=200G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=4][active]
\_ 2:0:0:0 sdc 8:32  [active][ready]
\_ round-robin 0 [prio=1][enabled]
\_ 1:0:0:0 sdb 8:16  [active][ready]

The path devices (sdc and sdb in this example) are the ones we will need to change. Let's get their current size:

blockdev --getsz /dev/sdb
419430400

Keep this number somewhere safe. We can (and should!) assume that sdc has the same values, otherwise, this is not the same exact path.

Collect this info for the partition as well. It will be smaller by a tiny bit:

blockdev --getsz /dev/sdb1
419424892

Keep this number as well.

Now we need to reread the current (storage-based) size parameters of the devices. We will run

blockdev --rereadpt /dev/sdb
blockdev --rereadpt /dev/sdc

Now, our size will be slightly different:

blockdev --getsz /dev/sdb
734003200

Of course, the partition size will not change. We will deal with it later. Keep the updated values as well. Of course, the multipath still holds the disks with their original size values, so running 'multipath -ll' will not reveal any size change. Not yet.
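As a sanity check on these numbers: blockdev --getsz reports 512-byte sectors, so the values can be converted to GiB and compared with the size shown by 'multipath -ll'. A small sketch, where sectors_to_gib is a helper name made up for this example:

```shell
#!/bin/bash
# blockdev --getsz reports the size in 512-byte sectors.
# sectors_to_gib is a hypothetical helper that converts that count to whole GiB.
sectors_to_gib() {
    local sectors="$1"
    echo $(( sectors * 512 / 1024 / 1024 / 1024 ))
}

sectors_to_gib 419430400   # size of sdb before the rescan
sectors_to_gib 734003200   # size of sdb after the rescan
```

This prints 200 and 350, which are the 200G seen in the multipath output before the change and the 350G expected after it.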

We now need to create an editable dmsetup map. Use the following command to create two files, org and cur, containing this map:

dmsetup table mpath1 | tee org cur
0 419424892 multipath 1 queue_if_no_path 0 2 1 round-robin 0 1 1 8:32 128 round-robin 0 1 1 8:16 128

Important part - explaining some of these values. The map shows the device's size in blocks - 419424892. It shows some parameters, it shows path groups info (0 2 1), and both sub devices - sdc being 8:32 and sdb being 8:16. Try it with 'ls -la /dev/sdb' to see the minor and major. At this point, if you are not familiar with majors and minors, I would recommend you do some reading about it. Not mandatory, but will make your life here safer.
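If you want to double-check which path devices a map refers to, the major:minor pairs can be pulled out of the table text. This is a sketch only; table_paths is a made-up helper name, and it simply prints every number:number pair it finds on the line.

```shell
#!/bin/bash
# Extract the major:minor pairs of the path devices from a multipath table line.
# table_paths is a hypothetical helper; it reads the table text from stdin.
table_paths() {
    grep -oE '[0-9]+:[0-9]+' | paste -sd' ' -
}

# On a live system this would be: dmsetup table mpath1 | table_paths
echo "0 419424892 multipath 1 queue_if_no_path 0 2 1 round-robin 0 1 1 8:32 128 round-robin 0 1 1 8:16 128" | table_paths
```

For the example map this prints "8:32 8:16", which can be compared with the major and minor numbers shown by 'ls -la /dev/sdc /dev/sdb'.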

We need to delete one of the paths, so we can refresh it. I have decided to remove sdb first:

multipathd -k"del path sdb"

Now, running the multipath command, we will get:

mpath1 (360a980005033644b424a6276516c4251) dm-2 NETAPP,LUN
[size=200G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=4][active]
\_ 2:0:0:0 sdc 8:32  [active][ready]

Only one path. Good. We will need to edit the 'cur' file created earlier to reflect the new settings we are to introduce:

0 419424892 multipath 1 queue_if_no_path 0 1 1 round-robin 0 1 1 8:32 128

The only group left is the one containing 'sdc' (8:32), and since one group is gone, the path group count (the first number after 'queue_if_no_path 0') was changed from 2 to 1, as there is only a single path group now.
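Editing the map by hand is error-prone, so the single-path line can also be generated. The sketch below assumes the same layout as the example map (one feature, queue_if_no_path, round-robin selector, trailing 128); single_path_table is a made-up helper name, and the fixed fields should be checked against your own 'org' file before use.

```shell
#!/bin/bash
# Build a one-group, one-path multipath table line.
# single_path_table is a hypothetical helper; the feature string and the
# trailing 128 are copied from the example map and may differ on your system.
single_path_table() {
    local size="$1" dev="$2"
    echo "0 ${size} multipath 1 queue_if_no_path 0 1 1 round-robin 0 1 1 ${dev} 128"
}

# Same size, only sdc (8:32) left in the map
single_path_table 419424892 8:32
```

Redirecting the output to the 'cur' file gives the same content as the manual edit.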

We need to reload multipath with these settings:

dmsetup suspend mpath1; dmsetup reload mpath1 cur; dmsetup resume mpath1

The correct response for this line is 'ok'. We pause mpath1, reload and then resume it. It is best to be in a single line, as this process freezes IO for a short period of time on the device, and we prefer it to be as short as possible.

Now, as /dev/sdb is not a part of the multipath managed devices, we can modify it. I usually use 'fdisk' - deleting the old partition, and recreating it in the new size, but you must make sure, if your device requires LUN alignment, that you recreated the partition from the same start point. I will dedicate a post some time to LUN alignment, but not at this particular time. Just a hint - if you're not sure, run fdisk in expert mode and get a printout of your partition table (fdisk /dev/sdb and then x and then p). If your partition starts at 128 or 64, it is aligned. If not (usually for large LUNs - at 63), you are not, and you should either be worried about it, but not now, or should not care at all.

Back to our task.

We need to grab the size of the newly created partition, for later use. Write it down somewhere.

blockdev --getsz /dev/sdb1
733993657

Following the partition recreation, we need to introduce the device to the multipath daemon. We do this by:

multipathd -k"add path sdb"

followed by immediately removing the remaining device:

multipathd -k"del path sdc"

We need to have our 'cur' file updated, so we can release the device to our uses. This time, we update both the size section with the new size, and the new, remaining path. Now, the file looks like this:

0 734003200 multipath 1 queue_if_no_path 0 1 1 round-robin 0 1 1 8:16 128

As mentioned before, the large number (734003200) is the new size of the block device. The number of path (failure) groups is one (1), and the device is 'sdb', which is 8:16. Save this modified file, and run:

dmsetup suspend mpath1; dmsetup reload mpath1 cur; dmsetup resume mpath1

Running the command 'multipath -ll' you will get the real size of the device.

mpath1 (360a980005033644b424a6276516c4251) dm-2 NETAPP,LUN
[size=350G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=1][active]
\_ 1:0:0:0 sdb 8:16  [active][ready]

We will need to reread the partition layout of /dev/sdc. The quickest way is by running:

partprobe

This should do it. We can now add it back in:

multipathd -k"add path sdc"

and then run

multipath

(which should result in all the available paths, and the correct size).

Our last task is to update the partition size. The partition, normally, is called mpath1p1, so we need to read its parameters. Lets keep it in a file:

dmsetup table mpath1p1 | tee partorg partcur

We should now edit the newly created file 'partcur' with the new size. You should not change anything else. Originally, it looked like this:

0 419424892 linear 253:2 128

and it was modified to look like this:

0 733993657 linear 253:2 128

Notice that the size is the one obtained from /dev/sdb1 (!!!) and not from /dev/sdb.

We need to reload the device. Again, attempt to do it in a single line, or else you will freeze IO for a long time (which might cause things to crash):
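The size field can also be replaced without opening an editor. A minimal sketch, assuming the table is a single line whose second field is the length; set_table_size is a made-up helper name:

```shell
#!/bin/bash
# Replace the length field (second column) of a dmsetup table line.
# set_table_size is a hypothetical helper; it reads the table from stdin.
set_table_size() {
    awk -v size="$1" '{ $2 = size; print }'
}

# On a live system: set_table_size 733993657 < partorg > partcur
echo "0 419424892 linear 253:2 128" | set_table_size 733993657
```

Everything except the second field is passed through unchanged, which matches the rule of not changing anything else in the file.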

dmsetup suspend mpath1p1; dmsetup reload mpath1p1 partcur; dmsetup resume mpath1p1

Do not confuse mpath1 with mpath1p1.

Our device is updated, our paths are online. We are happy. All left to do is to online resize the file system. With ext3, this is done like this:

resize2fs /dev/mapper/mpath1p1

The mount will increase in size online, and all left for us is to wait for it to complete, and then go home.

I hope this helps. It helped me.

Stateless Systems (diskless, boot from net) on RHEL5/Centos5

Thursday, July 21st, 2011

I have encountered several methods of building stateless RedHat Linux systems. Some of them are over-sophisticated, and do not work. Some of them are too old, and you either have to fix half the scripts, or give up (which I did, BTW). After a long period of attempts, I have found my simple-yet-working-well-enough solution. It goes like this (drums please!)

yum install system-config-netboot

Doesn’t it sound funny? So simple, and yet – so working. So I have decided to supply you with the ultimate, simple, working guide to this method – how to make it work – fast and easy.

You will need the following:

  • RHEL5/Centos5 with ability to run yum client, and enough disk space to contain an entire system image, and more. Lots more. About it later. This machine will be called “server” in this guide.
  • A single “Golden Image” system – the base of your system-to-replicate. A system you will, with some (minor) modifications, use as the source of all your future stateless systems. Usually – resides on another machine (physical or virtual, doesn’t matter much. More about it later). This machine will be called “GI” in this guide.
  • A test system. Better be diskless, for assurance. Better also be able to boot from net, otherwise, we miss something here (although hybrid boot methods are possible, I will not discuss them here, for the time being). Can be virtual, as long as it is full hardware virtualization, as you cannot net-boot, except for the latest Xen Community versions, in PV mode. It will be called “net-client” or “client” in this guide.

Our flow:

  • Install required server-side tools
  • Setup server configuration parameters
  • Image the GI into the server
  • Run this and that
  • Boot our net-client happily

Let’s start!

On the server, run:

yum install -y system-config-netboot xorg-x11-xauth dhcp

and then, run:

chkconfig dhcpd on
chkconfig tftp on
chkconfig xinetd on
chkconfig nfs on

We will now perform configurations for the above mentioned services.

Your /etc/dhcpd.conf should look like this:

ddns-update-style interim;
ignore client-updates;

subnet $NETWORK netmask $NETMASK {
option routers              $GATEWAY;
option subnet-mask          $NETMASK;
option domain-name-servers  $DNS;
ignore unknown-clients;
filename "linux-install/pxelinux.0";
next-server $SERVERIP;
}

You should replace all these variables with the ones matching your own layout. In my case

NETWORK=10.0.0.0
NETMASK=255.0.0.0
GATEWAY=10.200.1.1
DNS=10.100.1.4
SERVERIP=10.100.1.8

We will include hosts later. Notice that this DHCP server will refuse to serve unknown hosts. This will prevent the “Oops” factor…
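Instead of replacing the variables by hand, the file body can be generated. This is a sketch under the assumption that the layout above is all you need; render_dhcpd_conf is a made-up helper name, and the values passed to it are the ones from the example layout.

```shell
#!/bin/bash
# Print a dhcpd.conf body with the placeholders filled in.
# render_dhcpd_conf is a hypothetical helper; arguments are
# network, netmask, gateway, DNS server and boot server IP, in that order.
render_dhcpd_conf() {
    local network="$1" netmask="$2" gateway="$3" dns="$4" serverip="$5"
    cat <<EOF
ddns-update-style interim;
ignore client-updates;

subnet ${network} netmask ${netmask} {
    option routers              ${gateway};
    option subnet-mask          ${netmask};
    option domain-name-servers  ${dns};
    ignore unknown-clients;
    filename "linux-install/pxelinux.0";
    next-server ${serverip};
}
EOF
}

# Review the output first, then redirect it to /etc/dhcpd.conf
render_dhcpd_conf 10.0.0.0 255.0.0.0 10.200.1.1 10.100.1.4 10.100.1.8
```

Once the output is in place, 'dhcpd -t -cf /etc/dhcpd.conf' should report syntax problems without starting the daemon.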

We can start our DHCPD now. It won’t do anything, and it’s a good opportunity to test our configuration:

service dhcpd start

We need to create the directory structure. I have selected /export/NFSroots as my base structure. I will follow this structure here:

mkdir -p /export/NFSroots

We will proceed with the NFS server settings later.

Imaging the GI to the server is quite simple. We will begin by creating a directory in /export/NFSroots with the name of the imaged system type. In my case:

mkdir /export/NFSroots/centos5.6

However (and this is the tricky part!), we will create a directory under this location, called ‘root’. We will image the entire contents of our GI to this root directory. This is how this mechanism works, and I have no desire of bending it. So

mkdir /export/NFSroots/centos5.6/root

Now we copy the contents of the GI to this directory. There are several methods, but in this particular case, I have chosen to use ‘rsync’ over ‘ssh’. There are other methods just as good. Just one note – on the GI, before this copy, we will need to have busybox-anaconda package. So make sure you have it:

yum install -y busybox-anaconda

Now we can create an image from the GI:

rsync -e ssh -av --exclude '/proc/*' --exclude '/sys/*' --exclude '/dev/shm/*' $GI_IP:/ /export/NFSroots/centos5.6/root

You must include as many exclude entries as required. You should never cause the system to attempt to grab the company’s huge NFS share, just because you have forgotten some ‘exclude’ entry. This will cause major loss of time, and possibly – some outage to some network resource. Be careful!
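One way to avoid forgetting an NFS share is to build the exclude list from the mount table of the GI. The sketch below is an assumption of mine, not part of the original procedure: nfs_excludes is a made-up helper that reads a file in /proc/mounts format and prints one rsync option per NFS mount point. Mount points containing spaces are not handled.

```shell
#!/bin/bash
# Print an rsync --exclude option for every NFS mount listed in a mounts file.
# nfs_excludes is a hypothetical helper; the file is in /proc/mounts format
# (device, mount point, filesystem type, ...).
nfs_excludes() {
    local mounts="${1:-/proc/mounts}"
    awk '$3 ~ /^nfs/ { printf "--exclude=%s/*\n", $2 }' "$mounts"
}
```

Run it on the GI (for example via ssh) and add the printed options to the rsync command line, next to the /proc, /sys and /dev/shm entries.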

This is the best time to grab a cup of coffee/tea/cocoa/coke/whatever, and chit-chat with your friends. It took me about 10 minutes for a 1.8GB image over a 1Gb/s network link, so you can do some math and guesswork, and probably go grab lunch.

When this operation is complete, your next mission is to configure your NFS server. Order does matter. You should create a read-only entry for the image root directory, and a full-access entry for the above structure. Well – I can probably narrow it down, but I did not really bother. During pre-production, I will test how to narrow permissions down, without getting myself into management hell. So – our entries are looking like this in /etc/exports :

/export/NFSroots/centos5.6/root *(ro,no_root_squash)
/export/NFSroots *(rw,no_root_squash,async)

I would love to hear comments by people who did narrow security down to the required minimum, and how they managed to handle tens and hundreds of clients, or several dozens of RO images without much hassle. Really. Comment, people, if you have something to add. I would love to modify this document accordingly.

Now we need to start our NFS server:

service nfs start

We are ready for the big entry. You will need GUI here. We have installed xorg-x11-xauth which will allow us, if invoked, remote X. I use X over SSH, so it’s a breeze. Run the command:

system-config-netboot

I will not bother with screenshots, as this is not my style, but the following entries will be required:

  • First Time Druid: Diskless
  • Name: Easy identified. In my case: Centos5.6
  • Server IP address: $SERVERIP
  • Directory: /export/NFSroots/centos5.6
  • Kernel: Select the desired kernel
  • Apply

The system might take several minutes to complete. When done, you will be presented with a blank table. We need to populate it now.

For that to be easy, we better have all our clients in the /etc/hosts file. While DNS does help, a lot, this specific tool, for it to work like a charm, requires /etc/hosts to have the entry, or else it just won’t be very readable. Below is a one-liner script to create a set of entries in /etc/hosts. I have started with .100 and completed with .120. This can be changed easily to match your requirements:

for i in {100..120} ; do echo "10.100.1.$i   pxe${i}" >> /etc/hosts ; done

This way we can refer to our clients as pxe100, pxe101, and so on.

Let’s create a new client!

Select “new” from the GUI menu, and fill in the name ‘pxe100′. If you have multiple images, this is a good time to select which one you will be using. If you have followed this guide to the letter, you have only a single option. Press on “OK” and you’re done.

We need to setup MAC and IP addresses into the DHCP server. I have written a script which assists with this greatly. For this script to work, you will need to run the following commands:

echo "" > /etc/dhcpd.conf.bootnet
echo 'include "/etc/dhcpd.conf.bootnet";' >> /etc/dhcpd.conf

#!/bin/bash
# Creates and removes netboot nodes
 
# Check arguments
 
# VARIABLES
DHCPD_CONF=/etc/dhcpd.conf.bootnet
SERVICE=dhcpd
DATE=`date +"%H%M%S%d%m%y"`
BACKUPS=${DHCPD_CONF}.backups
 
function show_help() {
	echo "Usage: $0"
	echo "add NAME MAC IP"
	echo "where NAME is the name of the node"
	echo "MAC is the hardware address of the netboot interface (usually eth0)"
	echo "IP is the designated IP address of the node"
	echo
	echo
	echo "del ARGUMENT"
	echo "Where ARGUMENT can be either name, MAC address or IP address"
	echo "The script will attempt to remove whatever match it finds, so be careful"
	echo
	echo
	echo "check ARGUMENT"
	echo "Where ARGUMENT can be either name, MAC address or IP address"
        echo "The script will list any matching entries. This is a recommended"
	echo "action prior to removing a node"
	echo "The same logic used for finding entries to remove is used here"
	echo "so it is rather safe to run 'del' after a successful 'check' operation"
	echo
	exit 0
}
 
function check_inside() {
	# Will be used to either show or find the amount of matching entries
	# Arguments: 	$1 - silent - 0 ; loud - 1
	# 		$2 - the string to search
	# The function will search the string as a stand-alone word only
	# so it will not find abc in abcde, but only in "abc"
	# Returns the number of matches in silent mode, 0 in loud mode
	local RET=0
 
	case "$1" in
		0) 	# silent
			RET=`grep -iwc -- "$2" $DHCPD_CONF`
			;;
		1) 	# loud
			grep -iw -- "$2" $DHCPD_CONF
			;;
 
		*)	echo "Something is very wrong with $0"
			echo "Exiting now"
			exit 2
			;;
	esac
 
	return $RET
}
 
function add_to() {
	# This function will add to the conf file
	# Arguments: $1 host ; $2 MAC ; $3 IP address
	echo "host $1 { hardware ethernet $2; fixed-address $3; }" >> $DHCPD_CONF
	[ "$?" -ne "0" ] && echo "Something wrong happened when attempted to add entry" && exit 3
}
 
function del_from() {
	# This function will delete a matching entry from the conf file
	# Arguments: $1 - expression
	[ -z "$1" ] && echo "Missing argument to del function" && exit 4
	if cat $DHCPD_CONF | grep -vw $1 > $DHCPD_CONF.$$
	then
		\mv $DHCPD_CONF.$$ $DHCPD_CONF
	else
		echo "Failed to create temp DHCPD file. Aborting"
		exit 5
	fi
}
 
function backup_conf() {
	cp $DHCPD_CONF $BACKUPS/`basename $DHCPD_CONF.$DATE`
}
 
function restore_conf() {
	\cp $BACKUPS/`basename $DHCPD_CONF.$DATE` $DHCPD_CONF
}
 
function check_wrapper() {
	# Perform check. Loud one
	echo "Searching for $REGEXP in configuration"
	check_inside 1 $REGEXP
	exit 0
}
 
function del_wrapper() {
	# Performs delete
	[ -z "$REGEXP" ] && echo "Missing argument for delete action" && exit 6
	backup_conf
	echo "Removing all invocations which include $REGEXP from config"
	del_from $REGEXP
 
	if /sbin/service $SERVICE restart
        then
                echo "Done"
        else
                restore_conf
                /sbin/service $SERVICE restart
                echo "Failed to update. Syntax error"
        fi
}
 
function add_wrapper() {
	# Adds to config file
	[ -z "$NAME" -o -z "$MAC" -o -z "$IP" ] && echo "Missing argument for add action" && exit 7
 
	for i in $NAME $MAC $IP
	do
		if check_inside 0 $i
		then
			echo -n .
		else
			echo "Value $i already exists"
			echo "Will not update duplicate value"
			exit 7
		fi
	done
	echo
 
	backup_conf
	add_to $NAME $MAC $IP
 
	if /sbin/service $SERVICE restart
	then
		echo "Done"
	else
		restore_conf
		/sbin/service $SERVICE restart
		echo "Failed to update. Syntax error"
	fi
}
 
function prep() {
	# Make sure everything is as expected
	[ ! -d ${BACKUPS} ] && mkdir ${BACKUPS}
}
 
prep
 
case "$1" in
	add)	NAME="$2"
		MAC="$3"
		IP="$4"
		add_wrapper
		;;
	check)	REGEXP="$2"
		check_wrapper
		;;
	del)	REGEXP="$2"
		del_wrapper
		;;
	help)	show_help
		;;
	*)	echo "Usage: $0 [add|del|check|help]"
		exit 1
		;;
esac

This script will update your /etc/dhcpd.conf.bootnet with new nodes. Place it somewhere in your path (for example: /usr/local/bin/ )

We will need the MAC address of a node. Run

net-node.sh add pxe100 00:16:3E:00:82:7A 10.100.1.100

This will add the node pxe100 with that specific MAC address to the DHCPD configuration.
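The script does not validate its input, so a mistyped MAC address only shows up when dhcpd refuses to restart. A small check can be run beforehand. This is a sketch; valid_mac is a made-up helper name and is not part of net-node.sh.

```shell
#!/bin/bash
# Return success only for a MAC address written as six colon-separated hex pairs.
# valid_mac is a hypothetical helper, not part of net-node.sh.
valid_mac() {
    local re='^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
    [[ "$1" =~ $re ]]
}

if valid_mac "00:16:3E:00:82:7A"; then
    echo "MAC looks valid"
fi
```

A typical use would be: valid_mac "$MAC" && net-node.sh add pxe100 "$MAC" 10.100.1.100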

Now, all you need to do is boot your client, and see how it works. Remember to disable other DHCP servers which might serve this node, or blacklist its MAC address from them.

Our next chapters will deal with my (future) attempt to make RHEL6 work under this setup (as a GI, and client, not as a server), and all kind of mass-deployment methods. If and when :-)

Oracle VM post-install check list

Saturday, May 22nd, 2010

Following my experience with OracleVM, I am adding my post-install steps for your pleasure. These steps are not mandatory, by design, but will help you get up and running faster and easier. These steps are relevant to Oracle VM 2.2, but might work for older (and newer) versions as well.

Define bonding

You should read more about it in my past post.

Define storage multipathing

You can read about it here.

Define NTP

Define NTP servers for your Oracle VM host. Make sure the daemon ‘ntpd’ is running, and following an initial time update, via

ntpdate -u <server>

to set the clock right initially, perform a sync to the hardware clock, for good measure

hwclock --systohc

Make sure NTPD starts on boot:

chkconfig ntpd on

Install Linux VM

If the system is going to be stand-alone, you might like to run your VM Manager on it (we will deal with issues of it later). To do so, you will need to install your own Linux machine, since Oracle supplied image fails (or at least – failed for me!) for no apparent reason (kernel panic, to be exact, on a fully MD5 checked image). You could perform this action from the command line by running the command

virt-install -n linux_machine -r 1024 -p --nographics -l nfs://iso_server:/mount

This directive installs a VM called “linux_machine” from nfs iso_server:/mount, with 1GB RAM. You will be asked about where to place the VM disk, and you should place it in /OVS/running_pool/linux_machine , accordingly.

It assumes you have DHCP available for the install procedure, etc.

Install Oracle VM Manager on the virtual Linux machine

This should be performed if you choose to manage your VMs from a VM. This is a bit tricky, as you are advised not to do so if you are designing an HA-enabled server pool.

Define autostart to all your VMs

Or, at least, those you want to auto start. Create a link from /OVS/running_pool/<VM_NAME>/vm.cfg to /etc/xen/auto/

The order in which ‘ls’ command will see them in /etc/xen/auto/ is the order in which they will be called.
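Since every source file is called vm.cfg, each link needs its own name, and a numeric prefix gives control over the 'ls' order. A minimal sketch of that idea follows; autostart_vm, POOL_DIR and AUTO_DIR are names made up for this example, and the prefix scheme is an assumption, not something Oracle VM requires.

```shell
#!/bin/bash
# Link a VM's vm.cfg into the xendomains auto-start directory.
# autostart_vm is a hypothetical helper. The link is named after the VM, with an
# optional prefix, because every source file is called vm.cfg and 'ls' order
# decides the start order. POOL_DIR and AUTO_DIR can override the default paths.
autostart_vm() {
    local vm="$1" prefix="${2:-}"
    local pool="${POOL_DIR:-/OVS/running_pool}"
    local auto="${AUTO_DIR:-/etc/xen/auto}"
    [ -f "${pool}/${vm}/vm.cfg" ] || { echo "No vm.cfg for ${vm}" >&2; return 1; }
    ln -sfn "${pool}/${vm}/vm.cfg" "${auto}/${prefix}${vm}"
}
```

For example, 'autostart_vm linux_machine 10-' creates /etc/xen/auto/10-linux_machine, which sorts before a link created with the prefix 20-.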

Disable or relocate auto-suspending

Auto-suspend is cool, but your default Oracle VM installation is short on space under the /var/lib/xen/save/ directory, where persistent memory dumps are kept. On a 16GB RAM system, these dumps can grow far beyond what that space can contain.

Either increase the size (mount something else there, I assume), or edit /etc/sysconfig/xendomains and comment the line  with the directive XENDOMAINS_SAVE= . You could also change the desired path to somewhere you have enough space on.

Commenting out this directive will force a regular shutdown of your VMs following a power off/reboot command to the Oracle VM host.
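The edit can be scripted, which helps when there are several hosts to fix. This is a sketch only; disable_xen_save is a made-up helper name, and it keeps a .bak copy of the file next to the original.

```shell
#!/bin/bash
# Comment out the XENDOMAINS_SAVE= line in a xendomains config file.
# disable_xen_save is a hypothetical helper; a .bak copy of the file is kept.
disable_xen_save() {
    local conf="${1:-/etc/sysconfig/xendomains}"
    sed -i.bak 's/^\([[:space:]]*\)\(XENDOMAINS_SAVE=\)/\1#\2/' "$conf"
}
```

Called without arguments it works on /etc/sysconfig/xendomains; pass another path to try it on a copy first.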

Make sure auto-start VMs actually start

This is an annoying bug. For auto-start of VMs, you need /OVS up and available. Since it’s OCFS2 file system, it takes a short while (being performed by ovs-agent).

Since ovs-agent takes a while, we need to implement a startup script that runs after it and before xendomains. Since both are marked “S99” (check /etc/rc3.d/ for details), we would add a script called “sleep”.

The script should be placed in /etc/init.d/

#!/bin/bash
#
# sleep     Workaround Oracle VM delay issues
#
# chkconfig: 2345 99 99
# description: Adds a predefined delay to the initialization process
#
 
DELAY=60
 
case "$1" in
start) sleep $DELAY
;;
esac
exit 0

Place the script as a file called “sleep” (omit the suffix I added in this post), set it to be executable, and then run

chkconfig --add sleep

This will solve VM startup problems.
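A fixed delay is simple, but it is either too long or too short. As an alternative sketch (this is not what the script above does), the ‘start’ branch could poll until /OVS is actually mounted and give up after a timeout. The function name wait_for_mount and the MOUNTS_FILE variable are assumptions for illustration.

```shell
# Alternative sketch: poll the mount table instead of sleeping a fixed time.
# MOUNTS_FILE can be overridden for testing.
MOUNTS_FILE="${MOUNTS_FILE:-/proc/mounts}"

wait_for_mount () {
    # $1 - mount point to wait for, $2 - timeout in seconds (default 60)
    local mnt="$1" timeout="${2:-60}" waited=0
    # The second field of each mount table line is the mount point
    until awk -v m="$mnt" '$2 == m { found = 1 } END { exit !found }' "$MOUNTS_FILE"
    do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# In the init script, the 'start' branch could then become:
#   start) wait_for_mount /OVS 120 ;;
```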

Fix /etc/hosts file

If you plan a multi-server pool, the host name must not resolve to the 127.0.0.1 address. By default, Oracle VM maps the host name to 127.0.0.1, and the attempt to create a multi-server pool will fail because of that.
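A quick way to check for this condition is sketched below. The function name hostname_on_loopback is my own; the desired state is a hosts file where 127.0.0.1 carries only localhost names and the host name sits on the server's real IP address.

```shell
# Hypothetical check: succeed (exit 0) if the given host name appears on the
# 127.0.0.1 line of the hosts file, which is the situation to avoid.
hostname_on_loopback () {
    # $1 - host name to look for, $2 - hosts file (default /etc/hosts)
    local name="$1" hosts="${2:-/etc/hosts}"
    awk -v h="$name" '
        { sub(/#.*/, "") }    # drop comments before looking at the fields
        $1 == "127.0.0.1" {
            for (i = 2; i <= NF; i++) if ($i == h) found = 1
        }
        END { exit !found }' "$hosts"
}

# Example (commented out):
# hostname_on_loopback "$(hostname)" && echo "Fix /etc/hosts before creating the pool"
```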

This is all I have had in mind for now. It should solve most newcomer issues with Oracle VM, and allow you to make good use of it. It’s a nice system, despite its ugly management.

Update the OracleVM

You could use Oracle’s Unbreakable Linux Network, if you are a paying customer, or you could use the Public Yum Server for your system.

Updates to Oracle VM Manager

If you won’t use Oracle Grid Control (Enterprise Manager) to manage the pool, you will probably use Oracle VM Manager. You will need to update the ovs-console package, and you will probably want to add the tightvnc-java package, so that IE users will be able to use the web-based VNC services. You had better grab these packages from here.

NetApp SnapMirror monitor script

Sunday, December 13th, 2009

I have done some work lately with NetApp SnapMirror. I have SnapMirrored some volumes and qtrees, and I wanted to monitor their use and behavior over the line.

As you can expect, site-to-site replication of data is a fragile thing, especially when done at the level of the storage device, which is agnostic to the data kept on it. When replicating volumes, I have to rely on the relevant employees being responsible about what they place there, because the storage does not filter out the junk. If someone decides to put a new DVD image on the DB storage space, the DB won’t care as long as there is enough free space, but the storage will attempt to replicate the added data to the alternate site. If you are already around your bandwidth limit, which is never a good thing, you will just create a delay gap you will hardly (if at all) be able to close.

For that, and since I don’t tend to trust people not to do stupid things, I have written this script.

What does it do?

This script will perform the following:

Alerting about non-idle SnapMirror session

Use with ‘-m alert’

Assuming SnapMirror is scheduled to a specific time, the script will alert if a session is active. With the flag ‘-a no’, it will not send an e-mail (if possible, see the configuration section below). With ‘-r yes’, it will react by setting a throttle for each non-idle session; in that case ‘-t VALUE’ must be specified, where VALUE is the numeric throttle in KB/s.

Limiting throttle to a SnapMirror session

Use with ‘-m throttle_limit’

The script will set a throttle for SnapMirror session(s). The limit is set with the flag ‘-t VALUE’, where VALUE is the numeric throttle in KB/s for each session.

Cancelling throttle limit

Use with ‘-m throttle_unlimit’

The script will set unlimited throttle for SnapMirror session(s).

Checking SnapMirror lag

Use with ‘-m check_lag’

Since the purpose of replication is recovery, the lag of each SnapMirror session shows how far behind the replica is. Use with ‘-d VALUE’, VALUE being the alert threshold in minutes. The default threshold is one day (1440 minutes).
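The lag is reported as hh:mm:ss, so it has to be converted to minutes before it can be compared with the threshold. Here is a minimal sketch of that arithmetic. The function names lag_to_minutes and lag_exceeds are illustrative. I use awk for the conversion because it reads a value such as 08 as decimal, while bash arithmetic treats a leading zero as octal.

```shell
# Sketch: convert a lag given as hh:mm:ss to whole minutes (seconds are dropped)
lag_to_minutes () {
    echo "$1" | awk -F: '{ print $1 * 60 + $2 }'
}

# Succeeds if the lag is above the threshold
lag_exceeds () {
    # $1 - lag (hh:mm:ss), $2 - threshold in minutes (default 1440)
    [ "$(lag_to_minutes "$1")" -gt "${2:-1440}" ]
}

# Example (commented out):
# lag_exceeds 26:08:09 && echo "More than one day behind"
```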

Checking snapshots size

Use with ‘-m check_size’

This reports the expected delta to transfer. This can help estimate the success or failure of a future sync of data (snapmirror update) before it begins. Use with the ‘-l’ flag to log the date/time of the measurement and the expected size into a file. By default, the file is /tmp/target_name.txt, where target_name is the SnapMirror target.
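The log is plain text, one line per measurement, with the size as the last field. A small sketch for finding the log file of a target and reading the most recent value could look like this. Both function names are my own, and LOG_PREFIX is assumed to be /tmp as in the default above.

```shell
# LOG_PREFIX can be overridden; /tmp is the default log location
LOG_PREFIX="${LOG_PREFIX:-/tmp}"

log_file_for_target () {
    # Slashes in the target name are replaced with underscores
    echo "$LOG_PREFIX/$(echo "$1" | tr / _).txt"
}

last_logged_delta () {
    # The delta is the last field of the last line in the log
    tail -n 1 "$(log_file_for_target "$1")" | awk '{ print $NF }'
}

# Example (commented out):
# last_logged_delta storage2:/vol/volname/qtree
```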

General Options

Use with ‘-c filename’ for alternate configuration file.

Use with ‘-h’ to get general help.

Use with a list of target names in the format of storage:/vol/volname/qtree or storage:volname to ignore the targets in the configuration file and use your own.

Configuration File

The configuration file is rather simple. By default it should be called “/etc/snapmirror_monitor.conf”. It consists of two variables:

TGTS="storage2:/vol/volname/qtree
storage3:volname2
storage1:/vol/volnew/qtr2"
EMAIL="user@domain.com another_user@domain.com"
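Each target is split into a filer name and a path with plain shell parameter expansion. The sketch below shows the idea. The function names target_filer and target_volume are illustrative only.

```shell
target_filer () {
    # Everything before the first colon is the storage device name
    echo "${1%%:*}"
}

target_volume () {
    # Path after the colon, without the /vol/ prefix and without any qtree
    local path="${1#*:}"
    path="${path#/vol/}"
    echo "${path%%/*}"
}

# Example (commented out):
# target_filer storage2:/vol/volname/qtree     # storage2
# target_volume storage2:/vol/volname/qtree    # volname
```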

Prerequisites

This script will run on any modern Linux machine. For it to communicate with the NetApp devices, you will need SSH enabled on the NetApps, and an SSH key exchange, so that the Linux machine can access the NetApp without using passwords.

The Script

Below is the script. You can download it and use it as you like.

#!/bin/bash
# This script will monitor snapmirror status
# Assumption: Access through ssh to root on all storage devices involved
# This will also attempt to detect the diff which is to sync
 
# Written by Ez-Aton. Check http://run.tournament.org.il for updates or
# additional information
 
# Modes: 
# alert -> alert if snapmirror is still active
# throttle_limit -> Limit throttle to a given number (default or manually set)
# throttle_unlimit -> Open throttle limitation
# check_lag -> Report the snapmirror lag
# check_size -> Report the estimated data size to move
 
# Global variables
CONF=/etc/snapmirror_monitor.conf
LOG_PREFIX=/tmp
 
test_connection () {
        # Test to see that you can access the storage device
        # Arguments: NetApp name
        SSH_OPTS="-o ConnectTimeout=2"
        if ! ssh $SSH_OPTS $1 hostname &>/dev/null
        then
                echo "Cannot communicate via SSH to $1"
                exit 1
        fi
}
 
abort () {
        # Exit with a predefined error message
        echo "$*"
        exit 1
}
 
get_arguments () {
        # Get all arguments and define options
        # Argument: $@
        [ -z "$1" ] && set -- -h
        while [ -n "$1" ]
        do
                case "$1" in
                        -m)     shift
                                case "$1" in
                                        alert|throttle_limit|throttle_unlimit|check_lag|check_size)     MODE=$1
                                        ;;
                                        *)      abort "Mode is mandatory. Use -h flag to get list of available flags"
                                        ;;
                                esac
                                ;;
                        -a)     shift
                                case "$1" in
                                        [nN][oO])       NOMAIL=1
                                                        ;;
                                        *)              NOMAIL=0
                                                        ;;
                                esac
                                ;;
                        -r)     shift
                                case "$1" in
                                        [yY][eE][sS])   REACT=1
                                                        ;;
                                        *)              REACT=0
                                                        ;;
                                esac
                                ;;
                        -d)     shift
                                declare -i DELAY_TMP
                                DELAY_TMP=$1
                                [ "$DELAY_TMP" != "$1" ] && abort "Delay needs to be a number in minutes"
                                DELAY=$DELAY_TMP
                                ;;
                        -t)     shift
                                declare -i THROTTLE_TMP
                                THROTTLE_TMP=$1
                                [ "$THROTTLE_TMP" != "$1" ] && abort "Throttle needs to be a number"
                                THROTTLE=$THROTTLE_TMP
                                ;;
                        -c)     shift
                                [ -f "$1" ] || abort "Cannot find specified conf file"
                                CONF="$1"
                                ;;
                        -l)     LOG=1
                                ;;
                        -h)     echo "Usage: $0 -m [alert|throttle_limit|throttle_unlimit|check_lag|check_size] (-c CONF_FILE) [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "Alert if SnapMirror is still running: $0 -m alert [-a no] (-r yes) [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "Alert and throttle (react): $0 -m alert [-a no] -r yes -t [throttle_in_kb] [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "Throttle a running SnapMirror: $0 -m throttle_limit -t throttle_in_kb [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "Unlimit SnapMirror throttle: $0 -m throttle_unlimit [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "To check lag: $0 -m check_lag -d delay_in_minutes (-a no) [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                echo "To check delta: $0 -m check_size [tgt_filer:volume tgt_filer:/vol/vol/qtree]"
                                exit 0
                                ;;
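                        *)      [ -z "$MODE" ] && abort "$0 mode required"
                                # Everything left on the command line is a target.
                                # Stop parsing here, or each pass would overwrite
                                # TGTS until only the last target remains
                                TGTS="$*"
                                break
                                ;;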
                esac
                shift
        done
}
 
notify () {
        # Send an e-mail notification
        # Arguments: $@ - the subject
        # Contents are empty
        # And yes - one e-mail per event
        mail -s "$@" $EMAIL < /dev/null
}
 
idle () {
        # Check if transaction is idle
        # Arguments: Target name (example: storage:/vol/volname/qtree)
 
        # Get the storage name out
        NETAPP=${1%%:*}
        test_connection $NETAPP #Verify this netapp is accessible
        ssh $NETAPP snapmirror status $1 | tail -1 | grep Idle$ &>/dev/null #Checks if the snapmirror is idle. If so, return true
        return $?
}
 
set_throttle () {
        # Sets throttle for target
        # Arguments: $1 Target name (example: storage:/vol/volname/qtree)
        # Arguments: $2 throttle value (number)
 
        # Get the storage name out
        NETAPP=${1%%:*}
        test_connection $NETAPP #Verify this netapp is accessible
        ssh $NETAPP snapmirror throttle $2 $1
}
 
get_lag () {
        # Gets the lag of snapmirror relationship in minutes
        # Arguments: Target name (example: storage:/vol/volname/qtree)
 
        # Get the storage name out
        NETAPP=${1%%:*}
        test_connection $NETAPP #Verify this netapp is accessible
        LAG=`ssh $NETAPP snapmirror status $1 | tail -1 | awk '{print $4}'`
        # LAG is in hh:mm:ss. We need to convert it to minutes only
        H=`echo $LAG | cut -f 1 -d :`
        M=`echo $LAG | cut -f 2 -d :`
        # Force base 10, otherwise 08 and 09 are rejected as invalid octal numbers
        let M=10#$M+10#$H*60
        echo $M
}
 
check_size () {
        # Checks the size of the snapshot to copy (diff)
        # Arguments: Target name (example: storage:/vol/volname/qtree)
 
        # Get the storage name out
        NETAPP=${1%%:*}
        test_connection $NETAPP #Verify this netapp is accessible
        # Get source storage name and path
        SRC=`ssh $NETAPP snapmirror status $1 | tail -1 | awk '{print $1}'`
        # Get the source filer and vol name from that
        NETAPP=${SRC%%:*}
        SPATH=${SRC##*:}
        SPATH=`echo $SPATH | sed s/'\/vol\/'//`
        SPATH=${SPATH%%/*}
 
        test_connection $NETAPP # Verify the source NetApp is accessible
        SNAP=`ssh $NETAPP snap list -n $SPATH | grep snapmirror | tail -1 | awk '{print $4}'`
        DELTA=`ssh $NETAPP snap delta $SPATH $SNAP | tail -2 | head -1 | awk '{print $5}'`
        echo "Snap delta for $1 is $DELTA KB"  
        LOG_TARGET=`echo $1 | tr / _`.txt
        [ -n "$LOG" ] && echo "`date` $DELTA" >> $LOG_PREFIX/$LOG_TARGET
}
 
 
### MAIN ###
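get_arguments "$@"
CLI_TGTS="$TGTS"
. $CONF &>/dev/null
# Targets given on the command line override those in the configuration file
[ -n "$CLI_TGTS" ] && TGTS="$CLI_TGTS"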
# if e-mail is not set, don't try to send
[ -z "$EMAIL" ] && NOMAIL=1
 
[ -z "$TGTS" ] && abort "You need at least one snapmirror target"
 
case $MODE in
        alert)  if [ "$REACT" == "1" ]
                then
                        [ -z "$THROTTLE" ] && abort "When setting 'react' flag, you must specify throttle"
                fi
                for i in $TGTS
                do
                        if ! idle $i
                        then
                                echo -n "$i is not idle. "
                                [ "$NOMAIL" != "1" ] && notify "$i is not idle"
                                if [ "$REACT" == "1" ]
                                then
                                        echo -n "We are set to react. Limiting throttle"
                                        set_throttle $i $THROTTLE
                                fi
                                echo
                        fi
                done
                ;;
        throttle_limit) [ -z "$THROTTLE" ] && abort "Throttle requires throttle value"
                        for i in $TGTS
                        do
                                echo "Setting throttle for $i to $THROTTLE"
                                set_throttle $i $THROTTLE
                        done
                        ;;
        throttle_unlimit)       for i in $TGTS
                                do
                                        echo "Setting throttle for $i to unlimited"
                                        set_throttle $i 0
                                done
                        ;;
        check_lag)      [ -z "$DELAY" ] && DELAY=1440
                        for i in $TGTS
                        do
                                LAG=`get_lag $i`
                                if [ "$LAG" -gt "$DELAY" ]
                                then
                                        echo "Failure: The delay for $i is $LAG minutes"
                                        [ "$NOMAIL" != "1" ] && notify "$i is lagged $LAG minutes, above the threshold $DELAY"
                                else
                                        echo "Normal: The delay for $i is $LAG minutes"
                                fi
                        done
                        ;;
        check_size)     for i in $TGTS
                        do
                                check_size $i
                        done
                        ;;
        *)      echo "Mode is missing or not implemented. Use -h flag to get list of available flags"
                exit 1
                ;;
esac

Rapid-guide – Updating RedHat initrd

Saturday, December 5th, 2009

Warning: This is not the recommended method if you’re not sure you know what you’re doing.

Linux Initial Ram Disk (initrd) is a mechanism to perform disk-independent actions before attempting to mount the ‘/’ disk. These actions usually include loading disk drivers, setting up LVM or software RAID, etc.

The reason these actions are performed within initrd is that it is all based on Ram Disk loaded by the boot loader, and thus it breaks the loop of “how would I load storage drivers without storage access?”

Sometimes, due to some special event, we need to modify it manually. To do so, we first need to open it, and then pack it back, replacing the previous one (back up the old one, will you?).

This is rather simple. The tools used by us will be ‘gzip’ and ‘cpio’.

Let’s begin.

First – create a temporary directory:

mkdir /tmp/initrd

Extracting

We have our temporary directory, so now we need to extract the initrd into it. I assume the name of the file is /boot/initrd.img. Replace it in my line with the name of your own initrd file:

cd /tmp/initrd

cat /boot/initrd.img | gzip -dc |  cpio -id

This will extract the contents of the initrd into /tmp/initrd.

Now you can edit its contents directly.
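If you do this more than once, the extraction can be wrapped in a small function. This is only a sketch, and the name extract_initrd is my own. It assumes a gzip-compressed cpio archive, as above.

```shell
extract_initrd () {
    # $1 - initrd image, $2 - destination directory (created if missing)
    local image="$1" dest="$2"
    # Turn a relative image path into an absolute one before changing directory
    case "$image" in
        /*) ;;
        *)  image="$PWD/$image" ;;
    esac
    mkdir -p "$dest" || return 1
    # Run in a subshell so the caller's working directory is not changed
    ( cd "$dest" && gzip -dc "$image" | cpio -id )
}

# Example (commented out):
# extract_initrd /boot/initrd.img /tmp/initrd
```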

Package

To package initrd back in, we will need to perform the following actions.

Warning – before you do it, make sure you have an available copy of your original initrd file, in case you have created some damage.

cd /tmp/initrd

find . | cpio -o -H newc | gzip -9 > /boot/initrd.img

This line packages the initrd, and replaces the old one.
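Because the line writes straight over the existing file, a slightly more careful variant may be worth having. The sketch below keeps a copy of the original and builds the new image in a temporary file first. The function name repack_initrd and the .orig/.new suffixes are assumptions of mine.

```shell
repack_initrd () {
    # $1 - directory holding the extracted tree, $2 - initrd image to replace
    local src="$1" image="$2"
    # Keep the first original only; later runs do not overwrite the backup
    if [ -f "$image" ] && [ ! -f "$image.orig" ]; then
        cp -p "$image" "$image.orig" || return 1
    fi
    # Build into a temporary file so an interrupted run does not leave a
    # half-written image in place
    ( cd "$src" && find . | cpio -o -H newc | gzip -9 ) > "$image.new" || return 1
    mv "$image.new" "$image"
}

# Example (commented out):
# repack_initrd /tmp/initrd /boot/initrd.img
```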

That’s all for today :-)

Quickly install Xen Community Linux VM

Saturday, December 5th, 2009

On RHEL-type systems with libvirt, you can make use of virt-manager to ease your life. I, for myself, prefer to work with ‘xm’ tools, but for the initial install, virt-manager is the quickest and simplest tool available.

To install a new Linux VM, all you need is to follow this flow:

Create an LV for your VM (I use LVs because it’s easier to manage). If not an LV, use a file. To create an LV, run the following command:

lvcreate -L 10G -n new_vm1 VolGroup00

I assume that the name you wish to grant is ‘new_vm1’ (better maintain order there, else you will find yourself with hundreds of small LVs you have no idea what to do with), and that the name of the volume group is ‘VolGroup00’. Change these values to match your environment.

Next, make sure you have your ISO contents unpacked (you can use loop device) and exported via NFS (my favorite method).

To mount a CD/DVD ISO, you should use ‘mount’ command with the ‘loop’ options. This would look like this:

mount -o loop my_iso.iso /mnt/temp

Again, I assume the name of the ISO is my_iso.iso and that the target directory /mnt/temp is available.

Now, export your newly created directory. If you have NFS already running, you can either add to /etc/exports the newly mounted directory /mnt/temp and restart the ‘nfs’ service, or you could use ‘exportfs’ to add it:

exportfs -o no_root_squash *:/mnt/temp

would probably do the trick. I added ‘no_root_squash’ to make sure no permission/access problems present themselves during the installation phase. Test your export to verify it’s working.
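For the /etc/exports route, a minimal sketch of adding the entry could look like this. The function name add_export is hypothetical, and I chose a read-only export here, which is an assumption on my part since an installation source does not need write access.

```shell
add_export () {
    # $1 - directory to export, $2 - exports file (default /etc/exports)
    local dir="$1" exports_file="${2:-/etc/exports}"
    # Add the entry only if the directory is not listed already
    if ! grep -q "^$dir[[:space:]]" "$exports_file" 2>/dev/null; then
        echo "$dir *(ro,no_root_squash)" >> "$exports_file"
    fi
}

# Example (commented out):
# add_export /mnt/temp
# Afterwards, 'exportfs -ra' re-reads the file, and 'showmount -e localhost'
# lists what is actually exported.
```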

Now you could begin your installation. Run the following command:

virt-install -n new_vm1 -r 512 -p -f /dev/VolGroup00/new_vm1 --nographics nfs://nfs_server:/mnt/temp

The name follows the ‘-n’ flag. The amount of RAM to give is 512MB. The -p means it’s paravirtualized. The -f shows which device will be the block device, and the last argument is the source of the installation. Do not use local files, as the VM installer should be able to access the installation source.

Following that, you should have a very nice TUI installation experience.

Now – let’s make this machine ‘xm’ compatible.

Currently, the VM is virt-manager compatible. It means you need virt-manager to start/stop it correctly. Since I prefer ‘xm’ commands, I will show you how to convert this machine to an ‘xm’-managed VM.

First – export its XML file:

virsh dumpxml new_vm1 > /tmp/new_vm1.xml

virsh domxml-to-native xen-xm /tmp/new_vm1.xml > /etc/xen/new_vm1

This should do the trick.

Now you can turn the newly created VM off, and remove the VM from virt-manager using

virsh undefine new_vm1

and you’re back to ‘xm’-only interface.