XenServer 6.0 with DRBD

DRBD is a low-cost shared-SAN-like solution, which has several great benefits, among which are no single point of failure, and very low cost (local storage and network cable). Its main disadvantages are in the need to constantly monitor it, and make sure it does what’s expected. Also – in some cases – performance might be affected greatly.

If you need XenServer pool with VMs XenMotion (used to call it LiveMigration. I liked it better then…), but you cannot afford or do not want classic shared storage acting a single point of failure, DRBD could be for you. You have to understand the limitations, however.

The most important limitation is with data consistency. If you aim at using it as Active/Active, as I have, you need to make sure that under any circumstance you will not have split brain, as it will mean losing data (you will recover to an older point in time). If you aim at Active/Passive, or all your VMs will run on a single host, then the danger is lower, however – for A/A, and VMs spread across both hosts – the danger is imminent, and you should be aware of it.

This does not mean that you will have to run crying in case of split brain. It means you might be required to export/import VMs to maintain consistent data, and that you will have a very long downtime. Kinda defies the purpose of XenMotion and all…

Using the DRBD guid here, you will find an excellent solution, but not a complete one. I will describe my additions to this document.

So, first, you need to download the DRBD packages. I have re-packaged them, as they did not match XenServer with XS60E003 update. You can grub this particular tar.gz here: drbd-8.3.12-xenserver6.0-xs003.tar.gz . I did not use DRBD 8.4.1, as it has shown great instability and liked getting split-brained all the time. Don’t want it with our system, do we?

Make sure you have defined the private link between your hosts, both as a network interface, as described, and in both servers’ /etc/hosts file. It will be easier later. Verify that the host hostname matches the configuration file, else DRBD will not start.

Next, follow the mentioned guide.

Unlike this guide, I did not define DRBD to be Active/Active in the configuration file. I have noticed that upon reboot of the pool master (and always it), probably due to timing issues, as the XE Toolstack did not release the DRBD device, it would have started in split-brain mode, and I was incapable of handling it correctly. No matter when I have tried to set the service to start, as early as possible, it would have always start in split-brain mode.

The workaround was to let it start in passive mode, and while being read-only device, XE Toolstack cannot use it. Then I wait (in /etc/rc.local) for it to complete sync, and connect the PBD.

You will need each host PBD for this specific SR.

You can do it by running:

1
2
3
4
for i in `xe host-list --minimal` ; do \
echo -n "host `xe host-param-get param-name=hostname uuid=$i`  "
echo "PBD `xe pbd-list sr-uuid=$(xe  sr-list name-label=drbd-sr1 --minimal) --minimal`"
done

This will result in a line per host with the DRBD PBD uuid. Replace drbd-sr1 with your actual DRBD SR name.

You will require this info later.

My drbd.conf file looks like this:

?Download drbd.conf
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
# You can find an example in  /usr/share/doc/drbd.../drbd.conf.example
 
#include "drbd.d/global_common.conf";
#include "drbd.d/*.res";
 
resource drbd-sr1 {
protocol C;
startup {
degr-wfc-timeout 120; # 2 minutes.
outdated-wfc-timeout 2; # 2 seconds.
#become-primary-on both;
}
 
handlers {
    split-brain "/usr/lib/drbd/notify.sh root";
}
 
disk {
max-bio-bvecs 1;
no-md-flushes;
no-disk-flushes;
no-disk-barrier;
}
 
net {
allow-two-primaries;
cram-hmac-alg "sha1";
shared-secret "Secr3T";
after-sb-0pri discard-zero-changes;
after-sb-1pri discard-secondary;
after-sb-1pri consensus;
after-sb-2pri disconnect;
#after-sb-2pri call-pri-lost-after-sb;
max-buffers 8192;
max-epoch-size 8192;
sndbuf-size 1024k;
}
 
syncer {
rate 1G;
al-extents 2099;
}
 
on xenserver1 {
device /dev/drbd1;
disk /dev/sda3;
address 10.1.1.1:7789;
meta-disk internal;
}
on xenserver2 {
device /dev/drbd1;
disk /dev/sda3;
address 10.1.1.2:7789;
meta-disk internal;
}
}

I did not force them both to become primary, as split-brain handling in A/A mode is very complex. I have forced them to start as secondary.

Then, in /etc/rc.local, I have added the following lines:

?Download rc.local
1
2
3
4
5
6
7
echo 1 > /sys/devices/system/cpu/cpu1/online
while grep sync /proc/drbd > /dev/null 2>&1
do
        sleep 5
done
/sbin/drbdadm primary all
/opt/xensource/bin/xe pbd-plug uuid=dfb02709-2483-a11a-cb0e-eac0fb05d8e2

This performs the following:

  • Add an additional core to Domain 0, to reduce chances of CPU overload with DRBD
  • Waits for any sync to complete (if DRBD failed, it will continue, but you will have a split brain, or no DRBD at all)
  • Brings the DRBD device to primary mode. I have had only one DRBD device, but this can be performed selectively for each device
  • Reconnects the PBD which, till this point in the boot sequence, was disconnected. An important note - replace the uuid with the one discovered above for each host - each host should unplug its own PBD.

To sum it up - until sync has been completed, the PBD will not be plugged, and until then, no VMs can run on this SR. Split brain handling for A/P configuration is so much easier.

Some additional notes:

  • I have failed horribly when the interconnect cable was down. I did not implement hardware fencing mechanisms, but it would probably be a very good practice for production systems. Disconnecting the cross cable will result in a split brain.
  • For this system to be worthy, it has to have external monitoring. DRBD must be monitored at all times.
  • Practice and document cases of single node failure, both nodes failure, host master failure, etc. Make sure you know how to react before it happens in real-life.
  • Performance was measured on a Linux RHEL6 VM to be about 82MB/s. The hardware it was tested on was Dell PE R610 with a very nice RAID5 array, etc. When the 2nd host was down, performance resulted in abour 450MB/s, so the bandwidth, in this particular case, matters.
  • Performance test was done using the command:
    dd if=/dev/zero bs=1M of=/tmp/test_file.dd oflag=direct
    Without the oflag=direct, the system will overload the disk write cache of the OS, and not the disk itself (at least - not immediately).
  • I did not test random-access performance.
Hope it helps

Getting rid of some junk…

It wasn’t junk once…

This stuff used to be useful, but it has been laying around for a long while now, doing nothing but collect some dust. I have ‘donated’ some stuff to someone who actually wanted it, but neither him nor me can find a use, nowadays, to a monochrome display card with an ISA 8bit connector, or about 20 56k ISA and PCI modems, or all kind of other old, and once useful junk. So I have let go, and the picture you see is of most of the stuff I have taken to the electronics recycle facility nearby. That was a hack of a work.

Remaining: Some desktops (maybe I will throw away some more, donno yet), Sun Ultra 80 (Since you can never get a SPARC platform when you need it), an IBM FastT200 (one day I will do something with it. Donate it to a better cause than to the junk yard), and, well, that’s all actually.

BTW – not in the picture – two IBM servers and two Dell servers, and many hard drives, which were given to the helpful man who assisted me with carrying this all to the recycle center.

 

 

 

 

 

 

Things to remember…

As my work takes me to various places (where technology is concerned), I collect lots of browser tab of things I want to keep for later reference.
I have to admit, sadly, that I lack the time to sort them out, to make a real good and nice post about them. I do not want to lose them, however, so I am posting now those which I find or found in the past as more useful to me. I might expand either of them one day into a full post, or elaborate further on them. Either or none. For now – let’s clean up some tab space:
Reading IPMI sensors. Into Cacti, and into Nagios, with some minor modifications by myself (to be disclosed later, I believe):
Cacti
Nagios
This is somewhat info of the plugin check_ipmi_sensor
And its wiki (in German. Use Google for translation)
XenServer checks:
check_xen_pool
Checking XenServer using NRPE
But I did not care about Dom0 performance parameters, as they meant very little regarding the hypervisor’s behavior. So I have combined into it the following XenServer License Check. Unfortunately, I could run it only on the XenServer domain0, due to python version limitations on my Cacti /Nagios server.
You can obtain XenServer SDK
This plugin looks interesting for various XenServer checks, but I have never tried it myself.
Backing up (exporting) XenServer VMs as a scheduled task. I have had it modified extensively to match my requirements, but I am allowed to, it has some of its sources based on my blog :-)
Installing Dell OpenManage on XenServer 5.6.1, and the nice thing is that it works fine on XenServer 6 as well.
Oracle ASM recovery tips . One day I will take it further, and investigate possible human errors and methods of fixing them. Experience, they say, has a value :-)
A guide dealing with changing from raw to block devices in Oracle ASM . This is only a small part of it, but it’s the thing that interests me.
Understanding Steal Time in Linux Xen-based VMs.
Because I always forget, and I’m too lazy to search again and again (and reach the same page again and again): Upgrading PHP to 5.2 on Centos 5
And last – a very nice remote-control software fomr my Android phone. Don’t leave home without it. Seriously.

Reduced to only 23 tabs is excellent. This was a very nice job, and these links will be useful. To me, for sure. I hope that to you as well.

Hot resize Multipath Disk – Linux

This post is for the users of the great dm-multipath system in Linux, who encounter a major availability problem when attempting a resize of mpath devices (and their partitions), and find themselves scheduling a reboot.

This documented is based on a document created by IBM called “Hot Resize Multipath Storage Volume on Linux with SVC”, and its contents are good for any other storage. However – it does not cover the procedure required in case of a partition on the mpath device (for example – mpath1p1 device).

I will demonstrate with only two paths, but, with understanding this process, it can be well used for any amount of paths for a device.

I do not explain how to reduce a LUN size, but the apt viewer will be able to generate a method out of this document. I, for myself, try to avoid as much as I can from shrinking LUNs. I prefer backup, LUN recreation, and then restore. In many case – it’s just faster.

So – back to our topic – first – increase the size of your LUN on the storage.

Now, you need to collect the paths used for your mpath device. Check this example:

mpath1 (360a980005033644b424a6276516c4251) dm-2 NETAPP,LUN
[size=200G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=4][active]
\_ 2:0:0:0 sdc 8:32  [active][ready]
\_ round-robin 0 [prio=1][enabled]
\_ 1:0:0:0 sdb 8:16  [active][ready]

The devices marked in bold are the ones we will need to change. Lets get their current size:

blockdev –getsz /dev/sdb
419430400

Keep this number somewhere safe. We can (and should!) assume that sdc has the same values, otherwise, this is not the same exact path.

Collect this info for the partition as well. It will be smaller by a tiny bit:

blockdev –getsz /dev/sdb1
419424892

Keep this number as well.

Now we need to reread the current (storage-based) size parameters of the devices. We will run

blockdev –rereadpt /dev/sdb
blockdev –rereadpt /dev/sdc

Now, our size will be slightly different:

blockdev –getsz /dev/sdb
734003200

Of course, the partition size will not change. We will deal with it later. Keep the updated values as well. Of course, the multipath still holds the disks with their original size values, so running ‘multipath -ll’ will not reveal any size change. Not yet.

We now need to create editable dmsetup map. Use the current command to create two files: cur and org containing this map:

dmsetup table mpath1 | tee org cur
0 419424892 multipath 1 queue_if_no_path 0 2 1 round-robin 0 1 1 8:32 128 round-robin 0 1 1 8:16 128

Important part – explaining some of these values. The map shows the device’s size in blocks – 419424892. It shows some parameters, it shows path groups info (0 2 1), and both sub devices – sdc being 8:32 and sdb being 8:16. Try it with ‘ls -la /dev/sdb’ to see the minor and major. At this point, if you are not familiar with majors and minors, I would recommend you do some reading about it. Not mandatory, but will make your life here safer.

We need to delete one of the paths, so we can refresh it. I have decided to remove sdb first:

multipathd -k”del path sdb”

Now, running the multipath command, we will get:

mpath1 (360a980005033644b424a6276516c4251) dm-2 NETAPP,LUN
[size=200G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=4][active]
\_ 2:0:0:0 sdc 8:32  [active][ready]

Only one path. Good. We will need to edit the ‘cur’ file created earlier to reflect the new settings we are to introduce:

0 419424892 multipath 1 queue_if_no_path 0 1 1 round-robin 0 1 1 8:32 128

The only group left was the one containing ‘sdc’ (8:32), and since one group down, the bold number was changed from 2 to 1 (as there is only a single path group now!)

We need to reload multipath with these settings:

dmsetup suspend mpath1; dmsetup reload mpath1 cur; dmsetup resume mpath1

The correct response for this line is ‘ok’. We pause mpath1, reload and then resume it. It is best to be in a single line, as this process freezes IO for a short period of time on the device, and we prefer it to be as short as possible.

Now, as /dev/sdb is not a part of the multipath managed devices, we can modify it. I usually use ‘fdisk’ – deleting the old partition, and recreating it in the new size, but you must make sure, if your device requires LUN alignment, that you recreated the partition from the same start point. I will dedicate a post some time to LUN alignment, but not at this particular time. Just a hint – if you’re not sure, run fdisk in expert mode and get a printout of your partition table (fdisk /dev/sdb and then x and then p). If your partition starts at 128 or 64, it is aligned. If not (usually for large LUNs – at 63), you are not, and you should either be worried about it, but not now, or should not care at all.

Back to our task.

We need to grab the size of the newly created partition, for later use. Write it down somewhere.

blockdev –getsz /dev/sdb1
733993657

Following the partition recreation, we need to introduce the device to the multipath daemon. We do this by:

multipathd -k”add path sdb”

followed by immediately removing the remaining device:

multipathd -k”del path sdc”

We need to have our ‘cur’ file updated, so we can release the device to our uses. This time, we update both the size section with the new size, and the new, remaining path. Now, the file looks like this:

0 734003200 multipath 1 queue_if_no_path 0 1 1 round-robin 0 1 1 8:16 128

As mentioned before – the large number in bold is the new size of the block device. The amount of failure groups is one (1), also in bold, and the device name is ‘sdb’ which is 8:16. Save this modified file, and run:

dmsetup suspend mpath1; dmsetup reload mpath1 cur; dmsetup resume mpath1

Running the command ‘multipath -ll’ you will get the real size of the device.

mpath1 (360a980005033644b424a6276516c4251) dm-2 NETAPP,LUN
[size=350G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=1][active]
\_ 1:0:0:0 sdb 8:16  [active][ready]

We will need to reread the partition layout of /dev/sdc. The quickest way is by running:

partprobe

This should do it. We can now add it back in:

multipathd -k”add path sdc”

and then run

multipath

(which should result in all the available paths, and the correct size).

Our last task is to update the partition size. The partition, normally, is called mpath1p1, so we need to read its parameters. Lets keep it in a file:

dmsetup table mpath1p1 | tee partorg partcur

We should now edit the newly created file ‘partcur‘ with the new size. You should not change anything else. Originally, it looked like this:

0 419424892 linear 253:2 128

and it was modified to look like this:

0 733993657 linear 253:2 128

Notice that the size (in bold) is the one obtained from /dev/sdb1 (!!!) and not /dev/sdb.

We need to reload the device. Again – attempt to do it in a single line, or else you will freeze IO for a long time (which might cause things to crush):

dmsetup suspend mpath1p1; dmsetup reload mpath1p1 partcur; dmsetup resume mpath1p1

Do not mistaked mpath1 with mpath1p1.

Our device is updated, our paths are online. We are happy. All left to do is to online resize the file system. With ext3, this is done like this:

resize2fs /dev/mapper/mpath1p1

The mount will increase in size online, and all left for us is to wait for it to complete, and then go home.

I hope this helps. It helped me.

Stateless Systems (diskless, boot from net) on RHEL5/Centos5

I have encountered several methods of doing stateless RedHat Linux systems. Some of them are over-sophisticated, and it doesn’t work. Some of them are too old, and you either have to fix half the scripts, or give up (which I did, BTW), and after long period of attempts, I have found my simple-yet-working-well-enough solution. It goes like that (drums please!)

yum install system-config-netboot

Doesn’t it sound funny? So simple, and yet – so working. So I have decided to supply you with the ultimate, simple, working guide to this method – how to make it work – fast and easy.

You will need the following:

  • RHEL5/Centos5 with ability to run yum client, and enough disk space to contain an entire system image, and more. Lots more. About it later. This machine will be called “server” in this guide.
  • A single “Golden Image” system – the base of your system-to-replicate. A system you will, with some (minor) modifications, use as the source of all your future stateless systems. Usually – resides on another machine (physical or virtual, doesn’t matter much. More about it later). This machine will be called “GI” in this guide.
  • A test system. Better be diskless, for assurance. Better also be able to boot from net, otherwise, we miss something here (although hybrid boot methods are possible, I will not discuss them here, for the time being). Can be virtual, as long as it is full hardware virtualization, as you cannot net-boot, except for the latest Xen Community versions, in PV mode. It will be called “net-client” or “client” in this guide.

Our flow:

  • Install required server-side tools
  • Setup server configuration parameters
  • Image the GI into the server
  • Run this and that
  • Boot our net-client happily

Let’s start!

On the server, run:

yum install -y system-config-netboot xorg-x11-xinit dhcp

and then, run:

chkconfig dhcpd on
chkconfig tftp on
chkconfig xinetd on
chkconfig nfs on

We will now perform configurations for the above mentioned services.

Your /etc/dhcpd.conf should look like this:

ddns-update-style interim;
ignore client-updates;

subnet $NETWORK netmask $NETMASK {
option routers              $GATEWAY;
option subnet-mask          $NETMASK;
option domain-name-servers  $DNS;
ignore unknown-clients;
filename “linux-install/pxelinux.0″;
next-server $SERVERIP;
}

You should replace all these variables with the ones matching your own layout. In my case

NETWORK=10.0.0.0
NETMASK=255.0.0.0
GATEWAY=10.200.1.1
DNS=10.100.1.4
SERVERIP=10.100.1.8

We will include hosts later. Notice that this DHCP server will refuse to serve unknown hosts. This will prevent the “Oops” factor…

We can start our DHCPD now. It won’t do anything, and it’s a good opportunity to test our configuration:

service dhcpd start

We need to create the directory structure. I have selected /export/NFSroots as my base structure. I will follow this structure here:

mkdir -p /export/NFSroots

We will proceed with the NFS server settings later.

Imaging the GI to the server is quite simple. We will begin by creating a directory in /export/NFSroots with the name of the imaged system type. In my case:

mkdir /export/NFSroots/centos5.6

However (and this is the tricky part!), we will create a directory under this location, called ‘root’. We will image the entire contents of our GI to this root directory. This is how this mechanism works, and I have no desire of bending it. So

mkdir /export/NFSroots/centos5.6/root

Now we copy the contents of the GI to this directory. There are several methods, but in this particular case, I have chosen to use ‘rsync’ over ‘ssh’. There are other methods just as good. Just one note – on the GI, before this copy, we will need to have busybox-anaconda package. So make sure you have it:

yum install -y busybox-anaconda

Now we can create an image from the GI:

rsync -e ssh -av –exclude ‘/proc/*’ –exclude ‘/sys/*’ –exclude ‘/dev/shm/*’ $GI_IP:/ /export/NFSroots/centos5.6/root

You must include as many exclude entries as required. You should never cause the system to attempt to grab the company’s huge NFS share, just because you have forgotten some ‘exclude’ entry. This will cause major loss of time, and possibly – some outage to some network resource. Be careful!

This is the best time to grub a cup of coffee/tee/cocoa/coke/whatever, and chit-chat with your friends. I have had about 10 minutes for 1.8GB image, via 1Gb/s network link, so you can do some math and guesswork, and probably – go grab launch.

When this operation is complete, your next mission is to configure your NFS server. Order does matter. You should create a read-only entry for the image root directory, and a full-access entry for the above structure. Well – I can probably narrow it down, but I did not really bother. During pre-production, I will test how to narrow permissions down, without getting myself into management hell. So – our entries are looking like this in /etc/exports :

/export/NFSroots/centos5.6/root *(ro,no_root_squash)
/export/NFSroots *(rw,no_root_squash,async)

I would love to hear comments by people who did narrow security down to the required minimum, and how they managed to handle tenths and hundreds of clients, or several dozens of RO images without much hassle. Really. Comment, people, if you have something to add. I would love to modify this document accordingly.

Now we need to start our NFS server:

service nfs start

We are ready for the big entry. You will need GUI here. We have installed xorg-x11-xauth which will allow us, if invoked, remote X. I use X over SSH, so it’s a breeze. Run the command:

system-config-netboot

I will not bother with screenshots, as this is not my style, but the following entries will be required:

  • First Time Druid: Diskless
  • Name: Easy identified. In my case: Centos5.6
  • Server IP address: $SERVERIP
  • Directory: /export/NFSroots/centos5.6
  • Kernel: Select the desired kernel
  • Apply

The system might take several minutes to complete. When done, you will be presented with a blank table. We need to populate it now.

For that to be easy, we better have all our clients in /etc/hosts file. While DNS does help, a lot, this specific tool, for it to work like charm, requires /etc/hosts to have the entry, or else, it just won’t be very readable. Below is a one-liner script to create a set of entries in /etc/hosts. I have started with .100 and completed with .120. This can be changed easily to match your requirements:

for i in {100..120} ; do echo “10.100.1.$i   pxe${i}” >> /etc/hosts ; done

This way we can refer to our clients as pxe100, pxe101, and so on.

Let’s create a new client!

Select “new” from the GUI menu, and fill in the name ‘pxe100′. If you have multiple images, this is a good time to select which one you will be using. If you have followed this guide to the letter, you have only a single option. Press on “OK” and you’re done.

We need to setup MAC and IP addresses into the DHCP server. I have written a script which assists with this greatly. For this script to work, you will need to run the following commands:

echo “” > /etc/dhcpd.conf.bootnet
echo ‘include “/etc/dhcpd.d/dhcpd.conf.bootnet”;’ >> /etc/dhcpd.conf

#!/bin/bash
# Creates and removes netboot nodes
 
# Check arguments
 
# VARIABLES
DHCPD_CONF=/etc/dhcpd.conf.bootnet
SERVICE=dhcpd
DATE=`date +"%H%M%S%d%m%y"`
BACKUPS=${DHCPD_CONF}.backups
 
function show_help() {
	echo "Usage: $0"
	echo "add NAME MAC IP"
	echo "where NAME is the name of the node"
	echo "MAC is the hardware address of the netboot interface (usually eth0)"
	echo "IP is the designated IP address of the node"
	echo
	echo
	echo "del ARGUMENT"
	echo "Where ARGUMENT can be either name, MAC address or IP address"
	echo "The script will attempt to remove whatever match it finds, so be careful"
	echo
	echo
	echo "check ARGUMENT"
	echo "Where ARGUMENT can be either name, MAC address or IP address"
        echo "The script will list any matching entries. This is a recommended"
	echo "action prior to removing a node"
	echo "The same logic used for finding entries to remove is used here"
	echo "so it is rather safe to run 'del' after a successful 'check' operation"
	echo
	exit 0
}
 
function check_inside() {
	# Will be used to either show or find the amount of matching entries"
	# Arguments: 	$1 - silent - 0 ; loud - 1
	# 		$2 - the string to search
	# The function will search the string as a stand-alone word only
	# so it will not find abc in abcde, but only in "abc"
 
	case "$1" in
		0) 	# silent
			RET=`grep -iwc $2 $DHCPD_CONF`
			;;
		1) 	# loud
			grep -iw $2 $DHCPD_CONF
			;;
 
		*)	echo "Something is very wrong with $0"
			echo "Exiting now"
			exit 2
			;;
	esac
 
	return $RET
}
 
function add_to() {
	# This function will add to the conf file
	# Arguments: $1 host ; $2 MAC ; $3 IP address
	echo "host $1 { hardware ethernet $2; fixed-address $3; }" >> $DHCPD_CONF
	[ "$?" -ne "0" ] && echo "Something wrong happened when attempted to add entry" && exit 3
}
 
function del_from() {
	# This function will delete a matching entry from the conf file
	# Arguments: $1 - expression
	[ -z "$1" ] && echo "Missing argument to del function" && exit 4
	if cat $DHCPD_CONF | grep -vw $1 > $DHCPD_CONF.$$
	then
		\mv $DHCPD_CONF.$$ $DHCPD_CONF
	else
		echo "Failed to create temp DHCPD file. Aborting"
		exit 5
	fi
}
 
function backup_conf() {
	cp $DHCPD_CONF $BACKUPS/`basename $DHCPD_CONF.$DATE`
}
 
function restore_conf() {
	\cp $BACKUPS/`basename $DHCPD_CONF.$DATE` $DHCPD_CONF
}
 
function check_wrapper() {
	# Perform check. Loud one
	echo "Searching for $REGEXP in configuration"
	check_inside 1 $REGEXP
	exit 0
}
 
function del_wrapper() {
	# Performs delete
	[ -z "$REGEXP" ] && echo "Missing argument for delete action" && exit 6
	backup_conf
	echo "Removing all invocations which include $REGEXP from config"
	del_from $REGEXP
 
	if /sbin/service $SERVICE restart
        then
                echo "Done"
        else
                restore_conf
                /sbin/service $SERVICE restart
                echo "Failed to update. Syntax error"
        fi
}
 
function add_wrapper() {
	# Adds to config file
	[ -z "$NAME" -o -z "$MAC" -o -z "$IP" ] && echo "Missing argument for add action" && exit 7
 
	for i in $NAME $MAC $IP
	do
		if check_inside 0 $i
		then
			echo -n .
		else
			echo "Value $i already exists"
			echo "Will not update duplicate value"
			exit 7
		fi
	done
	echo
 
	backup_conf
	add_to $NAME $MAC $IP
 
	if /sbin/service $SERVICE restart
	then
		echo "Done"
	else
		restore_conf
		/sbin/service $SERVICE restart
		echo "Failed to update. Syntax error"
	fi
}
 
function prep() {
	# Make sure everything is as expected
	[ ! -d ${BACKUPS} ] && mkdir ${BACKUPS}
}
 
prep
 
case "$1" in
	add)	NAME="$2"
		MAC="$3"
		IP="$4"
		add_wrapper
		;;
	check)	REGEXP="$2"
		check_wrapper
		;;
	del)	REGEXP="$2"
		del_wrapper
		;;
	help)	show_help
		;;
	*)	echo "Usage: $0 [add|del|check|help]"
		exit 1
		;;
esac

This script will update your /etc/dhcpd.conf.bootnet with new nodes. Place it somewhere in your path (for example: /usr/local/bin/ )

We will need the MAC address of a node. Run

net-node.sh add pxe100 00:16:3E:00:82:7A 10.100.1.100

This will add the node pxe100 with that specific MAC address to the DHCPD configuration.

Now, all you need to do is boot your client, and see how it works. Remember to disable other DHCP servers which might serve this node, or blacklist its MAC address from them.

Our next chapters will deal with my (future) attempt to make RHEL6 work under this setup (as a GI, and client, not as a server), and all kind of mass-deployment methods. If and when :-)

Disable IPv6 on RHEL5 or Centos5

Recently, bonding module requires IPv6 module. There are two possible solutions. The first is to define zeroconf to off, by adding the following line to /etc/sysconfig/network :
NOZEROCONF=yes

The other method, which actually disables IPv6 is to insert into /etc/modprobe.conf the following line:
options ipv6 disable=1

That should do the trick. You will need to restart/reload your modules, of course.

Putty (plink) and unknown host key

Putty has announced that they will not allow automatic import of host key. You should press the damn ‘y’ yourself, or else, you will fail. A quote from their FAQ says: “A.2.9 Is there an option to turn off the annoying host key prompts?

No, there isn’t. And there won’t be. Even if you write it yourself
and send us the patch, we won’t accept it.
…”

And it goes on, blah, blah, blah. While I agree with posing limitations in the name of security, I do not agree with complete block of options. See, for example, the use of “–use-the-force-luke” option in growisofs. This is a great example of “We know you want it. We don’t like it, but we’ll still allow you, although we will make sure you understand that it’s something out of the ordinary”. I respect that. Putty creators, however, did not want it to be done, and will do their best to make sure it won’t be possible in the future.

A workaround goes like this:
Create a file called “yes.txt” with the following content:
y

(Notice that you need a ‘y’, and a new line). Save the file.

Run the following command, for your viewing pleasure:
plink.exe -l user -pw password server "date" < yes.txt
Your plink will get its ‘y’ answer, and be able to run the command.

Hope it helps. I’ve seen so many people asking about it on the net…

Don’t try it at home – NetApp SnapMirror network-free Seeding

Well, this is rather common. Network-free seeding is being performed through using a tape device.
What happens if you do not have a tape device? We have the poor-man’s huge TB disks (SATA for the people) connected to simple PC systems, but no tape…
We perform network-free seeding into disk.

There is a utility called ‘lrep’ which is nice and effective. It forces the use of Qtree-based snapmirror (QSM), which has its own limitations. I advise you’ll read NetApp’s “SnapMirror Async Overview and Best Practices Guide” with id TR3446 for further details.

Limitations:

  • You have to have enough space on the source NetApp device to contain twice the volume you require to replicate
  • You can copy the output replica to an external Windows/Linux/Other system, of a movable type (could be a desktop), with large disks using CIFS or NFS.
  • HIgh CPU usage is guaranteed, as well as high disk usage.
  • You are aware to the fact that this is dangerous, and they will probably won’t like you just a little bit for doing this.

Concept:

Using the command ‘snapmirror store‘ you are able to store initial replica into a tape device for later seeding. The options (/man/Internet) describe how you should use your tape device. You don’t have one, or you don’t have one on each site.

Operation:

  • Create a volume or verify you have enough free space on some existing volume on the source NetApp device.
  • You will require the exact amount of used space on the source volume, give or give a little.
  • I recommend you’ll disable snapshots on the target volume for the duration of this operation, to save space.

Let’s assume that the source volume name is ‘vol2′, and that we have enough free space on /vol/vol1/free_space.

You need to perform the following:

Dangerous – change your privilege level:

priv set diag

Perform the initial SnapMirror

snapmirror store vol2 /vol/vol1/free_space/vol2

Remember – you have to be in ‘diag’ mode for it to work.

Reduce your privileges to normal:

priv set

Track the status through

snapmirror status

When the operation has completed, connect from your external Windows/Linux/Other machine to /vol/vol1/free_space and copy out the file ‘vol2′ which will probably be huge.

Transfer the Windows/Linux/Other system (or only its disk) to the alternate location, create a volume or verify you have enough free space on an existing volume to contain the entire file, and copy it to that location. I will assume it’s the same as on the source NetApp device – /vol/vol1/free_space/

Change your privileges level:

priv set diag

Create the target volume, and set it to be restricted (the values here are just an example):

vol create vol2 aggr0 100g

vol restrict vol2

Perform a ‘snapmirror retrieve’ operation:

snapmirror retrieve vol2 /vol/vol1/free_space/vol2

Reduce your privileges to normal:

priv set

You can track the status through

snapmirror status

Following that, perform an update with the source NetApp real path (filer1:/vol/vol2) and you’re fine.

Remember – be very very careful when running under ‘diag’ privileges.

Oracle VM post-install check list

Following my experience with OracleVM, I am adding my post-install steps for your pleasure. These steps are not mandatory, by design, but will help you get up and running faster and easier. These steps are relevant to Oracle VM 2.2, but might work for older (and newer) versions as well.

Define bonding

You should read more about it in my past post.

Define storage multipathing

You can read about it here.

Define NTP

Define NTP servers for your Oracle VM host. Make sure the daemon ‘ntpd’ is running, and following an initial time update, via

ntpdate -u <server>

to set the clock right initially, perform a sync to the hardware clock, for good measures

hwclock –systohc

Make sure NTPD starts on boot:

chkconfig ntpd on

Install Linux VM

If the system is going to be stand-alone, you might like to run your VM Manager on it (we will deal with issues of it later). To do so, you will need to install your own Linux machine, since Oracle supplied image fails (or at least – failed for me!) for no apparent reason (kernel panic, to be exact, on a fully MD5 checked image). You could perform this action from the command line by running the command

virt-install -n linux_machine -r 1024 -p –nographics -l nfs://iso_server:/mount

This directive installs a VM called “linux_machine” from nfs iso_server:/mount, with 1GB RAM. You will be asked about where to place the VM disk, and you should place it in /OVS/running_pool/linux_machine , accordingly.

It assumes you have DHCP available for the install procedure, etc.

Install Oracle VM Manager on the virtual Linux machine

This should be performed if you select to manage your VMs from a VM. This is a bit tricky, as you are recommended not to do so if you designing HA-enabled server pool.

Define autostart to all your VMs

Or, at least, those you want to auto start. Create a link from /OVS/running_pool/<VM_NAME>/vm.cfg to /etc/xen/auto/

The order in which ‘ls’ command will see them in /etc/xen/auto/ is the order in which they will be called.

Disable or relocate auto-suspending

Auto-suspend is cool, but your default Oracle VM installation has shortage of space under /var/lib/xen/save/ directory, where persistent memory dumps are kept.  On a 16GB RAM system, this can get pretty high, which is far more than your space can contain.

Either increase the size (mount something else there, I assume), or edit /etc/sysconfig/xendomains and comment the line  with the directive XENDOMAINS_SAVE= . You could also change the desired path to somewhere you have enough space on.

Hashing this directive will force regular shutdown to your VMs following a power off/reboot command to the Oracle VM.

Make sure auto-start VMs actually start

This is an annoying bug. For auto-start of VMs, you need /OVS up and available. Since it’s OCFS2 file system, it takes a short while (being performed by ovs-agent).

Since ovs-agent takes a while, we need to implement a startup script after it and before xendomains. Since both are markes “S99″ (check /etc/rc3.d/ for details), we would add a script called “sleep”.

The script should be placed in /etc/init.d/

#!/bin/bash
#
# sleep     Workaround Oracle VM delay issues
#
# chkconfig: 2345 99 99
# description: Adds a predefined delay to the initialization process
#
 
DELAY=60
 
case "$1" in
start) sleep $DELAY
;;
esac
exit 0

Place the script as a file called “sleep” (omit the suffix I added in this post), set it to be executable, and then run

chkconfig –add sleep

This will solve VM startup problems.

Fix /etc/hosts file

If you are into multi-server pool, you will need that the host name would not be defined to 127.0.0.1 address. By default, Oracle VM defines it to match 127.0.0.1, which will result in a poor attempt to create multi-server pool.

This is all I have had in mind for now. It should solve most new-comer issues with Oracle VM, and allow you to make good use of it. It’s a nice system, albeit it’s ugly management.

Update the OracleVM

You could use Oracle’s unbreakable network, if you are a paying customer, or you could use the Public Yum Server for your system.

Updates to Oracle VM Manager

If you won’t use Oracle Grid Control (Enterprise Manager) to manage the pool, you will probably use Oracle VM Manager. You would need to update the ovs-console package, and you will probably want to add tightvnc-java package, so that IE users will be able to use the web-based VNC services. You would better grub these packages from here.

Citrix XenServer and DR

Assuming your storage is capable of replication, a bunch of VMs could be happily replicated to an alternate location, where you can start them on will (and on crisis, most likely).

This procedure, in theory, is rather simple. I have discovered that it is less so, especially if your system goes into testing once a while.

This leaves some “memories” with Citrix XenServer, as well as the underlayer OS.

The forums show examples of how to handle iSCSI re-attach, so I am adding here hot to handle fiber channel re-attach operation as well, especially if you’re using multipath.

Grab the SCSI ID

The SCSI ID is very important, as it is he basic identified of the LUN:

xe sr-proble type=lvmohba 2>&1

This will show a section (XML-like) under “SCSIid” containing the device’s SCSI ID. Also – grab the UUID of the device, as XenServer assigns. You could do this with:

xe sr-probe type=lvmohba device-config:SCSIid=<Enter SCSI ID here>

Introduce the Device

The system needs the device introduced. For that we’ll use the above captured UUID:

xe sr-introduce uuid=<Enter the UUID obtained before> shared=true typb=lvmohba namb-label=”Imported SR”

Keep the UUID returned by this command. I will reference to it as “SR UUID”

Create PBD for each host in the pool

This should be performed for each host, based on the UUID of all hosts:

xe pbd-create sr-uuid=<Enter the SR UUID obtained before> device-config=SCSIid=<Enter SCSI ID> host-uuid=<Enter UUID of host>

The result value, for each host, is the pbd UUID for that specific host. pbd is a physical Block Device, meaning – the “connection” of the Storage Repository (which is pool-wide) to the specific host.

Plug PBDs

For each of the pbd UUID obtained above, run the following command to plug it in:

xe pbd-plug uuid=<pbd UUID as obtained above>

That should be all.

For ease of repeat, below is a script to perform these exact actions automatically. It assumes one (1) unallocated LUN at the beginning of the process. It will probably behave differently for more than one.

#!/bin/bash
SCSIID=`xe sr-probe type=lvmohba 2>&1 | grep -A1 SCSIid | grep -v SCSIid | grep -v vendor | tail -1 | awk '{print $NF}'`
echo "scsi ID: $SCSIID"
UUID=`xe sr-probe type=lvmohba device-config:SCSIid=$SCSIID | grep -A1 UUID | grep -v UUID | grep -v Devlist | head -1 | awk '{print $NF}'`
echo "SR UUID $UUID"
SRID=`xe sr-introduce uuid=$UUID name-label="Imported SR" shared=true type=lvmohba`
echo "SR UID: $SRID"
HOSTID=`xe host-list  | grep uuid | awk '{print $NF}'`
PBDID=""
for i in $HOSTID
do 
   PBDID="$PBDID `xe pbd-create sr-uuid=$SRID device-config:SCSIid=$SCSIID host-uuid=$i`"
done
for i in $PBDID
do
   xe pbd-plug uuid=$i
done