Archive for the ‘Virtualization’ Category

Quickly install Xen Community Linux VM

Saturday, December 5th, 2009

On RHEL-type of systems, with virt-manager (libvirt), you can make use of virt-manager to easy your life. I, for myself, prefer to work with ‘xm‘ tools, but for the initial install, virt-manager is the quickest and most simple available tool.

To install a new Linux VM, all you need to follow this flow

Create an LV for your VM (I use LVs because it’s easier to manage). If not LV, use a file. To create an LV, run the following command

lvcreate -L 10G -n new_vm1 VolGroup00

I assume that the name you wish to grant is ‘new_vm1′ (better maintain order there, else you will find yourself with hundreds of small LVs you have no idea what to do with), and that the name of the volume group is ‘VolGroup00′. Change to different values to match your environment.

Next, make sure you have your ISO contents unpacked (you can use loop device) and exported via NFS (my favorite method).

To mount a CD/DVD ISO, you should use ‘mount’ command with the ‘loop’ options. This would look like this:

mount -o loop my_iso.iso /mnt/temp

Again, I assume the name of the ISO is my_iso.iso and that the target directory /mnt/temp is available.

Now, export your newly created directory. If you have NFS already running, you can either add to /etc/exports the newly mounted directory /mnt/temp and restart the ‘nfs’ service, or you could use ‘exportfs’ to add it:

exportfs -o no_root_squash *:/mnt/temp

would probably do the trick. I added ‘no_root_squash’ to make sure no permission/access problems present themselves during the installation phase. Test your export to verify it’s working.

Now you could begin your installation. Run the following command:

virt-install -n new_vm1 -r 512 -p -f /dev/VolGroup00/new_vm1 –nographics nfs://nfs_server:/mnt/temp

The name follows the ‘-n’ flag. The amount of RAM to give is 512MB. The -p means it’s paravirtualized. The -f shows which device will be the block device, and the last argument is the source of the installation. Do not use local files, as the VM installer should be able to access the installation source.

Following that, you should have a very nice TUI installation experience.

Now – let’s make this machine ‘xm’ compatible.

Currently, the VM is virt-manager compatible. It means you need virt-manager to start/stop it correctly. Since I prefer ‘xm’ commands, I will show you how to convert this machine to VM.

First – export its XML file:

virsh dumpxml new_vm1 > /tmp/new_vm1.xml

virsh domxml-to-native xen-xm /tmp/new_vm1.xml > /etc/xen/new_vm1

This should do the trick.

Now you can turn the newly created VM off, and remove the VM from virt-manager using

virsh undefine new_vm1

and you’re back to ‘xm’-only interface.

XenServer “Internal error: Failure… no loader found”

Saturday, October 24th, 2009

It has been long since I had the time to write here. I have recently been involved more and more with XenServer virtualization, as you might see in the blogs, and following a solution to a rather common problem, I have decided to post it here.

The problem: When attempting to boot a Linux VM on XenServer (5.0 and 5.5), you get the following error message:

Error: Starting VM ‘Cacti’ – Internal error: Failure(“Error from xenguesthelper: caught exception: Failure(\\\”Subprocess failure: Failure(\\\\\\\”xc_dom_linux_build: [2] xc_dom_find_loader: no loader found\\\\\\\\n\\\\\\\”)\\\”)”)

This is very common with Linux VMs which were converted from physical (or other, non-PV virtualization) to XenServer.

This will probably either happen during the P2V process, or after a successful update to the Linux VM.

The cause is that the original kernel, non PV-aware one, has not been removed, and GRUB likes to load from it. XenServer will use the GRUB menu, but will not display it to us to select our desired kernel.

With no chance to intervene, XenServer will attempt to load a PV-enabled machine using non-PV kernel, and will fail.

Preventing the problem is quite simple – remove your non-PV kernel (non-xen) so that future updates will not attempt to update it as well and set it to be the default kernel. Very simple.

Solving the problem in less than two minutes is a bit more tricky. Let’s see how to solve it.

All operations are performed from within the control domain. This guide does not apply to StorageLink or NetApp/Equalogic devices, as they behave differently. This applies only to LVM-over-something, whatever it may be.

First, we will need to find the name of the VDI we are to work on. Use xe in the following manner, using the VM’s name:

xe vbd-list vm-name-label=Cacti

uuid ( RO)             : 128f29dc-4a14-1a2d-75d1-8674d3d2403b
vm-uuid ( RO): eae053de-4a20-28a5-f335-f5a18dd79993
vm-name-label ( RO): Cacti
vdi-uuid ( RO): 90524af4-5b20-4412-9bfe-f1fe27f220b1
empty ( RO): false
device ( RO): xvda

uuid ( RO)             : de177727-b28a-8b79-e73e-d08366d56277
vm-uuid ( RO): eae053de-4a20-28a5-f335-f5a18dd79993
vm-name-label ( RO): Cacti
vdi-uuid ( RO): <not in database>
empty ( RO): true
device ( RO): xvdd

It is very common that xvdd is used for CDROM, so we can safely ignore the second section. The first section is the more interesting one. There is a correlation between the name of the VDI and the name of the LVM on the disk. We can find this specific LV using the following command. Notice that the name of the VDI is used here as the argument for the ‘grep’ command:

lvs | grep 90524af4-5b20-4412-9bfe-f1fe27f220b1

LV-90524af4-5b20-4412-9bfe-f1fe27f220b1 VG_XenStorage-4aa20fc2-fd92-20c2-c549-bed2597c622b -wi-a-  10.00G

We now have our LV path! As you can see, its status is offline. We need to set it to online state. Using both the LV and the VG name, we can do it like that:

lvchange -ay /dev/VG_XenStorage-4aa20fc2-fd92-20c2-c549-bed2597c622b/LV-90524af4-5b20-4412-9bfe-f1fe27f220b1

Now we can access the volume. We can actually check that the problem is the one we look for, using pygrub:

pygrub /dev/VG_XenStorage-4aa20fc2-fd92-20c2-c549-bed2597c622b/LV-90524af4-5b20-4412-9bfe-f1fe27f220b1

We should now see the GRUB menu of the VM at question. If you don’t see any menu, either you have missed a step or used the wrong disk.

The menu should show you all the list of kernels. The default one is the one highlighted, and if it doesn’t include the word “xen” with it, most likely that we have found the problem.

We now need to change to a PV-capable kernel. We will need to access the “/boot” partition of the Linux VM, and change GRUB’s options there.

First we map the disk to a loop device, so we can access its partitions:

losetup /dev/loop1 /dev/VG_XenStorage-4aa20fc2-fd92-20c2-c549-bed2597c622b/LV-90524af4-5b20-4412-9bfe-f1fe27f220b1

Notice that you need to use the entire path to the LV, that the LV is online, and that loop1 is not in use. If it is, you will have a message saying something like “LOOP_SET_FD: Device or resource busy”

Now we need to access its partitions. We will map them using ‘kpartx’ to /dev/mapper/ devices. Notice we’re using the same loop device name:

kaprtx -a /dev/loop1

Now, new files present themselves in /dev/mapper:

ls -la /dev/mapper/
total 0
drwxr-xr-x  2 root root     220 Oct 24 12:39 .
drwxr-xr-x 14 root root   16560 Oct 24 12:31 ..
crw——-  1 root root  10, 62 Sep 29 10:15 control
brw-rw—-  1 root disk 252,  5 Oct 24 12:39 loop1p1
brw-rw—-  1 root disk 252,  6 Oct 24 12:39 loop1p2
brw-rw—-  1 root disk 252,  7 Oct 24 12:39 loop1p3

Usually, the first partition represents /boot, so we can now mount it and work on it:

mount /dev/mapper/loop1p1 /mnt

All we need to do is edit /mnt/grub/menu.lst to match our requirements, and then wrap everything back up:

umount /mnt

kpartx -u /dev/loop1

losetup -d /dev/loop1

We don’t have to change the LV to offline, because the XenServer will activate it if it’s not, however, we could do it, to be on the safe side:

lvchange -an /dev/VG_XenStorage-4aa20fc2-fd92-20c2-c549-bed2597c622b/LV-90524af4-5b20-4412-9bfe-f1fe27f220b1

Now we can activate the VM, and see it boot successfully.

This whole process takes several minutes the first time, and even less later.

I hope this helps.

Citrix XenServer 5.0 cannot cooperate with NetApp SnapMirror

Tuesday, September 8th, 2009

It has been a long while, I know. I was busy with life, work and everything around it. Not much worth mentioning.

This, however, is something else.

I have discovered an issue with Citrix XenServer 5.0 (probably the case with 5.5, but I have other issues with that release) using NetApp through NetApp API SR – Any non XenServer-generated snapshot will be deleted as soon as any snapshot-related action would be performed on that volume. Meaning that if I had manually created a snapshot called “1111″ (short and easy to recognize, especially with all these UUID-based volumes, LUNs and snapshot names XenServer uses…), the next time anyone would create a snapshot of a machine which has a disk (VDI) on this specific volume, the snapshot, my snapshot, “1111″ will be removed under that specific volume. The message seen in /var/log/SMlog would look like this:

Removing unused snap (1111)

While under normal operation, this does not matter much, as non-XenServer snapshots have little value, when using NetApp SnapMirror technology, the mechanism works a bit differently.

It appears that the SnapMirror system takes snapshots with predefined names (non-XenServer UUID type, luckily for us all). These snapshots include the entire changes performed since the last SnapMirror snapshots, and are used for replication. Unfortunately, XenServer deletes them. No SnapMirror snapshots, well, this is quite obvious, is it not? No SnapMirror…

We did not detect this problem immediately, and I should take the blame for that. I had to define a set of simple trial and error tests, as described above, instead of battling with a system I did not quite follow at that time – NetApp SnapMirror. Now I do, however, and I have this wonderful insight which can make your personal life, if you had issues with SnapMirror and XenServer, and did not know how to make it work, better. This solution cannot be an official one, due to its nature, which you will understand shortly. This is a personal patch for your pleasure, based on the hard fact that SnapMirror uses a predefined name for its snapshots. This name, in my case, is the name of the DR storage device. You must figure out what name is being used as part of the snapshot naming convention on your own site. Search for my ’storagedr’ phrase, and replace it with yours.

This is the diff file for /opt/xensource/sm/NETAPPSR.py . Of course – back up your original file. Also – this is not an official patch. It was tested to function correctly on XenServer 5.0, and it will not work on XenServer 5.5 (since NETAPPSR.py is different). Last warning – it might break on the next update or upgrade you have for your XenServer environment, and if that happens, you better monitor your SnapMirror status closely then.

400,403c400,404
<                     util.SMlog("Removing unused snap (%s)" % val)
<                     out = netapplib.fvol_snapdelete_wrapper(self.sv, val, volname)
<                     if not na_test_result(out):
<                         pass
---
> 		    if 'storagedr' not in val:
>                     	util.SMlog("Removing unused snap (%s)" % val)
>                     	out = netapplib.fvol_snapdelete_wrapper(self.sv, val, volname)
>                     	if not na_test_result(out):
>                         	pass

Hope it helps!

XenServer create snapshots for all machines

Friday, August 7th, 2009

XenServer is a wonderful tool. One of the better parts of it is its powerful scripting language, powered by the ‘xe’ command.

In order to capture a mass of snapshots, you can either do it manually from the GUI, or scripted. The script supplied below will include shell functions to capture Quiesce snapshots, and it that fails, normal snapshots of every running VM on the system.

Reason: NetApp SnapMirror, or other backup (maybe for later export) scheduled actions.

#!/bin/bash
# This script will supply functions for snapshotting and snapshot destroy including disks
# Written by Ez-Aton
# Visit my web blog for more stuff, at http://run.tournament.org.il
 
# Global variables:
UUID_LIST_FILE=/tmp/SNAP_UUIDS.txt
 
# Function
function assign_all_uuids () {
	# Construct artificial non-indexed list with name (removing annoying characters) and UUID
	LIST=""
	for UUID in `xe vm-list power-state=running is-control-domain=false | grep uuid | awk '{print $NF}'`
	do
		NAME=`xe vm-param-get param-name=name-label uuid=$UUID | tr ' ' _ | tr -d '(' | tr -d ')'`
		LIST="$LIST $NAME:$UUID"
	done
	echo $LIST
}
 
function take_snap_quiesce () {
	# We attempt to take a snapshot with quench
	# Arguments: $1 name ; $2 uuid
	# We attempt to snapshot the machine and set the value of snap_uuid to the snapshot uuid, if successful.
	# Return 1 if failed
 
	if SNAP_UUID=`xe vm-snapshot-with-quiesce vm=$2 new-name-label=${1}_snapshot`
	then
		# echo "Snapshot-with-quiesce for $1 successful"
		return 0
	else
		echo "Snapshot-with-quiesce for $1 failed"
		return 1
	fi
}
 
function take_snap () {
	# We attempt to take a snapshot
	# Arguments: $1 name ; $2 uuid
	# We attempt to snapshot the machine and set the value of snap_uuid to the snapshot uuid, if successful.
	# Return 1 if failed
 
	if SNAP_UUID=`xe vm-snapshot vm=$2 new-name-label=${1}_snapshot`
	then
		#echo "Snapshot for $1 successful"
		echo $SNAP_UUID
		return 0
	else
		echo "Snapshot-with-quiesce for $1 failed"
		return 1
	fi
}
 
function stop_ha_template () {
	# Templates inherit their settings from the origin
	# We need to turn off HA
	# $1 : Template UUID
	if [ -z "$1" ]
	then
		echo "Missing template UUID"
		return 1
	fi
	xe template-param-set ha-always-run=false uuid=$1
}
 
function get_vdi () {
	# This function will get a space delimited list of VDI UUIDs of a given snapshot/template UUID
	# Arguments: $1 template UUID
	# It will also verify that each VBD is an actual snapshot
	if [ -z "$1" ]
	then
		echo "No arguments? We need the template UUID"
		return 1
	fi
	VDIS=""
	for VBD in `xe vbd-list vm-uuid=$1 | grep ^uuid | awk '{print $NF}'`
	do
		echo "VBD: $VBD"
		if [ ! `xe vbd-param-get param-name=type uuid=$VBD` = "CD" ]
		then
			CUR_VDI=`xe vdi-list vbd-uuids=$VBD | grep ^uuid | awk '{print $NF}'`
			if `xe vdi-param-get uuid=$CUR_VDI param-name=is-a-snapshot`
			then
				VDIS="$VDIS $CUR_VDI"
			else
				echo "VDI is not a snapshot!"
				return 1
			fi
			CUR_VDI=""
		fi
	done
	echo $VDIS
}
 
function remove_vdi () {
	# This function will get a list of VDIs and remove them
	# Carefull!
	for VDI in $@
	do
		if xe vdi-destroy uuid=$VDI
		then
			echo "Success in removing VDI $VDI"
		else
			echo "Failure in removing VDI $VDI"
			return 1
		fi
	done
}
 
function remove_template () {
	# This funciton will remove a template
	# $1 template UUID
	if [ -z "$1" ]
	then
		echo "Required UUID"
		return 1
	fi
	xe template-param-set is-a-template=false uuid=$1
	if ! xe vm-uninstall force=true uuid=$1
	then
		echo "Failure to remove VM/Template"
		return 1
	fi
}
 
function remove_all_template () {
	# This function will completely remove a template
	# The steps are as follow:
	# $1 is the UUID of the template
	# Calculate its VDIs
	# Remove the template
	# Remove the VDIs
	if [ -z "$1" ]
	then
		echo "No Template UUID was supplied"
		return 1
	fi
	# We now collect the value of $VDIS
	get_vdi $1
	if [ "$?" -ne "0" ]
	then
		echo "Failed to get VDIs for Template $1"
		return 1
	fi
	if ! remove_template $1
	then
		echo "Failure to remove template $1"
		return 1
	fi
	if ! remove_vdi $VDIS
	then
		return 1
	fi
}
 
function create_all_snapshots () {
	# In this function we will run all over $LIST and create snapshots of each machine, keeping the UUID of it inside a file
	# $@ - list of machines in the $LIST format
	if [ -f $UUID_LIST_FILE ]
	then
		mv $UUID_LIST_FILE $UUID_LIST_FILE.$$
	fi
	for i in $@
	do
		SNAP_UUID=`take_snap_quiesce ${i%%:*} ${i##*:}`
		if [ "$?" -ne "0" ]
		then
			echo "Problem taking snapshot with quiesce for ${i%%:*}"
			echo "Attempting normal snapshot"
			SNAP_UUID=`take_snap ${i%%:*} ${i##*:}`
			if [ "$?" -ne "0" ]
                	then
                        	echo "Problem taking snapshot for ${i%%:*}"
				SNAP_UUID=""
			fi
		fi
		stop_ha_template $SNAP_UUID
		echo $SNAP_UUID >> $UUID_LIST_FILE
	done
}

Possible use will be like this:

. /usr/local/bin/xen_functions.sh

create_all_snapshots `assign_all_uuids` &> /tmp/snap_create.log

Xen guests cannot serv NFS requests

Tuesday, March 3rd, 2009

This sounds weird, but I have witnessed it today, and had to work rather hard to figure the cause of the problem.

When using ” Intel Corporation 82575EB Gigabit Network Connection (rev 02)” (as lspci reports), TCP offload causes problems.

Symptoms:

  • The host can communicate with the guest flawlessly (including HTTP get for larger than 2.6k files)
  • Other external hosts/guests report NFS timeout during mount attempt
  • Other external hosts/guests take a long while running “showmount -e” on the target guest
  • Pings work flawlessly
  • HTTP get from external nodes halts at about 2660 bytes, to which it reaches almost immediately
  • VLAN tagged interfaces on other than the default VLAN (1) do not experience these problems (cause – unknown to me at the moment).

The solution is simple – disable the offload from the NIC, and be happy. You could do it using the following line:

ethtool -K eth0 tx off

This should do the trick. It is required only on Dom0, and was tested to work well with my own method of configuring bonds and VLAN tags, as described in this post.

I was able to find the required hint in this post, under comment by Alejandro Anadon, and from there, directly to Xen’s FAQ site and to the  solution mentioned above.

Due to ETHTOOL_OPTS parameter limitations, I have placed (in an ugly manner, I know) the relevant ethtool commands in /etc/rc.local – in contradiction to this great article which shows the correct way of doing this action. Seems to be solved since.

Protect Vmware guest under RedHat Cluster

Monday, November 17th, 2008

Most documentation on the net is about how to run a cluster-in-a-box under Vmware. Very few seem to care about protecting Vmware guests under real RedHat cluster with a shared storage.

This article is just about it. While I would not recommend using Vmware in such a setup, it has been the case, and that Vmware guest actually resides on the shared storage. To relocate it is out of the question, so migrating it together with other resources is the only valid option.

To do so, I have created a simple script which will accept start/stop/status arguments. The Vmware guest VMX is hard-coded into the script, but in an easy-to-change format. This script will attempt to freeze the Vmware guest, and only if it fails, to shut it down. Mind you that the blog’s HTML formatting might alter quotation marks into UTF-8 marks which will not be understood well by shell.

#!/bin/bash
# This script will start/stop/status VMware machine
# Written by Ez-Aton
# http://www.tournament.org.il/run
 
# Hardcoded. Change to match your own settings!
VMWARE="/export/vmware/hosts/Windows_XP_Professional/Windows XP Professional.vmx"
VMRUN="/usr/bin/vmrun"
TIMEOUT=60
 
function status () {
  # This function will return success if the VM is up
  $VMRUN list | grep "$VMWARE" &amp;&gt;/dev/null
  if [[ "$?" -eq "0" ]]
  then
    echo "VM is up"
    return 0
  else
    echo "VM is down"
    return 1
  fi
}
 
function start () {
  # This function will start the VM
  $VMRUN start "$VMWARE"
  if [[ "$?" -eq "0" ]]
  then
    echo "VM is starting"
    return 0
  else
    echo "VM failed"
    return 1
  fi
}
 
function stop () {
  # This function will stop the VM
  $VMRUN suspend "$VMWARE"
  for i in `seq 1 $TIMEOUT`
  do
    if status
    then
      echo
    else
      echo "VM Stopped"
      return 0
    fi
    sleep 1
  done
  $VMRUN stop "$VMWARE" soft
}
 
case "$1" in
start)     start
        ;;
stop)      stop
        ;;
status)   status
        ;;
esac
RET=$?
 
exit $RET

Since the formatting is killed by the blog, you can find the script here: vmware1

I intend on building a “real” RedHat Cluster agent script, but this should do for the time being.

Enjoy!

Xen Networking – Bonding with VLAN Tagging

Thursday, October 23rd, 2008

The simple scripts in /etc/xen/scripts which manage networking are fine for most usages, however, when your server is using bonding together with VLAN tagging (802.11q) you should consider an alternative.

A PDF document written by Mark Nielsen, GPS Senior Consultant, Red Hat, Inc (I lost the original link, sorry) named “BOND/VLAN/Xen Network Configuration” as a service to the community, game me few insights on the subject. Following one of its references, I saw a bit more elegant method of doing a bridging setup under RedHat, which takes managing the bridges away from xend, and leaves it at the system level. Lets see how it’s done on RedHat style Linux distribution.

Manage your normal networking configurations

If you’re using VLAN tagging over bonding, than you should have to setup a bonding device (be it bond0) which has definitions such as this:

/etc/sysconfig/network-scripts/ifcfg-eth0 and /etc/sysconfig/network-scripts/ifcfg-eth1

DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
ISALIAS=no

/etc/sysconfig/network-scripts/ifcfg-bond0

DEVICE=bond0
BOOTPROTO=none
BONDING_OPTS=”mode=1 miimon=100″
ONBOOT=yes

This is rather stright-forward, and should be rather a default for such a setup. Now comes the more interesting part. Originally, the next configuration part would be bond0.2 and bond0.3 (in my example). The original configuration would have looked like this (this is in bold because I tend to fast-read myself, and tend to miss things too often). This is not how it should look when we’re done!

/etc/sysconfig/network-scripts/ifcfg-bond0.2 (same applies to ifcfg-bond0.3)

DEVICE=bond0.2
BOOTPROTO=static
IPADDR=192.168.0.2
NETMASK=255.255.255.0
ONBOOT=yes
VLAN=yes

Configure bridging

To setup a bridge device for bond0.2, replace the mentioned above ifcfg-bond0.2 with this new /etc/sysconfig/network-scripts/ifcfg-bond0.2

DEVICE=bond0.2
BOOTPROTO=static
ONBOOT=yes
VLAN=yes
BRIDGE=xenbr0

Now, create a new file /etc/sysconfig/network-scripts/ifcfg-xenbr0

DEVICE=xenbr0
BOOTPROTO=static
IPADDR=192.168.0.2
NETMASK=255.255.255.0
ONBOOT=yes
TYPE=bridge

Now, on network restart, the bridge will be brought up, holding the right IP address – all done by initscripts, with no Xen intervention. You will want to repeat the last the “Configure bridge” part for any additional bridge you want to be enabled for Xen machines.

Don’t let Xen bring any bridges up

This is the last part of our drill, and it is very important. If you don’t do it, you’ll get a nice networking mess. As said before, Xen (community), by default, can’t handle bondings or vlan tags, so it will attempt to create or modify bridges to eth0 or the likes. Edit /etc/xen/xend-config.sxp and remark any line containing a directive containing starting with “network-script“. Such a directive would be, for example

(network-script network-bridge)

Restart xend and restart networking. You should now be able to configure VMs to use xenbr0 and xenbr1, etc (according to your own personal settings).

Xen VMs performance collection

Saturday, October 18th, 2008

Unlike VMware Server, Xen’s HyperVisor does not allow an easy collection of performance information. The management machine, called “Domain-0″ is actually a privileged virtual machine, and thus – get its own small share of CPUs and RAM. Collecting performance information on it will lead to, well, collecting performance information for a single VM, and not the whole bunch.

Local tools, such as “xentop” allows collection of information, however, combining this with Cacti, or any other SNMP-based collection tool is a bit tricky.

A great solution is provided by Ian P. Christian in his blog post about Xen monitoring. He has created a Perl script to collect information. I have taken the liberty to fix several minor things with his permission. The modified scripts are presented below. Name the script (according to your version of Xen) “/usr/local/bin/xen_stats.pl” and set it to be executable:

For Xen 3.1

?Download xen_stats.pl
#!/usr/bin/perl -w
 
use strict;
 
# declare...
sub trim($);
#<a href="/blog/files/xen_cloud.tar.gz" title="xen_cloud.tar.gz" target="_blank">xen_cloud.tar.gz</a>
# we need to run 2 iterations because CPU stats show 0% on the first, and I'm putting .1 second betwen them to speed it up
my @result = split(/\n/, `xentop -b -i 2 -d.1`);
 
# remove the first line
shift(@result);
 
shift(@result) while @result &amp;&amp; $result[0] !~ /^xentop - /;
 
# the next 3 lines are headings..
shift(@result);
shift(@result);
shift(@result);
shift(@result);
 
foreach my $line (@result)
{
  my @xenInfo = split(/[\t ]+/, trim($line));
  printf("name: %s, cpu_sec: %d, cpu_percent: %.2f, vbd_rd: %d, vbd_wr: %d\n",
    $xenInfo[0],
    $xenInfo[2],
    $xenInfo[3],
    $xenInfo[14],
    $xenInfo[15]
    );
}
 
# trims leading and trailing whitespace
sub trim($)
{
  my $string = shift;
  $string =~ s/^\s+//;
  $string =~ s/\s+$//;
  return $string;
}

For Xen 3.2 and Xen 3.3

?Download xen_stats.pl
#!/usr/bin/perl -w
 
use strict;
 
# declare…
sub trim($);
 
# we need to run 2 iterations because CPU stats show 0% on the first, and I’m putting .1 second between them to speed it up
my @result = split(/\n/, `/usr/sbin/xentop -b -i 2 -d.1`);
 
# remove the first line
shift(@result);
shift(@result) while @result &amp;&amp; $result[0] !~ /^[\t ]+NAME/;
shift(@result);
 
foreach my $line (@result)
{
        my @xenInfo = split(/[\t ]+/, trim($line));
        printf(“name: %s, cpu_sec: %d, cpu_percent: %.2f, vbd_rd: %d, vbd_wr: %d\n,
        $xenInfo[0],
        $xenInfo[2],
        $xenInfo[3],
        $xenInfo[14],
        $xenInfo[15]
        );
}
# trims leading and trailing whitespace
sub trim($)
{
        my $string = shift;
        $string =~ s/^\s+//;
        $string =~ s/\s+$//;
        return $string;
}

Cron settings for Domain-0

Create a file “/etc/cron.d/xenstat” with the following contents:

# This will run xen_stats.pl every minute
*/1 * * * * root /usr/local/bin/xen_stats.pl > /tmp/xen-stats.new && cat /tmp/xen-stats.new > /var/run/xen-stats

SNMP settings for Domain-0

Add the line below to “/etc/snmp/snmpd.conf” and then restart the snmpd service

extend xen-stats   /bin/cat /var/run/xen-stats

Cacti

I reduced Ian Cacti script to be based on a per-server setup, meaning this script gets the host (dom-0) name from Cacti, but cannot support live migrations. I will try to deal with combining live migrations with Cacti in the future.

Download and extract my modified xen_cloud.tar.gz file. Extract it, place the script and config in its relevant location, and import the template into Cacti. It should work like charm.

A note – the PHP script will work only on PHP5 and above. Works flawlessly on Centos5.2 for me.

x86 Scale Up

Thursday, September 11th, 2008

I have been introduced to a very cool software/hardware combination yesterday. It has been, without exaggerating, one of the coolest things I have seen in a while.

As you may know, x86 has an issue with scaling up. It’s that x86 architectures and price don’t justify scaling up to tenths and hundreds of CPUs. The multi-core technology introduced in the last few years made a four-way server seem trivial today, where in the past it was a high-performance server for large (and expensive) data centers. It is very common today to purchase an eight-way server at a price of a mere commodity server – all thanks to the multi-core technology.

However, when compared to the large Unix data centers, where 64 and 128 cpus are rather common (I will emphasis – the large Unix data centers), although nowadays, per-core, x86 is somewhat more powerful, for a large load set, it could not rival any many-way server. The common solution with x86 was to “scale out” – add more cheap servers and manage the workload in a more distributed way. Yes, you might pay with communication overhead, however, this can be made cheaper still.

With a distributed load sharing came the illnesses of communication latencies. Myrinet, 10Gb/s Ethernet and Infiniband were a common, yet expensive (as it was a niche market) solutions, and still – for distribution of high loads, they were well worth it. Still – a large scaled-up server based on x86 was nowhere to find.

No more. With ScaleMP’s concatenation you can “bundle” a set of servers using Infiniband link into a single huge-multi-way, huge-ram server at a very low cost, relatively.

Think about how you can purchase your current server, for example, your eight-core server (two quad-core cpus), and in time, scale it up into more powerful server (add another two quad-core cpus), or add more RAM, or more network interfaces, or whatever.

This is not as fast as the IBM x3950 board-link (excuse me for not knowing the exact name), so it is not ideal for databases or systems which tend to create a lot of cache-misses, however for large (actually – very large) SMP systems, it could be great. It can allow any company which feels that the current server might not be enough the safety and assurance that they can actually scale up, using the same server, into adding more cpus and more RAM to the server at any time.

I is supported, as far as I know, only for Linux at the time being. It diminishes some of the distance between the large Unix machines and the modern Linux, for a fracture of the price.

I liked it.

Combining Xen and VMware-Server on the same Physical server

Saturday, February 23rd, 2008

Doesn’t work. It will work fine up to the step where you actually try to active one of the VMware virtual machines. And then your kernel will panic.

Works fine without Xen kernel (but without Xen, of course). Pity.

Was tested on Centos5.1 64bit.