Posts Tagged ‘performance’

Connecting EMC/NetApp shelves as JBOD to a Linux machine

Wednesday, April 29th, 2015

Let’s say you have old EMC or NetApp shelves with SAS or SATA disks in them, and you want to connect them via FC to a Linux machine to build a nice ZFS machine/cluster, or whatever else. There are a few things to know and attend to in order for it to work.

The first is the sector size. For NetApp this applies only to non-SATA disks (I don’t know about SSDs, though), and for EMC it applies, as far as I noticed, to all disks: the sector size is not 512 bytes but 520, with the additional 8 bytes used for a block checksum. Linux does not handle 520-byte sectors well, and the following error message will appear in the logs:

Unsupported sector size 520.

To solve it, we need to identify the disks using sg3_utils (on CentOS-like systems: yum install sg3_utils) and then reformat them to a block size of 512 bytes. To identify the disks, run:

sg_scan -i
/dev/sg0: scsi0 channel=3 id=0 lun=0
HP P410i 3.66 [rmb=0 cmdq=1 pqual=0 pdev=0xc]
/dev/sg1: scsi0 channel=0 id=0 lun=0
HP LOGICAL VOLUME 3.66 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg2: scsi3 channel=0 id=0 lun=0 [em]
hp DVD A DS8A5LH 1HE3 [rmb=1 cmdq=0 pqual=0 pdev=0x5]
/dev/sg3: scsi1 channel=0 id=0 lun=0
SEAGATE SX3500071FC DA04 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg4: scsi1 channel=0 id=1 lun=0
SEAGATE SX3500071FC DA04 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg5: scsi1 channel=0 id=2 lun=0
SEAGATE SX3500071FC DA04 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg6: scsi1 channel=0 id=3 lun=0
SEAGATE SX3500071FC DA04 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg7: scsi1 channel=0 id=4 lun=0
SEAGATE SX3500071FC DA04 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg8: scsi1 channel=0 id=5 lun=0
SEAGATE SX3500071FC DA04 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg9: scsi1 channel=0 id=6 lun=0
SEAGATE SX3500071FC DA04 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg10: scsi1 channel=0 id=7 lun=0
SEAGATE SX3500071FC DA04 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg11: scsi1 channel=0 id=8 lun=0
FUJITSU MXW3300FE 0906 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg12: scsi1 channel=0 id=9 lun=0
FUJITSU MXW3300FE 0906 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg13: scsi1 channel=0 id=10 lun=0
SEAGATE SX3300007FC D41B [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg14: scsi1 channel=0 id=11 lun=0
SEAGATE SX3300007FC D41B [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg15: scsi1 channel=0 id=12 lun=0
SEAGATE SX3300007FC D41B [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg16: scsi1 channel=0 id=13 lun=0
SEAGATE SX3300007FC D41B [rmb=0 cmdq=1 pqual=0 pdev=0x0]
/dev/sg17: scsi1 channel=0 id=14 lun=0
SEAGATE SX3300007FC D41B [rmb=0 cmdq=1 pqual=0 pdev=0x0]

So, for each sg device (member of our batch of disks) we need to modify the sector size.

There are two ways to do so. The first, suggested by this post here, is to use sg_format in the following manner:

sg_format --format --size=512 /dev/sg2

Another post suggested using a dedicated program called ‘setblocksize’. I followed this one, and it worked fine. I had to power-cycle the disks before Linux could use them.
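
If you have many disks to convert, a small loop over the sg devices saves time. This is a minimal sketch (not from my original notes), assuming the shelf disks are the SEAGATE/FUJITSU FC drives shown above; adjust the grep pattern to your hardware, and remember that sg_format is destructive and can take a long time per disk:

for dev in $(sg_scan -i | awk -F: '/^\/dev\/sg/ {print $1}') ; do
    # Only touch the shelf disks - skip the local controller, logical volume and DVD
    sg_inq "$dev" | grep -Eq 'SEAGATE|FUJITSU' || continue
    echo "Reformatting $dev to 512-byte sectors"
    sg_format --format --size=512 "$dev"
done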

I did notice that disk performance was not great: about 45MB/s write and about 65-70MB/s read for large sequential operations, using something like:

dd if=/dev/sdf of=/dev/null bs=1M count=10000
dd if=/dev/zero of=/dev/sdf bs=1M oflag=direct count=10000 # WARNING - this writes to the disk. Do not use on disks with data!

Fairly disappointing. Also, using multipath, with the shelf connected to one FC port and looped back to another, I saw that with the setting:

path_grouping_policy multibus

I got about 10MB/s less than with the “failover” policy (the default for CentOS 6). Whatever modification I made to multipath.conf, I was unable to exceed this number when using multiple active paths. The results were consistently lower with multibus or group_by_serial; with a single active path and the other passive, throughput was clearly better. I also modified rr_min_io and rr_min_io_rq, with no effect.
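
For reference, the setting lives in /etc/multipath.conf. A minimal sketch of the failover configuration that performed best here (blacklist and device-specific sections omitted):

defaults {
    path_grouping_policy    failover
}

After changing it, reload or restart multipathd and check ‘multipath -ll’ to see how the path groups were built.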

The low disk performance could suggest I need to re-flash the disks’ original (OEM) firmware; however, I am not sure I will do so. If anyone reading this has had different results, I would love to hear about it.

XenServer 6.5 PCI-Passthrough

Thursday, April 16th, 2015

While searching the web for how to perform PCI-Passthrough on XenServer, you mostly get information about previous versions. Since I have just completed setting up PCI-Passthrough on XenServer version 6.5 (with the recent update 8, just to give you some notion of the exact time frame), I am sharing it here.

Hardware: Cisco UCS blades with fNIC. I wish to pass two FC HBAs through to a VM (it is going to act as a backup server, and I need it to access the FC tape). While all the other XenServers in this pool have four (4) FC HBAs, this particular XenServer node has six (6). I intend the first four for SR communication and the remaining two for PCI passthrough.

This is the output of ‘lspci | grep Fibre’:

0b:00.0 Fibre Channel: Cisco Systems Inc VIC FCoE HBA (rev a2)
0c:00.0 Fibre Channel: Cisco Systems Inc VIC FCoE HBA (rev a2)
0d:00.0 Fibre Channel: Cisco Systems Inc VIC FCoE HBA (rev a2)
0e:00.0 Fibre Channel: Cisco Systems Inc VIC FCoE HBA (rev a2)
0f:00.0 Fibre Channel: Cisco Systems Inc VIC FCoE HBA (rev a2)
10:00.0 Fibre Channel: Cisco Systems Inc VIC FCoE HBA (rev a2)

So, I want to pass through 0f:00.0 and 10:00.0. I had to add to /boot/extlinux.conf the following two entries after the word ‘splash’ and before the three dashes:

pciback.hide=(0f:00.0)(10:00.0) xen-pciback.hide=(0f:00.0)(10:00.0)
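
For illustration only, the resulting kernel line in /boot/extlinux.conf would look roughly like this; the placeholders stand for whatever options your installation already has, and the important part is the position of the two pciback entries after ‘splash’ and before the trailing three dashes:

append /boot/xen.gz <existing Xen options> --- /boot/vmlinuz-<version> <existing kernel options> splash pciback.hide=(0f:00.0)(10:00.0) xen-pciback.hide=(0f:00.0)(10:00.0) --- /boot/initrd-<version>.img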

Initially, and contrary to the documentation, the parameter pciback.hide had no effect. As soon as the VM started, the command ‘multipath -l’ would hang forever (or until a hard reset of the host).

To apply the settings above, run ‘extlinux -i /boot’ (for good measure; I don’t think it is strictly required, but I did not read anything definitive about it) and then reboot.

Now, when the host is back, we need to add the devices to the VM. Make sure that the VM is in ‘off’ state before doing that. Your command would look like this:

xe vm-param-set uuid=<VM UUID> other-config:pci=0/0000:0f:00.0,0/0000:10:00.0

The expression ‘0/0000’ is required. You can search for its exact purpose; in most cases, however, your value will look exactly like mine: ‘0/0000’.
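
To verify that the map was set, you can simply read it back (the UUID placeholder is, of course, your VM’s):

xe vm-param-get uuid=<VM UUID> param-name=other-config

The pci key should appear in the output with the two devices listed.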

Since my VM is Windows, this is almost the end of it: start the VM, and if it boots correctly, install the Cisco VIC drivers in it, as you would on a physical host. You’re done.

XenServer 6.0 with DRBD

Wednesday, January 18th, 2012

DRBD is a low-cost, shared-SAN-like solution with several great benefits, among them no single point of failure and very low cost (local storage and a network cable). Its main disadvantages are the need to constantly monitor it and make sure it does what’s expected, and that in some cases performance can be affected greatly.

If you need a XenServer pool with VM XenMotion (it used to be called LiveMigration; I liked that name better…), but you cannot afford or do not want classic shared storage acting as a single point of failure, DRBD could be for you. You have to understand the limitations, however.

The most important limitation is data consistency. If you aim to use it as Active/Active, as I have, you need to make sure that under no circumstance will you get a split brain, as it means losing data (you will recover to an older point in time). If you aim at Active/Passive, or if all your VMs will run on a single host, the danger is lower; for A/A with VMs spread across both hosts, however, the danger is imminent, and you should be aware of it.

This does not mean you will have to run away crying in case of a split brain. It does mean you might be required to export/import VMs to get back to consistent data, and that you will have a very long downtime. Kinda defies the purpose of XenMotion and all…
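
For the record, recovering manually from a DRBD 8.3 split brain boils down to picking a “victim” node and discarding its changes. A minimal sketch, assuming the resource name drbd-sr1 used later in this post (run the first two commands on the node whose data you are willing to lose, the last one on the survivor if it is in StandAlone mode):

drbdadm secondary drbd-sr1                    # on the victim: demote it first
drbdadm -- --discard-my-data connect drbd-sr1 # on the victim: reconnect, discarding its changes
drbdadm connect drbd-sr1                      # on the survivor, if it went StandAlone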

The DRBD guide here is an excellent solution, but not a complete one. I will describe my additions to this document.

So, first, you need to download the DRBD packages. I have re-packaged them, as the stock ones did not match XenServer with the XS60E003 update. You can grab this particular tar.gz here: drbd-8.3.12-xenserver6.0-xs003.tar.gz. I did not use DRBD 8.4.1, as it showed great instability and liked getting split-brained all the time. We don’t want that in our system, do we?

Make sure you have defined the private link between your hosts, both as a network interface, as described, and in both servers’ /etc/hosts files. It will make things easier later. Verify that each host’s hostname matches the configuration file, or DRBD will not start.

Next, follow the mentioned guide.

Unlike that guide, I did not define DRBD as Active/Active in the configuration file. I noticed that upon reboot of the pool master (and always the pool master), probably due to timing issues where the XE Toolstack did not release the DRBD device, it would start in split-brain mode, and I was unable to handle it correctly. No matter how early I tried to set the service to start, it always came up in split-brain mode.

The workaround was to let it start in secondary (passive) mode; while it is a read-only device, the XE Toolstack cannot use it. Then I wait (in /etc/rc.local) for it to complete its sync, promote it, and plug the PBD.

You will need each host’s PBD UUID for this specific SR.

You can do it by running:

for i in $(xe host-list --minimal | tr ',' ' ') ; do \
echo -n "host `xe host-param-get param-name=hostname uuid=$i`  "
echo "PBD `xe pbd-list sr-uuid=$(xe sr-list name-label=drbd-sr1 --minimal) host-uuid=$i --minimal`"
done

This will result in a line per host with the DRBD PBD uuid. Replace drbd-sr1 with your actual DRBD SR name.

You will require this info later.

My drbd.conf file looks like this:

# You can find an example in  /usr/share/doc/drbd.../drbd.conf.example
 
#include "drbd.d/global_common.conf";
#include "drbd.d/*.res";
 
resource drbd-sr1 {
  protocol C;
  startup {
    degr-wfc-timeout 120; # 2 minutes.
    outdated-wfc-timeout 2; # 2 seconds.
    #become-primary-on both;
  }

  handlers {
    split-brain "/usr/lib/drbd/notify.sh root";
  }

  disk {
    max-bio-bvecs 1;
    no-md-flushes;
    no-disk-flushes;
    no-disk-barrier;
  }

  net {
    allow-two-primaries;
    cram-hmac-alg "sha1";
    shared-secret "Secr3T";
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-1pri consensus;
    after-sb-2pri disconnect;
    #after-sb-2pri call-pri-lost-after-sb;
    max-buffers 8192;
    max-epoch-size 8192;
    sndbuf-size 1024k;
  }

  syncer {
    rate 1G;
    al-extents 2099;
  }

  on xenserver1 {
    device /dev/drbd1;
    disk /dev/sda3;
    address 10.1.1.1:7789;
    meta-disk internal;
  }
  on xenserver2 {
    device /dev/drbd1;
    disk /dev/sda3;
    address 10.1.1.2:7789;
    meta-disk internal;
  }
}

I did not force them both to become primary, as split-brain handling in A/A mode is very complex; instead, I forced them to start as secondary.

Then, in /etc/rc.local, I have added the following lines:

echo 1 > /sys/devices/system/cpu/cpu1/online
while grep sync /proc/drbd > /dev/null 2>&1
do
        sleep 5
done
/sbin/drbdadm primary all
/opt/xensource/bin/xe pbd-plug uuid=dfb02709-2483-a11a-cb0e-eac0fb05d8e2

This performs the following:

  • Adds an additional core to Domain 0, to reduce the chance of CPU overload caused by DRBD
  • Waits for any sync to complete (if DRBD failed, it will continue, but you will have a split brain, or no DRBD at all)
  • Brings the DRBD device to primary mode. I had only one DRBD device, but this can be performed selectively for each device
  • Plugs the PBD which, until this point in the boot sequence, was unplugged. An important note: replace the uuid with the one discovered above for each host; each host should plug its own PBD.

To sum it up – until sync has been completed, the PBD will not be plugged, and until then, no VMs can run on this SR. Split brain handling for A/P configuration is so much easier.

Some additional notes:

  • I failed horribly when the interconnect cable was down. I did not implement a hardware fencing mechanism, but it would probably be very good practice for production systems. Disconnecting the cross cable will result in a split brain.
  • For this system to be worthwhile, it has to have external monitoring. DRBD must be monitored at all times.
  • Practice and document cases of single node failure, both nodes failing, pool master failure, etc. Make sure you know how to react before it happens in real life.
  • Performance, measured from a RHEL6 Linux VM, was about 82MB/s. The hardware it was tested on was a Dell PE R610 with a very nice RAID5 array, etc. When the 2nd host was down, performance rose to about 450MB/s, so the replication-link bandwidth, in this particular case, matters.
  • Performance test was done using the command:
    dd if=/dev/zero bs=1M of=/tmp/test_file.dd oflag=direct
    Without oflag=direct, the test would load the OS write cache rather than the disk itself (at least, not immediately).
  • I did not test random-access performance.
Hope it helps

Xen VMs performance collection

Saturday, October 18th, 2008

Unlike VMware Server, Xen’s hypervisor does not allow easy collection of performance information. The management machine, called “Domain-0”, is actually a privileged virtual machine, and thus gets its own small share of CPUs and RAM. Collecting performance information on it will lead to, well, performance information for a single VM, and not the whole bunch.

Local tools such as “xentop” allow collection of this information; however, combining it with Cacti, or any other SNMP-based collection tool, is a bit tricky.

A great solution is provided by Ian P. Christian in his blog post about Xen monitoring. He has created a Perl script to collect the information. I have taken the liberty of fixing several minor things, with his permission. The modified scripts are presented below. Name the script (pick the one matching your version of Xen) “/usr/local/bin/xen_stats.pl” and set it to be executable:

For Xen 3.1

#!/usr/bin/perl -w
 
use strict;
 
# declare...
sub trim($);
# we need to run 2 iterations because CPU stats show 0% on the first, and I'm putting .1 second between them to speed it up
my @result = split(/\n/, `xentop -b -i 2 -d.1`);
 
# remove the first line
shift(@result);
 
shift(@result) while @result && $result[0] !~ /^xentop - /;
 
# the next lines are the status and heading lines..
shift(@result);
shift(@result);
shift(@result);
shift(@result);
 
foreach my $line (@result)
{
  my @xenInfo = split(/[\t ]+/, trim($line));
  printf("name: %s, cpu_sec: %d, cpu_percent: %.2f, vbd_rd: %d, vbd_wr: %d\n",
    $xenInfo[0],
    $xenInfo[2],
    $xenInfo[3],
    $xenInfo[14],
    $xenInfo[15]
    );
}
 
# trims leading and trailing whitespace
sub trim($)
{
  my $string = shift;
  $string =~ s/^\s+//;
  $string =~ s/\s+$//;
  return $string;
}

For Xen 3.2 and Xen 3.3

#!/usr/bin/perl -w
 
use strict;
 
# declare…
sub trim($);
 
# we need to run 2 iterations because CPU stats show 0% on the first, and I’m putting .1 second between them to speed it up
my @result = split(/\n/, `/usr/sbin/xentop -b -i 2 -d.1`);
 
# remove the first line
shift(@result);
shift(@result) while @result && $result[0] !~ /^[\t ]+NAME/;
shift(@result);
 
foreach my $line (@result)
{
        my @xenInfo = split(/[\t ]+/, trim($line));
        printf("name: %s, cpu_sec: %d, cpu_percent: %.2f, vbd_rd: %d, vbd_wr: %d\n",
        $xenInfo[0],
        $xenInfo[2],
        $xenInfo[3],
        $xenInfo[14],
        $xenInfo[15]
        );
}
# trims leading and trailing whitespace
sub trim($)
{
        my $string = shift;
        $string =~ s/^\s+//;
        $string =~ s/\s+$//;
        return $string;
}

Cron settings for Domain-0

Create a file “/etc/cron.d/xenstat” with the following contents:

# This will run xen_stats.pl every minute
*/1 * * * * root /usr/local/bin/xen_stats.pl > /tmp/xen-stats.new && cat /tmp/xen-stats.new > /var/run/xen-stats

SNMP settings for Domain-0

Add the line below to “/etc/snmp/snmpd.conf” and then restart the snmpd service

extend xen-stats   /bin/cat /var/run/xen-stats
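
From the monitoring host you can then verify that the data is exposed, with something along these lines (COMMUNITY and dom0-hostname are placeholders; the symbolic name requires the NET-SNMP-EXTEND-MIB to be available on the querying machine):

snmpwalk -v2c -c COMMUNITY dom0-hostname NET-SNMP-EXTEND-MIB::nsExtendOutputFull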

Cacti

I reduced Ian’s Cacti script to a per-server setup, meaning the script gets the host (dom-0) name from Cacti but cannot handle live migrations. I will try to deal with combining live migrations with Cacti in the future.

Download my modified xen_cloud.tar.gz file. Extract it, place the script and config in their relevant locations, and import the template into Cacti. It should work like a charm.

A note: the PHP script will work only on PHP5 and above. It works flawlessly on CentOS 5.2 for me.

Graphing on-demand Linux system performance parameters

Tuesday, May 20th, 2008

Current servers are way more powerful than we could have imagined before. With quad-core CPUs, even simple dual-socket servers contain lots of horsepower. Remember our attitude towards CPU power five years ago, and you will see that we’re way beyond our needs.

When modern servers are equipped with at least eight cores, other, non-CPU-related issues become noticeable. Storage, as always, remains a common bottleneck, and, as an increase in expectations always accompanies an increase in abilities, memory and other elements can be the cause of performance degradation.

‘sar’ is a well-known tool for Linux and other Unix flavors; however, understanding its output is not trivial, and while the data is there, figuring out what is relevant to the issue at hand becomes even more complicated as the numbers of disk devices and CPUs grow.

kSar is a simple Java utility which turns this whole mess into simple, readable graphs, which can be exported to PDF for the pleasure of the customers (where applicable). It parses existing sar output, or the extracted contents of ‘sa’ files (from, by default, /var/log/sa/). It is a useful tool, and I recommend it with all my heart.

Alas, when it comes to parsing ‘sa’ files, in most cases you will need either to export the file to text on the source machine or to use a matching version of the sysstat tools, as changes in version mean changes in the binary format used by sar.
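
For example, dumping a binary ‘sa’ file to text on the source machine usually looks like this (forcing a C locale helps kSar parse the output):

LC_ALL=C sar -A -f /var/log/sa/sa15 > sar15.txt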

You can obtain the sysstat utils from here and compile them for your needs. You will need only ‘sar’ on your own machine.

An important note: you will not be able to compile the sysstat utils using GCC 4.x; only 3.x will do. The error will look like:

warning: ‘packed’ attribute ignored for field of type `unsigned char…

followed by compilation errors. Using GCC version 3.x will work just fine.

Some few small insights

Wednesday, March 12th, 2008

Lately I have been overloaded beyond my capacity. This did not prevent me from doing all kinds of things, but most of them are too small to justify a real entry here, so I have decided to make a small collection of small stuff someone might need to know, so that it gets indexed by search engines. These small insights might save someone some time. This is a noble cause.

1. Oreon is a nice overlay for Nagios; however, it is poorly documented, and some of the existing docs are in French. I have put hours into getting it into a working setup, and I hope to be able to write the process down.

2. “Sun Java System Active Server Pages” does not support 64bit Linux installations – at least not if you’re interested in using it with your existing Apache server. Look here. Seems nothing has changed.

3. Under Ubuntu 7.10, Compiz suffers from a major memory leak when using NVidia display adapters. You can read about it in the bug page. I was able, thanks to this link, to work around it using compiz --indirect-rendering. It does not seem to cause any ill effect on my display performance.

4. Suse 10 and wireless cards – This one is a great guide, which I would happily recommend.

5. Flushing the existing read cache (page cache) on your Linux machine (which should never be done, unless you’re testing performance) can be done by running the following command:

echo 1 > /proc/sys/vm/drop_caches
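
As a side note, a value of 3 drops dentries and inodes as well as the page cache, and it is wise to sync first so that dirty pages do not skew the numbers:

sync
echo 3 > /proc/sys/vm/drop_caches   # 1 = page cache, 2 = dentries and inodes, 3 = both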

Seems to be enough for today. Hope these tips help.

Quick provisioning of virtual machines

Friday, February 1st, 2008

When one wants to achieve fast provisioning of virtual machines, several solutions come into consideration. The one I prefer uses Linux LVM snapshot capabilities to duplicate one working machine into several.

This can happen, of course, only if the host running VMware-Server is Linux.

LVM snapshots have one vast disadvantage: performance. When a block on the source of the snapshot is changed for the first time, the original block is replicated to the copy-on-write (COW) space of each and every snapshot. This means that creating a 1GB file on a volume with ten snapshots copies a total of 10GB of data across your disks. You cannot ignore this performance impact.

LVM2 has support for read/write snapshots, and I have come up with a nice way of using this capability to my benefit. An R/W snapshot which is changed does not replicate its changes to any other snapshot. All changes are local to that snapshot and are maintained only in its COW space, so adding a 1GB file to a snapshot has zero impact on the rest of the snapshots or volumes.
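
To illustrate, creating and mounting such a writable snapshot takes only a couple of commands. This is a sketch using a hypothetical ‘centos-test’ target, following the naming used by the scripts below; the size is arbitrary:

lvcreate -s -n centos-test -L 6G /dev/VGVM3/centos-base   # writable snapshot of the baseline LV
mkdir /vmware/centos-test
mount /dev/VGVM3/centos-test /vmware/centos-test          # the snapshot carries the baseline's filesystem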

The idea is quite simple, and it works like this:

1. Create an adequate logical volume of a given size (I used 9GB for my own purposes). The name of the LV in my case will be /dev/VGVM3/centos-base

2. Create a filesystem on this LV, mount it on a directory, and create a VM inside it. In my case, it’s mounted at /vmware/centos-base

3. Install the VM as the baseline for all your future VMs. If you might not want Apache on some of them, don’t install it on the baseline.

4. Install vmware-tools on the baseline.

5. Disable the service “kudzu”

6. Update as required

7. In my case I always use DHCP. You can set it to obtain its IP once from a given location, or whatever you feel like.

8. Shut down the VM.

9. In the VM’s .vmx file add a line like this:

uuid.action = "create"

I have added below two scripts which will create the snapshot, mount it and register the VM, including a new MAC and UUID.


create-replica.sh:

#!/bin/sh
# This script will replicate vms from a given (predefined) source to a new system
# Written by Ez-Aton, http://www.tournament.org.il/run
# Arguments: name

# FUNCTIONS BE HERE
test_can_do () {
# To be able to snapshot, we need a set of things to happen
if [ -d $DIR/$TARGET ] ; then
echo "Directory already exists. You don't want to do it..."
exit 1
fi
if [ -e $VG/$TARGET ] ; then
echo "Target snapshot exists"
exit 1
fi
if [ `vmrun list | grep -c $DIR/$SRC/$SRC.vmx` -gt "0" ] ; then
echo "Source VM is still running. Shut it down before proceeding"
exit 1
fi
if [ `vmware-cmd -l | grep -c $DIR/$TARGET/$SRC.vmx` -ne "0" ] ; then
echo "VM already registered. Unregister first"
exit 1
fi
}

do_snapshot () {
# Take the snapshot
lvcreate -s -n $TARGET -L $SNAPSIZE $VG/$SRC
RET=$?
if [ "$RET" -ne "0" ]; then
echo "Failed to create snapshot"
exit 1
fi
}

mount_snapshot () {
# This function creates the required directories and mounts the snapshot there
mkdir $DIR/$TARGET
mount $VG/$TARGET $DIR/$TARGET
RET=$?
if [ "$RET" -ne "0" ]; then
echo "Failed to mount snapshot"
exit 1
fi
}

alter_snap_vmx () {
# This function will alter the name in the VMX and make it the $TARGET name
cat $DIR/$TARGET/$SRC.vmx | grep -v "displayName" > $DIR/$TARGET/$TARGET.vmx
echo "displayName = \"$TARGET\"" >> $DIR/$TARGET/$TARGET.vmx
cat $DIR/$TARGET/$TARGET.vmx > $DIR/$TARGET/$SRC.vmx
\rm $DIR/$TARGET/$TARGET.vmx
}

register_vm () {
# This function will register the VM to VMware
vmware-cmd -s register $DIR/$TARGET/$SRC.vmx
}

# MAIN
if [ -z "$1" ]; then
echo "Arguments: The target name"
exit 1
fi

# Parameters:
SRC=centos-base         #The name of the source image, and the source dir
PREFIX=centos           #All targets will be created in the name centos-$NAME
DIR=/vmware             #My VMware VMs default dir
SNAPSIZE=6G             #My COW space
VG=/dev/VGVM3           #The name of the VG
TARGET="$PREFIX-$1"

test_can_do
do_snapshot
mount_snapshot
alter_snap_vmx
register_vm
exit 0

remove-replica.sh:

#!/bin/sh
# This script will remove a snapshot machine
# Written by Ez-Aton, http://www.tournament.org.il/run
# Arguments: machine name

#FUNCTIONS
does_it_exist () {
# Check if the described VM exists
if [ `vmware-cmd -l | grep -c $DIR/$TARGET/$SRC.vmx` -eq "0" ]; then
echo "No such VM"
exit 1
fi
if [ ! -e $VG/$TARGET ]; then
echo "There is no matching snapshot volume"
exit 1
fi
if [ `lvs $VG/$TARGET | awk '{print $5}' | grep -c $SRC` -eq "0" ]; then
echo "This is not a snapshot, or a snapshot of the wrong LV"
exit 1
fi
}

ask_a_thousand_times () {
# This function verifies that the right thing is actually done
echo "You are about to remove a virtual machine and an LVM. Details:"
echo "Machine name: $TARGET"
echo "Logical Volume: $VG/$TARGET"
echo -n "Are you sure? (y/N): "
read RES
if [ "$RES" != "Y" ] && [ "$RES" != "y" ]; then
echo "Decided not to do it"
exit 0
fi
echo ""
echo "You have asked to remove this machine"
echo -n "Again: Are you sure? (y/N): "
read RES
if [ "$RES" != "Y" ] && [ "$RES" != "y" ]; then
echo "Decided not to do it"
exit 0
fi
echo "Removing VM and snapshot"
}

shut_down_vm () {
# Shut down the VM and unregister it
vmware-cmd $DIR/$TARGET/$SRC.vmx stop hard
vmware-cmd -s unregister $DIR/$TARGET/$SRC.vmx
}

remove_snapshot () {
# Umount and remove the snapshot
umount $DIR/$TARGET
RET=$?
if [ "$RET" -ne "0" ]; then
echo "Cannot umount $DIR/$TARGET"
exit 1
fi
lvremove -f $VG/$TARGET
RET=$?
if [ "$RET" -ne "0" ]; then
echo "Cannot remove snapshot LV"
exit 1
fi
}

remove_dir () {
# Removes the mount point
rmdir $DIR/$TARGET
}

#MAIN
if [ -z "$1" ]; then
echo "No machine name. Exiting"
exit 1
fi

#PARAMETERS:
DIR=/vmware             #VMware default VMs location
VG=/dev/VGVM3           #The name of the VG
PREFIX=centos           #Prefix to the name. All these VMs will be called centos-$NAME
TARGET="$PREFIX-$1"
SRC=centos-base         #The name of the baseline image, LVM, etc. All are the same

does_it_exist
ask_a_thousand_times
shut_down_vm
remove_snapshot
remove_dir

exit 0

Pros:

1. Very fast provisioning. It takes barely five seconds, and that is only because my server is somewhat loaded.

2. Dependable: KISS at its finest.

3. Conservative on space

4. Conservative on I/O load (unlike the traditional use of LVM snapshot, as explained in the beginning of this section).

Cons:

1. Cannot merge the contents of a snapshot back into the origin volume (the LVM team will implement this in the future, I think)

2. Cannot take a snapshot of a snapshot (same as above)

3. If the COW space of any of the snapshots is full (viewable with the command ‘lvs‘), then on boot the source LV might not become active (a confirmed RH4 bug, and that is the system I have used)

4. My script does not edit/alter /etc/fstab (I decided it was rather risky and not worth the effort at this time)

5. My script does not check whether there is enough available space in the VG. Not strictly required, as it will fail anyway if creation of the LV fails

You are most welcome to contribute any further changes you make to these scripts. Please keep my URL in the script if you decide to use it.

Thanks!

net-snmp broken in RHEL (and Centos, of course) – diskio

Saturday, June 9th, 2007

I had believed for quite a while that Linux, unlike other types of systems, was unable to produce any I/O information over SNMP. I only recently found out that this is partially true: all production-level distros, such as RedHat (and CentOS, for that matter), are unable to produce any output for SNMP DISKIO queries.

I have found a bugzilla entry about it, so I raise the glove and ask any maintainer of an RH-compatible repository to recompile (and maintain, of course) an alternate net-snmp package which supports diskio.

Meanwhile, I have found this blog post, which offers an alternate (and quite clumsy, yet working) solution to the disk performance measurement issue in Linux. I haven’t tried it yet, but I will, rather soon.

—Update—

I have used the script from the blog post mentioned above, and it works.

Speed could be an issue, though. Comparing two servers, the speed difference was striking.

Both servers are connected to the same switch as the server running the query. Server1 has a P2 233MHz CPU, while Server2 has dual 2.8GHz Xeon CPUs.

~$ time snmpwalk -c COMMUNITY -v2c Server1 1.3.6.1.4.1.2021.13.15 > /dev/null

real 0m0.311s
user 0m0.024s
sys 0m0.020s

~$ time snmpwalk -c COMMUNITY -v2c Server2 1.3.6.1.4.1.2021.13.15 > /dev/null

real 0m8.303s
user 0m0.044s
sys 0m0.012s

Looks like a huge difference. However, I believe it’s currently good enough for me.

High load average due to hardware issues

Friday, March 30th, 2007

Performance tuning is a sort of art. You know what you expect to reach, and you strive towards it through selective tuning: your OS memory utilization, your network settings, NFS mount parameters, etc.

I visited a customer whose server acted funny. First, it had a high load average: for an idle server with 2 CPUs, a load average which never drops below 1.0 can be considered high.

Viewing the logs, I saw lots of PS/2 error messages. It seems that the hotplug daemon had been kept busy respawning several times a second due to incorrect hardware detection (caused by these PS/2 errors), which produced the high load average (many processes in the run queue). Disconnecting the PS/2 cable between the server and the KVM solved the issue, and within around 2 minutes the load average had decreased to around 0.02.

Hardware-related problems are usually the most intense, yet the easiest to solve, of all performance hogs.

It has been a while… Today – Monitor I/O in AIX

Saturday, November 25th, 2006

It is a question I was asked a while back and didn’t find the time to look into.

I searched for an answer just now, and found one given by one of the old-time gurus (most likely) on a news server (where the gurus usually lurk). The answer is to use the command "filemon" with a syntax such as this:

filemon -O lf -o outputfile.txt

To stop the monitoring session, run "trcstop". Review the output file generated by this command, and you will be able to see the I/O interactions which happened on the system during that time.
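
A complete monitoring session, therefore, might look like this (the 60-second window is arbitrary; size it to cover the workload you are interested in):

filemon -O lf -o /tmp/filemon.out   # start tracing logical file I/O
sleep 60                            # let the workload run
trcstop                             # stop the trace and flush the report to /tmp/filemon.out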