Archive for October, 2008

Xen Networking – Bonding with VLAN Tagging

Thursday, October 23rd, 2008

The simple scripts in /etc/xen/scripts which manage networking are fine for most uses. However, when your server uses bonding together with VLAN tagging (802.1q), you should consider an alternative.

A PDF document written by Mark Nielsen, GPS Senior Consultant, Red Hat, Inc. (I lost the original link, sorry), named “BOND/VLAN/Xen Network Configuration” as a service to the community, gave me a few insights on the subject. Following one of its references, I found a somewhat more elegant method of setting up bridging under RedHat, which takes bridge management away from xend and leaves it at the system level. Let’s see how it’s done on a RedHat-style Linux distribution.

Manage your normal networking configurations

If you’re using VLAN tagging over bonding, then you should already have a bonding device (say, bond0) set up with definitions such as these:

/etc/sysconfig/network-scripts/ifcfg-eth0 and /etc/sysconfig/network-scripts/ifcfg-eth1

DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
ISALIAS=no

/etc/sysconfig/network-scripts/ifcfg-bond0

DEVICE=bond0
BOOTPROTO=none
BONDING_OPTS="mode=1 miimon=100"
ONBOOT=yes

This is rather straightforward, and should be the default for such a setup. Now comes the more interesting part. Originally, the next configuration files would be for bond0.2 and bond0.3 (in my example). The original configuration would have looked like this (this is in bold because I tend to fast-read myself, and tend to miss things too often). This is not how it should look when we’re done!

/etc/sysconfig/network-scripts/ifcfg-bond0.2 (same applies to ifcfg-bond0.3)

DEVICE=bond0.2
BOOTPROTO=static
IPADDR=192.168.0.2
NETMASK=255.255.255.0
ONBOOT=yes
VLAN=yes

Configure bridging

To set up a bridge device for bond0.2, replace the above-mentioned ifcfg-bond0.2 with this new /etc/sysconfig/network-scripts/ifcfg-bond0.2

DEVICE=bond0.2
BOOTPROTO=static
ONBOOT=yes
VLAN=yes
BRIDGE=xenbr0

Now, create a new file /etc/sysconfig/network-scripts/ifcfg-xenbr0

DEVICE=xenbr0
BOOTPROTO=static
IPADDR=192.168.0.2
NETMASK=255.255.255.0
ONBOOT=yes
TYPE=Bridge

Now, on network restart, the bridge will be brought up holding the right IP address – all done by the initscripts, with no Xen intervention. Repeat this “Configure bridging” part for any additional bridge you want available to Xen machines.
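After editing the files, a quick check confirms the initscripts built the bridge on their own. A verification sketch (run as root; brctl comes with the bridge-utils package):

```shell
# Restart networking so the initscripts create xenbr0
service network restart

# The bridge should exist, with bond0.2 listed as its enslaved interface
brctl show xenbr0

# The IP address should sit on the bridge, not on the VLAN interface
ip addr show xenbr0
```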

Don’t let Xen bring any bridges up

This is the last part of our drill, and it is very important. If you skip it, you’ll get a nice networking mess. As said before, Xen (community), by default, can’t handle bonding or VLAN tags, so it will attempt to create or modify bridges on eth0 or the like. Edit /etc/xen/xend-config.sxp and comment out any line starting with “network-script”. Such a directive would be, for example

(network-script network-bridge)

Restart xend and restart networking. You should now be able to configure VMs to use xenbr0 and xenbr1, etc (according to your own personal settings).
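With xend out of the bridge business, pointing a guest at a bridge is just a matter of its vif line. A minimal sketch of the relevant fragment of a domU configuration file (the file name and MAC address here are hypothetical examples):

```
# /etc/xen/vm01 - hypothetical guest configuration fragment
vif = [ 'mac=00:16:3e:01:02:03, bridge=xenbr0' ]
```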

MySQL permissions for LVM Snapshots

Thursday, October 23rd, 2008

Taking LVM snapshots as a means of backing up MySQL is rather simple, as described here. However, if you are into security, you would strive to grant minimal permissions for the action to the MySQL user. Per the MySQL documentation, the required privilege is “RELOAD”. That should be enough, granted on *.*, of course.
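A minimal sketch of such a grant, assuming a dedicated backup user (the user name, host and password here are hypothetical examples):

```shell
mysql -u root -p <<'EOF'
-- Dedicated user for LVM snapshot backups; RELOAD covers FLUSH TABLES WITH READ LOCK
CREATE USER 'lvmbackup'@'localhost' IDENTIFIED BY 'changeme';
GRANT RELOAD ON *.* TO 'lvmbackup'@'localhost';
FLUSH PRIVILEGES;
EOF
```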

Raw devices for Oracle on RedHat (RHEL) 5

Tuesday, October 21st, 2008

There is major confusion among DBAs regarding how to set up raw devices for Oracle RAC or Oracle Clusterware. This confusion is caused by the turn RedHat took in how raw devices are defined.

Raw devices are actually a manifestation of character devices pointing to block devices. Character devices are non-buffered, so they act as FIFOs and have no OS cache, which is why Oracle likes them so much for Clusterware CRS and voting.

On other Unix flavors, there are commonly two entries for each disk device – a block device (e.g. /dev/dsk/c0d0t0s1) and a character device (e.g. /dev/rdsk/c0d0t0s1). This is not the case on Linux, and thus a special “raw”, aka character, device has to be defined for each partition we want to participate in the cluster, either as a CRS or a voting disk.

On RHEL4, raw devices were set up easily using the simple and coherent file /etc/sysconfig/rawdevices, which included an internal example. On RHEL5 this is no longer the case, and customizing the udev subsystem, in a rather less documented manner, is required.

Check out the source of this information, at this entry about raw devices. I will repeat it here, anyhow, with a slight explanation:

1. Add to /etc/udev/rules.d/60-raw.rules:

ACTION=="add", KERNEL=="sdb1", RUN+="/bin/raw /dev/raw/raw1 %N"

2. To set permissions (optional, but required for Oracle RAC!), create a new /etc/udev/rules.d/99-raw-perms.rules containing lines such as:

KERNEL=="raw[1-2]", MODE="0640", GROUP="oinstall", OWNER="oracle"

Notice this:

  1. The raw-perms.rules file name has to begin with the number 99, which defines its order when the rules are applied, so that it runs after all other rules have taken place. Using lower numbers might cause permissions to be incorrect.
  2. The following permissions have to apply:
  • OCR Device(s): root:oinstall , mode 0640
  • Voting device(s): oracle:oinstall, mode 0666
  • You don’t have to use raw devices for ASM volumes on Linux, as the ASMLib library is very effective and easier to manage.
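Once the rules are in place, it is worth verifying that the bindings and permissions actually took effect. A short verification sketch (run as root; the device and raw numbers follow the example rules above):

```shell
# Re-run udev so the new rules apply (the RHEL5 way)
start_udev

# Show all raw device bindings (raw1 should point at sdb1's major/minor numbers)
raw -qa

# Ownership and mode should match the Oracle requirements listed above
ls -l /dev/raw/
```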

Xen VMs performance collection

Saturday, October 18th, 2008

Unlike VMware Server, Xen’s hypervisor does not allow easy collection of performance information. The management machine, called “Domain-0”, is actually a privileged virtual machine, and thus gets its own small share of CPUs and RAM. Collecting performance information on it will lead to, well, collecting performance information for a single VM, and not the whole bunch.

Local tools, such as “xentop”, allow collection of this information; however, combining it with Cacti, or any other SNMP-based collection tool, is a bit tricky.

A great solution is provided by Ian P. Christian in his blog post about Xen monitoring. He has created a Perl script to collect the information. I have taken the liberty of fixing several minor things, with his permission. The modified scripts are presented below. Name the script (according to your version of Xen) “/usr/local/bin/xen_stats.pl” and set it to be executable:

For Xen 3.1

    #!/usr/bin/perl -w
    
    use strict;
    
    # declare...
    sub trim($);
    
    # we need to run 2 iterations because CPU stats show 0% on the first, and I'm putting .1 second between them to speed it up
    my @result = split(/\n/, `xentop -b -i 2 -d.1`);
    
    # skip the first iteration's output, up to the second "xentop - " header line
    shift(@result);
    shift(@result) while @result && $result[0] !~ /^xentop - /;
    
    # that header line plus the next 3 lines are headings..
    shift(@result);
    shift(@result);
    shift(@result);
    shift(@result);
    
    foreach my $line (@result)
    {
      my @xenInfo = split(/[\t ]+/, trim($line));
      printf("name: %s, cpu_sec: %d, cpu_percent: %.2f, vbd_rd: %d, vbd_wr: %d\n",
        $xenInfo[0],
        $xenInfo[2],
        $xenInfo[3],
        $xenInfo[14],
        $xenInfo[15]
        );
    }
    
    # trims leading and trailing whitespace
    sub trim($)
    {
      my $string = shift;
      $string =~ s/^\s+//;
      $string =~ s/\s+$//;
      return $string;
    }
    

For Xen 3.2 and Xen 3.3

    #!/usr/bin/perl -w
    
    use strict;
    
    # declare...
    sub trim($);
    
    # we need to run 2 iterations because CPU stats show 0% on the first, and I'm putting .1 second between them to speed it up
    my @result = split(/\n/, `/usr/sbin/xentop -b -i 2 -d.1`);
    
    # skip everything up to (and including) the "NAME" headings line
    shift(@result);
    shift(@result) while @result && $result[0] !~ /^[\t ]+NAME/;
    shift(@result);
    
    foreach my $line (@result)
    {
            my @xenInfo = split(/[\t ]+/, trim($line));
            printf("name: %s, cpu_sec: %d, cpu_percent: %.2f, vbd_rd: %d, vbd_wr: %d\n",
            $xenInfo[0],
            $xenInfo[2],
            $xenInfo[3],
            $xenInfo[14],
            $xenInfo[15]
            );
    }
    
    # trims leading and trailing whitespace
    sub trim($)
    {
            my $string = shift;
            $string =~ s/^\s+//;
            $string =~ s/\s+$//;
            return $string;
    }
    

Cron settings for Domain-0

Create a file “/etc/cron.d/xenstat” with the following contents:

    # This will run xen_stats.pl every minute
    */1 * * * * root /usr/local/bin/xen_stats.pl > /tmp/xen-stats.new && cat /tmp/xen-stats.new > /var/run/xen-stats

SNMP settings for Domain-0

Add the line below to “/etc/snmp/snmpd.conf”, then restart the snmpd service:

    extend xen-stats   /bin/cat /var/run/xen-stats
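To confirm the extend directive works, query it over SNMP (the community string “public” here is just an example; use your own):

```shell
# Each line of /var/run/xen-stats should come back as an output line
snmpwalk -v2c -c public localhost NET-SNMP-EXTEND-MIB::nsExtendOutputFull
```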

Cacti

I reduced Ian’s Cacti script to a per-server setup, meaning the script gets the host (dom-0) name from Cacti, but cannot support live migrations. I will try to deal with combining live migrations with Cacti in the future.

Download and extract my modified xen_cloud.tar.gz file. Place the script and config in their relevant locations, and import the template into Cacti. It should work like a charm.

A note – the PHP script will work only on PHP 5 and above. It works flawlessly on CentOS 5.2 for me.

Oracle RAC with EMC iSCSI Storage Panics

Tuesday, October 14th, 2008

I have had a system panicking when running the configuration mentioned below:

• RedHat RHEL 4 Update 6 (4.6) 64bit (x86_64)
• Dell PowerEdge servers
• Oracle RAC 11g with Clusterware 11g
• EMC iSCSI storage
• EMC PowerPath
• Vote and registry LUNs accessible as raw devices
• Data files accessible through ASM with ASMLib

During reboots or shutdowns, the system used to panic just before the actual power cycle. Unfortunately, I do not have a screen capture of the panic…

Tracing the problem, it seems that the iSCSI, PowerIscsi (EMC PowerPath for iSCSI) and networking services are brought down before the “killall” service stops the CRS.

The init.crs service file was never actually executed with a “stop” flag during the start-stop of services, as it never left a lock file (for example, in /var/lock/subsys), and thus its presence in /etc/rc.d/rc6.d and /etc/rc.d/rc0.d was merely decorative.

I have solved it by changing the /etc/init.d/init.crs script a bit:

• On the “start” action, touch a file called /var/lock/subsys/init.crs
• On the “stop” action, remove the file /var/lock/subsys/init.crs
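The change can be sketched as two one-line additions inside the start and stop branches of the script’s case statement (a sketch only; the surrounding script is Oracle’s, and the exact case labels may differ between Clusterware versions):

```shell
# in the "start" branch of /etc/init.d/init.crs, after Clusterware is brought up:
touch /var/lock/subsys/init.crs

# in the "stop" branch, after Clusterware is brought down:
rm -f /var/lock/subsys/init.crs
```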

Also, although I’m not sure about its necessity, I have changed the init.crs script’s SysV execution order in /etc/rc.d/rc0.d and /etc/rc.d/rc6.d from wherever it was (K96 in one case, K76 in another) to K01, so it is executed with the “stop” parameter early in the shutdown or reboot cycle.

This solved the problem, although future upgrades to Oracle Clusterware will require being aware of this change.