Posts Tagged ‘redhat’

ZFS with Redhat Cluster Suite

Friday, July 25th, 2014

This is a very nice project I have been working on. The hardware at hand – two servers, with a shared SAS bus containing several SAS disks. Since it’s a shared bus, no RAID solution would cut it, and as I did not want to waste disks with ASM (“normal” redundancy meaning half the usable size…), I went with ZFS storage.

ZFS is a wonderful technology, with many advantages, but with some dangerous pitfalls. As I prefer Linux, I did not bother with any Solaris solutions, and went directly to Centos 6. I will describe my cluster setup below.

I will disclose the entire setup, including hardware layout, Linux platform, ZFS module parameters, the Redhat Cluster Suite ZFS agent I wrote and the cluster.conf configuration file. I will also share my considerations regarding some of the choices I made. In addition, this system was designed to act as NFS storage for a Citrix XenServer pool, so I will have to describe the changes I had to perform on the XenServer itself (which might make it unsupported, but I will have to live with it), to allow it to handle the timeouts resulting from server failover.

So first – the servers – each having a single CPU (quad core), 24GB RAM, and dual 1Gb/s NICs. Also – a tiny internal SATA disk is used for the OS. The shared disks – at the moment, 10 SAS disks, dual port (notice – older HP disks might state in very small letters that they are only single-port SAS disks…), 72GB, 10K RPM. A zpool called ‘share’ with two 5-disk RAIDZ1 vdevs. As I mentioned before – ZFS seemed like the best possible option allowing me to achieve my goals at minimal cost.
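Just to illustrate that layout, a pool like this could be created along the following lines. This is only a hedged sketch – the device names below are placeholders, not the ones actually used (on a shared bus you would normally prefer persistent names from /dev/disk/by-id):

# Hypothetical creation of the 'share' pool as two 5-disk raidz1 vdevs
zpool create share \
    raidz1 sdb sdc sdd sde sdf \
    raidz1 sdg sdh sdi sdj sdk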

When I came to this project, I wanted to be able to use a native ZFS cluster agent, and not a ‘script’ agent, which takes a very long time to respond (30 seconds). Also – I wanted to be able to handle multiple storage pools concurrently – each floating on its own. While I have only one at the moment, I wanted fine-grained control over multiple pools. In addition – I did not want to manage individually the multiple filesystems introduced with each pool. I wanted to be able to import or export the pool silently, and with a clear head, thus I had to verify that the multiple filesystems are not in use as part of the export process.

As an agent, I wanted to comply with the Redhat Cluster Suite (RHCS from now on) OCF syntax. I used the supplied fs.sh script as inspiration for my agent script, so some of it might look familiar. All credit goes to the original authors, of course.

The operating system I selected was Centos 6. Centos is based on Redhat Linux, and I find it mature and stable, which is exactly what I want when I plan a production-ready, enterprise-class storage solution. The version had to be x86_64, due to ZFS requirements, and due to the amount of RAM in the server.

To handle ZFS options, I added a file called /etc/modprobe.d/zfs.conf, with the following content

install zfs /bin/rm -f /etc/zfs/zpool.cache && /sbin/modprobe --ignore-install zfs
options zfs zfs_arc_max=12593790976
options zfs zfs_arc_min=12593790975

I had to verify there is no zpool.cache file. Since my pool was rather small (planned for 24 disks max), I was not concerned by the longer import process caused by not having the zpool.cache file. I was more concerned with the automatic import that might happen otherwise, and had to prevent it at almost any cost. In addition, I learned from other systems that the ARC memory should never exceed half the RAM, and it should be given just a little under that.
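For reference, a value just below half of the installed RAM can be derived along these lines. This is my own sketch, not the way the exact numbers above were produced:

# MemTotal in /proc/meminfo is reported in kB; print half of it, in bytes
awk '/MemTotal/ {printf "%d\n", ($2 * 1024) / 2}' /proc/meminfo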

Of course, when changing such module settings, you need to recreate initrd (dracut -f) to be on the safe side later on.
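For example, the explicit form for the running kernel (a plain ‘dracut -f’ works as well):

dracut -f /boot/initramfs-$(uname -r).img $(uname -r)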

The zfs.sh agent script was placed in /usr/share/cluster directory. You must have rgmanager installed for this directory to exist, and anyhow, without rgmanager, you will have no cluster whatsoever.
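Copying it into place is then as simple as this (the exact mode is my choice; anything executable by root will do):

cp zfs.sh /usr/share/cluster/zfs.sh
chmod 755 /usr/share/cluster/zfs.sh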

This is the contents of the zfs.sh file. Notice that it is not compatible with Luci, so if you’re using it – them kids won’t play well together.

#!/bin/bash
 
LC_ALL=C
LANG=C
PATH=/bin:/sbin:/usr/bin:/usr/sbin
export LC_ALL LANG PATH
# Private return codes
FAIL=2
NO=1
YES=0
YES_STR="yes"
 
. $(dirname $0)/ocf-shellfuncs
 
meta_data()
{
	cat <<EOT
<?xml version="1.0"?>
<resource-agent name="zfs" version="rgmanager 2.0">
    <version>1.0</version>

    <longdesc lang="en">
	This script will import and export ZFS storage pools
	It will make sure to mount and umount all child filesystems
    </longdesc>
    <shortdesc lang="en">This is a ZFS pool</shortdesc>

    <parameters>
        <parameter name="name" primary="1">
            <longdesc lang="en">Symbolic name for this zfs pool</longdesc>
            <shortdesc lang="en">File System Name</shortdesc>
            <content type="string"/>
        </parameter>

        <parameter name="pool" required="1">
            <longdesc lang="en">ZFS Pool name or ID</longdesc>
            <shortdesc lang="en">ZFS pool name</shortdesc>
            <content type="string"/>
        </parameter>

        <parameter name="mount">
            <longdesc lang="en">ZFS Pool alternate mount</longdesc>
            <shortdesc lang="en">ZFS pool alternate mount</shortdesc>
            <content type="string"/>
        </parameter>

        <parameter name="force_unmount">
            <longdesc lang="en">
                If set, the cluster will kill all processes using
                this file system when the resource group is
                stopped.  Otherwise, the unmount will fail, and
                the resource group will be restarted.
            </longdesc>
            <shortdesc lang="en">Force Unmount</shortdesc>
            <content type="boolean"/>
        </parameter>

        <parameter name="self_fence">
            <longdesc lang="en">
                If set and unmounting the file system fails, the node will
                immediately reboot.  Generally, this is used in conjunction
                with force-unmount support, but it is not required.
            </longdesc>
            <shortdesc lang="en">Seppuku Unmount</shortdesc>
            <content type="boolean"/>
        </parameter>
    </parameters>

    <actions>
        <action name="start" timeout="300"/>
        <action name="stop" timeout="300"/>
        <!-- Note: active monitoring is constant and supplants all check depths -->
        <action name="status" timeout="10" interval="30"/>
        <action name="monitor" timeout="10" interval="30"/>
        <!-- Checks to see if we can read from the mountpoint -->
        <action name="monitor" depth="10" timeout="30" interval="5m"/>
        <!-- Checks to see if we can write to the mountpoint (if !ROFS) -->
        <action name="monitor" depth="20" timeout="30" interval="10m"/>
        <action name="meta-data" timeout="5"/>
        <action name="validate-all" timeout="5"/>
    </actions>
</resource-agent>
EOT
}
 
ocf_log()
{
        echo $*
}
 
verify_driver() {
	ocf_log info "Verifying ZFS driver"
	lsmod | grep -w zfs > /dev/null 2>&1 && return 0
	ocf_log err "ZFS driver is not loaded"
	return $OCF_ERR_ARGS
}
 
verify_poolname() {
	ocf_log info "Verify pool name "
	if [ -z "$OCF_RESKEY_pool" ]
	then
		ocf_log err "Missing pool name"
		return $OCF_ERR_ARGS
	fi
	zpool import | grep pool: | grep -w $OCF_RESKEY_pool > /dev/null 2>&1 && return 0
	ocf_log err "Cannot identify pool name"
	return $OCF_ERR_ARGS
}
 
verify_mounted_poolname() {
	ocf_log info "Verify pool name "
	if [ -z "$OCF_RESKEY_pool" ]
	then
		ocf_log err "Missing pool name"
		return $OCF_ERR_ARGS
	fi
	zpool list $OCF_RESKEY_pool > /dev/null 2>&1 && return 0
	ocf_log err "Cannot identify pool name"
	return $OCF_ERR_ARGS
}
 
verify_mountpath() {
	ocf_log info "Verifying alternate root mount path"
	[ -z "$OCF_RESKEY_mount" ] &amp;&amp; return 0
	declare mp="${OCF_RESKEY_mount}"
	case "$mp" in
		/*)    	# found it
                	;;
        	*)      # invalid format
			ocf_log err \
"verify_mountpath: Invalid mount point format (must begin with a '/'): \'$mp\'"
                return $OCF_ERR_ARGS
                ;;
        esac
}
 
pool_import() {
	ocf_log info "Importing pool"
	OPTS=""
	[ -n "$OCF_RESKEY_mount" ] &amp;&amp; OPTS="-R $OCF_RESKEY_mount"
	zpool import $OPTS $OCF_RESKEY_pool
	RET="$?"
	if [ "$RET" -ne "0" ]
	then
		ocf_log info "Cannot import without applying force"
		zpool import -f $OPTS $OCF_RESKEY_pool
		RET="$?"
	fi
	if [ "$RET" -ne "0" ]
	then
		ocf_log err "Pool import failed for $OCF_RESKEY_pool. error=$RET"
		return 1
	fi
	ocf_log info "Imported ZFS pool"
	return $RET
}
 
check_and_release_fs() {
	ocf_log info "Checking and releasing FS"
	FS=""
	case ${OCF_RESKEY_force_unmount} in
        $YES_STR|on|true|1)	force_umount=$YES ;;
        *)		        force_umount="" ;;
        esac
 
	RET=0
	for i in `zfs list -t filesystem | grep ^${OCF_RESKEY_pool} | awk '{print $NF}'`
	do
		# To be on the safe side. Why not?
		sleep 1
		# Is it mounted?
		if ! df -l | grep -w "$i" > /dev/null 2>&1
		then
			ocf_log info "Filesystem $i is not mounted"
			continue
		fi 	
		if [ `lsof $i | wc -l` -gt "0" ]
		then
			ocf_log info "Filesystem $i is in use"
			if [ "$force_umount" ]
			then
				ocf_log info "Attempting to kill processes on $i filesystem"
				fuser -k $i
				sleep 2
				if [ `lsof $i | wc -l` -gt "0" ]
				then
					ocf_log err "Cannot umount filesystem $i - filesystem in use"
					return 1
				fi
			else
				ocf_log err "Cannot umount filesystem $i
 - filesystem in use"
                                return 1
			fi
		fi
	done
	return $RET	
}
 
self_fence() {
	ocf_log info "Should we validate and call self-fence?"
	case ${OCF_RESKEY_self_fence} in
		$YES_STR|on|true|1)       self_fence=$YES ;;
       		*)              self_fence="" ;;
        esac	
 
	if [ "$self_fence" ]; then
		ocf_log alert "umount failed - REBOOTING"
               	sync
                reboot -fn
	fi
	return $OCF_ERR_GENERIC
}
 
pool_export() {
	ocf_log info "Exporting zfs pool"
	check_and_release_fs || self_fence
	zpool export $OCF_RESKEY_pool
	RET="$?"
	if [ "$RET" -ne "0" ]
	then
		ocf_log err "Pool export failed for $OCF_RESKEY_pool. error=$RET"
		return 1
	fi
	return $RET
}
 
start() {
	ocf_log info "Starting ZFS"
	verify_driver || return $OCF_ERR_ARGS 
	verify_poolname || return $OCF_ERR_ARGS
	verify_mountpath || return $OCF_ERR_ARGS
	pool_import
	# Handle filesystem?
}
 
stop() {
	ocf_log info "Starting ZFS"
	verify_driver || return $OCF_ERR_ARGS 
	verify_mounted_poolname || return $OCF_ERR_ARGS
	verify_mountpath || return $OCF_ERR_ARGS
	# Handle filesystem?
	pool_export
}
 
is_imported() {
	ocf_log debug "Checking if $OCF_RESKEY_pool is imported"
	zpool list ${OCF_RESKEY_pool} > /dev/null 2>&1
	return $?
}
 
is_alive() {
	ocf_log debug "Checking ZFS pool read/write"
	declare file=".writable_test.$(hostname)"
	declare TIMEOUT="10s"
	[ -z "$OCF_CHECK_LEVEL" ] &amp;&amp; export OCF_CHECK_LEVEL=0
	mount_point=`zfs list ${OCF_RESKEY_pool} | grep ${OCF_RESKEY_pool} | awk '{print $NF}'`
	test -d "$mount_point"
        if [ $? -ne 0 ]; then
                ocf_log err "${OCF_RESOURCE_INSTANCE}: is_alive: $mount_point is not a directory"
                return $FAIL
        fi
	[ $OCF_CHECK_LEVEL -lt 10 ] &amp;&amp; return $YES
 
        # depth 10 test (read test)
        timeout -s 9 $TIMEOUT ls "$mount_point" > /dev/null 2> /dev/null
        errcode=$?
        if [ $errcode -ne 0 ]; then
                ocf_log err "${OCF_RESOURCE_INSTANCE}: is_alive: failed read test on [$mount_point]. Return code: $errcode"
                return $NO
        fi
 
	[ $OCF_CHECK_LEVEL -lt 20 ] &amp;&amp; return $YES
 
        # depth 20 check (write test)
        rw=$YES
        for o in `echo $OCF_RESKEY_options | sed -e s/,/\ /g`; do
                if [ "$o" = "ro" ]; then
                        rw=$NO
                fi
        done
	if [ $rw -eq $YES ]; then
                file="$mount_point"/$file
                while true; do
                        if [ -e "$file" ]; then
                                file=${file}_tmp
                                continue
                        else
                                break
                        fi
                done
                timeout -s 9 $TIMEOUT touch $file > /dev/null 2> /dev/null
                errcode=$?
                if [ $errcode -ne 0 ]; then
                        ocf_log err "${OCF_RESOURCE_INSTANCE}: is_alive: failed write test on [$mount_point]. Return code: $errcode"
                        return $NO
                fi
                rm -f $file > /dev/null 2> /dev/null
        fi
 
	return $YES
}
 
monitor() {
	ocf_log debug "Checking ZFS pool $OCF_RESKEY_pool, Level $OCF_CHECK_LEVEL"
	verify_driver || return $OCF_ERR_ARGS 
	is_imported
	RET=$?
	if [ "$RET" -ne $YES ]; then
                ocf_log err "${OCF_RESOURCE_INSTANCE}: ${OCF_RESKEY_device} is not mounted on ${OCF_RESKEY_mountpoint}"
                return $OCF_NOT_RUNNING
        fi
	is_alive
	return $?
}
 
if [ -z "$OCF_CHECK_LEVEL" ]; then
	OCF_CHECK_LEVEL=0
fi
 
case $1 in
start)
	ocf_log info "zfs start $OCF_RESKEY_pool\n"
	OCF_CHECK_LEVEL=0
	monitor
	[ "$?" -ne "0" ] &amp;&amp; start || ocf_log info "$OCF_RESKEY_pool is already mounted"
	exit $?
	;;
stop)
	ocf_log info "zfs stop $OCF_RESKEY_pool\n"
	OCF_CHECK_LEVEL=0
	monitor
	[ "$?" -eq "0" ] &amp;&amp; stop || ocf_log info "$OCF_RESKEY_pool is not mounted"
	exit $?
	;;
status|monitor)
	ocf_log debug "ZFS monitor $OCF_RESKEY_pool"
	monitor
	exit $?
	;;
meta-data)
	echo -e "zfs metadat $OCF_RESKEY_address\n" &gt;&gt;/tmp/out
	meta_data
	exit 0
	;;
validate-all)
	exit 0
	;;
*)
	echo "usage: $0 {start|stop|status|monitor|restart|meta-data|validate-all}"
	exit $OCF_ERR_UNIMPLEMENTED
	;;
esac

All I had to do now was to build the cluster.conf file.

The reason I placed the IP address as the last resource to start and the first to stop was that the other way around, the NFS client would receive an ordered disconnection command, and would not bother to establish a connection with the remaining server. Abruptly taking away the clustered IP address causes the NFS clients to initiate a reconnection process, from which the systems are supposed to recover.
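The cluster.conf itself did not make it into this post, so the snippet below is only a rough, hedged sketch of the resource part, with made-up node names, service name and IP address. Nesting the <ip> resource inside the <zfs> resource is what makes rgmanager start the IP last and stop it first, as described above:

<rm>
    <failoverdomains>
        <failoverdomain name="zfs_domain" ordered="1" restricted="1">
            <failoverdomainnode name="node1" priority="1"/>
            <failoverdomainnode name="node2" priority="2"/>
        </failoverdomain>
    </failoverdomains>
    <service autostart="1" domain="zfs_domain" name="share_nfs" recovery="relocate">
        <zfs name="share" pool="share" mount="/share" force_unmount="on" self_fence="on">
            <ip address="10.100.1.50" monitor_link="1"/>
        </zfs>
    </service>
</rm>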

I have left this article incomplete for a while now. It has some stuff I do like to share, so I am sharing it as-is. I will (some day) complete it.

Extracting/Recreating RHEL/Centos6 initrd.img and install.img

Tuesday, October 1st, 2013

A quick note about extracting and recreating RHEL6 or Centos6 (and their derivatives) installation media components:

Initrd:

Extract:

mv initrd.img /tmp/initrd.img.xz
cd /tmp
xz --format=lzma initrd.img.xz --decompress
mkdir initrd
cd initrd
cpio -ivdum < ../initrd.img

Archive (after you applied your changes):

cd /tmp/initrd
find . | cpio -o -H newc | xz -9 --format=lzma > ../new-initrd.img

/images/install.img:

Extract:

mount -o loop install.img /mnt
mkdir /tmp/install.img.dir
cd /mnt ; tar cf - --one-file-system . | ( cd /tmp/install.img.dir ; tar xf - )
umount /mnt

Archive (after you applied your changes):

cd /tmp
mksquashfs install.img.dir/ install-new.img

Additional note for Anaconda installation parameters:

I did not test it, however there is a boot flag called stage2= which should point to an alternative install.img file, instead of the hardcoded one. I don’t know if it will accept /images/install-new.img as its value, but it can be a good starting point.
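If you want to experiment with it, a hypothetical isolinux.cfg entry (untested, as said above; the label and volume LABEL are made up) could look like:

label custom
  kernel vmlinuz
  append initrd=initrd.img stage2=hd:LABEL=CustomMedia:/images/install-new.img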

One more thing:

Make sure that the vmlinuz and initrd file names used for any custom entries in $CDROOT/isolinux do not exceed the 8.3 format. Longer names didn’t work for me. I assume (without any further checks) that this is an isolinux limitation.

IPSec VPN for mobile devices on Linux

Saturday, December 8th, 2012

I have recently had the pleasure and challenge of setting up a VPN server for mobile devices on top of Linux. The common method to do so is IPSec + L2TP, as these are the methods most mobile devices support, and it should work quite fine with other types of clients (although I did not test it) like Linux, Windows and Mac.

I have decided to use PSK (Pre Shared Key) due to its relative simplicity when handling multiple clients (compared to managing a certificate per device), and its relatively simple setup.

My VPN server platform is Linux, x86_64 (64 bit), Centos 6. Latest release, which is for the time being 6.3 and some updates.

I have used the following link as a baseline, and added some extra details about IPTables, which was a small challenge, as I wanted good-enough security around this setup.

Initially, I wanted to use OpenSWAN, however, it does not allow easy integration with dynamic IP addresses, and its policy, while capable of being very precise, was not flexible enough to handle a varying local IP address.

First – Add the following two repositories: Nikoforge, for Racoon (ipsec-tools), and EPEL for xl2tpd.

You can add them the following way:

rpm -ivH http://repo.nikoforge.org/redhat/el6/nikoforge-release-latest
yum -y install http://vesta.informatik.rwth-aachen.de/ftp/pub/Linux/fedora-epel/6/i386/epel-release-6-7.noarch.rpm
yum -y install ipsec-tools xl2tpd

Following that, create a script called /etc/racoon/init.sh:

#!/bin/sh
# set security policies
echo -e "flush;\n\
        spdflush;\n\
        spdadd 0.0.0.0/0[0] 0.0.0.0/0[1701] udp -P in  ipsec esp/transport//require;\n\
        spdadd 0.0.0.0/0[1701] 0.0.0.0/0[0] udp -P out ipsec esp/transport//require;\n"\
        | setkey -c
# enable IP forwarding
echo 1 > /proc/sys/net/ipv4/ip_forward

Make sure this script is executable, and add it to /etc/rc.local.
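For example:

chmod +x /etc/racoon/init.sh
echo /etc/racoon/init.sh >> /etc/rc.local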

Racoon config /etc/racoon/racoon.conf looks like this for my setup:

path include "/etc/racoon";
path pre_shared_key "/etc/racoon/psk.txt";
path certificate "/etc/racoon/certs";
path script "/etc/racoon/scripts";
#log debug;
remote anonymous
{
      exchange_mode    aggressive,main;
      #exchange_mode    main;
      passive          on;
      proposal_check   obey;
      support_proxy    on;
      nat_traversal    on;
      ike_frag         on;
      dpd_delay        20;
      #generate_policy unique;
      generate_policy on;
      verify_identifier on;
      proposal
      {
            encryption_algorithm  aes;
            hash_algorithm        sha1;
            authentication_method pre_shared_key;
            dh_group              modp1024;
      }
      proposal
      {
            encryption_algorithm  3des;
            hash_algorithm        sha1;
            authentication_method pre_shared_key;
            dh_group              modp1024;
      }
}
sainfo anonymous
{
      encryption_algorithm     aes,3des;
      authentication_algorithm hmac_sha1;
      compression_algorithm    deflate;
      pfs_group                modp1024;
}

The PSK is kept inside /etc/racoon/psk.txt. It looks like this for me (changed password, duh!):

myHome   ApAssPhR@se

Both said files (/etc/racoon/racoon.conf and /etc/racoon/psk.txt) should have only-root permissions, aka 600.

Notice the myHome identifier above. As the local address might change (either ppp dialup, or DHCP client), this one will be used instead of the local address identifier, as the common identifier of the connection. For Android devices, it will be defined as the ‘IPSec Identifier’ value.

We need to setup xl2tpd: Edit /etc/xl2tpd/xl2tpd.conf and have it look like this:

[global]
debug tunnel = no
debug state = no
debug network = no
ipsec saref = yes
force userspace = yes
[lns default]
ip range = 192.169.0.10-192.169.0.20
local ip = 192.169.0.1
refuse pap = yes
require authentication = yes
name = l2tpd
ppp debug = no
pppoptfile = /etc/ppp/options.xl2tpd
length bit = yes

The IP range will define the client VPN interface address. The amount should match the expected number of clients, or be somewhat larger, to be on the safe side. Don’t try to be a smart ass with it. Use explicit IP addresses. Easier that way. The “local” IP address should be external to the pool defined, or else a client might collide with it. I haven’t tried checking if xl2tpd allowed such configuration. You are invited to test, although it’s rather pointless.

Create the file /etc/ppp/options.xl2tpd with the following contents:

ms-dns 192.168.0.2
require-mschap-v2
asyncmap 0
auth
crtscts
lock
hide-password
modem
debug
name l2tpd
proxyarp
lcp-echo-interval 10
lcp-echo-failure 100

The ms-dns option should specify the desired DNS the client will use. In my case – I have an internal DNS server, so I wanted it to use it. You can either use your internal, if you have any, or Google’s 8.8.8.8, for example.

Almost done – you should add the relevant login info to /etc/ppp/chap-secrets. Each entry should look like: "username" * "password" *. In my case, it would look like this:

# Secrets for authentication using CHAP
# client    server    secret            IP addresses
ez-aton        *    "SomePassw0rd"        *

Select a good password. Security should not be taken lightly.

We’re almost done – we need to define IP forwarding, which can be done by adding the following line to /etc/sysctl.conf:

# Controls IP packet forwarding
net.ipv4.ip_forward = 1

and then running ‘sysctl -p’ to load these values.

Run the following commands, and your system is ready to accept connections:

chkconfig racoon on
chkconfig xl2tpd on
service racoon start
service xl2tpd start
/etc/racoon/init.sh

That said – we have not configured IPTables, in case this server acts as the firewall as well. It does, in my case, so I have had to take special care for the IPTables rules.

As my rules are rather complex, I will only show the rules relevant to the system, assuming (and this is important!) it is both the firewall/router and the VPN endpoint. If this is not the case, you should search for more details about forwarding IPSec traffic to backend VPN server.

So, my IPTables rules would be the following:

iptables -A INPUT -p udp --dport 500 -j ACCEPT
iptables -A INPUT -p udp --dport 4500 -j ACCEPT
iptables -A INPUT -p esp -j ACCEPT
iptables -A INPUT -p 51 -j ACCEPT # Not sure it's required, but too lazy to test without

This covers the IPSec part, however, we would not want the L2TP server to accept connections from the net, just like that, so it has its own rule, for port 1701:

iptables -A INPUT -p udp -m policy --dir in --pol ipsec -m udp --dport 1701 -j ACCEPT

You can save your current iptables rules (after checking that they work correctly) using ‘service iptables save’, or manually (backup your original rules to be on the safe side), and you’re all ready to go.

Good luck!

Bonding + VLAN tagging + Bridge – updated

Wednesday, April 25th, 2012

In the past I hacked around a problem with the order of starting (and with several bugs in) a network stack composed of network bonding (teaming) + VLAN tagging, and then network bridging (aka – Xen bridges). This kind of setup is very useful for introducing VLAN networks to guest VMs. This works well on Xen (community, Server), however, on RHEL/Centos 5 versions, the startup scripts (ifup and ifup-eth) are buggy, and do not handle this operation correctly. It means that, depending on the update release you use, results might vary from “everything works” to “I get bridges without VLANs” to “I get VLANs without bridges”.

I have hacked a solution in the past, modifying /etc/sysconfig/network-scripts/ifup-eth and fixing some bugs in it, however, maintaining that fix across every release of the ‘initscripts’ package has proven, well, not to happen…

So, instead, I present you with a smarter solution, better adapted to the updates supplied from time to time by RedHat or Centos, using the predefined ‘hooks’ in the ifup scripts.

Create the file /sbin/ifup-pre-local with the following contents:

 

#!/bin/bash
# $1 is the config file
# $2 is not interesting
# We will start the vlan bonding before any bridge
 
DIR=/etc/sysconfig/network-scripts
 
[ -z "$1" ] &amp;&amp; exit 0
. $1
 
if [ "${DEVICE%%[0-9]*}" == "xenbr" ]
then
    for device in $(LANG=C egrep -l "^[[:space:]]*BRIDGE=\"?${DEVICE}\"?" /etc/sysconfig/network-scripts/ifcfg-*) ; do
        /sbin/ifup $device
    done
fi

You can download this script. Don’t forget to make it executable (see below). It will call ifup for any parent device of the xenbr* device being brought up. If the parent device is already up, no harm is done. If the parent device is not up, it will be brought up, and then the xenbr device can start normally.
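Making it executable is as simple as:

chmod +x /sbin/ifup-pre-local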

Stateless Systems (diskless, boot from net) on RHEL5/Centos5

Thursday, July 21st, 2011

I have encountered several methods of doing stateless RedHat Linux systems. Some of them are over-sophisticated and don’t work. Some of them are too old, and you either have to fix half the scripts or give up (which I did, BTW), and after a long period of attempts, I have found my simple-yet-working-well-enough solution. It goes like this (drums please!)

yum install system-config-netboot

Doesn’t it sound funny? So simple, and yet – so working. So I have decided to supply you with the ultimate, simple, working guide to this method – how to make it work – fast and easy.

You will need the following:

  • RHEL5/Centos5 with the ability to run yum, and enough disk space to contain an entire system image, and more. Lots more. More about it later. This machine will be called “server” in this guide.
  • A single “Golden Image” system – the base of your system-to-replicate. A system you will, with some (minor) modifications, use as the source of all your future stateless systems. Usually – resides on another machine (physical or virtual, doesn’t matter much. More about it later). This machine will be called “GI” in this guide.
  • A test system. Better be diskless, for assurance. Better also be able to boot from net, otherwise, we miss something here (although hybrid boot methods are possible, I will not discuss them here, for the time being). Can be virtual, as long as it is full hardware virtualization, as you cannot net-boot in PV mode, except with the latest Xen Community versions. It will be called “net-client” or “client” in this guide.

Our flow:

  • Install required server-side tools
  • Setup server configuration parameters
  • Image the GI into the server
  • Run this and that
  • Boot our net-client happily

Let’s start!

On the server, run:

yum install -y system-config-netboot xorg-x11-xinit dhcp

and then, run:

chkconfig dhcpd on
chkconfig tftp on
chkconfig xinetd on
chkconfig nfs on

We will now perform configurations for the above mentioned services.

Your /etc/dhcpd.conf should look like this:

ddns-update-style interim;
ignore client-updates;

subnet $NETWORK netmask $NETMASK {
option routers              $GATEWAY;
option subnet-mask          $NETMASK;
option domain-name-servers  $DNS;
ignore unknown-clients;
filename "linux-install/pxelinux.0";
next-server $SERVERIP;
}

You should replace all these variables with the ones matching your own layout. In my case

NETWORK=10.0.0.0
NETMASK=255.0.0.0
GATEWAY=10.200.1.1
DNS=10.100.1.4
SERVERIP=10.100.1.8

We will include hosts later. Notice that this DHCP server will refuse to serve unknown hosts. This will prevent the “Oops” factor…

We can start our DHCPD now. It won’t do anything, and it’s a good opportunity to test our configuration:

service dhcpd start

We need to create the directory structure. I have selected /export/NFSroots as my base structure. I will follow this structure here:

mkdir -p /export/NFSroots

We will proceed with the NFS server settings later.

Imaging the GI to the server is quite simple. We will begin by creating a directory in /export/NFSroots with the name of the imaged system type. In my case:

mkdir /export/NFSroots/centos5.6

However (and this is the tricky part!), we will create a directory under this location, called ‘root’. We will image the entire contents of our GI to this root directory. This is how this mechanism works, and I have no desire of bending it. So

mkdir /export/NFSroots/centos5.6/root

Now we copy the contents of the GI to this directory. There are several methods, but in this particular case, I have chosen to use ‘rsync’ over ‘ssh’. There are other methods just as good. Just one note – on the GI, before this copy, we will need to have busybox-anaconda package. So make sure you have it:

yum install -y busybox-anaconda

Now we can create an image from the GI:

rsync -e ssh -av --exclude '/proc/*' --exclude '/sys/*' --exclude '/dev/shm/*' $GI_IP:/ /export/NFSroots/centos5.6/root

You must include as many exclude entries as required. You should never cause the system to attempt to grab the company’s huge NFS share, just because you have forgotten some ‘exclude’ entry. This will cause major loss of time, and possibly – some outage to some network resource. Be careful!

This is the best time to grab a cup of coffee/tea/cocoa/coke/whatever, and chit-chat with your friends. It took me about 10 minutes for a 1.8GB image, via a 1Gb/s network link, so you can do some math and guesswork, and probably – go grab lunch.

When this operation is complete, your next mission is to configure your NFS server. Order does matter. You should create a read-only entry for the image root directory, and a full-access entry for the above structure. Well – I can probably narrow it down, but I did not really bother. During pre-production, I will test how to narrow permissions down, without getting myself into management hell. So – our entries look like this in /etc/exports :

/export/NFSroots/centos5.6/root *(ro,no_root_squash)
/export/NFSroots *(rw,no_root_squash,async)

I would love to hear comments by people who did narrow security down to the required minimum, and how they managed to handle tens and hundreds of clients, or several dozens of RO images without much hassle. Really. Comment, people, if you have something to add. I would love to modify this document accordingly.

Now we need to start our NFS server:

service nfs start

We are ready for the big entry. You will need a GUI here. We have installed xorg-x11-xauth, which will allow us to use remote X when needed. I use X over SSH, so it’s a breeze. Run the command:

system-config-netboot

I will not bother with screenshots, as this is not my style, but the following entries will be required:

  • First Time Druid: Diskless
  • Name: Easy identified. In my case: Centos5.6
  • Server IP address: $SERVERIP
  • Directory: /export/NFSroots/centos5.6
  • Kernel: Select the desired kernel
  • Apply

The system might take several minutes to complete. When done, you will be presented with a blank table. We need to populate it now.

For that to be easy, we’d better have all our clients in the /etc/hosts file. While DNS does help a lot, this specific tool, for it to work like a charm, requires /etc/hosts to have the entry; otherwise, it just won’t be very readable. Below is a one-liner script to create a set of entries in /etc/hosts. I have started with .100 and ended with .120. This can be changed easily to match your requirements:

for i in {100..120} ; do echo "10.100.1.$i   pxe${i}" >> /etc/hosts ; done

This way we can refer to our clients as pxe100, pxe101, and so on.

Let’s create a new client!

Select "New" from the GUI menu, and fill in the name 'pxe100'. If you have multiple images, this is a good time to select which one you will be using. If you have followed this guide to the letter, you have only a single option. Press "OK" and you're done.

We need to set up MAC and IP addresses in the DHCP server. I have written a script which assists with this greatly. For this script to work, you will need to run the following commands:

echo "" > /etc/dhcpd.conf.bootnet
echo 'include "/etc/dhcpd.conf.bootnet";' >> /etc/dhcpd.conf

#!/bin/bash
# Creates and removes netboot nodes
 
# Check arguments
 
# VARIABLES
DHCPD_CONF=/etc/dhcpd.conf.bootnet
SERVICE=dhcpd
DATE=`date +"%H%M%S%d%m%y"`
BACKUPS=${DHCPD_CONF}.backups
 
function show_help() {
	echo "Usage: $0"
	echo "add NAME MAC IP"
	echo "where NAME is the name of the node"
	echo "MAC is the hardware address of the netboot interface (usually eth0)"
	echo "IP is the designated IP address of the node"
	echo
	echo
	echo "del ARGUMENT"
	echo "Where ARGUMENT can be either name, MAC address or IP address"
	echo "The script will attempt to remove whatever match it finds, so be careful"
	echo
	echo
	echo "check ARGUMENT"
	echo "Where ARGUMENT can be either name, MAC address or IP address"
        echo "The script will list any matching entries. This is a recommended"
	echo "action prior to removing a node"
	echo "The same logic used for finding entries to remove is used here"
	echo "so it is rather safe to run 'del' after a successful 'check' operation"
	echo
	exit 0
}
 
function check_inside() {
	# Will be used to either show or find the amount of matching entries
	# Arguments: 	$1 - silent - 0 ; loud - 1
	# 		$2 - the string to search
	# The function will search the string as a stand-alone word only
	# so it will not find abc in abcde, but only in "abc"
 
	case "$1" in
		0) 	# silent
			RET=`grep -iwc $2 $DHCPD_CONF`
			;;
		1) 	# loud
			grep -iw $2 $DHCPD_CONF
			;;
 
		*)	echo "Something is very wrong with $0"
			echo "Exiting now"
			exit 2
			;;
	esac
 
	return $RET
}
 
function add_to() {
	# This function will add to the conf file
	# Arguments: $1 host ; $2 MAC ; $3 IP address
	echo "host $1 { hardware ethernet $2; fixed-address $3; }" >> $DHCPD_CONF
	[ "$?" -ne "0" ] && echo "Something wrong happened when attempted to add entry" && exit 3
}
 
function del_from() {
	# This function will delete a matching entry from the conf file
	# Arguments: $1 - expression
	[ -z "$1" ] && echo "Missing argument to del function" && exit 4
	if cat $DHCPD_CONF | grep -vw $1 > $DHCPD_CONF.$$
	then
		\mv $DHCPD_CONF.$$ $DHCPD_CONF
	else
		echo "Failed to create temp DHCPD file. Aborting"
		exit 5
	fi
}
 
function backup_conf() {
	cp $DHCPD_CONF $BACKUPS/`basename $DHCPD_CONF.$DATE`
}
 
function restore_conf() {
	\cp $BACKUPS/`basename $DHCPD_CONF.$DATE` $DHCPD_CONF
}
 
function check_wrapper() {
	# Perform check. Loud one
	echo "Searching for $REGEXP in configuration"
	check_inside 1 $REGEXP
	exit 0
}
 
function del_wrapper() {
	# Performs delete
	[ -z "$REGEXP" ] && echo "Missing argument for delete action" && exit 6
	backup_conf
	echo "Removing all invocations which include $REGEXP from config"
	del_from $REGEXP
 
	if /sbin/service $SERVICE restart
        then
                echo "Done"
        else
                restore_conf
                /sbin/service $SERVICE restart
                echo "Failed to update. Syntax error"
        fi
}
 
function add_wrapper() {
	# Adds to config file
	[ -z "$NAME" -o -z "$MAC" -o -z "$IP" ] && echo "Missing argument for add action" && exit 7
 
	for i in $NAME $MAC $IP
	do
		if check_inside 0 $i
		then
			echo -n .
		else
			echo "Value $i already exists"
			echo "Will not update duplicate value"
			exit 7
		fi
	done
	echo
 
	backup_conf
	add_to $NAME $MAC $IP
 
	if /sbin/service $SERVICE restart
	then
		echo "Done"
	else
		restore_conf
		/sbin/service $SERVICE restart
		echo "Failed to update. Syntax error"
	fi
}
 
function prep() {
	# Make sure everything is as expected
	[ ! -d ${BACKUPS} ] && mkdir ${BACKUPS}
}
 
prep
 
case "$1" in
	add)	NAME="$2"
		MAC="$3"
		IP="$4"
		add_wrapper
		;;
	check)	REGEXP="$2"
		check_wrapper
		;;
	del)	REGEXP="$2"
		del_wrapper
		;;
	help)	show_help
		;;
	*)	echo "Usage: $0 [add|del|check|help]"
		exit 1
		;;
esac

This script will update your /etc/dhcpd.conf.bootnet with new nodes. Place it somewhere in your path (for example, as /usr/local/bin/net-node.sh, which is the name used below).

We will need the MAC address of a node. Run

net-node.sh add pxe100 00:16:3E:00:82:7A 10.100.1.100

This will add the node pxe100 with that specific MAC address to the DHCPD configuration.

Now, all you need to do is boot your client, and see how it works. Remember to disable other DHCP servers which might serve this node, or blacklist its MAC address from them.
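If that other DHCP server happens to be ISC dhcpd as well, one possible way to blacklist the client there (a hedged example, not part of this setup and untested here) is a host entry like this:

host pxe100-blacklisted { hardware ethernet 00:16:3E:00:82:7A; deny booting; }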

Our next chapters will deal with my (future) attempt to make RHEL6 work under this setup (as a GI and client, not as a server), and all kinds of mass-deployment methods. If and when :-)

Cables connection in Israel for Linux

Thursday, May 14th, 2009

Update to 0.2. Links remain the same. At the moment I cannot host many versions (it’s mostly uncomfortable), but this might change in the future.

I have created a GUI cables installer and configurator for L2TP on Linux.

I have noticed that there is no GUI solution, so, after this has been brought up, I have done it (!!!)

I have uploaded these files here, and you are welcome to use them.

Remember – they are designed for a blank Ubuntu (currently; more distros will be supported in the future, upon request) with not much junk installed. Also – they are designed for the simple user. Double-click and run. That’s it.

Quoting my readme file:

L2TP Cables connection in Israel (and across the world, where relevant) by Ez-Aton

—About:
This is an installer and configurator for L2TP over cables in Israel
With some luck, by running this installer, you will be able to connect
to the Internet with a dialer!

The system assumes you have little technical knowledge of Linux and you
are not expected to have any. Follow the defaults, and you should be fine.

This configuration will be cross distro in the future, meaning it will work
both on your Ubuntu, your RHEL, your Centos, Mandrake, etc. In order for me
to be able to do so, please assist by sending information on systems I am
not familiar with yet, per the appendix at the bottom.
Also, you can feel free to send me info in case the system did not work for
you (and let me know what are the differences from a default installation),
or, as always, send me money.

Visit my technical blog for updates and all kind of other technical stuff, at

http://run.tournament.org.il

OSS work is meant to be based on others’ work, and that is what I have done. I would
like to thank (and mention below) the resources without which this would not
have happened.

I hope you enjoy this dialer!

Ez

—How to use
Simply double-click on the “cables” icon on your desktop, and the system will
get you connected.
For CLI utilization: Run /usr/local/bin/cables

—Tools and resources used:
To create this package I have used the following tools and resources
makeself http://megastep.org/makeself/
xl2tpd by http://www.xelerance.com/software/xl2tpd/
xl2tpd guide for Israel Cables http://stuff.pulkes.org/l2tp/
ISP LNS list http://www.cables.org.il/cable-vpn/vpn.html
My connect/disconnect scripts from http://run.tournament.org.il

—License
The contents of this package are under the GPLv2 license, meaning you have full permission
to modify the contents of this package, except for the binary packages included
with it, where you are bound by their respective licenses.

—My Distro/ISP is not supported!
Well, these things happen. Over 300 distros out there, and I can’t have them all.
However – you have your own distro, right? For me to add it to this package
(assuming you don’t want to do this yourself) you will have to supply me with the
following info:
* What distro, kernel and version, and how you get the distro name
(for example – on Redhat – /etc/redhat-release. On Ubuntu – /etc/lsb-release)
* The file containing the version information (see above)
* The versions available from your repositories of xl2tpd or l2tpd for older
releases, and where you can get them
* Your ISP, your ISPs LNS names/addresses
* Your country
* All other info you think relevant

—Change log
0.2 – Added ability to enter manual LNS address. Added Orange LNS. Fixed fixroute to allow both IP and hostname without problems. Fixed cables connection script to run fixroute anyhow.
0.1 – Initial release

Download it here: cables_connect.sh

If you want the scripts and sources (not for the simple user!), you can get them here: l2tp-cables

RedHat Cluster custom Oracle “Agent”/script V1.0

Friday, April 24th, 2009

Working with RH Cluster quite a lot, I have decided to create an online library of custom agents/scripts.

I have not, so far, invested the effort of making these agents accept settings from the cluster.conf file, but this might happen.

Let the library be!

Oracle DB script/agent:

Although I discovered (a bit late) that RH Cluster for Oracle Ent. Linux 5.2 does include an Oracle DB agent, this script should be good enough for RHEL4 RH Cluster versions as well.

This script only checks that the ‘smon’ process is up. Nothing fancy. This script can include, in the future, the ability to check that Oracle responds to SQL queries (meaning – it is actually working).

Download oracle.sh
#!/bin/bash
#Service script for Oracle DB under RH Cluster
#Written by Ez-Aton
#http://run.tournament.org.il
 
# Global variables
ORACLE_USER=oracle
HOMEDIR=/home/$ORACLE_USER
OVERRIDE_FILE=/var/tmp/oracle_override
REC_LIST="user@domain.com"
 
function override () {
	if [ -f $OVERRIDE_FILE ]
	then
		exit 0
	fi
}
 
function start () {
	su - $ORACLE_USER -c ". $HOMEDIR/.bash_profile ; sqlplus / as sysdba << EOF
startup
EOF
"
	status
}
 
function stop () {
	su - $ORACLE_USER -c ". $HOMEDIR/.bash_profile ; sqlplus / as sysdba << EOF
shutdown immediate
EOF
"
	status && return 1 || return 0
}
 
function status () {
	ps -afu $ORACLE_USER | grep -v grep | grep smon
	return $?
}
 
function notify () {
	mail -s "$1 oracle on `hostname`" $REC_LIST < /dev/null
}
 
override
case "$1" in
start)	start
	notify $1
	;;
stop)	stop
#	notify $1
	;;
status)	status
	;;
*)	echo "Usage: $0 start|stop|status"
	;;
esac

I usually place this script (with execution permissions, of course) in /usr/local/sbin and call it as a “script” from the cluster configuration. You will probably be required to alter the first few variable lines to match your environment.

Listener Agent/script:

The tnslsnr should be started/stopped as well, if we want the $ORACLE_HOME to migrate as well. This is its agent/script:

Download lsnr.sh
#!/bin/bash
#Service script for Oracle DB under RH Cluster
#Written by Ez-Aton
#http://run.tournament.org.il
 
ORACLE_USER=oracle
HOMEDIR=/home/$ORACLE_USER
OVERRIDE_FILE=/var/tmp/oracle_override
 
function override () {
if [ -f $OVERRIDE_FILE ]
then
exit 0
fi
}
 
function start () {
su - $ORACLE_USER -c ". $HOMEDIR/.bash_profile ; lsnrctl start"
status
}
 
function stop () {
su - $ORACLE_USER -c ". $HOMEDIR/.bash_profile ; lsnrctl stop"
status && return 1 || return 0
}
 
function status () {
su - $ORACLE_USER -c ". $HOMEDIR/.bash_profile ; lsnrctl status"
}
 
override
case "$1" in
start)    start
;;
stop)    stop
;;
status)    status
;;
*)    echo "Usage: $0 start|stop|status"
;;
esac

Again – place it in /usr/local/sbin and call it from the cluster configuration file as type “script”.
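For reference, a hedged sketch of how both scripts could then be referenced in cluster.conf as plain script resources (the service and resource names here are my own, not taken from an actual configuration):

<service autostart="1" name="oracle_db" recovery="relocate">
    <script file="/usr/local/sbin/oracle.sh" name="oracle"/>
    <script file="/usr/local/sbin/lsnr.sh" name="listener"/>
</service>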

I will add more agents and more resources for RedHat Cluster in the future.

Tips and tricks for Redhat Cluster

Saturday, January 31st, 2009

Redhat Cluster is a nice HA product. I have been implementing it for a while now, lecturing about it, and yes – I like it. But like any other software product, it has a few flaws and issues which you should take into consideration – especially when you create custom “agents” – plugins to control (start/stop/status) your 3rd party application.

I want to list several tips and good practices which will help you create your own agent or custom script, and will help you sleep better at night.

Nighty Night: Sleeping is easier when your cluster is quiet. It usually means that you don’t want the cluster to suddenly fail over during the night or, for that matter, at any hour, unexpectedly.
Below are some tips to help you sleep better, or to perform an easier postmortem of any cluster failure.

Chop the Logs: Since RedHat Cluster logging might be hidden and filled with lots of irrelevant information, you want your agents to be nice about it. Let them log the result of running “status” or “stop” or even “start” somewhere. Of course – either recycle the output logs, or rotate them away. You could use

exec &>/tmp/my_script_name.out

much like HACMP does (or at least – behaves as if it does). You can also use a specific logging facility for the different subsystems of the cluster (cman, rg, qdiskd).

Mind the Gap: Don’t trust unknown scripts or applications’ return codes. Your cluster will fail miserably if a script or a file you expect to run is not there. Do not automatically assume that the vmware script, for example, will return normal values. Check the return codes and decide how to respond accordingly.
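A minimal sketch of that point (my own illustration, not taken from any particular agent; the vmrun path is just an example):

#!/bin/bash
# Example only: never assume a 3rd party tool exists or behaves - check before trusting it.
VMRUN=/usr/bin/vmrun       # hypothetical path, adjust to your environment
if [ ! -x "$VMRUN" ]; then
        echo "$VMRUN is missing or not executable - failing the status check explicitly"
        exit 1
fi
"$VMRUN" list
RET=$?
if [ "$RET" -ne 0 ]; then
        echo "status command returned $RET - treating the resource as failed"
        exit 1
fi
exit 0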

Speak the Language: A service in RedHat Cluster is a combination of one or more resources. This can be somewhat confusing as we tend to refer to resources as a (system) service. Use the correct lingo.  I will try to do just that in this document, so heed the difference between the terms “service” and “system service”, which can be a cluster resource.

Divide and Conquer: Split your services into the minimal set of resources possible. If your service consists of hundreds of resources, a failure of one of them could cause the entire service to restart, taking down all the other working resources. If you keep it to the minimum, you actually protect yourself.

Trust No One: To stress the “Mind the Gap” point above – don’t trust 3rd party scripts or applications to return a correct error code. Don’t trust their configuration files, and don’t trust the users to “do it right”. They will not. Try to make your own services as fault-protected as possible. Don’t crash because some stupid user (or a stupid administrator – for a cluster implementer both are the same, right?) used incorrect input parameters, or because he kept an important configuration file under a different name than was required.

I have some special things I want to do with regard to RedHat Cluster Suite. Stay tuned :-)

Protect Vmware guest under RedHat Cluster

Monday, November 17th, 2008

Most documentation on the net is about how to run a cluster-in-a-box under Vmware. Very few seem to care about protecting Vmware guests under a real RedHat cluster with shared storage.

This article is about just that. While I would not recommend using Vmware in such a setup, this has been the case here, and the Vmware guest actually resides on the shared storage. Relocating it is out of the question, so migrating it together with the other resources is the only valid option.

To do so, I have created a simple script which will accept start/stop/status arguments. The Vmware guest VMX is hard-coded into the script, but in an easy-to-change format. This script will attempt to freeze (suspend) the Vmware guest, and only if that fails, shut it down. Mind you that the blog’s HTML formatting might alter quotation marks into UTF-8 marks which will not be understood well by shell.

#!/bin/bash
# This script will start/stop/status VMware machine
# Written by Ez-Aton
# http://www.tournament.org.il/run
 
# Hardcoded. Change to match your own settings!
VMWARE="/export/vmware/hosts/Windows_XP_Professional/Windows XP Professional.vmx"
VMRUN="/usr/bin/vmrun"
TIMEOUT=60
 
function status () {
  # This function will return success if the VM is up
  $VMRUN list | grep "$VMWARE" &>/dev/null
  if [[ "$?" -eq "0" ]]
  then
    echo "VM is up"
    return 0
  else
    echo "VM is down"
    return 1
  fi
}
 
function start () {
  # This function will start the VM
  $VMRUN start "$VMWARE"
  if [[ "$?" -eq "0" ]]
  then
    echo "VM is starting"
    return 0
  else
    echo "VM failed"
    return 1
  fi
}
 
function stop () {
  # This function will stop the VM
  $VMRUN suspend "$VMWARE"
  for i in `seq 1 $TIMEOUT`
  do
    if status
    then
      echo
    else
      echo "VM Stopped"
      return 0
    fi
    sleep 1
  done
  $VMRUN stop "$VMWARE" soft
}
 
case "$1" in
start)     start
        ;;
stop)      stop
        ;;
status)   status
        ;;
esac
RET=$?
 
exit $RET

Since the formatting is killed by the blog, you can find the script here: vmware1

I intend to build a “real” RedHat Cluster agent script, but this should do for the time being.

Enjoy!

Xen Networking – Bonding with VLAN Tagging

Thursday, October 23rd, 2008

The simple scripts in /etc/xen/scripts which manage networking are fine for most usages, however, when your server is using bonding together with VLAN tagging (802.1Q) you should consider an alternative.

A PDF document written by Mark Nielsen, GPS Senior Consultant, Red Hat, Inc (I lost the original link, sorry) named “BOND/VLAN/Xen Network Configuration”, written as a service to the community, gave me a few insights on the subject. Following one of its references, I saw a somewhat more elegant method of doing a bridging setup under RedHat, which takes managing the bridges away from xend and leaves it at the system level. Let’s see how it’s done on a RedHat-style Linux distribution.

Manage your normal networking configurations

If you’re using VLAN tagging over bonding, then you will have to set up a bonding device (say, bond0) which has definitions such as this:

/etc/sysconfig/network-scripts/ifcfg-eth0 and /etc/sysconfig/network-scripts/ifcfg-eth1

DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
ISALIAS=no

/etc/sysconfig/network-scripts/ifcfg-bond0

DEVICE=bond0
BOOTPROTO=none
BONDING_OPTS="mode=1 miimon=100"
ONBOOT=yes

This is rather straightforward, and should be pretty much the default for such a setup. Now comes the more interesting part. Originally, the next configuration part would be bond0.2 and bond0.3 (in my example). The original configuration would have looked like this (I stress this because I tend to fast-read myself, and tend to miss things too often). This is not how it should look when we’re done!

/etc/sysconfig/network-scripts/ifcfg-bond0.2 (same applies to ifcfg-bond0.3)

DEVICE=bond0.2
BOOTPROTO=static
IPADDR=192.168.0.2
NETMASK=255.255.255.0
ONBOOT=yes
VLAN=yes

Configure bridging

To set up a bridge device for bond0.2, replace the above-mentioned ifcfg-bond0.2 with this new /etc/sysconfig/network-scripts/ifcfg-bond0.2:

DEVICE=bond0.2
BOOTPROTO=static
ONBOOT=yes
VLAN=yes
BRIDGE=xenbr0

Now, create a new file /etc/sysconfig/network-scripts/ifcfg-xenbr0

DEVICE=xenbr0
BOOTPROTO=static
IPADDR=192.168.0.2
NETMASK=255.255.255.0
ONBOOT=yes
TYPE=bridge

Now, on network restart, the bridge will be brought up, holding the right IP address – all done by the initscripts, with no Xen intervention. You will want to repeat the “Configure bridging” part for any additional bridge you want to make available to Xen machines.
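For example, the matching pair for the second tagged VLAN in this setup (bond0.3) would look like this (the IP address here is an assumption):

/etc/sysconfig/network-scripts/ifcfg-bond0.3

DEVICE=bond0.3
BOOTPROTO=static
ONBOOT=yes
VLAN=yes
BRIDGE=xenbr1

/etc/sysconfig/network-scripts/ifcfg-xenbr1

DEVICE=xenbr1
BOOTPROTO=static
IPADDR=192.168.1.2
NETMASK=255.255.255.0
ONBOOT=yes
TYPE=bridge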

Don’t let Xen bring any bridges up

This is the last part of our drill, and it is very important. If you don’t do it, you’ll get a nice networking mess. As said before, Xen (community), by default, can’t handle bondings or vlan tags, so it will attempt to create or modify bridges on eth0 or the likes. Edit /etc/xen/xend-config.sxp and comment out any line containing a directive starting with “network-script“. Such a directive would be, for example:

(network-script network-bridge)

Restart xend and restart networking. You should now be able to configure VMs to use xenbr0 and xenbr1, etc (according to your own personal settings).
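For completeness, on a RHEL5-style system the restart step above would be something like:

service xend restart
service network restart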