Archive for July, 2011

Stateless Systems (diskless, boot from net) on RHEL5/Centos5

Thursday, July 21st, 2011

I have encountered several methods of doing stateless RedHat Linux systems. Some of them are over-sophisticated, and it doesn’t work. Some of them are too old, and you either have to fix half the scripts, or give up (which I did, BTW), and after long period of attempts, I have found my simple-yet-working-well-enough solution. It goes like that (drums please!)

yum install system-config-netboot

Doesn’t it sound funny? So simple, and yet – so working. So I have decided to supply you with the ultimate, simple, working guide to this method – how to make it work – fast and easy.

You will need the following:

  • RHEL5/Centos5 with ability to run yum client, and enough disk space to contain an entire system image, and more. Lots more. About it later. This machine will be called “server” in this guide.
  • A single “Golden Image” system – the base of your system-to-replicate. A system you will, with some (minor) modifications, use as the source of all your future stateless systems. Usually – resides on another machine (physical or virtual, doesn’t matter much. More about it later). This machine will be called “GI” in this guide.
  • A test system. Better be diskless, for assurance. Better also be able to boot from net, otherwise, we miss something here (although hybrid boot methods are possible, I will not discuss them here, for the time being). Can be virtual, as long as it is full hardware virtualization, as you cannot net-boot, except for the latest Xen Community versions, in PV mode. It will be called “net-client” or “client” in this guide.

Our flow:

  • Install required server-side tools
  • Setup server configuration parameters
  • Image the GI into the server
  • Run this and that
  • Boot our net-client happily

Let’s start!

On the server, run:

yum install -y system-config-netboot xorg-x11-xinit dhcp

and then, run:

chkconfig dhcpd on
chkconfig tftp on
chkconfig xinetd on
chkconfig nfs on

We will now perform configurations for the above mentioned services.

Your /etc/dhcpd.conf should look like this:

ddns-update-style interim;
ignore client-updates;

subnet $NETWORK netmask $NETMASK {
option routers              $GATEWAY;
option subnet-mask          $NETMASK;
option domain-name-servers  $DNS;
ignore unknown-clients;
filename “linux-install/pxelinux.0”;
next-server $SERVERIP;
}

You should replace all these variables with the ones matching your own layout. In my case

NETWORK=10.0.0.0
NETMASK=255.0.0.0
GATEWAY=10.200.1.1
DNS=10.100.1.4
SERVERIP=10.100.1.8

We will include hosts later. Notice that this DHCP server will refuse to serve unknown hosts. This will prevent the “Oops” factor…

We can start our DHCPD now. It won’t do anything, and it’s a good opportunity to test our configuration:

service dhcpd start

We need to create the directory structure. I have selected /export/NFSroots as my base structure. I will follow this structure here:

mkdir -p /export/NFSroots

We will proceed with the NFS server settings later.

Imaging the GI to the server is quite simple. We will begin by creating a directory in /export/NFSroots with the name of the imaged system type. In my case:

mkdir /export/NFSroots/centos5.6

However (and this is the tricky part!), we will create a directory under this location, called ‘root’. We will image the entire contents of our GI to this root directory. This is how this mechanism works, and I have no desire of bending it. So

mkdir /export/NFSroots/centos5.6/root

Now we copy the contents of the GI to this directory. There are several methods, but in this particular case, I have chosen to use ‘rsync’ over ‘ssh’. There are other methods just as good. Just one note – on the GI, before this copy, we will need to have busybox-anaconda package. So make sure you have it:

yum install -y busybox-anaconda

Now we can create an image from the GI:

rsync -e ssh -av –exclude ‘/proc/*’ –exclude ‘/sys/*’ –exclude ‘/dev/shm/*’ $GI_IP:/ /export/NFSroots/centos5.6/root

You must include as many exclude entries as required. You should never cause the system to attempt to grab the company’s huge NFS share, just because you have forgotten some ‘exclude’ entry. This will cause major loss of time, and possibly – some outage to some network resource. Be careful!

This is the best time to grub a cup of coffee/tee/cocoa/coke/whatever, and chit-chat with your friends. I have had about 10 minutes for 1.8GB image, via 1Gb/s network link, so you can do some math and guesswork, and probably – go grab launch.

When this operation is complete, your next mission is to configure your NFS server. Order does matter. You should create a read-only entry for the image root directory, and a full-access entry for the above structure. Well – I can probably narrow it down, but I did not really bother. During pre-production, I will test how to narrow permissions down, without getting myself into management hell. So – our entries are looking like this in /etc/exports :

/export/NFSroots/centos5.6/root *(ro,no_root_squash)
/export/NFSroots *(rw,no_root_squash,async)

I would love to hear comments by people who did narrow security down to the required minimum, and how they managed to handle tenths and hundreds of clients, or several dozens of RO images without much hassle. Really. Comment, people, if you have something to add. I would love to modify this document accordingly.

Now we need to start our NFS server:

service nfs start

We are ready for the big entry. You will need GUI here. We have installed xorg-x11-xauth which will allow us, if invoked, remote X. I use X over SSH, so it’s a breeze. Run the command:

system-config-netboot

I will not bother with screenshots, as this is not my style, but the following entries will be required:

  • First Time Druid: Diskless
  • Name: Easy identified. In my case: Centos5.6
  • Server IP address: $SERVERIP
  • Directory: /export/NFSroots/centos5.6
  • Kernel: Select the desired kernel
  • Apply

The system might take several minutes to complete. When done, you will be presented with a blank table. We need to populate it now.

For that to be easy, we better have all our clients in /etc/hosts file. While DNS does help, a lot, this specific tool, for it to work like charm, requires /etc/hosts to have the entry, or else, it just won’t be very readable. Below is a one-liner script to create a set of entries in /etc/hosts. I have started with .100 and completed with .120. This can be changed easily to match your requirements:

for i in {100..120} ; do echo “10.100.1.$i   pxe${i}” >> /etc/hosts ; done

This way we can refer to our clients as pxe100, pxe101, and so on.

Let’s create a new client!

Select “new” from the GUI menu, and fill in the name ‘pxe100’. If you have multiple images, this is a good time to select which one you will be using. If you have followed this guide to the letter, you have only a single option. Press on “OK” and you’re done.

We need to setup MAC and IP addresses into the DHCP server. I have written a script which assists with this greatly. For this script to work, you will need to run the following commands:

echo “” > /etc/dhcpd.conf.bootnet
echo ‘include “/etc/dhcpd.d/dhcpd.conf.bootnet”;’ >> /etc/dhcpd.conf

#!/bin/bash
# Creates and removes netboot nodes

# Check arguments

# VARIABLES
DHCPD_CONF=/etc/dhcpd.conf.bootnet
SERVICE=dhcpd
DATE=`date +"%H%M%S%d%m%y"`
BACKUPS=${DHCPD_CONF}.backups

function show_help() {
	echo "Usage: $0"
	echo "add NAME MAC IP"
	echo "where NAME is the name of the node"
	echo "MAC is the hardware address of the netboot interface (usually eth0)"
	echo "IP is the designated IP address of the node"
	echo
	echo
	echo "del ARGUMENT"
	echo "Where ARGUMENT can be either name, MAC address or IP address"
	echo "The script will attempt to remove whatever match it finds, so be careful"
	echo
	echo
	echo "check ARGUMENT"
	echo "Where ARGUMENT can be either name, MAC address or IP address"
        echo "The script will list any matching entries. This is a recommended"
	echo "action prior to removing a node"
	echo "The same logic used for finding entries to remove is used here"
	echo "so it is rather safe to run 'del' after a successful 'check' operation"
	echo
	exit 0
}

function check_inside() {
	# Will be used to either show or find the amount of matching entries"
	# Arguments: 	$1 - silent - 0 ; loud - 1
	# 		$2 - the string to search
	# The function will search the string as a stand-alone word only
	# so it will not find abc in abcde, but only in "abc"

	case "$1" in
		0) 	# silent
			RET=`grep -iwc $2 $DHCPD_CONF`
			;;
		1) 	# loud
			grep -iw $2 $DHCPD_CONF
			;;

		*)	echo "Something is very wrong with $0"
			echo "Exiting now"
			exit 2
			;;
	esac

	return $RET
}

function add_to() {
	# This function will add to the conf file
	# Arguments: $1 host ; $2 MAC ; $3 IP address
	echo "host $1 { hardware ethernet $2; fixed-address $3; }" >> $DHCPD_CONF
	[ "$?" -ne "0" ] && echo "Something wrong happened when attempted to add entry" && exit 3
}

function del_from() {
	# This function will delete a matching entry from the conf file
	# Arguments: $1 - expression
	[ -z "$1" ] && echo "Missing argument to del function" && exit 4
	if cat $DHCPD_CONF | grep -vw $1 > $DHCPD_CONF.$$
	then
		mv $DHCPD_CONF.$$ $DHCPD_CONF
	else
		echo "Failed to create temp DHCPD file. Aborting"
		exit 5
	fi
}

function backup_conf() {
	cp $DHCPD_CONF $BACKUPS/`basename $DHCPD_CONF.$DATE`
}

function restore_conf() {
	cp $BACKUPS/`basename $DHCPD_CONF.$DATE` $DHCPD_CONF
}

function check_wrapper() {
	# Perform check. Loud one
	echo "Searching for $REGEXP in configuration"
	check_inside 1 $REGEXP
	exit 0
}

function del_wrapper() {
	# Performs delete
	[ -z "$REGEXP" ] && echo "Missing argument for delete action" && exit 6
	backup_conf
	echo "Removing all invocations which include $REGEXP from config"
	del_from $REGEXP

	if /sbin/service $SERVICE restart
        then
                echo "Done"
        else
                restore_conf
                /sbin/service $SERVICE restart
                echo "Failed to update. Syntax error"
        fi
}

function add_wrapper() {
	# Adds to config file
	[ -z "$NAME" -o -z "$MAC" -o -z "$IP" ] && echo "Missing argument for add action" && exit 7

	for i in $NAME $MAC $IP
	do
		if check_inside 0 $i
		then
			echo -n .
		else
			echo "Value $i already exists"
			echo "Will not update duplicate value"
			exit 7
		fi
	done
	echo

	backup_conf
	add_to $NAME $MAC $IP

	if /sbin/service $SERVICE restart
	then
		echo "Done"
	else
		restore_conf
		/sbin/service $SERVICE restart
		echo "Failed to update. Syntax error"
	fi
}

function prep() {
	# Make sure everything is as expected
	[ ! -d ${BACKUPS} ] && mkdir ${BACKUPS}
}

prep

case "$1" in
	add)	NAME="$2"
		MAC="$3"
		IP="$4"
		add_wrapper
		;;
	check)	REGEXP="$2"
		check_wrapper
		;;
	del)	REGEXP="$2"
		del_wrapper
		;;
	help)	show_help
		;;
	*)	echo "Usage: $0 [add|del|check|help]"
		exit 1
		;;
esac

This script will update your /etc/dhcpd.conf.bootnet with new nodes. Place it somewhere in your path (for example: /usr/local/bin/ )

We will need the MAC address of a node. Run

net-node.sh add pxe100 00:16:3E:00:82:7A 10.100.1.100

This will add the node pxe100 with that specific MAC address to the DHCPD configuration.

Now, all you need to do is boot your client, and see how it works. Remember to disable other DHCP servers which might serve this node, or blacklist its MAC address from them.

Our next chapters will deal with my (future) attempt to make RHEL6 work under this setup (as a GI, and client, not as a server), and all kind of mass-deployment methods. If and when 🙂