Archive for the ‘Linux’ Category

XenServer 6.5 PCI-Passthrough

Thursday, April 16th, 2015

Searching the web for how to perform PCI-Passthrough on XenServer mostly turns up information about previous versions. Since I have just completed setting up PCI-Passthrough on XenServer version 6.5 (with the recent update 8, just to give you a notion of the exact time frame), I am sharing the procedure here.

Hardware: Cisco UCS blades with fNIC. I wish to pass two FC HBAs through to a VM (it is going to act as a backup server, and I need it to access the FC tape). While all the XenServers in this pool have four (4) FC HBAs, this particular node has six (6). The first four are intended for SR communication and the remaining two for PCI-Passthrough.

This is the output of ‘lspci | grep Fibre’:

0b:00.0 Fibre Channel: Cisco Systems Inc VIC FCoE HBA (rev a2)
0c:00.0 Fibre Channel: Cisco Systems Inc VIC FCoE HBA (rev a2)
0d:00.0 Fibre Channel: Cisco Systems Inc VIC FCoE HBA (rev a2)
0e:00.0 Fibre Channel: Cisco Systems Inc VIC FCoE HBA (rev a2)
0f:00.0 Fibre Channel: Cisco Systems Inc VIC FCoE HBA (rev a2)
10:00.0 Fibre Channel: Cisco Systems Inc VIC FCoE HBA (rev a2)

So, I want to pass through 0f:00.0 and 10:00.0. I had to add the following two entries to /boot/extlinux.conf, after the word 'splash' and before the three dashes:

pciback.hide=(0f:00.1)(10:00.1) xen-pciback.hide=(0f:00.1)(10:00.1)
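
For orientation only, the relevant append line ends up looking roughly like the sketch below. The surrounding parameters are illustrative and will differ on your system; what matters is the position of the added entries:

append /boot/xen.gz dom0_mem=752M,max:752M watchdog --- /boot/vmlinuz-xen root=LABEL=... ro quiet vga=785 splash pciback.hide=(0f:00.1)(10:00.1) xen-pciback.hide=(0f:00.1)(10:00.1) --- /boot/initrd-xen.img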

Initially, and contrary to the documentation, the parameter pciback.hide alone had no effect: as soon as the VM started, the command 'multipath -l' would hang forever (or until a hard reset of the host).

To apply the settings above, run 'extlinux -i /boot' (for good measure; I don't think it is strictly required, but I did not find anything definitive about it) and then reboot.

Now, when the host is back up, we need to add the devices to the VM. Make sure the VM is in the 'off' state before doing that. Your command would look like this:

xe vm-param-set uuid=<VM UUID> other-config:pci=0/0000:0f:00.0,0/0000:10:00.0

The expression '0/0000' is required. You can look up its exact purpose, but in most cases your value will look exactly like mine: '0/0000'.
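
As a quick aid, the VM UUID can be retrieved and the setting verified with standard xe commands; a minimal sketch (the VM name-label is a placeholder):

xe vm-list name-label="backup-vm" --minimal
xe vm-param-get uuid=<VM UUID> param-name=other-config param-key=pci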

Since my VM is Windows, this is almost the end of it: start the VM, and if it boots correctly, install the Cisco VIC drivers in it, as if it were a physical host. You're done.

Redhat Cluster and Citrix XenServer

Thursday, April 9th, 2015

I wanted to write down a guide for RHCS on RHEL/Centos6 and XenServer.

If you want to do that, you will need to overcome two major challenges. I want to save you the search and sum it all up here.

The first difficulty is the shared disk. In order to set up most common cluster scenarios, you will need shared storage. You could map the VMs to iSCSI LUNs external to the environment; however, if you do not have such an infrastructure (either because everything is based on SAS/FC, or because you cannot set up iSCSI storage with a reasonable level of availability), you will want XenServer to allow you to share a VDI between two VMs.

In order to do so, you will need to add a flag to all the XenServers in your pool, and to create the VDI in a specific way. First, the flag: create a file in /etc/xensource called "allow_multiple_vdi_attach". Do not forget to add it on all your XenServers:

touch /etc/xensource/allow_multiple_vdi_attach
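
A minimal sketch for applying it across the pool over SSH (the host names are placeholders matching the pool used later in this post):

for h in xenserver01 xenserver02 xenserver03 xenserver04; do
    ssh root@$h 'touch /etc/xensource/allow_multiple_vdi_attach'
done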

Next, you will need to create the VDI as a "raw" type. This is an example; change the SR UUID to the one you use:

xe vdi-create sm-config:type=raw sr-uuid=687a023b-0b20-5e5f-d1ef-3db777ce7ae4 name-label="My Raw LVM VDI" virtual-size=8GiB type=user
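
The raw VDI then needs a VBD on each of the two cluster VMs. This step is not spelled out in the original procedure, but a minimal sketch of how it would typically be done with xe (the UUIDs and device number are placeholders) looks like this:

xe vbd-create vm-uuid=<VM1 UUID> vdi-uuid=<VDI UUID> device=1 type=Disk mode=RW
xe vbd-create vm-uuid=<VM2 UUID> vdi-uuid=<VDI UUID> device=1 type=Disk mode=RW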

You can find the Citrix article about it here.

Following that, you can complete your cluster setup and configuration. I will not go into detail here, as this is not the focus of this article. However, when it comes to fencing, you will need a solution, and the one I used was a fencing agent written specifically for XenServer using XenAPI, called fence-xenserver. I did not use the fencing agents repository (which this page also points to), because I was unable to compile the required components on Centos6; they just do not compile well. fence-xenserver, however, is a simple Python script which actually works.

In order to make it work, I did the following (a command sketch follows the list):

  • Extracted the archive (version 0.8)
  • Placed fence_cxs* in /usr/sbin, and removed their ‘.py’ suffix
  • Placed XenAPI.py as-is in /usr/sbin
  • Verified /usr/sbin/fence_cxs* had execution permissions.
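
A rough command-line equivalent of the steps above, assuming the archive name shown here (adjust to the file you actually downloaded):

tar xzf fence-xenserver-0.8.tar.gz
cd fence-xenserver-0.8
cp fence_cxs*.py XenAPI.py /usr/sbin/
for f in /usr/sbin/fence_cxs*.py; do mv "$f" "${f%.py}"; done
chmod +x /usr/sbin/fence_cxs*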

Now, I needed to add it to the cluster configuration. Since the agent cannot handle accessing a non-pool-master host, it had to be defined for each pool member (I cannot tell in advance which of them will hold the pool master role when a failover happens). So, these are the relevant parts of my cluster.conf:

<fencedevices>
<fencedevice agent="fence_cxs_redhat" login="root" name="xenserver01" passwd="password" session_url="https://xenserver01"/>
<fencedevice agent="fence_cxs_redhat" login="root" name="xenserver02" passwd="password" session_url="https://xenserver02"/>
<fencedevice agent="fence_cxs_redhat" login="root" name="xenserver03" passwd="password" session_url="https://xenserver03"/>
<fencedevice agent="fence_cxs_redhat" login="root" name="xenserver04" passwd="password" session_url="https://xenserver04"/>
</fencedevices>
<clusternodes>
<clusternode name="clusternode1" nodeid="1">
<fence>
<method name="xenserver01">
<device name="xenserver01" vm_name="clusternode1"/>
</method>
<method name="xenserver02">
<device name="xenserver02" vm_name="clusternode1"/>
</method>
<method name="xenserver03">
<device name="xenserver03" vm_name="clusternode1"/>
</method>
<method name="xenserver04">
<device name="xenserver04" vm_name="clusternode1"/>
</method>
</fence>
</clusternode>
<clusternode name="clusternode2" nodeid="2">
<fence>
<method name="xenserver01">
<device name="xenserver01" vm_name="clusternode2"/>
</method>
<method name="xenserver02">
<device name="xenserver02" vm_name="clusternode2"/>
</method>
<method name="xenserver03">
<device name="xenserver03" vm_name="clusternode2"/>
</method>
<method name="xenserver04">
<device name="xenserver04" vm_name="clusternode2"/>
</method>
</fence>
</clusternode>
</clusternodes>

I have attached xenserver-fencing-cluster.xml for clarity (WordPress makes a mess of such XML snippets).

Note that I used four (4) fence devices, since my pool has four hosts. Also note the VM name (it is case sensitive), and the methods: one for each host, since you do not want them running in parallel, but one at a time. Failover time was between 5 and 15 seconds in my tests, depending on which host is the actual pool master (xenserver04 takes the longest, obviously). I did not test it with the pool master down (before or without HA kicking in), nor with hosts down, where the TCP timeout is longer (compared to connecting to a host that responds immediately that it is not the pool master). However, if iLO fencing takes about 30-60 seconds, I am not complaining about the current timeouts.

Windows 7 hammering dnsmasq

Saturday, February 1st, 2014

I migrated to dnsmasq just yesterday, and discovered that a Windows 7 machine was hammering the server with messages like this:

Feb  1 11:06:07 dns dnsmasq-dhcp[1078]: DHCPINFORM(eth0) 192.168.1.77 91:de:87:7b:e5:a8
Feb  1 11:06:07 dns dnsmasq-dhcp[1078]: DHCPACK(eth0) 192.168.1.77 91:de:87:7b:e5:a8 winpc

Googling a bit, I found this link (with an explanation). The solution is fairly simple: add the following line to your dnsmasq.conf file:

dhcp-option=252,"\n"
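
After editing the file, dnsmasq needs to re-read its configuration; on a typical SysV-init system of that era, that would be something along the lines of:

service dnsmasq restart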

 

Extracting/Recreating RHEL/Centos6 initrd.img and install.img

Tuesday, October 1st, 2013

A quick note about extracting and recreating RHEL6 or Centos6 (and their derivations) installation media components:

Initrd:

Extract:

mv initrd.img /tmp/initrd.img.xz
cd /tmp
xz --format=lzma initrd.img.xz --decompress
mkdir initrd
cd initrd
cpio -ivdum < ../initrd.img

Archive (after you applied your changes):

cd /tmp/initrd
find . | cpio -o -H newc | xz -9 --format=lzma > ../new-initrd.img

/images/install.img:

Extract:

mount -o loop install.img /mnt
mkdir /tmp/install.img.dir
cd /mnt ; tar cf - --one-file-system . | ( cd /tmp/install.img.dir ; tar xf - )
umount /mnt

Archive (after you applied your changes):

cd /tmp
mksquashfs install.img.dir/ install-new.img
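
Not part of the original procedure, but a quick sanity check of the new image can be done by loop-mounting it:

mount -o loop install-new.img /mnt
ls /mnt
umount /mnt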

Additional note for Anaconda installation parameters:

I did not test it, however there is a boot flag called stage2= which should point to an install.img file other than the hardcoded one. I don't know if it will accept /images/install-new.img as its value, but it can be a good starting point.
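
If it does work, the isolinux.cfg entry might look roughly like the sketch below (untested, and the volume label MYCD is a placeholder for your ISO's actual label):

label custom
  kernel vmlinuz
  append initrd=initrd.img stage2=hd:LABEL=MYCD:/images/install-new.img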

One more thing:

Make sure that the vmlinuz and initrd file names used for any custom entries in $CDROOT/isolinux do not exceed the 8.3 format. Longer names did not work for me. I assume (without any further checks) that this is an isolinux limitation.

XenServer – increase LVM over iSCSI LUN size – online

Wednesday, September 4th, 2013

The following procedure was tested by me and found to be working. The XenServer version I am using in this particular case is 6.1; however, I believe the method is generic enough to work for every version of XS, assuming you are using iSCSI and LVM (that is, not NetApp, CSLG, NFS and the like). It might act as a general guideline for Fibre Channel communication as well, but I have not tested that, so I have no idea how it would behave. It should also work, with some modifications, when using multipath; regarding multipath, you can find in this blog some notes on increasing multipath disks. Check the comments there too - they might offer a better and simpler way of doing it.

So - let's begin.

First - increase the size of the LUN on the storage side. For a NetApp, it involves something like:

lun resize /vol/XenServer/luns/SR1.lun +1t

You should always make sure your storage volume, aggregate, RAID group, pool or whatever it may be is capable of holding the data, or - if using thin provisioning - that a well-tested monitoring system is in place to alert you when you run low on disk space.

Now, we should identify the LUN. From now on - every action should be performed on all XS pool nodes, one after the other.

cat /proc/partitions

We should keep the output of this command somewhere. We will use it later on to identify the expanded LUN.

Now - let's scan for storage changes:

iscsiadm -m node -R

Now, running the previous command again will produce a slightly different output, and we can now identify the modified LUN:

cat /proc/partitions
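
A simple way to spot the changed device is to save the earlier listing to a file and diff the two; a minimal sketch:

cat /proc/partitions > /tmp/partitions.before
iscsiadm -m node -R
diff /tmp/partitions.before /proc/partitions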

Now we should make LVM aware of the new size. XenServer uses LVM, so we harness it to our needs. Let's assume the modified disk is /dev/sdd:

pvresize /dev/sdd
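
To verify that the physical volume now reflects the new size, something like the following will do:

pvs /dev/sdd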

After completing this task on all pool hosts, we should run an SR scan, either from the CLI or through the GUI. When the scan operation completes, the new size will show.
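
From the CLI, that would be along these lines (the SR name-label is a placeholder):

xe sr-list name-label="iSCSI SR" --minimal
xe sr-scan uuid=<SR UUID>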

Hope it helps!

BackupExec 2012 (14) on newer Linux

Tuesday, August 6th, 2013

In particular, Oracle UEK, which "claims" to be 2.6.39-xxx but is actually 3.0.x with a lower version number. Several misbehaviors (or differences) of version 3 can be found, and one of them is related to BackupExec: the service would not start on OEL6 with UEK kernels. The cause is an incorrect use of a function, getIfAddrs. Everything is explained in this amazing post. The described patch works, at least to the point of allowing the service to start. Check out the comments for some insights on how to identify the correct call.

I am re-posting it here, so it can be found for the Oracle Unbreakable Enterprise Kernel (UEK) as well.

SABnzbd and high CPU usage on weak CPUs

Sunday, July 21st, 2013

SABnzbd is a nice tool. I just replaced my previous nzbget setup with it, due to its better handling of obfuscated names in usenet groups. However, on an Atom CPU, the maximum download speed did not go over ~5MB/s on a 100Mb/s link. This is rather sad, because nzbget reached the full ~11MB/s.
The source of this slowness is the SSL handling within Python (or SABnzbd, to be exact), which cannot use more than a single core. A workaround for this problem is to use stunnel with SABnzbd, as described here. Because stunnel can use multiple cores, the connection handling is no longer limited to a single, rather weak, core. Following this procedure, I was able to reach 11-12MB/s speeds.

Note a small correction to that guide: the last line, the 'connect' directive, has to include the equal sign, like this:

connect = news.giganews.com:563
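
For context, the relevant stunnel service section ends up looking roughly like this (the local port 5630 is an arbitrary choice of mine; point SABnzbd's server definition at 127.0.0.1:5630 with SSL disabled):

[news]
client = yes
accept = 127.0.0.1:5630
connect = news.giganews.com:563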

Juniper NetworkConnect (NC) and 64bit Linux

Tuesday, June 25th, 2013

Due to a major disk crash, I had to use my 'other' computer for VPN connections, which meant I had to prepare it for the task. I attempted to log in to a Juniper-based SSL-VPN connection; however, I got a message saying that my 64bit Java was inadequate. The error message included a link to a Juniper KB article, which I must link to here (remembering how I have had to search for possible solutions in the past).

The nice thing about this solution is that it does not replace the default Java version on the system, which was always a problem for me, as I use Java for various purposes; instead, it recognizes that Java is part of the (update-)alternatives list and makes use of the correct Java version.

Juniper did it right this time!

Oh – and the link to their KB

And a link to the Oracle Java versions, to make life slightly easier for you. You will need an Oracle login, however (you can register for free).

TSClient on Ubuntu 12.04

Tuesday, June 25th, 2013

Today there will be a few different posts. This is a day full of events, so…

My first – to allow tsclient to work under Ubuntu 12.04, you should follow this guide: http://superuser.com/a/547102

To sum it up:

  1. Get the tsclient package for your architecture from http://pkgs.org
  2. Install it using 'sudo dpkg --force-depends -i tsclient_0.150-3ubuntu1_amd64.deb'
  3. Edit /var/lib/dpkg/status, search for tsclient, and remove the entry containing libpanel-applet2-0

Done.

Target-based persistent device naming

Saturday, June 22nd, 2013

When connecting Linux to a large array of SAS disks (JBOD), udev creates default persistent names in /dev/disk/by-*. These names are based on the LUN ID (all disks take lun0 by default) and on the path, which, for a pure SAS bus, includes the PWWN of the disks. An example of such naming would look like this (slightly trimmed for ease of viewing):

/dev/disk/by-id:
scsi-35000c50055924207 -> ../../sde
scsi-35000c50055c5138b -> ../../sdd
scsi-35000c50055c562eb -> ../../sda
scsi-35000c500562ffd73 -> ../../sdc
scsi-35001173100134654 -> ../../sdn
scsi-3500117310013465c -> ../../sdk
scsi-35001173100134688 -> ../../sdj
scsi-35001173100134718 -> ../../sdo
scsi-3500117310013490c -> ../../sdg
scsi-35001173100134914 -> ../../sdh
scsi-35001173100134a58 -> ../../sdp
scsi-3500117310013671c -> ../../sdm
scsi-35001173100136740 -> ../../sdl
scsi-350011731001367ac -> ../../sdi
scsi-350011731001cdd58 -> ../../sdf
wwn-0x5000c50055924207 -> ../../sde
wwn-0x5000c50055c5138b -> ../../sdd
wwn-0x5000c50055c562eb -> ../../sda
wwn-0x5000c500562ffd73 -> ../../sdc
wwn-0x5001173100134654 -> ../../sdn
wwn-0x500117310013465c -> ../../sdk
wwn-0x5001173100134688 -> ../../sdj
wwn-0x5001173100134718 -> ../../sdo
wwn-0x500117310013490c -> ../../sdg
wwn-0x5001173100134914 -> ../../sdh
wwn-0x5001173100134a58 -> ../../sdp
wwn-0x500117310013671c -> ../../sdm
wwn-0x5001173100136740 -> ../../sdl
wwn-0x50011731001367ac -> ../../sdi
wwn-0x50011731001cdd58 -> ../../sdf

/dev/disk/by-path:
pci-0000:03:00.0-sas-0x5000c50055924206-lun-0 -> ../../sde
pci-0000:03:00.0-sas-0x5000c50055c5138a-lun-0 -> ../../sdd
pci-0000:03:00.0-sas-0x5000c50055c562ea-lun-0 -> ../../sda
pci-0000:03:00.0-sas-0x5000c500562ffd72-lun-0 -> ../../sdc
pci-0000:03:00.0-sas-0x5001173100134656-lun-0 -> ../../sdn
pci-0000:03:00.0-sas-0x500117310013465e-lun-0 -> ../../sdk
pci-0000:03:00.0-sas-0x500117310013468a-lun-0 -> ../../sdj
pci-0000:03:00.0-sas-0x500117310013471a-lun-0 -> ../../sdo
pci-0000:03:00.0-sas-0x500117310013490e-lun-0 -> ../../sdg
pci-0000:03:00.0-sas-0x5001173100134916-lun-0 -> ../../sdh
pci-0000:03:00.0-sas-0x5001173100134a5a-lun-0 -> ../../sdp
pci-0000:03:00.0-sas-0x500117310013671e-lun-0 -> ../../sdm
pci-0000:03:00.0-sas-0x5001173100136742-lun-0 -> ../../sdl
pci-0000:03:00.0-sas-0x50011731001367ae-lun-0 -> ../../sdi
pci-0000:03:00.0-sas-0x50011731001cdd5a-lun-0 -> ../../sdf

Real port (connection) persistence is not possible in that manner. A PWWN-to-slot map is required, and handling the system in case of a disk failure by a non-expert is nearly impossible. A solution is to create matching udev rules which allow handling disks per port.

While there are (absolutely) better ways of doing it, time constraints required that I get it working quick and dirty. The solution is based on the lsscsi command as its backend engine, so make sure it exists on the system. I tend to believe that this design will not scale to hundreds of disks, but for my 16 disks (and probably for several tens as well) it works fine.

Add 60-persistent-disk-ports.rules to /etc/udev/rules.d/ (and omit the .txt suffix)

 

# By Ez-Aton, based partially on the built-in udev block device rule
# forward scsi device event to corresponding block device
ACTION=="change", SUBSYSTEM=="scsi", ENV{DEVTYPE}=="scsi_device", TEST=="block", ATTR{block/*/uevent}="change"

ACTION!="add|change", GOTO="persistent_storage_end"
SUBSYSTEM!="block", GOTO="persistent_storage_end"

# skip rules for inappropriate block devices
KERNEL=="fd*|mtd*|nbd*|gnbd*|btibm*|dm-*|md*", GOTO="persistent_storage_end"

# never access non-cdrom removable ide devices, the drivers are causing event loops on open()
KERNEL=="hd*[!0-9]", ATTR{removable}=="1", SUBSYSTEMS=="ide", ATTRS{media}=="disk|floppy", GOTO="persistent_storage_end"
KERNEL=="hd*[0-9]", ATTRS{removable}=="1", GOTO="persistent_storage_end"

# ignore partitions that span the entire disk
TEST=="whole_disk", GOTO="persistent_storage_end"

# for partitions import parent information
ENV{DEVTYPE}=="partition", IMPORT{parent}="ID_*"

# Deal only with SAS disks
KERNEL=="sd*[!0-9]|sr*", ENV{ID_SERIAL}!="?*", IMPORT{program}="/usr/local/sbin/detect_disk.sh $tempnode", ENV{ID_BUS}="scsi"
KERNEL=="sd*|sr*|cciss*", ENV{DEVTYPE}=="disk", ENV{TGT_PATH}=="?*", SYMLINK+="disk/by-target/disk-$env{TGT_PATH}"
#KERNEL=="sd*|cciss*", ENV{DEVTYPE}=="partition", ENV{ID_SERIAL}!="?*", IMPORT{program}="/usr/local/sbin/detect_disk.sh $tempnode"
KERNEL=="sd*|cciss*", ENV{DEVTYPE}=="partition", ENV{ID_SERIAL}=="?*", IMPORT{program}="/usr/local/sbin/detect_disk.sh $tempnode", SYMLINK+="disk/by-target/disk-$env{TGT_PATH}p%n"

ENV{DEVTYPE}=="disk", KERNEL!="xvd*|sd*|sr*", ATTR{removable}=="1", GOTO="persistent_storage_end"
LABEL="persistent_storage_end"

 
You will need to add (and make executable) the script detect_disk.sh in /usr/local/sbin. Again, remove the .txt suffix:
 

#!/bin/bash
# Written by Ez-Aton to assist with disk-to-port mapping
# $1 - disk device name
name=$1
name=${name##*/}
# Full disk
TGT_PATH=$(/usr/bin/lsscsi | grep -w "/dev/$name" | awk '{print $1}' | tr -d '[]')
if [ -z "$TGT_PATH" ]
then
	# This is a partition, so the grep above fails - strip the partition number
	name=$(echo "$name" | tr -d '[0-9]')
	TGT_PATH=$(/usr/bin/lsscsi | grep -w "/dev/$name" | awk '{print $1}' | tr -d '[]')
fi
echo "TGT_PATH=$TGT_PATH"

 
The result of this addition to udev would be a directory called /dev/disk/by-target containing links as follows:

/dev/disk/by-target:
disk-0:0:0:0 -> ../../sda
disk-0:0:1:0 -> ../../sdb
disk-0:0:10:0 -> ../../sdk
disk-0:0:11:0 -> ../../sdl
disk-0:0:12:0 -> ../../sdm
disk-0:0:13:0 -> ../../sdn
disk-0:0:14:0 -> ../../sdo
disk-0:0:15:0 -> ../../sdp
disk-0:0:2:0 -> ../../sdc
disk-0:0:3:0 -> ../../sdd
disk-0:0:4:0 -> ../../sde
disk-0:0:5:0 -> ../../sdf
disk-0:0:6:0 -> ../../sdg
disk-0:0:7:0 -> ../../sdh
disk-0:0:8:0 -> ../../sdi
disk-0:0:9:0 -> ../../sdj

The result is persistent naming based on real device ports.
 
I hope it helps. If you get to read this and have suggestions (or a better way to use udev, since I know my usage here is far from perfect), I would love to hear about it.