Nas4Free and support for Mellanox ConnectX 10GbE

I have been deploying Nas4Free recently, and found it to be a very nice system. I might try to port its web interface to Linux, as it meets a set of requirements (regarding the graphical interface) which I do not find met on Linux, and wish I could…

However, I had to add a driver for the Mellanox ConnectX 10GbE interface, which, unfortunately, was not included.

This might look like a simple task; however, for a person unfamiliar with FreeBSD, it was a challenge.

I followed the steps described in this build-your-own Nas4Free wiki guide, with one minor change:
I have edited the file /usr/local/nas4free/svn/build/kernel-config/NAS4FREE-amd64 (attached: NAS4FREE-amd64).

I did not build the system beyond compiling the kernel, as the kernel was all I needed: I had to implant it into an embedded system. The procedure was as follows:

  • On the compilation server, run: cat /usr/obj/nas4free/usr/src/sys/NAS4FREE-amd64/kernel | gzip -n -9 > /tmp/kernel.gz
  • On the target system: note the device for the mount /cf
  • umount /cf (because it’s read-only)
  • mount the device noted before to /cf (now /cf is writable)
  • Copy from compilation server the file /tmp/kernel.gz to /cf/boot/kernel/ overwriting the existing file
  • umount /cf
  • Reboot
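The ‘note the device’ step on the target system can be scripted. This is a minimal sketch, not a tested procedure: device_for_mount is a hypothetical helper name, /dev/da0a is only an example device, and the FreeBSD mount output format of ‘device on mountpoint (options)’ is assumed.

```bash
#!/bin/sh
# device_for_mount is a hypothetical helper, not part of Nas4Free.
# It reads `mount` output on stdin and prints the device backing
# the mount point given as $1.
# Assumed line format: /dev/da0a on /cf (ufs, local, read-only)
device_for_mount() {
    awk -v mp="$1" '$2 == "on" && $3 == mp { print $1 }'
}

# Possible use on the target system (commented out, as it remounts /cf):
#   CF_DEV=$(mount | device_for_mount /cf)
#   umount /cf && mount "$CF_DEV" /cf
```

Keeping the device name in a variable avoids retyping it between the umount and the mount.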

This is how to compile a kernel with extended support and push it into an embedded Nas4Free system. Note, however, that my Nas4Free version was 9.1.0.1.636. Newer versions might include the Mellanox drivers, making this operation obsolete.

 

 

Recovery of a Storage Repository (SR) in XenServer, part one

In this part I will discuss a possible solution to a problem I have encountered several times already: failure to understand XenServer’s use of LVM. But first, a little explanation of the topic.

XenServer makes extensive use of LVM in order to support the storage requirements of virtual disks. It is used in two kinds of SR: LVMoISCSI/LVMoHBA and ext. In both cases, XenServer defines the initial layout as an LVM framework. Except for the system disk, the LVM is positioned directly on the whole disk, and not on the first partition. I imagine that the desire to avoid dealing with GPT/Basic/other partitioning schemes is the root of this notion.

While it does solve the disk partitioning method problem, it creates a different problem: a PEBKAC problem (Problem Exists Between Keyboard And Chair). Failing to understand that there is no partition on the disk, and that the data is structured directly on it, is the cause of relatively frequent deletion of the LVM structure, as it is replaced by a partitioning layout.

This usually happens in one of two ways. The first is that the LUN/disk is exposed directly to a Windows machine, which asks joyfully if one would like to ‘sign the partition’. If one does so, a basic partitioning structure is created, and the LVM data structure is overwritten by it. The second is a little less common, and involves a lack of understanding of the LVM structure as employed by XenServer when performing disk tasks as the root user directly on the XenServer host. In this case, the user is not aware of the data structure, and might be tempted to partition the disk and, God forbid, even format the created partition. The result would be a total loss of the SR.

So much for how the data is structured, and how it gets erased or damaged.

I was surprised to discover the ‘easy’ method of recovery from a partition table laid over the LVM metadata. I assume here that no one has attempted to format the resulting partition(s), and that things stopped at creating the partition layout and wondering why the SR no longer works in XenServer.

This article, covering the easy way, is the first of two I intend to write about LVM recovery. If this ‘easy’ method works for you, there is no need to try your luck with the more complex one.

So, to work. If someone has created a partition layout, overwriting, as explained earlier, the LVM metadata structure, the symptom is that the disk has one or more partitions. For example, the output of ‘cat /proc/partitions’ would look like this (snipping the irrelevant parts):

   8       16  156290904 sdb
   8       17  156288321 sdb1

As is clearly visible, the second line (sdb1) should not be there. The output of ‘fdisk -l /dev/sdb’ showed (again, snipping the irrelevant parts):

/dev/sdb1               1       19457   156288321   83  Linux

It proves someone has manually attempted to partition the disk. Had a mount command worked (example: ‘mount /dev/sdb1 /mnt’), my response e-mail would have gone like this: “Sorry. The data was overwritten. Can’t do anything about it”. However, this was not the case. Not this time.
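The check against /proc/partitions can be automated. This is a minimal sketch under stated assumptions: partitions_of is a hypothetical helper name, and it assumes sdX-style device names where a partition is the disk name followed by digits only.

```bash
#!/bin/sh
# partitions_of is a hypothetical helper, not a XenServer tool.
# It reads /proc/partitions-style text on stdin
# (columns: major minor #blocks name) and prints every partition
# that belongs to the whole disk named in $1.
partitions_of() {
    awk -v disk="$1" '$4 ~ ("^" disk "[0-9]+$") { print $4 }'
}

# Possible use on the XenServer host:
#   partitions_of sdb < /proc/partitions
# Any output for a disk that should hold a whole-disk LVM PV
# means a partition table has been written over it.
```

An empty result is the healthy state for an SR disk; the system disk is the exception, since it is partitioned by design.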

The magic trick I used was to remove the partition entirely, freeing the disk to be identified as LVM, if it could – I wasn’t sure it would – and then take some recovery actions.

First – fdisk to remove the partition:

fdisk /dev/sdb << EOF
d
w
EOF

Now, a pvscan operation could take place. The following command returned the correct value – a PV ID which wasn’t there before, meaning that the PV information was still intact:

pvscan

Now, a simple ‘SR Repair’ operation could take place.

Easy.
My next article in this series will show a more complex method of recovery to employ when this ‘easy’ one doesn’t work.

NetApp – Copy LUN between filers using NDMP

ndmpcopy is a wonderful command. It allows a fine-grained copy of files or directories between NetApp devices, across the network, even if they do not use (or are not licensed for) SnapMirror, SnapVault and the rest of the Snap* products NetApp offers.
In this example I will show how to copy a LUN from one filer to another.

First, set the LUN to offline on the source filer. Make sure beforehand that it is unmounted and disconnected on its hosts, whatever it takes to prevent any major data loss. As you can deduce, setting a LUN to offline state will prevent write access to it. Also, note its parameters. For example:

lun show -v /vol/server1/data/mydb.lun

Second, create the required qtree structure. Make sure that the LUN is created at the root of either a volume or a qtree, or else.

Third, use ndmpcopy:

ndmpcopy -da root:password /vol/server1/data/mydb.lun remotefiler:/vol/server1/data/

This operation will take time.

When it completes, on the target NetApp, set priv to diag, and do the following:

  • Rename the LUN:
    mv /vol/server1/data/mydb.lun /vol/server1/data/mydb.not.lun
  • Create a hard-link LUN from a file (requires priv diag!)
    lun create -f /vol/server1/data/mydb.not.lun -t linux -o noreserve /vol/server1/data/mydb.lun
    (Command syntax: lun create -f <file_path> -t <ostype> [ -o noreserve ] [ -e space_alloc ] <lun_path>)
  • Remove the original file (it is hard-linked, so the data will not be affected)
    rm /vol/server1/data/mydb.not.lun
  • Resize, if required, the LUN to the original full size (relevant if the LUN was thin-provisioned)
    lun resize /vol/server1/data/mydb.lun 400g

You can now map the LUN to any relevant host, and obtain full access to its data.

XenServer – Setting virtual disks names based on the VM names

One of the worst things you can have in XenServer is some wise guy performing a ‘forget storage’ on a storage device still holding virtual disks related to VMs. As the XenServer database is internal (for the whole pool) and not per-VM, all references to these virtual disks disappear, and you are left with a bunch of VMs without disks and, later on, when you have recovered from the shock and restored the SR, with a bunch of virtual disks you have no clue as to where they belong. Why? Because we are lazy, and we tend to skip the part where you can (or is it: should?) define a custom name for your virtual disks, so that you would know later on (for example, in the case described above) where they belong(ed).

To solve this annoying issue, and to save time for Citrix XenServer admins, I have created a script which resets the VDI (virtual disk object) names to the name of the VM plus the logical position of the virtual disk (example: xvda, hdb, etc.) on that VM. That way, it becomes very easy to identify the disks in case of such an annoying micro-catastrophe (micro because no data is lost, just the knowledge of where it belongs…).

The script can be called manually. Since we’re lazy people, who will forget to run it manually at regular intervals and will accumulate virtual machines with “Template of XYZ” virtual disks, it can also be called from cron. When called manually, it asks the user to proceed by pressing ‘Enter’. If called from cron, it just runs.

Enjoy!

 

#!/bin/bash
# This script will reset the names of the virtual disks used for each VM to a standard name, based on the VM name
# and the disk position
# It is meant to solve problems where due to 'forget storage' operations or the likes
# virtual disk associations disappear, and you face many disks having the same name
#
# Written by Ez-Aton: http://run.tournament.org.il
 
 
if [ -t 1 ]
then
        echo "This script will reset *all* VM disks to a name constructed of the VM and the disk name (xvda, hdb, etc)"
        echo "This operation is not reversible, however, it can be called repeatedly"
        echo "If you want this script to skip a said virtual disk, make sure its name includes the name of the VM"
        echo "For example 'vm1 the real important data disk' for a disk used by vm1."
        echo "Note that the name is case sensitive, so it is very important to match the VM name using upper/lower case letters as needed"
        echo "To abort, press Ctrl+C"
        echo "To proceed, press Enter"
        read abc
fi
 
VM_LIST=`xe vm-list is-control-domain=false --minimal | tr , ' '`
 
for i in $VM_LIST
do
        # Resetting several parameters, so we have a clean start
        VM_NAME=""
        VBD_LIST=""
        VDI_LIST=""
        # We iterate through all existing VMs, to get both their names, and their disks
        VM_NAME="`xe vm-param-get uuid=$i param-name=name-label`"
        if [ -z "$VM_NAME" ]
        then
                # We have a problem with empty VM names, so we will use the VMs uuid
                VM_NAME=$i
        fi
        VBD_LIST=`xe vbd-list vm-uuid=$i --minimal | tr , ' '`
        for j in $VBD_LIST
        do
                # Resetting several parameters, so we have a clean start
                VDI_UUID=""
                DEV_NAME=""
                # We iterate through all existing VBDs to reset the VDI name
                VDI_UUID=`xe vbd-param-get uuid=$j param-name=vdi-uuid`
                if [ "$VDI_UUID" == "<not in database>" ]
                then
                        # This is a virtual CDROM
                        continue
                fi
                DEV_NAME=`xe vbd-param-get uuid=$j param-name=device`
                VDI_NAME=`xe vbd-param-get uuid=$j param-name=vdi-name-label`
 
                # Test if the name was reset in the past or manually:
                # any VDI name which already contains the VM name is skipped
                TGT_NAME="$VM_NAME $DEV_NAME"
                if [[ "$VDI_NAME" == *"$VM_NAME"* ]]
                then
                        # There is nothing to do
                        echo "Name already includes VM name, so nothing to do"
                else
                        # Here we reset the VDI name
                        echo xe vdi-param-set uuid=$VDI_UUID name-label="$TGT_NAME"
                        xe vdi-param-set uuid=$VDI_UUID name-label="$TGT_NAME"
                fi
        done
done

XenServer (licensed) – adding license from command line

I had a single node of a pool using a different license server, with a temporary license, unfortunately. It expired, and as the purchase process was somewhat prolonged, I had to extend it. I did not want to disconnect the other four hosts of my pool from the permanent license server, which has worked just fine for the last year or so, so I had to change the license for a single host only.

XenServer 6.1 XenCenter does not allow changing the license server for a single host in a licensed pool, so I had to search for a solution.

As Citrix has, for a while now, used a license server rather than a license file, an alternative had to be found. The solution looks like this:

xe host-apply-edition edition=advanced|enterprise|platinum|enterprise-xd license-server-address=<license_server_address> host-uuid=<uuid_of_host> license-server-port=<license_server_port> 

This has solved the problem immediately.

Nice.

IPSec VPN for mobile devices on Linux

I recently had the pleasure and challenge of setting up a VPN server for mobile devices on top of Linux. The common method to do so is IPSec + L2TP, as these are the two more common methods mobile devices support, and it should work quite well with other types of clients (although I did not test it) like Linux, Windows and Mac.

I decided to use PSK (Pre-Shared Key) due to its relative simplicity when handling multiple clients (compared to managing a certificate per device), and its relative simplicity of setup.

My VPN server platform is Linux, x86_64 (64 bit), CentOS 6, latest release, which is, for the time being, 6.3 plus some updates.

I have used the following link as a baseline, and added some extras about IPTables, which was a bit of a challenge, as I wanted good-enough security around this setup.

Initially, I wanted to use OpenSWAN; however, it does not integrate easily with a dynamic IP address, and its policy, while capable of being very precise, was not flexible enough to handle a varying local IP address.

First – Add the following two repositories: Nikoforge, for Racoon (ipsec-tools), and EPEL for xl2tpd.

You can add them the following way:

rpm -ivh http://repo.nikoforge.org/redhat/el6/nikoforge-release-latest
yum -y install http://vesta.informatik.rwth-aachen.de/ftp/pub/Linux/fedora-epel/6/i386/epel-release-6-7.noarch.rpm
yum -y install ipsec-tools xl2tpd

Following that, create a script called /etc/racoon/init.sh:

#!/bin/sh
# set security policies
echo -e "flush;\n\
        spdflush;\n\
        spdadd 0.0.0.0/0[0] 0.0.0.0/0[1701] udp -P in  ipsec esp/transport//require;\n\
        spdadd 0.0.0.0/0[1701] 0.0.0.0/0[0] udp -P out ipsec esp/transport//require;\n"\
        | setkey -c
# enable IP forwarding
echo 1 > /proc/sys/net/ipv4/ip_forward

Make sure this script is executable, and add it to /etc/rc.local.

Racoon config /etc/racoon/racoon.conf looks like this for my setup:

path include "/etc/racoon";
path pre_shared_key "/etc/racoon/psk.txt";
path certificate "/etc/racoon/certs";
path script "/etc/racoon/scripts";
#log debug;
remote anonymous
{
      exchange_mode    aggressive,main;
      #exchange_mode    main;
      passive          on;
      proposal_check   obey;
      support_proxy    on;
      nat_traversal    on;
      ike_frag         on;
      dpd_delay        20;
      #generate_policy unique;
      generate_policy on;
      verify_identifier on;
      proposal
      {
            encryption_algorithm  aes;
            hash_algorithm        sha1;
            authentication_method pre_shared_key;
            dh_group              modp1024;
      }
      proposal
      {
            encryption_algorithm  3des;
            hash_algorithm        sha1;
            authentication_method pre_shared_key;
            dh_group              modp1024;
      }
}
sainfo anonymous
{
      encryption_algorithm     aes,3des;
      authentication_algorithm hmac_sha1;
      compression_algorithm    deflate;
      pfs_group                modp1024;
}

The PSK is kept inside /etc/racoon/psk.txt. It looks like this for me (changed password, duh!):

myHome   ApAssPhR@se

Both said files (/etc/racoon/racoon.conf and /etc/racoon/psk.txt) should have root-only permissions, i.e. mode 600.
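Setting and checking the mode could look like this. A minimal sketch: restrict_to_owner and mode_of are hypothetical helper names, and the check simply reads the permission string printed by ls.

```bash
#!/bin/sh
# restrict_to_owner is a hypothetical helper: set mode 600
# (owner read/write only) on every file given as an argument.
restrict_to_owner() {
    chmod 600 "$@"
}

# mode_of is a hypothetical helper: print the permission string
# of a file as shown by ls, for example -rw-------
mode_of() {
    ls -ld "$1" | cut -c1-10
}

# Possible use, as root:
#   restrict_to_owner /etc/racoon/racoon.conf /etc/racoon/psk.txt
#   mode_of /etc/racoon/psk.txt
```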

Notice the myHome identifier. As the local address might change (either PPP dial-up or DHCP client), it is used instead of the local address as the common identifier of the connection. For Android devices, it is entered as the ‘IPSec Identifier’ value.

We need to set up xl2tpd: edit /etc/xl2tpd/xl2tpd.conf and have it look like this:

[global]
debug tunnel = no
debug state = no
debug network = no
ipsec saref = yes
force userspace = yes
[lns default]
ip range = 192.169.0.10-192.169.0.20
local ip = 192.169.0.1
refuse pap = yes
require authentication = yes
name = l2tpd
ppp debug = no
pppoptfile = /etc/ppp/options.xl2tpd
length bit = yes

The IP range defines the addresses given to the clients’ VPN interfaces. Its size should match the expected number of clients, or be somewhat larger, to be on the safe side. Don’t try to be a smart ass with it. Use explicit IP addresses. Easier that way. The “local” IP address should be outside the pool defined, or else a client might collide with it. I haven’t checked whether xl2tpd allows such a configuration. You are invited to test, although it’s rather pointless.
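To see how many clients a given range can serve, the arithmetic can be sketched as follows. pool_size is a hypothetical helper name, and the sketch assumes both ends of the range sit in the same /24, so only the last octet differs.

```bash
#!/bin/sh
# pool_size is a hypothetical helper: count the addresses in an
# xl2tpd-style "first-last" range.
# Assumption: both addresses share their first three octets.
pool_size() {
    first=${1%-*}    # text before the dash
    last=${1#*-}     # text after the dash
    echo $(( ${last##*.} - ${first##*.} + 1 ))
}

# The range used in the configuration above:
pool_size 192.169.0.10-192.169.0.20    # prints 11
```

Both ends of the range are included, hence the + 1.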

Create the file /etc/ppp/options.xl2tpd with the following contents:

ms-dns 192.168.0.2
require-mschap-v2
asyncmap 0
auth
crtscts
lock
hide-password
modem
debug
name l2tpd
proxyarp
lcp-echo-interval 10
lcp-echo-failure 100

The ms-dns option specifies the DNS server the clients will use. In my case, I have an internal DNS server, so I wanted them to use it. You can use either your internal server, if you have one, or Google’s 8.8.8.8, for example.

Almost done: you should add the relevant login info to /etc/ppp/chap-secrets. Each line should look like: "username" * "password" * . In my case, it would look like this:

# Secrets for authentication using CHAP
# client    server    secret            IP addresses
ez-aton        *    "SomePassw0rd"        *

Select a good password. Security should not be taken lightly.

Next, we need to enable IP forwarding, which can be done by adding the following lines to /etc/sysctl.conf:

# Controls IP packet forwarding
net.ipv4.ip_forward = 1

and then running ‘sysctl -p’ to load these values.

Run the following commands, and your system is ready to accept connections:

chkconfig racoon on
chkconfig xl2tpd on
service racoon start
service xl2tpd start
/etc/racoon/init.sh

That said – we have not configured IPTables, in case this server acts as the firewall as well. It does, in my case, so I have had to take special care for the IPTables rules.

As my rules are rather complex, I will only show the rules relevant to this setup, assuming (and this is important!) that the server is both the firewall/router and the VPN endpoint. If this is not the case, you should search for more details about forwarding IPSec traffic to a backend VPN server.

So, my IPTables rules would be these four:

iptables -A INPUT -p udp --dport 500 -j ACCEPT
iptables -A INPUT -p udp --dport 4500 -j ACCEPT
iptables -A INPUT -p esp -j ACCEPT
iptables -A INPUT -p 51 -j ACCEPT # Protocol 51 is AH. Not sure it's required, but too lazy to test without

This covers the IPSec part, however, we would not want the L2TP server to accept connections from the net, just like that, so it has its own rule, for port 1701:

iptables -A INPUT -p udp -m policy --dir in --pol ipsec -m udp --dport 1701 -j ACCEPT

You can save your current iptables rules (after checking that they work correctly) using ‘service iptables save’, or manually (backup your original rules to be on the safe side), and you’re all ready to go.

Good luck!

XenServer get VM by MAC

Using the GUI, identifying a VM based on its MAC address can be somewhat complex. There are several solutions on the net using PowerShell, but I will demonstrate it using a simple bash script, below. Save, make executable, and run.

Enjoy

 

#!/bin/bash
if [ -z "$1" ]
then
	echo "Requires parameter - MAC address"
	exit 1
fi
 
MAC=$1
# You might want to check MAC correctness here. Enjoy doing it. RegExp, man!

# XenServer is agnostic to case for MAC addresses, so we don't care
VIF_UUID=$(xe vif-list MAC="$MAC" --minimal)
if [ -z "$VIF_UUID" ]
then
	echo "No VIF found for MAC $MAC"
	exit 1
fi

# Read the parameter directly, so VM names containing spaces stay whole
VM=$(xe vif-param-get uuid="$VIF_UUID" param-name=vm-name-label)

echo "MAC $MAC has VM $VM"
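The script leaves the MAC correctness check as an exercise. One possible way to do it is sketched below; is_valid_mac is a hypothetical helper name, and it accepts only the colon-separated form of six hex octets.

```bash
#!/bin/sh
# is_valid_mac is a hypothetical helper: succeed if $1 is made of
# six two-digit hex octets separated by colons, in either case.
is_valid_mac() {
    printf '%s\n' "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
}

# Possible use near the top of the script:
#   if ! is_valid_mac "$1"
#   then
#       echo "Not a valid MAC address: $1"
#       exit 1
#   fi
```

Dash-separated or dotted notations are rejected on purpose, to keep the pattern short.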

Recycling an old and terrible two-year-old <$100 tablet

I find that noticing something missing from the fridge and adding it to a list does not work well for me. Either I take a mental note of the missing groceries and then, almost immediately, forget them until I unpack the just-purchased groceries back home, several days later, or I actually get myself to write it down on a note placed on the fridge, and then, of course, forget to take the note with me to the supermarket. Not working.

I had an old tablet which I purchased as my first Android device (and I’m not quite sure why I kept liking Android in general after the experience I had with it). This tablet was very weak when purchased (it hasn’t got better since), and very cheap (that’s why I purchased it). It’s called ‘Eken M001’ and you can read a review of it here.

This tablet was horrible when purchased. You can hardly do anything with it. However, I came up with an idea – why not use it to hold a grocery purchase list on the fridge and sync this list to my Android cell phone? Wow! A silly, and very cool idea, at the same time.

So, today I have (re)installed the device, configured it to support Hebrew (not that simple on Android 1.6, a breeze on Android 4 and above. Guess what version I have there…) and added that nice list application “OurGroceries”. The result is in the following pictures:

The fridge is hardly visible, acting as our white background. To prevent the device from falling, I have attached it to the sides of the fridge, and not to the door.

I hope it will actually solve my problem there. Could be nice :-)

RedHat cluster on RHEL6 and KVM-based VMs

The concept of running a virtual machine, KVM-based, in this case, under RHCS is acceptable and reasonable. The interesting part is that the <vm/> directive replaces the <service/> directive and acts as a high-level directive for VMs. This allows for things which cannot be performed with regular 'service', such as live migration. There are probably more, but this is not the current issue.

An example of how it can be done is shown in this excellent explanation. You can grab whatever parts of it are relevant to you, as there is an excellent combination of DRBD, CLVM, GFS and, of course, KVM-based VMs.

This whole guide assumes that the VMs reside on shared storage, which is concurrently accessible by both (all?) hosts. This is not always the case, for example when the filesystem is ext3/4 rather than GFS, and the virtual disk image file is located on it. In this particular case, you would want to tie the VM to the mount. This cannot be done, however, when using <vm/> as a top-level directive (like <service/>), as it does not allow for child resources.

As the <vm/> directive can be defined (with some limitations) as a child resource in a <service/> group, it inherits some properties from its parent (the <service/> directive), while some other properties are not mandatory and will be ignored. A sample configuration would be this:

<resources>
     <fs device="/dev/mapper/mpathap1" force_fsck="1" force_unmount="1" fstype="ext4" mountpoint="/images" name="vmfs" self_fence="0"/>
</resources>
<service autostart="1" domain="vm1_domain" max_restarts="2" name="vm1" recovery="restart">
     <fs ref="vmfs"/>
     <vm migrate="pause" name="vm1" restart_expire_time="600" use_virsh="1" xmlfile="/images/vm1.xml"/>
</service>

This would do the trick. However, the VM will not be able to live migrate, and will have to shut down and start up on each cluster takeover.

Mapping internal (SATA, SAS, RAID, etc) disks from XenServer host to VM

In my post here, I have explained (actually, created a shell script for) how to map USB disks directly to VMs. While this is easy and simple, it becomes more challenging when you want to map internal SATA disks. They are not attached to the “Removable Storage” SR, and thus behave differently.

The solution is to make them part of the “Removable Storage” group. This can be done by adding the following two lines at the bottom of the XenServer’s /etc/udev/rules.d/50-udev.rules:

 

ACTION=="add", KERNEL=="sdb", SYMLINK+="xapi/block/%k", RUN+="/bin/sh -c '/opt/xensource/libexec/local-device-change %k 2>&1 >/dev/null&'"
ACTION=="remove", KERNEL=="sdb", RUN+="/bin/sh -c '/opt/xensource/libexec/local-device-change %k 2>&1 >/dev/null&'"

 
Replace sdb with the device, as can be found using `cat /proc/partitions` (that way you can get the exact size, and compare it to what you expect to see). In this particular case, the device ‘sdb’ will be added to the “Removable Storage” group and then it’s all easy – just like I have described in my previous post.
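To avoid hand-editing the device name in both rules, the pair can be generated. This is a minimal sketch: xapi_udev_rules is a hypothetical helper name, and the rule text is copied from the two lines above with only the KERNEL value substituted.

```bash
#!/bin/sh
# xapi_udev_rules is a hypothetical helper: print the add/remove
# udev rule pair for the device name given as $1 (for example sdc).
# %k is left as-is for udev to expand to the kernel device name.
xapi_udev_rules() {
    cat <<EOF
ACTION=="add", KERNEL=="$1", SYMLINK+="xapi/block/%k", RUN+="/bin/sh -c '/opt/xensource/libexec/local-device-change %k 2>&1 >/dev/null&'"
ACTION=="remove", KERNEL=="$1", RUN+="/bin/sh -c '/opt/xensource/libexec/local-device-change %k 2>&1 >/dev/null&'"
EOF
}

# Possible use on the XenServer host, after backing up the rules file:
#   xapi_udev_rules sdc >> /etc/udev/rules.d/50-udev.rules
```

Review the printed rules before appending them, as a wrong device name would expose the wrong disk.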

I have had a great reference from here