Citrix XenServer 5.0 cannot cooperate with NetApp SnapMirror
It has been a long while, I know. I was busy with life, work and everything around it. Not much worth mentioning.
This, however, is something else.
I have discovered an issue with Citrix XenServer 5.0 (probably present in 5.5 as well, but I have other issues with that release) when using NetApp through the NetApp API SR: any snapshot not generated by XenServer is deleted as soon as any snapshot-related action is performed on that volume. For example, if I manually create a snapshot called “1111” (short and easy to recognize, especially among all the UUID-based volume, LUN and snapshot names XenServer uses…), then the next time anyone takes a snapshot of a machine which has a disk (VDI) on that volume, my snapshot “1111” is removed from the volume. The message in /var/log/SMlog looks like this:
Removing unused snap (1111)
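To check whether this is happening on your own system, you can search the log for that message. Below is a minimal sketch in Python 3, meant to be run against a copy of SMlog on a workstation rather than on the XenServer host itself. The function names are mine, and the only thing assumed about the log format is the message text quoted above.

```python
import re

# Matches the message quoted above. The match is greedy so that a snapshot
# name which itself contains parentheses is kept whole. Everything else on
# the log line (timestamps, PIDs and so on) is ignored.
REMOVED_SNAP = re.compile(r"Removing unused snap \((?P<name>.*)\)")


def removed_snapshots(lines):
    """Return the snapshot names found in 'Removing unused snap' messages."""
    names = []
    for line in lines:
        match = REMOVED_SNAP.search(line)
        if match:
            names.append(match.group("name"))
    return names


def removed_snapshots_in_file(path):
    """Same as removed_snapshots(), reading from a copy of the log file."""
    with open(path, errors="replace") as handle:
        return removed_snapshots(handle)
```

Calling removed_snapshots_in_file() on a copy of /var/log/SMlog returns every snapshot name that was removed this way. If names that do not look like XenServer UUIDs show up in the list, you are most likely hitting the issue described here.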
Under normal operation this does not matter much, as non-XenServer snapshots have little value. With NetApp SnapMirror, however, it matters a great deal, because the mechanism works a bit differently.
It appears that SnapMirror takes snapshots with predefined names (not the XenServer UUID type, luckily for us all). Each of these snapshots captures all the changes made since the previous SnapMirror snapshot, and they are used for replication. Unfortunately, XenServer deletes them. No SnapMirror snapshots means, quite obviously, no SnapMirror…
We did not detect this problem immediately, and I should take the blame for that. I should have defined a set of simple trial-and-error tests, as described above, instead of battling with a system I did not quite follow at that time: NetApp SnapMirror. Now I do, however, and I have an insight which can make your life better if you had issues with SnapMirror and XenServer and did not know how to make them work together. This solution cannot be an official one, due to its nature, which you will understand shortly. It is a personal patch, based on the hard fact that SnapMirror uses a predefined name for its snapshots. In my case, this name is the name of the DR storage device. You must figure out which name is used in the snapshot naming convention at your own site. Search for my ‘storagedr’ string and replace it with yours.
This is the diff for /opt/xensource/sm/NETAPPSR.py. Of course, back up your original file first. Again, this is not an official patch. It was tested to function correctly on XenServer 5.0, and it will not work on XenServer 5.5 (since NETAPPSR.py is different there). Last warning: it might break on the next update or upgrade of your XenServer environment, and if that happens, you had better monitor your SnapMirror status closely.
400,403c400,404
< util.SMlog("Removing unused snap (%s)" % val)
< out = netapplib.fvol_snapdelete_wrapper(self.sv, val, volname)
< if not na_test_result(out):
<     pass
---
> if 'storagedr' not in val:
>     util.SMlog("Removing unused snap (%s)" % val)
>     out = netapplib.fvol_snapdelete_wrapper(self.sv, val, volname)
>     if not na_test_result(out):
>         pass
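The idea behind the patch is simply a name filter placed in front of the delete call. Here is a stand-alone Python sketch of that logic, separate from NETAPPSR.py. The function names and the PROTECTED_MARKERS tuple are hypothetical and exist only for this illustration; 'storagedr' is the marker from my site, and yours will differ.

```python
# Substrings that mark a snapshot as "not ours to delete".
# 'storagedr' is my DR storage device name; replace it with your own.
PROTECTED_MARKERS = ("storagedr",)


def is_protected(snap_name, markers=PROTECTED_MARKERS):
    """True if the snapshot name contains any of the protected markers."""
    return any(marker in snap_name for marker in markers)


def snapshots_to_delete(unused_snaps, markers=PROTECTED_MARKERS):
    """Keep only the unused snapshots that are safe to remove."""
    return [name for name in unused_snaps if not is_protected(name, markers)]
```

With this filter, a manual snapshot such as “1111” would still be removed, while anything carrying the SnapMirror marker is left alone. A tuple of markers is used instead of a single string so that more than one naming convention could be protected, which goes beyond what the patch above does.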
Hope it helps!
This is a known issue where SnapMirror snapshots are deleted. It has been fixed in the StorageLink v1.0.5 maintenance pack. Note that the XenServer native NetApp adapter might still have the issue.
Thanks for your comment!
I have heard about this fix. However, I cannot use StorageLink, as I cannot use XenServer 5.5 just yet, due to other issues with it.
I have other issues with the StorageLink system as well: it is a Windows-based appliance. This doesn’t seem right to me, really. The entire world is moving towards Linux-based appliances, with the flexibility and power they bring.
Maybe it’s only me 🙂
Citrix is a Windows-based company.
Don’t like it? Move to KVM! 🙂
Are you serious? Do you really compare an enterprise-class solution to a hand-tailored, trial-and-error, patchy, script-driven KVM-based solution? Even Red Hat’s Qumranet was based on a set of lousy scripts.
I could have used Xen Community, which is nice, especially now that there are PV drivers for Windows out there for all of us to use, but it lacks some of the nice manageability features, so it is still far from ideal.
BTW, I have an automated provisioning system based on Xen Community (actually, CentOS 5.3’s old Xen Community) which works like a charm. Zero problems. However, I had to build my own set of scripts, which makes life somewhat more difficult in an enterprise environment.
When you think enterprise-class, you have to set your mind to look at things in the right proportions. Otherwise, well, it’s a mess someone has to clean up afterwards (something I have done numerous times during the last few years).
Now you just need to set your mind and look at Windows in the right proportions ^_^
Microsoft is no more enterprise-class than any other. Poor manageability and the difficulty of maintaining and restoring system state are major drawbacks of Windows systems.
Are you arguing just for the sake of arguing? Fine with me, but warn me first 🙂
We’ve got 2 IBM Blade Centers, 2 StoreVaults (1 S500 & 1 S550) and Xen 5 here. We’re just now finding the tip of the iceberg as far as obstacles go… Since what we set out to accomplish wasn’t completed by a vendor (that went under after we closed the upgrade project), we’re kind of starting over in a sense. One of our major obstacles now is discovering that we cannot utilize the SnapMirror/SnapRestore products. To quote NetApp support, “you cannot use our product unless our product first creates the LUNs”. Fair enough. I’ve also heard that if you create the LUNs using NetApp’s interface, you in turn lose the ability to migrate VMs from one Xen installation to another (on the fly). I’ve yet to verify this, but I thought I might as well ask before I put the time into fumbling through the setup just to run into another virtual wall.
One thing that is to our advantage is that the 2nd set of hardware isn’t in production yet. If we need to start over we can, and it looks like we might just have to do things this way. One of my questions at this point is: if we want to keep the XenMotion capability as well as mirror the volumes, what is the best way to proceed during the initial build? I honestly don’t care whether Xen or ONTAP manages the storage. I just want to build and preserve the key parts of the functionality that benefit us most (XenMotion and mirrored data sets).
Why would you like to use SnapRestore? Use XenServer’s built-in snapshot abilities with NetApp, and keep as many copies as you like.
You do not lose the ability to migrate. All your LUNs will be mapped directly to all Xen servers as generic iSCSI LVM LUNs. You will not be able to snapshot them using XenServer 5.0, only with XenServer 5.5. Given a bug with freeing space for LVM-based snapshots on XenServer 5.5, I would still try to delay using it.
You can use the XenServer native NetApp interface along with SnapMirror. We use it between storage devices, so I believe you could too, and there is probably an internal copying mechanism which will work just the same (volume-level, using snapshots). It works quite well. Check my blog post about how to prevent it from removing the “important” snapshots used by SnapMirror.
Sorry for the delay. Vacation.
Keep in touch about it. I truly believe you can get whatever you want from this system, with some minor effort.
So is StorageLink required, or does it just make things easier to set up between Xen, server hardware and 3rd party storage vendors?
It’s not required. It just eases your life with storage devices.
To be honest, I have not used StorageLink just yet. So far I have had the pleasure of using NetApp devices, and mostly XenServer 5.0.