ONTAP Simulator, and some insights about NetApp
First and foremost – the ONTAP Simulator, a great tool which can surely assist in learning the NetApp interface and how to use it, lacks performance. It has some built-in limitations – no FCP, no (virtual) disks larger than 1GB (per my trial and error; I might find out I was wrong somehow, and post it on this website), and low performance. I got about 300KB/s transfer rate both on iSCSI and on NFS. To make sure it was not due to some network hog hiding somewhere on my net(s), I even tried it from the simulator’s own host, but to no avail – still low performance. Don’t try to use it as your own home iSCSI target. Better just use Linux for this purpose, with the drivers obtained from here (it’s one of my next steps into “shared storage(s) for all”).
Another issue – after much reading through NetApp documentation, I’ve reached the following understanding of the product’s concepts. Please correct me if you see fit:
The older method was to create a volume (vol create) directly from disks, using either raid_dp or raid4.
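For illustration, creating such a traditional volume looked roughly like this – from memory, so verify against the man pages before relying on it (the volume name is made up; -t selects the raid type, -r the raid group size, and the trailing number is the disk count):

filer> vol create tradvol1 -t raid4 -r 8 14
filer> vol status -r tradvol1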
The current method is to create aggregates (aggr create) from disks. Each aggregate consists of raid groups. A raid group (rg) can be made up of up to eight physical disks. Each group of disks (an rg) has one or two parity disks, depending on the raid type (raid4 uses one parity disk, and raid_dp uses “double parity”, as its name suggests).
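As a sketch (again from memory), an aggregate of 16 disks split into raid groups of 8 would be created with something like:

filer> aggr create aggr1 -t raid_dp -r 8 16
filer> aggr status -r aggr1

The second command should show the resulting raid groups, each with its data and parity disks.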
Actually, I can assume that each aggregate is formatted using the WAFL filesystem, which leads to the conclusion that modern (flex) volumes are logical “chunks” of this whole WAFL layout. In the past, each volume was a separate WAFL-formatted unit, and each size change required adding disks.
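This is where the flexibility shows: a flex volume is carved out of an existing aggregate and can be grown (or shrunk) without touching the disks. Something along these lines – the names are mine, the syntax from memory:

filer> vol create flexvol1 aggr1 20g
filer> vol size flexvol1 +10g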
This separation of the flex volume from the aggregate suggests to me a multiple-root-capable WAFL. It would explain why there is no requirement for contiguous space on the aggregate. This eases space management, and allows for fast and easy “cloning” of volumes.
I believe that the new “clone” method is based on WAFL’s built-in snapshot capabilities. Although WAFL snapshots are supposed to be space-conservative, they require guaranteed space on the aggregate prior to committing the clone itself. If the aggregate is too crowded, they will fail with the error message “not enough space”. If there is enough for snapshots, but not enough to guarantee a full clone, you’ll get a message saying “space not guaranteed”.
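A rough sketch of what I mean (commands from memory; the space-guarantee interpretation is mine, so treat it as an assumption):

filer> vol clone create clonevol1 -b flexvol1
filer> vol options clonevol1 guarantee none

The first command bases the clone on a snapshot of flexvol1; the second, if I understand the space accounting correctly, relaxes the guarantee so the clone only consumes space for blocks that actually diverge from its parent.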
I see flex volumes as some combination of a filesystem (WAFL) and LVM, living together at the same level.
LUNs on NetApp: iSCSI and/or Fibre Channel LUNs are actually managed as a single (per-LUN) large file contained within a volume. This file has special permissions (I was not able to copy it or modify it while it was online, even with root permissions. However, I am rather new to NetApp technology), and it is exported to the outside as a disk. Much like an ISO image (which is a large file containing a whole filesystem layout), these files contain a whole disk layout, including partition tables, LVM headers, etc. – just like a real disk.
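The commands make that file-behind-a-disk notion fairly explicit. A hedged sketch (the initiator name and sizes are invented for the example):

filer> lun create -s 10g -t linux /vol/flexvol1/lun0
filer> igroup create -i -t linux ig_linux iqn.1994-05.com.example:client1
filer> lun map /vol/flexvol1/lun0 ig_linux 0

The LUN lives as the file /vol/flexvol1/lun0 inside the volume, while the client just sees a plain SCSI disk.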
Thinking about it, it’s neither impossible nor very surprising. A disk is no more than a container of data – of blocks – and if you can speak the required communication protocol used for accessing it and managing its blocks (i.e. the transport layer through which a filesystem accesses the block data), you can, with just a little translation interface, set up a virtual disk which will behave just like any regular disk.
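You can see the same idea on any Linux box with a loopback device – a plain file dressed up as a block device (nothing NetApp-specific here, just an illustration):

dd if=/dev/zero of=/tmp/disk.img bs=1M count=1024
losetup /dev/loop0 /tmp/disk.img
fdisk /dev/loop0

From that point on /dev/loop0 can be partitioned, formatted and mounted like any real disk, even though it is backed by a file.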
This brings us to the advantages of NetApp’s WAFL – the ability to minimize I/O while maintaining a set of snapshots for the system – a per-block modification history. It means you can “snapshot” your LUN, which is physically no more than a file on a WAFL-based volume, and go back with your data to a previous date – an hour, a day, a week ago. Time travel for your data.
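In practice this “time travel” is just the ordinary snapshot commands applied to the volume holding the LUN file. A sketch from memory (the snapshot name is mine, and the LUN should be taken offline on the client before reverting):

filer> snap create flexvol1 before_upgrade
filer> snap restore -s before_upgrade flexvol1

If I remember correctly, snap restore requires its own license, while plain snapshot creation does not.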
There are, unfortunately, some major side effects. If you’ve read the WAFL description from NetApp, my summary will seem inaccurate at best. If you haven’t, it will be enough, but you are still most encouraged to read it. The idea is that this filesystem is made out of multiple layers of pointers and of blocks. A pointer can point to more than one block. When you commit a snapshot, you do not copy data and you do not move data – you merely duplicate the set of pointers. When there is any change in the data (meaning a block is changed), the modified block is written elsewhere and the active pointer is updated to point to the new block, while the snapshot keeps its reference to the older block’s location. This way, only modified blocks are actually rewritten, while any unmodified data remains on the same spot on the physical disk. An additional claim of NetApp is that WAFL is optimized for the raid4 and raid_dp layouts they use, and utilizes them in a smart manner.
The problem with WAFL, as can easily be seen, is fragmentation. For CIFS and NFS it does not cause much of a problem, as the system is very capable of reading ahead to compensate. However, a LUN (which is supposed to act as one continuous layout, just like any hard drive or raid array in the world, and on which various filesystem-related operations occur) gets fragmented.
Unlike CIFS or NFS, LUN read-ahead is harder to predict, as the client is already trying to do just the same. Unlike real disks, NetApp LUNs do not behave, performance-wise, like the hard-drive layout any DB or FS has learned to expect and was optimized for. It means, to take my example, that a DB with lots of small changes will try to commit them in large write operations, flushed at regular intervals, and will strive to place them as close to each other – as contiguous – as possible. On a NetApp LUN this will cause fragmentation, and will result in lower write (and later read) performance.
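For what it’s worth, Data ONTAP does ship a reallocation tool aimed at exactly this kind of LUN fragmentation. I have not measured how much it helps, and the options below are from memory:

filer> reallocate measure /vol/flexvol1/lun0
filer> reallocate start -f /vol/flexvol1/lun0

The first command reports how badly the LUN’s layout has drifted; the second forces a full re-layout of the LUN file.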
That’s all for today.