VSAN Flexability for VDI POC and Beyond
Quick thoughts on VSAN flexibility compared to the Hyper Converged offerings, and solving the “how do I do a cost effective POC, —> Pilot —> Production roll out?” without having to overbuild or forklift out undersized gear.
Traditionally I’ve not been a fan of scale out solutions because they force you to purchase storage and compute at the same time (and often in fixed ratio’s). While this makes solving capacity problems easier (Buy another node is the response to all capacity issues) you often end up with extra compute and memory to address unstructured and rarely utilized data growth. This also incurs additional per socket license fees as you get forced into buying more sockets to handle the long tail storage growth (VMware, Veeam, RedHat/Oracle/Microsoft). Likewize if storage IO is fine, your stuck still buying more to address growing memory needs.
Traditional modular non-scale out designs have the problem in that you tend to have to overbuild certain elements (Switching, Storage Controllers or Cache) up front, to hedge against costly and time consuming forklift upgrades. Scale out systems solve this, but the cost of growth can get more expensive than a lot of people like, and for the reasons listed above limit flexibility.
Here’s a quick scenario I have right now that I’m going to use VSAN to solve and cheaply scale through each phase of the projects growth. This is an architects worst nightmare. No defined performance requirements for users and poorly understood applications, and a rapid testing/growth factor where the spec’s for the final design will remain organic.
I will start with a proof of concept for VMware View for 20 users. If it meets expectations It will grow into a 200 user Pilot, and if that is liked, the next growth point can quickly reach 2000 users. I want a predictable scaling system with reduced waste, but I do not yet know the Memory/CPU/Storage IO ratio and expect to narrow down the understanding during the proof of concept and pilot. While I do not expect to need offload cards (APEX, GRID) during the early phases I want to be able to quickly add them if needed. If we do not stay ahead of performance problems, or are not able to quickly adapt to scaling issues within a few days the project will not move forward to the next phases. The datacenter we are in is very limited on rack space. power or is expensive and politically unpopular with management. I can not run blades due to cooling/power density concerns. Reducing unnecessary hardware is as much about savings on CAPEX as OPEX.
For the Proof of Concept start with a single 2RU 24 x 2.5” bay server. (Example Dell R710, or equivalent Superstorage 2027R-AR24NV 2U).
For storage, 12 x 600GB 10K drives, and a PCI-Express 400GB Intel 910 Flash drive. The intel presents 2 x200Gb LUN’s and can serve 2 x 6 disk disk groups.
A pair of 6Core 2.4Ghz Intel Processors 16 x 16GB DIMMs for memory.
For Network connectivity I will purchase 2 x 10Gbps NIC’s but likely only use GigE as the switches will not be needed to be ordered until I add more nodes.
I will bootstrap VSAN onto a single node (not a supported config, but will work for the purposes of testing in a Proof of Concept) and build out a vCenter Appliance, a single Composer, Connection and security server, and two dozen virtual machines. At this point we can begin testing, and should have a solid basis for testing Memory/CPU/Disk IOPS as well as delivering a “fast” VDI experience. If GRID is needed or a concern it can also be added to this single serve to test with and without it (as well as APEX tested for CPU offload of heavy 2D video users).
As We move into the Pilot with 200 users, we have an opportunity to adjust things. With adding 2-3 more nodes we can also expand the disks by doubling the number of spindles to 24, or keep the disk size/flash amount at existing ratio’s. If Compute is heavier than memory we can back down to 128GB (cannibalize half the dimms in the first node even) or even adjust to more core’s or offload cards. At this point we have the base cluster of 3-4 nodes with disk, we can get a bit more radical in future adjustments. At this point 10Gbps or Infiniband switching will need to be purchased. Existing stacked switches though may have enough interfaces to avoiding having to buy new switches or modules for chassis.
As we move into production and nodes 4-8 use 1000 VM’s and up the benefits of VSAN really shine. If we are happy with the disk performance of the first nodes, we can simply add more spindles and flash to the first servers. If we do not need offload cards, dense TWIN Servers, Dell C6000, or HP Sl2500t can be used to provide disk-less nodes. If we find we have more complicated needs, we can resume expanding with the larger 2RU boxes. Ideally we can use the smaller nodes to improve density going forward. At this point we should have a better understanding of how many nodes we will need for full scaling and have desktops from the various user communities represented and be able to predict the total node count. This should allow us to size the switching purchase correctly.