
Posts from the ‘Virtual Desktop Infrastructure’ Category

VSAN is now up to 30% cheaper!

Ok, I’ll admit this is an incredibly misleading clickbait title. I wanted to demonstrate how the economics of cheaper flash make VMware Virtual SAN (and really any SDS product that is not licensed by capacity) cheaper over time. I also wanted to share a story of how older, slower flash became more expensive.

Let’s talk about a tale of two cities that had storage problems and faced radically different cost economics. One was a large city with lots of purchasing power and size; the other was a small little bedroom community. Who do you think got the better deal on flash?

Just a small town data center….

A 100 user pilot VDI project was kicking off. They knew they wanted great storage performance, but they could not invest in a big storage array with a lot of flash up front. They did not want to have to pay more tomorrow for flash, and they wanted great management and integration. VSAN and Horizon View were quickly chosen. They used the per-concurrent-user licensing for VSAN so their costs would scale cleanly and predictably. Modern, fast enterprise flash was chosen that cost ~$2.50 per GB and had great performance. This summer they went to expand the wildly successful project and discovered that the new version of the drives they had purchased last year now cost $1.40 per GB, and that other new drives on the HCL from the same vendor were available for ~$1 per GB. Looking at other vendors, they found even lower-cost options available. They upgraded to the latest version of VSAN and found improved snapshot performance, write performance, and management. Procurement could be done cost-effectively at small scale, and small projects could be added without much risk. They could even adopt the newest generation (NVMe) without having to forklift controllers or pay anyone but the hardware vendor.
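To put rough numbers on that, here is a minimal sketch using the per-GB prices quoted above; the 2TB of flash is an illustrative assumption, not a figure from the actual project.

```python
# Rough cost comparison for the small-town scenario above.
# Per-GB prices are the ones quoted in the post; the 2TB capacity
# is an illustrative assumption, not a figure from the project.

def flash_cost(capacity_gb, price_per_gb):
    """Cost of a flash purchase at a given $/GB."""
    return capacity_gb * price_per_gb

year1     = flash_cost(2000, 2.50)  # initial buy at ~$2.50/GB
year2     = flash_cost(2000, 1.40)  # same drive family a year later
year2_new = flash_cost(2000, 1.00)  # newer drives on the HCL

print(f"Year 1 buy:           ${year1:,.0f}")
print(f"Year 2, same drives:  ${year2:,.0f} ({1 - year2 / year1:.0%} cheaper)")
print(f"Year 2, newer drives: ${year2_new:,.0f} ({1 - year2_new / year1:.0%} cheaper)")
```

Buying only what the pilot needed and letting the price drop work in their favor is the whole "cheaper over time" argument.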

Meanwhile in the big city…..

The second city was quite a bit larger. After a year-long procurement process and dozens of meetings, they chose a traditional storage array/blade system from a Tier 1 vendor. They spent millions and bought years’ worth of capacity to leverage the deepest purchasing discounts they could. A year after deployment, they experienced performance issues and wanted to add flash. Upon discussing this with the vendor, the only option was older, slower, small SLC drives. They had bought their array at the end-of-sale window and were stuck with two-generations-old technology. It was also discovered that the array would only support a very small number of them (the controllers and code were not designed to handle flash). The vendor politely explained that since this was not part of the original purchase, the 75% discount off list that had applied to the original purchase would not apply, and they would need to pay $30 per GB. Somehow older, slower flash had become 4x more expensive in the span of a year. They were told they should have “locked in savings” and bought the flash up front. In reality, though, they would have been locking in a high price for a commodity that they did not yet need. The final problem they faced was an order to move out of the data center into 2-3 smaller facilities and split up the hardware accordingly. That big storage array could not easily be cut into parts.

There are a few lessons to take away from these environments.

  1. Storage should become cheaper to purchase as time goes on. Discounts should be consistent and pricing should not feel like a game show. Software licensing should not be directly tied to capacity or physical hardware, and should “live” through a refresh.
  2. Adding new generations of flash and compute should not require disruption and “throwing away” your existing investment.
  3. Storage products that scale down and up without compromise lead to fewer meetings, lower costs, and better outcomes. Large purchases often lead to the trap of spending a lot of time and money on avoiding failure, rather than focusing on delivering excellence.

Is VDI really not “serious” production?

This post is in response to a tweet by Chris Evans (whom I have MUCH respect for, and who is one of the people I follow on a daily basis on all forms of internet media). The discussion on Twitter was unrelated (it was about the failings of XtremIO), and the point that triggered this post was when he stated VDI is “not serious production”.

While I might have agreed 2-3 years ago, when VDI was often in POC or a plaything of remote road warriors or a CEO, VDI has come a long way in adoption. I’m working at a company this week with 500 users, and ALL users outside of a handful of IT staff work in VDI at all times. I’m helping them update their service desk operations, and a minor issue with VDI (profile server problems) is a critical full stoppage of the business. If all three of their critical LOB apps went down, it would be less of an impact; at least people could still access email, Jabber, and some local files.

There are two perspectives I have from this.

1. Some people are actually dependent on VDI to access all those 99.99999% uptime SLA apps, so it’s part of the dependency tree (see the availability sketch after this list).

2. We need to quit using 99.9% SLA uptime systems and processes to keep VDI up. It needs real systems, change control, monitoring, and budget. Two years ago I viewed vCOps for View as an expensive necessity; now I view it as a must-have solution. I’m deploying tools like Log Insight to get better information and telemetry on what’s going on, and training service desks on the fundamentals of VDI management (which used to be the task of a handful of sysadmins). While it may not replace the traditional PC, and in many ways is a middle ground towards some SaaS web/mobile app future, it’s a lot more serious today than a lot of people realize.
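A quick back-of-the-envelope way to see point 1: if the only way to reach the highly available app is through the VDI stack, the availabilities multiply. The individual SLA figures below are assumptions for illustration, not measured values.

```python
# Composite availability of a chain of serial dependencies.
# The individual SLA figures below are assumptions for illustration.

from functools import reduce

def composite_availability(availabilities):
    """Availability of a system that needs every component up at once."""
    return reduce(lambda a, b: a * b, availabilities, 1.0)

# If the "five nines plus" app is only reachable through the VDI stack,
# the user-visible SLA is capped by the weakest links in the chain.
chain = {
    "LOB app":         0.9999999,  # the 99.99999% app itself
    "VDI brokers":     0.999,      # assumed
    "profile servers": 0.999,      # assumed
    "VDI storage":     0.9995,     # assumed
}

print(f"End-to-end availability: {composite_availability(chain.values()):.4%}")
```

The weakest VDI component, not the app’s SLA, ends up defining what the user actually experiences.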

I’ve often joked that VDI is the technology of last resort when no other reasonable offering made sense (keeping data in the datacenter, apps that don’t work under RDS, organizations that can’t figure out patch/app distribution, a highly mobile but poorly secured workforce). For better or for worse it’s become the best tool for a lot of shops, and it’s time to give it the respect it deserves.

At least the tools we use to make VDI serious today (VSAN/vCOps/Log Insight/Horizon View 6) are a lot more serious than the stuff I was using 4 years ago.

My apologies for calling out Chris (which wasn’t really the point of this article), but I will thank him for giving me cause to reflect on the state of VDI “seriousness” today.

How does your organization view and depend on VDI today, and is there a gap in perception?

Keynote Part 1

VMware: EVO:RAIL – It looks like our shift to SuperMicro for VSAN was the right choice. We will be looking into it.
EVO:Rack – A vBlock without limits? We will see.

OpenStack – VMware is doing a massive amount of code contribution to OpenStack so that OpenStack can control vSphere, NSX, etc., allowing people to run VMware APIs and OpenStack APIs for higher-level functionality.

Containers – Docker, Google, and Pivotal are allowing very clean and consistent operational deployments.

NSX – Moving security from the edge to Layer 2. Get ready to hear “Zero Trust networking”. The biggest challenge in enterprise shops is that they are going to have to define and understand their networking needs on a granular level. For once, network security capability will outrun operational understanding. If you’re a sysadmin today, get ready to have to understand and defend every TCP connection your application makes, but take comfort that policy engines will allow this discussion to only have to happen once.
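To make the “define it once as policy” idea concrete, here is a sketch of an application allow-list expressed as data. This is not the NSX API or object model; the application name, tiers, and structure are purely illustrative.

```python
# Illustration only: this is NOT the NSX API or object model, just a sketch
# of what "define every TCP connection once, as policy" looks like as data.

app_policy = {
    "name": "hr-payroll",                       # hypothetical application
    "allowed_flows": [
        {"src": "web-tier", "dst": "app-tier", "port": 8443, "proto": "tcp"},
        {"src": "app-tier", "dst": "db-tier",  "port": 1433, "proto": "tcp"},
        {"src": "app-tier", "dst": "ad",       "port": 636,  "proto": "tcp"},
    ],
    "default": "deny",                          # zero trust: unlisted flows are dropped
}

def is_allowed(policy, src, dst, port, proto="tcp"):
    """Check a single flow against the application's allow-list."""
    return any(f["src"] == src and f["dst"] == dst and
               f["port"] == port and f["proto"] == proto
               for f in policy["allowed_flows"])

print(is_allowed(app_policy, "web-tier", "app-tier", 8443))  # True
print(is_allowed(app_policy, "web-tier", "db-tier", 1433))   # False -> dropped
```

The painful part is the one-time inventory of flows; once captured as policy, it follows the application rather than the firewall rule base.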

Cloud Volumes – While I’m most excited about this as a replacement for Persona, there are so many use cases (physical, servers, ThinApp, profiles, VDI) that I know it’s going to take some serious lab time to understand everywhere we can use this.

vCloud Air – In a final attempt to get SEs everywhere to quit calling it “Veee – Cheese”, VMware is rebranding the name. I was skeptical last year, but have found a lot of interest from clients in recent months as hurricane season closes in on Houston.

VSAN Flexibility for VDI POC and Beyond

Quick thoughts on VSAN flexibility compared to the hyper-converged offerings, and on solving the “how do I do a cost-effective POC –> Pilot –> Production rollout?” question without having to overbuild or forklift out undersized gear.

Traditionally I’ve not been a fan of scale-out solutions because they force you to purchase storage and compute at the same time (and often in fixed ratios). While this makes solving capacity problems easier (buy another node is the response to all capacity issues), you often end up with extra compute and memory to address unstructured and rarely utilized data growth. This also incurs additional per-socket license fees as you get forced into buying more sockets to handle the long-tail storage growth (VMware, Veeam, RedHat/Oracle/Microsoft). Likewise, if storage IO is fine, you’re stuck buying more of it to address growing memory needs.
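As a rough illustration of that per-socket penalty, here is a sketch of what capacity-only growth costs when it can only arrive bundled with compute. The node specs and prices are assumptions, not quotes.

```python
# Sketch of the "long-tail storage growth" licensing penalty described above.
# All prices and node specs are assumptions for illustration only.

SOCKETS_PER_NODE   = 2
LICENSE_PER_SOCKET = 7_000    # assumed blended per-socket software cost
NODE_HW_COST       = 15_000   # assumed per-node hardware cost
USABLE_TB_PER_NODE = 10       # assumed usable capacity per appliance node

def capacity_growth_cost(extra_tb_needed):
    """Cost of adding capacity when it can only arrive bundled with compute."""
    nodes = -(-extra_tb_needed // USABLE_TB_PER_NODE)  # ceiling division
    hardware = nodes * NODE_HW_COST
    licenses = nodes * SOCKETS_PER_NODE * LICENSE_PER_SOCKET
    return nodes, hardware, licenses

nodes, hw, lic = capacity_growth_cost(30)  # need 30TB more, but no more compute
print(f"{nodes} nodes, ${hw:,} in hardware, ${lic:,} in per-socket licenses "
      f"for compute we did not need")
```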

Traditional modular, non-scale-out designs have the problem that you tend to have to overbuild certain elements (switching, storage controllers, or cache) up front to hedge against costly and time-consuming forklift upgrades. Scale-out systems solve this, but the cost of growth can get more expensive than a lot of people like, and for the reasons listed above they limit flexibility.

Here’s a quick scenario I have right now that I’m going to use VSAN to solve, cheaply scaling through each phase of the project’s growth. This is an architect’s worst nightmare: no defined performance requirements for users, poorly understood applications, and a rapid testing/growth factor where the specs for the final design will remain organic.

I will start with a proof of concept for VMware View for 20 users. If it meets expectations it will grow into a 200-user pilot, and if that is liked, the next growth point can quickly reach 2000 users. I want a predictable scaling system with reduced waste, but I do not yet know the memory/CPU/storage IO ratio and expect to narrow down that understanding during the proof of concept and pilot. While I do not expect to need offload cards (APEX, GRID) during the early phases, I want to be able to quickly add them if needed. If we do not stay ahead of performance problems, or are not able to quickly adapt to scaling issues within a few days, the project will not move forward to the next phases. The datacenter we are in is very limited on rack space, power is expensive, and expansion is politically unpopular with management. I cannot run blades due to cooling/power density concerns. Reducing unnecessary hardware is as much about savings on CAPEX as OPEX.

For the proof of concept, start with a single 2RU 24 x 2.5” bay server (for example a Dell R710, or an equivalent SuperStorage 2027R-AR24NV 2U).

For storage, 12 x 600GB 10K drives and a PCI Express 400GB Intel 910 flash drive. The Intel card presents 2 x 200GB LUNs and can serve two 6-disk disk groups.
A pair of 6-core 2.4GHz Intel processors, and 16 x 16GB DIMMs for memory.
For network connectivity I will purchase 2 x 10Gbps NICs but likely only use GigE, as the 10Gbps switches will not need to be ordered until I add more nodes.
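A minimal sizing sketch for that single PoC node follows. The hardware numbers come from the build above; the 40GB and 4GB per-desktop figures are assumptions to be validated during the PoC itself.

```python
# Back-of-napkin capacity math for the single PoC node described above,
# before VSAN/FTT overhead. Per-VM numbers are assumptions for a ~24-VM PoC.

spindles = 12
drive_gb = 600
flash_gb = 400        # Intel 910, split into 2 x 200GB for two disk groups
ram_gb   = 16 * 16    # 16 x 16GB DIMMs

raw_hdd_gb  = spindles * drive_gb
flash_ratio = flash_gb / raw_hdd_gb   # rule of thumb is ~10% of consumed capacity

vm_count   = 24       # two dozen desktops plus infrastructure VMs
vm_disk_gb = 40       # assumed per-desktop disk footprint (clone growth + swap)
vm_ram_gb  = 4        # assumed per-desktop memory

print(f"Raw HDD capacity: {raw_hdd_gb} GB, flash-to-HDD ratio: {flash_ratio:.0%}")
print(f"Disk needed: {vm_count * vm_disk_gb} GB of {raw_hdd_gb} GB raw")
print(f"RAM needed:  {vm_count * vm_ram_gb} GB of {ram_gb} GB installed")
```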

I will bootstrap VSAN onto a single node (not a supported config, but it will work for the purposes of testing in a proof of concept) and build out a vCenter appliance, a single Composer, Connection, and Security server, and two dozen virtual machines. At this point we can begin testing, and we should have a solid basis for measuring memory/CPU/disk IOPS as well as delivering a “fast” VDI experience. If GRID is needed or a concern, it can also be added to this single server to test with and without it (as well as APEX tested for CPU offload of heavy 2D video users).

As we move into the pilot with 200 users, we have an opportunity to adjust things. While adding 2-3 more nodes we can also expand the disks by doubling the number of spindles to 24, or keep the disk size/flash amount at the existing ratios. If compute is heavier than memory we can back down to 128GB (even cannibalizing half the DIMMs in the first node) or adjust to more cores or offload cards. Once we have the base cluster of 3-4 nodes with disk, we can get a bit more radical in future adjustments. At this point 10Gbps or InfiniBand switching will need to be purchased, though existing stacked switches may have enough interfaces to avoid having to buy new switches or chassis modules.

As we move into production with nodes 4-8 and 1000 VMs and up, the benefits of VSAN really shine. If we are happy with the disk performance of the first nodes, we can simply add more spindles and flash to the first servers. If we do not need offload cards, dense Twin servers, Dell C6000, or HP SL2500t can be used to provide diskless nodes. If we find we have more complicated needs, we can resume expanding with the larger 2RU boxes. Ideally we can use the smaller nodes to improve density going forward. At this point we should have a better understanding of how many nodes we will need for full scaling, have desktops from the various user communities represented, and be able to predict the total node count. This should allow us to size the switching purchase correctly.
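For completeness, here is the kind of rough node-count projection that falls out of this approach once a consolidation ratio has been measured; the ~150 desktops per node used below is an assumption to be replaced with PoC/Pilot data.

```python
# Rough node-count projection for the later phases. The consolidation ratio
# is an assumption to be replaced with numbers measured in the PoC/Pilot.

import math

def nodes_needed(users, users_per_node, spare_nodes=1):
    """Hosts required for a user count, plus N+1 style headroom."""
    return math.ceil(users / users_per_node) + spare_nodes

for users in (200, 1000, 2000):
    print(f"{users:>4} users -> {nodes_needed(users, users_per_node=150)} nodes "
          f"(assuming ~150 desktops per node)")
```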

vSAN: the cure for persistent VDI technical and political challenges


“What do you mean I have to redesign my entire storage platform just so a user can install an application!?”
“What do you mean my legacy array/vendor is not well suited for VDI!”
“Am I going to have to do a forklift upgrade every time I want to add xxx number of users?”
“Do I really want to have one mammoth VSP/VMAX serving that many thousand desktops and acting as a failure domain for that many users?”

I’m sure other VDI architects and SEs in the field have had these conversations, and it’s always an awkward one that needs some quick whiteboarding to clear up. Oftentimes this conversation comes after someone has promised users that this is a simple change of a drop-down menu, or after it has been implemented and is filling up storage and bringing the array to its knees. At this point the budget is all gone and the familiar smell of shame and disappointment fills the data center as you are asked to pull off the herculean task of making a broken system work to fulfill promises that never should have been made. To make this worse, broken procurement processes often severely limit getting the right design or gear to make this work.

We’ve worked around this in the past by using software (Atlantis) or design changes (pod designs using smaller arrays), but ultimately we have been trying to cram a round peg (modular storage design and non-persistent desktops) into a square hole (scale-out bursting random writes, and users expecting zero loss of functionality). We’ve rationalized these decisions (“non-persistent changes how we manage desktops for the better!”), but ultimately, if VDI is going to grow out of being a niche technology, it needs an architecture that supports the existing use cases as well as the new ones. Other challenges include environments trying to cut corners or deploying systems that will not scale just because a small pilot worked (trying to use the same SAN for VDI and servers until scale causes problems). Oftentimes the storage administrators of an organization are strongly bound to a legacy or unnecessarily expensive vendor or platform (do I really need 8 protocols and seven 9s of reliability for my VDI farm?).

VSAN solves not only the technical challenges of persistent desktops (capacity/performance scale-out) but also the largest political challenge: the entrenched storage vendor.

I’ve seen many a VDI project’s cost rationalization break down because the storage admin forced it to use the existing legacy platform. This causes one of two critical problems.

1. The cost per IOPS/GB gets ugly, or requires seemingly random and unpredictable forklift upgrades with uneven scaling of cost.
2. The storage admin underestimates the performance needs and tries to shove it all on non-accelerated SATA drives.

vSAN allows the VDI team to do a couple things to solve these problems.

1. Cost control. Storage is scaled at the same time as the hosts and desktops, in an even/linear fashion. No surprise upgrades when you run out of ports/cache/storage processor power. Adjustments to the IOPS-to-capacity ratio can be made slowly as nodes are added, and changing basic factors like the size of drives does not require swing migrations or a rip-and-replace of drives (see the scaling sketch after this list).

2. Agility. Storage can be purchased without going through the storage team and the usual procurement tar pit that large-scale array purchases involve. Servers can often be purchased with a simple PO from the standard vendor, while storage expansion beyond the current vendor generally requires a barking carnival of vendor pitches and apples-to-spaceships mismatched quotes. In bureaucratic government and Fortune 1000 shops this can turn into a year-long mess that ends up under-delivering. Because of the object system with dynamic protection policies, non-persistent disks can be deployed with RAID 0 on the same spindles.

3. Risk mitigation. A project can start with as few as three hosts. There is no “all in”, huge commitment of resources to get the initial pilot and sizing done, and scaling is guaranteed by virtue of the scale-out design.
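Here is the scaling sketch referenced in point 1: node-based cost grows in small, even steps, while array-based cost jumps when a controller or shelf limit is hit. Every dollar figure here is an illustrative assumption, not a quote.

```python
# Sketch of the linear-scaling argument from point 1: node-based cost grows in
# small, even steps, while array-based cost jumps at a scaling cliff.
# Every dollar figure here is an illustrative assumption.

NODE_COST      = 20_000   # assumed per-node cost (server + disks + licensing)
ARRAY_BASE     = 250_000  # assumed array purchase covering the first ~1000 users
ARRAY_FORKLIFT = 250_000  # assumed second array/controller pair past that point

def node_model(users, users_per_node=150):
    """Scale-out: buy only the nodes the current user count requires."""
    return -(-users // users_per_node) * NODE_COST

def array_model(users):
    """Modular array: big buy up front, another big buy at the scaling cliff."""
    return ARRAY_BASE + (ARRAY_FORKLIFT if users > 1000 else 0)

for users in (200, 600, 1000, 1200, 2000):
    print(f"{users:>4} users: nodes ${node_model(users):>9,}  array ${array_model(users):>9,}")
```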

vSphere Distributed Storage and why it’s not going to be “production ready” at VMworld

vSphere Distributed Storage (or vSAN) is a potentially game-changing feature for VMware. Being able to run its own flash-caching, auto-mirroring/striping storage system that’s fully baked into the hypervisor is powerful. Given that storage is such a huge part of the build-out, it makes sense that this is a market in need of disruption.

Now, as we all hold our breath for VMworld, I’m going to give my prediction that it will not be listed as production ready from day one, and here are my reasons.

1. VMware is always cautious with new storage technologies. VMware got burned by the SCSI UNMAP fiasco, and has since been slow to release storage features directly. NFS cloning for View underwent extensive testing and tech preview status.
2. VMware doesn’t like to release home-grown products straight to production. They do this with acquisitions (Mirage, View, Horizon Data, vCOps), but they tread carefully with internal products. They are not Microsoft (shipping a broken snapshot feature for two versions was absurd).
3. The trust needs to be built and the disruption needs to happen slowly. Not everyone’s workload fits scale-out, and encouraging people to “try it carefully” sets expectations right. I think it will be undersold by a lot and talked down by a lot of vendors, but ultimately people will realize that it “just works”. I’m looking for huge adoption in VDI, where a single disk array can often cause awkward bottlenecks. This also blunts any criticism from the storage vendor barking carnival, and lets support for it build up organically. Expect shops desperate for an easier, cheaper way to scale out VDI and vCloud environments to turn to this. From the market side I expect an uptick in 2RU servers being used, and the backplane network requirements pushing low-latency top-of-rack 10Gbps switching further into the mainstream for smaller shops and hosting providers who have been holding out.

These predictions I’m making are based on my own crystal ball. I’m not currently under any NDA for this product.
No clue what I’m talking about? Go check out this video

New vCloud Calculator (and my tips)

myvirtualcloud is at it again with an update to the most scientific attempt at VDI sizing ever attempted. While part of me still views VDI sizing as a dark art, I have to admit he’s thought of a lot of the little things (like how much vswap changes for 3D RAM).
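For a sense of why that detail matters, here is a simplified sketch of the per-VM swap math. The formulas are an approximation (vswap roughly equals configured memory minus reservation, with 3D vRAM treated as additional per-VM overhead), not the calculator’s exact logic, and the desktop counts are assumptions.

```python
# Simplified per-VM swap math to show why 3D settings move the storage numbers.
# These formulas are approximations, not the calculator's exact logic.

def vswap_gb(configured_mem_gb, reservation_gb=0.0):
    """The .vswp file is sized to unreserved guest memory."""
    return max(configured_mem_gb - reservation_gb, 0.0)

def video_overhead_gb(enable_3d, vram_mb=96):
    """Assumption: 3D vRAM shows up as additional per-VM overhead to store."""
    return vram_mb / 1024 if enable_3d else 0.0

desktops = 500
per_vm   = vswap_gb(2, reservation_gb=0) + video_overhead_gb(enable_3d=True)

print(f"Swap/overhead per desktop: {per_vm:.2f} GB")
print(f"Across {desktops} desktops: {desktops * per_vm:,.0f} GB of datastore just for swap")
```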


The Future of VMware EUC

So VMware View 5.2 is almost here, and there’s a sweeping set of changes.

First of all, the primary way to buy the suite is moving from per-connection to per-named-user licensing (yes, I know this is ugly for schools, but I’ve been promised by project managers that they are going to work around this/play let’s-make-a-deal to help people out).
