The VSAN Build, Part 1

Parts are starting to roll in for this and next week's new project: a VSAN to take over our old lab. The SMS was getting long in the tooth, and the remaining servers were either too old or had been hijacked for the internal VDI environment. We have been aware of this project for a few years now and have been partly sandbagging a major lab overhaul while waiting on a firm release date. VMware has put out a call to arms on testing the new product, and we really wanted to put it through its paces before it's available to our customers.

Here are the initial hardware specs (subject to change based on things not working, or Ingram sending me the wrong part).

For servers I have three of the following:
ASUS RS720-X7/RS8 2U
Intel Ethernet Converged Network Adapter X540-T1
ASUS PIKE 2008 (8 port LSI)
3 x Samsung 16GB 240-pin DDR3-1333 ECC Registered SDRAM
Intel Xeon E5-2620 Sandy Bridge-EP 2.0GHz (2.5GHz Turbo Boost) 15MB L3 Cache LGA 2011 95W Six-Core
6 x WD Red 2TB 5400RPM SATA drives
1 x Intel 240GB DC S3500 SSD flash drive

For switching I have one of the following:
NetGear XS712T 12 x 10Gbps RJ-45 SmartSwitch

Here's the justification for the parts chosen, and thoughts on where to upgrade if this were to be more than a lab.

1. The Server. This was pretty much one of the cheapest 8-drive servers money could buy. Honestly, Supermicro would have been a consideration except their HBA was more expensive. LFF was also a design requirement (the lab has a high-capacity, low-IOPS need), and 8 drives was the target. 4 x 1Gbps on-board NICs (and a 5th for IPKVM) isn't a bad thing to have bundled. 2RU was a requirement as it opened up additional options for FC, PCIe flash, SAS expansion trays, etc. My only complaint is the lack of an internal SD card slot. Personally I don't enjoy 1RU pizza-box servers in the lab as the fans spin a lot louder. If this were a production system wanting tier 1 hardware, a Cisco UCS C240 M3 or a Dell R720xd would be good options.

2. The Memory – It's cheap, and 144GB of RAM across the cluster should be enough to get me started. Down the road we may add more. If this were a production setup I likely wouldn't go with anything less than 128GB or 192GB per host.

3. The CPU – Our lab loads are relatively light, but I wanted something modern so I would have a baseline for VSAN CPU usage. As we scale up and need more memory slots I suspect we'll end up putting a second CPU in. I wanted something that, under reasonable VDI Composer and other testing, could give me a baseline so I know how to scale CPU/memory/IOPS ratios going forward.
Drives piling up!

4. The Drives – Our lab generally has a LOT of VMs sitting around doing nothing. Because of our low IOPS/GB ratio I'm violating the recommendation of 1:10 flash to normal spinning disk (see the quick math after this list). WD Reds were chosen for the cheapest price possible while still having proper TLER settings that will not cause them to drop out randomly and cause rebuild issues. They are basically prosumer-grade drives, and if this lab had anything important I would upgrade to at least WD RE4, Hitachi Ultrastar, or Seagate Constellation NL-SAS drives. If this were production I'd likely be using 10K 900GB drives, as the IOPS/capacity ratio is quite a bit better. A huge part of VSAN, CBRC, vFlash, and VMware's storage policy engine is separating performance from capacity, so I'm likely going to push flash reservations and other technologies to their limits. The flash drives chosen were Intel DC S3500s, as Intel has a strong pedigree for reliability and the DC series introduces a new standard in consistency: even full and under load they maintain consistent IOPS. While the S3500's endurance is decent, it's not really designed for large-scale production write logging. If building a production system, the S3700 or even the PCIe-based Intel 910 would be a much better selection for more than just the obvious jump in performance.

5. The Network – I'm sure everyone looking at the model numbers is supremely confused. The selection really boiled down to me wanting to test more than just the VSAN, and to do it on a budget. I wanted to test 10Gbps RJ-45, SR-IOV, and Intel CNAs without spending $10K on NICs, switches, and cables. Even going to eBay for used Arista switches wasn't going to keep the budget low enough. Netgear's $1500 switch delivers $125 ports with no need for GBICs, and Intel's CNAs pack a lot of features for a third the price of their optical cousins. I'll admit the lack of published PPS specs and anemic buffers may come back to haunt me, but I can fall back on the 5 GigE NICs and my old GigE switching if I need to, and this was all too cheap not to take a shot at. For a production upgrade (and possibly to replace this thing) I would look at at least a Brocade 6650 (a billion PPS), or maybe even a VDX 6720 if I'm wanting something a little more exciting.
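Since I admitted up in item 4 to violating the 1:10 flash rule of thumb, here is the back-of-the-napkin math behind that, as a quick PowerShell sketch using the per-host drive counts listed above. Purely illustrative; the numbers are just the ones from this build.

# Rough flash-to-spinning-capacity math for one host in this build (illustrative only)
$spindleGB = 6 * 2000      # six 2TB WD Reds per host
$flashGB   = 240           # one 240GB Intel DC S3500 per host
$ratio     = $flashGB / $spindleGB
"Flash is {0:P1} of raw spinning capacity (a 1:10 ratio would be 10%)" -f $ratio

That works out to roughly 2% flash, so well under the guidance, which is fine for this lab's low IOPS/GB profile but not something I'd copy for production.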

vSAN: the cure for persistent VDI's technical and political challenges


“What do you mean I have to redesign my entire storage platform just so a user can install an application!?”
“What do you mean my legacy array/vendor is not well suited for VDI?!”
“Am I going to have to do a forklift upgrade every time I want to add xxx number of users?”
“Do I really want to have one mammoth VSP/VMAX serving that many thousand desktops and acting as a failure domain for that many users?”

I'm sure other VDI architects and SEs in the field have had these conversations, and it's always an awkward one that needs some quick whiteboarding to clear up. Often this conversation happens after someone has promised users that this is a simple change of a drop-down menu, or after it has been implemented and is filling up storage and bringing the array to its knees. At this point the budget is all gone and the familiar smell of shame and disappointment fills the data center as you are asked to pull off the herculean task of making a broken system fulfill promises that never should have been made. To make this worse, broken procurement processes often severely limit getting the right design or gear to make it work.

We've worked around this in the past with software (Atlantis) or design changes (pod designs using smaller arrays), but ultimately we have been trying to cram a round peg (modular-design storage and non-persistent desktops) into a square hole (scale-out bursting random writes, and users expecting zero loss of functionality). We've rationalized these decisions (non-persistent changes how we manage desktops for the better!), but ultimately, if VDI is going to grow out of being a niche technology it needs an architecture that supports the existing use cases as well as the new ones. Other challenges include environments trying to cut corners or deploying systems that will not scale just because a small pilot worked (trying to use the same SAN for VDI and servers until scale causes problems). Oftentimes the storage administrators of an organization are strongly bound to a legacy or unnecessarily expensive vendor or platform (do I really need 8 protocols and seven 9's of reliability for my VDI farm?).

VSAN solves not only the technical challenges of persistent desktops (capacity/performance scale-out) but also the largest political challenge: the entrenched storage vendor.

I’ve seen many a VDI project’s cost rationalization break down because the storage admin forced it to use the existing legacy platform. This causes one of two critical problems.

1. The cost per IOPS/GB gets ugly, or requires seemingly random and unpredictable forklift upgrades with uneven scaling of cost.
2. The storage admin underestimates the performance needs and tries to shove it all on non-accelerated SATA drives.

vSAN allows the VDI team to do a couple things to solve these problems.

1. Cost Control. Storage is scaled at the same time as the hosts and desktops, in an even, linear fashion. No surprise upgrades when you run out of ports, cache, or storage processor power. Adjustments to the IOPS-to-capacity ratio can be made gradually as nodes are added, and changing basic factors like the size of drives does not require swing migrations or a rip-and-replace of drives.

2. Agility. Storage can be purchased without going through the storage team and the usual procurement tar pit that large-scale array purchases involve. Servers can often be purchased with a simple PO from the standard vendor, while storage expansion beyond the current vendor generally requires a barking carnival of vendor pitches and apples-to-spaceships mismatched quotes. In bureaucratic government and Fortune 1000 shops this can turn into a year-long mess that ends up under-delivering. And because of the object system with dynamic protection policies, non-persistent disks can be deployed with RAID 0 on the same spindles as everything else (see the policy sketch after this list).

3. Risk Mitigation. A project can start with as few as three hosts. There is no huge "all in" commitment of resources to get the initial pilot and sizing done, and scaling is guaranteed by virtue of the scale-out design.
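To make that RAID 0 point a little more concrete, here is a rough PowerCLI sketch of what a "no mirroring for non-persistent desktops" policy could look like. This assumes the SPBM cmdlets in recent PowerCLI builds and the VSAN capability names I've seen so far (VSAN.hostFailuresToTolerate, VSAN.stripeWidth); check Get-SpbmCapability against your own build, and treat the policy name and vCenter name as made-up examples.

# Sketch only: a VSAN storage policy with no mirroring (RAID 0-style) for stateless desktops.
# Cmdlets and capability names assumed from the PowerCLI SPBM module; verify with Get-SpbmCapability.
Connect-VIServer vcenter.lab.local   # hypothetical vCenter name
$ftt    = New-SpbmRule -Capability (Get-SpbmCapability -Name "VSAN.hostFailuresToTolerate") -Value 0
$stripe = New-SpbmRule -Capability (Get-SpbmCapability -Name "VSAN.stripeWidth") -Value 2
$rules  = New-SpbmRuleSet -AllOfRules $ftt, $stripe
New-SpbmStoragePolicy -Name "NonPersistent-RAID0" -Description "No mirroring for stateless desktops" -AnyOfRuleSets $rules

Persistent desktops in the same cluster would simply get a different policy with hostFailuresToTolerate set to 1 or higher: same spindles, different protection.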

vSphere Distributed Storage and why it's not going to be "production ready" at VMworld

vSphere Distributed Storage (or vSAN) is a potentially game changing feature for VMware. Being able to run its own flash caching, auto mirroring/striping storage system that’s fully baked into the hypervisor is powerful. Given that storage is such a huge part of the build out, it makes sense that this is a market in need of disruption.

Now, as we all hold our breath for VMworld, I'm going to give my prediction that it will not be listed as production ready from day one, and here are my reasons.

1. VMware is always cautious with new storage technologies. VMware got burned by the SCSI UNMAP fiasco, and since then has been slow to release storage features directly. NFS cloning for View underwent extensive testing and tech preview status.
2. VMware doesn't like to release home-grown products straight to production. They do this with acquisitions (Mirage, View, Horizon Data, vCOps), but they tread carefully with internal products. They are not Microsoft (shipping a broken snapshot feature for two versions was absurd).
3. The trust and disruption need to happen slowly. Not everyone's workload fits scale-out, and encouraging people to "try it carefully" sets expectations right. I think it will be undersold by a lot and talked down by a lot of vendors, but ultimately people will realize that it "just works." I'm looking for huge adoption in VDI, where a single disk array often causes awkward bottlenecks. This also blunts any criticisms from the storage vendor barking carnival and lets support for it build up organically. Expect shops desperate for an easier, cheaper way to scale out VDI and vCloud environments to turn to this. From a market side I expect an uptick in 2RU servers being used, and the backplane network requirements pushing low-latency top-of-rack 10Gbps switching further into the mainstream for smaller shops and hosting providers who have been holding out.

These predictions I’m making are based on my own crystal ball. I’m not currently under any NDA for this product.
No clue what I’m talking about? Go check out this video

VMware View Persona and HNAS BlueArc

A quick note for HNAS users looking to enable VMware Persona: while the BlueArcs are fine for offloading profiles to, SMB2 is disabled by default. Since this is a requirement for Persona to work (well, I think it's required for alternate data streams to work), you will need to enable it at the CLI using the cifs-smb2-enable command.

Also of note: Windows 8 will REQUIRE SMB signing if it sees SMB2, but this can be disabled with PowerShell (example here, and a rough sketch below) or by GPO. The HNAS 3080 and 3090 do not support SMB signing as of the current release (I understand it's a performance issue they are working on). Also, the SMB2 setting needs to be turned on for each EVS needing this support; it is not a global or cluster-wide setting.
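For the Windows 8 side, here is roughly what the client configuration looks like with the SmbShare module that ships with Windows 8 / Server 2012. A sketch only; whether you relax signing on the client like this or handle it via GPO is a security call for your own environment, so lab-test it first.

# Check the current client-side signing requirement, then turn it off (relaxing signing is a security trade-off)
Get-SmbClientConfiguration | Select-Object RequireSecuritySignature
Set-SmbClientConfiguration -RequireSecuritySignature $false -Force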