Posts from the ‘Virtualization’ Category

KeyNote Part1

VMware: EVO:RAIL – It looks like our shift to SuperMicro for VSAN was the right choice. Will be looking for
EVO:Rack – A vBlock without limits? We will see.

OpenStack – VMware is pushing a massive amount of code to OpenStack so that OpenStack can control vSphere, NSX, etc., allowing people to run VMware APIs and OpenStack APIs for higher level functionality.

Containers – Docker, Google, Pivotal are allowing very clean and consistent operational deployments.

NSX – Moving security from the edge to Layer 2. Get ready to hear “Zero Trust networking”. The biggest challenge in Enterprise shops is that they are going to have to define and understand their networking needs on a granular level. For once, network security capability will outrun operational understanding. If you’re a sysadmin today, get ready to have to understand and defend every TCP connection your application makes, but take comfort in that policy engines will allow this discussion to only have to happen once.

Cloud Volumes – While I’m most excited about this as a replacement for Persona, there are so many use cases (Physical, Servers, ThinApp, Profiles, VDI) that I know it’s going to take some serious lab time to understand where all we can use this.

vCloud Air – In a final attempt to get SEs everywhere to quit calling it “Veee – Cheese”, VMware is re-branding the name. I was skeptical last year, but have found a lot of interest from clients in recent months as hurricane season closes in on Houston.

VMworld Day 1

I’m looking forward to this week and here are a few highlights of what I’ll be looking into.

On the tactical

1. Settling on a primary load balancing partner for VMware View. (Eying Kemp, anyone have any thoughts?). I’ve got a number of smaller deployments (a few hundred users) that need non-disruptive maintenance operations and patching on the infrastructure, and that are looking to take their smaller pilots or deployments forward.

2. Learn more about VDP-A designs, and best practices. I’ve seen some issues in the lab with snapshots not getting removed from the appliance and need to understand the scaling and design considerations better.

3. Check out some of the HOL updates. Find out if a VVOL lab in the office is worth the investment.

On the more general strategic goals.

Check out cutting edge vendors, and technologies from VMware.

CloudVolumes – More than just application layering. Server application delivery, profile abstraction that’s fast and portable, and a serious uplift to Persona and ThinApp. I’m really interested in use cases where it serves as a delivery method for ThinApp.

DataGravity – In the era of software defined storage this is a company making a case that an array can still provide a lot of value. Very interesting technology, but the questions remain. Does it work? Does it scale? Will they add more file systems, and how soon will EMC/HP buy them to bolt this logic into their Tier 1 arrays? Martin Glassborow has made a lot of statements that new vendors don’t do enough to differentiate, or that we’ve reached peak features (snaps, replication, cache/tier flash, data reduction etc), but it’s interesting to see someone potentially breaking out of this mold of just doing the same thing a little better or cheaper.

VSAN – Who is Marvin? What happened to Virsto? I’ve got questions and I hope someone has answers!

PSA: Developers and SQL admins do not understand storage

Thin Provisioning is one of my favorite technologies, but with all great technology comes great responsibility.

This afternoon I got a call from a customer having an issue with a SQL backup. They were preparing a major code push and were running a scripted full SQL backup to have a quick restore point if something goes wrong.
I was sent the following

10 percent processed.
20 percent processed.
30 percent processed.
40 percent processed.
50 percent processed.
60 percent processed.
70 percent processed.
80 percent processed.
90 percent processed.
Msg 64, Level 20, State 0, Line 0
A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The specified network name is no longer available.)

The server had frozen from a thin provisioning issue, but tracing through the workflow that caused this highlighted a common problem with SQL administrators everywhere. Backups were being written to the same volume/VMDK as the actual database. For every 1GB of SQL database there was another 10GB of backups, wasting expensive tier 1 storage.

The Problem:

SQL developers LOVE to make backups at the application level that they can touch/see/understand. They do not trust your magical Veeam/VDP-A. NTFS is a relatively thin-unfriendly file system (always writing to new LBAs when possible), so even if a database isn’t growing much, placing backups on the same volume means any attempt at staying thin, even on the back end array, is going to require extra effort to reclaim space. They also do not understand the concept of a shared failure domain, or data locality. If left to their own devices, they will put the backups on the same RAID group of expensive 15K or flash drives, and even on the same volume/VMDK if possible. Beyond the obvious problems for performance, risk, cost, and management overhead, this also means that your Changed Block Tracking and backup software is going to be backing up (or at least having to scan) all of these full backups every day.

The Solution:

Give up on arguing with them that your managed backups are good enough. Let them have their cake, but at least pick where the cake comes from and goes.
Create a VMDK on a separate array (in a small shop something as cheap as a SATA backed Synology can provide a really cheap NFS/iSCSI target for this). Exclude this drive from your backups (or adjust when it runs so it doesn’t impact your backup windows).
Carefully explain to them that this new VMDK (name the volume backups) is where backups go.
Now accept that they will ignore this and keep doing what they have been doing.
In Windows turn on file screens and block the file extensions for SQL backup files.
Next turn on reporting alerts to email you anytime someone tries to write such a file, so you’ll be able to preemptively offer to help them set up the maintenance jobs so they will work.
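Before turning on file screens, it helps to know how bad the problem already is. Here is a minimal sketch, in Python, that walks a volume and totals the bytes held in SQL backup files versus everything else. The extension list (.bak, .trn, .dif) and the function name are my own assumptions for illustration, not anything a particular shop or tool mandates, so adjust them to whatever your developers actually use.

```python
import os

# Assumed extensions for SQL Server backup files; adjust to local naming habits.
BACKUP_EXTENSIONS = {".bak", ".trn", ".dif"}


def backup_usage(root):
    """Return (backup_bytes, other_bytes) for all files found under root."""
    backup_bytes = 0
    other_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # File vanished or is unreadable; skip it rather than abort the scan.
                continue
            extension = os.path.splitext(name)[1].lower()
            if extension in BACKUP_EXTENSIONS:
                backup_bytes += size
            else:
                other_bytes += size
    return backup_bytes, other_bytes
```

Pointing this at the database volume gives a quick backup-to-data ratio to bring to the conversation with the SQL team.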

Migrate from Windows to Linux Appliance vCenter Server

A quick post here for people migrating to the VCSA. I just wanted to point out that the Inventory Snapshot tool at VMware Flings is a great way to ease the migration from a Windows to a Linux vCenter Server, or to help “backup” the configuration of a vCenter Server. It doesn’t get everything. You’ll still want to back up and restore distributed switching especially, and be aware that you’ll lose historical performance information, but it does simplify a lot of other re-work that would normally need to be done for the migration. The following does need to be redone or migrated separately, but at least this can help quite a bit.

– Cluster rules
– Cluster DRS groups
– Cluster EVC mode setting
– Customization Specifications
– Scheduled tasks
– vDS

Are containers our future?

This is a quick post in reaction to Alex Benik’s post at Gigaom. While I like Gigaom’s commentary on the industry at large, they don’t always seem to understand infrastructure. Alex starts out by stating that the current industry practice of separating out applications with their own dedicated OS instances, and having low utilization, is a terrible problem. He almost paints hypervisors as part of the problem. He cites 7% CPU usage on EC2 instances as a key example of what is wrong with virtualization and usage.

I’ve got a few quick thoughts on this.

1. The reason Amazon EC2 can be so cheap is that Amazon can over subscribe instances heavily. Low average CPU workloads are the foundation for virtualization and all kinds of other industries (shared web hosting etc). He’s turned the reason virtualization is a great cost saving technology into a problem that needs to be solved. If everyone were running their instances at 100% all the time, then there would be a problem.

2. He’s assuming that CPU is the primary bottleneck. As others (Jonathan Frappier) have pointed out, storage is often the bottleneck. There comes a point where you can only get so much disk IO to a virtual machine. In large enterprises with shared storage arrays, bottlenecks in storage IO (queue depths on HBAs, LUNs etc) eventually start to crop up, and it becomes easier to scale out to more hosts than to try to scale deep. VMware and others have created technologies (CBRC, vFRC, vSAN) that will help with this. Memory is also both helping and hurting this density problem.

3. Until the recent era of large memory hosts, memory was often a bottleneck. As 64 bit databases and applications became ever hungrier to cache data locally, this waged a two-front war on CPU utilization. Hosts with VMs with 16GB of RAM ran out of RAM long before they ran out of CPU. Memory and disk IO also subtly influence CPU in ways you might not factor in. Once memory is exhausted on a host and over subscription is occurring, CPU usage can spike as processes take longer to finish. In-memory workloads allow CPUs to process data quicker, and jobs finish sooner. Vendor recommendations for ridiculous memory allocations don’t help the situation either (I’m looking at you Sage). When vendors recommend 64GB of RAM for a database server serving 150 users, it has become clear that SQL monkeys everywhere have given up on actually doing proper indexing or archiving and instead are relying on memory cache. This demand on memory causes hosts to fill up long before CPU usage can become a problem, unless managers are willing to trust a balloon driver to intelligently swap out the “memory bloat”. (Internally and with customer vCloud deployments I’ve seen much better utilization by oversubscribing memory 2 or 3 times). This is not a bad thing unless an application has scale (it’s now cheaper to throw hardware at the problem than to write proper code/indexes/optimizations).

4. He’s also forgetting the reason we went with virtualization in the first place: to separate out applications so that we could update them independently of each other, and to stop running into issues where rebooting a server to fix one application caused another application to go down. Anyone who’s worked in shared tenant container hosting can tell you that it’s not really that great. Compatibility matrices, larger failure zones and all kinds of other problems can come up. For homogeneous web hosting it’s a fine solution. For the enterprise trying to mix diverse workloads it can be a nightmare. We use BSD containers internally for some websites, but beyond that we stick to hypervisors as a more general purpose, stable and easier to support platform. I’d argue JEOS, vFabric and other stripped down VM approaches are a better solution, as they enforce instance isolation while giving us massive efficiency gains over the kitchen sink deployments of old (I’m going to call out WebSphere on this).

Getting the ratio of CPU to memory to disk IO and capacity right is hard. Painfully hard. Given that CPU is often one of the cheapest components (and the most annoying to try to upgrade), it’s no wonder that IT managers everywhere, who come from a history of CPUs being the bottleneck, often get a little out of hand with overkill in CPU purchasing. I’ve been in a lot of meetings where I’ve had to argue even with internal IT staff that more CPU isn’t the solution (the graphs don’t lie!) while disk latency is through the roof. Strangely, I’d argue the current move to scale out (Nutanix/VSAN etc) might fix a lot of broken purchasing decisions (LOTS of CPU, low memory, low disk IO).
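To make the ratio problem concrete, here is a rough back-of-the-napkin sketch in Python of which resource caps VM density on a host. Everything in it is an assumption for illustration: the function name, the default 4:1 vCPU overcommit, and the example host sizes are mine, and it ignores disk IO entirely. It is not a sizing tool, just the arithmetic behind the argument above.

```python
def vms_per_host(host_cores, host_ram_gb, vm_vcpus, vm_ram_gb,
                 cpu_overcommit=4.0, ram_overcommit=1.0):
    """Return (max_vms, limiting_resource) for one host.

    cpu_overcommit and ram_overcommit are hypothetical ratios of
    allocated-to-physical resources, not vendor recommendations.
    """
    by_cpu = int(host_cores * cpu_overcommit // vm_vcpus)
    by_ram = int(host_ram_gb * ram_overcommit // vm_ram_gb)
    limit = "ram" if by_ram < by_cpu else "cpu"
    return min(by_cpu, by_ram), limit


# Hypothetical host: 16 cores, 128GB RAM, running 2 vCPU / 16GB VMs.
print(vms_per_host(16, 128, 2, 16))                      # no memory oversubscription
print(vms_per_host(16, 128, 2, 16, ram_overcommit=2.0))  # memory oversubscribed 2x
```

With these made-up numbers the host is memory-bound in both cases, which is the point: doubling the CPU purchase would not add a single VM, while oversubscribing memory 2x doubles density.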


I got the final parts (well, enough to bootstrap things) on Thursday, so the building has begun.

A couple quick observations on the switch, and getting VSAN up and ready for vCenter Server.

NetGear XS712T

1. Just because you mark a port as Untagged doesn’t mean anything. To have your laptop be able to manage on a non-default VLAN you’ll need to set the port’s PVID (Port VLAN ID) to the VLAN you want to use for management. Also, management can only be done on a single IP/VLAN, so make sure to set up a port with a PVID on that VLAN before you change it (otherwise it’s time for the reset switch).


2. Mac users should be advised that you can tag VLAN’s and create an unlimited number of virtual interfaces even on a thunderbolt adapter. Handy when using non-default VLANs for configuration. Click the plus sign in the bottom left corner of the network control panel to make a new interface, and then select the gear to manage it and change the VLAN.

3. It will negotiate 10Gig on a Cat5e cable (I’m going to go by Fry’s and get some better cables at some point here before benchmarking).

It’s trivial to set up a single host deployment.
First create the VSAN.
esxcli vsan cluster join -u bef029d5-803a-4187-920b-88a365788b12
(Alternatively you can go generate your own unique UUID)
Next up, find the NAA identifiers of a normal disk and an SSD by running this command.
esxcli storage core device list
Next up add the disks to the VSAN.
esxcli vsan storage add -d naa.50014ee058fdb53a -s naa.50015178f3682a73
After this you’ll want to add a VMkernel interface for VSAN and add some hosts, but with these commands you can have a one node system up and ready for vCenter Server installation in under 15 minutes.
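If you would rather generate your own cluster UUID than reuse the one above, the steps can be sketched in a few lines of Python. This only builds the two esxcli command strings shown above (using the same -u, -d and -s flags) so they can be pasted into an SSH session; it does not run anything on the host. The function name is made up for illustration, and the NAA identifiers are the ones from my lab, so substitute your own.

```python
import uuid


def single_node_vsan_commands(hdd_naa, ssd_naa, cluster_uuid=None):
    """Build the esxcli commands for a one node VSAN (hypothetical helper).

    If no cluster UUID is supplied, a random one is generated.
    """
    cluster_uuid = cluster_uuid or str(uuid.uuid4())
    return [
        f"esxcli vsan cluster join -u {cluster_uuid}",
        f"esxcli vsan storage add -d {hdd_naa} -s {ssd_naa}",
    ]


# Device identifiers come from `esxcli storage core device list` on the host.
for command in single_node_vsan_commands("naa.50014ee058fdb53a",
                                         "naa.50015178f3682a73"):
    print(command)
```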

For this lab I’ll be using the vCenter Server Appliance.

After installing the OVA you’ll want to run the setup script. You will need to first login to the command line interface. Mac users be warned, mashing the command key will send you to a different TTY.
The login is root/vmware. From the console run the network setup script. /opt/vmware/share/vami/vami_set_network
It can also be run with parameters attached for a quicker setup.
/opt/vmware/share/vami/vami_set_network eth0 STATICV4
After doing this you can log in from your browser using HTTPS on port 5480 and finish the setup. Example (

vSAN the cure for persistent VDI technical and political challenges.


What do you mean I have to redesign my entire storage platform just so a user can install an application!?!
What do you mean my legacy array/vendor is not well suited for VDI!
“Am I going to have to do a forklift upgrade every time I want to add xxx number of users?”
“Do I really want to have one mammoth VSP/VMAX serving that many thousand desktops and acting as a failure domain for that many users?”

I’m sure other VDI Architects and SEs in the field have had these conversations, and it’s always an awkward one that needs some quick white boarding to clear up. Often this conversation comes after someone has promised users that this is a simple change of a drop down menu, or after it has been implemented and is filling up storage and bringing the array to its knees. At this point the budget is all gone and the familiar smell of shame and disappointment is filling the data center as you are asked to pull off the herculean task of making a broken system work to fulfill promises that never should have been made. To make this worse, broken procurement processes often severely limit getting the right design or gear to make this work.

We’ve worked around this in the past by using software (Atlantis) or design changes (Pod design using smaller arrays), but ultimately we have been trying to cram a round peg (modular design storage and non-persistent desktops) into a square hole (scale out bursting random writes, and users expecting zero loss of functionality). We’ve rationalized these decisions (Non-Persistent changes how we manage desktops for the better!), but ultimately if VDI is going to grow out of being a niche technology it needs an architecture that supports the existing use cases as well as the new ones. Other challenges include environments trying to cut corners or deploy systems that will not scale because a small pilot worked (for example, using the same SAN for VDI and servers until scale causes problems). Often the storage administrators or the organization are strongly bound to a legacy or unnecessarily expensive vendor or platform (Do I really need 8 protocols, and 7 x 9’s of reliability for my VDI farm?).

The VSAN solves not only the technical challenges of persistent desktops (Capacity/performance scale out) but also solves the largest political challenge, the entrenched storage vendor.

I’ve seen many a VDI project’s cost rationalization break down because the storage admin forced it to use the existing legacy platform. This causes one of two critical problems.

1. The cost per IOPS/GB gets ugly, or requires seemingly random and unpredictable forklift upgrades with uneven scaling of cost.
2. The storage admin underestimates the performance needs and tries to shove it all on non-accelerated SATA drives.

vSAN allows the VDI team to do a couple things to solve these problems.

1. Cost Control. Storage is scaled at the same time as the hosts and desktops in an even, linear fashion. No surprise upgrades when you run out of ports/cache/storage processor power. Adjustments in the ratio of IOPS to capacity can be made gradually as nodes are added, and changing basic factors like the size of drives does not require swing migrations or rip and replace of drives.

2. Agility. Storage can be purchased without going through the storage team and the usual procurement tar pit that large scale array purchases involve. Servers can often be purchased with a simple PO from the standard vendor. Storage expansion beyond the current vendor generally requires a barking carnival of vendor pitches and apples to spaceship mismatched quotes. In bureaucratic government and Fortune 1000 shops this can turn into a year long mess that ends up under delivering. Because of the object system with dynamic protection, non-persistent disks can be deployed with RAID 0 on the same spindles.

3. Risk Mitigation. A project can start with as few as three hosts. There is no “all in” moment with a huge commitment of resources to get the initial pilot and sizing done, and scaling is guaranteed by virtue of the scale out design.

vSphere Distributed Storage and why it’s not going to be “production ready” at VMworld

vSphere Distributed Storage (or vSAN) is a potentially game changing feature for VMware. Being able to run its own flash caching, auto mirroring/striping storage system that’s fully baked into the hypervisor is powerful. Given that storage is such a huge part of the build out, it makes sense that this is a market in need of disruption.

Now as we all hold our breath for VMworld, I’m going to give my prediction that it will not be listed as production ready from day one, and here are my reasons.

1. VMware is always cautious with new storage technologies. VMware got burned by the SCSI UNMAP fiasco, and since then has been slow to release storage features directly. NFS cloning for View underwent extensive testing and tech preview status.
2. VMware doesn’t like to release home grown products straight to production. They do this with acquisitions (Mirage, View, Horizon Data, vCops) but they tread carefully with internal products. They are not Microsoft (shipping a broken snapshot feature for two versions was absurd).
3. The trust and disruption need to happen slowly. Not everyone’s workload fits scale out, and encouraging people to “try it carefully” sets expectations right. I think it will be undersold by a lot, and talked down by a lot of vendors, but ultimately people will realize that it “just works”. I’m looking for huge adoption in VDI, where a single disk array can often cause awkward bottlenecks. This also blunts any criticisms from the storage vendor barking carnival, and lets support for it build up organically. Expect shops desperate for an easier, cheaper way to scale out VDI and vCloud environments to turn to this. From a market side I expect an uptick in 2RU servers being used, and the back plane network requirements pushing low latency top of rack 10Gbps switching further into the mainstream for smaller shops and hosting providers who have been holding out.

These predictions I’m making are based on my own crystal ball. I’m not currently under any NDA for this product.
No clue what I’m talking about? Go check out this video

The Future of VMware EUC

So VMware View 5.2 is almost here, and there’s a sweeping set of changes.

First of all, the primary way to buy the suite is moving from per connection to per named user (yes, I know this is ugly for schools, but I’ve been promised by project managers they are going to work around this/play “let’s make a deal” to help people out).