Skip to content

Posts from the ‘Virtualization’ Category

LSI 2008 Dell H310 VSAN rebuild performance concerns

Just a quick note for anyone seeing VSAN performance issues with Dell H310 or LSI 2008 controllers. Its not a secret that the LSI 2008 and H310 Dell with stock firmwares have a very shallow queue depth of 25 (a LSI 2208 in comparison is 600, and a Dell H910 is 975). These are some of the weakest cards to be certified on the HCL and for a small ROBO deployment, or a low VM count with low contention should be fine. Remember part of the benefit of SDS is you can scale down.

For a quick look at firmware queue depths check out Duncan’s article on this.

Now one user tried to push things a little to far, running 5 hosts, with only 3 with storage with 70+ VM’s using Dell H310’s. Performance and experience was fine, until he lost one node and a rebuild kicked off. Running 70 VM’s on 2 hosts combined with the replication overhead was too much and caused an interruption of service (but no data corruption). VMware support tied it back to poor performance of the H310 and heavy load on a degraded 2 Node system trying to rebuild.

In Synchronet’s Lab’s one my earlier VSAN lab builds ran into some odd write latency, that I had initially suspected was the result of a excessively cheap 10Gbps switch. While building out a validation of a solidly performing SMB bundle this week I sought out to get to the root cause. A quick test last Friday (running the vCenter Appliance install wizard) showed at ~250 IOPS a perfectly good Intel S3700 200GB SSD drive spiking over 30ms of write latency and continuing upward. This test was performed against a regular SSD and not a VSAN based datastore isolating the issue. Previous IOMeter workloads showed thousands of read IOPS but write IOPS choking pretty quickly with high latency and low cache rates. Subsequent lab equipment did not have this issue, but had used newer switches muddling the root cause.

We have upgraded our LSI cards to the current firmware, and as of this evening confirmed that the queue depth on the 2008 is increased to 600. Now technically while not supported the H310 is an LSI 2008 controller, and if this is for a lab or a POC or your feeling brave you can follow this guide here. (Note this is not endorsed or supported by VMware/Dell). I’ll see how far I can push writes this afternoon but I expect this should have made our similar issues go away. Alternatively upgrading to something with a queue depth of at least 600 should help fix this (its generally ~$150 per host for a mediocre HBA/RAID Controller that supports decent queue depths).

This is a quick flash (And reminder) of why its a good idea to work with a VSAN partner who validates their builds, understands the technology, and is ready to support you from architecture to implementation (Ok thats my quick advertisement). One thing I am offering right now is anyone interested in a quick architecture call, and a setup of the VSAN assessment tool (TM) I can hopefully help get you the raw information you need to make intelligent decisions on things like Queue Depth, and cache size, and understanding things like the data skew that is important to setting a foundation for a solid VSAN design.

As with any new technology I encourage everyone to do their homework. There’s a lot of FUD going around (and just lack of training and knowledge from a lot of vendors) and issues like this is part of why I’ve been happy to work with VMware’s software defined storage team and the hardware OEM’s for our customer builds. Greater flexibility and response on helpful information (like JBOD mode on LSI 2208’s in a previous post, or adjustments to the HCL, or documentation and firmwares to fix the queue depth issue). For those of you looking for a quick summery. VSAN is a great product and very powerful. Remember a configuration that would be fine for 5 VM’s in a branch isn’t quite going to look the same as for 100 VM’s in a datacenter, or 1000 in a VDI farm. Trying to build the cheapest VSAN configuration that has enough capacity should not be a goal, and cutting corners in the wrong areas can sneak up on you. I do expect some updated guidance from support and the HCL on this. I am hearing 256 but realistically given how cheap the 600 Queue Depth, and how annoying it is to swap out an HBA and re-cable things I’d encourage everyone to start at this point if it makes sense.

Special thanks to…

VMware – for getting a quick Root Cause Analysis, and from reading the story provided a steady voice of reason on support so nothing further drastic was done and stayed involved until the situation was resolved.
JasonGill – for providing us a good read and not jumping to conclusions that this was a software bug.
Brandon Wardlaw – for braving the controller upgrades and somehow not bricking any of my old lab gear.
LSI – For making an updated firmware for their old controllers and not hiding it.

Note, Expect follow up posts and edits. As I get more benchmarks and numbers about various queue depth’s I’ll post them (or if anyone has any send them to me!). I’m also going to be benchmarking some different switches (Cisco, Juniper, Brocade, Netgear) over the following weeks and hopefully if I have time publish some of our results on things like a 140mpps 1Gbps Brocade vs. terrifyingly cheap Netgear 10Gbps switch.

*UPDATE*

We did some testing in the lab without VSAN, just doing a basic vCenter Server install to SSDs (Intel 3500) on this datastore. With the 25 queue depth default firmware we saw 30ms of write latency at 352 write iops. Using an LSI based upgrade firmware with 600 queue depth we pushed 1500 iops at a sub 1ms of latency (.17 ms). Its pretty clear that outside of light/ROBO usage the H310 controller is unsuitable for VSAN usage until Dell supports the LSI code upgrade. What is very concerning is that 4 out of the8 of the Dell VSAN nodes are based on this underpowered HBA. This includes one with 15K drives that I’m guessing is meant to be a performance option. This raises a few questions.

1. Does Dell have a firmware upgrade from LSI to push out (We know it exists) to help resolve this.
2. Did Dell run any benchmarks or have any storage architects look at these configs before they submitted them as a VSAN ready build?
3. I’m hearing reports from the field that Dell reps are saying that VSAN isn’t supported for production workloads. Is this underpowered config partly to blame for this?

DellVSANhuh

PSA: Developers and SQL admins do not understand storage

Thin Provisioning is one of my favorite technologies, but with all great technology comes great responsibility.

This afternoon I got a call from a customer having an issue with a SQL backup. They were preparing a major code push and were running a scripted full SQL backup to have a quick restore point if something goes wrong.
I was sent the following


10 percent processed.
20 percent processed.
30 percent processed.
40 percent processed.
50 percent processed.
60 percent processed.
70 percent processed.
80 percent processed.
90 percent processed.
Msg 64, Level 20, State 0, Line 0
A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The specified network name is no longer available.)

The server had frozen from a thin provisioning issue, but tracing through the workflow that caused this highlighted a common problem of SQL administrators everywhere. Backups where being done to the same volume/VMDK as the actual database. For every 1GB of SQL database there was another 10GB of backups, wasting expensive tier 1 storage.

The Problem:

SQL developers LOVE to make backups at the application level that they can touch/see/understand. They do not trust your magical Veeam/VDP-A. Combined with NTFS being a relatively thin unfriendly file system (always writing to new LBA’s when possible) this means that even if a database isn’t growing much, if backups get placed on the same volume any attempt at being thin even on the back end array is going to require extra effort to reclaim. They also do not understand the concept of a shared failure domain, or data locality. If left to their own devices put the backups on the same RAID group of expensive 15K or flash drives, and go so far as to put it on the same volume/VMDK even if possible. Outside of the obvious problems for performance, risk, cost, and management overhead, this also means that your Changed Block Tracking and backup software is going to be baking up (or having to at least scan) all of these full backups every day.

The Solution:

Give up on arguing with them that your managed backups are good enough. Let them have their cake, but at least pick where the cake comes from and goes.
Create a VMDK on a separate array (in a small shop something as cheap as a SATA backed Synology can provide a really cheap NFS/iSCSI target for this). Exclude this drive from your backups (or adjust when it runs so it doesn’t impact your backup windows).
Careful explain to them that this new VMDK (name the volume backups) is where backups go.
Now accept that they will ignore this and keep doing what they have been doing.
In Windows turn on file screens and block the file extensions for SQL backup files.
Next turn on reporting alerts to email you anytime someone tries to write such a file, so you’ll be able to preemptively offer to help them setup the maintence jobs so they will work.

Migrate from Windows to Linux Appliance vCenter Server

A quick post here for people migrating to the VCSA. I just wanted to point out that the Inventory Snapshot tool at VMware Flings is a great way to ease the migration from a Windows to a Linux vCenter Server or help “backup” the configuration of a vCenter Server. It doesn’t get everything You’ll still want to backup and restore distributing switching especially, as well as be aware you’ll loose historical performance information but this does simplify a lot of other re-work that would normally need to be done for the migration. The following does need to be redone or migrated separately but at least this can help quite a bit.

– Cluster rules
– Cluster DRS groups
– Cluster EVC mode setting
– Customization Specifications
– Scheduled tasks
– vDS

https://labs.vmware.com/flings/inventorysnapshot

Are containers our future?

This is a quick post in reaction to Alex Benik’s post at Gigaom. While I like Gigaom’s commentary on the industry at large, they really don’t seem to understand infastruture always. Alex starts out by stating that the current industry practice of separating out applications with their own dedicated OS instances, and having low utilization is a terrible problem. He almost paints hypervisors as part of the problem. he cites 7% CPU usage on EC2 instances as a key example of what is wrong with virtualization and usage.

I’ve got a few quick thoughts on this.

1. The reason Amazon EC2 can be so cheap is because Amazon can over subscribe instances heavily. Low average CPU workloads is the foundation for virtualization and all kinds of other industries (Shared web hosting etc). He’s turned the reason for virtualization being a great cost saving technology into a problem that needs to be solved. If everyone was running them 100% all the time then there would be a problem.

2. He’s assuming that CPU is the primary bottleneck. As others (Jonathan Frappier) have pointed out that storage is often the bottleneck. There comes a point where you can only get so much disk IO to a virtual machine. In large enterprises with Shared Storage Arrays, eventually bottlenecks in storage IO (Queue Depths on HBA’s, LUN’s etc) start to crop up, and eventually it becomes easier to scale out to more hosts, than try to scale deep. VMware and others have created technologies (CBRC, vfrc, vSAN) that will help this. Also memory is helping and hurting this density problem.

3. Until the recent era of large memory hosts, memory was often a bottleneck. As 64 bit databases and applications became ever hungrier to cache data locally this waged a 2 factor war on CPU utilization. Hosts with VM’s with 16GB of ram quickly ran out of RAM before they ran out of CPU. Also Memory and disk IO subtly influence CPU in ways you don’t factor. Once memory is exhausted on a host, and over subscription is occurring. CPU usage can spike, as process’s take longer to finish. In memory workloads allow CPU’s to process data quicker, and jobs finish sooner. Vendor recommendations for ridiculous memory allocations don’t help the solution either (I’m looking at you Sage). When Vendors recommend 64GB of RAM for a database server serving 150 users, its become clear that SQL monkeys everywhere have given up on actually doing proper indexing or archiving and instead are relying on memory cache. This demand on memory causes hosts to fill up long before CPU usage can become a problem unless managers are willing to trust a balloon driver to intelligently swap out the “memory bloat”. (Internally and with customer vCloud deployments I’ve seen much better utilization by oversubscribing memory 2 or 3 times). This is not a bad thing unless an application has scale. (its now cheaper to throw hardware at the problem than write proper code/index/optimizations).

4. He’s also forgetting the reason we went with virtualization in the first place. To separate out applications so that we could update them independently from each other. No longer running into issues where rebooting a server to fix one application caused another application to go down. Anyone who’s worked in shared tenant container hosting can tell you that its not really that great. Comparability matrix’s, larger failure zones and all kinds of problems can come up. For homogenous web hosting its a fine solution. For the enterprise trying to mix diverse workloads it can be a nightmare. We use BSD containers internally for some websites, but beyond that we stick to the hypervisors, as a more general use, stable and easier to support platform. I’d argue JEOS, vFabric and other stripped down VM approaches are a better solution as they enforce instance isolation while giving us massive efficiency gains from the kitchen sync (I’m going to call out websphere on this)deployments of old.

Getting the ratio of CPU to Memory to Disk IO and capacity is hard. Painfully hard. Given that CPU is often one of the cheapest components (and most annoying to try to upgrade) its no wonder that IT managers everywhere who come from a history of CPU’s being the bottlenecks often get a little out of hand in overkill with CPU purchasing. I’ve been in a lot of meetings where I’ve had to argue with even internal IT staff that more CPU isn’t the solution (The graphs don’t lie!) while disk latency is out the roof. I’d strangely argue a current move to scale out (Nutanix/VSAN etc) might fix a lot of broken purchasing decisions (LOTS of CPU, low memory, disk IO).

the-vsan-build-part-2

I got the final parts (well enough to boot strap things) on Thursday so the building has begun.

A couple quick observations on the switch, and getting VSAN up and ready for vCenter Server.

NetGear XS712T

1. Just because you mark a port as Untagged doesn’t mean anything. To have your laptop be able to manage on a non-default VLAN you’ll need to setup a PVID (Primary VLAN ID) to the VLAN you want to use for management. Also management can only be done on a single IP/VLAN so make sure to setup a port with a PVID on that VLAN before you change it (otherwise its time for the reset switch).

VLANThunderbolt

2. Mac users should be advised that you can tag VLAN’s and create an unlimited number of virtual interfaces even on a thunderbolt adapter. Handy when using non-default VLANs for configuration. Click the plus sign in the bottom left corner of the network control panel to make a new interface, and then select the gear to manage it and change the VLAN.

3. It will negotiate 10Gig on a Cat5e cable (I’m going to go by Fry’s and get some better cables at some point here before benchmarking).

VSAN/VCenter.
Its trivial to setup a single host deployment.
First create the VSAN.
esxcli vsan cluster join -u bef029d5-803a-4187-920b-88a365788b12
(Alternatively you can go generate your own unique UUID)
Next up find the NAA on a normal disk, and a SSD by running this command.
esxcli storage core device list
Next up add the disks to the VSAN.
esxcli vsan storage add -d naa.50014ee058fdb53a -s naa.50015178f3682a73
After this you’ll want to add a VMkernel for VSAN and add some hosts, but with these commands you can have a one node system up ready for vCenter Server installation in under 15 minutes.

For this lab I’ll be using the vCenter Server Appliance.

After installing the OVA you’ll want to run the setup script. You will need to first login to the command line interface. Mac users be warned, mashing the command key will send you to a different TTY.
The login is root/vmware. From the console run the network setup script. /opt/vmware/share/vami/vami_set_network
It can be run with parameters attached to more quickly setup.
/opt/vmware/share/vami/vami_set_network eth0 STATICV4 172.16.44.100 255.255.255.0 172.16.44.1
After doing this you can login in your browser using HTTPS and port 5480 and finish the setup. Example (https://172.16.55.100:5480).

vSAN the cure for persistant VDI technical and political chalenges.

vsan

What do you mean I have to redesign my entire storage platform just so a user can install an application!?!
What do you mean my legacy array/vendor is not well suited for VDI!
“Am I going to have to do a forklift upgrade to every time I want to add xxx number of users?”
“Do I really want to have one mammoth VSP/VMAX serving that many thousand desktops and acting as a failure domain for that many users?”

I’m sure other VDI Architects and SE’s in the field have had these conversations, and its always an awkward one that needs some quick white boarding to clear up. Often times this conversation is after someone has promised users that this is a simple change of a drop down menu, or after it has been implemented and is filling up storage and bringing the array to its knees. At this point the budget is all gone and the familiar smell of shame and disappointment is filling the data center as you are asked to pull off a herculean task of making a broken system work to fulfill promises that never should have been made. To make this worse broken procurement process’s often severely limit getting the right design or gear to make this work.

We’ve worked around this in the past by using software (Atlantis) or design changes (Pod design using smaller arrays) but Ultimately we have been trying to cram a round peg (Modular design storage and non-persistent desktops) into a square hole (Scale out bursting random writes, and users expecting zero loss of functionality). We’ve rationalized these decisions, (Non-Persistent changes how we manage desktops for the better!) but ultimately if VDI is going to grow out of being a niche technology it needs and architecture that supports the existing use cases as well as the new ones. Other challenges include environments trying to cut corners or deploy systems that will not scale because a small pilot worked. (try to use the same SAN for VDI and servers until scale causes problems). Often times the storage administrators or an organization is strongly bound to a legacy or unnecessarily expensive vendor or platform (Do I really need 8 protocols, and 7 x 9’s of reliability for my VDI farm?)

The VSAN solves not only the technical challenges of persistent desktops (Capacity/performance scale out) but also solves the largest political challenge, the entrenched storage vendor.

I’ve seen many a VDI project’s cost rationalization break down because the storage admin forced it to use the existing legacy platform. This causes one of two critical problems.

1. The cost per IOPS/GB gets ugly, or requires seemingly random and unpredictable forklift upgrades with uneven scaling of cost.
2. The storage admin underestimates the performance needs and tries to shove it all on non-accelerated SATA drives.

vSAN allows the VDI team to do a couple things to solve these problems.

1. Cost Control. Storage is scaled at the same time as the hosts and desktops in a even/linear fashion. No surprise upgrades when you run out of ports/cache/storage processor power. Adjustments in IOPS to capacity can be made slowly as nodes are added, and changing basic factors like the size of drives does not require swing migrations or rip and replace of drives.

2. Agility. Storage can be purchased without going through the storage team, and the usual procurement tar pit that involves large scale array purchases. Servers can often be purchased with a simple PO from the standard vendor. Storage expansion beyond the current vendor generally requires a barking carnival of vendor pitches, and Apples to spaceship mismatched quotes and pitches In the bureaucratic government and fortune 1000 this can turn into a year long mess that ends up under delivering. Because of the object system with dynamic protection systems non-persistent disks can be deployed with RAID 0, on the same spindles

3. Risk Mitigation. A project can start with as small as three hosts. There is not an “all in” and huge commitment of resources to get the initial pilot and sizing done, and scaling is guaranteed by virtue of the scale out design.

vSphere Distributed Storage and why its not going to be “production ready” at VMworld

vSphere Distributed Storage (or vSAN) is a potentially game changing feature for VMware. Being able to run its own flash caching, auto mirroring/striping storage system that’s fully baked into the hypervisor is powerful. Given that storage is such a huge part of the build out, it makes sense that this is a market in need of disruption.

Now as we all hold our breath for VMworld I’m going to give my prediction that it will not be listed as production ready from day one and her are my reasons.

1. VMware is always cautious with new storage technologies. VMware got burned by the SCSI UNMAP fiasco, and since has been slow to release storage features dirrectly. NFS cloning for view underwent extensive testing, and tech preview status.
2. Vmware doesn’t like to release home grown products straight to production. They do this with acquisitions (mirage, View, Horizon Data, vCops) but they tread carefully with internal products. They are not Microsoft (shipping a broken snapshot feature for two versions was absurd).
3. The trust and disruption needs to happen slowly. Not everyone’s workload fits scale out, and encouraging people to “try it carefully” sets expectations right. I think it will be undersold by a lot, and talked down by a lot of vendors but ultimately people will realize that it “just works”. I’m looking for huge adoption in VDI where a single disk array often can cause awkward bottlenecks. This also blunts any criticisms from the storage vendor barking carnival, and lets support for it build up organically. Expect shops desperate for an easier cheaper way to scale out VDI, and vCloud environments turn to this. From a market side I expect an uptick in 2RU server’s being used, and the back plane network requirements pushing low latency top of rack 10Gbps switching further into mainstream for smaller shops and hosting providers who have been holding out.

These predictions I’m making are based on my own crystal ball. I’m not currently under any NDA for this product.
No clue what I’m talking about? Go check out this video

The Future of VMware EUC

So VMware View 5.2 is almost here, and there’s a sweeping set of changes.

First of all the primary way to buy the suite is moving from a per connection to a per named user (yes I know this is ugly for schools but I’ve been promised by project managers they are going to work around this/play lets make a deal to help people out).

Read more