Go check out storagehub.vmware.com
We’ve all been there…
Maybe it’s the streets of NYC, or a corner stall in a mall in Bangkok, or even Harwin St here in Houston. Someone tried to sell you a cut-rate watch or sunglasses. Maybe the lettering was off, or the gold looked a bit flaky, but you passed on that possibly non-genuine watch or pair of sunglasses. It might even have been made in the same factory, but it was clear the QC might have issues. You would not expect the same outcome as getting the real thing. The same thing can happen with ReadyNodes.
Real ReadyNodes for VMware vSAN have a couple key points.
They are tested. All of the components have been tested together and certified. Beware anyone in software-defined storage who doesn’t have some type of certification program, as this opens the door to lower-quality components or hardware/driver/firmware compatibility issues. VMware has validated satisfactory performance with the ReadyNode configurations. A real ReadyNode looks beyond “will these components physically connect” to whether they will actually deliver.
vSAN ReadyNodes offer choice. ReadyNodes are available from over a dozen different server OEMs. The VMware vSAN Compatibility Guide also offers over a thousand verified hardware components to supplement these ReadyNodes for further customization. ReadyNodes are not limited to a single server or component vendor.
They are 100% supported by VMware. Real VMware ReadyNodes don’t require virtual machines to mount, present or consume storage, and they don’t require non-VMware-supported VIBs to be installed.
They are mature. They run a battle-tested, hypervisor-integrated storage stack that is now in its 7th release.
So what do you do if you’ve ended up with a fake ReadyNode? Unlike the fake watch I had to throw away, you can check the vSAN compatibility list and see whether, with minimal changes to the controller or storage devices, you can convert your system in place over to vSAN. Remember, if you’re running ESXi 5.5 Update 1 or newer, you already have the vSAN software installed. You just need to license and enable it!
I would like to say that this post was inspired by Chad’s guide to storage architectures. When talking to customers over the years, a recurring problem surfaced. Storage in smaller enterprises historically tended towards people going “all in” on one big array. The idea was that by consolidating the purchasing of all of the different application groups and teams, they could get the most “bang for buck”. The upsides are obvious (fewer silos and consolidation of resources and platforms mean lower capex/opex costs). The performance downsides (normally noisy-neighbor issues) were annoying but could be mitigated. That said, the real downsides to having one (or a few) big arrays are often found hidden on the operational side.
- Many customers trying to stretch their budget ended up putting Test/Dev/QA and production on the same array (I’ve seen Fortune 100 companies do this with business-critical workloads). This leads to one team demanding 2-year-old firmware for stability while the teams needing agility push for upgrades. The battle between stability and agility gets fought regularly in change control committee meetings, wasting more people’s time.
- Audit/regime change/regulatory/customer demands require an air gap be established for a new or existing workload. Array partitioning features are nice, but the demands often extend beyond this.
- In some cases, organizations that had previously shared resources would part ways (divestment, operational restructuring, budgetary firewalls).
Some storage workloads just need more performance than everyone else, and often the cost of the upgrade is increased by the other workloads on the array that will gain no material benefit. Database administrators often point to a lack of dedicated resources when performance problems arise. Providing isolation for these workloads historically involved buying an exotic non-x86 processor and a “black box” appliance that required expensive specialty skills on top of significant capex cost. I like to call these boxes “cloaking devices” as they are often completely hidden from the normal infrastructure monitoring teams.
A benefit to using a scale-out (Type III) approach is that the storage can be scaled down (or even divided). VMware VSAN can evacuate data from a host and allow you to shift its resources to another cluster. Hybrid nodes can push up to 40K IOPS (and all flash over 100K), allowing even smaller clusters to hold their own on disk performance. It is worth noting that the reverse action is also possible. When a legacy application is retired, the cluster that served it can be upgraded and merged into other clusters. In this way the isolation is really just a resource silo (the least threatening of all IT silos). You can still use the same software stack and leverage the same skill set while keeping change control, auditors and developers happy. Even the database administrators will be happy to learn that they can push millions of orders per minute with a simple 4 node cluster.
In principle I still like to avoid silos. If they must exist, I would suggest trying to find a way to make the hardware that makes them up highly portable and re-usable, and VSAN and vSphere can help with that quite a bit.
Spiceworks Dec 1st @ 1PM Central- “Is blade architecture dead” a panel discussion on why HCI is replacing legacy blade designs, and talk about use cases for VMware VSAN.
Micron Dec 3rd @ 2PM Central – “Go All Flash or go home” We will discuss what is new with all flash VSAN, what fast new things Micron’s performance lab is up to, and an amazing discussion/QA with Micron’s team. Specifically this should be a great discussion about why 10K and 15K RPM drives are no longer going to make sense going forward.
Intel Dec 16th @ 12PM Central – This is looking to be a great discussion around why Intel architecture (Network, Storage, Compute) is powerful for getting the most out of VMware Virtual SAN.
Ok, I’ll admit this is an incredibly misleading clickbait title. I wanted to demonstrate how the economics of cheaper flash make VMware Virtual SAN (and really any SDS product that is not licensed by capacity) cheaper over time. I also wanted to share a story of how older, slower flash became more expensive.
Let’s talk about a tale of two cities that had storage problems and faced radically different cost economics. One was a large city with lots of purchasing power and size, and the other was a small bedroom community. Who do you think got the better deal on flash?
Just a small town data center….
A 100 user pilot VDI project was kicking off. They knew they wanted great storage performance, but they could not invest in a big storage array with a lot of flash up front. They did not want to have to pay more tomorrow for flash, and wanted great management and integration. VSAN and Horizon View were quickly chosen. They used the per concurrent user licensing for VSAN so their costs would cleanly and predictably scale. Modern fast enterprise flash was chosen that cost ~$2.50 per GB and had great performance. This summer they went to expand the wildly successful project, and discovered that the new version of the drives they had purchased last year now cost $1.40 per GB, and that other new drives on the HCL from their same vendor were available for ~$1 per GB. Looking at other vendors they found even lower cost options available. They upgraded to the latest version of VSAN and found improved snapshot performance, write performance and management. Procurement could be done cost effectively at small scale, and small projects could be added without much risk. They could even adopt the newest generation (NVMe) without having to forklift controllers or pay anyone but the hardware vendor.
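To put numbers on how far the small town's per-GB price fell, here is a minimal sketch in Python. The three prices come from the story above; the expansion size (`EXPANSION_GB`) is a made-up figure used only for illustration, since the story does not say how much capacity was added.

```python
# Per-GB flash prices quoted in the story (USD).
ORIGINAL_PRICE = 2.50    # drives bought for the pilot
REFRESHED_PRICE = 1.40   # newer version of the same drives, a year later
ALTERNATE_PRICE = 1.00   # other HCL-listed drives from the same vendor

# Hypothetical expansion size; not a figure from the story.
EXPANSION_GB = 4000


def price_drop_pct(old_per_gb: float, new_per_gb: float) -> float:
    """Percentage decrease going from the old to the new price per GB."""
    return (old_per_gb - new_per_gb) / old_per_gb * 100


def expansion_cost(capacity_gb: float, price_per_gb: float) -> float:
    """Raw flash cost of adding capacity_gb at a given price per GB."""
    return capacity_gb * price_per_gb


if __name__ == "__main__":
    for label, price in [
        ("same drive, refreshed", REFRESHED_PRICE),
        ("alternate HCL drive", ALTERNATE_PRICE),
    ]:
        drop = price_drop_pct(ORIGINAL_PRICE, price)
        cost = expansion_cost(EXPANSION_GB, price)
        print(f"{label}: {drop:.0f}% cheaper per GB, "
              f"${cost:,.0f} for {EXPANSION_GB} GB")
```

Because the VSAN licensing in this story was per concurrent user rather than per GB, the drop in drive price flows straight through to the expansion cost.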
Meanwhile in the big city…..
The second city was quite a bit larger. After a year-long procurement process and dozens of meetings, they chose a traditional storage array/blade system from a Tier 1 vendor. They spent millions and bought years’ worth of capacity to leverage the deepest purchasing discounts they could. A year after deployment, they experienced performance issues and wanted to add flash. Upon discussing it with the vendor, they learned the only option was older, slower, small SLC drives. They had bought their array at the end of its sale window and were stuck with technology two generations old. It was also discovered that the array would only support a very small number of them (the controllers and code were not designed to handle flash). The vendor politely explained that since this was not a part of the original purchase, the 75% discount off list that had applied to the original purchase would not apply, and they would need to pay $30 per GB. Somehow older, slower flash had become 4x more expensive in the span of a year. They were told they should have “locked in savings” and bought the flash up front. In reality, though, they would have been locking in a high price for a commodity that they did not yet need. The final problem they faced was an order to move out of the data center into 2-3 smaller facilities and split up the hardware accordingly. That big storage array could not easily be cut into parts.
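The “4x” figure follows directly from the lost discount. A minimal sketch in Python, assuming the $30 per GB quote is the list price and that the original 75% discount would have applied to that same list price (the story implies this but does not state it outright):

```python
# Figures from the story; treating $30/GB as the list price is an assumption.
LIST_PRICE_PER_GB = 30.0
ORIGINAL_DISCOUNT = 0.75  # 75% off list on the original purchase


def discounted_price(list_price: float, discount: float) -> float:
    """Price per GB after applying a fractional discount off list."""
    return list_price * (1 - discount)


def markup_factor(list_price: float, discount: float) -> float:
    """How many times more you pay at list than at the discounted price."""
    return list_price / discounted_price(list_price, discount)


if __name__ == "__main__":
    with_discount = discounted_price(LIST_PRICE_PER_GB, ORIGINAL_DISCOUNT)
    factor = markup_factor(LIST_PRICE_PER_GB, ORIGINAL_DISCOUNT)
    print(f"With discount: ${with_discount:.2f}/GB")
    print(f"Without discount: ${LIST_PRICE_PER_GB:.2f}/GB ({factor:.0f}x)")
```

In general, losing a discount of `d` multiplies the price by `1 / (1 - d)`, so the deeper the original discount, the more painful any later add-on purchase at list becomes.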
There are a few lessons to take away from these environments.
- Storage should become cheaper to purchase as time goes on. Discounts should be consistent and pricing should not feel like a game show. Software licensing should not be directly tied to capacity or physical hardware and should “live” through a refresh.
- Adding new generations of flash and compute should not require disruption and “throwing away” your existing investment.
- Storage products that scale down and up without compromise lead to fewer meetings, lower costs, and better outcomes. Large purchases often lead to the trap of spending a lot of time and money on avoiding failure, rather than focusing on delivering excellence.
I’m in Vegas rounding out the conference tour (VMworld, SpiceWorld, VMworld, DellWorld) for what looks to be a strong finish. This is my first time at VeeamOn and I’m looking forward to briefings across the full Veeam portfolio. I’m also looking forward to being shamed by the experts in Lab Warz and getting my hands dirty with v9.
More importantly, I’m looking forward to some great conversations. The reason I value going to conferences goes beyond great sessions and discussions with vendors at the solutions expo. The conversations with end users (small, large and giant) help you learn where the limits are (and how to push past them) in the tools you rely on. I’ve had short conversations over breakfast that saved me six months of expensive trial and error that others had been through. A good conference will attract both small and massive scale customers and bring together great conversations that will help everyone change their perspective and get things done.
I started my IT career as a customer. It was great having complete ownership of the environment but eventually I wanted more. I moved to the partner side and the past five years have been amazing. I have worked with more environments than I can count. It exposed me to diverse technical and operational challenges. It gave me the opportunity to see first hand past the marketing what worked and what did not work. I would like to thank everyone (customers, co-workers) and all of the people who I was able to directly work with who helped me reach this point in my career. I also want to thank people who freely share to the greater community. Their blogs, their words of caution, their advice, their presentations at conferences all contributed in helping me succeed. I will miss the amazing team at Synchronet but it was time for change.
Starting today, I will be in a new role at VMware in Technical Marketing for VMware VSAN. I am excited for this change, and look forward to the challenges ahead. In this position I hope to learn and give back to the greater community that has helped me reach this point. I will still blog various musings here, but look for VSAN and storage content at Virtual Blocks.
I look forward to the road ahead!
In my life as a VMware consultant I run into the following Mad Lib when trying to solve storage problems for business-critical applications.
A customer discovers they have run out of (IOPS/Capacity/Throughput/HCL) with their existing (EMC/Dell/HP/Netapp) array. They sized only for capacity without understanding that (RAID 6 with NL-SAS is slow, 2GB of Cache doesn’t deliver 250K IOPS). They have spent all their (Budget/Rackspace/Power/Political Power/Moxie). There is also an awkward quiet moment where it’s realized that (Thick provisioning on Thick provisioning is wasteful, I can’t conjure IOPS out of a hat, Dedupe is only 6%, Snapshots are wasting 1/2 of their array and are still not real backups, They can’t use COW so SRM can’t test failover). Searching for solutions, they hear from a junior tech that there is this new (home-made/SOHO appliance) that can meet their (Capacity/IOPS) needs at a cheap price point. And if they buy it, it probably will work… For a while.
Here’s what’s missing from the discussion.
1. The business needs better than a 3-5 day wait for parts replacement or for tickets to be responded to. (Real experiences with these devices).
2. The business needs something not based on desktop class non-ECC RAM motherboards.
3. The business needs REAL HCLs that are verified and not tested on customers. (For quite a while, QNAP was saying that Green drives, which lacked proper TLER and are not designed for RAID, would be fine to use).
4. The business needs systems that are actually secured.
Now I’ve heard the other argument “but John I’ll have 2 of them and just replicate!”
This is fine (once you realize that RSYNC and VMDKs don’t play nice) until you get bitten by a code bug that hits both platforms. While technically on the VMware HCL, these guys are using open source targets (iSCSI and NFS) and are so incredibly removed from the upstream developers that they can’t get anything fixed or verified quickly. 2 systems that have a nasty iSCSI MPIO bug, or have an NFS timeout problem, are worse than 1 system that “just works”. Also, as these boxes are black boxes, they often miss out on the benefits of open source (you patch and update on their schedules, which is why my QNAP had a version of OpenSSL at one point that was 4 years old despite being on the newest release). If both systems have hardware problems because of a power surge, thermal problems, user error or a bad batch, you’re still stuck waiting days to get a fix. If it’s software, you may be holding your breath for quite a while. With a normal server OEM or Tier 1 storage provider you have parts in 4 hours, and reliability and freedom that these boxes can’t match.
Now at this point you’re probably saying “but John, I need 40K IOPS and I don’t have 70K to shovel into an array.”
And that’s where Software Defined Storage bridges the gap. Now with SuperMicro you can get solid off-the-shelf servers with 4 hour support agreements without breaking the bank (this new parts support program is global BTW). For storage software you can use VMware VSAN, a platform that reduces costs and complexity and delivers great performance. You massively reduce your support footprint (one company for hardware, one for software), reducing operational costs and capital costs.
Nothing against the Synology, QNAP, and Drobo of the world, but let’s stick to the right tool for the right job!
VMware: EVO:RAIL – It looks like our shift to SuperMicro for VSAN was the right choice. Will be be looking for
EVO:Rack – A vBlock without limits? We will see.
OpenStack – VMware is pushing a massive amount of code to OpenStack so OpenStack can control vSphere, NSX etc., allowing people to run VMware APIs and OpenStack APIs for higher level functionality.
Containers – Docker, Google, Pivotal are allowing very clean and consistent operational deployments.
NSX – Moving security from the edge to Layer 2. Get ready to hear “Zero Trust networking”. The biggest challenge in Enterprise shops is that they are going to have to define and understand their networking needs on a granular level. For once, network security ability will outrun operational understanding. If you’re a sysadmin today, get ready to have to understand and defend every TCP connection your application makes, but take comfort in that policy engines will allow this discussion to only have to happen once.
Cloud Volumes – While I’m most excited about this as a replacement for Persona, there are so many use cases (Physical, Servers, ThinApp, Profiles, VDI) that I know it’s going to take some serious lab time to understand where all we can use this.
vCloud Air – In a final attempt to get SEs everywhere to quit calling it “Veee – Cheese”, VMware is re-branding the name. I was skeptical last year, but have found a lot of interest from clients in recent months as hurricane season closes in on Houston.
I’m looking forward to this week and here are a few highlights of what I’ll be looking into.
On the tactical
1. Settling on a primary load balancing partner for VMware View. (Eying Kemp, anyone have any thoughts?). I’ve got a number of smaller deployments (few hundred users) that need non-disruptive maintenance operations, and patching on the infrastructure and are looking to take their smaller pilots or deployments forward.
2. Learn more about VDP-A designs, and best practices. I’ve seen some issues in the lab with snapshots not getting removed from the appliance and need to understand the scaling and design considerations better.
3. Check out some of the HOL updates. Find out if a VVOL lab in the office is worth the investment.
On the more general strategic goals.
Check out cutting edge vendors, and technologies from VMware.
CloudVolumes – More than just application layering. Server application delivery, profile abstraction that’s fast and portable, and a serious uplift to Persona and ThinApp. Really interested in use cases where it serves as a delivery method for ThinApp.
DataGravity – In the era of software defined storage this is a company making a case that an array can still provide a lot of value. Very interesting technology, but the questions remain. Does it work? Does it scale? Will they add more file systems, and how soon will EMC/HP buy them to bolt this logic into their Tier 1 arrays? Martin Glassborow has made a lot of statements that new vendors don’t do enough to differentiate, or that we’ve reached peak features (snaps, replication, cache/tier flash, data reduction etc.), but it’s interesting to see someone potentially breaking outside of this mold of just doing the same thing a little better or cheaper.
VSAN – Who is Marvin? What happened to Virsto? I’ve got questions and I hope someone has answers!