

Peanut Butter is Not Supported with vSphere/Storage Networking/vSAN/VCF

From time to time I get oddball questions where someone asks how to do something that is not supported or is a bad idea. I’ll often fire back a simple “No,” and then we get into a discussion about why VMware does not have a KB for this specific corner case or situation. There are a host of reasons why this may or may not be documented, but here is my monthly list of “No/That is a bad idea (TM)!”.

How do I use VMware Cloud Foundation (VCF) with a VSA/virtual machine that cannot be vMotion’d to another host?

This one has come up quite a lot recently with some partners, and storage vendors who use VSAs (a virtual machine that consumes local storage in order to replicate it) have incorrectly claimed this is supported. The issue is that SDDC Manager automates upgrade and patch management. In order to patch a host, all running virtual machines must be removed. This process is triggered when a host is placed into maintenance mode, and DRS carefully vMotions VMs off of the host. If there is a virtual machine on the host that can not be powered off or moved, this will cause lifecycle operations to fail.
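The failure mode above is easy to reason about as a pre-check. Here is a minimal sketch (the data model and function names are my own invention, not any SDDC Manager API): before a host can be patched, every powered-on VM must be either migratable or safe to power off, and a pinned VSA satisfies neither condition.

```python
# Hypothetical sketch of a maintenance-mode pre-check. A VM that is
# powered on but can be neither vMotioned nor powered off will block
# the host from evacuating, and therefore block lifecycle/patching.

def evacuation_blockers(vms):
    """Return names of VMs that would prevent a host from entering
    maintenance mode (and so would fail an automated patch cycle)."""
    return [
        vm["name"]
        for vm in vms
        if vm["powered_on"] and not (vm["can_vmotion"] or vm["can_power_off"])
    ]

host_vms = [
    {"name": "app01",       "powered_on": True, "can_vmotion": True,  "can_power_off": True},
    {"name": "storage-vsa", "powered_on": True, "can_vmotion": False, "can_power_off": False},
]

print(evacuation_blockers(host_vms))  # ['storage-vsa']
```

The VSA shows up as a blocker on every host it runs on, which is exactly why the automated workflow cannot complete.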

What about if I use the VSA’s external lifecycle management to patch ESXi?

The issue is that running multiple host-patching systems is a “very bad idea” (TM). You’ll have issues with SDDC Manager not understanding the state of the hosts, and coordination of non-ESXi elements (NSX perhaps using a VIB) would also be problematic. The only exceptions to using SDDC Manager with external lifecycle tooling are select vendor LCM solutions that have done the customization and interop work (examples include VxRAIL Manager, the Redfish to HPE Synergy integration, and packaged VCF appliance solutions like UCP-RS and VxRACK SDDC). Note these solutions all use vSAN, avoid the VSA problem, and have done the engineering work to make things play nice.

JAM also not supported!

Should I use a Nexus 2000 FEX (or other low-performing network switch) with vSAN?

While vSAN does not currently have a switch HCL (Watch this space!) I have written some guidance specifically about FEXs on this personal blog. The reality is there are politics to getting a KB written saying “not to use something”, and it would require cooperation from the switch vendors. If anyone at Cisco wants to work with me on a joint KB saying “don’t use a FEX for vSAN/HCI in 2019” please reach out to me! Before anyone accuses me of not liking Cisco, I’ll say I’m a big fan of the C36180YC-R (ultra deep buffers RAWR!), and have seen some amazing performance out of this switch recently when paired with Intel Optane.

Beyond the FEX, I’ve written some neutral switch guidance on buffers on our official blog. I do plan to merge this into the vSAN Networking Guide this quarter. 

I’d like to use RSPAN against the vDS to mirror all vSAN traffic; I’d like to run all vSAN traffic through an ASA firewall, a Palo Alto, an IDS, or a Cisco ISR; I’d like to route vSAN traffic through an F5… and similar requests.

There’s a trend of security people wanting to inspect “all the things!”. There are a lot of misconceptions about where vSAN traffic routes, flows, or goes.

Good Ideas! – There are some false assumptions that you can’t do the following. While they may add complexity or not be supported on VCF or VxRAIL in certain configurations, they are just fine with vSAN from a feasibility standpoint.

  1. Routing storage traffic is just fine. Modern enterprise switches can route (OSPF, static routes, etc.) at wire speed entirely in ASIC offloads. vSAN is supported over layer 3 (you may need to configure static routes!), and this is a “Good idea” on stretched clusters so spanning tree issues don’t crash both datacenters!
  2. vSAN over VxLAN/VTEP in hardware is supported.
  3. vSAN over VLAN-backed port groups on NSX-T is supported.
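On point 1, the static routes for a layer-3 stretched cluster are mechanical to generate. A small sketch below builds them using the standard `esxcli network ip route ipv4 add` form; the subnets and gateway here are made-up examples, so substitute your own, and check the esxcli reference for your ESXi version before running anything.

```python
# Sketch only: generating the per-host static routes a layer-3 vSAN
# stretched cluster typically needs. All addresses are placeholders.

def vsan_static_route_cmds(remote_networks, local_gateway):
    """Build one esxcli static-route command per remote vSAN subnet,
    pointing at this site's local layer-3 gateway."""
    return [
        f"esxcli network ip route ipv4 add --network {net} --gateway {local_gateway}"
        for net in remote_networks
    ]

# Site A hosts need a route to site B's vSAN subnet (and vice versa).
for cmd in vsan_static_route_cmds(["192.168.20.0/24"], "192.168.10.1"):
    print(cmd)
```

Run the mirror-image commands on the other site’s hosts, and vSAN traffic stays routed instead of depending on one stretched layer-2 domain.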

Bad Ideas!

Frank Escaros-Buechsel with VMware support once told someone, “While we do not document that as not supported, it’s a bit like putting peanut butter in a server. Some things we assume are such bad ideas no one would try them, and there is only so much time to document all bad ideas.”

  1. Trying to mirror high-throughput flows of storage or vMotion from a vDS is likely to cause performance problems. While I’m not sure of a specific support statement, I’m going to kindly ask you not to do this. If you want to know how much traffic is flowing and where, consider turning on sFlow/jFlow/NetFlow on the physical switches and monitoring from that point. vRNI can help quite a bit here!
  2. Sending iSCSI/NFS/FCoE/vSAN storage traffic to an IDS/firewall/load balancer. These devices do not know how to inspect this traffic (trust me, they are not designed to look at SCSI or NVMe packets!), so you’ll get zero security value out of this process. If you are looking for virus binaries, you’re better off using NSX guest introspection and regular antivirus software. Because of the volume, you will hit the wire-speed limits of these devices. Outside of path latency, you will quickly introduce drops and re-transmits and murder storage traffic performance. Outside of some old niche inline FC encryption blades (that I think NetApp used to make), inline storage security devices are a bad idea. While there are some carrier-grade routers that can push 40+ Gbps of encryption (MLXe’s, I vaguely remember, did this), the costs are going to be enormous, and you’ll likely be better off just encrypting at the vSCSI layer using the VM Encryption VAIO filter. You’ll get better security than IPsec/MACsec without massive costs.

Did I get something wrong?

Is there an Exception?

Feel free to reach out and let’s talk about why your environment is a snowflake exception to these general rules of things “not to do!”

When is the right time to transition to vSAN?



Some people say: When you refresh storage!

Others say it’s: When you refresh Servers!

They are both right. It’s not an “or” – both are great times to look at it. Let’s dig deeper….

Amazing ROI on switching to HCI can come from a full floor sweep tied to refreshing with faster servers and newer storage that is lower cost to acquire and maintain. There are even awesome options for people who want another level of wrapped support and deployment (VxRAIL, UCP-RS).

But what about cases where an existing server or storage investment makes a wholesale replacement seem out of reach? What about the guy who just bought storage or servers yesterday and learned about vSAN (or new features they needed, like encryption or local protection) today?

Let’s split these situations up and discuss how to handle them.

What happens when my existing storage investment is largely meeting my needs? What should I do with the server refresh?

Nothing prevents you from buying ReadyNodes without drives and adding them later as needed without disruption. Remember, ESXi includes the vSAN software, so there will be nothing to “install” other than drives in the hosts. HBAs are the most common missing component on a new server, and a proper high-queue-depth vSAN certified HBA is relatively cheap (~$300). That’s a solid investment. Not having to take a server offline later to raise the hood and install something is instant ROI on those components. Remember, with Dell/Lenovo/SuperMicro/Fujitsu, vSAN Config Assist will handle deploying the right driver/firmware for you at the push of a button.

Some other housecleaning items to do when you’re deploying new hosts (on the newest vSphere!) to get you vSAN-ready down the road.

  1. See if the storage is vVols compatible. If it is, start deploying it. SPBM is the best way to manage storage going forward, and vSAN and vVols both share this management plane. As you move forward into vSAN, having vRA, vCloud Director, OpenStack, and other tools that leverage SPBM configured to use it will allow you to leverage your existing storage investment more efficiently. It’s also a great way to familiarize yourself with vSAN management. Being able to expose storage choice to end users in vRA is powerful. Remember, VAIO and VM Encryption also use SPBM, so it’s time to start migrating your storage workflows over to it!
  2. Double check your upcoming support renewals to make sure that you don’t have a spike creeping up on you. Having a cluster of vSAN deployed and tested, with hosts ready to expand rapidly, puts you in a better position to avoid getting cornered into one more year of expensive renewals. Also watch out for other cost creep. Magic stretched-cluster virtualization devices or licensing, FCoE gear, fabric switches, structured cabling for Fibre Channel expansion, and special monitoring tools for fabrics all have hidden capex and support costs.
  3. Look at expansion costs on that storage array. Arrays will often be discounted deeply on the initial purchase, but expansion can sometimes be 2-3x what the initial purchase cost was! Introducing vSAN for expansion guarantees a lower cost per GB as you expand (vSAN doesn’t tax drives or RAM like other solutions).
  4. Double check those promised 50x dedupe ratios and insanely low latency figures! Often data efficiency claims include snapshots, thin provisioning, linked clones, and other basic features. Also, check to see that you’re getting the performance you need.
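The expansion-cost point in item 3 is worth doing as back-of-the-envelope math. All the prices below are invented placeholders purely to illustrate the shape of the comparison; plug in your actual quotes.

```python
# Hypothetical numbers only: comparing $/GB of the discounted initial
# array purchase, a typical array expansion quote, and a vSAN
# expansion of the same usable capacity.

def cost_per_gb(price, usable_gb):
    return price / usable_gb

# Initial array purchase is deeply discounted...
array_initial = cost_per_gb(price=50_000, usable_gb=100_000)   # $0.50/GB
# ...but the same capacity as an expansion is quoted at 2-3x that rate:
array_expand = cost_per_gb(price=125_000, usable_gb=100_000)   # $1.25/GB
# vSAN expansion is commodity drives at roughly street price:
vsan_expand = cost_per_gb(price=40_000, usable_gb=100_000)     # $0.40/GB

print(f"array expansion is {array_expand / array_initial:.1f}x initial $/GB")
```

Run the same arithmetic against your own renewal and expansion quotes before assuming the initial discount carries through the life of the array.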

What happens when my servers were just refreshed, but I need to replace storage?

If your servers are relatively new (Xeon v3/v4/Intel Scalable/AMD EPYC), then there is a good chance that adding the needed pieces to turn them into ReadyNodes is not far off. Check out the ReadyNode bill of materials to see if your existing platform will work. See what it needs and reach out to your server vendor for the needed HBA (and possibly NIC) upgrades to get them ready for vSAN. Your vSAN SEs and account teams can help!
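That bill-of-materials check is just a gap analysis. Here is a tiny sketch of the idea; the component names and requirements are hypothetical stand-ins, not an actual ReadyNode spec, so treat the real vendor BOM as the source of truth.

```python
# Illustrative only: diff an existing server's parts list against a
# ReadyNode-style requirement list to see what the vendor needs to supply.

READY_NODE_BOM = {"hba": "certified-12gb-sas", "nic": "10gbe", "boot": "m.2-sata"}

def missing_parts(server_bom):
    """Return the components where this server doesn't match the
    ReadyNode requirement (missing or wrong part)."""
    return {
        part: required
        for part, required in READY_NODE_BOM.items()
        if server_bom.get(part) != required
    }

existing = {"hba": "onboard-raid", "nic": "10gbe", "boot": "m.2-sata"}
print(missing_parts(existing))  # {'hba': 'certified-12gb-sas'}
```

In this made-up case only the HBA needs swapping, which matches the earlier point: the upgrade to get vSAN-ready is often one cheap part, not a new server.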