This question has come up a few times with customer networking teams, and I must admit I’m confused that we are still having this conversation in 2019.
The short response is no. You should avoid using these devices with vSAN, and in general with virtualization or storage traffic.
They were designed for a time when low utilization of physical servers or low-density virtualization was the norm. At the same time, 10Gbps ports on fast switches were incredibly expensive.
Cisco’s troubleshooting notes on Cisco FEX make a few statements, including this one:
“Move any servers with bursty traffic flows such as storage arrays and video endpoints off of the FEX and connect them directly to the base ports of the parent switch.”
Common questions that have come up at VMworld and other discussions:
Q: Why should I listen to a guy who does storage and virtualization about networking?
A: I don’t disagree. How about one of the co-founders of the company that built the FEX?
Q: What is VMware doing to fix this with vSAN?
A: This isn’t really a VMware problem. Storage and other large traffic flows, like vMotion, suffer on Cisco FEX devices, and other east/west-heavy traffic flows suffer in any lightly buffered, oversubscribed environment. vMotion and NSX are also not going to perform their best without real switch ports.
Q: What are some model numbers for the device?
Q: My networking team told me they are just like an external line card for a switch chassis?
A: Your networking team is incorrect. A real switch port can send traffic to another port without hair-pinning through another device. It’s arguable that a hub would provide a more direct route for packets from one port to another than what the FEX product line offers. Modern switches also offer much larger buffers that can help mitigate TCP incast and other issues that you will see at scale.
Q: How do I determine if my networking teams have deployed Cisco FEX devices?
A: This can be difficult without physical inspection, due to known issues with Cisco Discovery Protocol (CDP) not working correctly with some configurations of the devices. One sign is an incredibly high port designation on the switch: if you see something like 100/1/1, you may be looking at a FEX. It’s best to have your data center operations teams inspect the racks and take note of model numbers, in the same way you would have them physically inspect for cardboard or other things you don’t want in your datacenter. Ultimately the best solution is preventative. Talk to your networking teams about the risks of using FEX devices before they are deployed.
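As a rough illustration of the port-designation hint above, here is a minimal Python sketch that flags port names whose three-part ID starts with a high number. The threshold of 100 and the `Ethernet<id>/<slot>/<port>` naming are assumptions for illustration (FEX IDs are administrator-assigned), and `looks_like_fex_port` is a hypothetical helper, not a Cisco or VMware tool. Treat a match as a reason to go look at the rack, not as proof.

```python
import re

# Assumption: FEX host ports appear as <fex-id>/<slot>/<port> and the FEX ID
# is a high number such as 100. Tune this cut-off for your environment.
FEX_ID_THRESHOLD = 100

# Matches a trailing three-part port ID such as "100/1/1".
_PORT_ID = re.compile(r"(\d+)/(\d+)/(\d+)\s*$")


def looks_like_fex_port(port_name: str) -> bool:
    """Return True if the port name ends in a three-part ID with a high first number."""
    match = _PORT_ID.search(port_name)
    if match is None:
        return False  # two-part IDs like Eth1/12 are not flagged
    return int(match.group(1)) >= FEX_ID_THRESHOLD


for name in ["Ethernet100/1/1", "Eth1/12", "Ethernet1/1/3"]:
    print(name, "-> possible FEX" if looks_like_fex_port(name) else "-> probably not")
```

You could feed this the switch port IDs that your hosts report for their uplinks, then hand the flagged ones to whoever is doing the physical inspection.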
Q: What are some alternatives to look at?
A: I’m happy to take comments from other networking people about this, but I’ve seen two general choices that customers use instead.
For Cisco customers who need FCoE, the Nexus 56xx, 6000, and 7000 offer real switch ports as well as larger buffers. Note: older Nexus 50xx and 55xx have relatively small VoQ buffers that tend not to scale well with larger clusters.
For customers not needing FCoE support (which should be most customers in 2019), the C36180YC-R offers:
- 10/25Gbps access ports
- A massive 8 GB of port buffer
- A fast modern multi-core ASIC
This is a topic that comes up quite a bit. A lot has been written previously about how big your vSphere clusters should be, and Duncan’s musings on this topic are still very valid.
It generally starts with:
“I have 1PB in my storage frame today, can I build a 1PB vSAN cluster?”
The short answer is yes, you can certainly build a 1PB vSAN cluster, and build 64 node clusters (there are customers who have broken 2 PB within a cluster, and customers with 64 node clusters), but you should stop and think about whether you should.
You want 16PB in a single rack, and 99.9999999% availability?
We have to stop and think about things beyond cost control when designing for availability. I always chuckle when people talk about arrays having seven 9’s of availability. The question to ask yourself is: if the storage is up but the network is down, does anyone care? Once we include things “outside of storage” we often find that the reality of uptime is more limited. The actual environmentals (power, cooling) of a datacenter are rated at best at 99.98% by the Uptime Institute. Traditionally we tried to make the floor tile that our gear sat on as resilient as possible.
James Hamilton of Amazon has pointed to WAN connectivity as another key bottleneck to uptime.
“The way most customers work is that an application runs in a single data center, and you work as hard as you can to make the data center as reliable as you can, and in the end you realize that about three nines (99.9 percent uptime) is all you’re going to get,”
The Uptime Institute has done a fair amount of research in this space, and historically their definition of a Tier IV facility involved providing only up to 99.99% uptime (4 nines).
Getting beyond 4 nines of uptime for remote users (who are at the mercy of half-finished internet standards like BGP) is possible but difficult.
Availability must account for the infrastructure it rests on: resiliency in storage and applications cannot exceed what the physical infrastructure underneath can deliver.
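To make the point above concrete, here is a minimal Python sketch of how availability compounds when every layer has to be up at once. It assumes the layers fail independently and are all required (a simple serial model), which is a simplification. The seven-nines array and the 99.98% facility figure come from the discussion above; the 99.9% WAN figure is purely an illustrative assumption.

```python
from math import prod

MINUTES_PER_YEAR = 365 * 24 * 60


def serial_availability(*layers: float) -> float:
    """Availability of a stack where every layer must be up (independent failures assumed)."""
    return prod(layers)


def downtime_minutes_per_year(availability: float) -> float:
    """Expected minutes of downtime per year for a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


array = 0.9999999    # "seven nines" storage array
facility = 0.9998    # power and cooling, per the figure quoted above
wan = 0.999          # illustrative assumption for WAN connectivity

end_to_end = serial_availability(array, facility, wan)
print(f"end-to-end availability: {end_to_end:.5%}")
print(f"expected downtime: {downtime_minutes_per_year(end_to_end):.0f} minutes/year")
```

Under these assumptions the stack comes out at roughly 99.88%, or about ten and a half hours of downtime a year. The seven nines on the array barely register; the weakest layers set the ceiling.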
Let’s review traditional storage cost and operational concepts, and why we have reached a point today where customers are putting over 1PB into a storage pool.
- Capital Costs – Some features may be licensed per frame, and significant discounts may be given if large purchases are made up front rather than as capacity is needed. Spare capacity and overhead, as a percentage of a storage pool, become smaller if your growth rate is fixed.
- Opex – While many storage frames may have federation tools, there are still processes that are often done manually, particularly for change control reasons because of the scale of an outage of a frame (I talked to a customer who had one array fail and take out 4000 VMs, including their management virtual machines).
- Performance – Wide striping, or on hybrid systems aggregating cache, controllers, and ports, reduced the chance of a bottleneck being reached.
The next Change Control Window for my Array is 2022
- Patching/Change Control – Talking to a lot of customers, I find they are often running the same firmware that their storage array came with. The 15 second “gap” in IO as controllers are upgraded is often viewed as a huge risk. This is made worse by the fact that the most risk-averse application on the cluster effectively dictates patching and change control windows. No one enjoys late night, all-hands-on-deck patching windows for storage arrays.
- Parallel remediation in patch windows – Deploying more storage systems means more manual intervention. Traditional arrays often lack good tools for management and monitoring of parallel remediation. Often, more storage arrays means more change control windows.
- Aligning the planets on the HCL – To upgrade a Fibre Channel array, you must upgrade ESXi, the array, the fabric version, the Fibre Channel HBA firmware, and the server BIOS to align with the ESXi upgrade. This is a lot of moving parts, all of which carry the risk of hitting a corner case.
Let’s review how vSAN addresses these costs without driving you to put everything in one giant cluster.
- Capital Costs – vSAN licensing is per socket, and hosts can be deployed with empty drive bays. Drives for regular servers regularly fall in price, making it cheaper to purchase what you need now and add drives to hosts as needed to meet capacity growth. Overhead for spare capacity for rebuilds does reduce as you add hosts, but nothing forces you to fill each host with capacity up front, and no additional licensing costs will be incurred by having partially full servers.
- Opex – vSAN’s normal management plane (vCenter) is easily federated, and storage policies span clusters without any additional work. Lifecycle management, like controller updates from Configuration Assist, and health monitoring alerts easily roll up to a single pane of glass.
- Performance – All Flash has changed the game. You no longer need 1000 spindles and wide striping to get fast or consistent performance. Pooling workloads onto storage arrays in a 3 tier storage architecture actually increases the chance that you might saturate throughput or buffers on Fibre Channel switching.
- Patching – vSAN patching can be done simply using existing tools for updating ESXi (VMware Update Manager), and lifecycle updates for storage controllers can be pushed by a simple click from the UI in vSAN 6.6. Customers already have ESXi patching windows and processes deployed, and maintenance mode with vMotion is a trusted and battle-tested means to evacuate a host.
- VMware Update Manager (VUM) can remediate multiple clusters in parallel. This means you can patch as many (or as few) clusters at once as you like, and when used with DRS this is fully automated, including placement of virtual machines.
- Additional intelligence has been deployed for vSAN to include remediation of firmware. vSAN does not use proprietary Fibre Channel fabrics, is integrated into ESXi, and has no need for proprietary fabric HBAs, which significantly reduces the number of planets to align when planning an upgrade window.
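The point in the list above about rebuild spare capacity shrinking as you add hosts can be sketched with some simple arithmetic. This is a minimal Python illustration that assumes identical hosts and that you reserve enough free capacity to absorb the loss of one whole host. `rebuild_reserve_fraction` is a hypothetical helper for illustration, not a vSAN sizing formula.

```python
def rebuild_reserve_fraction(hosts: int, host_failures_to_absorb: int = 1) -> float:
    """Fraction of raw cluster capacity held back for rebuilds.

    Assumes identical hosts and a reserve equal to the capacity of the
    hosts whose loss you want to absorb (a simplifying assumption).
    """
    if hosts <= host_failures_to_absorb:
        raise ValueError("need more hosts than failures to absorb")
    return host_failures_to_absorb / hosts


for n in (4, 8, 16, 64):
    print(f"{n:>2} hosts: {rebuild_reserve_fraction(n):.1%} of raw capacity reserved")
```

Under these assumptions the reserve falls from 25% at 4 hosts to 6.25% at 16 and about 1.6% at 64. Most of the saving arrives well before you reach a giant cluster, which is worth weighing against the blast radius of putting everything in one place.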
In summary: while vSAN can certainly scale to multi-PB cluster sizes, you should consider whether you actually need to scale up this much. In many cases you would be better served at scale by running multiple clusters.
We’ve all been there…
Maybe it’s the streets of NYC, or a corner stall in a mall in Bangkok, or even Harwin St here in Houston. Someone tried to sell you a cut-rate watch or sunglasses. Maybe the lettering was off, or the gold looked a bit flaky, but you passed on that possibly non-genuine watch or sunglasses. It might have even been made in the same factory, but it is clear the QC might have issues. You would not expect the same outcome as getting the real thing. The same thing can happen in ReadyNodes.
Real ReadyNodes for VMware vSAN have a few key traits.
They are tested. All of the components have been tested together and certified. Beware anyone in software-defined storage who doesn’t have some type of certification program, as this opens the door to lower quality components or hardware/driver/firmware compatibility issues. VMware has validated satisfactory performance with the ReadyNode configurations. A real ReadyNode looks beyond “will these components physically connect” to whether they will actually deliver.
vSAN ReadyNodes offer choice. ReadyNodes are available from over a dozen different server OEMs. The VMware vSAN Compatibility Guide also offers over a thousand verified hardware components to supplement these ReadyNodes for further customization. ReadyNodes are not limited to a single server or component vendor.
They are 100% supported by VMware. Real VMware ReadyNodes don’t require virtual machines to mount, present, or consume storage, and don’t require non-VMware-supported VIBs to be installed.
They are mature. They run a 7th release, battle-tested, hypervisor-integrated storage stack.
So what do you do if you’ve ended up with a fake ReadyNode? Unlike the fake watch I had to throw away, you can check the vSAN compatibility list and see whether, with minimal controller or storage device changes, you can convert your system in place over to vSAN. Remember, if you’re running ESXi 5.5 Update 1 or newer, you already have the vSAN software installed. You just need to license and enable it!