This question has come up a few times with customer networking teams, and I must admit I'm confused that we are still having this conversation in 2019.
It’s 2019. FEXs are not switches and you realllllly should stop buying/deploying/using them.— John Nicholson (@Lost_Signal) May 2, 2019
The short response is no. You should avoid using these devices with vSAN, and in general with virtualization or storage traffic.
They were designed for a time when low utilization of physical servers or low-density virtualization was the norm. At the same time, 10Gbps ports on fast switches were incredibly expensive.
Cisco’s troubleshooting notes on Cisco FEX make a few statements.
Move any servers with bursty traffic flows such as storage arrays and video endpoints off of the FEX and connect them directly to the base ports of the parent switch.
Common questions that have come up at VMworld and other discussions:
Q: Why should I listen to a guy who does storage and virtualization about networking?
A: I don’t disagree. How about one of the co-founders of the company that built the FEX?
As a co-founder of Nuova Systems, where we invented FEX, I heartily agree with this sentiment.— Tom Lyon (@aka_pugs) May 4, 2019
Q: What is VMware doing to fix this with vSAN?
A: This isn’t really a VMware problem. Storage and other large traffic flows like vMotion suffer on Cisco FEX devices. Other east/west-heavy traffic flows also suffer in lightly buffered, oversubscribed environments. vMotion and NSX are also not going to perform their best without real switch ports.
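To put a number on "oversubscribed", here is a minimal sketch that computes the ratio of host-facing bandwidth to uplink bandwidth. The function name and the port counts are hypothetical, picked only for illustration; substitute the values from your own hardware.

```python
def oversubscription_ratio(host_ports: int, host_port_gbps: float,
                           uplinks: int, uplink_gbps: float) -> float:
    """Return host-facing bandwidth divided by uplink bandwidth."""
    if uplinks <= 0 or uplink_gbps <= 0:
        raise ValueError("need at least one uplink with positive bandwidth")
    return (host_ports * host_port_gbps) / (uplinks * uplink_gbps)


# Hypothetical example: 32 x 10 Gbps host ports behind 4 x 10 Gbps uplinks.
ratio = oversubscription_ratio(32, 10, 4, 10)
print(f"{ratio:.1f}:1")  # prints 8.0:1
```

Anything well above 1:1 means bursty east/west flows such as vSAN resync or vMotion are competing for the same uplinks, and on a FEX that includes traffic between two hosts plugged into the same device.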
Q: What are some model numbers for the device?
Q: My networking team told me they are just like an external line card for a switch chassis?
A: Your networking team is incorrect. A real switch port can send traffic to another port without hair-pinning through another device. It’s arguable that a hub would provide a more direct route for packets from one port to another than what the FEX product line offers. Modern switches also offer much larger buffers that can help mitigate TCP incast and other issues that you will see at scale.
Q: How do I determine if my networking teams have deployed Cisco FEX devices?
A: This can be difficult without physical inspection, due to known issues with Cisco Discovery Protocol (CDP) not working correctly with some configurations of the devices. One sign is the port designation on the switch: if it starts with an incredibly high number, such as 100/1/1, you may be looking at a FEX. It’s best to have your data center operations teams inspect the racks and take note of model numbers, in the same way you would have them physically inspect for cardboard or other things you don’t want in your datacenter. Ultimately, the best solution is preventative. Talk to your networking teams about the risks of using FEX devices before they are deployed.
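If you already have a list of upstream port names (from your networking team, or from whatever discovery data does work), the "high designation" hint can be checked with a rough filter. This is only a heuristic sketch: the function name, the accepted name formats, and the threshold of 100 are assumptions based on the 100/1/1 example above, and a match is a reason to go look at the rack, not proof.

```python
import re

# Matches three-part names such as "Ethernet100/1/1" or "Eth101/1/12".
# The accepted prefixes are an assumption; adjust for your environment.
_IFACE = re.compile(r"^(?:Ethernet|Eth)(\d+)/(\d+)/(\d+)$", re.IGNORECASE)


def looks_like_fex_port(name: str, min_chassis_id: int = 100) -> bool:
    """Heuristic: flag three-part port names whose first number is >= 100."""
    match = _IFACE.match(name.strip())
    return bool(match) and int(match.group(1)) >= min_chassis_id


# Hypothetical port names collected from discovery data.
ports = ["Ethernet1/12", "Ethernet100/1/1", "Eth101/1/12", "Eth1/2/3"]
suspects = [p for p in ports if looks_like_fex_port(p)]
print(suspects)  # prints ['Ethernet100/1/1', 'Eth101/1/12']
```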
Q: What are some alternatives to look at?
I’m happy to take comments from other networking people about this, but I’ve seen two general choices that customers use instead.
For Cisco customers who need FCoE, the Nexus 56xx, 6000, and 7000 offer real switch ports as well as larger buffers. Note: the older Nexus 50xx and 55xx have relatively small VoQ buffers that tend not to scale well with larger clusters.
For customers not needing FCoE support (which should be most customers in 2019), the C36180YC-R offers:
- 10/25Gbps access ports
- A massive 8 GB of port buffer
- A fast modern multi-core ASIC
| Session ID | Title |
|---|---|
| HCI1473BU | The vSAN I/O Path Deconstructed: A Deep Dive into the Internals of vSAN |
| ??? | Mystery Session: 7/27 at 3:30PM |
| HCI1769BU | We Got You Covered: Top Operational Tips from vSAN Support Insight |
| HCI3331BU | Better Storage Utilization with Space Reclamation/UNMAP |
The vSAN I/O Path Deconstructed is an interesting inside look at the IO path of vSAN and the reasoning behind it.
We Got You Covered: Top Operational Tips from vSAN Support Insight shows off the phone-home capabilities of vSAN and can help address your questions about what data is collected and how it is used. We are also going to discuss how you can use the same kinds of performance views that GSS and engineering use to get the most out of vSAN.
HCI3331BU is a session that has been years in the making for me. “Where did my space go” is a question I get often. We will explain where that missing PB of storage went and how to reclaim it. The savings from implementing UNMAP should be able to fund your next VMworld trip!
Lastly, I’ve got a mystery session that should be unveiled later. Follow me on Twitter @Lost_Signal, and I’ll talk about what it will be when the time comes.
Pete and I will be recording the vSpeakingPodcast LIVE at the HCI Zone (found near the VMware booth). We’ve got some new guests as well as some favorites lined up.