Should I use a Nexus 2000 series (Cisco FEX) with VMware vSAN?
This question has come up a few times with customer networking teams, and I must admit I’m confused that we are still having this conversation in 2019.
It’s 2019. FEXs are not switches and you realllllly should stop buying/deploying/using them.— John Nicholson (@Lost_Signal) May 2, 2019
The short response is no. You should avoid using these devices with vSAN, and in general with virtualization or storage traffic.
They were designed for a time when low utilization of physical servers or low-density virtualization was the norm. At the same time, 10Gbps ports on fast switches were incredibly expensive.
Cisco’s own troubleshooting notes for FEX include this recommendation:
Move any servers with bursty traffic flows such as storage arrays and video endpoints off of the FEX and connect them directly to the base ports of the parent switch.
Common questions that have come up at VMworld and other discussions:
Q: Why should I listen to a guy who does storage and virtualization about networking?
A: I don’t disagree. How about one of the Co-Flounders of the company that built the FEX?
As a co-founder of Nuova Systems, where we invented FEX, I heartily agree with this sentiment.— Tom Lyon (@aka_pugs) May 4, 2019
Q: What is VMware doing to fix this with vSAN?
A: This isn’t really a VMware problem. Storage and other large traffic flows, such as vMotion, suffer on Cisco FEX devices. More generally, any east/west-heavy traffic flow suffers in lightly buffered, oversubscribed environments. vMotion and NSX are also not going to perform at their best without real switch ports.
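To put a number on "oversubscribed", here is a minimal sketch that compares host-facing bandwidth to uplink bandwidth. The function name and the port counts are hypothetical and chosen only for illustration; they do not describe any specific FEX model.

```python
def oversubscription_ratio(host_ports, host_gbps, uplinks, uplink_gbps):
    """Return host-facing bandwidth divided by uplink bandwidth."""
    if uplinks <= 0 or uplink_gbps <= 0:
        raise ValueError("need at least one uplink with a positive speed")
    return (host_ports * host_gbps) / (uplinks * uplink_gbps)


# Hypothetical device: 32 x 10 Gbps host ports sharing 4 x 10 Gbps uplinks.
ratio = oversubscription_ratio(host_ports=32, host_gbps=10,
                               uplinks=4, uplink_gbps=10)
print(f"{ratio:.1f}:1")  # 8.0:1
```

A ratio well above 1:1 means that when several hosts burst at once (a vSAN resync, a batch of vMotions), the uplinks and whatever small buffers sit in front of them become the bottleneck.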
Q: What are some model numbers for the device?
Q: My networking team told me they are just like an external line card for a switch chassis?
A: Your networking team is incorrect. A real switch port can send traffic to another port without hair-pinning through another device. It’s arguable that a hub would provide a more direct route for packets from one port to another than what the FEX product line offers. Modern switches also offer much larger buffers that can help mitigate TCP incast and other issues that you will see at scale.
Q: How do I determine if my networking teams have deployed Cisco FEX devices?
A: This can be difficult without physical inspection, due to known issues with Cisco Discovery Protocol (CDP) not working correctly with some configurations of the devices. One sign is an unusually high port designation: if the port on the switch is numbered something like 100/1/1, you may be looking at a FEX. It’s best to have your data center operations teams inspect the racks and take note of model numbers, in the same way you would have them physically inspect for cardboard or other things you don’t want in your datacenter. Ultimately, the best solution is preventative. Talk to your networking teams about the risks of using FEX devices before they are deployed.
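If you can get a list of switch port names (from the networking team, documentation, or whatever discovery data does work), the "high designation" hint can be turned into a rough filter. This is only a sketch under assumptions: it assumes FEX host ports are named in a three-part chassis/slot/port form and that the leading chassis number falls in the 100-199 range. The helper name and the range are illustrative, so confirm the numbering used in your environment before trusting the result.

```python
import re

# Assumption: three-part names (chassis/slot/port) such as "Ethernet100/1/1".
PORT_NAME = re.compile(r"^(?:Ethernet|Eth)(\d+)/(\d+)/(\d+)$", re.IGNORECASE)


def looks_like_fex_port(name, low=100, high=199):
    """Heuristic only: True if the leading chassis number is in [low, high]."""
    match = PORT_NAME.match(name.strip())
    if match is None:
        return False  # not a three-part Ethernet port name
    return low <= int(match.group(1)) <= high


ports = ["Ethernet1/12", "Ethernet100/1/1", "Eth101/1/7", "Ethernet1/1/1"]
suspects = [p for p in ports if looks_like_fex_port(p)]
print(suspects)  # ['Ethernet100/1/1', 'Eth101/1/7']
```

A match is a reason to go look at the rack, not proof. Physical inspection remains the reliable check.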
Q: What are some alternatives to look at?
A: I’m happy to take comments from other networking people about this, but I’ve seen two general choices that customers use instead.
For Cisco customers looking for a device that needs FCoE, the Nexus 56xx, 6000, and 7000 offer real switch ports as well as larger buffers. Note: older Nexus 50xx and 55xx have relatively small VoQ buffers that tend to not scale well with larger clusters.
For customers not needing FCoE support (which should be most customers in 2019), the C36180YC-R offers:
- 10/25Gbps access ports
- A massive 8 GB of port buffer
- A fast modern multi-core ASIC