Posts tagged ‘Performance’

May 4

Should I use a Nexus 2000 series (Cisco FEX) with VMware vSAN?

This question has come up a few times with customer networking teams and it’s one that I must admit confuses me that we are having to have in 2019.

It’s 2019. FEXs are not switches and you realllllly should stop buying/deploying/using them.

— John Nicholson (@Lost_Signal) May 2, 2019

The short response is no. You should avoid using these devices with vSAN, and in general with virtualization or storage traffic.

They were designed for a time when low utilization of physical servers or low-density virtualization was the norm. At the same time, the price for 10Gbps ports on fast switches was incredibly expensive.

Cisco’s troubleshooting notes on Cisco FEX make a few statements.

Move any servers with bursty traffic flows such as storage arrays and video endpoints off of the FEX and connect them directly to the base ports of the parent switch.

Common questions that have come up at VMworld and other discussions:

Q: why should I listen to a guy who does storage and virutalization about networking?

A: I don’t disagree. How about one of the Co-Founders of the company that built the FEX?

As a co-founder of Nuova Systems, where we invented FEX, I heartily agree with this sentiment.

— Tom Lyon (@aka_pugs) May 4, 2019

Q: What is VMware doing to fix this with vSAN

A: This isn’t really a VMware problem. Storage or other large traffic flows like vMotion suffer on Cisco FEX devices. Note other east/west heavy traffic flows suffer in light buffered oversubscribed environments. vMotion, and NSX are also not going to perform there best without real switch ports.

Q: What are some model numbers for the device?

A: Devices:

N2K-C2148T-1GE

N2K-C2224TP-1GE

N2K-C2248TP-1GE

N2K-C2232PP-10GE

N2K-C2232TM-10GE

N2K-C2248TP-E-1GE

N2K-C2232TM-E-10GE

N2K-C2232TM-E-10GE

N2K-C2248PQ-10GE

N2K-C2348UPQ-10GE

B22

Q: My networking team told me they are just like an external line card for a switch chassis?

A: Your networking team is incorrect. A real switch port can send traffic to another port without hair-pinning through another device. It’s arguable that a hub would provide a more direct route for packets from one port to another than what the FEX product line offers. Modern switches also offer much larger buffers that can help mitigate TCP incast and other issues that you will see at scale.

Q: How do I determine if my networking teams have deployed Cisco FEX devices?

A: This can be difficult without physical inspection to known issues with Cisco Discovery Protocol) not working correctly with some configurations of the devices. One sign is if the port on the switch has incredibly high designations 100/1/1 you may be looking at a FEX. It’s best to have your data center operation teams inspect the racks, and take note of model numbers in the same way you would have them physically inspect for cardboard or other things you don’t want in your datacenter. Ultimately the best solution is preventative. Talk to your networking teams about the risks of using FEX devices before they are deployed.

Q: What are some alternatives to look at?

I’m happy to take comments from other networking people about this but I’ve seen two general choices that customers use instead.

For Cisco customers looking for a device that need FCoE, the Nexus 56xx, 6000, and 7000 offer real switch ports as well as larger buffers. Note: older Nexus 50xx and 55xx have relatively small VoQ buffers that tend to not scale well with larger clusters.

For customers not needing FCoE support (which should be most customers in 2019), the C36180YC-R offers:

10/25Gbps access ports
A massive 8 GB of port buffer
A fast modern multi-core ASIC

It’s unsafe at most speeds for storage. pic.twitter.com/smMM0i5T1R

— John Nicholson (@Lost_Signal) May 2, 2019

May 8

Virtual SAN performance Service: What is it? (And what about these other things)

One of the newest exciting features of Virtual SAN 6.2 is the new performance service. This is an ESXi native performance monitoring system with API, as well as UI access.

One misconception I wanted to be clear on is that it does not require the use of vCenter Operations Manager, or the vCenter database. Instead, Virtual SAN performance service uses the Virtual SAN object store to store its data in a distributed and protected fashion. For smaller environments who do not want the overhead of VSOM this is a great solution, and will complement the existing tools.

Now why would you want to deploy VSOM if this turnkey simple, low overhead performance system is native? Quite a few reasons:

VSOM offers longer term granular performance tracking. The native Virtual SAN performance service uses the same roll up schedule as vCenter’s normal performance graphs.
VSOM allows for forecasting and capacity planning as it analysis trends.
VSOM allows overlaying performance from multiple area’s and systems (Including things like switching, application KPI’s) to do root cause and anomaly analysis and correlation.
VSOM offers powerful integration with LogInsight allowing event correlation with performance graphs.
VSOM allows for rolling up performance information across hundreds (or thousands of sites) into larger dashboards.
In heterogenous enivrements using traditional storage, VSOM allows collecting fabric, and array performance information.

vCenter Server vDisk Advanced metrics

So if I don’t enable this service (or deploy VSOM) what do I get? You still get basic Latency, IOPS, throughput information from the normal vCenter performance graphs by looking at the vDisk layer. You miss out on back end component views (things like internal SSD queues and latency) as well as datastore/cluster wide metrics, but you can still troubleshoot basic issues with the built in performance graphs.

What about VSAN Observer? For those of you who remember previously this information was only available by using the Ruby vCenter shell interface (RVC). VSAN observer provides powerful visibility, but it had a number of limits:

It was designed originally for internal troubleshooting and lacks consistency with the vCenter UI.
It ran on its own web service separately and was not integrated into the existing vCenter graphs.
It was manually enabled from the RVC CLI
It could not be accessed by API
It was not recommended to run it continuously, or to deploy a separate Virtual machine/Container to run it from.

All of these limitations have been addressed with the Virtual SAN performance service.

I expect the performance service will largely replace VSAN Observer uses. VSAN observer will still be useful for customers who have not upgraded to VSAN 6.2 or where you do not have capacity available for the performance database.

There is an extensive amount of metrics that can be reviewed. It offers “top down” visibility of cluster wide performance, and virtual machine IOPS and latency.

Individual device metrics

Virtual SAN Performance service also offers “bottom up” visibility into device latency and queues on individual capacity and cache devices. For quick troubleshooting of issues, or verification of performnace it is a great and simple tool that can be turned on with a single checkbox.

Requirements:

vSphere 6.0u2

vCenter 6.0u2 (For UI)

Up to 255GB of capacity on the Virtual SAN datastore (You can choose the storage policy it uses).

In order to enable it simple follow these instructions.

Musing of a Storage guy in a Virtual World..

Posts tagged ‘Performance’

Should I use a Nexus 2000 series (Cisco FEX) with VMware vSAN?

Virtual SAN performance Service: What is it? (And what about these other things)

Categories

Archives