
Maximum supported vSAN/vSphere/VCF Cluster

So let’s talk about “maximum supported” and why it’s just a weird thing to focus on. As an example, take building the biggest vSAN cluster that is supported. Today that is ONLY 64 nodes, but let’s unpack what I can run in 64 nodes.

vSAN 7 supports 32TB capacity devices. With 5 disk groups of 7 capacity devices each, that’s 35 devices, or ~1.12PB of raw capacity per host (please don’t do this without calling me, but hey, it is there).
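For anyone who wants to check the napkin math, here is a minimal sketch of the raw capacity arithmetic using the numbers above. The function names are made up for illustration, the cluster total is simply the per-host figure times 64 hosts, and nothing here accounts for cache devices, protection overhead, or slack space.

```python
# Raw capacity napkin math from the figures quoted in the post.
# All names are hypothetical; units are decimal (1 PB = 1000 TB).
TB_PER_CAPACITY_DEVICE = 32
DISK_GROUPS_PER_HOST = 5
CAPACITY_DEVICES_PER_GROUP = 7
HOSTS_PER_CLUSTER = 64


def raw_tb_per_host() -> int:
    """Raw capacity per host in TB (capacity devices only)."""
    devices = DISK_GROUPS_PER_HOST * CAPACITY_DEVICES_PER_GROUP
    return devices * TB_PER_CAPACITY_DEVICE


def raw_pb_per_cluster() -> float:
    """Raw capacity for a full 64-node cluster in PB."""
    return raw_tb_per_host() * HOSTS_PER_CLUSTER / 1000


if __name__ == "__main__":
    print(f"Per host: {raw_tb_per_host()} TB raw")
    print(f"Per cluster: {raw_pb_per_cluster():.2f} PB raw")
```

This is raw capacity only, so usable space after any storage policy is applied would be lower.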

With AMD, 2-socket servers currently reach 128 cores. With Intel at 56 cores per socket, that’s 224 threads (yes, I’m aware the 9200 may require its own nuke plant to run and cool). I’m going to ignore quad socket for this conversation, but yes, we support quad-socket servers like the Synergy 660 Gen10 for SAP HANA.
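The core and thread counts above can be sketched the same way. This assumes 2 hardware threads per core (SMT / Hyper-Threading) and infers 64 cores per AMD socket from the 128-core 2-socket figure; the helper names are hypothetical.

```python
# Core/thread napkin math for 2-socket hosts in a 64-node cluster.
# Assumption: 2 hardware threads per core (SMT / Hyper-Threading).
SOCKETS_PER_HOST = 2
THREADS_PER_CORE = 2
HOSTS_PER_CLUSTER = 64


def host_cores(cores_per_socket: int) -> int:
    """Physical cores in one 2-socket host."""
    return SOCKETS_PER_HOST * cores_per_socket


def host_threads(cores_per_socket: int) -> int:
    """Hardware threads in one 2-socket host."""
    return host_cores(cores_per_socket) * THREADS_PER_CORE


def cluster_cores(cores_per_socket: int) -> int:
    """Physical cores across a full cluster."""
    return host_cores(cores_per_socket) * HOSTS_PER_CLUSTER


if __name__ == "__main__":
    for name, per_socket in (("AMD", 64), ("Intel", 56)):
        print(name, host_cores(per_socket), "cores/host,",
              host_threads(per_socket), "threads/host,",
              cluster_cores(per_socket), "cores/cluster")
```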

Maximum GPUs served for a vSAN cluster is actually a fun one. 16 GPUs per host is going to need interesting cooling solutions… or will it?

BitFusion allows CUDA calls to be served from remote hosts (even ones not in the cluster). So GPU workload scaling could potentially get pretty nuts, and I’m going to leave others to speculate on how many GPU cores could serve a cluster.

Maximum memory is 16TB of DRAM per host and 12TB of PMEM per host. Across 64 hosts, that’s a PB of DRAM and 768TB of PMEM per cluster.
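A quick sketch of where those cluster totals come from, assuming a full 64-host cluster and treating 1PB as 1024TB (which is how 16TB x 64 lands on a round PB). The names are illustrative only.

```python
# Cluster memory totals from the per-host maximums quoted in the post.
# Assumption: 1 PB = 1024 TB here, and every one of 64 hosts is maxed out.
DRAM_TB_PER_HOST = 16
PMEM_TB_PER_HOST = 12
HOSTS_PER_CLUSTER = 64


def cluster_total_tb(per_host_tb: int, hosts: int = HOSTS_PER_CLUSTER) -> int:
    """Total across the cluster for a per-host amount in TB."""
    return per_host_tb * hosts


if __name__ == "__main__":
    dram_tb = cluster_total_tb(DRAM_TB_PER_HOST)
    pmem_tb = cluster_total_tb(PMEM_TB_PER_HOST)
    print(f"DRAM: {dram_tb} TB ({dram_tb / 1024:g} PB)")
    print(f"PMEM: {pmem_tb} TB")
```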

Now, addressable memory gets more fun, as TPS, memory ballooning, and DRS (with new cool capabilities in 7) mean that the amount actually allocated could be a bit higher. And when you are spending what I assume a G5 jet costs, you are going to use these features.

Should you design to these maximums? In general, no. Most people have other reasons to split a cluster up (remember, we can always do shared-nothing migrations between clusters). People will want to limit the blast zone of a management domain, etc.

Part of the benefit of HCI is that you can easily scale it down, even to 2 nodes… Also remember that hosts can be reclaimed and moved to other workload domains in VMware Cloud Foundation.

Lastly, just because you can doesn’t mean you should. Most sane people don’t need or want 40 drives in a host, or want an HA event to result in 6TB of memory worth of VMs rebooting at once. Respect operational reasons to limit blast zones and sizing. As VMware makes it easier to migrate or share resources between clusters, these kinds of limits matter less and less.

What happens when someone typos the RFP to say “2000LB rubber ducky” instead of “2000 rubber duckies”