How to make a vSAN storage only node? (and not buy a mainframe!)

I get asked on occasion, “can I buy a vSAN storage only node?” It’s generally followed by a conversation about how they were told that storage only nodes are the only way to “control costs in an HCI future”. Generally they were told this by someone who doesn’t support external storage, doesn’t support easy expansion of existing hosts with more drives, and has management tools that are hostile to external storage and in some cases do not support entire protocols.

It puzzled me at first, as it’s been a long time since someone has tried to spin only being able to buy expansion storage from a single vendor in large chunks as a good thing. You would think it’s 1976 and we are talking about storage for mainframes.

By default vSAN allows you to use all hosts in a cluster for both storage and compute and encourages you to scale both out as you grow.

First off, the need for a storage only node is something that can often be avoided with a few quick tricks.

  1. If you are concerned about growing storage asymmetrically, I encourage you to design some empty drive bays into your hosts so that you can add additional disk groups in place (it’s not uncommon to see customers double their storage by just purchasing drives and not having to pay more for VMware licensing!). I see customers put 80TB in a host, and with all flash RAID 5 and Dedupe and Compression you can get a LOT of data in a host! I’ve seen a customer buy an R730XD, use only 8 drive bays to start, and triple their storage capacity in place by simply buying some (cheaper, as it was a year later) drives!
  2. If this request is because of HIGHLY asymmetric growth of cold data (I have 50TB of data for hot VMs, and 600TB per host worth of cold data growth), I’d encourage you to use vSAN for the hot data and look at vVols for the cold data. VMware is the only HCI platform that gives you a seamless management framework (SPBM) for managing both HCI storage and external storage. vSAN is great for 80% of total use cases (and is often enough for 100% of many customers’ needs), but for corner cases we have a great way to use both. I’ve personally run a host with vSAN, iSCSI, FC and NFS and it works and is supported just fine. Having vVols to ease the management overhead of those other profiles can make things a lot better! If you’re growing bulk cold data with NL-SAS drives at large scale like this, JBODs on a modular array are going to be the low cost option.

Now back to the question at hand. What if the above approaches don’t work? I just need a little more storage (maybe another host or 3’s worth), my storage IO profile is growing with my data so it’s not a hot/cold problem, and I’d rather keep it all on vSAN. You might also have a concern about licensing, as you have workloads (Oracle, Windows etc.) that will require you to license the host if they use its CPU for compute. In this case you have two options for a vSAN storage only node.

First, let’s define what a storage only node is.

  1. A storage only node is a node that does not provide compute to the cluster. It cannot run virtual machines without configuration changes.
  2. A storage only node, while not providing compute, adds storage performance and capacity to the cluster.

The first thing is to determine what licensing you are using.

If you are using vSphere Enterprise Plus, here is how to make a storage node:

Let’s assume we are using all flash and purchase a 2RU host with 24 drive bays of 2.5” drives and fill it full of storage (~80TB of SSD can be put into a host today, but as bigger drives are certified in the future this could easily be a lot more!). Now, to keep licensing costs down, we are going to get a single socket CPU, and get fewer cores (but keep the clock speed high). This should also help control power consumption.

You can leverage DRS “anti-affinity” (VM-Host) rules to keep virtual machines from running on a host. Make sure to use the “MUST” rules (for example, “Must not run on hosts in group”), and define that virtual machines will never run on that host.

Deploy Log Insight. It can track vMotions and power on events and give you a log, for licensing/auditing purposes, that shows the host was never used to run those workloads.
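To see where a figure like ~80TB can come from, here is a rough sketch of the arithmetic. It assumes a layout the post does not spell out: disk groups of one cache drive plus seven capacity drives, at most five disk groups per host, and the 3.84TB drives mentioned later in this post. The function name and parameters are made up for illustration, and the result is raw capacity before any RAID 5 overhead or dedupe and compression savings.

```python
def raw_capacity_tb(drive_bays, capacity_drive_tb,
                    capacity_per_group=7, max_groups=5):
    """Raw capacity-tier TB for one host.

    Assumes every disk group is fully populated: one cache drive plus
    `capacity_per_group` capacity drives. Cache drives add no usable
    capacity, and leftover bays that cannot form a full group are ignored.
    """
    group_size = capacity_per_group + 1          # cache drive + capacity drives
    full_groups = min(drive_bays // group_size, max_groups)
    capacity_drives = full_groups * capacity_per_group
    return capacity_drives * capacity_drive_tb


# 24 bays -> 3 disk groups -> 21 capacity drives of 3.84TB each
print(round(raw_capacity_tb(24, 3.84), 2))  # 80.64

# Starting with only 8 populated bays gives a third of that,
# which leaves room to grow in place later.
print(round(raw_capacity_tb(8, 3.84), 2))   # 26.88
```

Under these assumptions a fully populated 24 bay host lands right around the 80TB mark, and the cache drives are the reason it is not simply 24 x 3.84TB.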

At this point we just need a single CPU license for vSphere, and a single vSAN socket license and we are ready to roll. If down the road we decide we want to allow other workloads (maybe something that is not licensed per socket) we can simply tune our DRS rules and allow that host to be used for those virtual machines (maybe carve out a management DRS pool and put vROPS, LI, and the vCSA on those storage hosts?).

Next up, if you are using a licensing tier that does NOT have access to DRS you can still make a storage only node.  

Again, we buy our 2RU server with a single CPU and a token amount of RAM to keep licensing costs down and stuff it full of 3.84TB drives!

Now, since we don’t have DRS, we are going to have to find other ways to prevent a VM from being powered on on that host, or vMotioned to it.

Don’t extend the Virtual Machine port groups to that host!

Deploy a separate vDS for the storage hosts, and do not set up virtual machine port groups on it. A virtual machine will not power up on a host where it cannot find its port group.

What if I’m worried someone might create a port group?

Just take away their permissions to create them, or to change them on virtual machines!
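The port group trick can be pictured with a small model. This is only an illustration of the reasoning, not how vCenter implements its compatibility checks; the host names, port group names and the `eligible_hosts` helper are all hypothetical.

```python
def eligible_hosts(vm_port_groups, host_port_groups):
    """Return the hosts that expose every port group the VM is attached to."""
    required = set(vm_port_groups)
    return sorted(
        host
        for host, port_groups in host_port_groups.items()
        if required <= set(port_groups)   # host must offer all required port groups
    )


# Hypothetical cluster: the storage only node has no VM port groups at all.
hosts = {
    "esx01": {"VM-Prod", "VM-Dev"},
    "esx02": {"VM-Prod"},
    "storage01": set(),
}

print(eligible_hosts({"VM-Prod"}, hosts))  # ['esx01', 'esx02']
```

Note that in this model a VM attached to no port group at all would still be eligible on `storage01`, which is one more reason to pair the approach with restricted permissions.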

In this case you’re looking at a single socket of vSphere and a single socket of vSAN. Looking at the existing price for drives, in this case the “premium” for software for this storage only node would be less than 10% of the cost of the drives. As someone who used to sell storage arrays, I’d put the licensing costs as comparable to what I’d pay for an empty JBOD shelf. There’s a slight premium here for the server, but as you’re adding additional controller capacity, for workloads that are growing IO with capacity this isn’t really a bad thing, as the alternative was overbuying controller capacity up front to handle this expansion.

The other thing to note is that your investment in vSAN and vSphere licensing is a perpetual one. In 3 years, when 16TB drives are low cost, nothing stops you from upgrading some disk groups and using your existing licensing. In this way your perpetual license for vSAN is getting cheaper per TB every year.

If you want to control storage and licensing costs, VMware gives you a lot of great options. You can expand vSAN in place, you can add storage only nodes for a low cost for perpetual licenses, and you can serve wildly diverse storage needs with vVols and the half a dozen protocols we support. Buying into a platform that can only be expanded by a single vendor runs counter to the promise of a software defined datacenter. This leads us back to the dark ages of mainframes.

Using SD cards for embedded ESXi and vSAN?

*Update to include corruption detection script, and better KB on endurance and size requirements for boot devices*

I get a lot of questions about embedded installations of VMware vSAN.

Cormac has written some great advice on this already.

This KB explains how to increase the crash dump partition size for customers with over 512GB of RAM.

vSAN trace file placement is discussed by Cormac here.

Given that vSAN does not support running VMFS on the same RAID controller used for pass-through, customers often look at embedded ESXi installs. Today a lot of deployments are done using embedded SD cards because they support a basic RAID 1 mirror system.

The issue

While not a vSAN issue directly, this issue can impact vSAN customers. We have also identified this issue on non-vSAN hosts.

GSS has seen challenges with lower quality SD cards exhibiting significantly higher failure rates, as bad batches in the supply chain have caused cascading failures in clusters. VMware has researched the issue and found that an amplification of reads is making the substandard parts fail quicker. Note the devices will not fail outright, but the problem can be detected by hashing the first 20MB repeatedly and getting different results. This issue is commonly discovered on a reboot. As a result, in 6.0U3 we have a method of redirecting the VMTools to a RAMDisk, as this was found to be the largest source of reads to the embedded install. The process for setting this is as follows.

Prevention

Log into each host using an SSH connection and set the ToolsRamdisk option to “1”:

1. esxcli system settings advanced set -o /UserVars/ToolsRamdisk -i 1
2. Reboot the ESXi host
3. Repeat for remaining hosts in the cluster.

Thanks to GSS/Engineering for hunting this issue down and getting this workaround out. More information can be found on the KB here. As a proactive measure I would recommend all embedded SD card and USB device deployments use this flag, as well as any environment that seeks faster VMTools performance.

Detection

Knowing is half the battle!

This host will likely not survive a reboot!

What if you do not know if you are impacted by this issue?  William Lam has written this great script that will check the MD5 hash of the first 20MB in 3 passes, to detect if you are impacted by this issue. (Thanks to Dan Barr for testing).
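To make the idea concrete, here is a minimal sketch of the same kind of check: hash the first 20MB of a device several times and see whether the results agree. This is not William Lam’s script, just an illustration of the technique. The function names are my own, the device path you pass in is up to you, and the sketch does nothing to bypass any caching layer between you and the media, so treat a clean result with some caution.

```python
import hashlib
import sys


def first_bytes_md5(path, length=20 * 1024 * 1024, chunk=1024 * 1024):
    """MD5 of the first `length` bytes of `path`, read in `chunk` sized pieces."""
    digest = hashlib.md5()
    remaining = length
    with open(path, "rb", buffering=0) as device:
        while remaining > 0:
            data = device.read(min(chunk, remaining))
            if not data:              # device or file is shorter than `length`
                break
            digest.update(data)
            remaining -= len(data)
    return digest.hexdigest()


def reads_are_consistent(path, passes=3):
    """True if every pass over the first 20MB produced the same hash."""
    hashes = {first_bytes_md5(path) for _ in range(passes)}
    return len(hashes) == 1


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (path is whatever your boot device is): python check_boot.py <device>
    ok = reads_are_consistent(sys.argv[1])
    print("consistent" if ok else "MISMATCH: boot media may be corrupt")
```

A mismatch between passes is the symptom described above: the device still responds, but returns different data for the same blocks.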

Going forward, I expect to see more deployments with high endurance SATADOM devices, and I expect embedded M.2 slots for boot devices to become more common in future server designs, with SD cards retired as the default option. While these devices may lack redundancy, I would expect a higher MTBF for one of these than for a pair of low quality/cost SD cards. The lack of end to end nexus checking on embedded devices vs a full drive also contributes to this. Host profiles and configuration backups can mitigate a lot of the challenges of rebuilding one in the event of a failure.

Mitigation

Check out this KB for how to back up your ESXi configuration (somewhere other than the local device).

Evacuate the host, swap in the new device with a fresh install, and restore the configuration.

Looking for a new Boot Device?

Although a 1GB USB or SD device suffices for a minimal installation, you should use a 4GB or larger device. The extra space will be used for an expanded coredump partition on the USB/SD device. Use a high quality USB flash drive of 16GB or larger so that the extra flash cells can prolong the life of the boot media, but high quality drives of 4GB or larger are sufficient to hold the extended coredump partition. See Knowledge Base article http://kb.vmware.com/kb/2004784.

Looking for guidance on the endurance and size you need for an embedded boot device (as well as vSAN advice)? Check out KB2145210, which breaks out what different use cases need.