What is the CPU overhead of vSAN ESA compression and deduplication?

Deduplication and Compression is now broken out in the UI as separate metrics

New in vSAN 9.1 are two key space saving technologies:

  1. General availability of global deduplication
    • It now supports operation with encrypted clusters
    • A new compression family is in use that achieves much better compression. This builds on advancements in ESA compression over OSA compression (block granularity is much smaller).

vSAN Compression 9.1

The compression process is “in-line”. This means it happens “in the I/O path” on all inbound writes. While in theory this costs some CPU to perform, it reduces CPU in other ways (reduces the amount of TCP overhead to push packets over the storage network, reduces the data that has to be committed to disk etc). The savings helps offset the cost of compression. As a result of the trade-offs and benefits here in compute overhead compression is now ALWAYS ON in 9.1. If you write data it’s compressed and then sits on disk and is “done”…. or is it…

Deduplication I/O path

Deduplication is instead an optional feature, and it operates post-process. This means it is NOT in the write I/O path. I get asked a lot

“How much compute should I plan for deduplication to need with vSAN ESA”

“What is the performance impact on writes for deduplication.”

These are valid concerns, as inline deduplication can be quite CPU thirsty, as well as it also tends to potentially bottleneck write throughput (or require other compromises, or cost overheads like dedicated accelerators).

By operating as a post process system, vSAN ESA deduplication can dynamically run when there is free compute resources on the hosts. This means from a CPU budgeting in sizing you need to allocate 0% CPU to deduplication. Remember, free CPU cycles can not be saved in a jar for next week, and idle CPU cycles only cost small variations in power usage on an otherwise idle core. CPU is fundamentally a real time commodity, and as such we do not need to concern ourselves with it’s overheads.

If compression is inline, “Why am I seeing the compression ratio go up and down on an otherwise idle cluster?”. Clearly, there must be a post-process occurring with compression? Yes and no…

Yes, there is a post-process (deduplication), but no, it’s not compression that is running on idle data. What you are seeing is that as deduplication runs, it will deduplicate highly compressed data (which makes the compression ratio go down) as well as deduplicate poorly compressed data (which makes the compression ratio go up).

vSAN ESA in 9.1 brings highly competitive deduplication and compression ratios to the main stage and does so while keeping CPU usage under control. This ensures that the consolidation of compute workloads can be just as efficient as storage.

vSAN 9 Depreciation announcements about Hybrid vSAN

The hybrid configuration in vSAN Original Storage Architecture feature will be discontinued in a future VCF release.

Hybrid vSAN, was a great start to vSAN and continued along for a while for “lowest cost storeage” but it quickly makes less and less sense because of of a combination of software and hardware changes. Over time the place of slow NL-SAS drives in the I/O path has been pushed farther and farther away from virtual machine storage. What has changed with vSAN to make NVMe backed vSAN Express Storage Architecture the logical replacement for vSAN Original Storage Architecture Hybrid?

Software improvements of ESA vs. vSAN OSA Hybrid

The highly optimized I/O path that VSAN ESA offers, gives RAID 1 type performance with RAID 6 being used. This gives better data protection, better capacity (300x overhead for FTT=2 hybrid vs. 1.5x).

vSAN ESA offers compression and global de-duplication across the entire cluster helping compact data even data compaction.

vSAN ESA also negates any need for “cache devices” further reducing the bill of materials cost. The “bad” performance of magetic drives was blunted by use of larger cache devices but now your paying for flash that does not increase the capacity of the system.

Hardware Improvements:

vSAN ESA today supports “Read Intensive” flash drives that are 21 cents per GB. Combined with the above mentioned data efficiency gains it’s possible to have single digit “cents” per GB effective storage for a VCF customer. Yes I”m viewing software as a sunk cost here, please do your own TCO numbers, and competitively bid out multiple OEMs for flash drives!

The reality of Magnetic drives

As the industry has abandoned “Fast” 15K/10K drives, the only drives shipped in any quantity are 7.2K NL-SAS drives. These drives at best tend to perform “80-120” IOPS (In out operations per second). If you have worked with storage you may have noticed that this number has not improved in the last decade. It’s actually worse than that, as the 1TB NL-SAS drive you were looking at 12 years ago had a higher density of performance per GB than current generation.


Note that 100 does’t mean sub-ms latency, it means a minimum of Queue Depth of 4-8 and Native Command Queueing re-ordering commands in the most optimal way and latency is going to be 45-80ms (Which is considered unacceptable in 2025 for transactional applications). Users are no longer willing to “Click” and wait “seconds” for responses in applications that need multiple I/O checks. It’s not uncommon to hear storage administrators refer to NL-SAS 7.2K RPM drives as “Lower latency tape.” Large NL-SAS drives (10TB+) are suitable only for archives, backups, or cold object storage where data sits idle. They are generally too slow for active workloads.

The delta in performance between between magnetic drives and the $3 per GB SATA drives when vSAN launched was large, but the jump to NVMe drives requires a logarithmic scale to measure in a meainful way that does not look ridiculous.

Single Tier ESA vs. Cache design

1. Cache Misses as capacity scaled increasingly are brutal. It’s a bit like shifting from 6th gear to 1st gear while driving 200MPH.

2. You eventually start throwing so much cache at a workload that you have to ask “Why didn’t I just tier the data there… and why don’t I just go all flash and get better capacity efficiencies”
Even the cache drives used by OSA in the early days are anemic in performance compared to modern NVMe drives Modern NVMe drives are absolute monsters, and even with data services, and distributed data protection and other overheads in the I/O path they can deliver sub ms response times and six figure IOPS per nodes.

What about low activity edge locations?

I sometimes see people ask about hybrid for small embedded edge sites, trying to chase low cost but other benefits to ESA > OSA hybrid continue.

  1. ESA supports nested RAID inside the hosts for added durability. A simple 2+1 RAID 5 can be used inside of a 2 node configuration that can survive the loss of a host, the witness, and a drive in the remaining host.
  2. The Environmental tolerance of NVMe drives (heat, vibration, lower quality air) tends to result in less operationally expensive drive replacements.
  3. The Bit error rate of NVMe flash drives are significantly superior to even Enterprise NL-SAS drives.

What about Backup targets?

This blog is not highlighting a new trend. Archive tier and “copies of copies” generally are the only place that NL-SAS drives end up deployed. Increasing the industry is seeing QLC chosen as a better target for “Primary backups.” Waiting 7 hours to recover a production environment from ransomware or accidental deletion. The NL-SAS backed storage makes more sense for “beyond 90 day” and “legally mandated multi-year retention” archives. Ideally a fully hydrated “replica” on a VLR replicated cluster is ideal for stuff that wants a “minutes” recovery time objective, rather than an hour or days recovery time that NL-SAS backed dedupe appliances often deliver on.

To be fair, NL-SAS drives (like tape!) are continuing to be sold in high volume. Their use case (like tape!) though has been pushed out of the way for newer, faster more cost effective solutions for production virtual machine and container workloads. If you have nostalgia for NL-SAS drives feel free to copy your offsite DSM backup target to a hyperscaler object storage bucket as that likely is where the bulk of NL-SAS drives are ending up these days. I will caution though that QLC and private cloud object storage repo’s are coming increasing for that use case.


https://techdocs.broadcom.com/us/en/vmware-cis/vcf/vcf-9-0-and-later/9-0/release-notes/vmware-cloud-foundation-90-release-notes/platform-product-support-notes/product-support-notes-vsan.html

Updating ESXi using ESXCLI + Broadcom Tokens

I was going to update a lab host I have at home that is currently not managed by an external vCenter server. Historically I would do something like this to accomplish this task.

esxcli software profile update -p ESXi-8.0U2d-24585300-standard \
-d https://hostupdate.vmware.com/software/VUM/PRODUCTION/main/vmw-depot-index.xml

As has been discussed the update mirrors for vSphere now require a token to download from. In addition the paths are changing.


Current URL
Replace with
https://hostupdate.vmware.com/software/VUM/PRODUCTION/main/vmw-depot-index.xmlhttps://dl.broadcom.com/<Download Token>/PROD/COMP/ESX_HOST/main/vmw-depot-index.xml
https://hostupdate.vmware.com/software/VUM/PRODUCTION/addon-main/vmw-depot-index.xmlhttps://dl.broadcom.com/<Download Token>/PROD/COMP/ESX_HOST/addon-main/vmw-depot-index.xml
https://hostupdate.vmware.com/software/VUM/PRODUCTION/iovp-main/vmw-depot-index.xmlhttps://dl.broadcom.com/<Download Token>/PROD/COMP/ESX_HOST/iovp-main/vmw-depot-index.xml 
https://hostupdate.vmware.com/software/VUM/PRODUCTION/vmtools-main/vmw-depot-index.xmlhttps://dl.broadcom.com/<Download Token>/PROD/COMP/ESX_HOST/vmtools-main/vmw-depot-index.xml

So what does my command look like?

First go get your token.

Replacing the token into the URL path it goes something like this:

esxcli software profile update -p ESXi-8.0U3b-24280767-standard -d https://dl.broadcom.com/TOKEN_GOES_HERE/PROD/COMP/ESX_HOST/main/vmw-depot-index.xml --no-hardware-warning

What if I get a memory error?

Run the following 4 commands. William Lam has a blog on this workaround.

esxcli system settings advanced set -o /VisorFS/VisorFSPristineTardisk -i 0 
cp /usr/lib/vmware/esxcli-software /usr/lib/vmware/esxcli-software.bak sed -i 's/mem=300/mem=500/g' /usr/lib/vmware/esxcli-software.bak
mv /usr/lib/vmware/esxcli-software.bak /usr/lib/vmware/esxcli-software -f
esxcli system settings advanced set -o /VisorFS/VisorFSPristineTardisk -i 1

You’ll also need to open and close the firewall.

esxcli network firewall ruleset set -e true -r httpClient

SO let’s put all of that into a single copy paste block?

esxcli system settings advanced set -o /VisorFS/VisorFSPristineTardisk -i 0 
cp /usr/lib/vmware/esxcli-software /usr/lib/vmware/esxcli-software.bak sed -i 's/mem=300/mem=500/g' /usr/lib/vmware/esxcli-software.bak
mv /usr/lib/vmware/esxcli-software.bak /usr/lib/vmware/esxcli-software -f
esxcli system settings advanced set -o /VisorFS/VisorFSPristineTardisk -i 1
esxcli network firewall ruleset set -e true -r httpClient
esxcli software profile update -p ESXi-8.0U3b-24280767-standard -d https://dl.broadcom.com/TOKEN_GOES_HERE/PROD/COMP/ESX_HOST/main/vmw-depot-index.xml --no-hardware-warning
esxcli network firewall ruleset set -e false -r httpClient

While most people will use vCenter etc to manage hosts, for anyone with a stand alone host, or in a home lab this is a handy command to quickly patch something.

What if I get the following error?

 [MetadataDownloadError]
Could not download from depot at https://dl.broadcom.com/TOKEN_GOES_HERE/PROD/COMP/ESX_HOST/main/vmw-depot-index.xml, skipping (('https://dl.broadcom.com/TOKEN_GOES_HERE/PROD/COMP/ESX_HOST/main/vmw-depot-index.xml', '', 'HTTP Error 404: Not Found'))
url = https://dl.broadcom.com/TOKEN_GOES_HERE/PROD/COMP/ESX_HOST/main/vmw-depot-index.xml
Please refer to the log file for more details.

You forgot to replace the placeholder for the token code I put in the syntax. 🙂

If it runs successfully you should be greeted with a big wall of text.

Picking out drive cages for a HPE vSAN ESA ReadyNode (DL360 Gen Edition!)

This comes from a twitter thread here, and ThreadReader roll up (For people not signed into twitter)

Is this Server vSAN ESA Compatible @HPE DL360 Gen Edition! A BOM Review 🧵

First off the key things we want to focus on are:

What’s on the BOM:

“HPE ProLiant DL360 Gen11 8SFF x4 U.3 Tri-Mode Backplane Kit”

“HPE 15.36TB NVMe Gen4 High Performance Read Intensive SFF BC U.3 PM1733a SSD”

What’s not on the BOM:

SmartArray/RAID controller 

First off: HPE 15.36TB NVMe Gen4 High Performance Read Intensive SFF BC U.3 PM1733a SSD

Here is a search for all HPE drives on the vSAN VCG:

https://www.vmware.com/resources/compatibility/search.php?deviceCategory=ssd&details=1&vsan_type=vsanssd&ssd_partner=515&ssd_tier=4&keyword=PM1733a&vsanrncomp=true&page=1&display_interval=10&sortColumn=Partner&sortOrder=Asc

Here is the drive in question:

Note while you are here click the “Subscribe” button in the bottom corner for updates to changes on the VCG and note this uses the inbox driver, with the newest supported firmware being HPK5. Also certified for ESA.

vmware.com/resources/comp…


https://www.vmware.com/resources/compatibility/detail.php?deviceCategory=ssd&productid=51437&deviceCategory=ssd&details=1&vsan_type=vsanssd&ssd_partner=515&ssd_tier=4&keyword=PM1733a&vsanrncomp=true&page=1&display_interval=10&sortColumn=Partner&sortOrder=Asc

While I”m here I’ll check what that firmware version fixed. Looks fairly serious from a stability basis so I’ll make sure to use HPE’s HSM + vLCM to patc this drive to the current firmware.

https://support.hpe.com/hpesc/public/docDisplay?docId=a00112800en_us&docLocale=en_US

Next up let’s look at P48896-B21: 1 HPE ProLiant DL360 Gen11 8SFF x4 U.3 Tri-Mode Backplane Kit

So this is the drive cage you NEED to use NVMe in a DL360 as it gives 4 PCI-E lanes to each drive (vs the cheaper basic one that only is 1x and only supports SATA in pass through. 

Looking at the QuickSpecs There are a 3 other options for ESA vSAN.

LFF 3.5”: Can’t do NVMe pass through.

24G x 1 NVMe/SAS U3. Can’t do NVMe pass through, and frankly will underperform with NVMe drives even if used for RAID.

20EDSFF – Supported for ESA.

LFF Server (not supported for VSAN ESA)

Rambling out loud, I think E3 form factor stuff is a better play in the long term as it allows more density. 2.5” SFF really is going to end up legacy for greenfield that shouldn’t be needed.

Note the E3 config will support 300TB, 2x the SFF ones.
(Please go 100Gbps networking if your doing something that dense!)

We do have a NS204i-u, but that is only for a pair of M.2 boot devices (and a GREAT idea for boot, stop doing SD card, boot from SAN weird stuff!). This WILL NOT and cannot be used with the larger SFF or E3 format drives (and that’s a good thing!). 

Next up what’s missing. There is NOT a RAID controller (Generally starts with MR or SR). If there’s one of these the NVMe drives will potentially be cabled to it (and that’s bad, and not supported by vSAN ESA).

Per the quickspecs:

“Includes Direct Access cables and backplane power cables. Drive cages will be connected to Motherboard (Direct Attach) if no Internal controller is selected. Direct Attach is capable of supporting all the drives (SATA or NVMe).”

Now I’ll note this BOM only supports 8 drives, but if your willing to not have an optical drives, or a front USB/Display port There is a way to get 2 more cabled in:

HPE ProLiant DL360 Gen11 2SFF x4 U.3 BC Tri-Mode Enablement Kit P48899-B21 

One other BOM review item. They went 4 x 25Gbps. If you don’t already have 25Gbps TOR switches I would honestly go 2 x 100Gbps. It’s about 20% more cost all in with cables and optics, but it’s 2x the bandwidth and the rack will look prettier. 

There’s also not an Intel VROC license/config item on here. This is a “software(ish) RAID option for NVMe. We don’t need/want this for vSAN ESA. In theory there might be a way to use this for a boot device but use the NS controller instead for now.

In general talk to your HPE Solution architects, their quoting tools should be able to help (HPE always had really good channel tools), if possible start with a ReadyNode/vSAN ESA option to lock out bad choices.

Thanks to Dan R for providing me some insight into this. 

I’m sure @plankers already noticed the lack of a TPM.

It’s now embedded, and disabled if your servers is going to China.

I’m glad HPE stoped making this an removable option.

nother point, for anyone playing with the new memory Tiering, you also are going to want that cabled this way, as that feature is not supported through a RAID controller either. 

What Happens When I Change the Key Provider, KMIP, Native Key Provider, NKP, for vSAN Encryption?

What Happens When I Change the Key Provider, KMIP, Native Key Provider, NKP, for vSAN Encryption? vSAN encryption provides easy, fast data at rest encryption, as well as a unique data in transit encryption option. Data at rest encryption specifically requires a key provider to be used. This can either be an external KIMP provider (Certification list found here), as well as a native key provider option that is bundled with the vCenter Server. For various reasons a customer may wish to switch keys, or even switch to keys provided by a different key provider.

“Can I change the Key provider, KMIP, Native Key Provider, NKP, for vSAN/vSphere Encryption?” The short response is “yes” this is quick/easy and supported. Within the UI you will change to the new keys used, anda shallow rekey operation will kick-off.

What happens when I change the keys? Changing the keys is a shallow rekey operation, NOT a deep rekey operation. What does that mean? A deep key swaps the KEK and DEK and forces a re-write of all of the data to the disk groups one at a time,this kind of operation can take a rather long time. A shallow re-key is rather quick as it will create new anew KEK for the cluster and push it to the hosts. Each device’s DEK will then be re-wrapped with the new KEK+DEK combination.

  • The full process to change the keys from within the UI is as follows:
  • The initial KMS configuration is in place
  • The administrator selects an alternate KMS Cluster
  • The new KMS configuration is pushed to the vSAN hosts
  • A new host key is generated
  • vSAN performs a Shallow Rekey

More information on vSAN Encryption operations can be found in the VSAN Encryption Services Tech note.

Auto-Policy Remediation Enhancements for the ESA in vSAN 8 U2Auto-Policy Remediation

Auto-Policy Remediation Enhancements for the ESA in vSAN 8 U2

vSAN 8 U1 introduced a new Auto-Policy Management feature that helps administrators run their ESA clusters with the optimal level of resilience and efficiency. This helps takes the guesswork and documentation consultation after deploying or expanding a cluster on what is the most optimal policy configuration. In vSAN 8 U2, we’ve made this feature even more capable.

Background

Data housed in a vSAN datastore is always stored in accordance with an assigned storage policy, which prescribes a level of data resilience and other settings. The assigned storage policy could be manually created, or a default storage policy created by vSAN. Past versions of vSAN used a single “vSAN Default Storage Policy” stored on the managing vCenter Server to serve as the policy to use if another policy wasn’t defined and applied by an administrator. Since this single policy was set as the default policy for all vSAN clusters managed by the vCenter Server, it used settings such as a failures to tolerate of 1 (FTT=1) using simple RAID-1 mirroring to be as compatible as possible with the size and the capabilities of the cluster.This meant that the default storage policy wasn’t always optimally configured for a given cluster. The types, sizes, and other characteristics of a cluster might be very different. Changing a policy rule optimized for one cluster may not be ideal, or even compatible with another cluster. We wanted to address this, especially since the ESA eliminates compromises in performance between RAID-1 and RAID-5/6.

Auto-Policy Management for ESA

Configuration of the policy is covered in the 8U1 feature blog here. Once configured, this will automatically create the relevant SPBM policy for the cluster.



Upon the addition or removal of a host from a cluster, the Auto-Policy Management feature will evaluate if the optimized default storage policy needs to be adjusted. If vSAN identifies the need to change the optimized default storage policy, it does so by providing a simple button in the triggered health finding to change the affected storage policy, at which time it will reconfigure the cluster-specific default storage policy with the new optimized policy settings. It will also rename the policy to reflect the newly suggested settings. This guided approach is intuitive, and simple for administrators to know their VM storage policies are optimally configured for their cluster. This change specifically addresses improved behavior for ongoing adjustments in the cluster. Upon a change to the cluster size, Instead of creating a new policy (as it did in vSAN 8 U1), the Auto-Policy Management feature will change the existing, cluster specific storage policy.

Upon a reconfiguration of the Auto-Policy generated storage policy, the automatically generated name will also be adjusted. For example, in a 5-host standard vSAN cluster without host rebuild reserve enabled, the auto-policy management feature will create a RAID-5 storage policy, and use the name of: “cluster-name – Optimal Datastore Default Policy – RAID5”



If an additional host is added to the cluster, after a 24 hour period, the following events will occur:

The Administrator will be prompted with an optional button “Update Cluster DS Policy.”

This will trigger two events The existing policy is changed to RAID-6 The existing policy’s name is changed to “cluster-name – Optimal Datastore Default Policy – RAID6

As described in the steps above, vSAN 8 U2 still does not automatically change the policy without their knowledge. The difference with vSAN 8 U2 is that upon a change of a host count in a cluster, we not only suggest the change, but upon an administrator manually click on the button “Update Cluster DS Policy” we will make this adjustment for them. A host in maintenance mode does not impact this health finding. The number of hosts in a cluster are defined by those that have been joined in the cluster


Configuration Logic for Optimized Storage Policy for Cluster

The policy settings the optimized storage policy uses are based on the type of cluster, the number of hosts in a cluster, and if the Host Rebuild Reserve (HRR) capacity management feature is enabled on the cluster. A change to any one of the three will result in vSAN making a suggested adjustment to the cluster-specific, optimized storage policy. Note that the Auto-Policy Management feature is currently not supported when using the vSAN Fault Domains feature.

Standard vSAN clusters (with Host Rebuild Reserve turned off):

  • 3 hosts without HRR : FTT=1 using RAID-1
  • 4 hosts without HRR: FTT=1 using RAID-5 (2+1)
  • 5 hosts without HRR: FTT=1 using RAID-5 (2+1)
  • 6 or more hosts without HRR: FTT=2 using RAID-6 (4+2)

    Standard vSAN clusters (with Host Rebuild Reserve enabled)
  • 3 hosts with HRR: (HRR not supported with 3 hosts)
  • 4 hosts with HRR: FTT=1 using RAID-1
  • 5 hosts with HRR: FTT=1 using RAID-5 (2+1)
  • 6 hosts with HRR: FTT=1 using RAID-5 (4+1)
  • 7 or more hosts with HRR: FTT=2 using RAID-6 (4+2)

vSAN Stretched clusters

  • 3 data hosts at each site: Site level mirroring with FTT=1 using RAID-1 mirroring for a secondary level of resilience
  • 4 hosts at each site: Site level mirroring with FTT=1 using RAID-5 (2+1) for secondary level of resilience.
  • 5 hosts at each site: Site level mirroring with FTT=1 using RAID-5 (2+1) for secondary level of resilience.
  • 6 or more hosts at each site: Site level mirroring with FTT=2 using RAID-6 (4+2) for a secondary level of resilience.

vSAN 2-Node clusters:

2 data hosts: Host level mirroring using RAID-1

Summary

The new improved Auto-Policy Management feature in vSAN 8 U2 serves as a building block to make vSAN ESA clusters even more intelligent, and easier to use. It gives our customers confidence that resilience settings for their environment are optimally configured.

What should I be paying for NVMe drives for ESA VSAN? (October 2024)

It’s come to my attention that a lot of people shopping storage really don’t know what to expect for NVMe server drives. Also looking at some quotes recently I can say some of you are getting great prices, and some of you are getting…. Well a quote…

I’m seeing discounted prices in the 12 cents (Read Intensive, Datacenter Class drives) to closer to 30 cents (Mixed use, fancier Enterprsie class drives) depending on volume and order. I’m also seeing some outliers (OEMs charging 60 cents per GB?!?!). Seeing better/worse pricing? message me on twitter @Lost_Signal.

I did look around the ecosystem and see one seller closing in on 10 cents per GB for one of the Samsung drives in an OEM caddy.



While DRAM and other component costs matter, vSAN Storage only clusters with dense nodes (200-300TiBs of NVMe) will typically see over 80% of the hardware BOM be the NVMe drives. This is driving a lot of focus on drive pricing and some awkward questions with server sales/accounting teams trying to explain charging 4-5x the going rate for drives.

So why is there such a difference in drive prices?

Drive Types


First off there’s a number of critera that can influence the price of a drive:

  1. What’s the endurance? Mixed-Use (3 drive write per day) drives are what vSAN ESA started with, but it is worth noting they cost more. How much more? ~20% more than Read Intensive drives that only support 1 DWPD. Do I need Mixed use? In short most of you do not, but you should check your change rate, or write rate. Very high throughput data warehouses doing tons of ETLs or large automation farms may see the need to pay the fancier drive that will last longer, and likely have better high end write throughput. I would expect 90% of clusters can use Read Intensive drives though at this point.
  2. Enterprise or Datacenter class TLC drives – Much like “value SAS” before a cheaper slightly less featured (single port vs. 2 port which does NOT matter inside a server), slightly less performant class of NVMe drive is showing up on quotes. I’m so far a fan, for anything but ultra high write throughput workloads it should save you some money. It’s positioned well to replace SATA. and furthers the argument that vSAN OSA is a legacy platform, and ESA should be all new builds. Speaking to one vendor recently they were skeptical of the need for QLC NAND when the cheaper “Datacenter class” TLC can hit pretty solid price points without some of the performance and endurance limits that QLC currently faces (To be fair, we all said the same thing about SLC, and MLC and TLC before, so in the long run I”m sure we will end up on QLC and PLC eventually).
  3. SAS/SATA are not supported by vSAN ESA, but frankly I’m seeing prices at the same or frankly worse for similarish SAS drives. I don’t expect SAS/SATA to show up in the datacenter much going forward beyond maybe M.2 boot devices.

Price List and Discounting

  1. Price list price. Note there are two factors at play here. A vendor will have a list price that is HIGHLY inflated (think 10-12x the component cost to them or even a normal person purchasing that device). These price lists are not consistent vendor to vendor. Price lists are not always universal, they might be per country, by quarter, by contract vehicle and by company. Negotiated price lists can do some weird things. vehicles that are not updated quarterly effectively mean you have committed to worse prices over time (As market prices go down). Also older price lists will not include newer drives or SKUs that are cheaper, sometimes forcing customers to purchase older servers/drives etc at higher cost.
  2. Discount % – When I ask people what they pay for drives or servers they often reply with a discount percent, with a slight bit of excitement and zero context. This is a bit like me telling people I paid 30% off for an air filter yesterday. (30% of WHAT?) Discussing discount without knowing the price list markup is a bit like buying a car without knowing what currency you are negotiating in. Different OEMs have different blends of markup, and base discounts. One Tier 1 vendor OEM’s example of expected discounts are:

    55% – Anyone with a pulse should get this discount.
    65% – If you found a partner and they felt like making 20% off of you, this is your normal pricing for a small order from a small company.
    75% – A reasonable normal discount
    85% – A large order, or an order from a large company who does a lot of purchasing.
    90%+ You bought a railcar sized order.

    Some factors that can influence discount size:



    Note Tier 2/3 OEMs tend to have much more “Street ready pricing” by default.

Some factors that can influence discount size

  • Size of deal – Larger orders can discount more.
  • Financial Shenanigans – Some server vendors are currently trying to operate as a SaaS companies in their financial reporting to wallstreet. As part of this cosplaying as a subscription service, they will only quote sane discounts/prices if you structure the deal as a subscription deal. They may require this have a cloud connected component that in reality has no real value, but I assure you is required by auditors to comply with ASC 606 accounting regulations and totally isn’t dubiously stretching the line at the unique value requirements of the cloud bits. If you do not want a quote that costs 3x what it should, and would like servers delivered this year instead of 2026, I suggest you roll your eyes, and ask for that new cloud thing!.
  • Competitive pressure – Competitive deals (meaning there is another vendor quoting servers or drives) typically unlocks 10-30% better pricing for the sales team. If you NEVER quote anyone else (even as a benchmark) you will discover your pricing power even at scale slowly atrophies over time. Seriously, go invite Lenovo, or Hitachi, Fujitsu or some other vendor to throw a quote at the wall. Even if you likely plan to stick with your existing OEM, you will find this helps keep pricing a bit more honest.

Vendor doesn’t want to sell you the drives (because they want to sell something else!)– This one is weird, but if you are asking a VAR/Server vendor who also sells storage to quote you NVMe drives for vSAN… They may have a perverse incentive to mis-price them so they can sell you a higher margin external storage array. Server components (especially to partners, I used to work for one!) tend to offer less margin, and vendor sales reps may have quota buckets they need to fill in storage. This reminds me of the wise words of Eric.

Common factors for higher prices

“The customer gets ONE of the votes on what they get to buy” – Enterprise Storage sales rep who I saw make 700K in commission.

You specified a very specific drive they don’t have in stock – Vendors have gotten increasingly annoyed with being forced to stock like for like parts for replacement, and supply chain management of 40 different NVMe drive SKUs (performance, encryption, endurance, capacity variables) has allowed their supply chain guys to offer discounts for “Agnostic SKUs” (where you get something that meets the spec). While I am partial to some specific drive SKUs this can cost you anywhere from 20% to 100% as well as delays in shipping. By discounting drives they have to sell and want to sell they can make sure the server gets sold in THIS quarter so they can book revenue now.

Sandbagging, SPIFFS and other odd sales behaviors – People who sell most of the time want to help the customer solve a problem. That said they also are driven by a long list of various incentives to sell specific things at specific times. This is referred to as “Coin-Operated” behavior. Sandbagging is a term used when a sales team purposely slows down a deal. This could be because they have hit a ceiling on how much commission they can earn, or accelerators to their commission. SPIFFs are one off payments for selling specific things, often paid not by the sales teams employer but a manufacturer or partner directly. It frankly always felt strange to have a storage vendor trying to pay me in Visa Gift cards on the side (I generally refused these, as it felt like a illicit transaction) but it does happen.

#vSAN #ESA #NVMe #TCO #Price

My Dog

Meet Otto. He’s a rescue who’s I guess around 10 years old, weighs 11-12 pounds and likes children and all people.

His likes are naps, sunning himself in the yard, and making me take him on 2-3 walks a day to keep me in shape.

He is weirdly quiet (Doesn’t bark), and doesn’t really make a mess or scratch at things.

I was told by my agent to attach a photo for landlords to see to applications, and finding some of their platforms don’t make this an option, will just embed a link to this blog post. If you see him on a greenbelt trail you or your children are welcome to pet him, he’s extremely harmless.

is HPE Tri-Mode Supported for ESA?

No.

Now, the real details are a bit more complicated than that. It’s possible to use the 8SFF 4x U.3 TriMode (not x1) backplane kit, but only if the server was built out with only NVMe drives, and no RAID controller/Smart Array. Personally I’d build of of E.3 drives. For a full BOM review and a bit more detail check out this twitter thread on the topic where I go step by step through the BOM outlining what’s on it, why and what’s missing.

How to configure a fast end to end NVMe I/O path for vSAN

A quick blog post as this came up recently. Someone who was looking t NVMoF with their storage was asking how do they configure a similar end to end vSAN NVMe I/O path that avoids SCSI or serial I/O queues.

Why would you want this? NVMe in general uses significantly less CPU per IOP compared to SCSI, has simpler hardware requirements commonly (no HBA needed), and can deliver higher throughput and IOPS at lower latency using parallel queuing.

This is simple:

  1. Start with vSAN certified NVMe drives.
  2. Use vSAN ESA instead of OSA (It was designed for NVMe and parralel queues in mind, with additional threading at the DOM layer etc).
  3. Start with 25Gbps ethernet, but consider 50 or 100Gbps if performance is your top concern.
  4. Configure the vNVMe adapter instead of the vSCSI or LSI Buslogic etc. controllers.
  5. (Optional) – Want to shed the bonds of TCP and lower networking overhead? Consider configuring vSAN RDMA (RCoE). This does require some specific configuration to implement, and is not required but for customers pushing the limits of 100Gbps in throughput this is something to consider.
  6. Deploy the newest vSAN version. The vSAN I/O path has seen a number of improvements even since 8.0GA that make it important to upgrade to maximize performance.

To get started adda. NVMe Controller to your virtual machines, and make sure VMtools is installed in the guest OS of your templates.

Note you can Migrate existing VMDKs to vNVMe (I recommend doing this with the VM powered off). Also before you do this you will want to install VMtools (So you have the VMware paravirtual NVMe controller driver installed).