September 2014 - Virtual Ramblings

In my life as a VMware consultant I run into the following Mad Lib when trying to solve storage problems for Business critical Applications.

A customer discovers they have run out of (IOPS/Capacity/Throughput/HCL) with their existing (EMC/Dell/HP/Netapp) array. They sized only for Capacity without understanding that (RAID 6 with NL-SAS is slow, 2GB of Cache doesn’t deliver 250K IOPS). The have spent all their (Budget/rackspace/Power/Political Power/Moxie). There is also an awkward quiet moment where its realized that (Thick provisioning on Thick provisioning is wasteful, I can’t conjure IOPS out of a hat, Dedupe is only 6%, Snapshots are wasting 1/2 of their array and are still not real backs, They can’t use COW so SRM can’t test failover). Searching for solutions they hear from a junior tech that there is this new (home-made/SOHO appliance) that can meet their (Capacity/IOPS) needs at a cheap price point. And if they buy it, it probably will work… For a while.

Here’s whats missing from the discussion.

1. The business needs more than 3-5 days for parts replacement, or tickets being responded to. (Real experiences with these devices).

2. The business needs something not based on desktop class non-ECC RAM motherboards.

3. The Business needs REAL HCL’s that are verified and not tested on customers. (QNAP was saying Green drives that lacked proper TLER, and are not designed for RAID would be fine to use for quite a while).

4. The Business needs systems that are actually secured

Now I’ve heard the other argument “but John I’ll have 2 of them and just replicate!”

This is fine (once you realize that RSYNC and VMDK’s don’t play nice) until you get bit buy a code bug that hits both platforms. While technically on the VMware HCL, these guys are using open source targets (iSCSI and NFS) and are so incredibly removed from the upstream developers that they can’t quickly get anything fixed or verified quickly. 2 Systems that have a nasty iSCSI MPIO bug, or have a NFS timeout problem are worse than 1 system that “just works”. Also as these boxes are black box’s they often miss out from the benefits of open source (you patch and update on their schedules, which is why My QNAP had a version of OpenSSL at one point that was 4 years old despite being on the newest release). If both systems have hardware problems because of a power surge, or thermal problems, or user error or a bad batch your still stuck waiting days to get a fix. If its software you may be holding your breath for quite a while. With a normal server OEM or Tier 1 storage provider you have parts in 4 hours, and reliability and freedom that these boxes can’t match.

Now at this point your probably saying “but John, I need 40K IOPS and I don’t have 70K to shovel into an array.

And thats where Software Defined Storage bridges the gap. Now with SuperMicro You can get solid off the shelf servers with 4 hour support agreements without breaking the bank (This new parts support program is global BTW). For storage software you can use VMware VSAN, a platform that reduces, costs, complexity, and delivers great performance. You massively reduce your support foot print (one company for hardware, one for software) reducing operational costs and capital costs.

Nothing against the Synology, QNAP, Drobo of the world, but lets stick to the right tool for the right job!

Month: September 2014

Why you shouldn’t run BCA off a Synology (or QNAP, or other cheap Linux NAS)…