How to handle isolation with scale out storage
I would like to say that this post was inspired by Chad’s guide to storage architectures. When talking to customers over the years a recurring problem surfaced. Storage historically in the smaller enterprises tended towards people going “all in” on one big array. The idea was that by consolidating the purchasing of all of the different application groups, and teams they could get the most “bang for buck”. The upsides are obvious (Fewer silo’s and consolidation of resources and platforms means lower capex/opex costs). The performance downsides were annoying but could be mitigated. (normally noisy neighbor performance issues). That said the real downside to having one (or a few) big arrays are often found hidden on the operational side.
- Many customers trying to stretch their budget often ended up putting Test/Dev/QA and production on the same array (I’ve seen Fortune 100 companies do this with business critical workloads). This leads to one team demanding 2 year old firmware for stability, and the teams needing agility trying to get upgrades. The battle between stability and agility gets fought regularly in the change control committee meetings further wasting more people’s time.
- Audit/regime change/regulatory/customer demands require an air gap be established for a new or existing workload. Array partitioning features are nice, but the demands often extend beyond this.
- In some cases, organizations that had previously shared resources would part ways. (divestment, operational restructuring, budgetary firewalls).
Some storage workloads just need more performance than everyone else, and often the cost of the upgrade is increased by the other workloads on the array that will gain no material benefit. Database Administrators often point to a lack of dedicated resources when performance problems arise. Providing isolation for these workloads historically involved buying an exotic non-x86 processor, and a “black box” appliance that required expensive specialty skills on top of significant Capex cost. I like to call these boxes “cloaking devices” as they often are often completely hidden from the normal infrastructure monitoring teams.
A benefit to using a Scale out (Type III) approach is that the storage can be scaled down (or even divided). VMware VSAN can evacuate data from a host, and allow you to shift its resources to another cluster. As Hybrid nodes can push up to 40K IOPS (and all flash over 100K) allowing even smaller clusters to hold their own on disk performance. It is worth noting that the reverse action is also possible. When a legacy application is retired, the cluster that served it can be upgraded and merged into other clusters. In this way the isolation is really just a resource silo (the least threatening of all IT silos). You can still use the same software stack, and leverage the same skill set while keeping change control, auditors and developers happy. Even the Database administrators will be happy to learn that they can push millions of orders per minute with a simple 4 node cluster.
In principal I still like to avoid silos. If they must exist, I would suggest trying to find a way that the hardware that makes them up is highly portable and re-usable and VSAN and vSphere can help with that quite a bit.