Skip to content

Posts from the ‘Storage’ Category

PSA: Developers and SQL admins do not understand storage

Thin Provisioning is one of my favorite technologies, but with all great technology comes great responsibility.

This afternoon I got a call from a customer having an issue with a SQL backup. They were preparing a major code push and were running a scripted full SQL backup to have a quick restore point if something goes wrong.
I was sent the following

10 percent processed.
20 percent processed.
30 percent processed.
40 percent processed.
50 percent processed.
60 percent processed.
70 percent processed.
80 percent processed.
90 percent processed.
Msg 64, Level 20, State 0, Line 0
A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The specified network name is no longer available.)

The server had frozen from a thin provisioning issue, but tracing through the workflow that caused this highlighted a common problem of SQL administrators everywhere. Backups where being done to the same volume/VMDK as the actual database. For every 1GB of SQL database there was another 10GB of backups, wasting expensive tier 1 storage.

The Problem:

SQL developers LOVE to make backups at the application level that they can touch/see/understand. They do not trust your magical Veeam/VDP-A. Combined with NTFS being a relatively thin unfriendly file system (always writing to new LBA’s when possible) this means that even if a database isn’t growing much, if backups get placed on the same volume any attempt at being thin even on the back end array is going to require extra effort to reclaim. They also do not understand the concept of a shared failure domain, or data locality. If left to their own devices put the backups on the same RAID group of expensive 15K or flash drives, and go so far as to put it on the same volume/VMDK even if possible. Outside of the obvious problems for performance, risk, cost, and management overhead, this also means that your Changed Block Tracking and backup software is going to be baking up (or having to at least scan) all of these full backups every day.

The Solution:

Give up on arguing with them that your managed backups are good enough. Let them have their cake, but at least pick where the cake comes from and goes.
Create a VMDK on a separate array (in a small shop something as cheap as a SATA backed Synology can provide a really cheap NFS/iSCSI target for this). Exclude this drive from your backups (or adjust when it runs so it doesn’t impact your backup windows).
Careful explain to them that this new VMDK (name the volume backups) is where backups go.
Now accept that they will ignore this and keep doing what they have been doing.
In Windows turn on file screens and block the file extensions for SQL backup files.
Next turn on reporting alerts to email you anytime someone tries to write such a file, so you’ll be able to preemptively offer to help them setup the maintence jobs so they will work.

Why software storage is far less riskey to your buisness

I was talking to a customer who was worried about the risks of a software based storage system, but thinking back I keep thinking of all of the risks of buying “hardware” defined storage systems. Here’s a few situations over the years I’ve seen (I’m not picking on any of these vendors here, just explaining situations with context).

1. Customer buys IBM N-Series. Customers FAS unit hits year 4 of operation. Customer discovers support renewal for 1 year will cost 3x buying a new system. Drives have custom firmware and can not be purchased 2nd hand in event system needs emergency life support as tier 2 system.

Solution: Customer can extend support on HP/Dell Servers without ridiculous markups. StarWind/Vmware VSAN and other software solutions don’t care that your in “year 4”.

2. Customer has an old VNXe/VNX kit. Customer would like to use flash or scale up the device with lots and lots of drives. Sadly, The flare code running on this was not multi-threaded. Customer discovers that this critical feature is coming out but will require a forklift. Customer wonders why they were sold an array with multi-core processors that were bragged about when the core storage platform couldn’t actually use them. Flash storage pool is pegging out a CPU core and causing issues with the database.

Solution: Software companies want everyone on the new version. Most storage/software companies (VMware VSAN, Starwind etc.) include new features in the new version. Occasionally there will be something crazy good thats a added feature, but at least your not looking at throwing away all the disks (and investments in controllers) you’ve made just for a single much needed feature.

3. Customer bought MD3000i. One year later VMware puts out a new version, and fail over quits working on the MD3000i. Dell points out the device is end of support and LSI isn’t updating it. Customer gets sick of all path down situations and keeps their enviroment on an old ESXi release, realizing that their 2 year old array is an albatross.
Discussions of sketchy NFS front end kludge come up but in the end the customer is stuck.

Solution: Had another customer have this happen (Was with Datacore) but this customer was running it on COTS (DL180+MSA’s stacked). Customer could easily switch to a different software/storage vendor (Starwind etc). In this case they were coming up on a refresh so we just threw on CentOS and turned the thing into a giant Veeam Target.

Software based storage fundamentally protects you from the #1 unpredictable element in storage. The vendor….

VSAN build #2 Part 1 JBOD Setup and Blinkin Lights

(Update, the SM2208 controller in this system is being removed from the HCL for pass through.  Use RAID 0)

Its time to discuss the second VSAN build. This time we’ve got something more production ready, properly redundant on switching and ready to deliver better performance. The platform used is the SuperServer F627R2-F72PT+

The Spec’s for the 4 node’s

2 x 1TB Seagate Constellation SAS drives.
1 x 400GB Intel SSD S3700.
12 x 16GB DDR3 RAM (192GB).
2 x Intel Xeon E5-2660 v2 Processor Ten-Core 2.2GHz
The Back end Switches have been upgraded to the more respectable M7100 NetGear switches.

Now the LSI 2208 Controller for this is not a pass through SAS controller but an actual RAID controller. This does add some setup, but it does have a significant queue depth advantage over the 2008 in my current lab (25 vs 600). Queues are particularly important when dropping out of cache bursts of writes to my SAS drives. (Say from a VDI recompose). Also Deep queues help SSD’s internally optimize commands for write coalescence internally.

If you go into the GUI at first you’ll be greeted with only RAID 0 as an option for setting up the drives. After a quick email to Reza at SuperMicro he directed me to how to use the CLI to get this done.

CNTRL + Y will get you into the Megaraid CLI which is required to set JBOD mode so SMART info will be passed through to ESXi.

$ AdpGetProp enablejbod -aALL // This will tell you the current JBOD setting
$ AdpSetProp EnableJBOD 1 -aALL //This will set JBOD for the Array
$ PDList -aALL -page24 // This will list all your devices
$ PDMakeGood -PhysDrv[252:0,252:1,252:2] -Force -a0 //This would force drives 0-2 as good
$ PDMakeJBOD -PhysDrv[252:0,252:1,252:2] -a0 //This sets drives 0-2 into JBOD mode

They look angry don't they?

They look angry don’t they?

Now if you havn’t upgraded the firware to at least MR5.5 (23.10.0.-0021) you’ll discover that you have red drive lights on your drives. You’ll want to grab your handy dos boot disk and get the firmware from SuperMicro’s FTP.

I’d like to thank Lucid Solution’s guide for ZFS as a great reference.

I’d like to give a shout out to the people who made this build possible.

Phil Lessley @AKSeqSolTech for introducing me to the joys of SuperMicro FatTwin’s some time ago.
Synchronet, for continuing to fund great lab hardware and finding customers wanting to deploy revolutionary storage products.

What you mean to say about VSAN

Having spent some time with VSAN, and talking to customers there is a lot of excitement. It goes without fail that some people who are selling other scale out storage, and traditional solutions might be a little less excited. I have a few quick thoughts on Henderson.

He points out that VSAN is not concerned with data location. He goes on to say that this will limit performance as data will have to be read over the network and this will prevent scalability as VM’s sprawl across hosts, that the increased east west traffic back to the storage will cause bottlenecks. In reality a 16 node VSAN cluster will likely be served by a single stack of 10Gbps Core or TOR switches and this will potentially have fewer hops than a large centralized Netapp or Pure or other traditional big iron array that is trying to serve multiple clusters and having to contend with switch uplinks. Limiting this traffic to the cluster actually makes it easier to handle, as there is less contention and stress points. All traditional vendors require random reads reach out over a storage network (and In Netapp Cluster Mode or Isilon deployments may be any number of different nodes). Given that VSAN does not use NFS or iSCSI, but a simpler more lightweight protocol would actually imply that this might even put a simpler, lower load on the network. IBM’s XIV system (Their Tier 1.5 solution for anyone who does not need a DS 8000) even uses a similar design internally. This is not a “fragile” or 1.0 design. It is one used extensively in the storage world today.

Next he goes on to dwell on recommendations of 10Gbps for the storage network. This is no different than what a typical architect would design for a high throughput Netapp or Pure deployment. If you need lots of storage IO, you deploy lots of network. This is nothing particularly novel. While he cites Duncan saying 10Gbps is recommended he ignores Duncan’s great article on how VSAN can be deployed on a 2 x 10Gbps connected host using vSphere Network IO Control to maintain performance and control port costs.

He points out that a VSAN with 16 core host could use up 2 cores (in reality I’m not seeing this in my lab). I would question, what his thoughts on Nutanix (I’ve heard as many as 8 vCPU’s for the CVM?). There is always a tradeoff when offloading storage (or adding fancy features like inline dedupe). I will agree When you are paying for expensive Oracle, SQL and Datacenter licenses by the socket, anything that robs CPU can get expensive. That is why VSAN was designed to be lightweight on CPU, was placed in the kernel, and does not include other vCPU sucking features like Compression and Dedupe. Considering CPU power seems to be the new benchmark of licensing, keeping this under control is key if VSAN is going to be used for business critical applications.

I think he missed the point of Application centric virtual machine storage. It is not about having a single container to put all the virtual machines in. Its about being able to dynamically assign policies to virtual machines. Its about being able to have applications reach into VMware, and using VASA and the native API’s to define their own striping and mirroring, and caching policies on the fly. An early example of this is VMware View automatically defining and assigning unique policies for linked clones and replica’s that are optimized based on their IO and protection needs. Honestly it wouldn’t take much to layer on a future storage DRS, that added striping or caching based on SLA enforcement (And realistically its something you could hack together with with power-shell if you think about it).

His final sendoff seems an attempt at putting VSAN in a SMB discount box.

“The product itself is less mature, unproven in a wide cross-section of production data centers, and lacking core capabilities needed to deliver the reliability, scalability, and performance that customers require.”

I feel that this is a bit harsh for a product that can define quadruple mirroring of a VM or VMDK, can strike close to a million read IOPS on a cluster (and a few hundred thousand write), can handle 16 node clusters, can scale almost as well as a flash array at VDI.

Set Brocade FC ports to Loop mode

So you want to do a small VMware cluster, and you don’t need Fibre Channel switches. By default most array’s and HBA’s are in point to point mode (used for switches). You will want to setup Loop mode in both your Array (In my HUS this is under the FC Port config). Next up if you have brocade HBA”s they likely have some ancient 3.0 firmware that does not support loop mode. Here’s how to upgrade your HBA’s, and how to set the port mode’s (make sure to set it for BOTH ports on the HBA).

esxcli software vib install -d /tmp/

cd /opt/brocade/bin
./bcu port –topology 1/0 loop
./bcu port –disable 1/0
./bcu port –enable 1/0
./bcu port –topology 1/1 loop
./bcu port –disable 1/1
./bcu port –enable 1/1

Sub 20K Arrays like a HUS 110 that can support up to 4 hosts, make for a great storage option for the discerning SMB or remote office. Down the road you can always add a switch, so it gives a nice flexible middle zone between using direct SAS, and 10Gbps iSCSI. Also this is useful if you have a business critical application and want dedicated target queue’s and really simple troubleshooting and lower latency.

VSAN Flexability for VDI POC and Beyond

Quick thoughts on VSAN flexibility compared to the Hyper Converged offerings, and solving the “how do I do a cost effective POC, —> Pilot —> Production roll out?” without having to overbuild or forklift out undersized gear.

Traditionally I’ve not been a fan of scale out solutions because they force you to purchase storage and compute at the same time (and often in fixed ratio’s). While this makes solving capacity problems easier (Buy another node is the response to all capacity issues) you often end up with extra compute and memory to address unstructured and rarely utilized data growth. This also incurs additional per socket license fees as you get forced into buying more sockets to handle the long tail storage growth (VMware, Veeam, RedHat/Oracle/Microsoft). Likewize if storage IO is fine, your stuck still buying more to address growing memory needs.

Traditional modular non-scale out designs have the problem in that you tend to have to overbuild certain elements (Switching, Storage Controllers or Cache) up front, to hedge against costly and time consuming forklift upgrades. Scale out systems solve this, but the cost of growth can get more expensive than a lot of people like, and for the reasons listed above limit flexibility.

Here’s a quick scenario I have right now that I’m going to use VSAN to solve and cheaply scale through each phase of the projects growth. This is an architects worst nightmare. No defined performance requirements for users and poorly understood applications, and a rapid testing/growth factor where the spec’s for the final design will remain organic.

I will start with a proof of concept for VMware View for 20 users. If it meets expectations It will grow into a 200 user Pilot, and if that is liked, the next growth point can quickly reach 2000 users. I want a predictable scaling system with reduced waste, but I do not yet know the Memory/CPU/Storage IO ratio and expect to narrow down the understanding during the proof of concept and pilot. While I do not expect to need offload cards (APEX, GRID) during the early phases I want to be able to quickly add them if needed. If we do not stay ahead of performance problems, or are not able to quickly adapt to scaling issues within a few days the project will not move forward to the next phases. The datacenter we are in is very limited on rack space. power or is expensive and politically unpopular with management. I can not run blades due to cooling/power density concerns. Reducing unnecessary hardware is as much about savings on CAPEX as OPEX.

For the Proof of Concept start with a single 2RU 24 x 2.5” bay server. (Example Dell R710, or equivalent Superstorage 2027R-AR24NV 2U).

For storage, 12 x 600GB 10K drives, and a PCI-Express 400GB Intel 910 Flash drive. The intel presents 2 x200Gb LUN’s and can serve 2 x 6 disk disk groups.
A pair of 6Core 2.4Ghz Intel Processors 16 x 16GB DIMMs for memory.
For Network connectivity I will purchase 2 x 10Gbps NIC’s but likely only use GigE as the switches will not be needed to be ordered until I add more nodes.

I will bootstrap VSAN onto a single node (not a supported config, but will work for the purposes of testing in a Proof of Concept) and build out a vCenter Appliance, a single Composer, Connection and security server, and two dozen virtual machines. At this point we can begin testing, and should have a solid basis for testing Memory/CPU/Disk IOPS as well as delivering a “fast” VDI experience. If GRID is needed or a concern it can also be added to this single serve to test with and without it (as well as APEX tested for CPU offload of heavy 2D video users).

As We move into the Pilot with 200 users, we have an opportunity to adjust things. With adding 2-3 more nodes we can also expand the disks by doubling the number of spindles to 24, or keep the disk size/flash amount at existing ratio’s. If Compute is heavier than memory we can back down to 128GB (cannibalize half the dimms in the first node even) or even adjust to more core’s or offload cards. At this point we have the base cluster of 3-4 nodes with disk, we can get a bit more radical in future adjustments. At this point 10Gbps or Infiniband switching will need to be purchased. Existing stacked switches though may have enough interfaces to avoiding having to buy new switches or modules for chassis.

As we move into production and nodes 4-8 use 1000 VM’s and up the benefits of VSAN really shine. If we are happy with the disk performance of the first nodes, we can simply add more spindles and flash to the first servers. If we do not need offload cards, dense TWIN Servers, Dell C6000, or HP Sl2500t can be used to provide disk-less nodes. If we find we have more complicated needs, we can resume expanding with the larger 2RU boxes. Ideally we can use the smaller nodes to improve density going forward. At this point we should have a better understanding of how many nodes we will need for full scaling and have desktops from the various user communities represented and be able to predict the total node count. This should allow us to size the switching purchase correctly.

VMware Expands VSAN supported Controller list

VMware has a 1.0 supported controller list that is starting to shape up. Considering Cisco uses the LSI, this gives us 3 solid vendors to choose from from Day one. Also, AHCI controller support is good as there was previously a nasty bug that caused data loss with them. Hoping for PEX to give us a street date (generally there is a release within a week or two of PEX, so I’m hoping March).

HP HBA H220i
HP SMART Array p420i
Dell PERC H200
Dell PERC H310
Dell PERC H710
LSI 9207-8i
LSI 9211-8i
LSI 9240-8i
LSI 9271-8i
AHCI controllers (AHCI Driver only)

Fun with VSAN Storage Profiles

When VASA and storage profiles first came out, I really thought it was not important or over rated for smaller shops. Now that VSAN has broken my lab free of the rule of “one size performance and data protection fits all” I’ve decided to get a bit creative to demonstrate what all can be done. I’ve included some sample tiers, as well as my own guidance on when to use them for staff. Notice how Gold is not the highest tier (A traditional design mistake in Clouds/Labs). The reason for this is simple. If someone asks for something, I can simply ask them “Do you want that on gold tier” and not end up giving them space reservations, Cache reservations, triple mirroring or striping after they demand gold tier for everything. This is key to reducing space wastage in environments where politics trumps resources in provisioning practices.


VSAN a Call for workloads!

After some brief fun setting up distributed switching I’m starting my first round of benchmarks. I’m going to run with Jumbo’s first then without Jumbo’s. My current testing harness is a copy of View Planner 3.0, and 3 instances if VMware IO analyzer. If anyone has any specific vSCSI traces they would like to send me I’m up for running a couple (Anyone got any crazy oracle workloads?).photo-1


I got the final parts (well enough to boot strap things) on Thursday so the building has begun.

A couple quick observations on the switch, and getting VSAN up and ready for vCenter Server.

NetGear XS712T

1. Just because you mark a port as Untagged doesn’t mean anything. To have your laptop be able to manage on a non-default VLAN you’ll need to setup a PVID (Primary VLAN ID) to the VLAN you want to use for management. Also management can only be done on a single IP/VLAN so make sure to setup a port with a PVID on that VLAN before you change it (otherwise its time for the reset switch).


2. Mac users should be advised that you can tag VLAN’s and create an unlimited number of virtual interfaces even on a thunderbolt adapter. Handy when using non-default VLANs for configuration. Click the plus sign in the bottom left corner of the network control panel to make a new interface, and then select the gear to manage it and change the VLAN.

3. It will negotiate 10Gig on a Cat5e cable (I’m going to go by Fry’s and get some better cables at some point here before benchmarking).

Its trivial to setup a single host deployment.
First create the VSAN.
esxcli vsan cluster join -u bef029d5-803a-4187-920b-88a365788b12
(Alternatively you can go generate your own unique UUID)
Next up find the NAA on a normal disk, and a SSD by running this command.
esxcli storage core device list
Next up add the disks to the VSAN.
esxcli vsan storage add -d naa.50014ee058fdb53a -s naa.50015178f3682a73
After this you’ll want to add a VMkernel for VSAN and add some hosts, but with these commands you can have a one node system up ready for vCenter Server installation in under 15 minutes.

For this lab I’ll be using the vCenter Server Appliance.

After installing the OVA you’ll want to run the setup script. You will need to first login to the command line interface. Mac users be warned, mashing the command key will send you to a different TTY.
The login is root/vmware. From the console run the network setup script. /opt/vmware/share/vami/vami_set_network
It can be run with parameters attached to more quickly setup.
/opt/vmware/share/vami/vami_set_network eth0 STATICV4
After doing this you can login in your browser using HTTPS and port 5480 and finish the setup. Example (