PSA: Developers and SQL admins do not understand storage

Thin Provisioning is one of my favorite technologies, but with all great technology comes great responsibility.

This afternoon I got a call from a customer having an issue with a SQL backup. They were preparing a major code push and were running a scripted full SQL backup to have a quick restore point if something went wrong.
I was sent the following:

10 percent processed.
20 percent processed.
30 percent processed.
40 percent processed.
50 percent processed.
60 percent processed.
70 percent processed.
80 percent processed.
90 percent processed.
Msg 64, Level 20, State 0, Line 0
A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The specified network name is no longer available.)

The server had frozen from a thin provisioning issue, but tracing through the workflow that caused this highlighted a problem common to SQL administrators everywhere. Backups were being written to the same volume/VMDK as the actual database. For every 1GB of SQL database there was another 10GB of backups, wasting expensive tier 1 storage.

The Problem:

SQL developers LOVE to make backups at the application level that they can touch/see/understand. They do not trust your magical Veeam/VDP-A. NTFS is a relatively thin-unfriendly file system (it writes to new LBAs whenever possible), so even if a database isn’t growing much, backups placed on the same volume keep touching fresh blocks, and any attempt at staying thin, even on the back-end array, is going to require extra effort to reclaim space. They also do not understand the concept of a shared failure domain, or data locality. Left to their own devices, they will put the backups on the same RAID group of expensive 15K or flash drives, and will even go so far as to put them on the same volume/VMDK if possible. Outside of the obvious problems for performance, risk, cost, and management overhead, this also means that your Changed Block Tracking and backup software are going to be backing up (or at least having to scan) all of these full backups every day.
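To make the thin provisioning point concrete, here is a toy model in Python. It is only a sketch: the class and function names are made up for illustration, the "always prefer a never-written LBA" rule is a simplification of the NTFS behavior described above, and the block counts are invented. It shows how a volume that rotates nightly backups keeps growing on the array even though the live data stays flat.

```python
class ThinVolume:
    """Toy thin-provisioned volume: the array backs a block with real
    storage on first write and keeps it until space is explicitly reclaimed."""

    def __init__(self, size_blocks):
        self.size_blocks = size_blocks
        self.allocated = set()  # blocks the array has backed with real storage
        self.in_use = set()     # blocks the file system considers live
        self.next_fresh = 0     # lowest LBA that has never been written

    def write_file(self, n_blocks):
        """Write a file, preferring never-written LBAs (assumed behavior)."""
        blocks = []
        while len(blocks) < n_blocks and self.next_fresh < self.size_blocks:
            blocks.append(self.next_fresh)
            self.next_fresh += 1
        if len(blocks) < n_blocks:
            # Fresh LBAs ran out: fall back to reusing free, already-written blocks.
            free = sorted(set(range(self.size_blocks)) - self.in_use - set(blocks))
            need = n_blocks - len(blocks)
            if len(free) < need:
                raise RuntimeError("volume full")
            blocks.extend(free[:need])
        self.in_use.update(blocks)
        self.allocated.update(blocks)
        return blocks

    def delete_file(self, blocks):
        """Deleting only updates the file system; the array still holds the blocks."""
        self.in_use.difference_update(blocks)

    def reclaim(self):
        """Stand-in for whatever reclaim process returns dead blocks to the array."""
        self.allocated &= self.in_use


def simulate(nights=5, db_blocks=10, backup_blocks=10):
    """Write a database, then one backup per night, keeping only the newest backup."""
    vol = ThinVolume(size_blocks=1000)
    vol.write_file(db_blocks)
    previous = None
    for _ in range(nights):
        current = vol.write_file(backup_blocks)
        if previous is not None:
            vol.delete_file(previous)
        previous = current
    return vol


if __name__ == "__main__":
    vol = simulate()
    print("live blocks:", len(vol.in_use), "allocated on array:", len(vol.allocated))
    vol.reclaim()
    print("allocated after reclaim:", len(vol.allocated))
```

With these made-up numbers the array ends up backing 60 blocks for 20 blocks of live data, and only drops back to 20 once something reclaims the dead space.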

The Solution:

Give up on arguing with them that your managed backups are good enough. Let them have their cake, but at least pick where the cake comes from and goes.
Create a VMDK on a separate array (in a small shop something as cheap as a SATA-backed Synology can provide a really cheap NFS/iSCSI target for this). Exclude this drive from your backups (or adjust when it runs so it doesn’t impact your backup windows).
Carefully explain to them that this new VMDK (name the volume backups) is where backups go.
Now accept that they will ignore this and keep doing what they have been doing.
In Windows, turn on file screens (part of File Server Resource Manager) on the database volumes and block the file extensions used for SQL backup files.
Next, turn on reporting alerts to email you anytime someone tries to write such a file, so you’ll be able to preemptively offer to help them set up the maintenance jobs so they will work.
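Before the file screens go in, it helps to know how many backup files are already sitting in the wrong place. The steps above can be complemented with a quick audit like the sketch below. It is not a replacement for file screens, the function name is hypothetical, and the extension list is an assumption (.bak and .trn are common conventions for SQL backups, but nothing enforces them, so adjust it to whatever your developers actually use).

```python
import os

# Assumed extensions for SQL backup files; these are conventions, not rules.
BACKUP_EXTENSIONS = {".bak", ".trn", ".dif"}


def find_misplaced_backups(root, allowed_dir=None):
    """Return a sorted list of (path, size_bytes) for backup-looking files
    under root, skipping the one directory where backups are allowed."""
    allowed = os.path.abspath(allowed_dir) if allowed_dir else None
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        if allowed is not None:
            # Prune the approved backup directory so os.walk never descends into it.
            dirnames[:] = [
                d for d in dirnames
                if os.path.abspath(os.path.join(dirpath, d)) != allowed
            ]
        for name in filenames:
            if os.path.splitext(name)[1].lower() in BACKUP_EXTENSIONS:
                path = os.path.join(dirpath, name)
                try:
                    size = os.path.getsize(path)
                except OSError:
                    continue  # file vanished or is unreadable; skip it
                hits.append((path, size))
    return sorted(hits)


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, size in find_misplaced_backups(root):
        print(f"{size / 2**30:8.2f} GiB  {path}")
```

Run it against the database volume and you get a list of offenders with sizes, which makes the conversation about where backups belong a lot shorter.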

Why VDI?

I was reading Justin Paul’s “Justifying the Cost of Virtual Desktops: Take 2” and had some thoughts on where he sees the cost model of VDI. I know Brian Madden has talked at great length about all the false cost models for VDI that exist (and I’ve seen them in the field).

1. I agree with Justin on power, with some narrow changes. Unless it’s a massive deployment, another 4 hosts in the data center aren’t going to break the bank. Unless you’re forcing people to use thin clients, you’re also not saving anything real on the client side (and certain things (Lync, MMR, etc.) require Windows Embedded clients at a minimum anyway). The only case where I’ve successfully made this argument was a call center that was 24/7 and handled disaster operations in Houston. After Ike everyone learned how hard it is to find fuel, so anything that reduces the generator and battery backup budget has real implications.

2. Justin does make good points about SA, and that keeping up with the Windows OS releases on physical machines is just as expensive as VDA. Sadly this is only true if companies are not just standardizing on Windows 7 and running it into the ground for the next 5-7 years. Hey, it worked for XP, right?

3. While I agree a ticket system helps track time spent restoring machines etc., no one makes non-billable IT resources track time with the level of detail and meta tags/search needed to make building an in-house ROI model possible. The best luck I’ve had is having people do a one-week survey broken down into 15-minute intervals, and that is as close as you’ll get in-house IT to go. It’s painful to get even that done. Unless your desktop support is outsourced (and you have access to their reports!), this is always going to be a sadly fuzzy, poorly tracked cost. I’d argue VMware Mirage (or an equally good application streaming/imaging system) can provide a lot of the opex benefits without the consolidation and other pros/cons of VDI. VDI extends beyond imaging and break/fix. It’s about mobility, security, and

4. People work from home today with VPN and Shadow IT (LogMeIn etc.). The ability to do this isn’t what you sell, it’s the execution and polish (give a salesperson a well-maintained PCoIP desktop and they will grab their iPad and never come back to the office). It’s the little things (like ThinPrint letting them print to their home printer from their PC). Ultimately it isn’t the “occasional” or snow day remote users that sell VDI. It’s the road warriors and branch offices (who are practically the same thing, given how little attention they typically get from central IT).

Why software storage is far less risky to your business

I was talking to a customer who was worried about the risks of a software-based storage system, but thinking back I keep coming up with all of the risks of buying “hardware” defined storage systems. Here are a few situations I’ve seen over the years (I’m not picking on any of these vendors here, just explaining situations with context).

1. Customer buys IBM N-Series. Customer’s FAS unit hits year 4 of operation. Customer discovers support renewal for 1 year will cost 3x the price of buying a new system. Drives have custom firmware and cannot be purchased second hand in the event the system needs emergency life support as a tier 2 system.

Solution: Customer can extend support on HP/Dell servers without ridiculous markups. StarWind, VMware VSAN, and other software solutions don’t care that you’re in “year 4”.

2. Customer has an old VNXe/VNX kit. Customer would like to use flash or scale up the device with lots and lots of drives. Sadly, the FLARE code running on it was not multi-threaded. Customer discovers that this critical feature is coming out but will require a forklift upgrade. Customer wonders why they were sold an array with multi-core processors that were bragged about when the core storage platform couldn’t actually use them. The flash storage pool is pegging a CPU core and causing issues with the database.

Solution: Software companies want everyone on the new version. Most storage software companies (VMware VSAN, StarWind, etc.) include new features in the new version. Occasionally there will be something crazy good that’s an added-cost feature, but at least you’re not looking at throwing away all the disks (and investments in controllers) you’ve made just for a single much-needed feature.

3. Customer bought an MD3000i. One year later VMware puts out a new version, and failover quits working on the MD3000i. Dell points out the device is end of support and LSI isn’t updating it. Customer gets sick of all-paths-down situations and keeps their environment on an old ESXi release, realizing that their 2 year old array is an albatross.
Discussions of a sketchy NFS front end kludge come up, but in the end the customer is stuck.

Solution: I had another customer have this happen (this was with DataCore), but this customer was running it on COTS hardware (DL180 + stacked MSAs). Customer could easily switch to a different software/storage vendor (StarWind etc.). In this case they were coming up on a refresh, so we just threw on CentOS and turned the thing into a giant Veeam target.

Software-based storage fundamentally protects you from the #1 unpredictable element in storage. The vendor….