Not Doing DR Is Dumb

This article left me really rather annoyed. TechCrunch are significantly impressed with Squarespace and Fog Creek keeping their systems up during Sandy by carrying fuel to the generator when the fuel pump flooded. Serious effort. Yep. Impressive use of brute force labor. Yep. Really good example of a lack of planning? IMHO. Yep.

So when I first wrote this post I was pretty annoyed. And it was clear from the response that my message was totally missed, as I had obviously articulated my issues with the post and the situation poorly. That’s totally my fault and I own both the tone and the message. So I am going to rewrite it in the cold light of day, make it absolutely clear what my concerns were, and make an attempt to temper the tone. Those of you who know me well know I am forthright. I don’t mince words and, to use a common phrase, “call a spade a spade.” In this case I clearly called a spade an axe.

There are two aspects of the post and the situation that annoyed me. The first is the post itself. The post, in my view, celebrates the worst of Operations’ hero culture. Not once in the post did the journalist question the “whys” of the situation or provide any critical analysis of the “hows” of the situation. It seemed resoundingly a celebratory piece. Indeed, whilst I think what the team did is an impressive physical effort, I don’t think it was one that should be celebrated. Considered a cautionary tale? Yes. Celebrated? No. Others’ mileage may vary here, or they may perceive that in the broader situation it’s unfair to critique their response. I accept I could have done so in a calmer tone but I don’t accept that I can’t present that critique.

It was also clear to me when I saw the response to the post that I had leap-frogged some of the explanation of the logic that led me to describe what they did as stupid. I own that lack of exposition and I wanted to try to share it now to help people understand how I got there.

The reason I think this best serves as a cautionary tale stems from the situation itself. Firstly, I was struck by the plan they came up with. It felt somehow … cowboy’ish. I started to try to analyse what risk thinking went into that response and I couldn’t make it add up. This, to me, was a seriously dangerous plan. The potential risk for accident and injury in such a situation is huge. I asked myself: would I ever make the decision, in an IT availability situation, to carry diesel fuel up 17 flights of stairs in the dark? Almost certainly no. I think that’s an extremely risky approach, one that stood to put my colleagues at risk of physical harm. Indeed, when I read about what they had chosen to do it triggered a really visceral response for me.

I did not articulate any of that response in my post and that resulted in it being a somewhat loaded document that focussed on fundamentally the wrong issue: the “why they made that call” not the “how they made that call”. I leapt straight from the “my word that’s a really weird risk decision to make” to “okay they made that decision because clearly they didn’t have DR.” With that leap and without that context it was a poorly articulated response that was insufficiently nuanced.

I still believe it is dumb not to have DR. And by “have DR” I mean something. A plan even. Not a total, automated DR failover site by default[1] but something on the spectrum: cold, cold’ish, warm, hot, etc., where you’ve thought about the risk and made an appropriate investment. I don’t think, for any business which has a baseline of “we have customers who store their data with us and pay money for that service”, that making a risk decision that says “We aren’t going to do ANY DR” is rational or reasonable. So I think that every business with that baseline should have something. I also don’t think those baseline services, backups and a plan for example, are hard to implement.

So I come back to my response with hopefully some more context. We have a team of people here running a service. They chose to make a risk decision to enact a specific plan. I disagree with that risk decision and suggest it was dangerous and, entirely in my view, a poor decision. Hence I see this whole situation as a cautionary tale and not one that anyone should celebrate.

  1. Although you may, if the risk equation works for you, have a fully-fledged hot site with automated failover. That may be a perfectly reasonable investment for your business.