I was recently asked to talk through how to make change in heavily corporate, ITIL and/or regulated environments. I threw together some rough thoughts that I figure are worth repeating here.
What I tell people when they are facing down the internal nay-sayers and the change resistant is: “The data and the customer trump everything.” Despite impressions, there is nothing in ITIL that says “change should be slow.” Or cumbersome. ITIL is about providing audit, insight and risk management for your IT services and operations. If you have all that and you ship 20 times a day, that isn’t actually “bad”, despite what some service and change managers might say.1
However, as I am sure you’ve discovered, there are two things you need to overcome to get there:
- Change theatre.
- The “burnt my hand on the stove” effect.
The first is pretty simple and translates into: “We’re a big company and we should be careful, planned, thoughtful and risk averse.” The result is that a change needing 30 people to approve it is considered “thoughtful”.2 It is also about people covering their arses: “If I get everyone to approve, then I am CYA’ed if it all goes wrong, because Bill, Jane and Claire all said it was good to go.”
The second is closely linked to the first but is more subtle and deeply cultural. Almost every big company I’ve worked in has a strong blame culture. When things go wrong, “management” always wants someone to blame: a SysAdmin, a Dev, the vendor, an outsourced partner. So any time something goes wrong, everyone does the “It’s not my fault” dance. This means everyone is averse to making change, because they are terrified of getting burnt (or fired) when or if something goes wrong.
So how do you overcome it? Well, it’s not easy. So much of this thinking is deeply ingrained in the participants’ cultural experience. Much of it is myth.
Firstly, I recommend a show-by-doing approach. Make a deal with your change management folks: if you can prove a change can be made safely and repeatedly, you should be able to reduce the change paperwork required for that change. If you use a tool like Puppet, then demonstrate the before state, the proposed change and the resulting state via the Puppet code, via reports from `--noop` or simulation runs, and via the results of the change on testing/staging systems. Produce a graph showing the success rate: “We’ve made this change 400 times and it’s only failed once, because we’re automated, monitored and can remediate errors fast with minimum impact to the service. And here’s the data to prove it.”
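As a rough illustration of the evidence described above, here is a minimal Python sketch. It computes a success rate from a history of change runs and checks that rate against an agreed threshold. Several pieces are hypothetical: the `ChangeRun` record, the `qualifies_as_standard` policy and its 50-run / 99% thresholds. In practice you would negotiate the criteria with your change management folks and pull the history from your own reporting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRun:
    # One recorded execution of a repeatable change (hypothetical record format).
    change_id: str
    succeeded: bool


def success_rate(runs: list) -> float:
    """Fraction of recorded runs that succeeded; 0.0 if there is no history yet."""
    if not runs:
        return 0.0
    return sum(run.succeeded for run in runs) / len(runs)


def qualifies_as_standard(runs: list, min_runs: int = 50, min_rate: float = 0.99) -> bool:
    """Hypothetical policy: enough history AND a high enough success rate
    to argue for lighter change paperwork."""
    return len(runs) >= min_runs and success_rate(runs) >= min_rate


# Illustrative history matching the quote above: 400 runs, one failure.
history = [ChangeRun("web-config", True)] * 399 + [ChangeRun("web-config", False)]

print(f"{len(history)} runs, success rate {success_rate(history):.2%}")
print("qualifies for reduced paperwork:", qualifies_as_standard(history))
```

Requiring both a minimum number of runs and a minimum rate keeps a change with only a handful of lucky successes from qualifying.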
Secondly, declare your shop a blame-free zone. Post-mortem everything and start every session with: “This is not about blame. I don’t want to hear, know or care about whose fault it is. All I want to hear is: what went wrong, how did we fix it, and how can we stop it happening again in the future?” Everything should be totally emotionally neutral. Even human mistakes. See this for a much longer treatment of the topic.
Thirdly, hammer home to your change management folks what “risk” actually means. This recent post from Jez Humble is also a good reference. Not every one of your services is priority one, and even the priority-one services have varying levels of care factor about them. If change management is putting excessive conditions on applications, services, etc. that are low risk (especially if this has been done knee-jerk in response to a failure), then demonstrate the actual quantifiable impact and reveal just what a waste of resources it is to treat these low-risk applications as if they are critical. Additionally, explain the cost. If it takes six people five days to deploy a change, then this is a pretty expensive exercise. If the potential loss from a failed change is not that big, then this is overkill.
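The cost argument is simple arithmetic. The sketch below compares the labour cost of a heavyweight approval process with the expected loss that process could prevent. It generously assumes the process catches every failure. All the figures are hypothetical placeholders, not data from a real case: the daily rate, the failure probability and the failure impact. Only the "six people, five days" shape comes from the text.

```python
def process_cost(people: int, days: int, daily_rate: float) -> float:
    """Labour cost of pushing one change through the approval process."""
    return people * days * daily_rate


def expected_loss(failure_probability: float, failure_impact: float) -> float:
    """Expected cost of a failed change: probability of failure times its impact."""
    return failure_probability * failure_impact


def is_overkill(cost: float, loss_avoided: float) -> bool:
    """True when the process costs more than the loss it could possibly avoid
    (generously assuming the process prevents every failure)."""
    return cost > loss_avoided


# Hypothetical figures: six people for five days at 800 per person-day,
# a 1-in-400 failure rate, and 50,000 of impact per failure.
cost = process_cost(people=6, days=5, daily_rate=800)
loss = expected_loss(failure_probability=1 / 400, failure_impact=50_000)

print(f"process cost {cost:,.0f} vs expected loss {loss:,.0f}")
print("overkill" if is_overkill(cost, loss) else "justified")
```

Plug in your own numbers. If the expected loss is tiny next to the process cost, that is the quantifiable case for treating the service as low risk.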
Lastly, and I think most importantly, identify the customer and their needs. Find out who the customer is, and understand their actual service requirements and their required speed to market. Find out if they are unhappy and, if so, what about: “It takes so long to deploy a marketing campaign that we lose half our customers/momentum/opportunity to respond to a competitor.” Document this, and then ask them what they consider important and what they would sacrifice to achieve better results in certain domains. The CAP theorem works quite well as an analogy here. Take this data back to the change management people and use it to calibrate what your change management and operations actually look like: “Look, they don’t actually care about this site, but this site over here they change daily. Can we focus on that one rather than this one?” No one wants the business to yell at them or be unhappy with them about results. What we want is for the business to say we’re awesome and consider us worthy of massive bonuses. Explain to your colleagues that with the right protocols, process and automation, and without this massive overkill, you can make the business happy, make your lives easier and focus on the interesting problems.
This is not going to be easy, especially in regulated environments. As an aside, if they fall back on regulations, then go and read those yourself. Odds are they haven’t. Odds are, too, that they are relying on a rote response, some odd interpretation or historical myth: “it’s always been like this because of PCI DSS”. But if you persevere, having good results and the data to back up your claims will get change done.
That being said, sometimes an organization is too far gone, too toxic or too bogged down to ever change. If that’s the case, vote with your feet and walk. People with good DevOps skills are in killer demand.