The Evolution of DevOps to DevTools
‘DevOps’: such a simple word to convey so much meaning. What is DevOps? Ask 5 people and you’ll probably get 5 different answers. It is a mysterious and elusive ideology - many people will tell you it’s a great idea, but relatively few really understand it. Like so many powerful ideas, at its heart DevOps is a very simple concept; but there is enormous complexity in its application, because it must be applied by organizations, not by individuals, or even just by teams. Its application is often hampered by resistance to change, for many reasons - rational, emotional and inertial.
If this sounds familiar, that may be because another, similar methodology has revolutionized the software industry over the last decade or two: Agile Development. If ‘Agile’ used the same naming convention as ‘DevOps’, we might call it ‘BizDev’; it’s focused on the interaction between the business part of the organization and the development part. Agile and DevOps are reflections of an even more fundamental concept: that entrenched silos are harmful to an organization, and that success is best achieved through collaboration, with everyone working towards the goal of the organization rather than just the goal of some local subset of it. This core emphasis on organizational success over personal or team success is not new - it is evident in the Toyota Production System, the Theory of Constraints and other Lean Manufacturing methodologies that were established decades ago.
So if DevOps is so much like Agile, why is it so misunderstood? Partly because it is much younger: Agile is relatively mature now, since the Agile Manifesto was signed in 2001, whereas the term ‘DevOps’ itself was coined in late 2009. That makes Medidata a very early adopter - I believe our DevOps team was originally formed around the same time (end of 2009) to help ship our early Ruby on Rails apps using Amazon’s cloud, although I’m not sure when the ‘DevOps’ label was acquired. So did we create a team to ‘do DevOps’, or did we see a ‘DevOps team’ and label it as such? I think the answer is ‘both’: the dev and ops specialists involved in those apps’ early lives were collaborating very closely and implementing a DevOps methodology before the term existed, and once we became aware of the term, it was applied to the ops team.
Let’s take a step back for a moment, and consider what it means for an organization to have ‘a DevOps team’, and specifically what it has meant for Medidata.
To start with, a software company’s team structure looks roughly like this:
This is representative of just a few people trying to get started. Everybody is responsible for everything - everyone is on call for production issues, answering the phones, selling to customers and watching the bottom line. Is this DevOps? Absolutely, and more than that. Everyone is focused on the same goal.
As the organization grows, we get more specialization:
Here, we have distinct Dev and Ops teams, but there’s a substantial crossover of skills. Devs have some visibility of and participation in Ops, and vice versa. Is this DevOps? Of course; this is the ‘classic’ example of DevOps. When you read or hear about ‘lean startups’ and promising small businesses that are demonstrating the value of DevOps, this is probably the kind of team structure they’re dealing with. Reports of DevOps in practice (blogs, tweets, etc.) come disproportionately from this category, where applying DevOps is relatively straightforward.
As the organization grows further, the above state often progresses to something more like this:
Here we can see Dev and Ops are becoming silos of knowledge, with no major cross-functional efforts. Dev creates and updates applications, and Ops deploys them and keeps them running. It’s a very obvious next step from the above - as the organization scales up, we split the responsibilities into teams. Dev gets more capacity to add features, and Ops gets better at handling the increased load. What could possibly go wrong?
The typical next stage is this:
Dev keeps on churning out those changes, but no longer has a good idea of how the apps are running, and frankly has few reasons to care - they’re incentivized to keep adding new features to the product. This means the apps don’t actually operate that well, with lots of unseen or ignored errors in the logs, declining performance and a frequent need for interventions like app restarts. And since Dev doesn’t really understand the production environment or load that well, bugs start to be more common in production, and Ops is left to pick up the pieces.
Ops is really only incentivized to keep things running predictably, so to restore some stability, Ops throws up a wall of process. Now Dev can’t ship new code without providing large quantities of information to help Ops deal with the operational problems. But Ops doesn’t really know what they don’t know, so the information they ask for isn’t actually that useful. To Dev, the whole process is clearly a waste of time, so they just fill in the forms with whatever information will get Ops to deploy without making life any more difficult. Dev and Ops become adversaries. It’s all quite tragic, but as long as the organization’s competition is doing the same thing, success can still be achieved this way.
For many large organizations, this is their steady state. This is ‘just how things are done’ when you’re ‘an enterprise company’. In heavily regulated industries like banking, or indeed pharmaceuticals, the regulations have actually been written with the implicit assumption that this is how organizations are structured, making it even harder to break out of this mold.
The above is a slight caricature of reality, but Medidata has looked a lot like this in the past; I expect some would say it still does, but I’m pretty confident things are improving. So to try to avoid these problems with our first Ruby on Rails apps, a new ‘DevOps team’ emerged, mostly from within engineering, with a much more collaborative approach. It looked a lot like this:
This should look familiar - it’s structurally identical to the second diagram above describing small businesses, with Dev and Ops collaborating. The only difference is that Ops has been renamed DevOps. It worked tremendously well at Medidata: those product teams performed superbly, combining cutting-edge technologies for rapidly developing high-quality web applications (Ruby on Rails and Amazon Web Services) with equally cutting-edge deployment and monitoring tools (Capistrano, Chef, New Relic), and collaborating very closely with a DevOps methodology to achieve commercial success.
So DevOps worked well here. But these teams were essentially operating as a small business within Medidata, and DevOps at that small scale is almost a default state now. One understandable response to this success was a desire to extend DevOps practices to much more of the organization. Since we now had a team named ‘DevOps’, that team took on more responsibilities, handling what was essentially a more integrated and more automated kind of operations using the latest tools.
Meanwhile, there was still an IT Operations group handling Ops for the other apps that DevOps wasn’t covering. At some point, the decision was made to merge these teams, creating a much larger ‘DevOps’ team that looked something like this:
This gave us a sort of DevOps wrapper around the Ops group. DevOps and Ops were both part of the hosting organization, and DevOps started to be more incentivized towards traditional Ops goals: maintaining uptime and lowering costs. The relationship with Dev became a little less collaborative and more service-based. Now the Ops team is part of the picture, but in this situation there is still no overlap between Dev and Ops - DevOps is still the interface between Dev and Ops. It’s also worth noting that as far as Dev is concerned, it looks like nothing has changed - they’re just talking to DevOps like they always did. All that’s changed is the implementation of the ‘DevOps service’, while the interface remains the same.
This situation did not last very long, and the next transition was to move most of the original DevOps team back to the Engineering organization, leaving a state like this:
This worked better than the previous state, as the goals of Dev and DevOps were more closely aligned, but really very little else had changed - DevOps was still the medium through which collaboration between Dev and Ops occurred. Dev and Ops still had their own silos of knowledge. So in order to effectively share operational information with Dev, DevOps needed to have deep knowledge of Ops; but in order to effectively collaborate with and inform Ops about the applications that needed to be run, DevOps needed to have a deep knowledge of Dev. It is very difficult to scale a team like this up to act as an effective conduit for collaboration around all Dev and all Ops activities in a large organization.
So, what could we do about this? The basic problem was that DevOps is not an activity that can be siloed any more than Agile can be: collaboration is not something you can outsource. Dev and Ops needed to come back together and share their expertise, working towards the common goal of the organization as a whole. The incentives should be set to encourage this collaboration - most obviously, Dev should be incentivized to reduce the operational costs of their changes. But if it were really that straightforward, we wouldn’t have so many organizations falling into these unproductive patterns. There are a few organizations that really make DevOps methodologies work well for them, and their structure is much more like this:
This is the current state here at Medidata. Here we have a decent crossover between Dev and Ops, with the addition of a tools team to help them collaborate. In successful, DevOps-embracing companies, the team responsible for these tools is often named ‘DevTools’. DevTools acts as a catalyst for the collaboration between Dev and Ops, fostering it without needing to participate directly at every point. The tools can give Dev much better visibility of, and authority over, their operational environment, which in turn lets Dev take on much more responsibility for it. Ops can happily hand over responsibility for managing the details of apps in which they are not experts, and can instead concentrate on ensuring the availability of the infrastructure upon which Dev builds those apps. In fact, in this case the ‘wall of process’ is still there, but it is automated and is much less of an impediment to progress.
In reality, the DevOps team had been somewhat split internally between Dev and Ops for some time - even before the merge with IT Operations. DevOps itself had grown and made the transition from ‘tiny startup’ to ‘small business’ in its internal structure, with closely collaborating Dev and Ops subteams; the Dev subteam worked mostly on the tools and acted as a proxy for the wider Dev team, providing engineering expertise when operations needed it.
As the DevOps team grew and changed, there was a growing misunderstanding of DevOps’ role in the organization. Part of the reason is that the team was called ‘DevOps’, which conveys an unhealthy message. The DevOps team cannot be solely responsible for the implementation of DevOps practices at Medidata, any more than a hypothetical Agile team could be solely responsible for effective collaboration between Biz and Dev. DevOps, like Agile, is part of everybody’s work; both terms encapsulate an ongoing process of collaboration between functional teams. So if you’re part of Dev, developing your code and then handing it off to the ‘DevOps team’ to take care of everything needed to deploy, run, monitor and maintain your app in production does not mean you are ‘doing DevOps’, and it’s important that this is clear.
In the community of DevOps enthusiasts and advocates outside Medidata, having a ‘DevOps team’ staffed by ‘DevOps engineers’ is now widely recognized as an antipattern that works against successful DevOps adoption. This means that if we want to hire people from that community, advertising for a ‘DevOps engineer’ for the ‘DevOps team’ is likely to deter precisely the people we most want to attract.
So, the DevOps team is now deprecated! Devs, please get to know your new friends in Ops, and DevTools will do their best to make everybody’s lives easier and more productive.