Dev-Oops: Why DevOps efforts fail
The main goal of DevOps is quite simple: ship software updates frequently, reliably, and with better quality. This goal is somewhat “motherhood and apple pie,” since almost every organization will agree that they want to get there. Many will tell you they’ve already embarked on the DevOps journey by adopting a popular framework such as “CALMS.”
However, very few will express complete satisfaction with the results. After speaking to 200+ DevOps professionals at various stages of the adoption lifecycle, we found that organizations generally fall in one of three categories:
We were most interested in groups two and three since they were actually in the middle of their DevOps journey. When we asked them to explain their challenges and roadblocks in more detail, here is what we found:
- 68% said that the lack of connectivity between the various DevOps tools in their toolchain was the most frustrating aspect
- 52% said that a large portion of their testing was still manual, slowing them down
- 38% pointed out that they had a mix of legacy and modern applications, i.e. a brownfield environment. This created complexity in terms of deployment strategies and endpoints, toolchain, etc.
- 27% were still struggling with siloed teams that could not collaborate as expected
- 23% had limited access to self-service infrastructure
- Other notable pain points included finding the right DevOps skill set, difficulty managing the complexity of multiple services and environments, lack of budget and urgency, and limited support from executive leadership
Let us look at each of these challenges in greater detail.
Challenge #1: Lack of connectivity in the DevOps toolchain
There are many DevOps tools available that help automate different tasks like CI, infrastructure provisioning, testing, deployments, config management, release management, etc. While these have helped tremendously as organizations start adopting DevOps processes, they often do not work well together.
As a classic example, a Principal DevOps engineer whose team uses Capistrano for deployments told us that he still communicates with Test and Ops teams via JIRA tickets whenever a new version of the application has to be deployed, or whenever a config change has to be applied across their infrastructure.
All the information required to run the Capistrano scripts is available in the JIRA ticket, which he manually copies over to his scripts before running them. This process usually takes several hours and needs to be carefully managed since the required config is manually transferred twice: once when it is entered into JIRA, and again when he copies it to Capistrano.
This is one simple example, but this problem exists across the entire toolchain.
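To make the second manual transfer in the Capistrano anecdote concrete, here is a minimal sketch of how it could be automated. It assumes, purely for illustration, that the ticket description carries the settings as "key: value" lines. The field names (app, stage, version) and the environment variable names are hypothetical and do not come from the engineer's actual setup.

```python
import re

# Hypothetical convention: the ticket description lists deploy settings
# as "key: value" lines. These field names are assumptions.
REQUIRED_FIELDS = ("app", "stage", "version")


def parse_deploy_fields(ticket_text):
    """Collect 'key: value' lines from a ticket body into a dict."""
    fields = {}
    for line in ticket_text.splitlines():
        match = re.match(r"^\s*([A-Za-z_]+)\s*:\s*(\S.*?)\s*$", line)
        if match:
            fields[match.group(1).lower()] = match.group(2)
    # Fail loudly instead of deploying with half the information.
    missing = [name for name in REQUIRED_FIELDS if name not in fields]
    if missing:
        raise ValueError("ticket is missing fields: " + ", ".join(missing))
    return fields


def deploy_invocation(fields):
    """Return (command, extra_env) for a `cap <stage> deploy` run.

    DEPLOY_APP and DEPLOY_VERSION are hypothetical variable names; the
    Capistrano configuration would have to be written to read them.
    """
    command = ["cap", fields["stage"], "deploy"]
    extra_env = {
        "DEPLOY_APP": fields["app"],
        "DEPLOY_VERSION": fields["version"],
    }
    return command, extra_env
```

The returned command and variables could then be handed to something like subprocess.run(command, env={**os.environ, **extra_env}, check=True), so the values are typed once in the ticket and never copied by hand.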
Smaller organizations get around this problem by writing custom scripts that glue their toolchain together. This works fine for a couple of applications, but quickly escalates to spaghetti hell since these scripts aren’t usually written in a standard fashion. They are also difficult to maintain and often contain tokens, keys and other sensitive information. Worse still, these scripts are highly customized for each application and cannot be reused to easily scale automation workflows.
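As a rough illustration of the two problems named above (embedded secrets and per-application customization), the sketch below keeps everything application-specific in a small config object and reads the token from the environment at run time. The class, function, and variable names are all hypothetical, and this is one possible shape rather than a recommended standard.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AppConfig:
    """Everything that differs between applications lives here."""
    name: str
    stage: str
    token_env_var: str  # the NAME of the variable, never the secret itself


def load_token(config, environ=os.environ):
    """Read the deploy token from the environment instead of the script."""
    token = environ.get(config.token_env_var)
    if not token:
        raise RuntimeError(
            f"set {config.token_env_var} before deploying {config.name}"
        )
    return token


def deploy_plan(config, environ=os.environ):
    """Describe a deploy without ever printing or storing the secret."""
    load_token(config, environ)  # raises if the token is absent
    return {"app": config.name, "stage": config.stage, "token_present": True}
```

The same functions serve every application; onboarding a new one means adding an AppConfig entry, not copying and editing a script.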
For most serious organizations, it is an expensive and complex effort to build this homegrown “DevOps glue,” and unless they have the discipline and resources of the Facebooks and Amazons of the world, it ultimately becomes a roadblock for DevOps progress.
Continuous Delivery is very difficult to achieve when the tools in your DevOps toolchain cannot collaborate and you manage dependencies manually or through custom scripts.
Challenge #2: Lack of test automation
Despite all the focus on TDD, most organizations still struggle with automating their tests. If testing is manual, it is almost impossible to execute the entire test suite for every commit, which becomes a barrier to Continuous Delivery. Teams try to optimize this by running a core set of tests for every commit and running the complete test suite only periodically. This means that most bugs are found later in the software delivery workflow, where they are much more expensive to find and fix.
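The "core tests per commit, full suite periodically" compromise can be sketched as a simple selection rule. The test names, tags, and trigger names below are invented for illustration; real test runners offer their own tagging mechanisms.

```python
# Hypothetical test catalogue: test name -> set of tags.
TESTS = {
    "test_login": {"core"},
    "test_checkout": {"core", "payments"},
    "test_report_export": {"slow"},
    "test_legacy_import": {"slow", "legacy"},
}


def select_tests(trigger, tests=TESTS):
    """Return the sorted test names to run for a given trigger."""
    if trigger == "commit":
        # Only the core subset runs on every commit.
        return sorted(name for name, tags in tests.items() if "core" in tags)
    if trigger == "nightly":
        # The periodic run executes everything.
        return sorted(tests)
    raise ValueError(f"unknown trigger: {trigger}")


def deferred_tests(tests=TESTS):
    """Tests that a commit run skips, so their failures surface later."""
    return sorted(set(select_tests("nightly", tests)) - set(select_tests("commit", tests)))
```

Whatever deferred_tests returns is the set of checks whose failures will only show up at the next periodic run, which is where the late, expensive bugs come from.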
Test automation is an important part of the DevOps adoption process and hence needs to be a top priority.
Challenge #3: Brownfield environments
Typical IT portfolios are heterogeneous in nature, spanning multiple decades of technology, multiple cloud platform vendors, and private and public clouds as well as their own labs and data centers, all at the same time. It is very challenging to create a workflow that spans all of these since most tools work with specific architectures and technologies. This leads to toolchain sprawl as each team uses the toolchain that best serves its needs.
The rise of Docker has also encouraged many organizations to develop microservices-based applications. This has increased the complexity of DevOps automation, since a single application can now need hundreds of deployment pipelines for heterogeneous microservices.
Challenge #4: Cultural problems
Applications evolve across functional silos. Developers craft software, which is stabilized by QA, and then deployed and operated by IT Operations. Even though all these teams are expected to work together and collaborate, they often have conflicting interests.
Developers are driven to move as fast as they can and build new stuff. QA and Release management teams are driven to be as thorough as possible, making sure no software errors can escape past their watchful eyes. Both teams are often gated by SecOps and Infrastructure Ops, who are incentivized to ensure production doesn’t break.
Governance and compliance also play a role in slowing things down. Cost centers are under pressure to do more with less, which leads to a culture that opposes change, since change introduces risk and destabilizes things, which means more money and resources are required to manage the impact.
This breakdown across functional silos leads to collaboration and coordination issues, slowing down the flow of application changes.
Some organizations try to address this by making developers build, test and operate software. Though this might work in theory, in practice developers get bogged down by production issues, and a majority of their time is spent on operating what they built last month as opposed to innovating on new things. Most organizations try to get all teams involved across all phases of the SDLC, but this approach still relies on manual collaboration.
Automation is the best way to broker peace and help Dev and Ops collaborate. But as we see in other challenges, ad-hoc automation itself can slow you down and introduce risk and errors.
Challenge #5: Limited access to self-service infrastructure and environments
For many organizations, virtual machines and cloud computing transformed the process of obtaining the right infrastructure on demand. What previously took months could now be achieved in a few minutes. IaaS providers like AWS offer hundreds of machine types with flexible configurations and many options for pre-installed operating systems and other tools. Tools like Ansible, Chef, and Puppet help represent infrastructure as code, which further speeds up provisioning and re-provisioning of machines.
However, this is still a problem in many organizations, especially those running their own data centers or those that haven’t embraced the cloud yet.
We need more from DevOps
A popular DevOps framework describes a CALMS approach, consisting of Culture, Automation, Lean, Measurement and Sharing. The DevOps movement started as a cultural movement, and even today, most implementations focus heavily on culture.
While culture is an important part of any DevOps story, changing organizational culture is the hardest thing of all. Culture forms over a period of time due to ground realities. Ops teams don’t hate change because they are irrational or want to be blockers. Over the years, they’ve taken the heat every time an over-enthusiastic Dev team tried to fast-track changes to production without following every step along the way.
Seating them with the developers might make the work environment a little friendlier, but it doesn’t address the root cause, no matter how many beers they have together.