July 9, 2021
A Guide to Measuring DevOps Success & Proving ROI
As legendary management teacher and thinker Peter Drucker once said, “You can’t manage what you can’t measure.”
Later, Drucker would adjust the wording to be more proactive: “If you can’t measure it, you can’t improve it.”
Either way, variations of the famous quote have been co-opted by marketers and data lovers of all kinds to underscore the importance of tracking metrics.
Still, that piece of advice resonates especially hard with anyone who has ever implemented a DevOps strategy within their organization. The process is all-consuming in the early stages and requires everyone from the C-suite to the intern to be fully invested in getting it right.
It’s a lot of effort all around, which means measuring DevOps success is critical for understanding its real-world impacts (e.g., productivity, revenue, and customer loyalty). It also keeps your team invested in the long-term effort by pointing toward specific accomplishments.
Unfortunately, it seems that few people can agree on not only how to measure DevOps success, but also whether you can even prove DevOps ROI.
Some take a page from Stephen Covey’s 7 Habits of Highly Effective People, embracing the oft-cited advice to “begin with the end in mind,” a reminder to stay focused on the big picture.
In this article, we’ll touch on why defining and measuring DevOps success metrics is possible, though not without its fair share of challenges.
Measuring DevOps Success & Proving ROI Starts with Defining Objectives & Outcomes
We measure to determine our progress toward specific goals and objectives.
The goals tend to be somewhat subjective and focused on achieving quality improvements. The objectives are, just as they say, much more objective and empirical.
Perhaps the biggest barrier to understanding how to measure DevOps success is that the primary goal of the initiative is continuous improvement. This admittedly sounds a lot like setting infinity as a target sales quota.
One of the more common DevOps definitions offers an explanation that may also contribute to the confusion of measuring success:
“DevOps is the delivery of application changes at the speed of business. By replacing the infrequent updating of monolithic blocks of code with very frequently updated microservices, DevOps enables dramatic acceleration of improvement.”
Given that the “speed of business” isn’t an official way to measure speed, how fast are we talking?
According to the 2018 State of DevOps Report, “…elite performers are optimizing lead times, reporting that the time from committing code to having that code successfully deployed in production is less than one hour, whereas low performers required lead times between one month and six months,” estimating that, “the elite group has 2,555 times faster change lead times than low performers.” A 2015 ZDNet article made a similar point in its headline: “How Amazon handles a new software deployment every second.”
Change at the speed of business.
In other words, while some people may have trouble working without a finish line on the horizon, continuous improvement success means never stagnating or backsliding — every day should be better than the last.
Superior Business Outcomes
Superior Business Outcomes are the ultimate end game for every process, including DevOps.
Microsoft CEO Satya Nadella’s now-famous quote, “Every company is a software company,” is a great way to describe the direct relationship between a company’s ability to quickly improve software and its increased profits.
Some DevOps goals will relate to the outcomes achieved through the software continuously improved by DevOps processes.
The components of continuous improvement include velocity, quality, performance, and outcomes. Other goals may seek to resolve known problems within the organization. Whatever goals and objectives a given organization identifies, it is critical to connect them with the value they bring to the business on the whole.
Why Should You Be Measuring DevOps?
Following Simon Sinek’s advice to “Start with Why,” we ask: why would you want to measure DevOps?
The observer effect, often conflated with the Heisenberg Uncertainty Principle, teaches us that the mere act of observing something can change the thing being observed.
More than anything else, measuring DevOps success is about tracking the progress that comes from this collaborative effort.
Implementing DevOps processes requires significant time, attention, and money. It’s also a huge ask for team members who may not feel as invested in taking on this significant commitment.
Anyone who makes an investment does so in order to receive an identifiable return on investment (ROI).
This is the most tangible, significant reason for establishing and executing a discrete set of metrics to determine the return achieved from the DevOps investments.
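To make the idea of a return concrete, here is a minimal sketch of the standard ROI calculation (gain minus cost, divided by cost). The function name and the dollar figures are hypothetical and only illustrate the arithmetic; how you put a dollar value on DevOps gains is up to your organization.

```python
def roi_percent(total_gain: float, total_cost: float) -> float:
    """Return ROI as a percentage: (gain - cost) / cost * 100."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_gain - total_cost) / total_cost * 100


# Hypothetical figures, for illustration only.
devops_cost = 200_000.0  # e.g., tooling, training, and staff time
devops_gain = 260_000.0  # e.g., hours saved and downtime avoided, in dollars

print(f"ROI: {roi_percent(devops_gain, devops_cost):.1f}%")
```

A positive result means the initiative returned more than it cost over the period you measured; a negative one means it has not paid for itself yet.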
Arguably, one of the best reasons to measure DevOps ROI comes from DORA (DevOps Research and Assessment):
“Traditionally, IT has been viewed as a cost center and, as such, was expected to justify its costs and return on investment (ROI) up front. However, IT done right is a value driver and innovation engine. Companies that fail to leverage the transformative, value-generating power of IT risk being disrupted by those who do.”
DORA also recommends that companies define what a positive DevOps ROI looks like for them, based on a simple three-level IT performance scale:
- High IT Performers. These organizations stand to gain the biggest returns by offering superior software delivery. This includes productivity gains, happier employees, and more repeat customers.
- Medium IT Performers. Those in this mid-point will reap the most benefits by eliminating technical debt and prioritizing speed and value.
- Low IT Performers. This group should focus on opportunities for improvement by addressing low-hanging fruit and setting measurable goals.
High and Low don’t necessarily point toward technical skills or the strength of the DevOps strategy. Instead, the scale gives non-technical stakeholders a way to connect DevOps to something tangible (“what does my software need to do?”), making it easier to justify the initiative.
From the C-suite perspective, it’s a simple framework for quickly sizing up your DevOps readiness and from there, defining a set of DevOps success metrics that represent improvement.
You Should Be Measuring DevOps by These Elements
The DevOps philosophy contains three core components: people, process, and technology. These components work together to speed up the entire product lifecycle from ideation to deployment, testing, and starting the loop all over again.
Here’s a look at how to measure DevOps success within these three areas:
People-related DevOps metrics tend to focus on productivity and work quality. You might track things like response times, failure rates, and the time it takes to complete a task.
Keep in mind that the aim isn’t replacing people with automation. It’s to set up the processes that help humans work smarter, freeing up more time for meaningful work.
Perhaps unsurprisingly, your “people metrics” are likely to be the least consistent and often the most difficult to obtain. So you might want to work on these first to set a benchmark and track progress through all these changes.
Process metrics track the process of feedback collection, implementation, and deployment of upgrades. These include quality and performance improvements over the previous iteration, which are critical DevOps success metrics — but they’re also highly subjective.
Customer feedback surveys can help you tie numeric values to quality, performance, and the overall experience with the product.
However, the downside of this is that customers interpret scoring criteria in their own way. For example, two people might have very different ideas of what qualifies as a three or a five on a five-star rating system.
Development-to-Deployment time is more objective and measurable, and proves most useful when combined with velocity, relevance, effectiveness, efficiency, and smoothness of flow.
Still, it’s worth noting that combining multiple sources of feedback — both objective and subjective — is the key to getting the holistic view that leads to improvement.
Technology metrics measure hardware, software, and service functions.
System uptime is critical. Software failure rate connects directly to development and deployment metrics. It’s pointless to be moving fast when the failure rate is too high.
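As a rough illustration, the two technology metrics mentioned above could be computed along these lines. This is a sketch, assuming you already collect downtime minutes and deployment counts; the function names and sample numbers are hypothetical.

```python
def uptime_percent(period_minutes: float, downtime_minutes: float) -> float:
    """Share of the period during which the system was available."""
    if period_minutes <= 0:
        raise ValueError("period_minutes must be positive")
    return (period_minutes - downtime_minutes) / period_minutes * 100


def failure_rate_percent(failed: int, total: int) -> float:
    """Share of deployments that failed; 0.0 when nothing was deployed."""
    if total == 0:
        return 0.0
    return failed / total * 100


# Hypothetical 30-day month: 43,200 minutes with 90 minutes of downtime.
print(f"Uptime: {uptime_percent(43_200, 90):.2f}%")
# Hypothetical month: 4 failed deployments out of 50.
print(f"Failure rate: {failure_rate_percent(4, 50):.1f}%")
```

Reading the two numbers side by side is the point: a rising deployment count only counts as progress if the failure rate stays flat or falls.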
DevOps Velocity Metrics
After establishing a solid “people-process-technology” foundation, you’ll want to look at velocity and performance quality.
Increasing competitive pressure drives an ever-increasing need to achieve continuous rapid improvement.
Stackify offers up a comprehensive list of DevOps speed metrics that measure the achievement of your high-speed iterations.
Here’s a quick rundown of the metrics they’ve included in the post.
- Deployment frequency. Tracking the number of deployments performed over a certain period is an objective, easy-to-track metric that can help you determine if you’re on the right path. Here, your goal is always to deliver smaller improvements more often.
- Change volume. With each iteration, you’ll also want to look at the volume of changes made in response to user feedback.
- Deployment time. Reducing the amount of time devoted to each stage in the DevOps lifecycle contributes to increases in overall speed. Clocking the actual time it takes operations to deploy new improvements will help you understand operations’ performance and may play a role in your decision to apply automation to certain tasks.
- Lead time. Expanding beyond deployment time, lead time measures the elapsed time from receipt of a new request to availability in production.
- Incoming customer support requests. Trouble tickets are the most available metrics of software bugs and other deficiencies that cause rework and user disruption. This is a key element of quality.
- Automated test pass rate. Another contributor to the speed of the DevOps process is the incorporation of automation into the software testing stage. Automated tools test new software faster than humans. However, speed means nothing if your tools fail to deliver the right results.
- Defect escape rate. Code defects happen to the best of us. Frustrating as they are, the occasional defect comes with the territory. Abnormally high defect rates could be the first sign of trouble from one of your people, but ultimately, we’re talking more about early detection.
Adding quality assurance (QA) testing to the process creates a barrier between the production team and the customer, a safeguard that keeps mistakes from going public.
Additionally, comparing defects caught in testing to those found in production is a useful way to assess the efficiency of both the development process and your testing infrastructure.
- Availability. Anyone in IT is vividly aware of the importance of “five nines” availability, which is the ability to keep the system available for users 99.999% of the time. When users complain of “constant downtime,” having this ratio calculated helps to resolve their concerns.
- Service level agreements (SLA). Commitment is a key element to obtaining the confidence of the user community. Establishing a firm Service Level Agreement and regular reporting on fulfillment can achieve this, as long as you are continually meeting or exceeding your agreements.
- Failed deployment count. Leveraging the resilience of microservices in containers means that any given defect will likely not bring down the entire system. Full system unavailability disrupts user workflow and directly impacts your availability, which is the key component of most SLAs.
- Error rates. Tracking how often errors occur is far more valuable than simply identifying occasional ones. Errors are going to occur for a wide variety of reasons, but a pattern of errors occurring with regularity is a clear indicator of a deeper problem.
- Application usage and traffic. These metrics allow you to track user engagement with application features. Increased engagement after an update may indicate users are pleased with the updates, while neglected features suggest a usability problem. Additionally, if your traffic reports show no activity, then you have a problem. For example, a lack of traffic may indicate that there is a faulty microservice causing the anomaly.
- Application performance. Many elements can deteriorate application performance. The causative factor may come from the code, the storage, compilers, the database itself, protocol errors, the service bus, or many other elements. Effective application performance monitoring is a requirement in all environments.
- Mean time to detection (MTTD). The DevOps “need for speed” extends beyond development and deployment to include the detection of anomalies. The faster you detect them, the faster you can resolve them.
- Mean time to recovery (MTTR). The other end of the error handling sequence: once you’ve detected and identified an anomaly, this measures the time it takes to actually resolve it and return the application to full availability.
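Several of the metrics in this list reduce to simple arithmetic once you have timestamps and counts. The sketch below shows one way to compute a few of them with Python’s standard library. The record layouts, function names, and sample data are all hypothetical; in practice the inputs would come from your own ticketing, CI/CD, and monitoring tools.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical records: (request received, available in production).
deployments = [
    (datetime(2021, 7, 1, 9, 0), datetime(2021, 7, 1, 11, 0)),
    (datetime(2021, 7, 2, 14, 0), datetime(2021, 7, 2, 15, 0)),
    (datetime(2021, 7, 6, 10, 0), datetime(2021, 7, 6, 13, 0)),
]

# Hypothetical incidents: (started, detected, resolved).
incidents = [
    (datetime(2021, 7, 3, 8, 0), datetime(2021, 7, 3, 8, 10),
     datetime(2021, 7, 3, 9, 0)),
    (datetime(2021, 7, 5, 22, 0), datetime(2021, 7, 5, 22, 30),
     datetime(2021, 7, 5, 23, 30)),
]


def deployments_per_week(records, period_days: float) -> float:
    """Deployment frequency, normalized to a seven-day week."""
    return len(records) / (period_days / 7)


def mean_lead_time(records) -> timedelta:
    """Average time from receipt of a request to production availability."""
    return timedelta(seconds=mean((done - req).total_seconds()
                                  for req, done in records))


def mean_time_to_detection(records) -> timedelta:
    """Average time from the start of an incident to its detection."""
    return timedelta(seconds=mean((det - start).total_seconds()
                                  for start, det, _ in records))


def mean_time_to_recovery(records) -> timedelta:
    """Average time from detection to full resolution."""
    return timedelta(seconds=mean((res - det).total_seconds()
                                  for _, det, res in records))


def defect_escape_rate_percent(found_in_production: int,
                               found_in_testing: int) -> float:
    """Share of all known defects that reached production."""
    total = found_in_production + found_in_testing
    return 0.0 if total == 0 else found_in_production / total * 100


def allowed_downtime_minutes(availability_percent: float,
                             period_days: float = 365) -> float:
    """Downtime budget implied by an availability target."""
    return period_days * 24 * 60 * (100 - availability_percent) / 100


print("Deployments per week:", deployments_per_week(deployments, 7))
print("Mean lead time:", mean_lead_time(deployments))
print("MTTD:", mean_time_to_detection(incidents))
print("MTTR:", mean_time_to_recovery(incidents))
print("Defect escape rate (%):", defect_escape_rate_percent(5, 45))
print("Five-nines yearly budget (min):",
      round(allowed_downtime_minutes(99.999), 3))
```

The last function puts “five nines” in perspective: 99.999% availability over a 365-day year leaves a downtime budget of only a little over five minutes.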
Don’t Let Perfection Be the Enemy of Good
Bottom line: measuring DevOps success metrics is not “impossible.”
Again, that misconception boils down to a couple of key factors. For one, in DevOps, there’s no end game — it’s not about racking up X amount of leads in a month or landing one huge deal then moving on to the next big thing.
The second key factor is that DevOps success metrics are a mix of the objective and the subjective. Both kinds can still be measured, and both provide useful insights.