TECHNICAL DEBT

Measuring Technical Debt to Avoid the Boiling Frog Syndrome

Technical debt is the difference between the current state of the software, and what it should be to allow for the easiest change.

Egor Savochkin
Booking.com Engineering
9 min read · Sep 14, 2023


Software development is all about change. And, over the lifespan of our software, the goal is to implement required changes in a reasonable amount of time. Whether the changes are technical in nature, like an urgent security upgrade, or stem from a business need, such as building a new feature to make us more competitive in target markets — how fast we can change is critical.

What makes us slow down? Usually the fact that making something work is not the same as making it maintainable over the long term [SWE at Google]. Often the first working version is quick and dirty, and it takes extra effort to make it amenable to change. Enter the technical debt metaphor [Ward]. A developer decides not to invest in the code’s changeability right away, and instead takes on the debt in order to ship something sooner. Later on, however, every change to that code takes extra effort. That means we’re paying interest until the debt is paid in full.

What is technical debt?

Technical debt is the difference between the current state of the software, and what it should be to allow for the easiest change. There may be times when it is worth accumulating technical debt — for example, if we need to meet a hard deadline that would otherwise stop the entire operation. But in the long run, it’s wise to implement measures to control and minimize it [FowlerIsWorthTheCost].

If we are talking about software with a predicted lifespan of years, there should not be a question of whether to pay the technical debt. It’s only a question of how to identify, measure, and control it.

Technical debt can have different origins. For one, the team could simply be unaware of the problems technical debt causes. Or the team may feel that the problem exists, but be under the impression that they never have time to address it. This has a lot to do with the engineering culture, and over time the situation only gets worse [Broken window theory]. Another scenario involves taking on technical debt intentionally, after weighing all the pros and cons. And a third origin is the debt that accumulates because we do not always have all the information beforehand: requirements may change, and we learn as we develop. This kind of debt is inevitable for even the best teams [FowlerTechDebtQuadrant].

The trick with technical debt is that it accumulates very slowly, through many small shortcuts taken now and then. Because the deterioration is gradual, nobody notices it until it is too late, which is exactly what the boiling frog metaphor describes. In other words, the problems keep adding up until the point of catastrophe. How do we prevent this from happening?

The best defense against technical debt is to make it visible from the start. Then, we can proactively manage it by setting up the right health indicators, and taking corrective actions early on.

On the other hand, if we find our system already on its knees because of technical debt, then it’s time for more radical “clean-up” initiatives — before it is too late. In this case it is advisable to establish several improvement indicators and use them to track the progress of those initiatives.

Here are some universal metrics that can be used in many situations, both as health and improvement indicators. This is by no means an exhaustive list, though; you might find that other metrics not mentioned here are a better fit.

WTFs per minute

A widespread opinion is that the only valid measurement of code quality is WTFs per minute [MartinCleanCode]. Maybe it is a good idea for a startup to create a device that will count the WTFs?


Of course this metric is not only subjective, but also heavily dependent on the individual skill level of the software engineer and on the engineering culture in the organization. And, as the broken window theory suggests, a large amount of bad code encourages developers to produce even more technical debt.

Number of code smells

Martin Fowler and Kent Beck introduced the term ‘code smell’ to help developers identify places in the code where there could be problems worth fixing. Their book, ‘Refactoring’, includes 24 examples of bad smells. Many code smells and heuristics are also listed in Uncle Bob’s ‘Clean Code’ [Clean Code ch 17]. Some code smells, like duplication and long functions, can be identified effectively by static analysis tools such as SonarQube. However, many other smells can’t be easily detected by static analysis. This is why we need another metric, like WTFs per minute.
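
To make this more concrete, here is a minimal, hypothetical sketch of the kind of check a static analysis tool performs under the hood: it flags the ‘long function’ smell in a Python source file. The 25-line threshold is an arbitrary assumption for the example, not a rule taken from SonarQube or the books above.

```python
import ast
import sys

# Arbitrary threshold chosen for this example only.
MAX_LINES = 25

def find_long_functions(source: str) -> list[tuple[str, int]]:
    """Return (function name, length in lines) for every function exceeding MAX_LINES."""
    tree = ast.parse(source)
    smells = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            if length > MAX_LINES:
                smells.append((node.name, length))
    return smells

if __name__ == "__main__":
    # Usage: python long_functions.py some_module.py
    with open(sys.argv[1]) as f:
        for name, length in find_long_functions(f.read()):
            print(f"Long function smell: {name} spans {length} lines")
```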

Automated test coverage

Although this has been known for a long time [e.g. see SWE at Google, Clean Coder and many others], recent research [Accelerate, DORA research] identified a positive statistical correlation between test automation and software productivity. This means that if you increase the automated test coverage, you most likely increase the productivity of your team.


This metric can be tracked easily with tools like JaCoCo. But like many other metrics, it is easy to game by writing a lot of tests that do not actually test anything. So it is very important to combine the test coverage metric with other efforts aimed at improving engineers’ skills and explaining the benefits of writing self-testing code — for example, Google’s Testing on the Toilet.
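
As a hypothetical illustration of how coverage can be gamed: both tests below execute calculate_discount and therefore count towards line coverage, but only the second one would actually catch a regression.

```python
def calculate_discount(price: float, percent: float) -> float:
    """Production code under test (hypothetical example)."""
    return price - price * percent / 100

def test_discount_inflates_coverage_only():
    # Executes the code, so coverage goes up, but asserts nothing:
    # a regression in calculate_discount would still pass this test.
    calculate_discount(100.0, 20.0)

def test_discount_actually_checks_behaviour():
    # Exercises the same lines and pins down the expected result.
    assert calculate_discount(100.0, 20.0) == 80.0
```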

This metric is especially useful when low test automation is a constraining factor for a team. When my team found very low test coverage (around 50%) in one of its legacy components, we made improving it a priority. The metric was put on the team’s radar, and we monitored our progress and discussed the results during retros. A year later, the unit test coverage is up to 80%, and it’s no longer a constraining factor. Now we use it as a health indicator of our codebase.

Documentation coverage

Lack of documentation may also harm a team’s performance. Therefore, we can use a metric related to documentation coverage:

Documentation coverage — the percentage of the system that is covered by the documentation.

How do we use this metric? In teams where we see a lack of documentation as a constraining factor, we start focusing on improving it. We list all the components and assess the current documentation coverage for each one of them. Each week, we update the metrics and monitor our progress.
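
A minimal sketch of what such tracking could look like, assuming a hypothetical, hand-maintained inventory of components with a flag for whether each one has up-to-date documentation:

```python
# Hypothetical inventory: component name -> whether its documentation is up to date.
components = {
    "invoice-page": True,
    "payment-gateway-adapter": False,
    "reporting-batch-job": True,
    "legacy-export-service": False,
}

documented = sum(1 for has_docs in components.values() if has_docs)
coverage = 100 * documented / len(components)
print(f"Documentation coverage: {coverage:.0f}% ({documented}/{len(components)} components)")
# -> Documentation coverage: 50% (2/4 components)
```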

Effort spent on deprecated components

It’s normal to deprecate a component in favor of another one but still keep the deprecated component around for some time, for example because some clients need time to migrate to the new component. During this period we may still need to do some work on it, such as bug fixes. This can be considered waste, since the deprecated component will eventually be thrown away. The pitfall is that teams very often “forget” about those deprecated components and keep supporting them. Over time, the effort spent supporting them accumulates and can become substantial. That is why it makes sense to keep track of deprecated components and try to decommission them as early as possible.

Possible metrics to use (a small calculation sketch follows the list):

%work on deprecated components = time spent on deprecated components / total time

%work on deprecated components = #tasks related to deprecated components / #total tasks

Deprecated changes ratio = #changes to the deprecated components / #total changes
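
As a rough illustration (all task names and numbers below are hypothetical), these ratios could be computed from an issue tracker export as follows; the same calculation works for the unplanned-work and defect metrics further below.

```python
# Hypothetical export from an issue tracker: (labels, hours spent) per finished task.
tasks = [
    ({"feature"}, 8.0),
    ({"deprecated-invoice-page", "bugfix"}, 3.0),
    ({"feature"}, 5.0),
    ({"deprecated-invoice-page"}, 2.0),
]

def share_of_work(tasks, label_prefix):
    """Return (% of tasks, % of time) spent on tasks carrying a matching label."""
    matching = [t for t in tasks if any(label.startswith(label_prefix) for label in t[0])]
    pct_tasks = 100 * len(matching) / len(tasks)
    pct_time = 100 * sum(hours for _, hours in matching) / sum(hours for _, hours in tasks)
    return pct_tasks, pct_time

pct_tasks, pct_time = share_of_work(tasks, "deprecated")
print(f"%work on deprecated components: {pct_tasks:.0f}% of tasks, {pct_time:.0f}% of time")
# -> 50% of tasks, 28% of time
```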

How do we use those metrics? My team is responsible for some of the financial content related to invoicing in the accommodation partner’s portal, which operates in more than 200 countries. Last year we implemented a new page for displaying invoices and rolled it out to almost all countries. However, some countries had special logic, so we decided to postpone their migration and let them use the deprecated page for some time. This was a good decision, because it allowed us to get quicker feedback on the new page. One drawback, though, was that we also needed to support several versions of the same page for several months. Although the old pages did not require a lot of effort to support, the accumulated effect can eventually become substantial and even turn into the most constraining factor for a team. In such cases it makes sense to use this metric as an improvement or health indicator.

Effort spent on unplanned work

"In The Visible Ops Handbook, unplanned work is described as the difference between 'paying attention to the low fuel warning light on an automobile versus running out of gas on the highway' (Behr et al. 2004). In the first case, the organization can fix the problem in a planned manner, without much urgency or disruption to other scheduled work. In the second case, they must fix the problem in a highly urgent manner, often requiring all hands on deck — for example, have six engineers drop everything and run down the highway with full gas cans to refuel a stranded truck." [Accelerate]

In other words, by unplanned work we mean something that makes us stop the world and rush to make a fix, like a critical defect or an outage.

We can use the following metrics:

%unplanned work = time spent on unplanned work / total time

%unplanned work = #unplanned tasks / #total tasks

Effort spent working on defects identified by end users

Software defects obviously slow down feature work. As such, they can be considered part of the technical debt.

We can use the following metrics:

%work on defects = time spent working on defects / total time

%work on defects = #defects / #total tasks

Number of vulnerabilities

“Vulnerable and Outdated Components” is one of the OWASP Top Ten web application security risks. Using such components can result in urgent, unplanned work to fix vulnerabilities and to cope with their consequences. Therefore, we can consider vulnerabilities a form of technical debt.

There is a dedicated tool for checking project dependencies: OWASP Dependency-Check (https://owasp.org/www-project-dependency-check/). It is a widely known best practice to incorporate this tool into the CI/CD pipeline, which helps us discover vulnerable and outdated components early and fix them.
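
To turn the scan output into a trackable number, one option is to parse the JSON report that Dependency-Check can produce and count findings by severity. The field names used below ("dependencies", "vulnerabilities", "severity") are assumptions about the report layout and may differ between tool versions, so treat this as a sketch rather than a reference implementation.

```python
import json
from collections import Counter

def count_vulnerabilities(report_path: str) -> Counter:
    """Count findings by severity in a Dependency-Check JSON report.

    The field names used here ("dependencies", "vulnerabilities", "severity")
    are assumptions about the report layout and may vary between tool versions.
    """
    with open(report_path) as f:
        report = json.load(f)
    severities = Counter()
    for dependency in report.get("dependencies", []):
        for vulnerability in dependency.get("vulnerabilities", []):
            severities[vulnerability.get("severity", "UNKNOWN").upper()] += 1
    return severities

if __name__ == "__main__":
    counts = count_vulnerabilities("dependency-check-report.json")
    print(f"Total vulnerabilities: {sum(counts.values())}", dict(counts))
```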

Estimated effort to pay technical debt

A lot of things that slow us down can’t be easily tracked by a static analysis tool. It’s impossible for something like SonarQube to estimate the effect of a shared database architecture, or of other architectural problems. Every team has specific aspects that could be considered technical debt, and every team is different. As obvious as it sounds, the simplest way to measure technical debt may be to estimate the effort needed to remove it. However, this assumes the engineer doing the estimating has the skills and experience to write clean and maintainable code (e.g. design patterns, refactoring, self-testing code), knows architectural best practices (e.g. loosely coupled architecture), and can recognize anti-patterns.

Key takeaways

  • Technical debt is the difference between the current state of the software and what it should be to allow for the easiest change.
  • In almost all cases, it is worth keeping technical debt at a low level. If we don’t, then we have to spend extra effort on every change going forward.
  • Technical debt can have different origins: (i) the team is simply unaware of the problems technical debt causes, (ii) the debt is taken on intentionally after weighing all the pros and cons, or (iii) information was lacking beforehand, as requirements change and we learn as we develop.
  • The trick with technical debt is that it builds very slowly by taking many small shortcuts, until it is too late.
  • The best practice is to make technical debt visible from the start by establishing health indicators to monitor, and by taking corrective actions at early stages.
  • Unfortunately, we can end up in a situation where it is almost too late, and radical clean-up initiatives are required. For these, we can establish several improvement indicators and use them to track the progress of those initiatives.
  • There are metrics that can be used as both health and improvement indicators by the majority of teams, such as WTFs per minute, code smells or vulnerabilities, test coverage, documentation coverage, effort spent on deprecated components/unplanned work/defects, or the estimated effort to pay off technical debt. This is not an exhaustive list of metrics. Every team and situation may be unique, and a team may come up with something different that is a better fit.

Special thanks to Santiago Machado, Dunya Kirkali, Akash Khandelwal, Shreya Kedia and Kerry O’Leary for their help, ideas and article review.
