No known bugs
How the principle of "no known bugs" works in practice
June 17, 2024 · 6 min read
The Cronofy engineering principle pretty much everyone asks me about, including non-engineers, is “no known bugs”. Most of the questions are about what it means in practice. This is an attempt to answer them in a way I can point people to, in the hope that it might lead other organizations to adopt it as well.
First, the specific engineering principle in full:
No known bugs
To maintain speed we fix bugs as they are found, and prevent their return through automated tests. Bugs are ballast that slows us down. We carry bugs for as short a time as we can so we’re always building on solid ground.
Let’s start with the most common question.
What does “no known bugs” mean?
Seemingly obvious, but regularly asked. No known bugs means we don’t tolerate the existence of known bugs. This means our engineers work on tasks with the following priority:
- Security
- Stability
- Bugs
- New features
You can argue that security and stability are just special categories of bug, so this can be simplified to:
- Bugs
- New features
As we break down all our work into many small changes we can usually start fixing a bug within a day of it being discovered. If it’s sufficiently severe someone will halt their current work and move on to a fix immediately, but we prefer to finish what we’ve got in flight first. With most bugs being quite easy to resolve, this means a bug is only known about for a few days before being eliminated.
As bugs are short-lived, we don’t need a bug tracker. They go to the top of a team’s “up next” group, get resolved, and then we’ll inform any affected customers.
Once the bug is eliminated we can get back to ~~creating new ones~~ improving the platform.
All bugs?
All bugs get this treatment. All of them. If someone starts debating “is it really a bug or not?”, it almost certainly is. They would just rather work on something else.
And here lies the mindset shift for many. Everyone wants to work on, support, and sell a bug-free system! However, maintaining a bug-free system requires compromises that many are not used to.
Our principle removes the decision about whether a bug should be fixed: the answer is always “yes”. It also removes the decision about when a bug should be fixed: the answer is always as soon as we can.
We may start a week, month, or quarter with giddy aspirations about what we’ll have delivered by the end of the period. On top of the uncertainty about how long the new work will take is the commitment we have made to ourselves around bugs. That adds a second uncertainty: how much time will need to be spent fixing them.
Maintaining motivation
Shifting plans to fix bugs can be demoralising at times. It can feel like no progress has been made from one week to the next if a few bugs rear their head at once.
It’s on you as a leader to remind people, usually through repetition, that this is the sacrifice we have collectively agreed to make in order to have a high-quality system. Bugs don’t create themselves; they come from the mistakes or oversights of our past selves. Closing those gaps is some of the most valuable work we can do, as it restores our ability to move forward with confidence.
Part of how I explain this nowadays is through reframing.
There are no bugs
Bugs are created when the requirements of the product as a whole are not met. Those requirements may not have been anticipated at the time, or may simply have been missed. But they are still requirements of the system, otherwise there would be nothing to fix.
So at Cronofy we don’t have bugs, we have Unscheduled Product Enhancements. Fixing a bug is enhancing the product by returning it to its desired state of functioning as expected. We want the product to behave as we want at all times, and sometimes we need to take retroactive steps to resolve that.
With this reframing, we don’t occasionally spend a week fixing bugs; instead we spend a week on Unscheduled Product Enhancements. The scheduled product enhancements will still be picked up, just perhaps a little later than we anticipated.
This way every week is spent enhancing the product, though sometimes that’s by paying off the quality debt we had unknowingly accrued.
Paying that debt off today means we can maintain a fast pace tomorrow.
Reducing Unscheduled Product Enhancements
There’s no silver bullet for guaranteeing you’re always producing high-quality software, but there are some tactics we employ to improve our chances:
Customer empathy
We aim to understand the problem we’re solving to avoid a mismatch between what is needed and expected, and what we deliver.
Architectural Decision Records
We write down the motivation for a change and the reasons for the path chosen, to ensure we have thought the change through clearly.
Automated testing
We verify our platform works as expected with automated tests to ensure it will continue to behave as expected when being extended.
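To make “prevent their return through automated tests” concrete, here is a minimal sketch of a regression test in Python. The `overlaps` function and the back-to-back-meetings bug it once had are hypothetical, invented purely for illustration; they are not taken from Cronofy’s codebase. The fix is paired with a test that pins the corrected behaviour, so any future change that reintroduces the bug fails the build.

```python
from datetime import datetime


def overlaps(a_start: datetime, a_end: datetime,
             b_start: datetime, b_end: datetime) -> bool:
    """Return True if the half-open intervals [a_start, a_end) and
    [b_start, b_end) intersect.

    Hypothetical fix: an earlier version compared with <=, so
    back-to-back meetings (one ending exactly when the next starts)
    were wrongly reported as clashing.
    """
    return a_start < b_end and b_start < a_end


def test_back_to_back_events_do_not_overlap() -> None:
    # Regression test for the (hypothetical) bug: pins the fixed behaviour.
    nine = datetime(2024, 6, 17, 9, 0)
    ten = datetime(2024, 6, 17, 10, 0)
    eleven = datetime(2024, 6, 17, 11, 0)
    assert not overlaps(nine, ten, ten, eleven)


def test_genuine_overlap_is_still_detected() -> None:
    # Guards against "fixing" the bug by breaking the normal case.
    nine = datetime(2024, 6, 17, 9, 0)
    nine_thirty = datetime(2024, 6, 17, 9, 30)
    ten = datetime(2024, 6, 17, 10, 0)
    assert overlaps(nine, ten, nine_thirty, ten)


if __name__ == "__main__":
    test_back_to_back_events_do_not_overlap()
    test_genuine_overlap_is_still_detected()
    print("regression tests passed")
```

The tests are plain functions using `assert`, so a runner such as pytest would collect them automatically, and running the file directly calls them too. Testing the normal case alongside the bug case is a deliberate choice: it stops a fix from trading one bug for another.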
Breaking changes down into small steps
Smaller changes are easier to review, meaning there’s less opportunity for mistakes to be missed. If there’s an issue we have a handful of small candidates to analyse in the first instance. This reduces the time to resolution.
Getting started
At Cronofy we’ve had this policy from day one. It’s usually not feasible to introduce it as a blanket rule for a mature system, as that might mean no new features are delivered for months. However, in roles prior to Cronofy I’ve had success introducing it to a system gradually.
Perhaps there’s a new service that needs to be built, or a new major feature. You can implement a “no bugs” policy for that as a starting point. You may have to fix a few bugs in adjacent places that interact with the new part, but on the whole you can isolate the effort and the perceived risk.
This will give you an area you can rely upon. It also gives you a case study for advocating for increasing the scope of the “no bugs” policy. In terms of candidates, it’s often easiest to expand into adjacent areas, as they now have a reliable neighbour. However, consider where the value to the business comes from; perhaps you have Service Level Objectives that could be improved.
It’s a full commitment
Maintaining a “no bugs” policy sounds easy, utopian even, but it isn’t. It requires continuous diligence from everyone, and the occasional reminder of the long-term benefits.
Without everyone being on board, it’s easily sacrificed in the name of temporary progress. But I can put my hand on my heart and say it is worth that effort to maintain.
Hey, I’m Garry Shutler
CTO and co-founder of Cronofy.
Husband, father, and cyclist. Proponent of the Oxford comma.