Moving forward with observability
Increasing stability through distribution of ownership
December 21, 2020
As a leader, particularly as an early-stage founder, your role is to empower people to do their jobs to the best of their ability. Nowhere is this more pronounced than in understanding the ebb and flow of your production environments.
As a founder, you’ve seen literally everything that has happened since day one. You’ve put in place mitigations for countless edge cases and exotic failures. Your system is observable through whatever tools you had to hand: some out of the box, like logs, and perhaps a smattering of cobbled-together metrics, because three years ago tools like Datadog existed but would have doubled your cost of sale and put you out of business within six months.
Your system is observable, but only to you. To anyone else it’s like the stream of random green symbols in The Matrix. They can’t see the girl in the red dress. It isn’t that they aren’t capable; you simply can’t afford to sit with them for months showing them how to read the tea leaves. We can’t yet learn new skills via upload.
Risk to future growth
This lack of observability for the wider team was one of the most significant obstacles to our scaling as a team. At our company retreat in October 2019, I presented it as one of the most important projects we had for the continued performance and stability of the Cronofy platform.
At the time, I presented Honeycomb as one of the leading options in the space. This week we enabled it in all our production environments.
I’d long been underwhelmed by what APMs brought to the table over simple metrics accessible through AWS CloudWatch. Honeycomb felt different. Charity Majors, Christine Yen, and Liz Fong-Jones were, and still are, on a mission to change the landscape by defining a new category.
Reducing the P90 to understanding
I could observe our platform only through a slew of bunny hops between tools and a cognitive cross-correlation that I could not even begin to describe, never mind document. What I was looking for was a tool that could provide the same insights without the mental gymnastics needed to piece the puzzle together.
I’ve got to be honest: the evaluation period for Honeycomb was one of those times I wished Cronofy weren’t quite so stable. The real test will come when we use it during a production incident. However, we’ve had enough proof points to believe it is the right tool for the job.
During the proof-of-concept period we made a point of reaching for Honeycomb first, and had good success with a tool we are still new to. Personally, I’ve found answers in about the same time it would have taken me via my aforementioned unteachable paths. Even better, members of the team have reached similar insights in a similar time.
The future is here, and it feels more evenly distributed across the team.
Observability feels like a practice where the more you put in, the more you get out. By adopting Honeycomb, we’ve drastically reduced the cost of putting more in, and with Honeycomb’s pricing model, wider events come at no extra cost, which makes it an absolute win.
We’ll be leaning into deeper, application-based service level objectives (SLOs) during 2021. These will enable us to head off platform issues before they become serious and provide concrete data for real-world performance improvements.