Garry Shutler

Simple scorecards beat scales

Simple scorecards are explicit and force real decisions, making them more effective

August 18, 2023 · 7 min read

Simple scorecards are more effective, and can provide stronger data for better insight, than the go-to sliding scale. Rather than defaulting to an overall “marks out of 10” and then agonising over whether something is a 7 or an 8, build out a simple scorecard with ideally binary “yes or no” questions; it will produce better outcomes.

Why score at all?

Let’s take a step back and ask why we’re scoring at all. Usually, scoring is a way of evaluating quality. A candidate for a role, a response to a support ticket, and feedback delivered to a colleague are examples of things we evaluate at Cronofy.

We score things to distill a qualitative interaction into a more quantitative value so we can compare instances, and potentially track trends over time. An example of the former is comparing two candidates for a role: a consistent scoring mechanism aims to eliminate the differences between interviewers. For things like support tickets and feedback, scoring allows you to track improvement over time to see the impact of training initiatives.

More upfront investment

The default “marks out of 10” scoring framework is simple to implement. Many tools come with such a thing out of the box. Often when you are starting to score something there’s just one person doing the scoring and so you get a level of consistency due to them having a consistent “gut feel” on whatever the subject may be.

This is better than nothing, but it does not scale. It is not replicable across people because it is impossible to fully explain the numerous permutations from one person’s lifetime of experience that make the difference between a 7 and an 8.

Rather than having a single overarching score, work to distill a framework with more, simpler-to-score, attributes. By teasing out these attributes you will create stronger guidelines as to what good looks like. That makes expectations clearer, and allows training to be more targeted and so more effective, leading to better outcomes overall.

How to distill attributes

To start building your framework you ideally need a pool of examples. You can then compare and contrast examples that are outstanding against ones that are mediocre, or perhaps even bad. Counter-examples can be great learning opportunities as well.

From the examples, you want to generate a collection of positive attributes that make for a great example. Do this by asking yourselves what the outstanding examples do that the mediocre ones don’t, and what the counter-examples get wrong.

You want to generate as many attributes as you can; at this stage you’re only collecting potential scoring attributes.

Once you have your potential scoring attributes, the real debate begins. From here you want to boil your candidates down to a handful of attributes; 3 to 6 has proved a good balance in my experience. Too few and nuance is lost, too many and it starts getting too complicated.

You will likely find attributes that can be grouped under a single banner. That’s good! Don’t discard them altogether; instead, use the sub-attributes as examples of what can combine to count toward the scored attribute.

Don’t be afraid of having attributes that are obvious or even easily achievable. The presence of such attributes makes it clear where the baseline is, and if one is ever a “no” or zero that’s a really strong signal.

You’re aiming for a scorecard where one or two scores relate to baseline good, one or two capture excellent, and perhaps one captures something aspirational or unique to your organisation.

Support quality management example

At Cronofy we have a scorecard for scoring how well we have handled a support ticket with four “yes or no” measures:

  • Communication - does the answer feel like it comes from our company?
  • Knowledge and Understanding - was the correct answer given, and were all of the customer’s queries addressed?
  • Procedures - were the correct tags and categories added, and were links to the knowledge base included?
  • Anticipation - were the customer’s feelings acknowledged and needs anticipated?

Calling back to the mix of scores, “Knowledge and Understanding” and “Procedures” are baseline attributes, “Communication” is mostly baseline but with some Cronofy-specific elements, and “Anticipation” captures an excellent or aspirational attribute.

There’s further detail on the specifics of each in our internal reference, but for brevity I’ll share only one as an example:

Communication

  • Friendly tone of voice
  • Timing
  • Helpful and understanding language
  • Acknowledges customer’s emotions, e.g. frustration, urgency, concern
  • Demonstrates Cronofy’s Principles through manner of speech:
    • Truth - Be open and honest with communication
    • Zero In - Act in the customer’s best interests

A customer may be perfectly happy with a ticket if it was handled in a timely manner, but for us to score it positively for this attribute it needs to be handled in a Cronofy-like manner as well. This is where the nuance of the sub-attributes from the grouping process comes in handy.

Targeted training

With a simple scorecard in place, training on whatever you are measuring becomes more straightforward. In the support example above, we have four attributes, each with a handful of bullet points. These translate easily into training and reference materials.

For a scorecard evaluating job candidates, assistance can come in the form of model questions for delving into a particular attribute you are looking for in candidates.

Knowledge gaps become easier to identify. Two 7/10s may translate to a yes-yes-no-yes or a yes-no-yes-yes. A scorecard gives you the information about what specifically is lacking, enabling more targeted, and therefore more effective, intervention at the individual, team, or company level.

A clear scorecard also means you can spread the work of scoring out more easily. Scoring the work of others exposes each person involved to more examples of what good looks like than seeing only their own work would, accelerating improvement.

You don’t lose high-level aggregation, as you can still derive a 3/4, but you gain an extra layer of information from which to draw further insight. The simplicity of the measurement encourages an increase in volume, providing more data and increasing the accuracy of insights.

Set the bar high

With a binary, yes-or-no scorecard in place, people are forced to make decisions. There’s no more “well, it’s missed a few notes but I think it’s a 7 overall” and then having to read the tea leaves to know that a 6 is actually awful. Instead you have several attributes that are either achieved or not.

In the interests of growth of the individual, team, product, or company you want to “round down”. A “maybe” is a “no”. The standard has either been reached or it hasn’t, there aren’t any half measures. Hiding the truth from ourselves or each other in the interests of kindness puts you on the path of “death by a thousand cuts”.

Rounding a “maybe” up to a “yes” means conversations aren’t had about things not being quite up to scratch, so people won’t know they could be doing better and will miss opportunities for growth. A score of 1/4 warrants further review, and even better if that includes three near misses from a 4/4: you’ll be able to provide extremely targeted coaching which is more likely to land. That opportunity is missed with a “rounded up” 3/4, and certainly wouldn’t have happened with a 7/10 from a sliding scale.

Less is more

Simple scorecards give you a higher-fidelity signal, combined with an increase in volume because they are easier to execute.

With more information to hand, and the more explicit guidance that can be derived from a simple scorecard, you’re well placed to improve what you are measuring.



Hey, I’m Garry Shutler

CTO and co-founder of Cronofy.

Husband, father, and cyclist. Proponent of the Oxford comma.