I am Lino
February 18, 2026

Metrics and observability strategy: measuring without fooling yourself


In almost every team, there’s a magical moment when someone opens a dashboard, points at a green graph, and says: “See? We’re doing great.” Meanwhile, support is on fire, the payments API is going down every other hour, and the development team hasn’t slept properly in three weeks.

The difference between a healthy team and one stuck in that endless theater usually comes down to how they use metrics: as a flashlight to see better… or as a stick to beat each other with.

This article is about the flashlight.


Before the dashboard: why we measure (and why we don’t)

Imagine your system is an airplane flying at night. The instrumentation is there so you don’t fly into a mountain in the fog, not so the pilot can announce over the PA, “look how lovely everything is from up here.” Metrics are exactly that: instruments.

We measure for four very specific reasons:

Of course, some companies, teams, or managers tend to turn metrics into other things entirely:

If your metrics don’t help you make better decisions or work with less pain, they’re not metrics — they’re decoration.

Logs, metrics, and traces: the trio that tells the whole story

The “observability” jargon can sound like just another buzzword, but at its core it’s about three types of signals that answer different questions.

Logs: what the heck happened

Logs are the system’s diary. That’s where you record errors, requests, state changes, business decisions. “order_id=1234, status=PAID, payment_provider=stripe” is a good log entry: it tells you what happened, to whom, and in what context.

A very real example I’ve lived through: A user swears you’ve charged them twice. If all your logs show is Error occurred and not much else, you’re toast. If you can see the order’s journey, the payment attempts, and the provider’s callbacks, you can reconstruct the scene and respond with something better than “we’ve opened a ticket.”
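As a sketch of what “reconstructable” logging can look like (the event names and fields here are invented; a real system would likely use a structured-logging library rather than raw JSON lines):

```python
import json
import logging

# One JSON object per line: every event carries enough context
# (order id, status, provider) to reconstruct a payment's journey later.
logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("payments")

def log_event(event: str, **context) -> str:
    """Emit one structured log line; return it so callers/tests can inspect it."""
    line = json.dumps({"event": event, **context}, sort_keys=True)
    log.info(line)
    return line

# The same disputed charge, told as a sequence of reconstructable events
# instead of a bare "Error occurred":
log_event("payment_attempt", order_id=1234, provider="stripe")
entry = log_event("payment_result", order_id=1234, status="PAID", provider="stripe")
```

Grepping for `order_id=1234` (or its JSON equivalent) now yields the whole story instead of a shrug.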

Metrics: how healthy things look from 30,000 feet

Metrics are numbers aggregated over time: CPU, memory, p95 latency, error rate, number of pending jobs in a queue, etc. They don’t tell you the full story, but they do tell you “something’s off here.”

A graph of http_server_requests_seconds spiking on the payments API, or a queue_length{queue="emails"} gauge growing non-stop, is an elegant way for the system to scream “heads up — I’m about to blow!” before the users do.
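Why p95 latency instead of the average? Toy numbers make it obvious. A minimal sketch (the samples are invented; in a real system they come from a metrics library, not a list):

```python
import statistics

# Invented latency samples in milliseconds for a payments API:
# mostly fast, with a couple of painful outliers.
latencies_ms = [12, 15, 14, 13, 16, 420, 18, 14, 13, 15,
                17, 16, 14, 13, 390, 15, 14, 16, 13, 15]

# quantiles(n=100) returns the 99 percentile cut points; index 94 is p95.
p95 = statistics.quantiles(latencies_ms, n=100)[94]
mean = statistics.mean(latencies_ms)

# The mean stays comfortably low while p95 exposes what the unlucky
# 5% of users are actually living through.
```

This is exactly why dashboards built on averages stay green while support is on fire.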

Traces: who did what with a request

Traces are the map of a request’s journey through your services. In a microservices world, seeing that checkout-api calls cart-service, then pricing-service, then payment-gateway, and finally email-service, with times and errors, is pure gold.

When a user clicks “buy” and everything is slow, the trace lets you see whether the problem is in the database, the payment gateway, or that mysterious service nobody’s wanted to touch since 2013.
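A trace can be modeled as little more than a tree of timed spans. A toy sketch, with invented service names and timings, that answers “where did the checkout’s two seconds actually go?”:

```python
from dataclasses import dataclass

@dataclass
class Span:
    """One hop in a request's journey: which service, and when."""
    service: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms

# An invented trace for one "buy" click. The first span is the root request.
trace = [
    Span("checkout-api", 0, 2000),
    Span("cart-service", 10, 60),
    Span("pricing-service", 70, 140),
    Span("payment-gateway", 150, 1900),   # <- where the time actually went
    Span("email-service", 1910, 1990),
]

slowest = max(trace[1:], key=lambda s: s.duration_ms)
```

Real tracing systems (OpenTelemetry and friends) do this with span IDs and context propagation, but the question they answer is this one.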

Putting it all together:

That’s the level of observability you want — not 200 pretty dashboards to show off in the quarterly review.

SLI, SLO, and SLA: putting numbers on “it kinda works”

When the business asks “is this working well?”, you can answer with “yeah, more or less” or you can talk SLOs. The second option tends to generate fewer arguments and fewer passive-aggressive emails.

In very short form:

  - SLI (Service Level Indicator): what you actually measure. For example, the fraction of requests answered successfully in under 300 ms.
  - SLO (Service Level Objective): the internal target for that indicator. For example, 99.9% over a rolling 30 days.
  - SLA (Service Level Agreement): the contractual promise to customers, usually looser than the SLO and with penalties attached.

The beauty of SLOs, as the SRE world keeps insisting, is that they aren’t decorative: they give you an error budget. If your SLO is 99.9%, you’re explicitly accepting that 0.1% of requests can fail. That margin tells you:

If in two weeks you’ve already burned through 80% of your error budget on the critical API, it’s probably not the best time for the “grand redesign” that touches every moving part. Or maybe it is… but at least you’ll know you’re playing Russian roulette with data in front of you.
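The error-budget arithmetic behind that judgment fits in a few lines. A sketch with invented numbers:

```python
# A 99.9% availability SLO explicitly allows 0.1% of requests to fail.
slo = 0.999
total_requests = 10_000_000   # requests so far this period
failed_requests = 8_000

# Failures the SLO permits over this traffic: ~10,000.
budget = total_requests * (1 - slo)

# Fraction of the budget already spent: ~0.8, i.e. 80% burned.
burned = failed_requests / budget
```

When `burned` approaches 1.0, the conversation about whether to ship the grand redesign now has a number in it.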

DORA: measuring how you deliver, not whether you’re “busy”

DORA metrics are like that friend who, when you tell them how swamped you are, replies: “okay, but… what have you actually shipped?” They’re about your system’s ability to deliver changes, not about how fast you’re running down the hallway.

There are four (plus a fifth added in more recent iterations):

  - Deployment frequency: how often you ship changes to production.
  - Lead time for changes: how long a commit takes to reach production.
  - Change failure rate: what fraction of deploys cause a failure in production.
  - Time to restore service: how long it takes to recover when one does.

(The fifth, reliability, shows up in the more recent reports to capture how well the thing actually runs once it’s out there.)

Imagine two teams:

  - Team A ships one big release a month, after a week of code freeze, crunch, and crossed fingers.
  - Team B ships small changes several times a day, each one easy to roll back.

Who “works harder”? Doesn’t matter. Team B has a delivery system designed to absorb frequent changes and cheap failures: more deploys, smaller, more reversible. That’s what correlates with high-performing organizations — not the number of closed tickets.
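Computed from raw deploy data, three of the four DORA metrics are a few lines each. A sketch with an invented deploy log (real data would come from your CI/CD and incident tooling):

```python
from datetime import date

# Invented deploy log: (deploy date, days from commit to deploy, caused an incident?)
deploys = [
    (date(2026, 2, 2), 1.0, False),
    (date(2026, 2, 3), 0.5, False),
    (date(2026, 2, 5), 2.0, True),
    (date(2026, 2, 9), 0.5, False),
    (date(2026, 2, 12), 1.5, False),
]

period_days = (deploys[-1][0] - deploys[0][0]).days + 1

deploy_frequency = len(deploys) / period_days                  # deploys per day
lead_time_days = sum(d[1] for d in deploys) / len(deploys)     # avg commit -> deploy
change_failure_rate = sum(d[2] for d in deploys) / len(deploys)  # failed deploys / total
```

Time to restore service needs incident timestamps too, which is exactly the next family of metrics.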

DORA is, in essence, an antidote to productivity theater.

MTTR and friends: how fast you get back on your feet

Beyond DORA, there’s another family of metrics that tell very human stories: incident metrics.

This is where good observability really shows. Without it, the flow goes something like:

Angry customer → Slack/Teams on fire → 30 minutes to reproduce the failure → 1 hour of “let’s figure out which service it is” → another hour until someone finds the killer query.

With decent metrics, logs, and traces, the script changes:

Two minutes in, an alert fires because the error rate on checkout has spiked. You see that latency in payment-service has gone through the roof. You open a trace and discover the call to the external provider is taking 10 seconds because of something they changed on their end. You can mitigate, roll back, degrade gracefully, show up at the provider’s office with a chainsaw… all before half the world has filed a ticket.

It’s not about having zero incidents (that’s fantasy) — it’s about making sure they aren’t total wrecks every single time.
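MTTR itself is nothing more exotic than an average over incident durations, from detection to recovery. A sketch with invented timestamps:

```python
from datetime import datetime, timedelta

# Invented incidents: (detected_at, recovered_at).
incidents = [
    (datetime(2026, 2, 1, 10, 0), datetime(2026, 2, 1, 10, 25)),   # 25 min
    (datetime(2026, 2, 8, 14, 0), datetime(2026, 2, 8, 15, 10)),   # 70 min
    (datetime(2026, 2, 15, 9, 30), datetime(2026, 2, 15, 9, 45)),  # 15 min
]

total = sum((end - start for start, end in incidents), timedelta())
mttr_minutes = total.total_seconds() / 60 / len(incidents)
```

The hard part isn’t the arithmetic; it’s having honest detection and recovery timestamps, which is exactly what good alerting and observability give you.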

Good metrics, toxic metrics

A healthy metric changes your behavior for the better. A toxic metric, on the other hand, only changes how you game the system.

And in engineering, we’ve been gaming poorly designed systems for years.

Some examples of metrics that tend to help:

And then you have examples of metrics that go sideways the moment you look at them wrong:

If a metric can be easily gamed without actually improving anything — for instance, slicing tasks into a thousand tiny PRs to look more productive — someone will do it. And if you turn it into a KPI on top of that, you’re pinning a medal on that behavior.

Your strategy should look something like:

Security and cost: the two places where people lie the most

There are two areas where metrics tend to get especially surreal.

Security

The classic: “we have 300 open vulnerabilities, we need to bring that number down.” No context. No priorities. No distinction between a critical CVE exposed to the internet and a false positive in an isolated environment.

A more serious approach looks at things like:

And yes, logs, metrics, and traces matter here too: it’s hard to investigate an attack if you can’t reconstruct what happened.

Cost (FinOps)

Another gem: “we’re spending too much on cloud, cut spending by 30%.” Without saying which part of the system generates value, which part is underutilized, or what “too much” even means.

Talking about cost seriously means things like:

A decent metrics and observability strategy lets you talk about “this data access pattern costs us X per month and brings in Y — is it worth it?” instead of just crying in front of the invoice.
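That “costs X, brings in Y” conversation can start from a table this small. A sketch with invented figures and component names:

```python
# Invented monthly figures: component -> (cloud cost $, revenue it supports $).
monthly = {
    "checkout-api":   (4_000, 250_000),
    "reporting-jobs": (6_000, 5_000),   # costs more than the value it brings
    "email-service":  (500, 20_000),
}

# Cost as a fraction of the value each component supports.
cost_ratio = {name: cost / value for name, (cost, value) in monthly.items()}

# The component with the worst cost-to-value ratio is where the 30%-cut
# conversation should actually start.
worst = max(cost_ratio, key=cost_ratio.get)
```

Attributing revenue to components is the genuinely hard part; the point is that “cut spend 30%” without this breakdown is just cutting blind.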

DevEx: measuring how miserable your day-to-day is

Developer experience (DevEx) isn’t some fluffy concept — it has very measurable symptoms. Instead of asking “are you happy?” and calling it a day, you can look at concrete things:

There are recent reports on developer productivity that point to exactly these factors as decisive: it’s not so much about how many hours you spend writing code, but how many of those hours are actually productive.

Measuring DevEx isn’t group therapy — it’s an elegant way of justifying why it’s worth investing in better pipelines, better tooling, or fewer meetings.

So then… what’s the retro even for?

If you already have DevEx metrics, DORA, SLOs, and half an observatory set up, it’s tempting to think the Scrum retrospective is unnecessary. It isn’t — but it doesn’t work the way it was sold, either.

The original idea is lovely: the team meets every sprint, talks about what went well, what went badly, and agrees on concrete actions to improve. On paper, it’s an engine for continuous improvement. In practice, most retros follow a pretty predictable script:

  1. Someone says “deploys take too long.”
  2. Someone else says “I got blocked waiting for another team to respond.”
  3. A third person says “meetings are eating our lives.”
  4. Two or three “action items” get written on a sticky note.
  5. At the next retro, nobody remembers the previous action items and the same complaints come up again.

The root problem? There’s no data. It’s all perception. And perceptions, while valid and real, have two enormous flaws as a driver of change: you can’t compare them over time, and they get weighted by whoever voices them loudest, not by how much they actually hurt.

Compare that with what happens when the retro has real data on the table:

Without data (classic retro) → With data (informed retro)

  “Deploys take too long” → “Our average lead time this sprint was 4.2 days; a month ago it was 2.8”
  “I got blocked several times” → “35% of our PRs waited more than 24h for the first review”
  “We spend too much time firefighting” → “This sprint, 40% of our time went to unplanned work; last sprint it was 25%”
  “Feels like there are more bugs” → “Change failure rate has gone from 8% to 14% in the last month”

Suddenly, the conversation changes at its core. It’s no longer “I feel like…,” it’s “the data shows that…, why?” And that “why?” is exactly where the retro does shine: discussing causes, context, nuance, and possible solutions.
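Numbers like “35% of our PRs waited more than 24h for the first review” come straight from raw PR timestamps, no survey required. A sketch with invented data (and different numbers):

```python
from datetime import datetime

# Invented PR data: (opened_at, first_review_at).
prs = [
    (datetime(2026, 2, 1, 9), datetime(2026, 2, 1, 11)),
    (datetime(2026, 2, 2, 9), datetime(2026, 2, 3, 15)),   # waited > 24h
    (datetime(2026, 2, 3, 9), datetime(2026, 2, 3, 10)),
    (datetime(2026, 2, 4, 9), datetime(2026, 2, 6, 9)),    # waited > 24h
]

slow = sum(
    1 for opened, reviewed in prs
    if (reviewed - opened).total_seconds() > 24 * 3600
)
slow_review_pct = 100 * slow / len(prs)
```

Pulled automatically from your Git hosting before each retro, this turns “I got blocked several times” into a trend you can actually act on and re-check next sprint.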

What the retro can’t do on its own — and where it constantly fails — is detect the problem and measure whether the solution worked. For that, you need metrics.

Put another way:

Without that complete cycle, the retro is a periodic venting session disguised as an improvement process. Useful for morale (sometimes), useless for changing anything structural.

And there’s another problem that rarely gets mentioned: retros reward the most articulate complaints, not the most important problems. The senior developer who speaks clearly and with charisma gets their annoyance turned into an action item; the junior who’s been wrestling with a broken dev environment for weeks says nothing because “maybe it’s just me.” With DevEx data (environment setup time, build times, frequency of external-dependency blocks), that problem surfaces without anyone needing to dare bring it up in a room.

Does this mean we should eliminate retros? No. It means we need to stop asking them for what they can’t deliver: diagnosis and follow-through. The retro works when it arrives with the homework done — data on the table — and focuses on what it actually does well: generating conversation, context, and the team’s commitment to one concrete, verifiable action.

The formula that works best, stripped of fanfare:

That turns the retro from a sticky-note ritual into a real improvement cycle. And as a bonus, it solves the most common problem: actions evaporating between sprints because nobody tracks them with anything more solid than collective memory.

And if you think I’m particularly frustrated by how useless most retros are — and that’s maybe why I’ve given this section extra space — you’d be right. The human side of software development deserves its own article (or several), and I’ll be talking about this Game of Thrones soon.

The “bare minimum” to not fly blind

As a conclusion to this article, and so you walk away with a useful reminder — if I had to define a basic package, something like “observability and metrics for normal teams that don’t want to build a NASA mission control,” it would be this:

  - Structured logs with enough context to reconstruct what happened to a given user or order.
  - Basic metrics (latency, error rate, saturation) with alerts on the paths where money or users are on the line.
  - Traces on those same critical flows, so “it’s slow” becomes “this hop is slow.”
  - One or two SLOs with their error budget on the services that actually matter.
  - DORA and incident metrics reviewed with the team — never used against it.

You don’t need to start with everything automated or with the most expensive tool on the market. What you need, above all, is to be clear that metrics serve decisions and people — not the other way around.

If your graphs don’t help you catch problems before your users do, deliver better, and make smarter decisions about design, security, and cost, then yeah — they’re just pretty pictures for the next presentation. And for that, honestly, there’s always PowerPoint.


Quick glossary

So you don’t get lost in the jargon.


