I am Lino
February 20, 2026

SLOs, SLAs, and SLIs: putting numbers on "it kinda works"


In almost every company, there’s a magical phrase used to describe a system’s health: “it more or less works.” Translated into plain English: nobody knows how often it goes down, how many requests fail, or how much money is lost when it decides not to work. But hey, “more or less.”

SLI, SLO, and SLA are the grown-up version of that phrase. They’re the way to go from “I think it’s fine” to “this is what it handles, this is what we promise, and this is what’s at stake,” without having to fall back on the classic “trust me, I’m an engineer.”


Putting a face on the acronyms (no drama)

Let’s strip the mystical cloak off these three acronyms.

An SLI (Service Level Indicator) is, basically, what you look at. A concrete number that describes how your service is behaving. For example: what percentage of requests succeed, how long the checkout API takes to respond for most users, how many errors per minute you’re throwing on login.

An SLO (Service Level Objective) is the level you aim for on that number. It’s your way of saying “below this, we consider ourselves doing a bad job.” It might sound like: “at least 99.9% of requests to the payments API must complete successfully over each 30-day window,” or “99% of searches must respond in under 400 ms.”
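To make the split concrete, here is a minimal Python sketch of the payments example: the SLI is a number you compute, the SLO is the bar you hold it against. The request counts and the names `Window`, `availability_sli`, and `meets_slo` are made up for illustration; in real life the counts would come from your monitoring system.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Request counts observed over one measurement window (hypothetical data)."""
    total_requests: int
    successful_requests: int


def availability_sli(window: Window) -> float:
    """The SLI: fraction of requests that completed successfully."""
    if window.total_requests == 0:
        return 1.0  # assumption: no traffic counts as "nothing failed"
    return window.successful_requests / window.total_requests


def meets_slo(sli: float, target: float) -> bool:
    """The SLO check: is the measured SLI at or above the target?"""
    return sli >= target


# Hypothetical 30-day window for the payments API.
window = Window(total_requests=2_000_000, successful_requests=1_998_500)
sli = availability_sli(window)
print(f"SLI = {sli:.4%}, 99.9% SLO met: {meets_slo(sli, 0.999)}")
```

With these invented counts, 1,500 failures out of 2 million requests still clears the 99.9% bar, which allows up to 2,000.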

And an SLA (Service Level Agreement) is when you tie the knot… by contract. It’s the document you send to your customers saying: “I promise X level of service, and if I don’t deliver, Y happens” (credits, discounts, penalties). A typical example: “99.9% monthly uptime or we refund 10% of the fee.”
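As a rough sketch, the “99.9% or we refund 10%” clause boils down to a single comparison. The function name, the flat refund rule, and the fee below are assumptions for illustration; real contracts often use tiered credits and their own definition of uptime.

```python
def sla_credit(
    measured_uptime: float,
    monthly_fee: float,
    sla_target: float = 0.999,   # "99.9% monthly uptime"
    refund_rate: float = 0.10,   # "we refund 10% of the fee"
) -> float:
    """Credit owed to the customer for the month, under a flat refund rule."""
    if measured_uptime >= sla_target:
        return 0.0  # promise kept, nothing owed
    return monthly_fee * refund_rate


# Hypothetical customer paying 2000 per month.
print(sla_credit(measured_uptime=0.9995, monthly_fee=2000.0))
print(sla_credit(measured_uptime=0.9985, monthly_fee=2000.0))
```

The first call is a month where the promise held and no credit is due; the second is a month below target, where the refund kicks in.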

Google and friends explain it with great patience: the SLI is the metric, the SLO is the internal target, and the SLA is what can cost you money if you fail.

MiniFlix: your “more or less stable” streaming platform

Imagine you build MiniFlix, your own streaming platform. You have a /play endpoint and users who, for some reason, expect that when they hit the play button, the video actually plays. Weird people.

In this context, a good SLI might be:

Then you decide that, in order to sleep reasonably well, your internal SLO will be:

But for your enterprise clients, you sign an SLA that’s a touch more relaxed — 99.9% — with credits if you drop below that. Not because you’re a bad person, but because you want a small cushion: if you miss the SLO but stay above the SLA, you know you’re burning through your internal error budget and it’s time to get serious about reliability before the credits start showing up on the invoice.

Suddenly, instead of “MiniFlix is being a bit flaky today,” you can say: “we’ve burned 80% of our error budget in 10 days — either we ease up on risky changes or we’re going to miss our own standard this month.” That sounds less like bar talk and more like a conscious decision.
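The arithmetic behind that sentence is short enough to sketch. Assuming a 30-day window and that the budget keeps burning at the same pace (both assumptions, as are the function names), 80% gone in 10 days works out like this:

```python
import math

WINDOW_DAYS = 30.0  # assumption: the SLO is measured over a 30-day window


def burn_rate(budget_consumed: float, days_elapsed: float) -> float:
    """How fast the budget is burning relative to an even spend.

    1.0 means the budget would last exactly the whole window;
    above 1.0 it runs out early.
    """
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    return budget_consumed / (days_elapsed / WINDOW_DAYS)


def days_until_exhausted(budget_consumed: float, days_elapsed: float) -> float:
    """Days left before the budget hits zero, if the current pace holds."""
    if budget_consumed <= 0:
        return math.inf  # nothing burned yet, nothing to project
    days_to_burn_everything = days_elapsed / budget_consumed
    return max(0.0, days_to_burn_everything - days_elapsed)


rate = burn_rate(0.80, 10)
remaining = days_until_exhausted(0.80, 10)
print(f"burn rate: {rate:.1f}x, budget gone in about {remaining:.1f} more days")
```

A burn rate of 2.4 means the budget is disappearing 2.4 times faster than an even spend would allow, so at this pace it runs dry around day 12 or 13 instead of day 30.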

Maybe someone sees it as a shell game — and in the wrong hands it could be — but when the numbers are real, at least you know which shell to look under.

From “it’s slow” to “the p95 has gone through the roof”

One of the big advantages of talking about SLOs isn’t the acronym itself — it’s the change in conversation.

Before:

After:

Before:

After:

In Google’s SRE materials, two ideas come up with considerable insistence:

The 100% pipe dream (and why it’s a trap)

There’s a dangerous phase when you first discover SLOs. You typically go through something like:

The serious literature on the subject is fairly unanimous: 100% is an expensive fantasy. Even giants like Google or AWS talk about 99.9%, 99.95%, 99.99%… and even then, they still have rough days.
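To see what those nines actually buy you, it helps to convert them into minutes of allowed downtime. This is a small sketch assuming a 30-day window and a purely time-based availability target; the function name is made up.

```python
MINUTES_PER_30_DAYS = 30 * 24 * 60  # 43,200 minutes


def allowed_downtime_minutes(
    target: float, window_minutes: int = MINUTES_PER_30_DAYS
) -> float:
    """Minutes of downtime a time-based availability target tolerates."""
    return (1.0 - target) * window_minutes


for target in (0.999, 0.9995, 0.9999, 1.0):
    minutes = allowed_downtime_minutes(target)
    print(f"{target:.2%} -> {minutes:.1f} minutes per 30 days")
```

That is roughly 43 minutes a month at 99.9%, about 22 at 99.95%, a bit over 4 at 99.99%, and exactly zero at 100%, which leaves no room for a deploy that goes sideways.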

Another trap is making up targets without looking at your track record. If historically you’ve been hovering around 99.2%, setting 99.99% “because it sounds professional” is basically promising yourself you’ll live in permanent violation. Better to look at a few months of data, see what level you’re actually delivering, and from there decide: do we want to maintain this? Do we want to push it up a bit? Do we have the technical and financial headroom to do it?
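A quick sanity check is to replay a candidate target against your own history before committing to it. The monthly figures below are invented to match the “hovering around 99.2%” case, and `months_in_violation` is a hypothetical helper, not something from a library.

```python
from typing import Sequence


def months_in_violation(history: Sequence[float], proposed_target: float) -> int:
    """Count the past months that would have missed the proposed target."""
    return sum(1 for month in history if month < proposed_target)


# Hypothetical monthly availability for the last six months.
history = [0.9921, 0.9918, 0.9932, 0.9925, 0.9915, 0.9928]

for target in (0.9999, 0.9920, 0.9910):
    missed = months_in_violation(history, target)
    print(f"target {target:.2%}: missed in {missed} of {len(history)} months")
```

With this history, 99.99% would have been missed every single month, while 99.10% sits just under the worst month and becomes a starting point you can tighten once the data backs it up.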

And then there’s the wrong scale. “We want p99 < 500 ms on /v1/search-details” sounds very specific, but it means nothing to anyone outside the team. “We want 99% of searches to finish in under 400 ms because above that we see people bailing out” is a different story: suddenly, business understands why that number matters.
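The user-facing version of that objective is also easy to measure: count how many searches came in under the threshold. Here is a minimal sketch with invented latencies; `fraction_under` is a hypothetical name, and a real pipeline would usually read these numbers from histogram buckets in your metrics system instead of a raw list.

```python
from typing import Sequence


def fraction_under(latencies_ms: Sequence[float], threshold_ms: float) -> float:
    """Latency SLI: share of requests that finished faster than the threshold."""
    if not latencies_ms:
        return 1.0  # assumption: an empty window counts as compliant
    fast = sum(1 for latency in latencies_ms if latency < threshold_ms)
    return fast / len(latencies_ms)


# Hypothetical sample: 985 quick searches and 15 slow ones.
latencies_ms = [120.0] * 985 + [900.0] * 15
sli = fraction_under(latencies_ms, threshold_ms=400.0)
print(f"{sli:.1%} of searches under 400 ms, 99% SLO met: {sli >= 0.99}")
```

In this sample only 98.5% of searches beat 400 ms, so the objective is missed even though the typical search is fast.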

When the acronyms help you talk to actual people

Where SLIs and SLOs truly shine is at the boundary with business. Without them, the conversation tends to be a clash of religions.

Scene without SLOs:

Scene with SLOs and error budget on the table:

It’s not like everyone suddenly holds hands, but at least you’re discussing risks and decisions with numbers — not “tech debt” as some abstract boogeyman. Google describes exactly this: using SLOs and error budgets as currency to negotiate how much reliability you’re willing to sacrifice to move faster… and vice versa.

The summary that doesn’t fit on a sales slide

In the end, these three acronyms aren’t a consultant’s trick; they’re a kind of contract with yourself:

From there, you either keep living in the same old “it kinda works” mode, or you accept that putting uncomfortable numbers on the table, even if they sting a bit, is the only grown-up way to decide when to push, when to brake, and how much risk you can afford.

It sounds less epic than “we’re going to revolutionize the industry,” but it has one advantage: it works even when nobody’s around to present the dashboard.


Quick glossary

The minimum viable knowledge so nobody catches you with a “wait, what’s that?” face.


Sources and references

Because saying “I more or less made it up” doesn’t meet any reasonable SLA.

  1. SRE fundamentals: SLIs, SLAs and SLOs - Google Cloud Blog
  2. Service Level Objectives (SRE Book, Ch. 4) - Google
  3. The Key Differences Between SLI, SLO and SLA in SRE - DZone
  4. Implementing SLOs (SRE Workbook, Ch. 2) - Google
  5. Embracing Risk (SRE Book, Ch. 3) - Google
  6. What is a Service-Level Objective (SLO)? - Atlassian
  7. SLAs: The What, the Why, the How - Atlassian
  8. Example SLO Document (SRE Workbook) - Google
  9. SLO Engineering Case Studies (SRE Workbook, Ch. 3) - Google
  10. Example Error Budget Policy (SRE Workbook) - Google
  11. Alerting on SLOs (SRE Workbook, Ch. 5) - Google
  12. Web Vitals - Google / web.dev
  13. The Calculus of Service Availability - ACM Queue
  14. Error Budget - Atlassian