SRE Without an SRE Team

Site Reliability Engineering was invented at Google, where teams have the headcount for dedicated SREs, toil-reduction programmes, and extensive capacity planning functions. Most companies don't. That doesn't make the principles inapplicable; it means the implementation needs to be lighter.

SLOs: pick two metrics and measure them honestly

Don't start with a comprehensive SLO programme. Start with two metrics that matter to your users: probably availability and latency at a meaningful percentile (p95 or p99, not p50). Define what "good" looks like. Measure it. Make it visible. That's it for now.
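To make "measure it" concrete, here is a minimal sketch in Python that computes both metrics from a batch of request records. The `Request` shape, the function names, and the 300 ms latency target are assumptions for illustration, not something your monitoring stack necessarily provides; the percentile uses the simple nearest-rank method.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    ok: bool           # True if the request succeeded from the user's point of view
    latency_ms: float  # end-to-end latency in milliseconds


def availability(requests):
    """Fraction of requests that succeeded."""
    if not requests:
        raise ValueError("no requests to measure")
    return sum(r.ok for r in requests) / len(requests)


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no values to measure")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def slo_report(requests, availability_target=0.995, p95_target_ms=300.0):
    """Compare measured availability and p95 latency against targets (both targets are assumptions)."""
    avail = availability(requests)
    p95 = percentile([r.latency_ms for r in requests], 95)
    return {
        "availability": avail,
        "availability_ok": avail >= availability_target,
        "p95_ms": p95,
        "p95_ok": p95 <= p95_target_ms,
    }
```

Running something like this on a schedule and posting the result where the team can see it is enough to cover "make it visible" at this stage.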

The value of an SLO isn't the number. It's the discipline of having agreed, in advance, what good looks like — so that when something degrades, you have a shared language for how bad it is.

Error budgets as release gates

An error budget is the amount of unreliability your SLO permits: a 99.5% availability target leaves 0.5% of the month as budget. The practical version for small teams: if your availability SLO is 99.5% and you've used 80% of that budget this month, that's a signal to slow down feature work and focus on reliability. You don't need a formal process. You need one person who owns the number and has the authority to slow down releases when it's trending badly.
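The arithmetic behind that 80% signal is short enough to sketch. The version below assumes a time-based availability SLO measured over a 30-day window and tracked as minutes of downtime; the function names and the window length are illustrative choices, not a standard.

```python
def error_budget_minutes(slo: float, days: int = 30) -> float:
    """Minutes of downtime the SLO allows over the window (99.5% over 30 days is about 216)."""
    return (1.0 - slo) * days * 24 * 60


def budget_used(downtime_minutes: float, slo: float, days: int = 30) -> float:
    """Fraction of the error budget already spent."""
    return downtime_minutes / error_budget_minutes(slo, days)


def should_slow_releases(
    downtime_minutes: float,
    slo: float = 0.995,
    days: int = 30,
    threshold: float = 0.80,  # the 80% signal from the text; tune to taste
) -> bool:
    """True once the spent budget reaches the threshold."""
    return budget_used(downtime_minutes, slo, days) >= threshold
```

For example, 180 minutes of downtime against a 99.5% SLO is roughly 83% of a 30-day budget, so `should_slow_releases(180)` returns `True`. The function only produces the signal; acting on it is still the job of whoever owns the number.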

Blameless postmortems, lightweight version

For every incident that causes user impact: write down what happened, what the timeline was, what the contributing factors were, and what would prevent recurrence. No names in the contributing factors section — only systems, processes, and decisions. Share it with the team. This is the entire practice, stripped to its core.
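If it helps to keep the write-ups consistent, the four questions can be captured as a small record. This is one possible shape, sketched in Python; the `Postmortem` class, its field names, and the `named_people` check are hypothetical, and the check is only a rough guard for the "no names" rule, based on a team roster you supply.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Postmortem:
    what_happened: str
    timeline: List[str]              # ordered entries, e.g. "14:02 alert fired"
    contributing_factors: List[str]  # systems, processes, and decisions only
    prevention: List[str]            # what would prevent recurrence

    def named_people(self, team: List[str]) -> List[str]:
        """Return roster names that appear as whole words in the contributing factors."""
        hits = []
        for name in team:
            pattern = r"\b" + re.escape(name) + r"\b"
            if any(re.search(pattern, f, re.IGNORECASE) for f in self.contributing_factors):
                hits.append(name)
        return hits
```

An empty list from `named_people` doesn't prove the write-up is blameless, but a non-empty one is a cheap prompt to rephrase a factor in terms of the system rather than the person.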

"A blameless postmortem is not about making people feel better. It's about making the system better."

The full SRE book is worth reading. Apply it selectively, starting with the parts that address your most frequent pain points.

Written by

Gbemisola Ojo

CEO | Cloud DevOps Manager

Leads the team, still keeps a hand in production.
