Reliability Engineering

SLO

Service Level Objective

A Service Level Objective (SLO) is a target level of reliability for a service, expressed as a measurable goal (e.g., 99.9% availability).

SLO vs SLA vs SLI

SLI (Service Level Indicator): The metric you measure. Example: request latency, error rate, availability.

SLO (Service Level Objective): The target for that metric. Example: 99.9% of requests complete in under 200 ms.

SLA (Service Level Agreement): A contract that attaches consequences to missing a target. Example: if availability falls below 99.9%, the customer receives service credits.

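Relationship: SLI → SLO → SLA. The SLI is what you measure, the SLO is the target you set for it, and the SLA is the external commitment built on top, often with a looser target than the internal SLO.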
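The chain can be sketched in a few lines of code, assuming an availability SLI computed as good requests divided by total requests. This is a minimal illustration, not a standard API: the `evaluate` function, the request counts, and the 99.5% SLA threshold are hypothetical. The SLA value is deliberately set below the 99.9% SLO so that a missed SLO can act as a warning before contractual credits are owed.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    sli: float          # measured ratio of good requests to total requests
    meets_slo: bool     # internal reliability target met?
    breaches_sla: bool  # contractual threshold missed (consequences apply)?


def evaluate(good: int, total: int, slo: float, sla: float) -> Evaluation:
    """Compute an availability SLI and compare it to SLO and SLA targets.

    slo and sla are fractions, e.g. 0.999 for 99.9%.
    Hypothetical helper for illustration only.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    sli = good / total
    return Evaluation(sli=sli, meets_slo=sli >= slo, breaches_sla=sli < sla)


# Hypothetical month: 1,000,000 requests, 1,200 of them failed.
result = evaluate(good=998_800, total=1_000_000, slo=0.999, sla=0.995)
print(result)
```

Here the SLI is 99.88%: the 99.9% SLO is missed, but the hypothetical 99.5% SLA is not breached, so the team has room to react before customers are owed anything.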

Why SLOs Matter

SLOs answer the critical question: "How reliable is good enough?"

Without SLOs:
- Teams chase 100% reliability, which is impossible and expensive
- There is no framework for prioritizing reliability work against feature work
- Teams argue about what "reliable" means

Common SLO Types

Availability: Percentage of time the service is up. Example: 99.95% availability (about 22 minutes of downtime per 30-day month).
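The downtime figure follows directly from the availability target and the length of the measurement window. A minimal sketch, assuming a 30-day month; the function name `allowed_downtime_minutes` is hypothetical:

```python
def allowed_downtime_minutes(availability: float, days: float = 30.0) -> float:
    """Return the downtime (in minutes) permitted by an availability target.

    availability is a fraction, e.g. 0.9995 for 99.95%.
    days is the length of the measurement window; 30 is an assumption.
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    window_minutes = days * 24 * 60  # 43,200 minutes for a 30-day window
    return (1.0 - availability) * window_minutes


for target in (0.999, 0.9995, 0.9999):
    print(f"{target:.2%}: {allowed_downtime_minutes(target):.1f} min/month")
```

With a 30-day window, 99.95% allows 0.05% of 43,200 minutes, or 21.6 minutes, which the example rounds to 22. For comparison, 99.9% allows about 43 minutes and 99.99% about 4.3 minutes.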

Latency: Response-time percentiles. Example: 99% of requests complete in under 200 ms and 99.9% in under 1 s.
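A multi-tier latency SLO can be checked by counting the share of requests under each threshold. A minimal sketch, assuming raw per-request latencies in milliseconds are available; the function names and sample data are hypothetical:

```python
from typing import Sequence


def fraction_within(latencies_ms: Sequence[float], threshold_ms: float) -> float:
    """Fraction of requests whose latency is strictly below threshold_ms."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    fast = sum(1 for latency in latencies_ms if latency < threshold_ms)
    return fast / len(latencies_ms)


def meets_latency_slo(latencies_ms: Sequence[float],
                      targets: Sequence[tuple[float, float]]) -> bool:
    """Check every (required_fraction, threshold_ms) tier, e.g. (0.99, 200)."""
    return all(fraction_within(latencies_ms, threshold) >= required
               for required, threshold in targets)


# Tiers from the latency example: 99% under 200 ms and 99.9% under 1 s.
TIERS = [(0.99, 200.0), (0.999, 1000.0)]

# Hypothetical sample: 995 fast requests, 4 slow ones, 1 very slow one.
samples = [120.0] * 995 + [450.0] * 4 + [1500.0]
print(meets_latency_slo(samples, TIERS))
```

Counting requests below each threshold matches the wording of the target ("99% of requests under 200 ms") and avoids having to pick a percentile interpolation method.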

Error Rate: Percentage of requests that fail. Example: an error rate below 0.1%.
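An error-rate SLI needs a rule for what "fail" means. A minimal sketch, assuming HTTP status codes per request and counting only 5xx responses as failures; that rule, the `error_rate` function, and the traffic sample are hypothetical:

```python
from collections.abc import Iterable


def error_rate(status_codes: Iterable[int]) -> float:
    """Share of requests that failed, counting HTTP 5xx responses as failures.

    Treating only 5xx as failures is an assumption: 4xx responses usually
    reflect client mistakes rather than service faults, so they are excluded.
    """
    codes = list(status_codes)
    if not codes:
        raise ValueError("need at least one request")
    failures = sum(1 for code in codes if 500 <= code <= 599)
    return failures / len(codes)


ERROR_RATE_TARGET = 0.001  # the "<0.1% error rate" example

# Hypothetical traffic: 9,995 successes, 3 client errors, 2 server errors.
codes = [200] * 9_995 + [404] * 3 + [503] * 2
rate = error_rate(codes)
print(f"error rate {rate:.3%}, within target: {rate < ERROR_RATE_TARGET}")
```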

Throughput: Capacity the service delivers. Example: sustain 10,000 requests per second.

Setting Good SLOs

1. Start with user expectations.
2. Measure current performance.
3. Set achievable targets, leaving room to improve.
4. Define clearly what "counts".
5. Review and adjust quarterly.
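Step 4 becomes concrete when the definition is written down as code: which requests are in scope for the SLO, and which of those count as good. A minimal sketch; the `Request` fields, the excluded `/healthz` path, and the 200 ms threshold are hypothetical choices for illustration, not a recommendation:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    path: str
    status: int
    latency_ms: float


# Hypothetical definition of what counts toward the SLO.
EXCLUDED_PATHS = {"/healthz"}  # e.g. synthetic health checks
LATENCY_THRESHOLD_MS = 200.0


def is_eligible(req: Request) -> bool:
    """Requests outside the SLO's scope are dropped before counting."""
    return req.path not in EXCLUDED_PATHS


def is_good(req: Request) -> bool:
    """Good = not a server error and faster than the latency threshold."""
    return req.status < 500 and req.latency_ms < LATENCY_THRESHOLD_MS


def sli(requests: list[Request]) -> float:
    """Fraction of eligible requests that are good."""
    eligible = [r for r in requests if is_eligible(r)]
    if not eligible:
        raise ValueError("no eligible requests in window")
    return sum(1 for r in eligible if is_good(r)) / len(eligible)


window = [
    Request("/api/orders", 200, 85.0),
    Request("/api/orders", 200, 340.0),  # too slow: counts as bad
    Request("/api/orders", 503, 12.0),   # server error: counts as bad
    Request("/api/users", 404, 40.0),    # client error: still good here
    Request("/healthz", 200, 1.0),       # excluded from the SLO entirely
]
print(f"SLI = {sli(window):.2f}")
```

Keeping the eligibility and "good" rules as separate, named functions makes the definition easy to review and adjust at each quarterly review.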

Related Terms

SLA

SRE

Error Budget
