Reliability Engineering

SLA

Service Level Agreement

A Service Level Agreement (SLA) is a contractual commitment to deliver a specific level of service, with defined consequences (usually financial) for missing targets.

SLA vs SLO

Aspect | SLO | SLA
Nature | Internal target | External contract
Consequence | Engineering focus | Financial/legal
Flexibility | Can adjust quarterly | Locked in contract
Target | Aspirational | Conservative

Best practice: SLO should be stricter than SLA

If your SLA is 99.9%, your SLO might be 99.95%. This gives you a buffer: a missed SLO warns you to act before the contractual penalties of a missed SLA apply.
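To make the buffer concrete, here is a minimal sketch that converts both targets into allowed downtime. It assumes a 30-day monthly measurement period, and the function name is illustrative rather than taken from any particular tool.

```python
# Assumption: the measurement period is a 30-day month (43,200 minutes).
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60


def allowed_downtime_minutes(target_percent: float,
                             period_minutes: float = MINUTES_PER_30_DAY_MONTH) -> float:
    """Downtime permitted in the period while still meeting the availability target."""
    return period_minutes * (100.0 - target_percent) / 100.0


sla_budget = allowed_downtime_minutes(99.9)    # contractual limit
slo_budget = allowed_downtime_minutes(99.95)   # stricter internal limit
buffer_minutes = sla_budget - slo_budget       # room left after the SLO is missed

print(f"SLA 99.9%  allows {sla_budget:.1f} min of downtime per month")
print(f"SLO 99.95% allows {slo_budget:.1f} min of downtime per month")
print(f"Buffer before an SLA breach: {buffer_minutes:.1f} min")
```

Under these assumptions the SLO is exhausted after about half the downtime the SLA allows, so an SLO miss serves as an early warning instead of a penalty.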

Common SLA Components

1. Service description - What's covered?
2. Performance metrics - How is it measured?
3. Target levels - What's the commitment?
4. Measurement period - Monthly? Quarterly?
5. Exclusions - What doesn't count?
6. Remedies - What happens if missed?
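The six components above can be captured as a small record, which is one way to keep the contract terms next to the code that checks them. This is only a sketch: the class name, field names, and the example credit tiers are hypothetical and not drawn from any real contract.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceLevelAgreement:
    """Hypothetical record with one field per SLA component."""
    service_description: str                  # 1. what is covered
    metric: str                               # 2. how it is measured
    target_percent: float                     # 3. the commitment
    measurement_period: str                   # 4. monthly, quarterly, ...
    exclusions: tuple[str, ...] = ()          # 5. what does not count
    # 6. remedies: availability threshold (%) -> service credit (% of the bill)
    remedies: dict[float, float] = field(default_factory=dict)

    def service_credit_percent(self, measured_percent: float) -> float:
        """Largest credit among the thresholds the measured availability fell below."""
        owed = [credit for threshold, credit in self.remedies.items()
                if measured_percent < threshold]
        return max(owed, default=0.0)


# Illustrative values only.
example_sla = ServiceLevelAgreement(
    service_description="Public REST API",
    metric="Share of successful requests",
    target_percent=99.9,
    measurement_period="monthly",
    exclusions=("scheduled maintenance windows", "force majeure"),
    remedies={99.9: 10.0, 99.0: 25.0},
)

print(example_sla.service_credit_percent(99.5))
```

Writing the remedies down as explicit thresholds is a design choice that leaves less room for dispute about what a given month's measurement triggers.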

SLA Best Practices

- Be specific - Vague SLAs lead to disputes
- Measure accurately - Agree on how metrics are calculated
- Set achievable targets - Don't promise what you can't deliver
- Build in buffer - Keep the internal SLO stricter than the SLA
- Define exclusions clearly - Maintenance windows, force majeure
- Review regularly - As systems change, SLAs should too
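The points about measuring accurately and defining exclusions can be illustrated with a short calculation. The sketch below assumes one possible convention: excluded time (such as an agreed maintenance window) is removed from the measurement period and is not counted as downtime. Real contracts may define this differently, and the function name and figures are illustrative.

```python
def measured_availability(period_minutes: float,
                          downtime_minutes: float,
                          excluded_minutes: float = 0.0) -> float:
    """Availability in percent over the eligible (non-excluded) part of the period.

    downtime_minutes counts only downtime outside the excluded time.
    """
    eligible_minutes = period_minutes - excluded_minutes
    if eligible_minutes <= 0:
        raise ValueError("excluded time must be shorter than the period")
    return 100.0 * (eligible_minutes - downtime_minutes) / eligible_minutes


PERIOD = 30 * 24 * 60      # assumed 30-day month, in minutes
MAINTENANCE = 120          # illustrative agreed maintenance window
UNPLANNED = 30             # illustrative unplanned downtime

with_exclusion = measured_availability(PERIOD, UNPLANNED, excluded_minutes=MAINTENANCE)
without_exclusion = measured_availability(PERIOD, UNPLANNED + MAINTENANCE)

meets_sla = with_exclusion >= 99.9     # contractual target
meets_slo = with_exclusion >= 99.95    # stricter internal target

print(f"With exclusion:    {with_exclusion:.3f}%")
print(f"Without exclusion: {without_exclusion:.3f}%")
print(f"Meets SLA: {meets_sla}, meets SLO: {meets_slo}")
```

With these figures the same month passes or fails a 99.9% SLA depending on whether the maintenance window is excluded, which is why the calculation method is worth agreeing on in advance.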

The "Nines" of Availability

Availability | Downtime/Year | Downtime/Month
99% (two 9s) | 3.65 days | 7.3 hours
99.9% (three 9s) | 8.76 hours | 43.8 minutes
99.99% (four 9s) | 52.6 minutes | 4.38 minutes
99.999% (five 9s) | 5.26 minutes | 26.3 seconds
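The figures in the table follow from simple arithmetic: downtime is the period length multiplied by (100% minus availability). The sketch below reproduces them, assuming a 365-day year and a month of one twelfth of a year (730 hours). The helper names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24               # 8,760 hours; leap years ignored
HOURS_PER_MONTH = HOURS_PER_YEAR / 12   # 730 hours, an average month


def downtime_seconds(availability_percent: float, period_hours: float) -> float:
    """Seconds of downtime allowed in the period at the given availability."""
    return period_hours * 3600 * (100.0 - availability_percent) / 100.0


def humanize(seconds: float) -> str:
    """Format a duration in its largest fitting unit, to three significant figures."""
    for unit, size in (("days", 86400), ("hours", 3600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.3g} {unit}"
    return f"{seconds:.3g} seconds"


for availability in (99.0, 99.9, 99.99, 99.999):
    per_year = humanize(downtime_seconds(availability, HOURS_PER_YEAR))
    per_month = humanize(downtime_seconds(availability, HOURS_PER_MONTH))
    print(f"{availability}% | {per_year} | {per_month}")
```

Each additional nine divides the allowed downtime by ten, which is why the step from four to five nines leaves only seconds per month.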
