On-Call Fundamentals

Building sustainable, fair, and effective on-call rotations for your team.

15 min read | Last updated: January 2026

What is On-Call?

On-call is a system where team members take turns being the first responder to production incidents outside of normal working hours. The on-call engineer is responsible for acknowledging alerts, triaging issues, and either resolving them or escalating appropriately.

Done well, on-call provides reliable coverage while distributing the burden fairly. Done poorly, it leads to burnout, attrition, and degraded incident response.

Designing On-Call Rotations

Key Principles

  • Fairness: Everyone participates (unless there's a good reason not to)
  • Predictability: People know when they're on-call well in advance
  • Flexibility: Easy to swap shifts or cover for teammates
  • Sustainability: On-call burden doesn't exceed reasonable limits

Weekly Rotation

One person is on-call for an entire week, then hands off to the next person.

Pros: Simple, clear ownership, fewer handoffs

Cons: Entire week can be exhausting, single point of failure

Best for: Smaller teams (4-8 people), lower alert volume
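
As a rough illustration of a weekly rotation, the sketch below assigns each week to the next person in a fixed roster. The function name `weekly_rotation`, the roster names, and the start date are all hypothetical. It ignores swaps, vacations, and holidays, which a real schedule would need to handle.

```python
from datetime import date, timedelta

def weekly_rotation(engineers, start, weeks):
    """Return (week_start, engineer) pairs, cycling through the roster in order."""
    if not engineers:
        raise ValueError("roster must not be empty")
    return [
        (start + timedelta(weeks=i), engineers[i % len(engineers)])
        for i in range(weeks)
    ]

# Hypothetical four-person roster, starting on a Monday.
roster = ["alice", "bob", "carol", "dave"]
schedule = weekly_rotation(roster, date(2026, 1, 5), 6)
for week_start, engineer in schedule:
    print(week_start.isoformat(), engineer)
```

With four people, each engineer is on-call one week in four, and the fifth week returns to the first person in the roster.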

Follow-the-Sun

Responsibility follows daylight hours across time zones. No one works overnight.

Pros: No overnight pages, better work-life balance

Cons: Requires global team, more handoffs

Best for: Distributed teams across time zones
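
One way to picture follow-the-sun is as a lookup from the current UTC hour to the region that owns the pager. The sketch below assumes three hypothetical regions (APAC, EMEA, AMER) that each cover eight hours of the UTC day. The region names and handoff hours are illustrative assumptions, not a recommendation.

```python
from datetime import datetime, timezone

# Hypothetical handoff table: (first UTC hour covered, region).
# Must be sorted by hour and start at 0 so every hour has an owner.
HANDOFFS_UTC = [
    (0, "APAC"),   # 00:00-07:59 UTC
    (8, "EMEA"),   # 08:00-15:59 UTC
    (16, "AMER"),  # 16:00-23:59 UTC
]

def region_on_call(now):
    """Return the region whose window contains `now` (a timezone-aware datetime)."""
    hour = now.astimezone(timezone.utc).hour
    current = HANDOFFS_UTC[0][1]
    for start_hour, region in HANDOFFS_UTC:
        if hour >= start_hour:
            current = region
    return current

print(region_on_call(datetime(2026, 1, 5, 9, 0, tzinfo=timezone.utc)))
```

Each boundary in the table is a handoff, which is where the extra coordination cost of this model shows up.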

Primary/Secondary

Two people are on-call at once: the primary receives all alerts, and the secondary is paged as backup if the primary doesn't respond.

Pros: Redundancy, shared load during major incidents

Cons: More people tied up, secondary might not engage

Best for: High-stakes systems, SEV0-prone environments

Shift-Based

Fixed shifts (e.g., 8am-4pm, 4pm-12am, 12am-8am) with different people covering each.

Pros: Predictable hours, better for high-volume environments

Cons: Requires more people, shift handoffs

Best for: 24/7 operations, larger teams

Escalation Policies

An escalation policy defines what happens when the primary on-call doesn't respond. Good escalation policies ensure incidents get handled even when things go wrong.

Example Escalation Policy

  1. Primary On-Call: Alert sent immediately. 5-minute acknowledgment window.
  2. Secondary On-Call: If no ack after 5 min, alert secondary. Another 5-minute window.
  3. Team Lead / Manager: If still no ack after 10 min total, escalate to leadership.
  4. All-Hands Broadcast: For SEV0, page entire team if no response within 15 min.
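
The example policy above can be expressed as data plus a small lookup. This is a minimal sketch: the `EscalationStep` class, the target names, and the `targets_to_page` function are hypothetical and not tied to any particular paging tool. The timings are the ones from the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EscalationStep:
    target: str
    after_minutes: int       # minutes since the alert fired with no acknowledgment
    sev0_only: bool = False  # step applies only to SEV0 incidents

POLICY = [
    EscalationStep("primary", 0),
    EscalationStep("secondary", 5),
    EscalationStep("team-lead", 10),
    EscalationStep("all-hands", 15, sev0_only=True),
]

def targets_to_page(minutes_unacked, severity):
    """Return every target that should have been paged by this point."""
    return [
        step.target
        for step in POLICY
        if minutes_unacked >= step.after_minutes
        and (not step.sev0_only or severity == "SEV0")
    ]

print(targets_to_page(7, "SEV2"))
print(targets_to_page(20, "SEV0"))
```

Keeping the policy as a plain list makes it easy to review the timings and to test the escalation logic without sending real pages.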

On-Call Compensation

On-call is real work that happens outside of normal hours. Fair compensation acknowledges this.

Common Compensation Models

  • Flat stipend: Fixed amount per on-call shift (e.g., $500/week)
  • Per-page payment: Additional compensation per incident handled
  • Time-off in lieu: Comp time for after-hours work
  • Higher base salary: On-call expectation built into compensation
  • Combination: Stipend + per-incident bonus
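
To make the combination model concrete, here is a small worked calculation. The $500 weekly stipend comes from the example above. The $50 per-incident bonus and the function name `on_call_pay` are assumptions chosen purely for illustration.

```python
from decimal import Decimal

def on_call_pay(weeks, incidents,
                weekly_stipend=Decimal("500"),  # from the flat-stipend example
                per_incident=Decimal("50")):    # hypothetical bonus amount
    """Stipend for each week on-call plus a bonus for each incident handled."""
    return weeks * weekly_stipend + incidents * per_incident

# One week on-call with three incidents handled.
print(on_call_pay(weeks=1, incidents=3))
```

Under these assumed rates, one week with three incidents pays 500 + 3 × 50 = 650.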

Legal Considerations

On-call compensation requirements vary by jurisdiction. Some regions require pay for "on-call time" even if no incidents occur. Consult with HR/legal for compliance.

Preventing On-Call Burnout

On-call burnout is a real risk that leads to attrition, mistakes, and degraded incident response. Prevention requires intentional effort.

Warning Signs

  • Dreading on-call shifts weeks in advance
  • Sleep disruption affecting work quality
  • Decreased incident response quality over time
  • Team members frequently swapping away from on-call
  • High alert volume with many false positives

Prevention Strategies

Reduce Alert Volume

  • Fix noisy alerts: every alert should be actionable
  • Tune thresholds to reduce false positives
  • Group related alerts to reduce noise
  • Track and eliminate toil

Share the Load

  • Ensure the rotation includes enough people
  • Expand the on-call pool to all appropriate engineers
  • Consider hiring for coverage if needed
  • Use follow-the-sun coverage if possible

Fair Compensation

  • Pay fairly for on-call time
  • Compensate extra for shifts with heavy page volume
  • Provide comp time after rough shifts
  • Recognize on-call work in performance reviews

Improve Systems

  • Invest in reliability to reduce incidents
  • Write better runbooks for faster resolution
  • Automate common remediations
  • Prioritize fixes for repeat incidents
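
Grouping related alerts is one of the more mechanical strategies above, so a short sketch may help. The version below collapses alerts that share a service and check name and fire within a fixed window of the group's first alert. The alert fields (`service`, `check`, `ts` in seconds) and the 300-second window are assumptions for illustration; real alerting tools use their own grouping rules.

```python
def group_alerts(alerts, window_seconds=300):
    """Collapse alerts with the same (service, check) that fire within
    window_seconds of the first alert in their group."""
    groups = []
    open_group = {}  # (service, check) -> index of the latest group for that key
    for alert in sorted(alerts, key=lambda a: a["ts"]):
        key = (alert["service"], alert["check"])
        idx = open_group.get(key)
        if idx is not None and alert["ts"] - groups[idx]["first_ts"] <= window_seconds:
            groups[idx]["count"] += 1
        else:
            open_group[key] = len(groups)
            groups.append({"key": key, "first_ts": alert["ts"], "count": 1})
    return groups

# Hypothetical alert stream: four raw alerts.
alerts = [
    {"service": "api", "check": "latency", "ts": 0},
    {"service": "api", "check": "latency", "ts": 60},
    {"service": "db", "check": "disk", "ts": 90},
    {"service": "api", "check": "latency", "ts": 400},
]
for group in group_alerts(alerts):
    print(group["key"], group["count"])
```

Here four raw alerts become three notifications: the two early latency alerts merge, while the one at 400 seconds falls outside the window and starts a new group.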

Best Practices Summary

  • Make on-call voluntary where possible: Forced on-call breeds resentment
  • Set clear expectations: Response times, escalation, documentation
  • Provide good tooling: Fast alerting, clear runbooks, easy escalation
  • Hold post-mortems: Learn from incidents to reduce future burden
  • Measure and track: Pages per shift, MTTR, repeat incidents
  • Listen to feedback: On-call engineers know what's broken
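
The three metrics named above (pages per shift, MTTR, repeat incidents) can be computed from a simple incident log. The sketch below uses hypothetical records and field names (`shift`, `cause`, `opened`, `resolved`), and treats MTTR as mean time to resolve, in minutes.

```python
from collections import Counter
from datetime import datetime
from statistics import mean

# Hypothetical incident records; field names are assumptions for this sketch.
INCIDENTS = [
    {"shift": "2026-W02", "cause": "disk-full",
     "opened": datetime(2026, 1, 6, 2, 0), "resolved": datetime(2026, 1, 6, 2, 30)},
    {"shift": "2026-W02", "cause": "cert-expiry",
     "opened": datetime(2026, 1, 8, 14, 0), "resolved": datetime(2026, 1, 8, 15, 0)},
    {"shift": "2026-W03", "cause": "disk-full",
     "opened": datetime(2026, 1, 13, 3, 0), "resolved": datetime(2026, 1, 13, 4, 30)},
]

def pages_per_shift(incidents):
    """Count incidents per on-call shift."""
    return dict(Counter(i["shift"] for i in incidents))

def mttr_minutes(incidents):
    """Mean time from open to resolve, in minutes."""
    return mean([(i["resolved"] - i["opened"]).total_seconds() / 60 for i in incidents])

def repeat_causes(incidents):
    """Causes that appear more than once, sorted alphabetically."""
    counts = Counter(i["cause"] for i in incidents)
    return sorted(cause for cause, n in counts.items() if n > 1)

print(pages_per_shift(INCIDENTS))
print(mttr_minutes(INCIDENTS))
print(repeat_causes(INCIDENTS))
```

A cause that shows up in `repeat_causes` is a natural candidate for a prioritized fix, since resolving it removes pages from every future shift.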

Next Steps

Reduce On-Call Burden with OpsBrief

OpsBrief reduces alert fatigue by filtering noise and surfacing what matters. AI-powered daily briefs mean on-call engineers start their shift informed, not overwhelmed.