On-Call Fundamentals

Building sustainable, fair, and effective on-call rotations for your team.

15 min read • Last updated: January 2026

What is On-Call?

On-call is a system where team members take turns being the first responder to production incidents outside of normal working hours. The on-call engineer is responsible for acknowledging alerts, triaging issues, and either resolving them or escalating appropriately.

Done well, on-call provides reliable coverage while distributing the burden fairly. Done poorly, it leads to burnout, attrition, and degraded incident response.

Designing On-Call Rotations

Key Principles

Fairness: Everyone participates (unless there's a good reason not to)
Predictability: People know when they're on-call well in advance
Flexibility: Easy to swap shifts or cover for teammates
Sustainability: On-call burden doesn't exceed reasonable limits

Weekly Rotation

One person is on-call for an entire week, then hands off to the next person.

Pros: Simple, clear ownership, fewer handoffs

Cons: A full week can be exhausting, and one person is a single point of failure

Best for: Smaller teams (4-8 people), lower alert volume
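A weekly rotation is simple enough to generate with a few lines of code. The sketch below is a minimal illustration, not a scheduling tool: the roster names, the start date, and the `weekly_rotation` helper are all hypothetical.

```python
from datetime import date, timedelta

def weekly_rotation(engineers, start, weeks):
    """Return (week_start, engineer) pairs, cycling through the roster in order."""
    schedule = []
    for i in range(weeks):
        week_start = start + timedelta(weeks=i)
        # The modulo wraps back to the first engineer after the last one.
        schedule.append((week_start, engineers[i % len(engineers)]))
    return schedule

roster = ["ana", "ben", "chi", "dev"]  # hypothetical roster
schedule = weekly_rotation(roster, date(2026, 1, 5), 6)
for week_start, engineer in schedule:
    print(week_start.isoformat(), engineer)
```

With four people, each engineer is on-call one week in four. Publishing several cycles ahead supports the predictability principle above.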

Follow-the-Sun

Responsibility follows daylight hours across time zones. No one works overnight.

Pros: No overnight pages, better work-life balance

Cons: Requires global team, more handoffs

Best for: Distributed teams across time zones
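As a rough sketch of how follow-the-sun routing might work, the example below maps the current UTC hour to the region whose window covers it. The three regions and their 8-hour windows are assumptions chosen for illustration; a real team would pick windows that match its own offices' working hours.

```python
from datetime import datetime, timezone

# Hypothetical regions, each covering UTC hours in the range [start, end).
REGIONS = [
    ("APAC", 0, 8),
    ("EMEA", 8, 16),
    ("AMER", 16, 24),
]

def region_on_call(now):
    """Return the region whose coverage window contains the given aware datetime."""
    hour = now.astimezone(timezone.utc).hour
    for name, start, end in REGIONS:
        if start <= hour < end:
            return name
    raise ValueError(f"no region covers UTC hour {hour}")

print(region_on_call(datetime(2026, 1, 5, 3, 0, tzinfo=timezone.utc)))
```

The `ValueError` guards against a coverage gap if the windows are ever edited so that they no longer span all 24 hours.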

Primary/Secondary

Two people are on-call: the primary receives all alerts, and the secondary is the backup if the primary doesn't respond.

Pros: Redundancy, shared load during major incidents

Cons: More people tied up, and the secondary may not stay engaged

Best for: High-stakes systems, SEV0-prone environments

Shift-Based

Fixed shifts (e.g., 8am-4pm, 4pm-midnight, midnight-8am) with different people covering each.

Pros: Predictable hours, better for high-volume environments

Cons: Requires more people, shift handoffs

Best for: 24/7 operations, larger teams

Escalation Policies

An escalation policy defines what happens when the primary on-call doesn't respond. Good escalation policies ensure incidents get handled even when things go wrong.

Example Escalation Policy

Primary On-Call: Alert sent immediately, with a 5-minute acknowledgment window.
Secondary On-Call: If there is no ack after 5 minutes, alert the secondary, who gets another 5-minute window.
Team Lead / Manager: If there is still no ack after 10 minutes total, escalate to leadership.
All-Hands Broadcast: For SEV0, page the entire team if there is no response within 15 minutes.
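The example policy can be expressed as a small function that, given how long an alert has gone unacknowledged, returns everyone who should have been paged so far. This is a sketch under the timings stated above; the `escalation_targets` name and the target labels are hypothetical and not tied to any paging tool.

```python
def escalation_targets(minutes_unacked, severity):
    """Return who should have been paged after the given unacknowledged time."""
    targets = ["primary"]  # paged immediately
    if minutes_unacked >= 5:
        targets.append("secondary")  # primary's 5-minute window expired
    if minutes_unacked >= 10:
        targets.append("team_lead")  # secondary's window expired too
    if minutes_unacked >= 15 and severity == "SEV0":
        targets.append("all_hands")  # broadcast applies to SEV0 only
    return targets

print(escalation_targets(0, "SEV2"))
print(escalation_targets(12, "SEV1"))
print(escalation_targets(15, "SEV0"))
```

Keeping the thresholds in one place makes it easy to check that each tier gets a full acknowledgment window before the next one is paged.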

On-Call Compensation

On-call is real work that happens outside of normal hours. Fair compensation acknowledges this.

Common Compensation Models

Flat stipend: Fixed amount per on-call shift (e.g., $500/week)
Per-page payment: Additional compensation per incident handled
Time-off in lieu: Comp time for after-hours work
Higher base salary: On-call expectation built into compensation
Combination: Stipend + per-incident bonus
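To make the combination model concrete, the sketch below adds a per-page bonus to the flat weekly stipend. The $500 stipend comes from the example above; the $50 per-page bonus and the `weekly_on_call_pay` helper are assumptions for illustration only.

```python
def weekly_on_call_pay(stipend, pages_handled, per_page_bonus):
    """Combination model: flat stipend plus a bonus for each page handled."""
    if pages_handled < 0:
        raise ValueError("pages_handled cannot be negative")
    return stipend + pages_handled * per_page_bonus

quiet_week = weekly_on_call_pay(stipend=500, pages_handled=0, per_page_bonus=50)
rough_week = weekly_on_call_pay(stipend=500, pages_handled=6, per_page_bonus=50)
print(quiet_week, rough_week)  # 500 800
```

The stipend pays for availability even in a quiet week, while the bonus scales with how disruptive the shift actually was.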

Legal Considerations

On-call compensation requirements vary by jurisdiction. Some regions require pay for "on-call time" even if no incidents occur. Consult with HR/legal for compliance.

Preventing On-Call Burnout

On-call burnout is a real risk that leads to attrition, mistakes, and degraded incident response. Prevention requires intentional effort.

Warning Signs

Dreading on-call shifts weeks in advance
Sleep disruption affecting work quality
Decreased incident response quality over time
Team members frequently swapping away from on-call
High alert volume with many false positives

Prevention Strategies

Reduce Alert Volume

• Fix noisy alerts—every alert should be actionable
• Tune thresholds to reduce false positives
• Group related alerts to reduce noise
• Track and eliminate toil
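One way to find noisy alerts is to group pages by source and measure how often each group turned out not to be actionable. The sketch below assumes a hypothetical alert record with `service`, `name`, and `actionable` fields; the 50% threshold is likewise an arbitrary starting point.

```python
from collections import defaultdict

def group_alerts(alerts):
    """Group alert records by (service, alert name)."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[(alert["service"], alert["name"])].append(alert)
    return dict(groups)

def noisy_alerts(groups, threshold=0.5):
    """Return keys whose share of non-actionable alerts exceeds the threshold."""
    noisy = []
    for key, items in groups.items():
        false_positives = sum(1 for a in items if not a["actionable"])
        if false_positives / len(items) > threshold:
            noisy.append(key)
    return noisy

# Hypothetical sample data.
alerts = [
    {"service": "api", "name": "high_latency", "actionable": True},
    {"service": "api", "name": "high_latency", "actionable": True},
    {"service": "db", "name": "disk_usage", "actionable": False},
    {"service": "db", "name": "disk_usage", "actionable": False},
    {"service": "db", "name": "disk_usage", "actionable": True},
]
groups = group_alerts(alerts)
print(noisy_alerts(groups))
```

Here two of the three `disk_usage` alerts were false positives, so that alert is flagged as a candidate for threshold tuning.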

Share the Load

• Ensure rotation includes enough people
• Expand on-call pool to all appropriate engineers
• Consider hiring for coverage if needed
• Use follow-the-sun coverage if possible

Fair Compensation

• Pay fairly for on-call time
• Compensate extra for shifts with heavy page volume
• Provide comp time after rough shifts
• Recognize on-call in performance reviews

Improve Systems

• Invest in reliability to reduce incidents
• Write better runbooks for faster resolution
• Automate common remediations
• Prioritize fixes for repeat incidents

Best Practices Summary

Make on-call voluntary where possible — Forced on-call breeds resentment
Set clear expectations — Response times, escalation, documentation
Provide good tooling — Fast alerting, clear runbooks, easy escalation
Hold post-mortems — Learn from incidents to reduce future burden
Measure and track — Pages per shift, MTTR, repeat incidents
Listen to feedback — On-call engineers know what's broken
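The three metrics named above (pages per shift, MTTR, repeat incidents) can be computed from a simple incident log. The sketch below uses hypothetical records with made-up `shift`, `cause`, `opened`, and `resolved` fields; real data would come from your incident tracker.

```python
from collections import Counter
from datetime import datetime

# Hypothetical incident log.
incidents = [
    {"shift": "2026-W02", "cause": "disk_full",
     "opened": datetime(2026, 1, 6, 2, 0), "resolved": datetime(2026, 1, 6, 2, 30)},
    {"shift": "2026-W02", "cause": "bad_deploy",
     "opened": datetime(2026, 1, 8, 14, 0), "resolved": datetime(2026, 1, 8, 15, 0)},
    {"shift": "2026-W03", "cause": "disk_full",
     "opened": datetime(2026, 1, 13, 3, 0), "resolved": datetime(2026, 1, 13, 3, 30)},
]

# Pages per shift: how many incidents landed on each on-call shift.
pages_per_shift = Counter(i["shift"] for i in incidents)

# MTTR: mean time from open to resolution, in minutes.
total_seconds = sum((i["resolved"] - i["opened"]).total_seconds() for i in incidents)
mttr_minutes = total_seconds / len(incidents) / 60

# Repeat incidents: causes that show up more than once.
cause_counts = Counter(i["cause"] for i in incidents)
repeat_causes = [cause for cause, n in cause_counts.items() if n > 1]

print(dict(pages_per_shift), mttr_minutes, repeat_causes)
```

Tracking these per shift over time shows whether reliability work is actually reducing the on-call burden.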

Next Steps

Post-Mortems Guide

Learn how to run effective post-mortems that reduce future on-call burden.

Solving On-Call Burnout

Deep dive into preventing and addressing on-call burnout.

Reduce On-Call Burden with OpsBrief

OpsBrief reduces alert fatigue by filtering noise and surfacing what matters. AI-powered daily briefs mean on-call engineers start their shift informed, not overwhelmed.