What is an Incident?

Understanding incidents, how they differ from alerts and outages, and when to declare one.

8 min read · Last updated: January 2026

Incident Definition

An incident is any unplanned event that disrupts or degrades a service, requiring a coordinated response to restore normal operations. Unlike routine issues that can be handled through standard processes, incidents demand immediate attention and often involve multiple people or teams.

Key Characteristics of an Incident

  • Impact: Affects users, business operations, or system health
  • Urgency: Requires immediate or near-immediate attention
  • Coordination: Often needs multiple people to resolve
  • Visibility: Should be tracked and documented

Alerts vs Incidents vs Outages

These terms are often confused, but understanding the distinction is crucial for effective incident management.

Alert

An automated notification triggered by a monitoring system when a metric crosses a threshold.

Example: "CPU usage exceeded 80% on server-prod-01"

Not all alerts become incidents. Many alerts are noise or self-resolving.
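As a minimal sketch of how a threshold alert like the example above might be produced, the Python below checks one metric sample against a rule. The `ThresholdRule` class and `evaluate` function are hypothetical names chosen for illustration, not part of any particular monitoring tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ThresholdRule:
    """Hypothetical alert rule: fire when a percentage metric exceeds a threshold."""
    metric: str
    threshold: float


def evaluate(rule: ThresholdRule, host: str, value: float) -> Optional[str]:
    """Return an alert message if the sample crosses the threshold, else None."""
    if value > rule.threshold:
        return f"{rule.metric} exceeded {rule.threshold:g}% on {host}"
    return None


cpu_rule = ThresholdRule(metric="CPU usage", threshold=80.0)
print(evaluate(cpu_rule, "server-prod-01", 91.5))  # alert message
print(evaluate(cpu_rule, "server-prod-01", 42.0))  # None: no alert
```

Note that the rule only reports that a threshold was crossed. Deciding whether that alert is noise or the start of an incident is a separate step.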

Incident

A declared event requiring coordinated response. May be triggered by alerts, user reports, or proactive detection.

Example: "Checkout flow failing for 15% of users"

Incidents are explicitly declared and tracked with a lifecycle.

Outage

Complete unavailability of a service or critical function. All outages are incidents, but not all incidents are outages.

Example: "Website completely unreachable"

Outages are typically classified at the highest severity levels, such as SEV0 or SEV1.

When to Declare an Incident

Knowing when to declare an incident is one of the most important skills in incident management. Declare too early and you create unnecessary overhead. Declare too late and you waste precious response time.

Declare an incident when:

  • Users are impacted — Even a small percentage of users experiencing issues warrants attention
  • Revenue is at risk — Anything affecting transactions, sign-ups, or billing
  • Data integrity is threatened — Data loss, corruption, or security concerns
  • SLAs are breached or at risk — You have exhausted your error budget or are burning through it quickly
  • Multiple alerts are firing — Correlated alerts suggest a systemic issue
  • You're unsure — When in doubt, declare. It's easier to downgrade than to catch up
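The criteria above can be written down as a simple checklist in code. The following Python is a sketch under stated assumptions: the `Signals` fields, the function names, and the cutoff of two firing alerts are all hypothetical, since the list says "multiple" without giving a number.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Signals:
    """Hypothetical snapshot of what a responder knows right now."""
    users_impacted: bool = False
    revenue_at_risk: bool = False
    data_integrity_threatened: bool = False
    sla_at_risk: bool = False
    firing_alerts: int = 0
    unsure: bool = False


def reasons_to_declare(s: Signals, multiple_alerts: int = 2) -> List[str]:
    """Return every criterion that is met; an empty list means none are."""
    checks = [
        (s.users_impacted, "users are impacted"),
        (s.revenue_at_risk, "revenue is at risk"),
        (s.data_integrity_threatened, "data integrity is threatened"),
        (s.sla_at_risk, "SLAs are breached or at risk"),
        # Assumed cutoff. A raw count does not prove the alerts are
        # correlated; a real check would also compare services and timing.
        (s.firing_alerts >= multiple_alerts, "multiple alerts are firing"),
        (s.unsure, "responder is unsure"),
    ]
    return [reason for met, reason in checks if met]


def should_declare(s: Signals) -> bool:
    """Declare if any single criterion is met, matching the low bar above."""
    return bool(reasons_to_declare(s))


print(reasons_to_declare(Signals(firing_alerts=3, unsure=True)))
```

Returning the list of reasons rather than a bare boolean keeps a record of why the incident was declared, which is useful context when it is later downgraded or reviewed.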

Pro Tip: Lower the Bar for Declaration

Many teams set too high a bar for declaring incidents. This leads to "shadow incidents" that are never tracked or reviewed, so nothing is learned from them. Make declaring an incident easy and low-friction. You can always close it quickly if it turns out to be a non-issue.

Types of Incidents

Incidents come in various forms. Understanding the types helps with response planning and post-incident analysis.

By Impact

  • Customer-facing: Users directly experience the issue
  • Internal: Internal tools or processes are affected
  • Infrastructure: Underlying systems are degraded
  • Security: Security vulnerabilities or breaches

By Cause

  • Deployment-related: Issues caused by code changes
  • Infrastructure: Hardware, cloud, or network problems
  • Dependency: Third-party services failing
  • Capacity: Systems overwhelmed by load
  • Configuration: Misconfigurations or feature flag changes
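One way to make these two classifications usable in post-incident analysis is to tag each incident record with both and then count by cause. The Python below is an illustrative sketch: the `IncidentRecord` type, the `count_by_cause` helper, and the sample records are invented for this example.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Impact(Enum):
    CUSTOMER_FACING = "customer-facing"
    INTERNAL = "internal"
    INFRASTRUCTURE = "infrastructure"
    SECURITY = "security"


class Cause(Enum):
    DEPLOYMENT = "deployment-related"
    INFRASTRUCTURE = "infrastructure"
    DEPENDENCY = "dependency"
    CAPACITY = "capacity"
    CONFIGURATION = "configuration"


@dataclass(frozen=True)
class IncidentRecord:
    """Hypothetical record carrying one tag from each classification."""
    title: str
    impact: Impact
    cause: Cause


def count_by_cause(records: Iterable[IncidentRecord]) -> Counter:
    """Tally incidents per cause so recurring sources stand out."""
    return Counter(r.cause for r in records)


# Made-up sample data for illustration only.
records = [
    IncidentRecord("Checkout flow failing", Impact.CUSTOMER_FACING, Cause.DEPLOYMENT),
    IncidentRecord("Admin dashboard errors", Impact.INTERNAL, Cause.DEPLOYMENT),
    IncidentRecord("Payment provider timeouts", Impact.CUSTOMER_FACING, Cause.DEPENDENCY),
]

for cause, total in count_by_cause(records).most_common():
    print(f"{cause.value}: {total}")
```

Keeping impact and cause as separate fields avoids forcing a single label on an incident that is, for example, both customer-facing and deployment-related.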

What Happens After Declaration

Once declared, an incident enters a formal lifecycle:

  1. Acknowledgment: Someone takes ownership
  2. Assessment: Determine severity and impact
  3. Communication: Notify stakeholders
  4. Investigation: Find the root cause
  5. Mitigation: Stop the bleeding
  6. Resolution: Fix the underlying issue
  7. Post-mortem: Learn and improve
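The seven stages above can be modeled as an ordered enumeration. This Python sketch assumes a strictly linear progression for simplicity; in practice stages often overlap or repeat, and the `Stage` and `next_stage` names are hypothetical.

```python
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    """Lifecycle stages, numbered in the order listed above."""
    ACKNOWLEDGMENT = 1
    ASSESSMENT = 2
    COMMUNICATION = 3
    INVESTIGATION = 4
    MITIGATION = 5
    RESOLUTION = 6
    POST_MORTEM = 7


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage that follows `current`, or None after the post-mortem."""
    if current is Stage.POST_MORTEM:
        return None
    return Stage(current + 1)


# Walk the lifecycle from start to finish.
stage: Optional[Stage] = Stage.ACKNOWLEDGMENT
while stage is not None:
    print(f"{stage.value}. {stage.name.replace('_', '-').capitalize()}")
    stage = next_stage(stage)
```

An incident tracker built on this idea would typically store the current stage together with a timestamp for each transition, so that time spent in each stage can be reviewed afterwards.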

For more details, see our guide on the incident lifecycle.

Best Practices

  • Document your incident definition — Everyone should know what qualifies as an incident
  • Make declaration easy — One command or button to start an incident
  • Use severity levels — Not all incidents need the same response (learn about severity levels)
  • Track everything — Every incident should be documented, even minor ones
  • Review regularly — Analyze incident trends to find systemic issues
