Incident response runbook

Triage, mitigate, and review production incidents.

The worst time to design your incident process is during an incident. This template encodes the decisions ahead of time: is the alert real, how severe is it, who gets paged, and — critically — the escalation loop that kicks in when the first mitigation attempt does not work. It ends at the post-mortem, because an incident is not over when the graphs recover; it is over when the action items exist.

The first branch matters more than it looks. Routing false alarms to "resolve and tune the monitor" turns alert noise into a feedback loop instead of a nightly annoyance, which is how on-call rotations stay humane.

When to use this template

  • On-call onboarding — new responders learn the shape of an incident in one glance: triage, severity, mitigate, escalate, post-mortem. Pair the diagram with your paging tool walkthrough.
  • Process audits after a rough incident — replay what actually happened against the diagram. Every place reality diverged from the arrows is either a process gap or a diagram update.
  • Standardizing across teams — when each team triages differently, agree on this skeleton first, then let teams attach service-specific runbooks to the "Mitigate impact" node.
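
One way to attach a service-specific runbook to the "Mitigate impact" node is Mermaid's `click` directive, which binds a link and tooltip to a node. The fragment below is a minimal sketch: the URL and the service name are placeholders, not part of the template.

```mermaid
flowchart TD
    F[Page on-call + open incident] --> G[Mitigate impact]
    G --> H{Mitigated?}
    %% Hypothetical URL and tooltip: replace with the service's own runbook location.
    click G href "https://runbooks.example.com/checkout-service" "Checkout service mitigation runbook"
```

Some renderers disable click interactions for security reasons, so check the link where responders will actually view the diagram. If it is not clickable there, putting the runbook location in the node label is a simple fallback.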

How to adapt it

Map the nodes to your tooling and vocabulary, then add the structure your organization needs:

  • Expand the severity diamond into your SEV1–SEV4 levels, each with its own paging policy.
  • Add a communications branch off the open-incident node for status page updates and stakeholder notifications.
  • Insert an incident commander handoff step before mitigation for incidents that run longer than one on-call shift.
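
As an illustration, the communications branch and the commander handoff described above could be sketched as follows. The added nodes (`L`, `M`, `N`, `P`), their labels, and the placement of the shift check are assumptions for this example, so adjust them to your own process.

```mermaid
flowchart TD
    A[Alert fires] --> B{Real incident?}
    B -->|No| C[Resolve alert + tune monitor]
    B -->|Yes| D{Severity?}
    D -->|Low| E[Create ticket for next sprint]
    D -->|High| F[Page on-call + open incident]
    %% Assumed addition: communications branch off the open-incident node.
    F --> L[Update status page]
    L --> M[Notify stakeholders]
    %% Assumed addition: shift check and commander handoff before each mitigation attempt.
    F --> N{Shift ended?}
    N -->|Yes| P[Hand off incident commander]
    N -->|No| G[Mitigate impact]
    P --> G
    G --> H{Mitigated?}
    H -->|No| I[Escalate to specialists]
    I --> N
    H -->|Yes| J[Root cause fix]
    J --> K[Post-mortem + action items]
```

Routing the escalation loop back through the shift check, rather than straight to "Mitigate impact", means the handoff question is asked again on every attempt, which is when long incidents tend to cross a shift boundary.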

Restructure it directly in the visual editor — visual edits regenerate clean Mermaid code, so the runbook your responders see stays a reviewable text file in the same repo as your alert definitions.

Mermaid code

Copy it anywhere Mermaid is supported — GitHub, Notion, or your docs.

flowchart TD
    A[Alert fires] --> B{Real incident?}
    B -->|No| C[Resolve alert + tune monitor]
    B -->|Yes| D{Severity?}
    D -->|Low| E[Create ticket for next sprint]
    D -->|High| F[Page on-call + open incident]
    F --> G[Mitigate impact]
    G --> H{Mitigated?}
    H -->|No| I[Escalate to specialists]
    I --> G
    H -->|Yes| J[Root cause fix]
    J --> K[Post-mortem + action items]

Frequently asked questions

What is an incident response runbook diagram?
It is the decision tree an on-call engineer follows when an alert fires: confirm it is real, assess severity, page and open an incident if it is high, mitigate, escalate if mitigation stalls, then fix the root cause and run a post-mortem. Having it as a diagram means a stressed responder at 3 a.m. can follow arrows instead of re-reading paragraphs.
Why separate mitigation from the root cause fix?
Because they have different goals and different clocks. Mitigation stops customer impact fast (roll back, fail over, scale up) even if you do not yet understand the bug. The root cause fix comes afterward, calmly, with the pressure off. Runbooks that conflate the two encourage engineers to debug in production while users are still affected, which tends to lengthen incidents.
How do severity levels fit into an incident flowchart?
This template uses a single Low/High split to keep the triage decision fast: low severity becomes a sprint ticket, high severity pages the on-call. If your organization uses SEV1–SEV4, replace the diamond's two branches with one per level, each routing to its own response — but resist adding levels that do not change who gets paged or how fast.
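
A sketch of the severity diamond expanded to four levels might look like the fragment below. The response attached to each level is an illustrative assumption, not a recommended policy. Each branch changes either who gets paged or how fast.

```mermaid
flowchart TD
    B{Real incident?} -->|Yes| D{Severity?}
    %% Assumed responses per level: substitute your own paging policies.
    D -->|SEV1| F1[Page on-call and commander + open incident]
    D -->|SEV2| F2[Page on-call + open incident]
    D -->|SEV3| E1[Notify on-call next business day]
    D -->|SEV4| E2[Create ticket for next sprint]
    F1 --> G[Mitigate impact]
    F2 --> G
```
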
Where should an incident runbook diagram live?
Wherever your on-call engineer looks first under pressure: the alert annotation itself, your monitoring tool's runbook link, or the top of the incident channel topic. Keep the Mermaid source in version control next to your alerting config so that changes to the process go through review, and render the diagram from that source rather than from a pasted copy so the two cannot silently diverge.
