Circuit breaker pattern

Fail fast and recover gracefully when downstream services fail.

The circuit breaker is a design pattern that prevents cascading failures in microservices. When a downstream service (such as a payment database) starts failing, the breaker detects the repeated failures, stops sending requests to it, and returns errors immediately instead. This gives the failing service time to recover and keeps your service from tying up threads and connections on requests that are doomed to fail.

The pattern has three states: CLOSED (normal operation, requests flow through), OPEN (too many failures, requests are rejected immediately), and HALF_OPEN (probing for recovery). Once the failure count crosses a threshold, the breaker opens. After a timeout, it moves to HALF_OPEN and lets a test request through: if the request succeeds, the breaker closes and normal operation resumes; if it fails, the breaker reopens and the timeout starts again.
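The three states can be sketched as a small Python class. This is a minimal illustration, not a production implementation: the names `CircuitBreaker`, `CircuitOpenError`, `failure_threshold`, and `reset_timeout` are hypothetical, it counts consecutive failures, it opens when the count reaches the threshold, and it is not thread-safe.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "CLOSED"
    OPEN = "OPEN"
    HALF_OPEN = "HALF_OPEN"


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay OPEN before probing
        self._clock = clock                         # injectable so tests need not sleep
        self.state = State.CLOSED
        self._failures = 0
        self._opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state is State.OPEN:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open")  # fast-fail
            self.state = State.HALF_OPEN  # timeout elapsed: let a probe through

        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        # Any success, including a HALF_OPEN probe, closes the breaker.
        self._failures = 0
        self.state = State.CLOSED

    def _on_failure(self):
        self._failures += 1
        # A failed probe reopens at once; otherwise open at the threshold.
        if self.state is State.HALF_OPEN or self._failures >= self.failure_threshold:
            self.state = State.OPEN
            self._opened_at = self._clock()
```

Wrapping a call then looks like `breaker.call(execute_transaction, payment)`, where `execute_transaction` stands in for whatever function talks to the dependency. Passing a fake `clock` makes the timeout behaviour testable without waiting.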

When to use this template

  • Designing resilience — document how your service handles database outages, API failures, and cascading problems before they happen.
  • Explaining service dependencies to engineers — this diagram makes clear that every external call can fail and shows how your service responds.
  • Post-incident reviews — annotate the state transitions with what actually happened during an outage (how many requests queued up, when the service recovered).

How to adapt it

Rename "Payment DB" to the dependency you are protecting (Stripe API, Elasticsearch cluster, another microservice) and adjust the threshold and timeout to match your system. You can also extend the diagram:

  • Add a fallback path: after the circuit opens, show a degraded-mode response (e.g. process the payment asynchronously instead of synchronously) so your service still works with reduced features.
  • Layer in logging and alerts: annotate the OPEN transition with a monitoring alert so on-call engineers know the circuit tripped and can investigate the root cause.
  • Show a cache fallback: while the circuit is open, the service might read from a cache instead of the database, trading freshness for availability.
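In code, the fallback and alerting ideas above might look like the following sketch. Everything here is an assumption for illustration: `get_balance`, `query_db`, and `CircuitOpenError` are hypothetical names, `query_db` is assumed to be wrapped by a breaker that raises `CircuitOpenError` while OPEN, and the log line stands in for an alert (a real setup would more likely alert once, on the transition to OPEN, than on every rejected call).

```python
import logging

logger = logging.getLogger("circuit")


class CircuitOpenError(Exception):
    """Hypothetical error a breaker raises while it is OPEN."""


def get_balance(account_id, query_db, cache):
    """Return (value, source), falling back to the cache when the breaker is open."""
    try:
        value = query_db(account_id)  # assumed to be guarded by a circuit breaker
    except CircuitOpenError:
        # Stand-in for an alert; production code might page on-call instead.
        logger.warning("circuit OPEN for payment DB; trying cached data")
        if account_id in cache:
            return cache[account_id], "cache"  # possibly stale, but available
        raise  # nothing cached: let the caller return a 503
    cache[account_id] = value  # refresh the cache on every successful read
    return value, "db"
```

Returning the source alongside the value lets the caller mark a response as degraded, which is the behaviour the fallback branch of the diagram would show.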

Visual edits regenerate clean code, so you can adjust failure thresholds and response behavior without rewriting the pattern.

Mermaid code

Copy it anywhere Mermaid is supported — GitHub, Notion, or your docs.

sequenceDiagram
    participant Client
    participant Service as Payment Service
    participant Circuit as Circuit Breaker
    participant DB as Payment DB

    Client->>Service: Process payment
    Service->>Circuit: Check state
    Circuit-->>Service: CLOSED (pass through)
    Service->>DB: Execute transaction
    DB-->>Service: Success
    Service-->>Client: 200 OK

    Note over Circuit: Requests succeed, breaker stays CLOSED

    Client->>Service: Process payment
    Service->>Circuit: Check state
    Circuit-->>Service: CLOSED (pass through)
    Service->>DB: Execute transaction
    DB-->>Service: Timeout!
    Service-->>Circuit: Failure recorded
    Circuit-->>Service: Still CLOSED
    Service-->>Client: 500 Error

    Note over Circuit: Failures accumulate...

    Circuit->>Circuit: Failure count > threshold
    Circuit-->>Service: OPEN (fast-fail)
    Client->>Service: Process payment
    Service->>Circuit: Check state
    Circuit-->>Service: OPEN (reject immediately)
    Service-->>Client: 503 Service Unavailable

    Note over Circuit: Circuit is OPEN, DB gets breathing room

    Circuit->>Circuit: Wait for reset timeout
    Circuit-->>Service: HALF_OPEN (probe)
    Service->>DB: Test request
    DB-->>Service: Success!
    Service-->>Circuit: Success recorded
    Circuit-->>Service: CLOSED (recovered)

Frequently asked questions

What is the circuit breaker pattern?
It's a resilience pattern that wraps calls to an unreliable service. When failures spike (e.g. the database times out 5 times in a row), the circuit breaker opens and stops sending requests, returning errors immediately instead. After a timeout, it probes the service; if the probe succeeds, it closes and resumes normal operation, and if the probe fails, it stays open for another timeout period.
Why not just let requests time out on a broken service?
Timeouts waste resources and slow down the client. If your database is down, hammering it with requests keeps it down longer and wastes threads in your service. A circuit breaker detects the problem, stops sending requests, and gives the database time to recover. Clients get fast failures instead of slow timeouts.
What do CLOSED, OPEN, and HALF_OPEN mean?
CLOSED = normal, all requests go through. OPEN = the dependency is failing, so requests are rejected immediately (fast-fail). HALF_OPEN = the dependency might be recovering, so the breaker lets a test request through; if it succeeds, the breaker transitions to CLOSED, and if it fails, the breaker returns to OPEN. Probing with a test request before reopening the floodgates keeps the breaker from flapping between OPEN and CLOSED.
How do I set the failure threshold and timeout?
Thresholds depend on how many errors you can tolerate: 5 failures in 10 seconds is common for APIs (fast detection). The timeout is usually 30–60 seconds, enough to give the service time to restart. Start conservative and adjust based on your incident reviews: if a database restart takes 2 minutes, the timeout should be longer than that.
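A rule like "5 failures in 10 seconds" is a sliding-window threshold, not a count of consecutive failures. One way to sketch it in Python is shown below; the `FailureWindow` class and its parameter names are hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```python
from collections import deque


class FailureWindow:
    """Counts failures seen within the last `window_seconds`."""

    def __init__(self, max_failures=5, window_seconds=10.0):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record_failure(self, now):
        """Record a failure at time `now`; return True if the breaker should open."""
        self._timestamps.append(now)
        # Drop failures that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.max_failures
```

With these defaults, five failures one second apart trip the threshold, while one failure every five seconds never does, because older failures leave the window before the count reaches five.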
