
Rate limit state machine

Track quota consumption, backpressure, and recovery states.

Rate limiting protects your API from overload, abuse, and accidental denial of service. But a client doesn't jump straight from "quota available" to "request rejected". It passes through intermediate states: a warning zone where consumption is high, a throttled state where requests are rejected, and a recovery window where the client can retry with backoff.

This template maps that lifecycle. A client in the Available state proceeds normally. As quota consumption climbs, the client moves to LowQuota (alerts fire before the quota runs out), then to Throttled (new requests are rejected). Once the client backs off, Recovering gives it a second chance: a trickle of requests is allowed while the quota refills.

When to use this template

  • API quota docs — explain to customers what happens when they hit their limit and how long before they can resume normal traffic.
  • On-call incident response — when a service is under load, trace which customers enter Throttled state and how long they stay there, informing SLA decisions.
  • Traffic shaping policies — document whether you use hard limits (reject immediately) or soft limits (queue and delay), and which states trigger alerts.

How to adapt it

Customize states and transitions for your quota model:

  • Multi-tier quotas — add states for each tier (Silver → Bronze → Blacklist) and transition rules based on reputation or contract terms.
  • Sliding window vs. fixed window — rename "backoff window elapsed" to match your refill method (sliding window: capacity returns as each request ages out of the trailing window; fixed window: the counter resets at a set boundary, such as midnight).
  • Gradual degradation — instead of binary Throttled, add intermediate states like LimitedBandwidth → SlowResponse → Throttled to gracefully degrade before hard rejection.
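
The difference between the two refill methods can be sketched in a few lines of Python. This is a minimal illustration, not a production limiter: the class names `FixedWindowLimiter` and `SlidingWindowLimiter`, the `allow` method, and the use of plain float timestamps are all assumptions made for the example.

```python
from collections import deque


class FixedWindowLimiter:
    """Hypothetical helper: counter that resets at each window boundary."""

    def __init__(self, limit: int, window_s: float):
        self.limit = limit
        self.window_s = window_s
        self.window_id = None
        self.count = 0

    def allow(self, now: float) -> bool:
        window_id = int(now // self.window_s)
        if window_id != self.window_id:
            # A new window started: the whole quota refills at once.
            self.window_id = window_id
            self.count = 0
        if self.count >= self.limit:
            return False
        self.count += 1
        return True


class SlidingWindowLimiter:
    """Hypothetical helper: log of timestamps; quota frees up as entries age out."""

    def __init__(self, limit: int, window_s: float):
        self.limit = limit
        self.window_s = window_s
        self.stamps = deque()

    def allow(self, now: float) -> bool:
        # Drop timestamps that have left the trailing window.
        while self.stamps and now - self.stamps[0] >= self.window_s:
            self.stamps.popleft()
        if len(self.stamps) >= self.limit:
            return False
        self.stamps.append(now)
        return True
```

With a limit of 2 per 60 seconds and requests at t=58 and t=59, the fixed-window version accepts a third request at t=60 because a new window has begun, while the sliding-window version rejects it until the request from t=58 ages out. That behavior is what the transition label should describe.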

Visual edits regenerate clean code, so you can map your quota enforcement rules to states without writing Mermaid syntax by hand.

Mermaid code

Copy it anywhere Mermaid is supported — GitHub, Notion, or your docs.

stateDiagram-v2
    [*] --> Available
    Available --> LowQuota: 80% consumed
    Available --> Throttled: quota exceeded
    LowQuota --> Available: quota refilled
    LowQuota --> Throttled: quota exceeded
    Throttled --> Recovering: apply backoff
    Recovering --> Available: backoff window elapsed
    Recovering --> Throttled: new request during backoff
    note right of Available
        Normal operation,
        requests proceed
    end note
    note right of LowQuota
        Warning state,
        alerts sent
    end note
    note right of Throttled
        Over quota,
        requests rejected
    end note
    note right of Recovering
        Backoff active,
        allow retries
    end note
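
If you want to enforce the same transitions in code, the diagram translates directly into a lookup table. The Python sketch below copies the edges and labels from the diagram above; the `State` enum, the `TRANSITIONS` table, and the `next_state` function are illustrative names, not part of any library.

```python
from enum import Enum


class State(Enum):
    AVAILABLE = "Available"
    LOW_QUOTA = "LowQuota"
    THROTTLED = "Throttled"
    RECOVERING = "Recovering"


# (current state, event label) -> next state, copied from the diagram edges.
TRANSITIONS = {
    (State.AVAILABLE, "80% consumed"): State.LOW_QUOTA,
    (State.AVAILABLE, "quota exceeded"): State.THROTTLED,
    (State.LOW_QUOTA, "quota refilled"): State.AVAILABLE,
    (State.LOW_QUOTA, "quota exceeded"): State.THROTTLED,
    (State.THROTTLED, "apply backoff"): State.RECOVERING,
    (State.RECOVERING, "backoff window elapsed"): State.AVAILABLE,
    (State.RECOVERING, "new request during backoff"): State.THROTTLED,
}


def next_state(state: State, event: str) -> State:
    """Return the next state, or raise if the diagram has no such edge."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state.value} on {event!r}") from None
```

Rejecting unknown (state, event) pairs keeps the code and the diagram from drifting apart: if you add a state in the diagram, the table needs the matching entry.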

Frequently asked questions

What is a rate limit state machine?
It defines the lifecycle of a quota-tracked client or endpoint as it consumes its allowance and recovers. Most systems move through: Available (normal), LowQuota (warning), Throttled (rejected), and Recovering (backoff). Understanding these states helps teams design alerts, client logic, and SLA targets.
When should I alert on LowQuota?
Alert as soon as you hit 70-80% of your quota — before you run out. This gives teams time to investigate the spike (legitimate traffic vs. a runaway bot) and either optimize their usage or request a higher limit. Waiting until Throttled means losing requests.
How do I implement this state machine?
Track the client's cumulative usage against the limit in your API gateway or middleware. Every request increments the counter; every refill window resets it (hourly, daily, etc.). Emit an event when you cross thresholds so alerting systems can react. Visual edits let you rename states and transitions to match your quota model.
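
As a rough sketch of that answer, assuming a fixed refill window and an 80% warning threshold: the `QuotaTracker` class, its `record` method, and the `on_event` callback are hypothetical names for this example, and the Recovering state is left out to keep it short.

```python
from typing import Callable, Optional


class QuotaTracker:
    """Hypothetical per-client counter that emits an event on each state change."""

    def __init__(self, client_id: str, limit: int, window_s: float,
                 on_event: Callable[[str, str], None],
                 warn_ratio: float = 0.8):
        self.client_id = client_id
        self.limit = limit
        self.window_s = window_s
        self.on_event = on_event
        self.warn_at = warn_ratio * limit
        self.window_id: Optional[int] = None
        self.used = 0
        self.state = "Available"

    def _set_state(self, new_state: str) -> None:
        # Only notify when the state actually changes.
        if new_state != self.state:
            self.state = new_state
            self.on_event(self.client_id, new_state)

    def record(self, now: float) -> bool:
        """Count one request at time `now`; return False if it is rejected."""
        window_id = int(now // self.window_s)
        if window_id != self.window_id:
            # Refill window rolled over: reset the counter.
            self.window_id = window_id
            self.used = 0
            self._set_state("Available")
        if self.used >= self.limit:
            self._set_state("Throttled")
            return False
        self.used += 1
        if self.used >= self.warn_at:
            self._set_state("LowQuota")
        return True
```

In a real gateway the counter would usually live in shared storage rather than in process memory, and the callback would publish to whatever alerting system you use.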
Should rate limiting be per-user, per-API key, or per-IP?
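Use API keys for authenticated users (fine-grained per-customer limits) and IP-based limits for anonymous traffic (to prevent scraping and abuse). Combine both: enforce a per-key limit on authenticated requests and a per-IP limit on anonymous ones. That way an abusive anonymous client behind a shared IP, such as a corporate NAT, cannot use up the allowance of legitimate customers on the same address.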
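
One possible reading of "combine both" is sketched below in Python: authenticated requests are counted per API key and anonymous requests per IP, each against its own limit. The `CombinedLimiter` class, its parameters, and the fixed-window counting are assumptions for illustration; your gateway may combine the two limits differently.

```python
from collections import defaultdict
from typing import Optional


class CombinedLimiter:
    """Hypothetical fixed-window counters keyed by API key or by IP."""

    def __init__(self, key_limit: int, ip_limit: int, window_s: float):
        self.key_limit = key_limit
        self.ip_limit = ip_limit
        self.window_s = window_s
        self.window_id = None
        self.counts = defaultdict(int)

    def allow(self, ip: str, api_key: Optional[str], now: float) -> bool:
        window_id = int(now // self.window_s)
        if window_id != self.window_id:
            # New window: every bucket refills.
            self.window_id = window_id
            self.counts.clear()
        # Authenticated traffic is bucketed by key, anonymous traffic by IP.
        if api_key is not None:
            bucket, limit = ("key", api_key), self.key_limit
        else:
            bucket, limit = ("ip", ip), self.ip_limit
        if self.counts[bucket] >= limit:
            return False
        self.counts[bucket] += 1
        return True
```

Because the buckets are separate, an anonymous client that exhausts the IP bucket does not affect a keyed customer sending from the same address.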
