Rate limit state machine
Track quota consumption, backpressure, and recovery states.
Rate limiting protects your API from overload, abuse, and accidental denial of service. But a client doesn't jump straight from "quota available" to "request rejected". It passes through intermediate states: a warning zone where consumption is high, a throttled state where requests are rejected, and a recovery window where the client can retry with backoff.
This template maps that lifecycle. A client in the Available state proceeds normally. As quota consumption climbs, the system enters LowQuota (send alerts before it's too late), then Throttled (reject new requests). Once the client backs off, Recovering gives a second chance: allow a trickle of requests while the quota refills.
When to use this template
- API quota docs — explain to customers what happens when they hit their limit and how long before they can resume normal traffic.
- On-call incident response — when a service is under load, trace which customers enter Throttled state and how long they stay there, informing SLA decisions.
- Traffic shaping policies — document whether you use hard limits (reject immediately) or soft limits (queue and delay), and which states trigger alerts.
How to adapt it
Customize states and transitions for your quota model:
- Multi-tier quotas — add states for each tier (Silver → Bronze → Blacklist) and transition rules based on reputation or contract terms.
- Sliding window vs. fixed window — rename "backoff window elapsed" to match your refill method (sliding window: quota frees up as each request ages out of the window; fixed window: the counter resets at a set boundary, such as midnight).
- Gradual degradation — instead of binary Throttled, add intermediate states like LimitedBandwidth → SlowResponse → Throttled to gracefully degrade before hard rejection.
Visual edits regenerate clean code, so you can map your quota enforcement rules to states without writing Mermaid syntax by hand.
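To make the sliding window vs. fixed window choice concrete, here is a minimal Python sketch of both refill methods. The class names, the `allow` method, and the explicit `now` argument are illustrative assumptions, not part of the template; a real gateway would likely read the clock itself and store counters in shared storage.

```python
from collections import deque


class FixedWindow:
    """Counter that resets at fixed boundaries (for example, every 60 seconds)."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.window_index = None  # which window the current count belongs to
        self.count = 0

    def allow(self, now: float) -> bool:
        index = int(now // self.window_seconds)
        if index != self.window_index:
            # A new window started: the whole quota refills at once.
            self.window_index = index
            self.count = 0
        if self.count >= self.limit:
            return False
        self.count += 1
        return True


class SlidingWindow:
    """Log of request timestamps: quota frees up as old requests age out."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def allow(self, now: float) -> bool:
        # Drop requests that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.limit:
            return False
        self.timestamps.append(now)
        return True
```

With a fixed window, a client can spend its full quota just before the boundary and again just after it. The sliding version frees one slot at a time, so the transition label might become something like "oldest request aged out" rather than "window boundary reached".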
Mermaid code
Copy it anywhere Mermaid is supported — GitHub, Notion, or your docs.
stateDiagram-v2
    [*] --> Available
    Available --> LowQuota: 80% consumed
    Available --> Throttled: quota exceeded
    LowQuota --> Available: quota refilled
    LowQuota --> Throttled: quota exceeded
    Throttled --> Recovering: apply backoff
    Recovering --> Available: backoff window elapsed
    Recovering --> Throttled: new request during backoff
    note right of Available
        Normal operation,
        requests proceed
    end note
    note right of LowQuota
        Warning state,
        alerts sent
    end note
    note right of Throttled
        Over quota,
        requests rejected
    end note
    note right of Recovering
        Backoff active,
        allow retries
    end note
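If you want to enforce the same lifecycle in code, the diagram translates fairly directly into a transition table. The Python sketch below mirrors the seven edges above. The `State` and `Event` enums and the `next_state` helper are hypothetical names chosen for illustration, and ignoring an event that has no edge is an assumption (you may prefer to raise an error instead).

```python
from enum import Enum


class State(Enum):
    AVAILABLE = "Available"
    LOW_QUOTA = "LowQuota"
    THROTTLED = "Throttled"
    RECOVERING = "Recovering"


class Event(Enum):
    # Hypothetical event names, one per transition label in the diagram.
    CONSUMED_80_PERCENT = "80% consumed"
    QUOTA_EXCEEDED = "quota exceeded"
    QUOTA_REFILLED = "quota refilled"
    APPLY_BACKOFF = "apply backoff"
    BACKOFF_ELAPSED = "backoff window elapsed"
    REQUEST_DURING_BACKOFF = "new request during backoff"


# One entry per edge in the diagram: (current state, event) -> next state.
TRANSITIONS = {
    (State.AVAILABLE, Event.CONSUMED_80_PERCENT): State.LOW_QUOTA,
    (State.AVAILABLE, Event.QUOTA_EXCEEDED): State.THROTTLED,
    (State.LOW_QUOTA, Event.QUOTA_REFILLED): State.AVAILABLE,
    (State.LOW_QUOTA, Event.QUOTA_EXCEEDED): State.THROTTLED,
    (State.THROTTLED, Event.APPLY_BACKOFF): State.RECOVERING,
    (State.RECOVERING, Event.BACKOFF_ELAPSED): State.AVAILABLE,
    (State.RECOVERING, Event.REQUEST_DURING_BACKOFF): State.THROTTLED,
}


def next_state(current: State, event: Event) -> State:
    """Return the next state, or stay in place if the diagram has no such edge."""
    return TRANSITIONS.get((current, event), current)
```

Keeping the edges in one table means that adding a state such as LimitedBandwidth is a matter of adding rows, not rewriting branching logic.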
Frequently asked questions
- What is a rate limit state machine?
- It defines the lifecycle of a quota-tracked client or endpoint as it consumes its allowance and recovers. Most systems move through: Available (normal), LowQuota (warning), Throttled (rejected), and Recovering (backoff). Understanding these states helps teams design alerts, client logic, and SLA targets.
- When should I alert on LowQuota?
- Alert as soon as you hit 70-80% of your quota — before you run out. This gives teams time to investigate the spike (legitimate traffic vs. a runaway bot) and either optimize their usage or request a higher limit. Waiting until Throttled means losing requests.
- How do I implement this state machine?
- Track the client's cumulative usage against the limit in your API gateway or middleware. Every request increments the counter; every refill window resets it (hourly, daily, etc.). Emit an event when you cross thresholds so alerting systems can react. Visual edits let you rename states and transitions to match your quota model.
- Should rate limiting be per-user, per-API key, or per-IP?
- Use API keys for authenticated users (fine-grained per-customer limits) and IP-based limits for anonymous traffic (prevent scraping and abuse). You can also combine both: one rate limit per IP and one per key. That way, an abusive client behind a shared IP (a corporate NAT, for example) exhausts only its own key's quota instead of using up the allowance of legitimate customers on the same IP.
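The implementation answer above (count usage, reset on each refill window, emit an event at thresholds) can be sketched in a few lines of Python. This is a minimal in-memory illustration, not production middleware: `QuotaTracker`, `record`, the bucket strings, and the event names are all hypothetical, and it assumes a fixed refill window with an 80% warning threshold.

```python
import math
from collections import defaultdict


class QuotaTracker:
    """Fixed-window usage counters, one per bucket (an API key or an IP)."""

    def __init__(self, limit, window_seconds, warn_ratio=0.8, on_event=print):
        self.limit = limit
        self.window_seconds = window_seconds
        # Usage count at which the LowQuota warning fires (assumed 80%).
        self.warn_at = math.ceil(limit * warn_ratio)
        self.on_event = on_event  # callback taking (event_name, bucket)
        self.window_index = None
        self.used = defaultdict(int)

    def record(self, bucket: str, now: float) -> bool:
        """Count one request; return False if the bucket is over quota."""
        index = int(now // self.window_seconds)
        if index != self.window_index:
            # Refill window rolled over: reset every counter.
            self.window_index = index
            self.used.clear()
        if self.used[bucket] >= self.limit:
            self.on_event("quota_exceeded", bucket)
            return False
        self.used[bucket] += 1
        if self.used[bucket] == self.warn_at:
            # Fires once per window, when the threshold is first crossed.
            self.on_event("low_quota", bucket)
        return True


# Usage: collect events in a list instead of printing them.
events = []
tracker = QuotaTracker(
    limit=10,
    window_seconds=3600,
    on_event=lambda name, bucket: events.append((name, bucket)),
)
results = [tracker.record("key:abc", now=float(i)) for i in range(11)]
```

In this run the first ten requests are accepted, the warning fires on the eighth, and the eleventh is rejected. Using bucket strings such as "key:abc" or "ip:203.0.113.7" is one way to track per-key and per-IP limits with the same class, though you would probably want separate limits for each.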