API rate limiting sequence

Client, gateway, and limiter handling a 429 with Retry-After.

When an API consumer reports mysterious 429s, the first question is always the same: who decided to block that request, and what did the client actually see? This sequence diagram answers it in one screen. A client calls the gateway, the gateway asks the rate limiter to check the quota for the API key, and the alt block shows both outcomes — a forwarded request returning 200 with an X-RateLimit-Remaining header, or a block returning 429 with Retry-After.

Drawing the limiter as its own participant matters. It reflects the real architecture in most production systems, where quota state lives in a shared store rather than inside any single gateway instance, and it makes clear that the backend service never sees over-limit traffic at all.
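
The decision the alt block captures can be sketched in a few lines of Python. This is an illustration only, not how any particular gateway implements it: FixedWindowLimiter, handle_request, the 50-requests-per-60-seconds quota, and the in-memory dictionary standing in for the shared store are all assumptions made for the example.

```python
import math
import time
from dataclasses import dataclass, field


@dataclass
class FixedWindowLimiter:
    """Hypothetical in-memory stand-in for the shared quota store."""

    limit: int = 50            # assumed quota per window
    window_seconds: int = 60   # assumed window length
    _counts: dict = field(default_factory=dict)  # old windows are never evicted in this sketch

    def check(self, api_key: str, now: float) -> tuple[bool, int, int]:
        """Return (allowed, remaining, seconds until the current window ends)."""
        window = int(now // self.window_seconds)
        used = self._counts.get((api_key, window), 0)
        retry_after = math.ceil((window + 1) * self.window_seconds - now)
        if used >= self.limit:
            return False, 0, retry_after
        self._counts[(api_key, window)] = used + 1
        return True, self.limit - used - 1, retry_after


def handle_request(limiter, api_key, backend, now=None):
    """Gateway step: ask the limiter first, call the backend only if allowed."""
    now = time.time() if now is None else now
    allowed, remaining, retry_after = limiter.check(api_key, now)
    if not allowed:
        # Over limit: the backend callable is never invoked.
        return 429, {"Retry-After": str(retry_after)}, "Too Many Requests"
    body = backend()
    return 200, {"X-RateLimit-Remaining": str(remaining)}, body
```

Passing the backend in as a callable makes the point of the diagram easy to verify: on the 429 path it is simply never called.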

When to use this template

  • API documentation for consumers — show client developers exactly which headers to expect on success and which status code and Retry-After behavior to handle when they exceed their quota.
  • Architecture reviews — make explicit that rate limiting happens before the backend, so the service needs capacity only for admitted traffic, not for requests the gateway rejects.
  • Incident postmortems — when a burst of 429s hits production, annotate this diagram with the actual quota values to explain what happened and why.

How to adapt it

Rename the participants to your real components, then extend the flow:

  • Add an authentication step before the quota check if your gateway validates the API key against an auth service first.
  • Split limits into per-key and per-IP checks with a second message to the limiter, mirroring layered defenses against abuse.
  • Add a note block documenting your algorithm — fixed window, sliding window, or token bucket — and the exact limits per plan tier.
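
If the note block documents a token bucket, it helps to know exactly what the limiter decides. The following is a minimal sketch of that algorithm, assuming one token per request; the TokenBucket class, its field names, and the numbers in the usage note are hypothetical and only illustrate the refill logic.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Minimal token bucket: each request spends one token."""

    capacity: float                    # maximum burst size (assumed per plan tier)
    refill_rate: float                 # tokens added per second
    tokens: float = field(init=False)  # current balance, starts full
    updated_at: float = 0.0            # timestamp of the last refill

    def __post_init__(self) -> None:
        self.tokens = self.capacity

    def allow(self, now: float) -> bool:
        """Refill for the time elapsed since the last call, then try to spend a token."""
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With TokenBucket(capacity=2, refill_rate=1.0), two back-to-back requests pass, a third is blocked, and one more is admitted a second later. Capacity and refill rate are the two values worth writing into the diagram note.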

The visual editor regenerates clean Mermaid code as you drag and rename, so the diagram you tweak here drops straight into a README or developer portal.

Mermaid code

Copy it anywhere Mermaid is supported — GitHub, Notion, or your docs.

sequenceDiagram
    participant C as Client
    participant G as API Gateway
    participant L as Rate Limiter
    participant S as Backend Service

    C->>G: GET /api/resource
    G->>L: Check quota for API key
    alt Under limit
        L-->>G: Allowed (remaining 49)
        G->>S: Forward request
        S-->>G: 200 OK
        G-->>C: 200 OK + X-RateLimit-Remaining
    else Over limit
        L-->>G: Blocked
        G-->>C: 429 Too Many Requests + Retry-After
    end

Frequently asked questions

What does an API rate limiting sequence diagram show?
It shows the exact conversation between a client, the API gateway, the rate limiter, and the backend for both outcomes: a request that passes the quota check and reaches the service, and one that gets blocked with a 429 Too Many Requests plus a Retry-After header. The alt block makes both paths visible side by side.
Where should rate limiting live — in the gateway or the service?
A common choice is to enforce limits at the gateway, exactly as this diagram shows, so blocked requests never consume backend capacity. The limiter is drawn as a separate participant because it usually is one (a Redis-backed counter or a dedicated service), shared across all gateway instances so quotas stay consistent.
What headers should a rate-limited API return?
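On success, return X-RateLimit-Limit and X-RateLimit-Remaining so clients can pace themselves before hitting the wall. These X-RateLimit-* names are a widely used convention rather than a formal standard, so document whichever names your API actually emits. On a 429, always include Retry-After, as either a number of seconds or an HTTP date, so well-behaved clients know exactly how long to back off. This template models both outcomes, which is why it doubles as API documentation for consumers.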
How do I adapt this diagram for token bucket or sliding window limits?
The participants stay the same — only the limiter's internal decision changes. Rename the quota-check message to match your algorithm, or add a note over the Rate Limiter describing the bucket refill rate. Visual edits regenerate clean Mermaid code, so the result stays paste-ready for your API docs.
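
On the client side of the headers question above, a well-behaved consumer has to turn Retry-After into a wait time. The sketch below uses only the Python standard library and handles both forms the header can take, a number of seconds or an HTTP date. The function name retry_delay_seconds and the one-second fallback for unparseable values are assumptions for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay_seconds(retry_after: str, now: datetime) -> float:
    """Convert a Retry-After header value into a non-negative delay in seconds."""
    value = retry_after.strip()
    if value.isascii() and value.isdigit():
        # Delta-seconds form, e.g. "30".
        return float(value)
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return 1.0  # assumed fallback when the header cannot be parsed
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

A caller would typically pass datetime.now(timezone.utc) as now and sleep for the returned number of seconds before retrying the request.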
