Question 1

What does an API rate limiting sequence diagram show?

Accepted Answer

It shows the message exchange between a client, the API gateway, the rate limiter, and the backend for both outcomes: a request that passes the quota check and reaches the service, and one that is blocked with a 429 Too Many Requests response and a Retry-After header. The alt block puts both paths in a single diagram, one branch stacked above the other, so readers can compare them directly.
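The two alt branches can be sketched as plain code to make the decision point concrete. This is a minimal illustration, not the template's own logic: FixedQuotaLimiter, gateway, and backend are hypothetical names, and the limiter simply counts requests per client with no window reset, for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)
    body: str = ""


class FixedQuotaLimiter:
    """Hypothetical Rate Limiter participant: allows `limit` requests per client.

    Simplification (assumption): the quota never resets; see the algorithm
    sketches elsewhere for windowed limits.
    """

    def __init__(self, limit: int, retry_after_s: int):
        self.limit = limit
        self.retry_after_s = retry_after_s
        self.used: dict[str, int] = {}

    def check(self, client_id: str) -> bool:
        count = self.used.get(client_id, 0)
        if count >= self.limit:
            return False
        self.used[client_id] = count + 1
        return True


def backend(path: str) -> Response:
    # Stand-in for the real service behind the gateway.
    return Response(200, body=f"data for {path}")


def gateway(client_id: str, path: str, limiter: FixedQuotaLimiter) -> Response:
    # Gateway -> Rate Limiter: quota check happens before any backend work.
    if limiter.check(client_id):
        # alt branch 1: within quota, forward the request to the backend.
        return backend(path)
    # alt branch 2 (else): over quota, reject without touching the backend.
    return Response(429, {"Retry-After": str(limiter.retry_after_s)}, "Too Many Requests")


limiter = FixedQuotaLimiter(limit=2, retry_after_s=30)
for _ in range(3):
    r = gateway("client-a", "/orders", limiter)
    print(r.status, r.headers)
# 200 {}
# 200 {}
# 429 {'Retry-After': '30'}
```

The third request takes the else branch, which is exactly the path the diagram's 429 arrow documents.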

Question 2

Where should rate limiting live — in the gateway or the service?

Accepted Answer

A common approach is to enforce limits at the gateway, as this diagram shows, so blocked requests never consume backend capacity. The limiter is drawn as a separate participant because it usually is one: a Redis-backed counter or a dedicated service, shared across all gateway instances so that every instance sees the same quota for a given client.
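A minimal sketch of why the counter must be shared, assuming a fixed-window algorithm (the FAQ does not name one). SharedCounterStore and GatewayInstance are hypothetical names; the store is an in-memory stand-in for Redis, where the same idea is commonly built from INCR and EXPIRE on a per-client, per-window key. A real shared store also has to make the increment atomic, which this single-process sketch gets for free.

```python
import time
from typing import Callable


class SharedCounterStore:
    """In-memory stand-in for a shared store such as Redis (assumption)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self._data: dict[str, tuple[int, float]] = {}  # key -> (count, expires_at)

    def incr_with_ttl(self, key: str, ttl_s: float) -> int:
        now = self.clock()
        count, expires_at = self._data.get(key, (0, now + ttl_s))
        if now >= expires_at:
            # Expired entry: start counting again, like a key that Redis evicted.
            count, expires_at = 0, now + ttl_s
        count += 1
        self._data[key] = (count, expires_at)
        return count


class GatewayInstance:
    """One of several gateway replicas; all of them share the same store."""

    def __init__(self, store: SharedCounterStore, limit: int, window_s: int):
        self.store = store
        self.limit = limit
        self.window_s = window_s

    def allow(self, client_id: str) -> bool:
        # Fixed-window key: every replica computes the same key for the same window.
        window = int(self.store.clock() // self.window_s)
        key = f"rl:{client_id}:{window}"
        return self.store.incr_with_ttl(key, self.window_s) <= self.limit


now = [0.0]  # controllable clock for the example
store = SharedCounterStore(clock=lambda: now[0])
a = GatewayInstance(store, limit=3, window_s=60)
b = GatewayInstance(store, limit=3, window_s=60)
print([gw.allow("client-a") for gw in (a, b, a, b)])  # [True, True, True, False]
```

Because both replicas increment the same key, the fourth request is rejected even though each instance has only seen two. With a per-instance counter instead, each replica would allow three, doubling the effective quota.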

Question 3

What headers should a rate-limited API return?

Accepted Answer

On success, return X-RateLimit-Limit and X-RateLimit-Remaining so clients can pace themselves before they are throttled; these X-RateLimit-* names are a widely used convention rather than a formal standard. On a 429, always include Retry-After, either as a number of seconds or as an HTTP date, so well-behaved clients know how long to back off. This template models both cases, which is why it doubles as API documentation for consumers.
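A small sketch of building these headers, assuming a fixed per-window limit. The function names are hypothetical; only Retry-After is a standard HTTP header, and email.utils.formatdate is used here to produce its HTTP-date form.

```python
import time
from email.utils import formatdate
from typing import Optional


def success_headers(limit: int, remaining: int) -> dict[str, str]:
    # X-RateLimit-* headers follow a common convention, not a formal standard.
    return {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(remaining, 0)),  # never report a negative count
    }


def throttled_headers(
    limit: int,
    retry_after_s: int,
    *,
    as_http_date: bool = False,
    now: Optional[float] = None,
) -> dict[str, str]:
    # A throttled client has no requests left in the current window.
    headers = success_headers(limit, 0)
    if as_http_date:
        # Retry-After as an absolute HTTP date, e.g. "Thu, 01 Jan 1970 00:00:30 GMT".
        ts = (time.time() if now is None else now) + retry_after_s
        headers["Retry-After"] = formatdate(ts, usegmt=True)
    else:
        # Retry-After as a delay in seconds, the simpler form for clients to parse.
        headers["Retry-After"] = str(retry_after_s)
    return headers


print(success_headers(100, 42))
print(throttled_headers(100, 30))
```

The seconds form avoids clock-skew issues between client and server, which is one reason many APIs prefer it over the date form.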

Question 4

How do I adapt this diagram for token bucket or sliding window limits?

Accepted Answer

The participants stay the same; only the limiter's internal decision changes. Rename the quota-check message to match your algorithm, or add a note over the Rate Limiter describing its parameters, for example the bucket's refill rate or the window length. Visual edits regenerate clean Mermaid code, so the result stays paste-ready for your API docs.
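To illustrate that only the limiter's internals change, here is a minimal sketch of both algorithms behind the same allow(now) method, so either could sit behind the gateway's quota check unchanged. The class names are hypothetical, and the sliding-window version shown is the "log" variant (one stored timestamp per request), which is only one of several sliding-window designs.

```python
from collections import deque


class TokenBucket:
    """Token bucket: holds up to `capacity` tokens, refilled at `refill_per_s`."""

    def __init__(self, capacity: float, refill_per_s: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_s = refill_per_s
        self.tokens = capacity  # start full, so an initial burst is allowed
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_s)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class SlidingWindowLog:
    """Sliding window log: at most `limit` requests in any `window_s`-second span."""

    def __init__(self, limit: int, window_s: float):
        self.limit = limit
        self.window_s = window_s
        self.timestamps: deque = deque()

    def allow(self, now: float) -> bool:
        # Drop timestamps that have slid out of the window.
        while self.timestamps and self.timestamps[0] <= now - self.window_s:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False


bucket = TokenBucket(capacity=2, refill_per_s=1.0)
print([bucket.allow(t) for t in (0.0, 0.0, 0.0, 1.0)])  # [True, True, False, True]

window = SlidingWindowLog(limit=2, window_s=10.0)
print([window.allow(t) for t in (0.0, 1.0, 5.0, 10.0)])  # [True, True, False, True]
```

The token bucket permits short bursts up to its capacity and then settles at the refill rate; the sliding window log enforces a strict count over any trailing interval at the cost of storing one timestamp per accepted request. In the diagram, these are exactly the parameters a note over the Rate Limiter would describe.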

API rate limiting sequence


Related templates

API gateway request routing

Microservice request flow

Service mesh communication