Request timeout and retry pattern
Handle slow/failing requests with exponential backoff and max retries.
The internet is unreliable. Networks drop packets, servers get overloaded, and load balancers time out slow requests. A robust client doesn't fail on the first hiccup — it retries with exponential backoff, giving the server time to recover without hammering it with duplicate requests.
This template shows the complete timeout and retry dance: send a request with a deadline, wait for a response, and if you hit a timeout or a server error, wait a bit longer each time before trying again. Backoff spaces the retries out so that dozens of clients retrying at once don't turn a momentary blip into a full outage; adding jitter keeps those clients from retrying in lockstep.
When to use this template
- API client libraries — document how your SDK handles transient failures so users know what to expect when the network is flaky.
- Resilience policy docs — define retry budgets and timeout windows for different API endpoints (file uploads need longer timeouts than metadata lookups).
- On-call runbooks — trace request flow under load so on-call engineers can explain cascading failures to stakeholders ("Why did 30% of requests fail?").
How to adapt it
Customize the decision points and delays to your system:
- Vary timeout by operation — add a diamond after "Send request": "Is this a long-lived operation (upload/export)?" to branch to 30s vs 5s timeouts.
- Add circuit breaker — after "Increment retry count", check if the service has been failing frequently; if so, fail-fast instead of burning retries.
- Jitter and backoff strategies — replace "Increase backoff delay" with full-jitter or decorrelated jitter to avoid thundering herds when multiple clients retry together.
Visual edits regenerate clean code, so you can document your real retry thresholds without manual syntax updates.
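The two jitter strategies named above can be sketched in a few lines of Python. This is a minimal illustration, not a library API: the function names, the `base` and `cap` defaults, and the injectable `uniform` parameter are all assumptions made for the example.

```python
import random
from typing import Callable

Uniform = Callable[[float, float], float]

def full_jitter(attempt: int, base: float = 0.1, cap: float = 10.0,
                uniform: Uniform = random.uniform) -> float:
    """Full jitter: pick a delay anywhere between 0 and the capped exponential ceiling."""
    ceiling = min(cap, base * 2 ** attempt)  # attempt 0 -> base, 1 -> 2*base, ...
    return uniform(0, ceiling)

def decorrelated_jitter(previous: float, base: float = 0.1, cap: float = 10.0,
                        uniform: Uniform = random.uniform) -> float:
    """Decorrelated jitter: derive the next delay from the previous one, not the attempt number."""
    return min(cap, uniform(base, previous * 3))

if __name__ == "__main__":
    delay = 0.1  # hypothetical starting delay in seconds
    for attempt in range(5):
        delay = decorrelated_jitter(delay)
        print(f"attempt {attempt}: full={full_jitter(attempt):.3f}s decorrelated={delay:.3f}s")
```

Either function would stand in for the "Increase backoff delay" node. Full jitter spreads clients across the whole window, while decorrelated jitter only needs the previous delay as state.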
Mermaid code
Copy it anywhere Mermaid is supported — GitHub, Notion, or your docs.
flowchart TD
A[Send HTTP request] --> B[Set timeout clock]
B --> C{Response within timeout?}
C -->|Yes| D{Status code ok?}
D -->|2xx| E[Return success]
D -->|5xx/other| F[Increment retry count]
C -->|No - timeout| F
F --> G{Retry count < max?}
G -->|No| H[Return error to caller]
G -->|Yes| I[Wait with backoff]
I --> J{Max backoff reached?}
J -->|No| K[Increase backoff delay]
J -->|Yes| L[Use max backoff]
K --> A
L --> A
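For readers who want to see the flowchart as code, here is a rough Python sketch of the same loop. It assumes a caller-supplied `send(timeout)` function that returns an HTTP status code or raises `TimeoutError`; that function, the `RetriesExhausted` exception, and the default values are hypothetical names chosen for the example, not part of any particular HTTP library.

```python
import time
from typing import Callable, Optional

class RetriesExhausted(Exception):
    """Raised when every allowed attempt failed (node H in the flowchart)."""

def request_with_retry(
    send: Callable[[float], int],   # hypothetical: performs the request, returns a status code
    timeout: float = 5.0,
    max_retries: int = 3,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    delay = base_delay
    retries = 0
    while True:
        try:
            status: Optional[int] = send(timeout)    # A, B: send with a deadline
        except TimeoutError:
            status = None                             # C: no response within timeout
        if status is not None and 200 <= status < 300:
            return status                             # E: return success
        retries += 1                                  # F: increment retry count
        if retries >= max_retries:                    # G: retry count < max?
            raise RetriesExhausted(f"gave up after {retries} failed attempts")  # H
        sleep(delay)                                  # I: wait with backoff
        delay = min(delay * 2, max_delay)             # J, K, L: double, capped at max backoff
```

As in the diagram, the counter is incremented before it is compared, so `max_retries=3` allows three attempts in total with two waits between them.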
Frequently asked questions
- What is a request timeout and retry pattern?
- It's a fault-tolerance strategy for unreliable networks and overloaded services. When a request doesn't complete within a deadline or returns a server error, the client waits an increasing amount of time before retrying, up to a maximum retry count. Spacing out retries eases the load on a struggling service and gives transient failures time to clear.
- Why use exponential backoff instead of retrying immediately?
- Immediate retries hammer an already-stressed service and make things worse. Exponential backoff (wait 100ms, 200ms, 400ms, 800ms…) gives the service time to drain its queue and recover. It also keeps your own clients from acting like a denial-of-service attack on the service they depend on.
- How do I adapt this for my API?
- Set the timeout based on your SLA: 5s for most APIs, 10-30s for file uploads. Start backoff at 100-500ms and double it on each retry. Cap total attempts at 3-5, counting the original request. Add jitter (random ±10%) so that multiple clients don't retry at the same instant. Visual edits let you adjust the decision thresholds without rewriting code.
- Should I retry all errors or only some?
- Retry only idempotent requests, and only on transient errors: 408 (request timeout), 429 (rate limit), 502/503/504 (server error). Don't retry other 4xx client errors (400, 401, 403), since the same request will fail the same way, and don't retry non-idempotent POST requests unless you've confirmed idempotency. Add this logic to the 'Status code ok?' diamond with more granular branches.
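The retry rule from the last answer could be expressed as a small predicate, sketched here in Python. The status list comes from the answer above; the function name, the method set, and the `has_idempotency_key` flag (standing in for "you've confirmed idempotency") are assumptions made for illustration.

```python
# Transient statuses listed in the FAQ answer above.
RETRYABLE_STATUSES = frozenset({408, 429, 502, 503, 504})

# HTTP methods commonly treated as idempotent; adjust to match your API's actual semantics.
IDEMPOTENT_METHODS = frozenset({"GET", "HEAD", "PUT", "DELETE", "OPTIONS"})

def should_retry(method: str, status: int, has_idempotency_key: bool = False) -> bool:
    """Return True only for a transient status on a request that is safe to repeat."""
    if status not in RETRYABLE_STATUSES:
        return False  # e.g. 400, 401, 403: repeating the request will not help
    # Hypothetical flag: the caller asserts the server deduplicates this request.
    return method.upper() in IDEMPOTENT_METHODS or has_idempotency_key

if __name__ == "__main__":
    for method, status in [("GET", 503), ("POST", 503), ("GET", 404), ("PUT", 429)]:
        print(method, status, should_retry(method, status))
```

In the diagram, this check would sit where the 'Status code ok?' diamond branches, sending non-retryable responses straight to "Return error to caller".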