Flowchart template

Load balancer request routing

How requests are distributed across backend servers.

Most applications that run on more than one server sit behind a load balancer. Requests arrive at a single address, the load balancer examines each one, decides which backend server should handle it, and forwards it along. The flow looks simple, but the routing decisions (which algorithm to use, whether to pin sessions to a server, how to handle failures) shape uptime and performance.

This template walks through the decision tree: health checks take failed servers out of rotation, session affinity sends a returning client back to the server that holds its state, and the routing algorithm picks a server for everyone else. Many load-balancing incidents trace back to assumptions about this flow that don't match production reality.
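
The decision tree described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular product's implementation: the `Backend` and `LoadBalancer` names, the in-memory `sticky` map, and the choice of round-robin as the fallback algorithm are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Backend:
    name: str
    healthy: bool = True  # assumed to be updated by a separate health checker


@dataclass
class LoadBalancer:
    backends: list[Backend]
    sticky: dict[str, str] = field(default_factory=dict)  # session id -> backend name
    _next: int = 0  # round-robin cursor

    def route(self, session_id: Optional[str] = None) -> Backend:
        # Step 1: health checks remove down servers from the candidate pool.
        healthy = [b for b in self.backends if b.healthy]
        if not healthy:
            raise RuntimeError("no healthy backends available")

        # Step 2: session affinity, honoured only if the pinned server is still healthy.
        if session_id is not None and session_id in self.sticky:
            for b in healthy:
                if b.name == self.sticky[session_id]:
                    return b

        # Step 3: otherwise apply the routing algorithm (round-robin here).
        chosen = healthy[self._next % len(healthy)]
        self._next += 1
        if session_id is not None:
            self.sticky[session_id] = chosen.name  # pin (or re-pin) the session
        return chosen
```

Note the design choice in step 2: when a pinned server is unhealthy, the session is re-pinned to another server rather than failing the request, which means any state held only in the old server's memory is lost.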

When to use this template

  • Designing high-availability systems — agree on your health check strategy, routing algorithm, and session affinity requirements before deploying.
  • Troubleshooting user complaints — when users report random logouts or inconsistent behavior, annotate the load balancer decisions with real logs to see whether a single user's requests are landing on different servers.
  • Capacity planning — use the diagram to show your team how the load balancer handles server additions and failures, and discuss what your metrics should track.
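
For the troubleshooting case, a small script can flag sessions whose requests were served by more than one backend. The sketch below assumes a hypothetical access-log format in which each line contains `session=<id>` and `backend=<name>` fields; real load balancer logs differ, so the regular expression would need adapting.

```python
import re
from collections import defaultdict

# Hypothetical log format: "<timestamp> session=<id> backend=<name>"
LINE = re.compile(r"session=(?P<session>\S+)\s+backend=(?P<backend>\S+)")


def sessions_split_across_backends(lines):
    """Return {session_id: sorted backend names} for sessions seen on more than one backend."""
    seen = defaultdict(set)
    for line in lines:
        match = LINE.search(line)
        if match:  # skip lines that don't carry both fields
            seen[match["session"]].add(match["backend"])
    return {s: sorted(b) for s, b in seen.items() if len(b) > 1}
```

A non-empty result for users who reported logouts suggests that session affinity is not being applied, or that their pinned server dropped out of rotation.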

How to adapt it

Expand the diagram to match your infrastructure:

  • Add failure handling — after the health check fails, show automatic server restart or alert escalation rather than just retrying.
  • Include geo-routing — insert a geography decision before the algorithm to route users to the nearest regional load balancer.
  • Show a cache layer — serve cache hits from a CDN or local cache before requests reach the backends, and fall back to the load-balanced servers on a miss.

Visual edits regenerate clean Mermaid code, so you can embed the diagram in your infrastructure runbook or incident response guide.

Mermaid code

Copy it anywhere Mermaid is supported — GitHub, Notion, or your docs.

flowchart TD
    A[Client request arrives] --> B[Load Balancer receives]
    B --> C{Health check passed?}
    C -->|No| D[Route to next server]
    C -->|Yes| E{Session affinity?}
    E -->|Sticky session| F[Route to same server]
    E -->|No sticky| G[Apply routing algorithm]
    G --> H{Algorithm type?}
    H -->|Round-robin| I[Next server in queue]
    H -->|Least connections| J[Server with fewest active]
    H -->|IP hash| K[Consistent hash of IP]
    I --> L[Send request to server]
    J --> L
    K --> L
    F --> L
    D --> B
    L --> M[Server processes]
    M --> N[Response to client]

Frequently asked questions

What does a load balancer do?
A load balancer sits between clients and your backend servers, distributing incoming requests across instances to prevent any single server from being overwhelmed. It performs health checks to route traffic only to healthy servers, applies routing algorithms like round-robin or least-connections, and maintains session affinity so users stay on the same server when needed.
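
As a rough illustration of the routing algorithms named here and in the diagram, each one reduces to a small selection function. The function names and the `active` connection-count mapping are assumptions for this sketch, and the IP hash shown is a plain modulo hash rather than consistent hashing.

```python
import hashlib
import itertools


def round_robin(servers):
    """Return a picker function that hands out servers in rotation."""
    cycle = itertools.cycle(servers)
    return lambda: next(cycle)


def least_connections(active):
    """Pick the server with the fewest active connections.

    `active` maps server name -> current connection count.
    """
    return min(active, key=active.get)


def ip_hash(client_ip, servers):
    """Map a client IP to a server so the same IP always gets the same server."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```

With a modulo hash, adding or removing a server remaps most clients to a different backend; consistent hashing exists to limit that churn.
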
What is session affinity or sticky sessions?
Session affinity ensures that once a user hits a particular server, all their subsequent requests go to that same server. This matters when session state (shopping carts, login status) lives in one server's memory rather than in a shared store. Without sticky sessions, a second request might land on a different server that doesn't have that user's session data, forcing them to log in again.
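
One common way to implement affinity is a cookie that records which backend served the first request. The sketch below uses Python's standard `http.cookies` module; the cookie name `lb_backend`, the `pick_backend` function, and the `fallback` callable (standing in for the routing algorithm) are hypothetical names chosen for the example.

```python
from http.cookies import SimpleCookie

COOKIE_NAME = "lb_backend"  # hypothetical cookie name


def pick_backend(cookie_header, servers, fallback):
    """Return (backend, set_cookie_value_or_None) for one request.

    `cookie_header` is the raw Cookie header (or None), `servers` is the list
    of currently healthy backend names, and `fallback` picks a backend when
    no usable pin exists.
    """
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    pinned = cookie.get(COOKIE_NAME)
    if pinned is not None and pinned.value in servers:
        return pinned.value, None  # existing pin is still valid

    # No pin, or the pinned server is gone: choose a backend and pin it.
    backend = fallback()
    out = SimpleCookie()
    out[COOKIE_NAME] = backend
    out[COOKIE_NAME]["path"] = "/"
    out[COOKIE_NAME]["httponly"] = True
    return backend, out[COOKIE_NAME].OutputString()
```

The second element of the result is the value for a `Set-Cookie` response header, or `None` when the client's existing cookie already points at a usable server.
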
How is a load balancer different from a service mesh?
A load balancer typically distributes traffic at the network edge, between clients and servers. A service mesh (like Istio) is deployed inside your cluster and manages communication between your microservices. Service meshes offer finer-grained control (per-RPC routing, canary deployments) but add operational and latency overhead; a load balancer is usually the simpler choice for client-facing traffic.
Why do we need health checks?
Without health checks, the load balancer might send requests to a server that has crashed or become unresponsive, causing errors for users. Health checks (periodic pings or HTTP requests to a /health endpoint) let the load balancer detect failures within a few check intervals, often seconds, and route traffic away from broken instances automatically.
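
A minimal sketch of that mechanism, using only the Python standard library: `probe` performs one HTTP check, and `HealthTracker` applies consecutive-result thresholds so that a single slow response does not remove a server. The `fall` and `rise` parameter names and their defaults are assumptions for illustration, not the settings of any specific load balancer.

```python
import urllib.request


def probe(url, timeout=2.0):
    """Return True if the endpoint answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError (4xx/5xx) and socket timeouts are all OSError subclasses.
        return False


class HealthTracker:
    """Mark a server down after `fall` consecutive failures, up again after `rise` successes."""

    def __init__(self, fall=3, rise=2):
        self.fall, self.rise = fall, rise
        self.healthy = True
        self._streak = 0  # consecutive results that disagree with the current state

    def record(self, ok):
        if ok == self.healthy:
            self._streak = 0  # result agrees with current state, reset the streak
        else:
            self._streak += 1
            threshold = self.rise if ok else self.fall
            if self._streak >= threshold:
                self.healthy = ok
                self._streak = 0
        return self.healthy
```

Detection time is roughly the check interval multiplied by `fall`, so a 5-second interval with `fall=3` takes about 15 seconds to pull a dead server out of rotation.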
