Question 1

What is an auto-scaling decision tree?

Accepted Answer

It is a diagram that models when and how to scale infrastructure in response to load. It checks CPU, memory, latency, and cost, then decides whether to scale horizontally (more instances) or vertically (bigger instances). This makes the trade-offs between performance, cost, and complexity explicit.
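As a rough illustration, the tree can be read as a single function that walks the checks top to bottom. The sketch below is one possible reading, not the template's actual logic: the `Metrics` fields, the `hourly_budget` ceiling, the action labels, and the rule that a memory-bound service scales vertically are all assumptions made for the example. The thresholds are the example values quoted in this FAQ (CPU 70%, memory 80%, p99 500 ms).

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    cpu_pct: float          # average CPU utilization, 0-100
    memory_pct: float       # average memory utilization, 0-100
    p99_latency_ms: float   # 99th-percentile request latency
    hourly_cost: float      # current spend per hour
    hourly_budget: float    # hypothetical spend ceiling per hour


def decide(m: Metrics) -> str:
    """Walk the tree top to bottom and return one action label."""
    overloaded = m.cpu_pct > 70 or m.memory_pct > 80 or m.p99_latency_ms > 500
    if not overloaded:
        return "hold"
    if m.hourly_cost >= m.hourly_budget:
        # Over budget: look for the root cause before adding capacity.
        return "investigate"
    if m.memory_pct > 80 and m.cpu_pct <= 70:
        # Assumption: a memory-bound service benefits more from bigger instances.
        return "scale_up"
    # Default for CPU or latency pressure: add instances.
    return "scale_out"


print(decide(Metrics(cpu_pct=90, memory_pct=50, p99_latency_ms=200,
                     hourly_cost=10, hourly_budget=20)))
```

Returning a label instead of calling a cloud API keeps the decision separate from the action, so each branch of the tree can be checked on its own.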

Question 2

Why check cost before scaling?

Accepted Answer

Because scaling treats the symptom (high CPU) but may not fix the root cause (inefficient code or a slow database query). If cost is already rising, vertical scaling or code optimization may be cheaper than horizontal scaling. The decision tree forces teams to weigh these alternatives before adding instances.

Question 3

What metrics should trigger auto-scaling in production?

Accepted Answer

Use multiple signals: CPU above 70%, memory above 80%, or request latency above your SLA threshold (e.g., p99 > 500 ms). Scale before utilization reaches 100%, because new instances take time to start and traffic can be lost during that delay. For proactive scaling, act on predictions: if the trend suggests CPU will hit 90% within 5 minutes, scale now.
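A minimal sketch of how the reactive thresholds and the trend-based check might be combined. The function names, the 60-second sampling interval, and the straight-line extrapolation from the first to the last sample are assumptions for illustration; a production system would more likely rely on its autoscaler's built-in policies or a proper forecasting model.

```python
def predicted_cpu(samples: list[float], interval_s: float, horizon_s: float) -> float:
    """Extrapolate CPU linearly from the first to the last sample.

    Requires at least one sample; with a single sample the trend is flat.
    """
    if len(samples) < 2:
        return samples[-1]
    slope_per_s = (samples[-1] - samples[0]) / ((len(samples) - 1) * interval_s)
    return samples[-1] + slope_per_s * horizon_s


def should_scale(
    cpu_samples: list[float],
    memory_pct: float,
    p99_ms: float,
    interval_s: float = 60.0,
) -> bool:
    """Return True if any reactive threshold is crossed or the trend predicts trouble."""
    current_cpu = cpu_samples[-1]
    reactive = current_cpu > 70 or memory_pct > 80 or p99_ms > 500
    # Proactive: will CPU reach 90% within the next 5 minutes (300 s)?
    proactive = predicted_cpu(cpu_samples, interval_s, horizon_s=300) >= 90
    return reactive or proactive


# CPU is still below 70%, but it is climbing 6 points per minute.
print(should_scale([50, 56, 62, 68], memory_pct=40, p99_ms=200))
```

In this example no reactive threshold is crossed, yet the check still asks for a scale-out because the extrapolated CPU is well past 90% five minutes ahead.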

Question 4

How do I model pod eviction and graceful shutdown in this diagram?

Accepted Answer

After 'Scale out pods', add a decision: 'Draining existing pods?' If yes, route through graceful shutdown (wait for in-flight requests to complete before removing the pod). If no, force-terminate. Add a feedback loop: if scale-out fails due to resource limits, escalate to ops or alert the on-call engineer.
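The drain branch can be sketched as a wait loop with a deadline. This is an illustrative model only: `drain_pod`, `remove_pod`, the `in_flight` callback, and the "graceful"/"forced" labels are hypothetical names, and the `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting. In Kubernetes itself, the comparable behavior comes from SIGTERM followed by SIGKILL once `terminationGracePeriodSeconds` (30 seconds by default) has elapsed.

```python
import time
from typing import Callable


def drain_pod(
    in_flight: Callable[[], int],
    grace_period_s: float = 30.0,
    poll_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Wait for in-flight requests to finish, up to grace_period_s.

    Returns "graceful" if the pod emptied in time, "forced" otherwise.
    """
    deadline = clock() + grace_period_s
    while in_flight() > 0:
        if clock() >= deadline:
            return "forced"
        sleep(poll_s)
    return "graceful"


def remove_pod(draining: bool, in_flight: Callable[[], int], **kwargs) -> str:
    """Model the 'Draining existing pods?' decision node."""
    if not draining:
        return "forced"
    return drain_pod(in_flight, **kwargs)


# Simulated pod whose request count drops by one on every poll.
remaining = [3]

def fake_in_flight() -> int:
    remaining[0] = max(0, remaining[0] - 1)
    return remaining[0]

print(remove_pod(True, fake_in_flight, sleep=lambda s: None))
```

A "forced" result is a natural place to hook in the feedback loop from the diagram, for example by raising an alert when pods repeatedly fail to drain within the grace period.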

Auto-scaling decision tree
