Question 1

What is a service degradation strategy?

Accepted Answer

It's a plan to keep your system partially working when a critical dependency fails. Instead of returning an error to users, you degrade gracefully: serve cached data, disable certain features, or simplify the UI. Partial availability beats total unavailability, because some users can keep working while engineers fix the root cause.
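A minimal sketch of the "serve cached data" idea in Python, assuming the dependency is reachable through a single fetch function. `CachedFallback`, `Result`, and `flaky_fetch` are hypothetical names for illustration, not part of any library. The wrapper remembers the last good value per key and returns it, flagged as degraded, when the live call fails.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


@dataclass
class Result(Generic[T]):
    value: T
    degraded: bool  # True when served from the fallback cache, not the live dependency


class CachedFallback(Generic[T]):
    """Call a dependency; on failure, serve the last good value for the same key."""

    def __init__(self, fetch: Callable[[str], T]) -> None:
        self._fetch = fetch
        self._last_good: Dict[str, T] = {}

    def get(self, key: str) -> Result[T]:
        try:
            value = self._fetch(key)
        except Exception:
            if key in self._last_good:
                # Degrade: stale but usable data instead of an error.
                return Result(self._last_good[key], degraded=True)
            raise  # nothing cached for this key: surface the failure
        self._last_good[key] = value  # remember the latest good response
        return Result(value, degraded=False)


# Usage: a hypothetical backend that works once, then goes down.
calls = {"n": 0}

def flaky_fetch(key: str) -> str:
    calls["n"] += 1
    if calls["n"] > 1:
        raise ConnectionError("backend down")
    return f"profile:{key}"

svc = CachedFallback(flaky_fetch)
first = svc.get("alice")   # live response
second = svc.get("alice")  # backend down, served from cache with degraded=True
print(first, second)
```

Returning a `degraded` flag rather than silently swapping data lets the UI tell users they are seeing older information.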

Question 2

What are the most common fallback patterns?

Accepted Answer

Read-only mode (disable writes, serve existing data), cache fallback (serve the most recently cached response if the backend is slow or down), feature simplification (strip the feature down to its essentials), and circuit breaker (stop calling a dependency after N consecutive failures so it gets room to recover instead of extra load). Choose based on your failure scenario: if the database is overloaded, reduce traffic to it; if an external API is down, use cached results.
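Of these patterns, the circuit breaker has the most moving parts, so here is a minimal single-threaded sketch in Python. `CircuitBreaker`, `CircuitOpenError`, and the parameters `failure_threshold` and `reset_timeout` are hypothetical names chosen for illustration. After N consecutive failures the breaker opens and rejects calls without touching the backend. Once `reset_timeout` has elapsed it lets a trial call through (half-open), and that call's outcome decides whether it closes or opens again.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised instead of calling the dependency while the breaker is open."""


class CircuitBreaker:
    """Minimal single-threaded circuit breaker: closed -> open -> half-open."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds before a trial call is allowed
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"  # cool-down elapsed: let a trial call through
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            # Skip the call entirely so a struggling backend gets no extra load.
            raise CircuitOpenError("dependency marked unhealthy; use a fallback")
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self.state == "half-open" or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # (re)open and restart the cool-down
            raise
        # Any success closes the breaker and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result


# Usage with a fake clock so the example runs instantly.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])

def failing_backend() -> str:
    raise TimeoutError("backend timeout")

for _ in range(2):
    try:
        breaker.call(failing_backend)
    except TimeoutError:
        pass
print(breaker.state)                              # open
now[0] = 10.0
print(breaker.state)                              # half-open
print(breaker.call(lambda: "ok"), breaker.state)  # ok closed
```

Callers that catch `CircuitOpenError` can switch to one of the other patterns, such as a cached response. A multi-threaded service would also need a lock so that only one trial call runs in the half-open state.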

Question 3

How do I decide between graceful degradation and returning an error?

Accepted Answer

Degrade if users can still get value without the dependency. Return an error if the feature cannot work at all. A maps app that loses real-time traffic data can degrade to cached routes. A payment flow whose processor goes down cannot degrade, so you fail fast and escalate. Know your non-negotiables.
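One way to make that decision explicit in code is to tag each call as critical or not. In this minimal Python sketch, `call_with_policy`, `DependencyUnavailable`, and the example functions are all hypothetical. A non-critical call with a fallback degrades, and a critical call re-raises so the caller can fail fast.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class DependencyUnavailable(Exception):
    """Raised by a client when its dependency cannot be reached."""


def call_with_policy(primary: Callable[[], T], *, critical: bool,
                     fallback: Optional[Callable[[], T]] = None) -> T:
    try:
        return primary()
    except DependencyUnavailable:
        if critical or fallback is None:
            raise  # non-negotiable feature (or no fallback): fail fast
        return fallback()  # user still gets value: degrade


# Usage mirroring the maps vs. payments example above.
def live_traffic_route() -> str:
    raise DependencyUnavailable("traffic feed down")

def cached_route() -> str:
    return "route from cache (no live traffic)"

def charge_card() -> str:
    raise DependencyUnavailable("payment processor down")

route = call_with_policy(live_traffic_route, critical=False, fallback=cached_route)
print(route)
try:
    call_with_policy(charge_card, critical=True)
except DependencyUnavailable as exc:
    print("fail fast and escalate:", exc)
```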

Question 4

How long should I retry before giving up?

Accepted Answer

Exponential backoff is standard: wait 1s, 2s, 4s, 8s, and so on, capping each individual delay at a maximum such as 60 seconds or 5 minutes. Separately, bound the total time spent retrying. For most systems, about 30 seconds of retries is reasonable: if the dependency is still down after that, the outage is likely broader and needs manual intervention. Set that ceiling so endless retries don't mask a real problem.
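A minimal Python sketch of exponential backoff with both limits: a per-delay `cap` and a total `budget` on time spent waiting. `backoff_delays` and `retry` are hypothetical helpers, and the defaults (1s base, 60s cap, 30s budget) are simply the figures from the answer. `sleep` is injectable so the example runs without actually waiting.

```python
import time
from typing import Callable, Iterator, Tuple, Type, TypeVar

T = TypeVar("T")


def backoff_delays(base: float = 1.0, cap: float = 60.0,
                   budget: float = 30.0) -> Iterator[float]:
    """Yield base, 2*base, 4*base, ... (each capped at `cap`), stopping
    before the cumulative wait would exceed `budget`."""
    if base <= 0:
        raise ValueError("base must be positive")
    total, attempt = 0.0, 0
    while True:
        delay = min(cap, base * 2 ** attempt)
        if total + delay > budget:
            return  # retry budget exhausted
        yield delay
        total += delay
        attempt += 1


def retry(fn: Callable[[], T], *,
          retry_on: Tuple[Type[BaseException], ...] = (Exception,),
          sleep: Callable[[float], None] = time.sleep,
          **backoff_kwargs: float) -> T:
    delays = backoff_delays(**backoff_kwargs)
    while True:
        try:
            return fn()
        except retry_on:
            delay = next(delays, None)
            if delay is None:
                raise  # give up and surface the real failure
            sleep(delay)


# Usage: a hypothetical dependency that succeeds on the third attempt.
slept = []
attempts = {"n": 0}

def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("not yet")
    return "connected"

print(retry(flaky, sleep=slept.append))  # connected
print(slept)                             # [1.0, 2.0]
```

With the defaults, the schedule is 1, 2, 4, and 8 seconds. The next delay (16s) would push the cumulative wait past 30 seconds, so the caller gets the last error instead.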
