Multi-provider API integration with fallbacks
Try primary provider, fall back to secondary, log failures for audit.
Most applications depend on at least one external service: email (SendGrid, Mailgun), payments (Stripe, Square), SMS (Twilio, Bandwidth), or analytics (Mixpanel, Segment). If that provider has an outage, your feature breaks. A multi-provider strategy says: try your primary provider, and if it fails, immediately fall back to a secondary. This diagram shows the happy path (primary succeeds), the fallback path (primary fails, secondary succeeds), and the failure path (both fail). Every fallback is logged so you can audit which provider handled each request and detect chronically unhealthy providers.
The key insight is that fallback must be transparent to your application code. Your feature
code should not know whether the email went through SendGrid or Mailgun; it just calls sendEmail()
and gets back success or failure. The provider-switching logic lives in the integration layer, not
scattered through your business logic.
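As a rough illustration of that separation, here is a minimal Python sketch. It assumes each provider is wrapped in a function that either returns or raises a `ProviderError`; the names `send_with_fallback`, `ProviderError`, `SendResult`, and the two stand-in providers are hypothetical, and `send_email` plays the role of the `sendEmail()` call mentioned above.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable, Optional


class ProviderError(Exception):
    """Raised by a provider wrapper when a send fails or times out (assumed convention)."""


@dataclass
class SendResult:
    ok: bool
    provider: Optional[str]  # which provider handled the request, if any


# Hypothetical in-memory audit log; a real system would use durable storage.
audit_log: list[dict] = []

Sender = Callable[[dict], None]  # sends a message or raises ProviderError


def send_with_fallback(message: dict, providers: list[tuple[str, Sender]]) -> SendResult:
    """Try each provider in order and record every attempt in the audit log."""
    for name, send in providers:
        started = time.monotonic()
        try:
            send(message)
        except ProviderError as exc:
            audit_log.append({"provider": name, "ok": False, "error": str(exc),
                              "elapsed_s": time.monotonic() - started})
            continue  # fall through to the next provider
        audit_log.append({"provider": name, "ok": True, "error": None,
                          "elapsed_s": time.monotonic() - started})
        return SendResult(ok=True, provider=name)
    return SendResult(ok=False, provider=None)  # every provider failed


def failing_primary(message: dict) -> None:
    raise ProviderError("primary timed out")  # simulated outage


def working_secondary(message: dict) -> None:
    return None  # pretend the message was accepted


def send_email(message: dict) -> SendResult:
    """The only function feature code calls; it never names a provider."""
    return send_with_fallback(message, [("primary", failing_primary),
                                        ("secondary", working_secondary)])


result = send_email({"to": "user@example.com", "subject": "Welcome"})
```

Feature code only sees `result.ok`; which provider served the request is recorded in `audit_log` rather than leaking into business logic.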
When to use this template
- Integration architecture reviews — map which external services you depend on and which ones have fallbacks in place; prioritize adding fallbacks to single points of failure.
- Incident response — during a provider outage, this diagram shows which services gracefully degraded (fallback kicked in) and which went dark.
- Onboarding new services — when you add a new payment gateway or SMS provider, use this as the pattern: primary + secondary + audit log.
How to adapt it
Customize for your specific providers and failure modes:
- Replace "Primary Provider" and "Secondary Provider" with your actual services (Stripe/Square, SendGrid/Mailgun, Twilio/Bandwidth).
- Add a timeout decision: if the primary takes more than 5 seconds, do you fail over or keep waiting? Choose based on your latency budget.
- Extend the audit log with your tracking fields: customer ID, request type, timestamp, provider used, response time.
- Add rate-limit handling: some providers return rate-limit errors; should you queue the request for later or fail over immediately?
- Include webhook reconciliation: if the primary provider delivers events late or drops them, do you periodically sync with the secondary to catch missed events?
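The timeout, rate-limit, and audit-field adaptations above could be sketched roughly as follows. This is one possible shape, not a prescribed one: the `RateLimited` exception, the `AuditRecord` fields, and the `NEXT_STEP` policy (timeouts and hard errors fail over, rate limits are queued) are all assumptions chosen for illustration.

```python
from __future__ import annotations

import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from dataclasses import dataclass
from typing import Callable


class RateLimited(Exception):
    """Hypothetical error a provider wrapper raises when the provider throttles a call."""


@dataclass
class AuditRecord:
    customer_id: str
    request_type: str
    timestamp: float        # wall-clock time of the attempt
    provider: str
    response_time_s: float
    outcome: str            # "ok", "timeout", "rate_limited", or "error"


def call_with_budget(fn: Callable[[], object], budget_s: float) -> str:
    """Run fn, stop waiting after budget_s seconds, and return a coarse outcome label."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        future.result(timeout=budget_s)
        return "ok"
    except FutureTimeout:
        # The abandoned call may still complete on the provider side,
        # so requests that are retried elsewhere should be idempotent.
        return "timeout"
    except RateLimited:
        return "rate_limited"
    except Exception:
        return "error"
    finally:
        pool.shutdown(wait=False)  # do not block on a call that is still running


# Assumed policy: timeouts and hard errors fail over now; rate limits are queued.
NEXT_STEP = {"ok": "done", "timeout": "fail_over",
             "error": "fail_over", "rate_limited": "queue"}


def attempt(customer_id: str, request_type: str, provider: str,
            fn: Callable[[], object], budget_s: float = 5.0) -> tuple[AuditRecord, str]:
    """Make one provider call and return its audit record plus the next step."""
    started = time.monotonic()
    outcome = call_with_budget(fn, budget_s)
    record = AuditRecord(customer_id, request_type, time.time(), provider,
                         time.monotonic() - started, outcome)
    return record, NEXT_STEP[outcome]


def slow_call() -> None:
    time.sleep(0.2)  # simulated slow provider


def throttled_call() -> None:
    raise RateLimited("provider returned a rate-limit error")


slow_record, slow_next = attempt("cust-1", "email", "primary", slow_call, budget_s=0.05)
throttled_record, throttled_next = attempt("cust-1", "email", "primary", throttled_call)
```

Keeping the policy in a single table makes the "fail over or queue" decision easy to review and change per provider pair.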
Visual edits regenerate clean Mermaid code as you adapt this pattern to your provider pairs.
Mermaid code
Copy it anywhere Mermaid is supported — GitHub, Notion, or your docs.
sequenceDiagram
    participant App
    participant Primary as Primary Provider
    participant Secondary as Secondary Provider
    participant Log as Audit Log
    App->>Primary: Request (email/payment/analytics)
    alt Primary succeeds
        Primary-->>App: Success
        App->>App: Return result to user
    else Primary fails
        Primary--xApp: Error or timeout
        App->>Log: Log primary failure
        Note over App: Fallback triggered
        App->>Secondary: Retry with same request
        alt Secondary succeeds
            Secondary-->>App: Success
            App->>Log: Log fallback success
            App->>App: Return result to user
        else Secondary fails
            Secondary--xApp: Error or timeout
            App->>Log: Log fallback failure<br/>Alert ops
            App->>App: Return error to user
        end
    end
Frequently asked questions
- Why use a multi-provider strategy for critical services?
- A single provider is a single point of failure. Email providers, payment gateways, SMS services, and analytics platforms all have outages. Multi-provider fallback means when SendGrid goes down, you switch to Mailgun. When Stripe has an incident, you try Square. Your users keep working while the primary provider recovers. The tradeoff: you pay for two services and must handle subtle behavioral differences.
- What should I log and audit about fallback events?
- Log every fallback: when it happened, which primary failed, which secondary succeeded (or also failed), how long it took, and the request details (scrubbed of sensitive data). This audit trail helps you: (1) detect if a provider is chronically unhealthy, (2) investigate customer issues (which provider handled their request?), and (3) decide if you should renegotiate or switch providers. Over time, the data reveals which provider is more reliable.
- How do I handle responses that differ between providers?
- Normalize the response format before returning to your application code. If the primary provider returns error code 401 and the secondary returns 403, map both to your app's 'unauthorized' response. If the primary returns rate-limit headers and the secondary doesn't, handle both gracefully. Wrap each provider's client with an adapter that translates its quirks into your canonical interface, so your application code never knows which provider handled the request.
- What about async operations like email or webhooks?
- Same logic, except that the fallback can be deferred: try the primary, and if it returns an error or times out, hand the request to the secondary provider. For email, send immediately to the primary; if it fails, fall back to the secondary in the same request. For webhooks, if the primary is down, queue the payload and retry through the secondary on a schedule. The fallback path might therefore be async (queue → retry later) while the primary path is synchronous (return immediately). Log both paths so you can audit which provider handled each request.
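The last two answers, response normalization and a queued fallback, might look something like the Python sketch below. The status-code tables, the `CanonicalResult` type, and the stand-in `call_primary` / `call_secondary` functions are hypothetical; real providers will need their own mappings.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class CanonicalResult:
    provider: str
    status: str  # "ok", "unauthorized", "rate_limited", or "error"


# Hypothetical per-provider mappings from HTTP status codes to canonical statuses.
STATUS_TABLES = {
    "primary": {200: "ok", 401: "unauthorized", 429: "rate_limited"},
    "secondary": {200: "ok", 403: "unauthorized"},
}


def normalize(provider: str, http_status: int) -> CanonicalResult:
    """Translate a provider-specific status code into the app's canonical status."""
    return CanonicalResult(provider, STATUS_TABLES[provider].get(http_status, "error"))


# Stand-ins for real HTTP calls: each returns the status code the provider sent.
def call_primary(payload: dict) -> int:
    return 503  # simulated outage


def call_secondary(payload: dict) -> int:
    return 200


retry_queue: deque = deque()             # payloads waiting for the secondary provider
audit_log: list[CanonicalResult] = []    # one entry per attempt, on either path


def deliver(payload: dict) -> CanonicalResult:
    """Synchronous path: try the primary and queue the payload if it fails."""
    result = normalize("primary", call_primary(payload))
    audit_log.append(result)
    if result.status != "ok":
        retry_queue.append(payload)  # deferred fallback
    return result


def drain_retry_queue() -> list[CanonicalResult]:
    """Async path, meant to run on a schedule: send queued payloads via the secondary."""
    results = []
    for _ in range(len(retry_queue)):  # one pass over what is queued right now
        payload = retry_queue.popleft()
        result = normalize("secondary", call_secondary(payload))
        audit_log.append(result)
        results.append(result)
        if result.status != "ok":
            retry_queue.append(payload)  # keep it for the next scheduled run
    return results
```

Calling `deliver(...)` during the simulated outage leaves one payload in `retry_queue`; a later `drain_retry_queue()` sends it through the secondary, and both attempts appear in `audit_log`.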