Event-driven notification system
Event publisher, queue, consumers, and retry/failure handling.
Notification systems are where silent failures hide. A signup email never arrives, a payment alert disappears, or analytics loses a day of data, all while the main action (sign-up, purchase, upload) succeeds. This template shows a resilient event-driven architecture: the user action publishes to a queue, multiple consumers (email, SMS, analytics) read independently, and failed sends retry with backoff before moving to a dead-letter queue (DLQ) for team review.
The three consumer branches show fan-out: one event triggers email, SMS, and analytics in parallel, each with its own retry and failure path. This pattern lets you add new consumers later without changing the publisher.
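The fan-out described above can be sketched with a toy in-memory broker. This is only an illustration, not how any particular queue product works: `FanOutBroker`, `subscribe`, `publish`, and `drain` are hypothetical names, and a real system would delegate this to something like a topic with one subscription per consumer.

```python
from collections import defaultdict
from typing import Callable

Event = dict
Handler = Callable[[Event], None]


class FanOutBroker:
    """Toy in-memory broker (hypothetical): each consumer gets its own queue."""

    def __init__(self) -> None:
        self._queues: dict[str, list[Event]] = defaultdict(list)
        self._handlers: dict[str, Handler] = {}

    def subscribe(self, name: str, handler: Handler) -> None:
        # Adding a consumer never touches the publisher's code.
        self._handlers[name] = handler

    def publish(self, event: Event) -> None:
        # The publisher does not know who is listening; the broker copies
        # the event into every registered consumer's queue.
        for name in self._handlers:
            self._queues[name].append(dict(event))

    def drain(self, name: str) -> int:
        """Deliver pending events to one consumer; return how many were handled."""
        queue, handler = self._queues[name], self._handlers[name]
        handled = 0
        while queue:
            handler(queue.pop(0))
            handled += 1
        return handled


# Usage: three consumers receive the same event independently.
broker = FanOutBroker()
inboxes = {"email": [], "sms": [], "analytics": []}
for consumer_name, inbox in inboxes.items():
    broker.subscribe(consumer_name, inbox.append)
broker.publish({"type": "user.signed_up", "user_id": 42})
broker.drain("email")  # email is processed; sms and analytics stay queued
```

Because each consumer drains its own queue, a slow or crashed SMS consumer does not hold up email or analytics.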
When to use this template
- Designing notification infrastructure — decide early whether events are transient or persistent, and whether old messages should retry after a consumer comes back online.
- Incident response — when a notification type fails, trace the path from publish to the DLQ and decide: is it a transient infrastructure failure (retry) or a permanent bug (deploy a fix)?
- Onboarding new consumers — use this diagram to show where a new notification type (Slack alerts, webhooks) plugs in and what failure modes to handle.
How to adapt it
Rename the queue and consumers to your real ones (for example, RabbitMQ or Kafka for the queue, an SQS-triggered Lambda for email, PagerDuty for alerts), then layer in your domain-specific logic:
- Add priority tiers so critical events (payment failures) bypass the queue and call a service synchronously, while analytics uses best-effort async.
- Insert message deduplication if your producer can publish the same event twice. Because most queues guarantee at-least-once rather than exactly-once delivery, deduplicating on a stable event ID is what gives you effectively-once processing.
- Add consumer concurrency limits per user or order to show ordering constraints (for example, make sure invoice emails arrive before refund emails).
Visual edits regenerate clean code, so you can sketch queue partitions and consumer dependencies without writing Mermaid syntax directly.
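The deduplication step from the list above could look roughly like the following. This is a minimal sketch under two assumptions that are not in the template: every event carries a stable `event_id` field, and an in-memory set stands in for a shared store (in production you would likely want something durable with an expiry). `DedupingConsumer` is a hypothetical name.

```python
from typing import Callable


class DedupingConsumer:
    """Wraps a handler so a redelivered event ID is processed only once."""

    def __init__(self, handler: Callable[[dict], None]) -> None:
        self._handler = handler
        # Assumption: an in-memory set stands in for a shared, durable store.
        self._seen: set[str] = set()

    def handle(self, event: dict) -> bool:
        """Return True if the event was processed, False if it was a duplicate."""
        event_id = event["event_id"]  # assumed field; not defined by the template
        if event_id in self._seen:
            return False  # duplicate delivery: skip the side effect
        self._handler(event)
        # Record the ID only after the handler succeeds, so a failed send
        # is still eligible for retry.
        self._seen.add(event_id)
        return True
```

Recording the ID after the handler runs trades a small risk of a double send (crash between send and record) for never losing a notification; the opposite order makes the opposite trade.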
Mermaid code
Copy it anywhere Mermaid is supported — GitHub, Notion, or your docs.
flowchart TD
A[User action triggers event] --> B[Publish to event queue]
B --> C{Event enqueued?}
C -->|No| D[Log publish failure]
D --> E[Alert & retry]
C -->|Yes| F[Email consumer reads event]
C -->|Yes| G[SMS consumer reads event]
C -->|Yes| H[Analytics consumer reads event]
F --> I{Send email}
G --> J{Send SMS}
H --> K{Record event}
I -->|Failed| L[Retry with backoff]
J -->|Failed| L
K -->|Failed| L
I -->|Success| M[Mark complete]
J -->|Success| M
K -->|Success| M
L --> N{Max retries exceeded?}
N -->|No| O[Re-enqueue after delay]
N -->|Yes| P[Move to dead-letter queue]
O -->|Email| F
O -->|SMS| G
O -->|Analytics| H
P --> Q[Alert team for manual review]
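The retry branch of the flowchart (retry with backoff, check max retries, then dead-letter) can be sketched in a few lines. This is an illustrative simplification: it sleeps in-process instead of re-enqueueing with a delay, the function names and defaults (`max_retries=3`, a 1-second base, a 60-second cap) are assumptions, and the dead-letter queue is just a list.

```python
import time
from typing import Callable


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff: base * 2**attempt seconds, capped at `cap`."""
    return min(cap, base * 2 ** attempt)


def process_with_retry(
    event: dict,
    send: Callable[[dict], None],
    dead_letters: list,
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try `send`; retry with backoff; dead-letter once retries are exhausted."""
    for attempt in range(max_retries + 1):
        try:
            send(event)
            return True  # "Mark complete" in the diagram
        except Exception as exc:
            if attempt == max_retries:
                # "Move to dead-letter queue": keep the event and the last
                # error so the team can review it.
                dead_letters.append(
                    {"event": event, "error": repr(exc), "attempts": attempt + 1}
                )
                return False
            # Stand-in for "Re-enqueue after delay".
            sleep(backoff_delay(attempt))
    return False
```

Passing `sleep` as a parameter keeps the function testable without real waiting. Many production setups also add random jitter to the delay so that consumers recovering from the same outage do not all retry at once.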
Frequently asked questions
- What is an event-driven notification system?
- It's an architecture where user actions (sign up, purchase, upload file) publish events to a queue, and multiple independent consumer services (email, SMS, analytics) subscribe to those events. The queue decouples the action from the notifications, so placing an order does not block on sending email — and if email is down, the order still succeeds.
- Why use a queue instead of calling the notification service directly?
- Direct calls are synchronous and brittle: if the email service is slow or down, your main service hangs or fails. A queue is asynchronous and resilient: the publisher only waits for the queue to accept the event, consumers process events at their own speed, and if a consumer crashes, messages wait in the queue until it recovers.
- What is a dead-letter queue and why is it important?
- A dead-letter queue (DLQ) is where messages go once they have failed processing N times. It separates transient failures (the email provider was overloaded and a retry succeeded) from permanent ones (a malformed event or a consumer bug). Without a DLQ, messages either get dropped silently or block the queue forever.
- How do I extend this diagram for priority queues or ordered events?
- Add a priority check after the publish step: high-priority events (payment failures) go to an expedited queue, and low-priority events (analytics) go to the batch queue. For ordering, add a per-user or per-entity queue partition so events for the same resource process in order. Visual edits regenerate clean Mermaid code, so you can sketch these variations without writing Mermaid syntax.
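The priority check and per-entity partitioning from the last answer might be sketched like this. The queue names (`expedited`, `batch`), the event type `payment.failed`, and both function names are hypothetical, and hashing a key to a partition is one common approach rather than the only one.

```python
import hashlib

# Hypothetical event type names; adjust to your own schema.
HIGH_PRIORITY_TYPES = {"payment.failed"}


def choose_queue(event: dict) -> str:
    """Route high-priority events to the expedited queue, the rest to batch."""
    return "expedited" if event["type"] in HIGH_PRIORITY_TYPES else "batch"


def partition_for(key: str, num_partitions: int) -> int:
    """Map a user or entity key to a partition, stable across processes."""
    # hashlib is used instead of the built-in hash(), which is salted per
    # process for strings and would scatter the same key across partitions.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# Usage: every event for user-42 lands on the same partition, so a single
# consumer per partition sees them in publish order.
queue_name = choose_queue({"type": "payment.failed", "user_id": "user-42"})
partition = partition_for("user-42", num_partitions=8)
```

Ordering then holds only within a partition, which is usually what per-user or per-order constraints need.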