Data pipeline (ETL)
Extract, transform, and load with quality checks.
Every data pipeline diagram shows the happy path; the trustworthy ones also show where bad data goes. This template includes both: extract from source systems, a schema gate, transform and enrich, a quality gate, and the load into the warehouse — with both gates routing failures to a dead-letter queue instead of letting them vanish. The fan-out at the end, to dashboards and ML features, documents why the pipeline exists at all.
The shapes carry meaning too: cylinders for data stores, a subroutine block for the dead-letter queue, diamonds for the validation gates. Consistent notation lets a reader separate storage from processing without reading a single label.
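As a quick legend, the shape notation described above can be sketched in isolation. The node IDs and labels here are placeholders chosen for illustration, not part of the template:

```mermaid
flowchart LR
    %% Legend only: IDs and labels are placeholders
    S[(Data store)] --> P[Processing step]
    P --> G{Validation gate}
    G -->|Fail| Q[[Dead-letter queue]]
```

Square brackets wrapped around parentheses draw a cylinder, plain square brackets a rectangle, curly braces a diamond, and doubled square brackets a subroutine block.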
When to use this template
- Pipeline architecture reviews: agree on where validation happens and what failure handling looks like before choosing orchestrators or writing transformation code.
- Data quality post-incident reviews: when bad data reaches a dashboard, trace its path through this diagram. The gate it slipped through is the check you need to strengthen.
- Explaining the platform to stakeholders: analysts and ML engineers see exactly where their dashboards and features sit downstream, and why a source schema change can break them.
How to adapt it
Substitute your real systems for the generic nodes, then reflect your actual architecture:
- Split "Source systems" into named sources — production database, event stream, third-party APIs — each with its own extract path.
- Add a staging or lake layer between extract and transform if you land raw data before processing.
- Attach a replay path from the dead-letter queue back into transform, or back to the schema gate if replayed records should be revalidated, to document how quarantined records re-enter after a fix.
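One way the three adaptations in this list might look when combined is sketched below. The source names, the "Raw staging" store, the node IDs, and the dotted replay edge are illustrative assumptions, not part of the original template:

```mermaid
flowchart LR
    %% Source and staging names are placeholders; substitute your own systems
    A1[(Production DB)] --> B1[Extract DB]
    A2[(Event stream)] --> B2[Extract events]
    A3[(Third-party APIs)] --> B3[Extract APIs]
    B1 --> S[(Raw staging)]
    B2 --> S
    B3 --> S
    S --> C{Schema valid?}
    C -->|No| D[[Dead-letter queue]]
    C -->|Yes| E[Transform + enrich]
    E --> F{Quality checks pass?}
    F -->|No| D
    F -->|Yes| G[(Data warehouse)]
    %% Dotted edge marks the replay path as out-of-band
    D -.->|Replay after fix| E
    G --> H[Dashboards]
    G --> I[ML features]
```

A replay edge that points at Transform skips the schema gate. If replayed records should be revalidated, point the edge at the "Schema valid?" node instead.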
Rearranging stages is drag-and-drop in the visual editor, and visual edits regenerate clean Mermaid code — so the architecture diagram in your data platform docs stays as reviewable as the pipeline code itself.
Mermaid code
Copy it anywhere Mermaid is supported — GitHub, Notion, or your docs.
```mermaid
flowchart LR
    A[(Source systems)] --> B[Extract]
    B --> C{Schema valid?}
    C -->|No| D[[Dead-letter queue]]
    C -->|Yes| E[Transform + enrich]
    E --> F{Quality checks pass?}
    F -->|No| D
    F -->|Yes| G[(Data warehouse)]
    G --> H[Dashboards]
    G --> I[ML features]
```
Frequently asked questions
- What is an ETL pipeline diagram?
- It shows how data flows from source systems through extract, validation, transform, and quality-check stages into a warehouse, and then fans out to consumers like dashboards and ML features. The two validation gates with a shared dead-letter queue are the key detail — they document that bad records are quarantined for inspection rather than silently dropped or loaded.
- What do the different node shapes mean in this Mermaid flowchart?
- The cylinder shape, written as A[(Source systems)], is Mermaid's database notation and marks data stores (here the sources and the warehouse). The double-bracket subroutine shape, D[[Dead-letter queue]], marks a distinct subsystem with its own handling process. The diamond, written as C{Schema valid?}, marks a decision, and plain rectangles such as B[Extract] mark processing steps. Using shapes consistently lets readers distinguish storage, processing, and decisions at a glance without reading every label.
- Why route failures to a dead-letter queue instead of dropping them?
- Dropped records disappear; dead-lettered records leave evidence. When schema validation or quality checks fail, quarantining the record preserves it for debugging, replay after a fix, and volume monitoring. A sudden spike in dead-letter traffic is often the first signal that an upstream system changed its schema. Routing both gates to one queue keeps that monitoring in a single place.
- Does this diagram work for ELT and streaming pipelines too?
- Yes, with small edits. For ELT, move the transform node after the warehouse to reflect in-warehouse transformation with a tool like dbt. For streaming, relabel extract as the ingestion topic and add a stream processor before the quality gate — the dead-letter pattern and the fan-out to dashboards and ML features carry over unchanged.
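The ELT edit described in the last answer could look roughly like the sketch below. It makes assumptions beyond the original: the quality gate moves along with the transform node, a hypothetical "Modeled tables" store holds the transformed output, and consumers read from that store rather than from the raw warehouse tables.

```mermaid
flowchart LR
    A[(Source systems)] --> B[Extract]
    B --> C{Schema valid?}
    C -->|No| D[[Dead-letter queue]]
    C -->|Yes| G[(Data warehouse)]
    %% Transform now runs inside the warehouse
    G --> E[Transform in warehouse]
    E --> F{Quality checks pass?}
    F -->|No| D
    %% "Modeled tables" is a placeholder for the transformed layer
    F -->|Yes| M[(Modeled tables)]
    M --> H[Dashboards]
    M --> I[ML features]
```

If your quality checks run on raw data before loading, keep the gate ahead of the warehouse and move only the transform node.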