Zero-downtime database migration

Dual-write, shadow-read, backfill, and cutover strategy for production data.

Migrating a production database schema with zero downtime is a rite of passage in infrastructure engineering. This template maps the standard playbook: create the new schema, add dual-write code so all writes go to both old and new, backfill historical data in batches, shadow-read both schemas to verify they match, cut over to the new schema, roll back if monitoring shows errors, and remove dual-write once the new schema is stable.

The key insight is that you never shut down. Instead, you run both schemas in parallel, prove they match, and only then switch. If something goes wrong at any step, you roll back to the old schema, which dual-write has kept current, so no writes are lost.
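
As a rough illustration of the dual-write step, here is a minimal Python sketch. It assumes both stores can be modeled as dictionaries; `DualWriteStore`, `split_name`, and the name-splitting schema change are hypothetical and not part of the template.

```python
import logging

logger = logging.getLogger("migration")


class DualWriteStore:
    """Writes go to both stores; reads stay on the old store until cutover."""

    def __init__(self, old_store, new_store, transform):
        self.old_store = old_store
        self.new_store = new_store
        self.transform = transform  # maps an old-schema row to the new schema
        self.failed_keys = set()    # keys whose new-schema write failed

    def write(self, key, row):
        # The old schema is still the source of truth, so this write comes first.
        self.old_store[key] = row
        try:
            self.new_store[key] = self.transform(row)
            self.failed_keys.discard(key)
        except Exception:
            # A failure on the new side must not fail the caller's request;
            # record the key so a later repair pass can fix it.
            logger.exception("dual-write to new schema failed for key %r", key)
            self.failed_keys.add(key)

    def read(self, key):
        return self.old_store.get(key)


def split_name(row):
    # Hypothetical schema change: one "name" column becomes two.
    first, last = row["name"].split(" ", 1)
    return {"first_name": first, "last_name": last}


old, new = {}, {}
store = DualWriteStore(old, new, split_name)
store.write(1, {"name": "Ada Lovelace"})
store.write(2, {"name": "Plato"})  # transform raises: there is no space to split on
```

Swallowing the new-side error is a deliberate choice in this sketch: until cutover the old schema is the source of truth, so a bug in the new path should show up in logs and in `failed_keys` rather than in user-facing errors.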

When to use this template

  • Planning a schema change: before you start coding the migration, walk through this diagram with your team so everyone agrees on the backfill strategy (batch size, shadow-read window, rollback triggers).
  • On-call runbook: keep this diagram handy during the cutover so you know exactly what to do if shadow-read reports a mismatch or reads start erroring.
  • Compliance and incident reviews: show auditors your migration process and the checks at each step (dual-write verification, shadow-read comparison, rollback path) that guard against inconsistent or unavailable data.

How to adapt it

Customize the timeline and thresholds to your specific migration:

  • Add data validation checks (row counts, checksums on sensitive columns) between shadow-read and cutover to catch subtle bugs like timezone mismatches or encoding drift.
  • Insert a canary cutover (route 1% of traffic to the new schema, wait 1 hour, then 10%, then 100%) instead of an all-at-once switch, so slow bugs surface before they hit everyone.
  • Extend rollback to show alerting and escalation: if rollback is triggered, page the on-call database engineer immediately so they can investigate why the new schema failed.
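
The first two adaptations could look roughly like the Python sketch below. It is an illustration under assumptions, not a prescribed implementation: `table_fingerprint`, `bucket_for`, and `reads_from_new_schema` are hypothetical names, rows are assumed to carry an `id` key, and both schemas are assumed to have been projected onto a comparable row shape before fingerprinting.

```python
import hashlib
import json


def table_fingerprint(rows):
    """Return (row_count, checksum) over rows taken in a canonical order."""
    digest = hashlib.sha256()
    count = 0
    for row in sorted(rows, key=lambda r: r["id"]):
        # sort_keys makes the serialization independent of dict key order.
        digest.update(json.dumps(row, sort_keys=True, default=str).encode("utf-8"))
        count += 1
    return count, digest.hexdigest()


def bucket_for(user_id):
    """Map a user to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def reads_from_new_schema(user_id, rollout_percent):
    """True if this user falls inside the current canary percentage."""
    return bucket_for(user_id) < rollout_percent


old_rows = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": "b@example.com"}]
new_rows = [{"id": 2, "email": "b@example.com"}, {"id": 1, "email": "a@example.com"}]
schemas_match = table_fingerprint(old_rows) == table_fingerprint(new_rows)

# Assumed ramp from the bullet above: 1%, then 10%, then 100%.
for percent in (1, 10, 100):
    routed = sum(reads_from_new_schema(f"user-{i}", percent) for i in range(1000))
    print(f"{percent}% stage routes {routed} of 1000 sample users to the new schema")
```

Hashing the user ID rather than picking users at random keeps each user on the same side of the split as the percentage grows, so anyone in the 1% stage is still in the 10% stage.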

Visual edits regenerate clean Mermaid, so you can sketch your migration timeline and share it with database, backend, and DevOps teams before you start writing code.

Mermaid code

Copy it anywhere Mermaid is supported — GitHub, Notion, or your docs.

flowchart TD
    A[Start: Old schema live] --> B[Create new schema]
    B --> C[Add dual-write code]
    C --> D{Dual-write verified?}
    D -->|No| E[Debug dual-write]
    E --> C
    D -->|Yes| F[Begin backfill]
    F --> G[Copy old data to new schema]
    G --> H{Backfill complete?}
    H -->|No| I[Continue backfill batches]
    I --> G
    H -->|Yes| J[Add shadow-read code]
    J --> K[Read from both, compare]
    K --> L{Reads match?}
    L -->|No| M[Debug data mismatch]
    M --> J
    L -->|Yes| N[Cutover: switch to new]
    N --> O[Monitor new schema]
    O --> P{Errors detected?}
    P -->|Yes| Q[Rollback to old schema]
    Q --> M
    P -->|No| R[Cleanup: remove dual-write]
    R --> S[End: New schema only]

Frequently asked questions

What is a zero-downtime database migration?
A zero-downtime migration moves data from an old database schema to a new one without shutting down your service. The trick is to run both schemas in parallel: your code writes to both (dual-write), a batched backfill copies historical rows, and reads keep coming from the old schema while you verify that the new schema returns the same results (shadow-read). Once they align, you switch reads to the new schema and clean up the old one. If errors appear after the switch, you roll back to the old schema, which dual-write has kept current.
Why can't we just dump the old database and load the new one?
Because your service is running and users are actively writing data. If you pause writes for the duration of the copy, you have downtime; if you dump while writes continue, the copy misses every write that lands after the snapshot. A one-shot dump and load therefore either loses data or requires downtime. Dual-write closes that gap: new writes reach both schemas while the backfill copies historical rows.
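
A batched backfill that coexists with dual-write might look like the following sketch, using Python's built-in `sqlite3` module. The `users_old` and `users_new` tables, the name-splitting transformation, and the tiny batch size are assumptions made for illustration only.

```python
import sqlite3


def backfill(conn, batch_size=2):
    """Copy users_old into users_new in primary-key order; return the batch count."""
    last_id = 0
    batches = 0
    while True:
        rows = conn.execute(
            "SELECT id, name FROM users_old WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return batches
        for row_id, name in rows:
            first, _, last = name.partition(" ")
            # INSERT OR IGNORE skips rows that dual-write already created,
            # so the backfill never overwrites a fresher write.
            conn.execute(
                "INSERT OR IGNORE INTO users_new (id, first_name, last_name) "
                "VALUES (?, ?, ?)",
                (row_id, first, last),
            )
        conn.commit()  # one short transaction per batch keeps lock time low
        last_id = rows[-1][0]
        batches += 1


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users_old (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE users_new (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    INSERT INTO users_old VALUES (1, 'Ada Lovelace'), (2, 'Alan Turing'),
                                 (3, 'Grace Hopper'), (4, 'Edsger Dijkstra'),
                                 (5, 'Barbara Liskov');
    -- Row 3 already exists in users_new via dual-write. users_old holds an
    -- older value here to stand in for a stale read by the backfill.
    INSERT INTO users_new VALUES (3, 'Grace', 'Murray Hopper');
""")
batches = backfill(conn)
```

Paging by primary key (`WHERE id > ?`) instead of `OFFSET` keeps each batch query cheap on large tables, and the insert-if-absent rule relies on the dual-write path being the one that applies updates to rows that already exist in the new schema.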
What does 'shadow-read' mean and why is it critical?
Shadow-read means reading from both the old and new schemas, comparing the results, and alerting if they differ, while callers still receive the old result. This is your safety net: if the backfill missed rows or the schema transformation lost data, shadow-read catches it before you commit. Running shadow-read for a fixed window before cutover (for example 30 minutes, tuned to your traffic) catches most bugs that would otherwise turn into customer-visible data loss.
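
The comparison described above can be sketched in a few lines of Python. `ShadowReader` and its callables are hypothetical, and the stores are plain dictionaries for the sake of a runnable example.

```python
class ShadowReader:
    """Serves reads from the old store and compares them against the new one."""

    def __init__(self, read_old, read_new, normalize_new):
        self.read_old = read_old
        self.read_new = read_new
        self.normalize_new = normalize_new  # maps a new-schema row to the old shape
        self.total = 0
        self.mismatches = []

    def read(self, key):
        old_row = self.read_old(key)
        self.total += 1
        try:
            new_row = self.read_new(key)
            comparable = None if new_row is None else self.normalize_new(new_row)
            if comparable != old_row:
                self.mismatches.append((key, old_row, comparable))
        except Exception as exc:
            # An error on the shadow path counts as a mismatch, never as a failed read.
            self.mismatches.append((key, old_row, repr(exc)))
        return old_row  # callers always get the old result

    def mismatch_rate(self):
        return len(self.mismatches) / self.total if self.total else 0.0


old = {1: {"name": "Ada Lovelace"}, 2: {"name": "Alan Turing"}}
new = {1: {"first_name": "Ada", "last_name": "Lovelace"}}  # row 2 missed by backfill

reader = ShadowReader(
    old.get,
    new.get,
    lambda r: {"name": f"{r['first_name']} {r['last_name']}"},
)
first = reader.read(1)
second = reader.read(2)
```

In this sketch the mismatch list and rate are what you would feed into alerting; a cutover gate could then require the rate to stay at zero for the whole shadow-read window.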
What happens if the cutover fails and I need to roll back?
Rollback is a read-path switch: flip your code back to reading from the old schema, keep dual-writing to both, and investigate why the new schema differs. Once you fix the issue (usually a data mismatch or a schema bug), resume shadow-read verification, then retry cutover. This safety net is why zero-downtime migrations take longer than a one-shot dump and load, but they never take the system down.
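
One way to make the rollback a single switch is to keep the read path behind a runtime flag, as in this minimal Python sketch. `MigrationRouter` is a hypothetical name, and for brevity both stores hold the same row shape.

```python
class MigrationRouter:
    """Dual-writes always; a single flag decides which store serves reads."""

    def __init__(self):
        self.old_store = {}
        self.new_store = {}
        self.read_from_new = False  # flipped at cutover, flipped back on rollback

    def write(self, key, row):
        # Dual-write stays on through cutover so the old store never falls behind.
        self.old_store[key] = row
        self.new_store[key] = dict(row)

    def read(self, key):
        store = self.new_store if self.read_from_new else self.old_store
        return store.get(key)

    def cut_over(self):
        self.read_from_new = True

    def roll_back(self):
        self.read_from_new = False


router = MigrationRouter()
router.write("order-1", {"total": 40})
router.cut_over()
router.write("order-2", {"total": 15})  # written after cutover, still lands in both
router.roll_back()
print(router.read("order-2"))  # the post-cutover write is served by the old store
```

Because dual-write only stops in the cleanup step, flipping the flag back loses nothing that was written while reads were on the new schema.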
