Online migrations (zero-downtime)¶
End-to-end recipe for adding a NOT NULL column with a default to
a large table without rewriting it and without downtime.
Added in 4.0.
The problem¶
operations = [
AddField(
"Order",
"currency",
dorm.CharField(max_length=3, null=False, default="USD"),
),
]
On PostgreSQL ≤ 10 this rewrites the entire table (ALTER TABLE
... NOT NULL DEFAULT 'USD'). A 50M-row table takes hours; during
that window every write blocks on the ACCESS EXCLUSIVE lock.
Down.
PG 11+ optimises if the default is non-volatile — but a
Python-computed default or an ALTER changing the type still
rewrites.
The recipe¶
Three operations, each in its own migration (or three steps in the same migration):
from dorm.migrations.operations import (
AddFieldOnline, BackfillBatch, SetNotNullOnline,
)
operations = [
# Step 1 — column nullable, no rewrite.
AddFieldOnline(
"Order",
"currency",
dorm.CharField(max_length=3, null=False, default="USD"),
),
# Step 2 — chunked backfill by PK range.
BackfillBatch(
table="orders",
update_sql=(
'UPDATE "orders" SET "currency" = \'USD\' '
'WHERE "id" BETWEEN %s AND %s '
'AND "currency" IS NULL'
),
pk_column="id",
batch_size=10_000,
sleep_seconds=0.05, # throttle pressure on primary
),
# Step 3 — promote to NOT NULL without rewrite (PG ≥ 12).
SetNotNullOnline("Order", "currency"),
]
What each step does¶
AddFieldOnline¶
-- On PostgreSQL: metadata-only, instantaneous.
ALTER TABLE "orders" ADD COLUMN "currency" VARCHAR(3) NULL;
The field is declared NOT NULL on the model, but the op forces
nullable temporarily. No rewrite.
Optional set_not_null_now=True: if the table is small (< 1000
rows) and the default is safe, runs steps 2+3 inline. For large
tables leave it default (False).
BackfillBatch¶
-- Loop over PK ranges, each range in its own tx:
UPDATE "orders" SET "currency" = 'USD'
WHERE "id" BETWEEN 1 AND 10000 AND "currency" IS NULL;
COMMIT;
-- ... next batch ...
Each batch:
- Takes batch_size rows (default 10k).
- Dedicated transaction.
- Row-level lock for the duration of the UPDATE.
- sleep_seconds between batches to avoid I/O saturation.
Parameters:
- batch_size — lower = shorter locks but more commit overhead.
10k is a reasonable starting point.
- sleep_seconds — pause between batches. 0.05s typical for
primaries serving live traffic. 0 if the table is frozen.
- max_batches — cut the migration after N batches (testing /
incremental rollout).
SetNotNullOnline¶
On PG ≥ 12 the trick is:
-- 1. CHECK NOT VALID — instant adopt (no row scan).
ALTER TABLE "orders"
ADD CONSTRAINT chk_orders_currency_notnull
CHECK ("currency" IS NOT NULL) NOT VALID;
-- 2. VALIDATE — scans with SHARE UPDATE EXCLUSIVE lock,
-- doesn't block readers/writers.
ALTER TABLE "orders" VALIDATE CONSTRAINT chk_orders_currency_notnull;
-- 3. SET NOT NULL — metadata-only now that the validated CHECK
-- proves no row violates the constraint.
ALTER TABLE "orders" ALTER COLUMN "currency" SET NOT NULL;
-- 4. Drop the redundant CHECK.
ALTER TABLE "orders" DROP CONSTRAINT chk_orders_currency_notnull;
PG ≤ 11 lacks the step-3 optimisation — the op falls back to an
ALTER COLUMN SET NOT NULL that does rewrite. If your target is
11, consider leaving the column nullable.
When to split into 3 migrations¶
If you'll deploy the application code and the migration in separate releases, split:
- Release N — deploy
AddFieldOnline. The column exists, nullable; old code doesn't read it; new code isn't deployed yet. - Backfill batch job — run
BackfillBatchas a migration or standalone script outside the release window. May take hours; the app keeps working because the field is nullable. - Release N+1 — deploy
SetNotNullOnlineplus any code that relies on the NOT NULL. If the backfill didn't finish before this release, the migration fails (visible and early).
Caveat on BackfillBatch.update_sql¶
You write the SQL — use %s (PG) or ? (SQLite) placeholders per
backend, and always include:
BETWEEN %s AND %sfor the PK range (both%sare bound by the op).WHERE ... IS NULLfor idempotence (rerunning the job doesn't duplicate work).
update_sql=(
'UPDATE "orders" '
'SET "currency" = "billing_country_currency" ' # some computation
'WHERE "id" BETWEEN %s AND %s '
'AND "currency" IS NULL' # idempotent
),
Backend caveats¶
- PostgreSQL ≥ 12: full recipe, no rewrite.
- PostgreSQL 10–11:
SetNotNullOnlinefalls back toALTER COLUMN SET NOT NULLwhich does rewrite. Consider leaving nullable. - SQLite:
AddFieldOnlinealways adds nullable (the only form ofADD COLUMN NOT NULLrequires a DEFAULT). Backfill works.SetNotNullOnlinefalls back to a rewrite. For large tables, consider aRunSQLwith theCREATE TABLE ... AS SELECT; DROP; RENAMErecipe. - MySQL: DDL is not transactional. Each step commit-or-die. Plan rollback manually.
- DuckDB: recipe works, no SAVEPOINT. The table rewrites on
SET NOT NULLbecause of the columnar architecture — but rewriting is cheap on DuckDB.
Tests¶
For unit tests use the pattern shown in
tests/test_online_migrations.py. Each op accepts a _State
mock, so you don't need the full migration runner.
More¶
- Migrations — base ops + new ops
dorm diff— post-deploy CI gate- Advanced — companion PG features