Hash-based sharding¶
When a dataset grows past what fits on a single DB, the pattern is horizontal sharding: split rows across N physical servers by a shard key (typically tenant_id, user_id, org_id).
dorm ships the building blocks for this in dorm.contrib.sharding (3.4+).
When to use¶
- Your main tables have grown past a terabyte and vertical scaling is exhausted.
- Tenants distributed geographically (US-east, EU-west).
- Write-heavy load saturating a single primary.
When NOT to¶
- <100GB per table and <5000 QPS — vertical scaling is much simpler.
- Your queries rely on frequent cross-shard JOINs: sharding breaks those. Reshape your data model first.
- No clear partitioning key (queries fan out across arbitrary subsets).
API¶
from dorm.contrib.sharding import (
HashShardRouter, with_shard_key, shard_for, for_each_shard,
)
# settings.py
DATABASES = {
    "default": {...},
    "shard_0": {...},
    "shard_1": {...},
    "shard_2": {...},
    "shard_3": {...},
}
DATABASE_ROUTERS = [
    HashShardRouter(num_shards=4, shard_models={Order, Customer}),
]
# Request handler
from dorm.contrib.sharding import with_shard_key
@app.post("/orders")
async def create_order(request, body):
    with with_shard_key(request.user.tenant_id):
        order = await Order.objects.acreate(...)  # routed to shard_N
    return order
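Reads need the shard key in scope too. A minimal sketch of the read path, assuming an async aget() that mirrors the acreate() above and a path-parameter route in the same (unnamed) web framework:
@app.get("/orders/{order_id}")
async def get_order(request, order_id):
    with with_shard_key(request.user.tenant_id):
        # Same key, same shard: the router sends the lookup to the same shard_N.
        return await Order.objects.aget(id=order_id)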
Deterministic hash¶
shard_for(key, num_shards) uses hashlib.blake2b with a
configurable salt, NOT Python's built-in hash() (which is
randomised per-process since Python 3.3 — that would put the same
row on different shards in different workers).
from dorm.contrib.sharding import shard_for
assert shard_for("user-42", 4) == shard_for("user-42", 4) # deterministic
# Some callers prefer their own salt for security:
shard_for("user-42", 4, salt=b"my-secret-salt")
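As a mental model only, not the shipped implementation, the mapping can be pictured roughly like this:
import hashlib

def shard_for_sketch(key, num_shards, salt=b""):
    # blake2b takes an optional salt (up to 16 bytes); the digest is reduced mod num_shards.
    digest = hashlib.blake2b(str(key).encode(), salt=salt, digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_shards

shard_for_sketch("user-42", 4)  # stable across processes, unlike built-in hash()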
for_each_shard — fan-out¶
For global queries (total count, batch jobs per shard):
from dorm.contrib.sharding import for_each_shard
results = for_each_shard(
    lambda alias: Order.objects.using(alias).count(),
    num_shards=4,
)
# {"shard_0": 1234, "shard_1": 1209, ...}
total = sum(results.values())
The fan-out runs sequentially; wrap the per-shard calls in asyncio.gather or a thread pool if you need parallelism, as in the sketch below.
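A thread-pool variant of the same count, as a sketch; it assumes the shard aliases follow the shard_N naming used in the settings above:
from concurrent.futures import ThreadPoolExecutor

aliases = [f"shard_{i}" for i in range(4)]

def count_orders(alias):
    # Same per-shard query as the for_each_shard example, run concurrently.
    return Order.objects.using(alias).count()

with ThreadPoolExecutor(max_workers=len(aliases)) as pool:
    results = dict(zip(aliases, pool.map(count_orders, aliases)))

total = sum(results.values())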
Compose with row-level multi-tenancy¶
HashShardRouter and TenantModel compose cleanly, because the shard key is usually the tenant id:
with current_tenant(tenant_id), with_shard_key(tenant_id):
    Note.objects.create(title="hi")
    # → tenant_id auto-filled + routed to the right shard
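If you pin both contexts in many call sites, a small wrapper keeps them in sync; tenant_scope below is a hypothetical convenience helper, not part of dorm:
from contextlib import contextmanager

@contextmanager
def tenant_scope(tenant_id):
    # Hypothetical helper: enter tenancy filtering and shard routing together.
    with current_tenant(tenant_id), with_shard_key(tenant_id):
        yield

with tenant_scope(tenant_id):
    Note.objects.create(title="hi")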
No active shard key¶
If a model is sharded and no with_shard_key() is active, the router refuses to pick a database and raises rather than quietly using default. This is by design: a silent fallback would scatter rows inconsistently across shards.
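A sketch of what that looks like at a call site; the exact exception class isn't reproduced here, so it is caught broadly purely for illustration:
try:
    Order.objects.count()            # sharded model, no shard key in scope
except Exception as exc:             # assumption: the router raises here
    print(f"refused to route: {exc}")

with with_shard_key("tenant-7"):     # "tenant-7" is a made-up example key
    Order.objects.count()            # routed to one deterministic shard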
Rebalancing (shard splits)¶
dorm does not rebalance automatically. If you go 4 → 8 shards:
- Create the new shards (empty).
- Switch num_shards=8 in production; new rows go to the new distribution.
- For each old shard, migrate rows to their new destination (see the sketch after this list).
- Whether to pause traffic during migration is an ops / business call.
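A sketch of that per-shard migration pass, not a shipped command. It assumes the shard key is stored on each row as tenant_id, aliases follow the shard_N naming, and save()/delete() accept a target database in the Django style; run it in a maintenance window or behind your own locking:
from dorm.contrib.sharding import shard_for

NEW_NUM_SHARDS = 8

def rebalance_shard(old_index, new_num_shards=NEW_NUM_SHARDS):
    old_alias = f"shard_{old_index}"
    for order in Order.objects.using(old_alias).iterator():
        new_index = shard_for(order.tenant_id, new_num_shards)
        if new_index == old_index:
            continue                                   # already in the right place
        # Copy to the new home first, then delete the old copy.
        order.save(using=f"shard_{new_index}", force_insert=True)
        Order.objects.using(old_alias).filter(pk=order.pk).delete()

for i in range(4):        # the four old shards
    rebalance_shard(i)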
To avoid this pain, use consistent hashing (a ring) instead of modulo, so a split only moves a fraction of the keys. dorm doesn't ship that out of the box; consider Citus or Vitess if you need it.
Pitfalls¶
- Cross-shard JOINs impossible — each shard is an independent DB. Model your data before sharding.
- allow_relation rejects cross-shard FKs: the router returns False when obj1 and obj2 live on different aliases, so the mistake is caught at the point of assignment instead of producing inconsistent data.
- Migrations: dorm migrate only runs against default by default. To run it on every shard, loop over the shard aliases (see the sketch below).
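One way to do that from a script, as a sketch; it assumes the migrate command accepts a --database option in the Django style, which you should verify against your dorm version:
import subprocess

# Apply migrations to default first, then to every shard alias.
# The --database flag is an assumption; check your CLI before relying on this.
for alias in ["default"] + [f"shard_{i}" for i in range(4)]:
    subprocess.run(["dorm", "migrate", "--database", alias], check=True)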
More¶
- Helpers
- Row-level multi-tenancy — natural pairing
- API: dorm.contrib.sharding