The Database Is Almost Always the Bottleneck
Stateless application servers scale horizontally with ease — add more instances, add a load balancer. Databases are stateful and do not scale the same way. At some point, every growing application hits database limits: CPU saturation from complex queries, I/O bottlenecks on write-heavy workloads, or connection pool exhaustion as application instances multiply.
The mistake most teams make: waiting until the database is on fire before thinking about scaling strategy. The right time to plan is now.
Stage 1: Read Replicas
Most production applications read far more than they write (typical ratio: 80/20 to 95/5). Read replicas are the first and highest-leverage scaling move.
A primary handles all writes. One or more replicas stream changes from the primary and handle read queries. Replication lag is typically sub-100ms on modern PostgreSQL with streaming replication.
```typescript
// Route reads to the replica, writes to the primary
import { Pool } from 'pg'

const primaryDb = new Pool({ connectionString: process.env.DATABASE_PRIMARY_URL })
const replicaDb = new Pool({ connectionString: process.env.DATABASE_REPLICA_URL })

async function getUser(id: string) {
  return replicaDb.query('SELECT * FROM users WHERE id = $1', [id])
}

// Illustrative shape of an update payload
type UserUpdate = Partial<{ name: string; email: string }>

async function updateUser(id: string, data: UserUpdate) {
  // Build the SET clause from whichever fields are present in `data`
  const keys = Object.keys(data)
  const setClause = keys.map((k, i) => `${k} = $${i + 1}`).join(', ')
  return primaryDb.query(
    `UPDATE users SET ${setClause} WHERE id = $${keys.length + 1}`,
    [...Object.values(data), id]
  )
}
```
With Prisma, use datasource routing. With TypeORM, configure multiple connections per entity. With an ORM that doesn't support this natively, route at the service layer.
Stage 2: Connection Pooling with PgBouncer
Each PostgreSQL connection consumes ~5-10MB of RAM. A primary instance supporting 200 connections caps out quickly as application instances multiply. PgBouncer sits between your application and PostgreSQL, multiplexing thousands of application connections onto a smaller pool of actual database connections.
Transaction pooling mode (most aggressive): connections are returned to the pool after each transaction, not each session. Reduces actual PostgreSQL connections by 10-20x.
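A minimal `pgbouncer.ini` sketch of transaction pooling — the host, ports, and pool sizes here are illustrative starting points, not recommendations:

```ini
; pgbouncer.ini -- illustrative values, tune for your workload
[databases]
app = host=10.0.0.5 port=5432 dbname=app

[pgbouncer]
listen_port = 6432
pool_mode = transaction      ; return server connections after each transaction
max_client_conn = 5000       ; application connections PgBouncer will accept
default_pool_size = 50       ; actual PostgreSQL connections per database
```

The application then connects to port 6432 instead of 5432. Note that transaction pooling breaks session-level features like prepared statements and `SET` commands, so verify your driver settings before enabling it.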
Stage 3: Query Optimization Before Sharding
Sharding is complex and operationally expensive. Before going there, exhaust query optimization:
- Index everything you filter or sort on — a missing index on a 10M row table causes full table scans
- Partial indexes for frequently filtered subsets (`WHERE status = 'active'`)
- Materialized views for expensive aggregation queries that are read-heavy
- EXPLAIN ANALYZE every slow query — seq scans and hash joins on large tables are the usual suspects
A single missing index has saved teams from premature sharding more than once.
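The partial-index and materialized-view moves above, sketched in SQL — table and column names are hypothetical:

```sql
-- Partial index: only rows matching the predicate are indexed,
-- keeping the index small and likely to stay in memory
CREATE INDEX idx_users_active_email
  ON users (email)
  WHERE status = 'active';

-- Materialized view: precompute an expensive aggregation once,
-- then refresh on a schedule instead of recomputing per query
CREATE MATERIALIZED VIEW daily_order_totals AS
  SELECT date_trunc('day', created_at) AS day, SUM(amount) AS total
  FROM orders
  GROUP BY 1;

-- A unique index is required for CONCURRENTLY, which refreshes
-- without blocking reads
CREATE UNIQUE INDEX ON daily_order_totals (day);
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_order_totals;
```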
Stage 4: Vertical Partitioning (CQRS)
Command Query Responsibility Segregation separates the write model (commands) from the read model (queries). The write database is normalized for consistency. The read database is denormalized and optimized for query patterns — often a different storage engine entirely.
```
Write path: API → Command Handler → PostgreSQL (normalized, ACID)
                                         │
                                     Event bus
                                         │
Read path:  API → Query Handler ← Read Store (Redis / Elasticsearch / denormalized PG)
```
This pattern shines when your read queries are complex joins that perform poorly on your write-optimized schema. The read store is rebuilt from events and can be reshaped without migrating production data.
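A minimal in-memory sketch of the pattern — event names, the bus, and the projection shape are all illustrative; a production system would use a durable event bus and a real read store:

```typescript
// Write side emits events; read side maintains a denormalized projection.
type UserEvent =
  | { type: 'UserCreated'; id: string; name: string }
  | { type: 'UserRenamed'; id: string; name: string }

type Handler = (e: UserEvent) => void

// Stand-in for a durable event bus (Kafka, Postgres logical decoding, ...)
const handlers: Handler[] = []
const publish = (e: UserEvent) => handlers.forEach((h) => h(e))

// Command handler: validate, persist to the write model (omitted), emit event
function createUser(id: string, name: string) {
  publish({ type: 'UserCreated', id, name })
}

// Read side: a projection kept up to date purely from events
const userView = new Map<string, { id: string; name: string }>()
handlers.push((e) => {
  if (e.type === 'UserCreated' || e.type === 'UserRenamed') {
    userView.set(e.id, { id: e.id, name: e.name })
  }
})

// Query handler reads the projection, never the write model
const getUserView = (id: string) => userView.get(id)
```

Because the projection is derived entirely from events, you can drop it and replay the event stream to build a differently-shaped read model without touching the write database.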
Stage 5: Horizontal Sharding
Sharding splits data across multiple database instances by a shard key (typically user ID or tenant ID). It is the nuclear option — complex to implement, difficult to rebalance, and painful to query across shards.
Managed sharding options have improved significantly: Citus for PostgreSQL, PlanetScale for MySQL, and Neon for serverless PostgreSQL with branching support. These handle the distribution layer, letting you write mostly standard SQL.
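If you do roll your own, the core of application-side sharding is a deterministic mapping from shard key to shard. A sketch, where `ShardClient` stands in for one connection pool per shard:

```typescript
import { createHash } from 'crypto'

// Hypothetical: one client per shard; in practice these would be pg Pools
// pointing at each shard's connection string
interface ShardClient {
  query(sql: string, params: unknown[]): Promise<unknown>
}

// Stable hash of the shard key → shard index. The mapping must be
// deterministic so the same tenant always lands on the same shard.
function shardIndex(shardKey: string, shardCount: number): number {
  const digest = createHash('sha256').update(shardKey).digest()
  return digest.readUInt32BE(0) % shardCount
}

function shardFor(shards: ShardClient[], tenantId: string): ShardClient {
  return shards[shardIndex(tenantId, shards.length)]
}

async function getOrders(shards: ShardClient[], tenantId: string) {
  // Every query must carry the shard key; a query without it degrades
  // into a scatter-gather across all shards
  return shardFor(shards, tenantId).query(
    'SELECT * FROM orders WHERE tenant_id = $1',
    [tenantId]
  )
}
```

Simple modulo hashing like this makes rebalancing painful (changing the shard count remaps most keys), which is one reason the managed options above are usually worth it.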
The Decision Tree
- Slow read queries → Read replica
- Connection exhaustion → PgBouncer
- Slow queries despite indexes → EXPLAIN ANALYZE, optimize
- Write/read patterns diverge → CQRS
- Data volume exceeds single node → Sharding
Most teams never need Stage 5 with modern PostgreSQL instances and proper indexing.