The Database Is Almost Always the Bottleneck
Stateless application servers scale horizontally with ease — add more instances, add a load balancer. Databases are stateful and do not scale the same way. At some point, every growing application hits database limits: CPU saturation from complex queries, I/O bottlenecks on write-heavy workloads, or connection pool exhaustion as application instances multiply.
The mistake most teams make: waiting until the database is on fire before thinking about scaling strategy. The right time to plan is now.
Stage 1: Read Replicas
Most production applications read far more than they write (typical ratio: 80/20 to 95/5). Read replicas are the first and highest-leverage scaling move.
A primary handles all writes. One or more replicas stream changes from the primary and handle read queries. Replication lag is typically sub-100ms on modern PostgreSQL with streaming replication.
```typescript
// Route reads to the replica, writes to the primary
import { Pool } from 'pg'

const primaryDb = new Pool({ connectionString: process.env.DATABASE_PRIMARY_URL })
const replicaDb = new Pool({ connectionString: process.env.DATABASE_REPLICA_URL })

async function getUser(id: string) {
  return replicaDb.query('SELECT * FROM users WHERE id = $1', [id])
}

// Illustrative shape of an update payload
type UserUpdate = Partial<{ name: string; email: string }>

async function updateUser(id: string, data: UserUpdate) {
  // Build the SET clause from whichever fields are present in `data`
  const keys = Object.keys(data)
  const setClause = keys.map((k, i) => `${k} = $${i + 1}`).join(', ')
  return primaryDb.query(
    `UPDATE users SET ${setClause} WHERE id = $${keys.length + 1}`,
    [...Object.values(data), id]
  )
}
```
With Prisma, use datasource routing. With TypeORM, configure multiple connections per entity. With an ORM that doesn't support this natively, route at the service layer.
Stage 2: Connection Pooling with PgBouncer
Each PostgreSQL connection consumes ~5-10MB of RAM. A primary instance supporting 200 connections caps out quickly as application instances multiply. PgBouncer sits between your application and PostgreSQL, multiplexing thousands of application connections onto a smaller pool of actual database connections.
Transaction pooling mode (most aggressive): connections are returned to the pool after each transaction, not each session. Reduces actual PostgreSQL connections by 10-20x.
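A minimal `pgbouncer.ini` sketch of transaction pooling — the host, ports, and pool sizes here are illustrative starting points, not recommendations:

```ini
; pgbouncer.ini -- illustrative values, tune for your workload
[databases]
app = host=10.0.0.5 port=5432 dbname=app

[pgbouncer]
listen_port = 6432
pool_mode = transaction      ; return server connections after each transaction
max_client_conn = 5000       ; application connections PgBouncer will accept
default_pool_size = 50       ; actual PostgreSQL connections per database
```

The application then connects to port 6432 instead of 5432. Note that transaction pooling breaks session-level features like prepared statements and `SET` commands, so verify your driver settings before enabling it.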
Stage 3: Query Optimization Before Sharding
Sharding is complex and operationally expensive. Before going there, exhaust query optimization:
- Index everything you filter or sort on — a missing index on a 10M row table causes full table scans
- Partial indexes for frequently filtered subsets (`WHERE status = 'active'`)
- Materialized views for expensive aggregation queries that are read-heavy
- EXPLAIN ANALYZE every slow query — seq scans and hash joins on large tables are the usual suspects
A single missing index has saved teams from premature sharding more than once.
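The partial-index and materialized-view moves above, sketched in SQL — table and column names are hypothetical:

```sql
-- Partial index: only rows matching the predicate are indexed,
-- keeping the index small and likely to stay in memory
CREATE INDEX idx_users_active_email
  ON users (email)
  WHERE status = 'active';

-- Materialized view: precompute an expensive aggregation once,
-- then refresh on a schedule instead of recomputing per query
CREATE MATERIALIZED VIEW daily_order_totals AS
  SELECT date_trunc('day', created_at) AS day, SUM(amount) AS total
  FROM orders
  GROUP BY 1;

-- A unique index is required for CONCURRENTLY, which refreshes
-- without blocking reads
CREATE UNIQUE INDEX ON daily_order_totals (day);
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_order_totals;
```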
Stage 4: Vertical Partitioning (CQRS)
Command Query Responsibility Segregation separates the write model (commands) from the read model (queries). The write database is normalized for consistency. The read database is denormalized and optimized for query patterns — often a different storage engine entirely.
```
Write path: API → Command Handler → PostgreSQL (normalized, ACID)
                                         │
                                     Event bus
                                         │
Read path:  API → Query Handler ← Read Store (Redis / Elasticsearch / denormalized PG)
```
This pattern shines when your read queries are complex joins that perform poorly on your write-optimized schema. The read store is rebuilt from events and can be reshaped without migrating production data.
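A minimal in-memory sketch of the pattern — event names, the bus, and the projection shape are all illustrative; a production system would use a durable event bus and a real read store:

```typescript
// Write side emits events; read side maintains a denormalized projection.
type UserEvent =
  | { type: 'UserCreated'; id: string; name: string }
  | { type: 'UserRenamed'; id: string; name: string }

type Handler = (e: UserEvent) => void

// Stand-in for a durable event bus (Kafka, Postgres logical decoding, ...)
const handlers: Handler[] = []
const publish = (e: UserEvent) => handlers.forEach((h) => h(e))

// Command handler: validate, persist to the write model (omitted), emit event
function createUser(id: string, name: string) {
  publish({ type: 'UserCreated', id, name })
}

// Read side: a projection kept up to date purely from events
const userView = new Map<string, { id: string; name: string }>()
handlers.push((e) => {
  if (e.type === 'UserCreated' || e.type === 'UserRenamed') {
    userView.set(e.id, { id: e.id, name: e.name })
  }
})

// Query handler reads the projection, never the write model
const getUserView = (id: string) => userView.get(id)
```

Because the projection is derived entirely from events, you can drop it and replay the event stream to build a differently-shaped read model without touching the write database.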
Stage 5: Horizontal Sharding
Sharding splits data across multiple database instances by a shard key (typically user ID or tenant ID). It is the nuclear option — complex to implement, difficult to rebalance, and painful to query across shards.
Managed sharding options have improved significantly: Citus for PostgreSQL, PlanetScale for MySQL, and Neon for serverless PostgreSQL with branching support. These handle the distribution layer, letting you write mostly standard SQL.
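If you do roll your own, the core of application-side sharding is a deterministic mapping from shard key to shard. A sketch, where `ShardClient` stands in for one connection pool per shard:

```typescript
import { createHash } from 'crypto'

// Hypothetical: one client per shard; in practice these would be pg Pools
// pointing at each shard's connection string
interface ShardClient {
  query(sql: string, params: unknown[]): Promise<unknown>
}

// Stable hash of the shard key → shard index. The mapping must be
// deterministic so the same tenant always lands on the same shard.
function shardIndex(shardKey: string, shardCount: number): number {
  const digest = createHash('sha256').update(shardKey).digest()
  return digest.readUInt32BE(0) % shardCount
}

function shardFor(shards: ShardClient[], tenantId: string): ShardClient {
  return shards[shardIndex(tenantId, shards.length)]
}

async function getOrders(shards: ShardClient[], tenantId: string) {
  // Every query must carry the shard key; a query without it degrades
  // into a scatter-gather across all shards
  return shardFor(shards, tenantId).query(
    'SELECT * FROM orders WHERE tenant_id = $1',
    [tenantId]
  )
}
```

Simple modulo hashing like this makes rebalancing painful (changing the shard count remaps most keys), which is one reason the managed options above are usually worth it.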
The Decision Tree
- Slow read queries → Read replica
- Connection exhaustion → PgBouncer
- Slow queries despite indexes → EXPLAIN ANALYZE, optimize
- Write/read patterns diverge → CQRS
- Data volume exceeds single node → Sharding
Most teams never need Stage 5 with modern PostgreSQL instances and proper indexing.