The simplest flash sale system: every purchase attempt acquires a PostgreSQL row-level lock via SELECT FOR UPDATE, checks remaining stock, and decrements. Demonstrates why database-level pessimistic locking creates a serialization bottleneck that collapses under high concurrency.
Designing a flash sale system is one of the most challenging system design interview questions because it forces candidates to reason about concurrent access to a single shared resource — limited inventory — under extreme traffic conditions. The naive direct database lock approach is where every candidate should start: it establishes the correctness baseline that makes the improvements in queue-based and tiered architectures measurable and concrete.
The core challenge is atomically decrementing inventory without overselling. In the naive approach, every purchase attempt opens a PostgreSQL transaction, executes SELECT FOR UPDATE on the inventory row to acquire an exclusive row-level lock, checks if remaining_stock > 0, decrements the count via UPDATE, inserts an order record, and commits. The SELECT FOR UPDATE guarantees mutual exclusion: while one transaction holds the lock, all other transactions targeting the same row block in a wait queue until the lock is released.
This approach provides perfect correctness — zero overselling is guaranteed by PostgreSQL's row-level locking semantics. But the cost is serialization. Each transaction holds the lock for 50-100ms (lock acquisition + disk read + update + WAL write + commit). At 75ms average lock hold time, a single inventory row processes approximately 13 transactions per second. With 10,000 items available and 50,000 concurrent users, the math is brutal: 50K users competing for locks on a handful of hot item rows, with 99%+ of transactions timing out in the lock wait queue.
The connection pool becomes the hard ceiling. PostgreSQL typically supports 100-500 connections. Each blocked SELECT FOR UPDATE holds a connection for the duration of the lock wait (which can be seconds under contention). At 50K concurrent requests, the connection pool exhausts immediately, and subsequent requests receive connection errors before even attempting the transaction. The system does not degrade gracefully — it cliff-edges from functional to broken.
The read path (availability checks) compounds the problem. Without a cache layer, every GET /availability request queries PostgreSQL directly, competing with purchase transactions for the same connection pool. At 40% read traffic, availability checks consume connections that purchase transactions need, creating a tragic feedback loop: users see 'items available' from read queries but cannot purchase because the connection pool is saturated by other purchases.
This template makes the serialization bottleneck visible and quantifiable. Run the simulation at increasing RPS and watch lock wait time grow from milliseconds to seconds, then connection pool utilization hit 100%, then error rates spike. The comparison with the Queue and Tiered variants provides the concrete numbers needed to explain why demand shaping (virtual queues) and caching (Redis atomic DECR) are essential for production flash sales.
Another critical failure mode in the naive approach is deadlocks. When a single transaction touches multiple inventory rows (e.g., a user buying items A and B together), and another transaction touches the same rows in a different order, PostgreSQL detects a deadlock and aborts one transaction. Under high concurrency with multi-item carts, deadlock frequency increases quadratically with the number of concurrent transactions. The naive approach has no retry logic or deadlock avoidance strategy — aborted transactions simply fail.
The naive approach also lacks any observability into the lock contention. Without monitoring lock wait times, queue depths, or connection pool utilization, operators have no early warning before the system collapses. The simulation exposes these metrics, making the degradation curve visible. Understanding this degradation is essential for system design interviews — interviewers want to see candidates identify specific failure modes, not just wave at 'scalability issues.'
Flash sale system design appears in interviews at Amazon, Shopify, Ticketmaster, Nike, Walmart, and Target. Interviewers expect candidates to start with the database lock approach, identify the serialization bottleneck, and propose demand shaping as the first optimization — then discuss bot detection and CDN edge absorption that motivate the tiered approach.
The naive flash sale system is a four-component linear architecture: Client, Load Balancer, SaleService, and PostgreSQL database. There is no cache, no event stream, no queue, no bot detection, and no separation between the read and write paths. Every request follows the same synchronous path from client to database.
All traffic enters through the AWS Application Load Balancer, which distributes requests across SaleService pods using round-robin. The Load Balancer adds approximately 2ms of routing latency and supports up to 100K RPS, but the real capacity ceiling is determined entirely by the database behind SaleService. Both purchase attempts and availability checks flow through the same LB and service.
The SaleService is a stateless REST API running on 5 pods with 50 threads each (250 concurrent connections). It handles two operations: (1) purchase attempt — open a PostgreSQL transaction, SELECT FOR UPDATE on the inventory row, check remaining_stock > 0, UPDATE to decrement, INSERT an order record, COMMIT; (2) availability check — SELECT remaining_stock from the inventory table (no lock). The service processing time is approximately 8ms for business logic, but purchase latency is dominated by the database transaction time (50-100ms for the lock + write path, potentially seconds under contention).
PostgreSQL stores two tables: inventory and orders. The inventory table is the critical bottleneck — every purchase attempt acquires an exclusive row-level lock on the target item's inventory row. At high concurrency, the lock wait queue for popular items grows to thousands of transactions, most of which time out. The orders table stores confirmed purchases, inserted within the same transaction as the inventory decrement for atomicity.
The system has no redundancy at the data layer. A single PostgreSQL primary handles all reads and writes with 100 max connections. If the database fails, both purchases and availability checks stop entirely. There is no cache (every availability check hits the database), no event stream (payment is processed synchronously within the purchase transaction), and no demand shaping (all 50K concurrent users hit the database simultaneously).
The transaction isolation level matters significantly. At the default READ COMMITTED level, the SELECT FOR UPDATE sees committed data and acquires a lock. At SERIALIZABLE, PostgreSQL would detect more anomalies but abort more transactions due to serialization failures. The naive approach uses READ COMMITTED as the pragmatic choice — it prevents overselling via the explicit row lock without the overhead of full serializability. The trade-off is that phantom reads are possible on the availability check path (a user might see stale stock counts between the SELECT and the next committed UPDATE), but this is acceptable since the authoritative check happens within the locked purchase transaction.
Error handling in the naive approach is straightforward but limited. Lock wait timeouts (default 30 seconds in PostgreSQL) cause the transaction to abort with a timeout error. Connection pool exhaustion returns an immediate error. The service returns HTTP 503 (Service Unavailable) for both cases. There is no retry logic, no circuit breaker, and no fallback path — the system either succeeds or fails.
The concrete scaling ceiling is approximately 200 concurrent purchase attempts before lock contention causes cascading timeouts. At this point, the connection pool is saturated, lock wait times exceed SLO thresholds, and availability checks are starved of connections. The system degrades from functional to broken over a narrow traffic band — there is no graceful degradation curve.
This sequence diagram shows the purchase flow with pessimistic locking and the availability check flow. The critical insight is the lock serialization: while Transaction A holds the row lock, Transaction B blocks in the wait queue. At high concurrency, the lock wait queue grows to thousands of transactions with most timing out.
Step-by-Step Walkthrough
Pseudocode
// Purchase attempt — pessimistic locking
async function purchase(user_id, sale_id, item_id):
tx = await db.beginTransaction()
try:
// Step 1: Acquire exclusive row lock
row = await tx.execute(
"SELECT remaining_stock FROM inventory
WHERE item_id = $1 FOR UPDATE",
[item_id]
) // Blocks if another transaction holds the lock
// Step 2: Check stock availability
if row.remaining_stock <= 0:
await tx.rollback()
return { status: 409, error: "SOLD_OUT" }
// Step 3: Atomic decrement + order insert
await tx.execute(
"UPDATE inventory SET remaining_stock = remaining_stock - 1
WHERE item_id = $1", [item_id]
)
await tx.execute(
"INSERT INTO orders (order_id, user_id, sale_id, item_id, status)
VALUES (gen_random_uuid(), $1, $2, $3, 'CONFIRMED')",
[user_id, sale_id, item_id]
)
// Step 4: Commit releases the lock
await tx.commit() // ~50-100ms total
return { status: 200, message: "CONFIRMED" }
catch (e):
await tx.rollback()
return { status: 500, error: e.message }Choice
Acquire an exclusive row-level lock on inventory before reading stock count
Rationale
SELECT FOR UPDATE is the simplest correct approach for preventing overselling. The database enforces mutual exclusion at the row level — no race condition is possible between the stock check and the decrement. The cost is that all concurrent purchase attempts for the same item serialize on the lock, creating a hard throughput ceiling of ~13 TPS per item at 75ms average lock hold time.
Choice
One database for both inventory and orders with no cache layer
Rationale
A single PostgreSQL instance eliminates replication lag, cache consistency issues, and distributed coordination complexity. ACID transactions guarantee that inventory decrements and order inserts are atomic. The cost is that purchase writes and availability reads compete for the same 100-connection pool, creating mutual degradation under load.
Choice
All purchase attempts hit the database directly with no admission control
Rationale
A virtual queue shapes 500K RPS burst traffic to a sustainable backend rate (e.g., 1000/sec), reducing load by 500x. The naive approach deliberately omits this to demonstrate the thundering herd problem: all users hit inventory simultaneously, overwhelming the database. This motivates the Queue variant's token-bucket admission control.
Choice
Every availability check queries PostgreSQL directly
Rationale
A Redis cache for stock counts would serve availability checks in 1ms instead of 5-10ms, freeing database connections for purchase transactions. The naive approach skips caching, which means 40% of traffic (availability reads) competes with purchase writes for the same connection pool — a direct contributor to the connection pool exhaustion bottleneck.
Choice
Payment is processed within the purchase transaction, not async
Rationale
In the naive approach, the order is marked CONFIRMED immediately upon successful inventory decrement — there is no separate payment step. This simplifies the flow but means there is no mechanism for handling payment failures or releasing reserved stock. The Queue and Tiered variants introduce async payment via Kafka to decouple reservation from payment.
Target RPS
~200 TPS (DB lock ceiling)
Latency (p99)
50ms-5s purchase (varies with lock contention)
Storage
~2 GB/month at low scale
Availability
~99% (single DB, no redundancy)
| Operation | Time | Space | Notes |
|---|---|---|---|
| Purchase (POST /purchase) | O(1) per transaction, but serialized — effective throughput O(1/lock_hold_time) per item | O(1) per order (~500 bytes with indexes) | Lock hold time ~75ms. Per-item throughput: ~13 TPS. Connection pool (100) is the hard ceiling. |
| Availability check (GET /availability) | O(1) SELECT, ~5ms database round-trip | O(1) per query (no caching, no buffer) | Competes with purchase transactions for database connections. No cache layer. |
| Lock wait queue | O(N) where N = concurrent purchase attempts for the same item | O(N) in PostgreSQL lock wait queue | At 50K concurrent users, lock wait queue for hot items can grow to thousands. Most transactions time out. |
Stores per-item stock counts for flash sale events. The critical bottleneck table: every purchase attempt acquires an exclusive row-level lock via SELECT FOR UPDATE. Lock hold time is 50-100ms per transaction, serializing all concurrent purchases for the same item. Updated by SaleService on every successful purchase.
Indexes: idx_inventory_sale ON (sale_id, item_id)
Hot row problem: all purchases for item X serialize on the same row lock. At 75ms lock hold time, throughput is ~13 TPS per item.
Stores confirmed purchase orders. Inserted within the same transaction as the inventory decrement to ensure atomicity — if the transaction rolls back, both the order and the decrement are reverted. Write-heavy during the sale window.
Indexes: idx_orders_user ON (user_id, created_at DESC), idx_orders_sale ON (sale_id, item_id)
Inserted at ~200 TPS max (limited by inventory lock throughput). No async status updates — order is CONFIRMED immediately on commit.
| Variant | Tier | Latency | Throughput | Cost | Complexity | Reliability |
|---|---|---|---|---|---|---|
| Naive (Direct DB Lock) | T1 | 50ms-5s purchase (lock contention) | ~200 concurrent purchases | $300/month (single DB + service) | Low — 4 components, no cache, no queue | 99% (single DB, no redundancy) |
| Queue (Virtual Queue + Token Bucket) | T2 | <50ms reservation (token-gated) | 10M concurrent users, 1K reservations/sec | $2,500/month (Redis + Kafka + workers) | Medium — virtual queue, Redis DECR, Kafka | 99.9% (replicated components) |
| Tiered (Waiting Room + Bot Detection + CDN) | T3 | <30ms reservation (post-admission) | 50M+ concurrent users, 5K reservations/sec | $8,000/month (CDN + Lambda + full pipeline) | High — 12 components, CDN edge, bot detection | 99.9% (defense-in-depth, multi-layer) |
This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.
Each SELECT FOR UPDATE holds an exclusive row-level lock for 50-100ms (lock acquisition + disk read + update + WAL write + commit). At 75ms average, one inventory row processes ~13 TPS. With 100 database connections shared between purchases and availability checks, the connection pool saturates at ~200 concurrent purchase attempts. Beyond this, requests queue for connections, lock wait times grow to seconds, and timeouts cascade. The system cliff-edges from functional to broken.
SELECT FOR UPDATE acquires an exclusive row-level lock on the inventory row before reading the stock count. While the lock is held, all other transactions attempting SELECT FOR UPDATE on the same row block in a wait queue. This guarantees that only one transaction at a time can read and modify the stock count — the check-and-decrement is atomic. If remaining_stock is 0, the transaction returns SOLD_OUT without decrementing. PostgreSQL guarantees zero overselling at the cost of serialized access.
Demand shaping via a virtual queue. Instead of letting all 50K users hit the database simultaneously (thundering herd), a virtual queue admits users at a controlled rate (e.g., 1000/sec). This reduces database load by 50x while maintaining fairness (FIFO ordering). The Queue variant implements this with a Redis sorted set and TokenWorker. The second optimization is replacing SELECT FOR UPDATE with Redis atomic DECR, which eliminates database lock contention entirely.
Optimistic locking (version column + retry) avoids holding locks but creates a different problem at flash sale scale: massive retry storms. With 50K concurrent users, optimistic lock conflicts would cause 99%+ of transactions to fail and retry, generating 50x more database load than the original requests. Pessimistic locking is actually better for high-contention scenarios because it queues rather than retries. The real solution is to move the contention point from the database to Redis (atomic DECR), where single-threaded execution eliminates both lock types.
Availability checks (GET /availability) are 40% of traffic. Each check is a SELECT query that consumes a database connection for 5-10ms. At 20K availability checks/sec during the burst, the database handles 20K SELECT queries competing with purchase transactions for the 100-connection pool. This creates a tragic feedback loop: users see 'items available' but cannot purchase because availability checks are consuming the connections that purchase transactions need.
The naive approach is the correctness baseline. SELECT FOR UPDATE provides the strongest possible consistency guarantee — ACID transactions with zero overselling. Understanding why it fails at scale (lock serialization, connection pool exhaustion) is essential for explaining why the optimizations in the Queue and Tiered variants exist. Interviewers expect candidates to start here, identify the bottlenecks quantitatively, and then propose solutions. Jumping directly to Redis + Kafka without understanding the baseline raises red flags.
Sign in to join the discussion.
Ready to design your own Flash Sale?
Open the simulator, place components on the canvas, wire them up, and run a traffic simulation to see how your architecture performs under real load.
Open Simulator