Hard4 componentsInterview: Very High

Flash Sale — Naive (Direct DB Lock)

Q: Why does the naive approach collapse at just 200 concurrent purchases?

Each SELECT FOR UPDATE holds an exclusive row-level lock for 50-100ms (lock acquisition + disk read + update + WAL write + commit). At 75ms average, one inventory row processes ~13 TPS. With 100 database connections shared between purchases and availability checks, the connection pool saturates at ~200 concurrent purchase attempts. Beyond this, requests queue for connections, lock wait times grow to seconds, and timeouts cascade. The system cliff-edges from functional to broken.

Q: How does SELECT FOR UPDATE prevent overselling?

SELECT FOR UPDATE acquires an exclusive row-level lock on the inventory row before reading the stock count. While the lock is held, all other transactions attempting SELECT FOR UPDATE on the same row block in a wait queue. This guarantees that only one transaction at a time can read and modify the stock count — the check-and-decrement is atomic. If remaining_stock is 0, the transaction returns SOLD_OUT without decrementing. PostgreSQL guarantees zero overselling at the cost of serialized access.

Q: What is the first optimization an interviewer expects?

Demand shaping via a virtual queue. Instead of letting all 50K users hit the database simultaneously (thundering herd), a virtual queue admits users at a controlled rate (e.g., 1000/sec). This reduces database load by 50x while maintaining fairness (FIFO ordering). The Queue variant implements this with a Redis sorted set and TokenWorker. The second optimization is replacing SELECT FOR UPDATE with Redis atomic DECR, which eliminates database lock contention entirely.

Q: Why not use optimistic locking instead of pessimistic locking?

Optimistic locking (version column + retry) avoids holding locks but creates a different problem at flash sale scale: massive retry storms. With 50K concurrent users, optimistic lock conflicts would cause 99%+ of transactions to fail and retry, generating 50x more database load than the original requests. Pessimistic locking is actually better for high-contention scenarios because it queues rather than retries. The real solution is to move the contention point from the database to Redis (atomic DECR), where single-threaded execution eliminates both lock types.

Q: How does the availability check path contribute to the bottleneck?

Availability checks (GET /availability) are 40% of traffic. Each check is a SELECT query that consumes a database connection for 5-10ms. At 20K availability checks/sec during the burst, the database handles 20K SELECT queries competing with purchase transactions for the 100-connection pool. This creates a tragic feedback loop: users see 'items available' but cannot purchase because availability checks are consuming the connections that purchase transactions need.

Q: Why is this approach still worth studying despite its limitations?

The naive approach is the correctness baseline. SELECT FOR UPDATE provides the strongest possible consistency guarantee — ACID transactions with zero overselling. Understanding why it fails at scale (lock serialization, connection pool exhaustion) is essential for explaining why the optimizations in the Queue and Tiered variants exist. Interviewers expect candidates to start here, identify the bottlenecks quantitatively, and then propose solutions. Jumping directly to Redis + Kafka without understanding the baseline raises red flags.

The simplest flash sale system: every purchase attempt acquires a PostgreSQL row-level lock via SELECT FOR UPDATE, checks remaining stock, and decrements. Demonstrates why database-level pessimistic locking creates a serialization bottleneck that collapses under high concurrency.

TransactionsBeginnerBottleneck AnalysisPessimistic Locking

Try in Simulator

Problem Statement

Designing a flash sale system is one of the most challenging system design interview questions because it forces candidates to reason about concurrent access to a single shared resource — limited inventory — under extreme traffic conditions. The naive direct database lock approach is where every candidate should start: it establishes the correctness baseline that makes the improvements in queue-based and tiered architectures measurable and concrete.

The core challenge is atomically decrementing inventory without overselling. In the naive approach, every purchase attempt opens a PostgreSQL transaction, executes SELECT FOR UPDATE on the inventory row to acquire an exclusive row-level lock, checks if remaining_stock > 0, decrements the count via UPDATE, inserts an order record, and commits. The SELECT FOR UPDATE guarantees mutual exclusion: while one transaction holds the lock, all other transactions targeting the same row block in a wait queue until the lock is released.

This approach provides perfect correctness — zero overselling is guaranteed by PostgreSQL's row-level locking semantics. But the cost is serialization. Each transaction holds the lock for 50-100ms (lock acquisition + disk read + update + WAL write + commit). At 75ms average lock hold time, a single inventory row processes approximately 13 transactions per second. With 10,000 items available and 50,000 concurrent users, the math is brutal: 50K users competing for locks on a handful of hot item rows, with 99%+ of transactions timing out in the lock wait queue.

The connection pool becomes the hard ceiling. PostgreSQL typically supports 100-500 connections. Each blocked SELECT FOR UPDATE holds a connection for the duration of the lock wait (which can be seconds under contention). At 50K concurrent requests, the connection pool exhausts immediately, and subsequent requests receive connection errors before even attempting the transaction. The system does not degrade gracefully — it cliff-edges from functional to broken.

The read path (availability checks) compounds the problem. Without a cache layer, every GET /availability request queries PostgreSQL directly, competing with purchase transactions for the same connection pool. At 40% read traffic, availability checks consume connections that purchase transactions need, creating a tragic feedback loop: users see 'items available' from read queries but cannot purchase because the connection pool is saturated by other purchases.

This template makes the serialization bottleneck visible and quantifiable. Run the simulation at increasing RPS and watch lock wait time grow from milliseconds to seconds, then connection pool utilization hit 100%, then error rates spike. The comparison with the Queue and Tiered variants provides the concrete numbers needed to explain why demand shaping (virtual queues) and caching (Redis atomic DECR) are essential for production flash sales.

Another critical failure mode in the naive approach is deadlocks. When a single transaction touches multiple inventory rows (e.g., a user buying items A and B together), and another transaction touches the same rows in a different order, PostgreSQL detects a deadlock and aborts one transaction. Under high concurrency with multi-item carts, deadlock frequency increases quadratically with the number of concurrent transactions. The naive approach has no retry logic or deadlock avoidance strategy — aborted transactions simply fail.

The naive approach also lacks any observability into the lock contention. Without monitoring lock wait times, queue depths, or connection pool utilization, operators have no early warning before the system collapses. The simulation exposes these metrics, making the degradation curve visible. Understanding this degradation is essential for system design interviews — interviewers want to see candidates identify specific failure modes, not just wave at 'scalability issues.'

Flash sale system design appears in interviews at Amazon, Shopify, Ticketmaster, Nike, Walmart, and Target. Interviewers expect candidates to start with the database lock approach, identify the serialization bottleneck, and propose demand shaping as the first optimization — then discuss bot detection and CDN edge absorption that motivate the tiered approach.

Architecture Overview

The naive flash sale system is a four-component linear architecture: Client, Load Balancer, SaleService, and PostgreSQL database. There is no cache, no event stream, no queue, no bot detection, and no separation between the read and write paths. Every request follows the same synchronous path from client to database.

All traffic enters through the AWS Application Load Balancer, which distributes requests across SaleService pods using round-robin. The Load Balancer adds approximately 2ms of routing latency and supports up to 100K RPS, but the real capacity ceiling is determined entirely by the database behind SaleService. Both purchase attempts and availability checks flow through the same LB and service.

The SaleService is a stateless REST API running on 5 pods with 50 threads each (250 concurrent connections). It handles two operations: (1) purchase attempt — open a PostgreSQL transaction, SELECT FOR UPDATE on the inventory row, check remaining_stock > 0, UPDATE to decrement, INSERT an order record, COMMIT; (2) availability check — SELECT remaining_stock from the inventory table (no lock). The service processing time is approximately 8ms for business logic, but purchase latency is dominated by the database transaction time (50-100ms for the lock + write path, potentially seconds under contention).

PostgreSQL stores two tables: inventory and orders. The inventory table is the critical bottleneck — every purchase attempt acquires an exclusive row-level lock on the target item's inventory row. At high concurrency, the lock wait queue for popular items grows to thousands of transactions, most of which time out. The orders table stores confirmed purchases, inserted within the same transaction as the inventory decrement for atomicity.

The system has no redundancy at the data layer. A single PostgreSQL primary handles all reads and writes with 100 max connections. If the database fails, both purchases and availability checks stop entirely. There is no cache (every availability check hits the database), no event stream (payment is processed synchronously within the purchase transaction), and no demand shaping (all 50K concurrent users hit the database simultaneously).

The transaction isolation level matters significantly. At the default READ COMMITTED level, the SELECT FOR UPDATE sees committed data and acquires a lock. At SERIALIZABLE, PostgreSQL would detect more anomalies but abort more transactions due to serialization failures. The naive approach uses READ COMMITTED as the pragmatic choice — it prevents overselling via the explicit row lock without the overhead of full serializability. The trade-off is that phantom reads are possible on the availability check path (a user might see stale stock counts between the SELECT and the next committed UPDATE), but this is acceptable since the authoritative check happens within the locked purchase transaction.

Error handling in the naive approach is straightforward but limited. Lock wait timeouts (default 30 seconds in PostgreSQL) cause the transaction to abort with a timeout error. Connection pool exhaustion returns an immediate error. The service returns HTTP 503 (Service Unavailable) for both cases. There is no retry logic, no circuit breaker, and no fallback path — the system either succeeds or fails.

The concrete scaling ceiling is approximately 200 concurrent purchase attempts before lock contention causes cascading timeouts. At this point, the connection pool is saturated, lock wait times exceed SLO thresholds, and availability checks are starved of connections. The system degrades from functional to broken over a narrow traffic band — there is no graceful degradation curve.

Architecture Preview

Loading architecture preview...

Open in Simulator

Request Flow — SELECT FOR UPDATE Purchase

This sequence diagram shows the purchase flow with pessimistic locking and the availability check flow. The critical insight is the lock serialization: while Transaction A holds the row lock, Transaction B blocks in the wait queue. At high concurrency, the lock wait queue grows to thousands of transactions with most timing out.

Loading diagram...

Step-by-Step Walkthrough

1User submits a purchase request — Load Balancer routes to SaleService
2SaleService opens a PostgreSQL transaction (BEGIN)
3SaleService executes SELECT FOR UPDATE on the inventory row — acquires exclusive lock
4While lock is held, all other purchase attempts for this item BLOCK in the wait queue
5SaleService checks remaining_stock > 0, decrements via UPDATE, inserts order record
6SaleService commits the transaction — lock is released, next waiting transaction proceeds
7For availability checks, SaleService reads stock count with a plain SELECT (no lock)

Pseudocode

// Purchase attempt — pessimistic locking
async function purchase(user_id, sale_id, item_id):
    tx = await db.beginTransaction()
    try:
        // Step 1: Acquire exclusive row lock
        row = await tx.execute(
            "SELECT remaining_stock FROM inventory
             WHERE item_id = $1 FOR UPDATE",
            [item_id]
        )   // Blocks if another transaction holds the lock

        // Step 2: Check stock availability
        if row.remaining_stock <= 0:
            await tx.rollback()
            return { status: 409, error: "SOLD_OUT" }

        // Step 3: Atomic decrement + order insert
        await tx.execute(
            "UPDATE inventory SET remaining_stock = remaining_stock - 1
             WHERE item_id = $1", [item_id]
        )
        await tx.execute(
            "INSERT INTO orders (order_id, user_id, sale_id, item_id, status)
             VALUES (gen_random_uuid(), $1, $2, $3, 'CONFIRMED')",
            [user_id, sale_id, item_id]
        )

        // Step 4: Commit releases the lock
        await tx.commit()   // ~50-100ms total
        return { status: 200, message: "CONFIRMED" }
    catch (e):
        await tx.rollback()
        return { status: 500, error: e.message }

Key Design Decisions

SELECT FOR UPDATE (Pessimistic Locking)

Choice

Acquire an exclusive row-level lock on inventory before reading stock count

Rationale

SELECT FOR UPDATE is the simplest correct approach for preventing overselling. The database enforces mutual exclusion at the row level — no race condition is possible between the stock check and the decrement. The cost is that all concurrent purchase attempts for the same item serialize on the lock, creating a hard throughput ceiling of ~13 TPS per item at 75ms average lock hold time.

Single PostgreSQL Database

Choice

One database for both inventory and orders with no cache layer

Rationale

A single PostgreSQL instance eliminates replication lag, cache consistency issues, and distributed coordination complexity. ACID transactions guarantee that inventory decrements and order inserts are atomic. The cost is that purchase writes and availability reads compete for the same 100-connection pool, creating mutual degradation under load.

No Queue or Demand Shaping

Choice

All purchase attempts hit the database directly with no admission control

Rationale

A virtual queue shapes 500K RPS burst traffic to a sustainable backend rate (e.g., 1000/sec), reducing load by 500x. The naive approach deliberately omits this to demonstrate the thundering herd problem: all users hit inventory simultaneously, overwhelming the database. This motivates the Queue variant's token-bucket admission control.

No Cache for Availability

Choice

Every availability check queries PostgreSQL directly

Rationale

A Redis cache for stock counts would serve availability checks in 1ms instead of 5-10ms, freeing database connections for purchase transactions. The naive approach skips caching, which means 40% of traffic (availability reads) competes with purchase writes for the same connection pool — a direct contributor to the connection pool exhaustion bottleneck.

Synchronous Payment Processing

Choice

Payment is processed within the purchase transaction, not async

Rationale

In the naive approach, the order is marked CONFIRMED immediately upon successful inventory decrement — there is no separate payment step. This simplifies the flow but means there is no mechanism for handling payment failures or releasing reserved stock. The Queue and Tiered variants introduce async payment via Kafka to decouple reservation from payment.

Scale & Performance

Target RPS

~200 TPS (DB lock ceiling)

Latency (p99)

50ms-5s purchase (varies with lock contention)

Storage

~2 GB/month at low scale

Availability

~99% (single DB, no redundancy)

Time & Space Complexity

Operation	Time	Space	Notes
Purchase (POST /purchase)	O(1) per transaction, but serialized — effective throughput O(1/lock_hold_time) per item	O(1) per order (~500 bytes with indexes)	Lock hold time ~75ms. Per-item throughput: ~13 TPS. Connection pool (100) is the hard ceiling.
Availability check (GET /availability)	O(1) SELECT, ~5ms database round-trip	O(1) per query (no caching, no buffer)	Competes with purchase transactions for database connections. No cache layer.
Lock wait queue	O(N) where N = concurrent purchase attempts for the same item	O(N) in PostgreSQL lock wait queue	At 50K concurrent users, lock wait queue for hot items can grow to thousands. Most transactions time out.

Database Schema (HLD)

inventory

Stores per-item stock counts for flash sale events. The critical bottleneck table: every purchase attempt acquires an exclusive row-level lock via SELECT FOR UPDATE. Lock hold time is 50-100ms per transaction, serializing all concurrent purchases for the same item. Updated by SaleService on every successful purchase.

item_id VARCHAR PK (item identifier)sale_id VARCHAR FK (flash sale event ID, indexed)total_stock INTEGER (initial stock count, immutable)remaining_stock INTEGER (current remaining, decremented per purchase)updated_at TIMESTAMPTZ (last modification timestamp)

Indexes: idx_inventory_sale ON (sale_id, item_id)

Hot row problem: all purchases for item X serialize on the same row lock. At 75ms lock hold time, throughput is ~13 TPS per item.

orders

Stores confirmed purchase orders. Inserted within the same transaction as the inventory decrement to ensure atomicity — if the transaction rolls back, both the order and the decrement are reverted. Write-heavy during the sale window.

order_id VARCHAR PK (UUID)user_id VARCHAR FK (purchasing user, indexed)sale_id VARCHAR FK (flash sale event ID)item_id VARCHAR FK (item purchased)quantity INTEGER (quantity purchased, default 1)status VARCHAR (CONFIRMED or FAILED)created_at TIMESTAMPTZ (order timestamp)

Indexes: idx_orders_user ON (user_id, created_at DESC), idx_orders_sale ON (sale_id, item_id)

Inserted at ~200 TPS max (limited by inventory lock throughput). No async status updates — order is CONFIRMED immediately on commit.

Solution Comparison

Variant	Tier	Latency	Throughput	Cost	Complexity	Reliability
Naive (Direct DB Lock)	T1	50ms-5s purchase (lock contention)	~200 concurrent purchases	$300/month (single DB + service)	Low — 4 components, no cache, no queue	99% (single DB, no redundancy)
Queue (Virtual Queue + Token Bucket)	T2	<50ms reservation (token-gated)	10M concurrent users, 1K reservations/sec	$2,500/month (Redis + Kafka + workers)	Medium — virtual queue, Redis DECR, Kafka	99.9% (replicated components)
Tiered (Waiting Room + Bot Detection + CDN)	T3	<30ms reservation (post-admission)	50M+ concurrent users, 5K reservations/sec	$8,000/month (CDN + Lambda + full pipeline)	High — 12 components, CDN edge, bot detection	99.9% (defense-in-depth, multi-layer)

This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.

Frequently Asked Questions

Why does the naive approach collapse at just 200 concurrent purchases?

Each SELECT FOR UPDATE holds an exclusive row-level lock for 50-100ms (lock acquisition + disk read + update + WAL write + commit). At 75ms average, one inventory row processes ~13 TPS. With 100 database connections shared between purchases and availability checks, the connection pool saturates at ~200 concurrent purchase attempts. Beyond this, requests queue for connections, lock wait times grow to seconds, and timeouts cascade. The system cliff-edges from functional to broken.

How does SELECT FOR UPDATE prevent overselling?

SELECT FOR UPDATE acquires an exclusive row-level lock on the inventory row before reading the stock count. While the lock is held, all other transactions attempting SELECT FOR UPDATE on the same row block in a wait queue. This guarantees that only one transaction at a time can read and modify the stock count — the check-and-decrement is atomic. If remaining_stock is 0, the transaction returns SOLD_OUT without decrementing. PostgreSQL guarantees zero overselling at the cost of serialized access.

What is the first optimization an interviewer expects?

Demand shaping via a virtual queue. Instead of letting all 50K users hit the database simultaneously (thundering herd), a virtual queue admits users at a controlled rate (e.g., 1000/sec). This reduces database load by 50x while maintaining fairness (FIFO ordering). The Queue variant implements this with a Redis sorted set and TokenWorker. The second optimization is replacing SELECT FOR UPDATE with Redis atomic DECR, which eliminates database lock contention entirely.

Why not use optimistic locking instead of pessimistic locking?

Optimistic locking (version column + retry) avoids holding locks but creates a different problem at flash sale scale: massive retry storms. With 50K concurrent users, optimistic lock conflicts would cause 99%+ of transactions to fail and retry, generating 50x more database load than the original requests. Pessimistic locking is actually better for high-contention scenarios because it queues rather than retries. The real solution is to move the contention point from the database to Redis (atomic DECR), where single-threaded execution eliminates both lock types.

How does the availability check path contribute to the bottleneck?

Availability checks (GET /availability) are 40% of traffic. Each check is a SELECT query that consumes a database connection for 5-10ms. At 20K availability checks/sec during the burst, the database handles 20K SELECT queries competing with purchase transactions for the 100-connection pool. This creates a tragic feedback loop: users see 'items available' but cannot purchase because availability checks are consuming the connections that purchase transactions need.

Why is this approach still worth studying despite its limitations?

The naive approach is the correctness baseline. SELECT FOR UPDATE provides the strongest possible consistency guarantee — ACID transactions with zero overselling. Understanding why it fails at scale (lock serialization, connection pool exhaustion) is essential for explaining why the optimizations in the Queue and Tiered variants exist. Interviewers expect candidates to start here, identify the bottlenecks quantitatively, and then propose solutions. Jumping directly to Redis + Kafka without understanding the baseline raises red flags.

Related Templates

Flash Sale — Virtual Queue + Token Bucket Flash Sale — Tiered (Waiting Room + Bot Detection + CDN)

Discussion

Ready to design your own Flash Sale?

Open the simulator, place components on the canvas, wire them up, and run a traffic simulation to see how your architecture performs under real load.

Open Simulator