Vetora logo
Hard4 componentsInterview: High

Ticketmaster — Naive (SELECT FOR UPDATE)

The simplest ticket booking system: one TicketService with SELECT FOR UPDATE row locks on a PostgreSQL seats table. Zero double-booking via ACID transactions, but lock contention serialises all concurrent checkout attempts and collapses at ~200 concurrent buyers.

TransactionsPostgreSQLBeginnerBottleneck AnalysisSingle Server
Problem Statement

Every Ticketmaster design interview should start with the naive baseline — not because it works at scale, but because it clearly exposes why it doesn't. SELECT FOR UPDATE is the natural first instinct: the database already guarantees ACID transactions, row-level locks prevent concurrent updates, and you get zero double-booking with zero additional infrastructure. A single PostgreSQL transaction opens, locks the target seat row with SELECT FOR UPDATE, checks status = AVAILABLE, updates to HELD, inserts an order record, and commits. Simple, correct, and immediately comprehensible. The key question is not whether it works at small scale — it does — but where exactly it breaks and why.

The lock hold time problem is the fundamental throughput ceiling. A PostgreSQL SELECT FOR UPDATE transaction holds its row-level exclusive lock for the entire duration of the transaction — typically 50ms including the check, update, order insert, and commit. During those 50ms, any other user attempting to reserve the same seat is blocked in a FIFO queue behind the lock holder. With 200 concurrent users attempting to book the same popular seat (a GA floor position at a Taylor Swift onsale), the last user in the queue waits 200 × 50ms = 10 seconds just for the lock. At this point, connection pool exhaustion cascades: the PostgreSQL connection pool saturates (typically 200 connections), new requests cannot acquire a connection, and the service begins returning 500 errors to all users. This is not a gradual degradation — it is a cliff edge where the system goes from functional to completely unavailable in seconds.

The seat map problem compounds the lock contention. Without any caching layer, every seat map render (GET /api/v1/events/{id}/seats) requires a full-table scan of the seats table filtered by event_id — returning up to 80,000 rows per event. This is not a selective index lookup; it is a sequential read of all seat rows for the event. At 5,000 concurrent seat map views for a single popular event (realistic for an onsale moment where fans refresh constantly), 5,000 concurrent full-table scans compete for the same 200 PostgreSQL connections. Each scan holds a connection for 80-150ms (returning 80,000 rows over the network). Connection pool exhaustion for the seat map query alone prevents any reservation queries from acquiring connections.

Despite these limitations, the naive design has genuine value. For small venues — a comedy club with 200 seats, a fan club presale limited to 500 buyers, an internal company lottery — 50 concurrent buyers is a realistic ceiling and SELECT FOR UPDATE handles it cleanly. The simplicity is not just aesthetic: a system with 4 components (client, load balancer, service, database) has minimal operational surface area. No Redis TTL management, no Kafka lag monitoring, no cache invalidation logic, no async pipeline troubleshooting. A single developer can understand, debug, and operate the entire system. This simplicity is genuinely valuable at the scale it targets.

The naive design reveals exactly what V1 needs to fix. The bottleneck is lock serialisation — but the insight is that named-seat reservation is actually naturally parallel. 60,000 seats = 60,000 independent rows that can all be locked simultaneously without contention between different seats. The problem is not that we are locking; it is that PostgreSQL lock hold time is 50ms while Redis SETNX hold time is 2ms, and that all 60,000 seat locks share one 200-connection PostgreSQL pool. V1 replaces SELECT FOR UPDATE with Redis SETNX per seat key: each SETNX is independent, takes 2ms, and uses Redis's connection pool separately from the database. The 25x latency improvement (50ms → 2ms) plus the elimination of connection pool sharing between locks and reads are the two changes that unlock production-scale ticket booking.

Architecture Overview

The naive ticket booking system is a four-component linear architecture: BuyerClient → LoadBalancer → TicketService → SeatDB. There is no cache, no async pipeline, no search service, and no separation between read and write paths. This is the minimum viable implementation of a ticket booking system.

All traffic flows from BuyerClient through the LoadBalancer (AWS ALB, 2ms routing latency) to TicketService. TicketService is a monolithic service running on 4 ECS Fargate pods with 50 threads each, handling all four operations: event listing, seat map display, seat reservation, and checkout confirmation. With 200 total threads and 10ms processing time, theoretical throughput is 20,000 RPS — but this number is meaningless because the actual bottleneck is PostgreSQL lock hold time and connection pool saturation.

SeatDB is a single PostgreSQL RDS instance (db.r7g.xlarge, 4 vCPU, 32 GB RAM) with a 200-connection pool. Three tables live here: events (100K rows, read-heavy, B-tree index on city and event_date for filtered browse), seats (80K rows per event, the most contended table), and orders (write-heavy during onsale, one row per reservation attempt).

The reservation flow exposes the serialisation bottleneck directly. TicketService opens a PostgreSQL transaction, executes SELECT FOR UPDATE on the target seat row (seat_id = ?), acquires an exclusive row-level lock, checks status = AVAILABLE, updates status to HELD, inserts an orders row with status = PENDING and expires_at = NOW() + 10 minutes, and commits. Every concurrent attempt on the same seat blocks on the row lock until commit. With 50ms lock hold time and 200 concurrent reservations for the same event's popular seats, lock wait time reaches 10 seconds and connection pool exhaustion follows within seconds.

The seat map query is equally revealing. TicketService executes SELECT * FROM seats WHERE event_id = ? — returning up to 80,000 rows. With no caching, every seat map render holds a PostgreSQL connection for 80-150ms of I/O. At 5,000 concurrent seat map views (realistic for an onsale moment), all 200 connections saturate on seat map reads alone, blocking reservation attempts from acquiring connections entirely. The simulation makes this visible: as load increases past 200 concurrent users, the DB connection pool graph saturates at 200 and response time spikes from 100ms to 2-5 seconds before timing out.

The checkout confirmation flow illustrates the race condition inherent in this design. TicketService processes payment after the initial reservation transaction commits (the lock is released at commit). If another user's SETNX — or in this design, another SELECT FOR UPDATE — succeeds between the initial reserve and the confirm, a race condition opens. This is the fundamental limitation of separating reservation from payment without maintaining the lock: true atomicity would require holding the lock for the entire payment flow, but database transactions must never span external network calls.

Architecture Preview
Loading architecture preview...
Key Design Decisions
SELECT FOR UPDATE vs Redis SETNX

Choice

PostgreSQL row lock via SELECT FOR UPDATE

Rationale

Simplest correct approach to preventing double-booking: the database enforces mutual exclusion at the row level with no additional infrastructure. No Redis cluster to manage, no TTL edge cases, no separate service to operate. The trade-off is lock hold time — 50ms per transaction limits reservation throughput to approximately 20 TPS per hot seat. V1 replaces this with Redis SETNX, reducing hold time to 2ms and eliminating shared connection pool contention between seat map reads and reservations.

No Hold-and-Confirm

Choice

Atomic reservation: lock → check → update → commit in one transaction

Rationale

True hold-and-confirm requires the seat lock to persist across the payment gateway call (200-500ms). Database transactions must never span external network calls — the PostgreSQL lock would block all other buyers for 500ms during payment, reducing throughput to 2 TPS per seat. The naive design completes the DB transaction before payment, but this creates a race window: another user can claim the seat between reserve commit and payment confirm. The hold-and-confirm pattern in V1 solves this by using Redis TTL instead of a database lock for the hold duration.

No Caching Layer

Choice

All queries direct to PostgreSQL with no intermediate cache

Rationale

Eliminates operational complexity of Redis management, cache invalidation logic, and eventual consistency reasoning. At small scale (50 concurrent buyers, 500-seat venue), PostgreSQL can serve seat map queries at acceptable latency. At scale, this becomes the primary additional bottleneck beyond lock contention — 5K concurrent seat map views saturate the 200-connection pool entirely before any reservation attempts can acquire connections.

Monolithic TicketService

Choice

One service handles all operations: browse, seat map, hold, confirm

Rationale

Minimises infrastructure to one service, one database, one load balancer. Operationally simple — one codebase, one deployment, one log stream. Works for small venues where browse and reservation load are both low. At scale, browse traffic (60% of all requests) and seat reservation traffic (15%) share the same thread pool and connection pool, allowing a browse surge at onsale to starve reservation threads during the most critical window.

No Event Search

Choice

Events accessed by direct ID or simple SQL filters (city, date, category)

Rationale

Elasticsearch or full-text search adds operational complexity: index synchronisation, cluster management, and schema evolution. SQL LIKE queries and B-tree indexed filters work for hundreds or thousands of events with filtered browse. At millions of events with full-text artist or venue search, SQL prefix matches degrade — but that scale problem is outside the scope of a naive design targeting small venues with under 50 concurrent buyers.

Scale & Performance

Target RPS

~200 concurrent seat checkouts before PostgreSQL lock contention causes cascading timeouts

Latency (p99)

50-100ms seat reservation (lock hold), 2-5s seat map display under load (connection pool saturation)

Storage

~50 GB (events + seats + orders for a mid-sized platform)

Availability

~99% (single PostgreSQL instance, no standby replica in the naive setup)

Database Schema (HLD)
events

Event catalog storing concert, sports, and theatre events. One row per event with venue, date, artist, total seat count, and pricing range. Read-heavy (browse and detail views), write-rare (admins add events). B-tree index on (city, event_date) supports filtered browse queries. Approximately 100K rows for a mid-size ticketing platform.

event_id UUID PK (unique event identifier)name VARCHAR (event display name)artist VARCHAR (performer or team name)venue_name VARCHAR (venue name)city VARCHAR (city for geographic filtering)event_date TIMESTAMPTZ (ISO event datetime)total_seats INT (total seats in venue)available_seats INT (denormalised available count for browse display)price_min_cents INT (lowest tier price in cents)price_max_cents INT (highest tier price in cents)

Indexes: idx_events_city_date ON (city, event_date) — filtered browse queries, idx_events_artist ON (artist) — artist name search (prefix match only)

available_seats is a denormalised counter updated on every reservation — creates a secondary hot-row contention point during onsale. At 100K events this table is small and fits entirely in PostgreSQL buffer cache.

seats

Individual seat inventory: one row per physical seat in every venue for every event. The status column (AVAILABLE/HELD/SOLD) is the authoritative availability state in this V0 design. SELECT FOR UPDATE on this row provides mutual exclusion for reservation. At 80K seats per event × 1K active events = 80M rows total. The seats table receives the highest write contention during onsale.

seat_id UUID PK (unique seat identifier per event)event_id UUID FK (parent event)section VARCHAR (section name: Floor, Orchestra, Mezzanine, Upper)row_num VARCHAR (row identifier: A, B, 1, 2, etc.)seat_number INT (seat number within row)price_tier VARCHAR (GA, Standard, Premium, VIP)price_cents INT (price in cents for this seat)status VARCHAR (AVAILABLE / HELD / SOLD)

Indexes: idx_seats_event ON (event_id) — seat map query returns all seats for event, idx_seats_event_status ON (event_id, status) — count available seats

This table is the primary bottleneck in V0. SELECT FOR UPDATE locks individual rows for 50ms per reservation. The seat map query (SELECT * WHERE event_id = ?) scans all rows for the event — up to 80,000 rows — holding a connection for 80-150ms with no caching.

orders

Purchase orders linking users to seats. Status lifecycle: PENDING (held, awaiting payment) → CONFIRMED (payment complete) → EXPIRED (hold TTL elapsed) → CANCELLED (user action). The expires_at column enforces the 10-minute hold window. One row per seat reservation attempt including failed ones.

order_id UUID PK (unique order identifier)user_id UUID (purchasing user)seat_id UUID FK (reserved seat, references seats)event_id UUID FK (event, references events)status VARCHAR (PENDING / CONFIRMED / EXPIRED / CANCELLED)reserved_at TIMESTAMPTZ (hold start timestamp)expires_at TIMESTAMPTZ (hold expiry: reserved_at + 10 minutes)confirmed_at TIMESTAMPTZ (payment confirmation timestamp, nullable)

Indexes: idx_orders_seat_status ON (seat_id, status) — find active holds for a seat, idx_orders_user ON (user_id) — user order history, idx_orders_expires ON (expires_at) WHERE status = PENDING — expired hold cleanup

In V0, expired holds are released lazily (on next read of the seat) or by a background cron job. At 1K inserts/sec at onsale peak, this table grows quickly. No async pipeline — all writes are synchronous in the reservation transaction.

This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.

Frequently Asked Questions
Why does SELECT FOR UPDATE create a throughput ceiling at ~200 concurrent checkouts?

PostgreSQL SELECT FOR UPDATE holds an exclusive row-level lock for the entire transaction duration — approximately 50ms including status check, seat update, order insert, and commit. With 200 concurrent buyers attempting reservations for the same event's popular seats, each waiter adds 50ms to the queue behind the lock holder. The 200th user in the queue waits 200 × 50ms = 10 seconds. Before that point, the 200-connection PostgreSQL pool saturates: each waiting transaction holds a connection while blocked on the lock, consuming connections without doing useful work. When the pool fills, new requests cannot acquire a connection at all, and the service returns 500 errors. This is not a gradual degradation — it is a sudden cliff where the system transitions from functional to completely unavailable.

What is the actual failure mode at onsale for a major artist?

At onsale open, users simultaneously refresh the event page (browse queries) and attempt seat reservations. Browse queries (SELECT * FROM seats WHERE event_id = ?) return 80,000 rows and hold PostgreSQL connections for 80-150ms each. At 5,000 concurrent seat map views, all 200 connections saturate on browse queries alone — reservation queries cannot acquire connections and immediately fail. Users see connection timeout errors within 2-3 seconds of onsale open. The system does not experience gradual degradation; it experiences complete unavailability for all reservation attempts while browse queries exhaust the connection pool.

Is SELECT FOR UPDATE ever the right answer in a real production system?

Yes, for specific use cases. Admin operations that update a single record and run at low concurrency (e.g., an admin marking an event as cancelled) benefit from SELECT FOR UPDATE simplicity — no Redis infrastructure needed for a one-per-hour operation. Low-concurrency batch processing where serialisation is acceptable is another valid case. Small venue systems (comedy club presales, fan club lotteries limited to 50 concurrent buyers) can run SELECT FOR UPDATE in production indefinitely without hitting the throughput ceiling. The mistake is applying a pattern suited for low-concurrency administrative operations to a high-concurrency consumer-facing onsale scenario.

What is the first change to make when moving from V0 to V1?

Replace SELECT FOR UPDATE with Redis SETNX for seat holds. This one change reduces lock hold time from 50ms to 2ms (a 25x improvement) and eliminates connection pool sharing between seat map reads and reservations. The second immediate change is adding EventCache (Redis) for seat map queries — serving the 80,000-row seat availability result from a compact 7.5KB bitmap cache instead of a full-table scan. Together, these two changes move the system from a 200-user ceiling to a 50,000-user ceiling, which is sufficient for most real-world onsale scenarios.

Why does the seat map query degrade so severely under load?

The seat map query (SELECT * FROM seats WHERE event_id = ?) returns up to 80,000 rows per event — all seat records with their current status, section, row, and price. This is not a selective point lookup; it is a sequential scan of all seats for the event even with an index on event_id (the index reduces to O(n seats) page reads, not O(1)). Each seat map query holds a PostgreSQL connection for the entire result transfer time (80-150ms for 80,000 rows over the network). At 5,000 concurrent seat map views — realistic when 500,000 users simultaneously open the event page at onsale — the 200-connection pool fills with seat map queries alone, blocking all other database operations.

Related Templates

Discussion

Sign in to join the discussion.

Ready to design your own Ticketmaster?

Open the simulator, place components on the canvas, wire them up, and run a traffic simulation to see how your architecture performs under real load.

Open Simulator