Hard · 10 components · Interview: Very High

Ticketmaster — Per-Seat Redis Hold

The canonical interview answer: Redis SETNX per seat ID for atomic holds, dedicated SearchService with EventCache for event browse, and async Kafka pipeline for ticket generation. 60,000 seats = 60,000 independent Redis keys with zero cross-seat contention.

Tags: Transactions · Redis · Kafka · Seat Reservation · Microservices
Problem Statement

The key insight that unlocks the Ticketmaster design interview is recognising that named seat reservation is naturally parallel. Every seat has its own independent identity: seat 14E in section 103 is a completely separate resource from seat 15F in the same section. With one Redis SETNX key per seat, holds on different seats never coordinate or contend with each other. All 60,000 seats can therefore be held concurrently, with each hold costing about 2ms. This is the fundamental inversion from V0: instead of serialising on a shared database connection pool, each seat hold is an independent Redis operation. Throughput is no longer capped at V0's 20 TPS total. Each hold takes about 2ms, and holds on different seats proceed in parallel.
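The independence argument can be checked with a minimal Python sketch. FakeSeatStore is a hypothetical in-memory stand-in for SeatHoldCache, not a Redis client. Its lock emulates the way Redis executes each command atomically, so it models SETNX semantics only. In the sketch, 1,000 threads race for one seat and 1,000 threads target 1,000 different seats.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeSeatStore:
    """Hypothetical stand-in for SeatHoldCache (models SETNX semantics only)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()  # emulates Redis's atomic command execution

    def set_nx(self, key, value):
        # Atomic "set if not exists": True only for the caller that created the key.
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True


store = FakeSeatStore()
event_id = "evt-1"  # hypothetical event id

# 1,000 users race for the same seat: exactly one should win.
with ThreadPoolExecutor(max_workers=64) as pool:
    same_seat = list(pool.map(
        lambda u: store.set_nx(f"seat:{event_id}:14E", f"user-{u}"), range(1000)))

# 1,000 users each target a different seat: none of them conflict.
with ThreadPoolExecutor(max_workers=64) as pool:
    distinct = list(pool.map(
        lambda u: store.set_nx(f"seat:{event_id}:S{u}", f"user-{u}"), range(1000)))

print(sum(same_seat), sum(distinct))  # 1 1000
```

Contention only ever exists between requests for the same key, which is exactly the conflict the hold is meant to resolve.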

The hold-and-confirm pattern requires careful design of the 10-minute TTL. The hold window must be long enough for users to complete payment. Typical payment flows take 2-3 minutes, including card entry, 3D Secure authentication, and bank authorization. 10 minutes provides headroom for slow connections, confused users, and payment retries without being so long that seats are hoarded. When a hold expires (Redis auto-deletes the key at TTL), the seat silently returns to availability. Expiry does not pass through SeatService, so no invalidation fires. The returned seat instead appears in the availability bitmap within 2 seconds, once the bitmap's TTL lapses. On payment confirmation, SeatService does not merely mark the hold confirmed in Redis. It writes an order record to OrderDB (PostgreSQL) and publishes a ticket_confirmed event to TicketStream (Kafka). Releasing a hold requires a Lua script for atomic compare-and-delete: GET the hold value, verify it belongs to the requesting user, then DEL. Atomicity matters because a hold can expire and be re-acquired by another user between a separate GET and DEL. Without it, the first user's late release would delete the second user's valid hold.
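The release race can be modelled with a small Python sketch. RELEASE_HOLD_LUA is the widely used compare-and-delete script from the Redis locking pattern. FakeHoldStore and its methods are hypothetical in-memory stand-ins, not a Redis API. The sketch assumes the hold value is just the owner string; the design also stores a timestamp, and the comparison would use the full value. The sketch replays the interleaving twice: first with a separate GET and DEL, then with the atomic script.

```python
# Compare-and-delete: runs atomically inside Redis, so no command can interleave.
RELEASE_HOLD_LUA = """
if redis.call("GET", KEYS[1]) == ARGV[1] then
    return redis.call("DEL", KEYS[1])
else
    return 0
end
"""


class FakeHoldStore:
    """Hypothetical in-memory stand-in for SeatHoldCache (not a Redis client)."""

    def __init__(self):
        self.data = {}

    def set_nx(self, key, value):
        if key in self.data:
            return False
        self.data[key] = value
        return True

    def expire_now(self, key):
        # Simulates the 600 s TTL elapsing.
        self.data.pop(key, None)

    def release_if_owner(self, key, owner_value):
        # Same logic as RELEASE_HOLD_LUA, executed as one step.
        if self.data.get(key) == owner_value:
            del self.data[key]
            return 1
        return 0


key = "seat:evt-1:14E"

# Unsafe: GET and DEL as two separate round-trips.
store = FakeHoldStore()
store.set_nx(key, "alice")
seen = store.data.get(key)      # alice's GET sees her own hold
store.expire_now(key)           # her TTL lapses...
store.set_nx(key, "bob")        # ...and bob takes the seat
if seen == "alice":
    store.data.pop(key, None)   # alice's DEL now removes bob's hold
unsafe_result = store.data.get(key)

# Safe: one atomic compare-and-delete.
store = FakeHoldStore()
store.set_nx(key, "alice")
store.expire_now(key)
store.set_nx(key, "bob")
released = store.release_if_owner(key, "alice")  # 0: the hold is not alice's
safe_result = store.data.get(key)
```

Against a real server, redis-py would run the script with `r.eval(RELEASE_HOLD_LUA, 1, key, owner_value)`.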

The seat map challenge is the second hard problem in V1. Five million users simultaneously viewing the seat map for a popular event need near-real-time data on which of 60,000 seats are currently held versus available. Reading SeatHoldCache directly, by SCANning for all seat:{event_id}:* keys, costs on the order of 60,000 Redis lookups per user per render. Suppose each of the 5M concurrent viewers renders about once per second. That is 60,000 × 5M = 300 billion Redis operations per second, an impossible load. The solution is a dedicated EventCache storing a compact availability bitmap per event. At 1 bit per seat, 60,000 seats take 60,000 / 8 = 7,500 bytes (7.5KB). The entire bitmap is read with a single Redis GET, so each render costs O(1) Redis calls regardless of seat count. SeatService triggers EventCache invalidation on every successful SETNX and every hold release, which keeps staleness under 2 seconds. Each viewer still pays only one 7.5KB GET per render.
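The bitmap layout can be sketched in Python with a `bytearray`. The mapping from a seat to its `seat_index` is assumed to be a fixed ordering taken from EventDB's seat geometry; that ordering is hypothetical. Bits are addressed the way Redis SETBIT/GETBIT address them, with offset 0 as the most significant bit of byte 0. A bitmap built this way could therefore be stored with SET and read with GET unchanged.

```python
SEATS = 60_000


def new_bitmap(num_seats):
    # 1 bit per seat, rounded up to whole bytes.
    return bytearray((num_seats + 7) // 8)


def set_seat(bitmap, seat_index, held):
    # Redis SETBIT bit order: offset 0 is the most significant bit of byte 0.
    byte, bit = divmod(seat_index, 8)
    mask = 0x80 >> bit
    if held:
        bitmap[byte] |= mask
    else:
        bitmap[byte] &= ~mask & 0xFF


def is_held(bitmap, seat_index):
    byte, bit = divmod(seat_index, 8)
    return bool(bitmap[byte] & (0x80 >> bit))


bitmap = new_bitmap(SEATS)
set_seat(bitmap, 1234, True)
print(len(bitmap), is_held(bitmap, 1234), is_held(bitmap, 1235))  # 7500 True False
```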

The read-to-write ratio explains why SearchService and SeatService must be independent. Across a full onsale lifecycle, approximately 100 users browse events and view seat maps for every 1 user who actually purchases. This 100:1 browse-to-purchase ratio means 99% of traffic never touches SeatService — it is pure read traffic on event metadata and seat availability. SearchService handles this read traffic backed by EventCache. If browse and seat hold traffic shared a service, a browse surge at onsale open would consume all service threads and connection pool slots, starving seat hold operations during the highest-stakes window. Independent services with independent scaling allow SearchService to handle 5M browse RPS while SeatService handles 50K hold RPS — completely different scaling requirements satisfied independently.

The Kafka pipeline for ticket generation decouples confirmation from delivery in two ways. First, QR code generation (SHA-256 of the ticket payload, rendered as an image) takes 100-200ms — unacceptably long in the checkout critical path during high-concurrency onsale. Second, email delivery is inherently unreliable (SMTP provider timeouts, spam filtering, rate limits). Kafka absorbs both concerns: SeatService publishes a ticket_confirmed event to TicketStream (synchronous, 5ms), TicketWorker consumes asynchronously and generates the QR code and sends the email (up to 30 seconds). If email delivery fails, TicketWorker retries with exponential backoff using Kafka offset management — no confirmed purchase is ever lost. The confirmed order record in OrderDB is the source of truth; email delivery is a best-effort side effect.
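A minimal Python sketch of the hashing step TicketWorker performs before rendering the QR image is shown below; the rendering itself needs an imaging library and is not shown. The payload fields and the function name `ticket_token` are hypothetical. Serialising with sorted keys makes the same ticket always produce the same digest.

```python
import hashlib
import json


def ticket_token(order_id, seat_id, event_id, user_id):
    # Canonical JSON (sorted keys, no whitespace) so the hash is deterministic.
    payload = json.dumps(
        {"order_id": order_id, "seat_id": seat_id,
         "event_id": event_id, "user_id": user_id},
        sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

A plain hash of known fields can be recomputed by anyone who knows those fields. Keying it with a server-side secret via `hmac.new(secret, payload, hashlib.sha256)` is a common hardening step.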

Architecture Overview

The V1 architecture uses 10 components organized into two independent paths. The browse path serves 99% of traffic: BuyerClient → LoadBalancer → SearchService → EventCache (99% hit) → EventDB (1% miss). The seat hold path handles the critical 1%: BuyerClient → LoadBalancer → SeatService → SeatHoldCache (SETNX) → OrderDB (on confirm) → TicketStream → TicketWorker.

SearchService is a read-optimized service backed by EventCache (ElastiCache Redis). EventCache maintains two distinct cache namespaces. Event metadata entries (event name, artist, venue, pricing tiers) use a 60-second TTL, since event details change rarely. The per-event seat availability bitmap uses a 2-second TTL, since seats are held and released constantly during onsale. The bitmap represents all 60,000 seats in 7.5KB, so one Redis GET per seat map render replaces 60,000 individual key lookups. EventDB (PostgreSQL RDS) stores the authoritative event catalog and seat geometry. SearchService queries EventDB only on a cache miss (approximately 1% of browse traffic). This caps EventDB at roughly 50K queries/sec even at 5M browse RPS.
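The two-namespace, cache-aside read path can be sketched in Python with an in-process TTL cache. This is a stand-in for EventCache, not an ElastiCache client. The key scheme (`event:{id}:meta`, `event:{id}:seats`) and the loader callbacks are hypothetical. The injectable clock lets the TTL behaviour be tested.

```python
import time

METADATA_TTL_S = 60      # event metadata changes rarely
AVAILABILITY_TTL_S = 2   # seat bitmap changes constantly during onsale


class TTLCache:
    """Hypothetical in-process stand-in for EventCache."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries behave like misses
            return None
        return value

    def set(self, key, value, ttl_s):
        self._data[key] = (value, self._clock() + ttl_s)


def cached_read(cache, key, ttl_s, load):
    """Cache-aside: serve from cache, fall back to the loader (EventDB) on a miss."""
    value = cache.get(key)
    if value is None:
        value = load()
        cache.set(key, value, ttl_s)
    return value


def event_metadata(cache, event_id, load_event):
    return cached_read(cache, f"event:{event_id}:meta", METADATA_TTL_S,
                       lambda: load_event(event_id))


def seat_bitmap(cache, event_id, build_bitmap):
    return cached_read(cache, f"event:{event_id}:seats", AVAILABILITY_TTL_S,
                       lambda: build_bitmap(event_id))
```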

SeatService is the write-critical path. When a user selects seat 14E, SeatService executes SET seat:{event_id}:14E <hold value> NX EX 600. This is set-if-not-exists (SETNX semantics) with a 600-second TTL, in one atomic command. The plain SETNX command cannot attach a TTL. Following it with a separate EXPIRE would leave a window in which a crash creates a hold that never expires. Redis replies OK if the key was set (hold acquired) or nil if the key already exists (seat already held). The existence check and the write happen in one command, so there is no check-then-set race of the kind that plagues read-then-write approaches. On success, SeatService immediately invalidates the EventCache availability bitmap so the seat map reflects the hold within 2 seconds. The hold value records the user ID and hold timestamp as well as the hold's existence, which enables ownership verification on confirm and release.
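The acquire step can be sketched in Python as an in-memory model of `SET key value NX EX 600`. FakeSeatHoldCache and hold_seat are hypothetical names, and the `user_id:timestamp` value format is an assumption based on the fields described above. The clock is injectable, so expiry can be exercised without waiting ten minutes.

```python
import time

HOLD_TTL_S = 600


class FakeSeatHoldCache:
    """Hypothetical model of SET ... NX EX semantics (not a Redis client)."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def now(self):
        return self._clock()

    def set_nx_ex(self, key, value, ttl_s):
        now = self._clock()
        entry = self._data.get(key)
        if entry is not None and entry[1] > now:
            return False  # live hold exists: Redis would reply nil
        self._data[key] = (value, now + ttl_s)
        return True       # Redis would reply OK

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or entry[1] <= self._clock():
            return None
        return entry[0]


def hold_seat(cache, event_id, seat, user_id):
    key = f"seat:{event_id}:{seat}"
    value = f"{user_id}:{int(cache.now())}"  # owner + hold timestamp
    return "HELD" if cache.set_nx_ex(key, value, HOLD_TTL_S) else "SEAT_UNAVAILABLE"
```

With redis-py, the equivalent call is `r.set(key, value, nx=True, ex=600)`, which returns True on success and None when the key already exists.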

On payment confirmation, SeatService executes three operations in sequence: verify the hold belongs to the requesting user (Lua GET + compare), write an order record to OrderDB with status = CONFIRMED, and publish a ticket_confirmed message to TicketStream. The order write to OrderDB is synchronous — confirmation only returns success after the database write succeeds. OrderDB is PostgreSQL RDS Multi-AZ for 99.99% availability. TicketStream is a 3-broker Kafka cluster with replication factor 3, providing durability for ticket events even during broker failure.
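The three-step confirmation order can be sketched in Python with plain containers standing in for SeatHoldCache, OrderDB and TicketStream. ConfirmDeps, confirm_purchase and the HOLD_NOT_OWNED status are hypothetical. For brevity, the hold value is simplified to the owner's user ID.

```python
from dataclasses import dataclass, field


@dataclass
class ConfirmDeps:
    holds: dict                                   # seat key -> owner (SeatHoldCache stand-in)
    orders: list = field(default_factory=list)    # OrderDB stand-in
    stream: list = field(default_factory=list)    # TicketStream stand-in


def confirm_purchase(deps, event_id, seat_id, user_id, order_id):
    key = f"seat:{event_id}:{seat_id}"
    # Step 1: ownership check (a Lua GET + compare in the real system).
    owner = deps.holds.get(key)
    if owner is None:
        return "HOLD_EXPIRED"
    if owner != user_id:
        return "HOLD_NOT_OWNED"
    # Step 2: synchronous order write; success is reported only after it lands.
    deps.orders.append({"order_id": order_id, "event_id": event_id,
                        "seat_id": seat_id, "user_id": user_id,
                        "status": "CONFIRMED"})
    # Step 3: publish for asynchronous ticket generation.
    deps.stream.append({"type": "ticket_confirmed", "order_id": order_id})
    return "CONFIRMED"
```

The hold can still expire between step 1 and step 2. A uniqueness constraint on confirmed orders per seat in OrderDB is one way to stop that window from producing a double sale.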

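TicketWorker consumes from TicketStream and performs two asynchronous tasks. The first is QR barcode generation: a SHA-256 of the ticket payload, rendered as PNG and stored in S3. The second is email delivery: an HTML email with the QR code attached, sent via SES. TicketWorker scales independently of SeatService, so during the onsale peak multiple TicketWorker instances process the confirmation backlog. Kafka offset management provides at-least-once processing. TicketWorker commits its offset only after successful QR generation and email delivery, or after max retries with dead-letter queue routing. A crash before the commit therefore causes the message to be redelivered. Redelivery can produce a duplicate email, so the worker should be idempotent, for example by skipping orders whose ticket_generated flag is already true. If a broker goes down, partition leadership fails over to a replica. If a TicketWorker instance dies, the consumer group rebalances and its partitions resume from the last committed offset.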
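TicketWorker's retry-then-commit loop can be sketched in Python over a plain list standing in for one TicketStream partition. The names consume, deliver and dead_letter are hypothetical. A real consumer would disable auto-commit and commit offsets through its Kafka client. The backoff schedule (0.5 s doubling, capped at 30 s, 5 retries) is an assumption, not a value from the design.

```python
import time


def backoff_delays(max_retries, base_s=0.5, cap_s=30.0):
    # Exponential backoff: 0.5, 1, 2, 4, ... seconds, capped at cap_s.
    return [min(cap_s, base_s * 2 ** i) for i in range(max_retries)]


def consume(log, start_offset, deliver, dead_letter, max_retries=5, sleep=time.sleep):
    """Process log[start_offset:] and return the committed offset.

    The offset advances only after a message is delivered or dead-lettered,
    so a crash mid-message means redelivery (at-least-once), never a skip.
    """
    committed = start_offset
    for offset in range(start_offset, len(log)):
        message = log[offset]
        # First attempt has no delay; retries follow the backoff schedule.
        for delay in [0.0] + backoff_delays(max_retries):
            if delay:
                sleep(delay)
            try:
                deliver(message)
                break
            except Exception:
                continue
        else:
            dead_letter.append(message)  # every attempt failed
        committed = offset + 1           # "commit" once the message is handled
    return committed
```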

The 10-component count (BuyerClient, LoadBalancer, SearchService, EventCache, EventDB, SeatService, SeatHoldCache, OrderDB, TicketStream, TicketWorker) reflects the necessary service separation for a production ticket booking system handling 50K concurrent seat holds and 5M concurrent browse requests simultaneously.

Key Design Decisions
Redis SETNX vs PostgreSQL SELECT FOR UPDATE

Choice

Redis SET NX EX 600 per seat key (SETNX semantics with a 600-second TTL, in one atomic command)

Rationale

SETNX reduces seat hold latency from 50ms (PostgreSQL lock hold) to 2ms (Redis command round-trip) — a 25x improvement. More importantly, SETNX on independent seat keys eliminates shared resource contention: 60,000 seats = 60,000 parallel operations with no coordination. PostgreSQL SELECT FOR UPDATE shares a 200-connection pool between all operations, creating cascading saturation during onsale. Redis SETNX uses a separate connection pool isolated from read traffic.

10-Minute Hold TTL

Choice

600-second Redis TTL on seat hold keys

Rationale

The TTL must be long enough for payment completion and short enough to prevent seat hoarding. Payment typically takes 2-3 minutes for card entry + 3D Secure + bank authorization, and the TTL needs headroom beyond that for slow connections. 10 minutes is a common checkout window across ticketing platforms. A TTL shorter than 5 minutes risks expiring holds while users are mid-payment on slow connections. A TTL longer than 15 minutes gives too much advantage to bot buyers who hold seats without intent to purchase.

Separate EventCache for Seat Availability

Choice

Compact availability bitmap in EventCache with 2s TTL, invalidated on every hold

Rationale

Direct reads from SeatHoldCache for seat map rendering would require SCAN across 60K keys per event per render — 300B Redis ops/sec at 5M concurrent viewers, an impossible load. The availability bitmap (7.5KB per event) reduces this to O(1) per render regardless of seat count. SeatService triggers invalidation on every SETNX success, ensuring staleness never exceeds the 2-second TTL in normal operation.

Separate SearchService and SeatService

Choice

Independent services with independent scaling, connection pools, and thread pools

Rationale

The 100:1 browse-to-purchase ratio means browse traffic is 100x larger by volume than hold traffic. Browse is read-heavy and cache-absorbed; holds are write-heavy and latency-critical. A shared service at onsale open would see a browse surge (5M users refreshing the event page) compete with seat hold requests (50K SETNX operations) for the same thread pool. Separation ensures the browse surge cannot crowd out hold threads during the highest-stakes window.

Kafka for Async Ticket Delivery

Choice

TicketStream Kafka topic → TicketWorker generates QR + sends email asynchronously

Rationale

QR generation and email delivery add 100-200ms per confirmation, and SMTP providers have 2-5% failure rates. At 5K concurrent confirmations at onsale peak, synchronous delivery would block 5K service threads simultaneously. Kafka decouples fast confirmation from slow delivery. SeatService returns success after the OrderDB write (about 10ms), and TicketWorker sends the email within 30 seconds. Committing offsets only after delivery gives at-least-once semantics, so no confirmed ticket loses its QR email. The cost is an occasional duplicate, which an idempotent worker suppresses.

Scale & Performance

Target RPS

50K seat holds/sec at onsale peak; 5M browse RPS (EventCache 99% hit rate)

Latency (p99)

2ms seat hold (SETNX), <20ms browse (cache hit), <30s ticket email (async)

Storage

~200 GB total; SeatHoldCache ~3.6MB per active event in Redis (about 60 bytes per hold key across 60,000 seats)

Availability

99.99% (multi-node Redis cluster + RDS Multi-AZ + Kafka 3-way replication)

Database Schema (HLD)
seats (EventDB — PostgreSQL)

Stores seat geometry and price configuration. NOT the availability source of truth in V1: SeatHoldCache (Redis) owns real-time availability. EventDB seat status changes on confirmed purchases (status = SOLD). It does not change on holds or hold expiry, because the HELD state never reaches this table. Provides the seat map geometry (section, row, number) that the availability bitmap is layered over.

seat_id UUID PK (unique seat identifier per event)
event_id UUID FK (parent event)
section VARCHAR (section name: Floor, Orchestra, Mezzanine, Upper)
row_num VARCHAR (row identifier: A, B, 1, 2)
seat_number INT (seat number within row)
price_tier VARCHAR (GA, Standard, Premium, VIP)
price_cents INT (seat price in cents)
status VARCHAR (AVAILABLE / SOLD; the HELD state lives in Redis, not here)

Indexes: idx_seats_event ON (event_id) — seat geometry fetch for map rendering

The status column reflects only AVAILABLE and SOLD — the HELD state is stored exclusively in SeatHoldCache (Redis SETNX key) and reflected in EventCache (availability bitmap). This avoids write contention on the seats table during onsale: only OrderDB receives high-frequency writes.

orders (OrderDB — PostgreSQL RDS Multi-AZ)

Record of confirmed ticket purchases. A row is created only when SeatService converts a Redis hold to a confirmed purchase after payment success. After that, only its status (cancellation, refund) and its ticket_generated flag change. This is the source of truth for all financial and ticketing records. TicketWorker reads from TicketStream (Kafka) rather than polling this table, preventing read amplification on the orders table during peak.

order_id UUID PK (unique order identifier)
user_id UUID (purchasing user)
seat_id UUID FK (purchased seat)
event_id UUID FK (event)
status VARCHAR (CONFIRMED / CANCELLED / REFUNDED)
price_paid_cents INT (price at time of purchase, locked at hold creation)
confirmed_at TIMESTAMPTZ (payment success timestamp)
ticket_generated BOOL (updated by TicketWorker on QR delivery)

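Indexes: idx_orders_user ON (user_id) for user purchase history; idx_orders_event ON (event_id) for event sales reporting and capacity checks; idx_orders_seat ON (seat_id) for seat lookups. A plain index on seat_id only speeds up the single-confirmed-order check. To enforce it, make the index partial and unique: CREATE UNIQUE INDEX idx_orders_seat ON orders (seat_id) WHERE status = 'CONFIRMED'.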

Multi-AZ PostgreSQL for 99.99% availability on the financial record path. At 5K purchases/sec peak per major event, sustained write throughput requires connection pooling via PgBouncer. ticket_generated starts false and is set to true by TicketWorker after successful QR email delivery.
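The single-confirmed-order rule can be exercised with SQLite from the Python standard library. SQLite is only a stand-in for OrderDB here. PostgreSQL accepts the same `CREATE UNIQUE INDEX ... WHERE` partial-index syntax. The index name, the `insert_order` helper, and the fixed event, price and timestamp values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id TEXT PRIMARY KEY,
    user_id TEXT NOT NULL,
    seat_id TEXT NOT NULL,
    event_id TEXT NOT NULL,
    status TEXT NOT NULL CHECK (status IN ('CONFIRMED', 'CANCELLED', 'REFUNDED')),
    price_paid_cents INTEGER NOT NULL,
    confirmed_at TEXT NOT NULL,
    ticket_generated INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX idx_orders_user ON orders (user_id);
CREATE INDEX idx_orders_event ON orders (event_id);
-- At most one CONFIRMED order per seat; refunded or cancelled rows do not count.
CREATE UNIQUE INDEX idx_orders_seat ON orders (seat_id) WHERE status = 'CONFIRMED';
""")


def insert_order(order_id, user_id, seat_id, status="CONFIRMED"):
    """Return True if the order was written, False if the seat is already sold."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders (order_id, user_id, seat_id, event_id, status, "
                "price_paid_cents, confirmed_at) "
                "VALUES (?, ?, ?, 'evt-1', ?, 9900, '2024-01-01T00:00:00Z')",
                (order_id, user_id, seat_id, status))
        return True
    except sqlite3.IntegrityError:
        return False


first = insert_order("o1", "alice", "seat-14E")
second = insert_order("o2", "bob", "seat-14E")   # rejected: already confirmed
with conn:
    conn.execute("UPDATE orders SET status = 'REFUNDED' WHERE order_id = 'o1'")
third = insert_order("o3", "carol", "seat-14E")  # allowed after the refund
```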

This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.

Frequently Asked Questions
Why use Redis SETNX directly instead of a distributed lock library like Redlock?

Redis SET with NX is already a lock primitive. It is atomic at the Redis command level (single-threaded command execution), supports a TTL for automatic expiry, and completes in a single round-trip. Redlock adds complexity by acquiring the lock on a majority of N independent Redis masters; the Redis documentation's example uses N = 5. The goal is that losing one master does not silently drop the lock. That can happen, for instance, when a master fails over to a replica that had not yet received the key under asynchronous replication. For per-seat holds, that rare failure mode is tolerable. A concurrent double-hold causes a brief payment conflict, not permanent data loss, and the confirmed order in OrderDB remains the source of truth. A single-primary SETNX-style hold with TTL is therefore sufficient here. Redlock would add a request to every master on each hold for marginal additional safety.

What happens when a hold expires while the user is actively entering payment details?

When the 600-second TTL elapses, Redis automatically deletes the seat hold key. The seat becomes available for other users to SETNX. The original user continues entering payment details without any immediate notification — the expiry is invisible to them until they submit payment. When they submit, SeatService attempts to verify the hold (GET seat:{event_id}:{seat_id} and check owner = user_id) and finds the key does not exist. SeatService returns HOLD_EXPIRED to the client. The user must return to the seat map and select a new seat — which may now be unavailable if another user grabbed it during the expiry window. Production systems add client-side countdown timers (visible 10-minute countdown) to prevent silent expiry surprises.

How does the seat map stay fresh without constantly reading SeatHoldCache?

EventCache stores a compact availability bitmap per event: 1 bit per seat (0 = available, 1 = held/sold), 7.5KB for a 60,000-seat venue. Each seat map render costs a single Redis GET of this key. The bitmap itself is rebuilt at most once per 2-second TTL window, unless an invalidation forces a rebuild sooner. SeatService invalidates the EventCache availability key (by DEL or overwrite) on every successful SETNX hold and every hold release, so the next seat map request triggers a fresh bitmap build. Between invalidations, up to 2 seconds of staleness is the design trade-off. A user may see a seat as available after it was held, but their subsequent hold attempt fails the SETNX and SeatService returns SEAT_UNAVAILABLE.

What is the scaling ceiling of V1 and when do you need V2?

A single SeatHoldCache Redis node handles approximately 100,000 SETNX operations per second before CPU saturation. For a single major event with 50K concurrent hold attempts, V1 operates comfortably within this ceiling. The breakdown point is Taylor Swift or Super Bowl scale: 5 million users simultaneously attempting SETNX at onsale open. Even at 100K ops/sec, 5 million simultaneous requests would take about 50 seconds to drain. The resulting connection queue overwhelms the Redis node. V2 adds a WaitingRoomService that gates admission to SeatService, releasing tokens at 5K/sec. SeatHoldCache then receives hold traffic from at most 5K newly admitted users per second, regardless of how many users are waiting.
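The 5K/sec admission gate can be sketched in Python as a token bucket. This is one common way to implement such a gate and an assumption here, not necessarily how WaitingRoomService is built. TokenBucket and its parameters are hypothetical. Time is passed in explicitly so the behaviour is deterministic.

```python
class TokenBucket:
    """Hypothetical admission gate: refills `rate` tokens/sec up to `burst`."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.updated = now

    def try_admit(self, now):
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


gate = TokenBucket(rate=5_000, burst=5_000)
# 100,000 users arrive at once: only the burst allowance gets through.
admitted_at_open = sum(gate.try_admit(now=0.0) for _ in range(100_000))
```

At 5K admissions/sec, a queue of 5 million users drains in about 1,000 seconds (roughly 17 minutes). Throughout, SETNX traffic stays far below the node's ~100K ops/sec ceiling.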

How do you handle multi-seat bookings where a user wants 4 seats together?

Multi-seat booking requires all-or-nothing semantics: if any of the 4 seats is unavailable, none should be held. The implementation uses a two-phase approach. First, SeatService sends SETNX for all 4 seat keys as one Redis pipeline, for minimal latency; if all 4 succeed, the booking proceeds. If any SETNX fails because the seat is already held, SeatService releases the seats it did acquire using the Lua compare-and-delete script. The script verifies ownership before deletion, so it never removes another user's hold. Lua scripts execute atomically on a single Redis node, and in V1's single-node setup all 4 seat keys live on that node, so this is straightforward. In V2, the Redis cluster is sharded by event_id, which keeps all seats for one event on one shard. One way to do this is to treat {event_id} in the key name as a Redis Cluster hash tag. The same scripts then keep working without cross-shard coordination.
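The two-phase multi-seat hold with rollback can be sketched in Python. FakeSeatHoldCache, `pipeline_set_nx` and `hold_seats_together` are hypothetical in-memory stand-ins, not Redis APIs. The pipeline model reflects that a Redis pipeline is not a transaction: each SETNX succeeds or fails on its own.

```python
class FakeSeatHoldCache:
    """Hypothetical in-memory stand-in for SeatHoldCache (not a Redis client)."""

    def __init__(self):
        self.data = {}

    def pipeline_set_nx(self, items):
        # Independent SETNX commands batched in one round-trip; no atomicity
        # across commands, each one succeeds or fails on its own.
        results = []
        for key, value in items:
            if key in self.data:
                results.append(False)
            else:
                self.data[key] = value
                results.append(True)
        return results

    def release_if_owner(self, key, owner):
        # Compare-and-delete, as the Lua release script does.
        if self.data.get(key) == owner:
            del self.data[key]
            return 1
        return 0


def hold_seats_together(cache, event_id, seats, user_id):
    keys = [f"seat:{event_id}:{s}" for s in seats]
    results = cache.pipeline_set_nx([(k, user_id) for k in keys])
    if all(results):
        return True
    # Roll back only the keys this call acquired.
    for key, ok in zip(keys, results):
        if ok:
            cache.release_if_owner(key, user_id)
    return False
```

Between the pipeline and the rollback, other users can briefly see some of the seats as held. An alternative, not described above, is a single Lua script that checks all keys and sets them only if every one is free, which removes that window.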
