Vetora logo
Hard9 componentsInterview: Very High

Flash Sale — Virtual Queue + Token Bucket

Industry-standard flash sale architecture: virtual queue (Redis sorted set) shapes demand from 500K RPS to 1K/sec via controlled token release. Redis atomic DECR prevents overselling. Kafka handles async payment. Demonstrates demand shaping as the key optimization over naive database locking.

TransactionsRedisKafkaDemand ShapingVirtual Queue
Problem Statement

The virtual queue with token bucket architecture is the industry-standard approach for flash sale systems handling millions of concurrent users. It solves the fundamental problem that makes the naive database lock approach collapse: the thundering herd, where all users hit inventory simultaneously, overwhelming the database with lock contention.

The key architectural insight is demand shaping. Instead of allowing 500K users to compete for database row locks simultaneously, a virtual queue serializes access: users enter a FIFO queue backed by a Redis sorted set, and a TokenWorker releases purchase tokens at a controlled rate (1000/sec). Only token-holders can attempt a reservation. This reduces backend load by 500x — the database never sees more than 1000 concurrent purchase attempts regardless of how many users are waiting.

The demand shaping separates two concerns that the naive approach conflates: fairness and inventory management. The virtual queue provides fairness via FIFO ordering — users who arrived first get tokens first. Inventory management uses Redis atomic DECR — a single-threaded operation that returns the new stock count in one round-trip (~1ms), eliminating the need for database row locks entirely. If the DECR result is >= 0, the reservation succeeds. If negative, the item is sold out.

The numbers make the case compelling. In the naive approach, 50K concurrent users generate 50K SELECT FOR UPDATE transactions, saturating the 100-connection pool in milliseconds. With the virtual queue, those 50K users generate 50K ZADD operations to Redis (which handles 500K ZADD/sec easily on a single node), and only 1000/sec token-gated reservation attempts reach the inventory system. The database handles order persistence at 1000 inserts/sec — well within its capacity.

Kafka decouples reservation from payment processing. When a reservation succeeds, SaleService publishes a checkout event to Kafka and immediately returns 'reserved' to the user (~40ms total). CheckoutWorker asynchronously processes payment (~200ms), updates the order status, and releases stock back via Redis INCR if payment fails. This means users see confirmation in 40ms instead of 240ms, and payment gateway failures do not block the reservation path.

The architecture scales horizontally at each tier. QueueService scales with queue-join traffic (500K RPS). ReservationService scales with token-gated traffic (1K/sec). The virtual queue capacity scales with Redis cluster size. Kafka throughput scales with partition count. Each tier can be scaled independently.

The primary limitation is that the virtual queue does not address bots. Automated scripts can join the queue microseconds after sale start, grabbing positions ahead of human users. CAPTCHA at queue join and device fingerprinting add bot mitigation but are not part of this architecture. The Tiered variant addresses bot detection via a dedicated BotDetector service with device fingerprinting.

The reservation TTL mechanism adds another dimension of complexity. When a user reserves an item, the stock is decremented in Redis but payment has not yet been processed. If the user abandons the checkout (closes the browser, payment card is declined, network timeout), the reserved stock must be returned to the pool. The 5-minute TTL on reservations triggers a background expiry process that detects abandoned reservations, INCRs the Redis counter to release the stock, and updates the order status to EXPIRED in PostgreSQL. This TTL-based release mechanism is essential for preventing inventory 'leakage' — without it, abandoned reservations would permanently drain available stock.

The WebSocket connection cost for real-time queue position updates is a significant infrastructure consideration. With 10M users in the queue, maintaining 10M persistent WebSocket connections requires substantial memory and network resources. Production implementations often use long polling (30-second intervals) instead of WebSocket to reduce connection overhead, accepting slightly less responsive queue position updates as the trade-off.

This architecture appears in interviews at Amazon, Shopify, Ticketmaster, Nike, and Walmart. Interviewers expect candidates to explain demand shaping, articulate why Redis DECR replaces database locks, reason about the FIFO fairness guarantees, and identify bot vulnerability as the motivation for the tiered approach.

Architecture Overview

The virtual queue flash sale system uses 9 components organized into three layers: traffic ingestion (Client, API Gateway, QueueService), inventory reservation (ReservationService, InventoryCache, QueueCache, OrderDB), and async processing (CheckoutStream, TokenWorker, CheckoutWorker).

All traffic enters through the API Gateway, which performs JWT authentication (~3ms), enforces a 600K RPS rate limit, and routes queue operations to QueueService and reservation operations to ReservationService. The API Gateway is the single entry point — no traffic bypasses it.

The queue layer handles the traffic burst. When the sale opens, 500K users/sec hit POST /api/v1/sales/{id}/queue. QueueService writes each user into QueueCache (Redis sorted set with ZADD, scored by join timestamp) and returns their queue position. This is the highest-traffic operation — 500K RPS of simple ZADD writes. QueueService runs 20 pods with 100 threads each for 400K sustained throughput.

The token release mechanism is the demand shaping engine. TokenWorker runs a controlled loop, reading the next batch of users from QueueCache via ZPOPMIN, generating time-limited purchase tokens (5-minute TTL), and writing them back to QueueCache. It releases 1000 tokens/sec — this rate is configurable per sale. Token-holders poll or receive WebSocket notifications that their token is ready.

The reservation layer handles token-gated purchases. ReservationService validates the purchase token against QueueCache, performs an atomic DECR on InventoryCache (Redis), and if the result is >= 0, the reservation succeeds. The service persists the order to OrderDB with status RESERVED and publishes a checkout event to CheckoutStream (Kafka). At 1000/sec token-gated traffic, the backend operates at sustainable capacity with massive headroom.

The async processing layer handles payment. CheckoutWorker consumes checkout events from Kafka, calls the external payment gateway (~200ms), updates order status to CONFIRMED or FAILED in OrderDB, and releases stock back via Redis INCR on payment failure. The 5-minute reservation TTL ensures abandoned reservations expire and stock returns to the pool.

InventoryCache (Redis) is the zero-overselling guarantor. Each item's stock is stored as a simple integer counter. DECR is atomic and single-threaded — no lock contention, no race conditions. At 1000 DECR/sec, Redis handles this trivially. The counter going negative means sold out — the service INCRs it back immediately.

QueueCache (Redis sorted set) serves dual purpose: FIFO queue storage for user positions and token storage for purchase admission. ZADD gives O(log N) insertion, ZPOPMIN gives O(log N) dequeue — both sub-millisecond at 10M entries.

OrderDB (PostgreSQL) is the durable persistence layer. Each reservation creates an order record with status RESERVED. CheckoutWorker updates the status to CONFIRMED after successful payment or FAILED after payment decline. The database uses 16 partitions with 3 replicas for durability and throughput. Strong consistency prevents double reservations. At 1K inserts/sec (token-gated), the database operates well within capacity — a dramatic improvement over the naive approach where the database was the bottleneck.

The separation of concerns across services is a key architectural strength. Each component has a clear responsibility: QueueService manages the waiting line, TokenWorker controls admission rate, ReservationService handles inventory and orders, CheckoutWorker processes payment. This decomposition enables independent scaling, independent deployment, and independent failure isolation. If CheckoutWorker falls behind (payment gateway slowdown), reservations continue unaffected — Kafka buffers the backlog.

Architecture Preview
Loading architecture preview...
Key Design Decisions
Virtual Queue (Redis Sorted Set)

Choice

FIFO queue backed by Redis ZADD/ZPOPMIN with timestamp score for ordering

Rationale

Without a queue, 10M users hammer inventory simultaneously — even Redis melts at 10M ops/sec on a single key. The virtual queue shapes demand to a sustainable 1000 reservations/sec, reducing backend load by 10,000x. Redis sorted set gives O(log N) insertion and dequeue with sub-millisecond latency at 10M entries. FIFO ordering provides fairness.

Token Bucket Admission Control

Choice

TokenWorker releases 1000 purchase tokens/sec from the queue head

Rationale

The token release rate is the demand shaping knob. At 1000/sec, the backend sees a steady, sustainable flow regardless of burst size. The rate is tunable per sale — higher for large inventory, lower for limited drops. Token TTL (5 minutes) prevents hoarding — unused tokens expire and the slot returns to the queue.

Redis Atomic DECR for Inventory

Choice

Single-threaded Redis DECR replaces PostgreSQL SELECT FOR UPDATE

Rationale

Redis DECR is atomic, single-threaded, and completes in ~1ms. PostgreSQL SELECT FOR UPDATE takes 50-100ms and creates lock contention. At 1000 concurrent requests, Redis handles DECR trivially while PostgreSQL serializes them into a seconds-long queue. Redis gives 100x lower latency with zero lock contention.

Separate QueueService and ReservationService

Choice

Dedicated services for queue management vs inventory reservation

Rationale

QueueService handles 500K RPS of queue joins (simple ZADD, low CPU). ReservationService handles 1K RPS of reservations (token validation + DECR + DB write + Kafka publish — complex logic). Separating them prevents the 500K queue-join flood from starving reservation processing. Each scales independently.

Kafka for Async Payment

Choice

Payment processing decoupled via Kafka event stream

Rationale

Payment gateway calls take 200-500ms and fail ~5% of the time. If payment blocks the reservation response, users wait 500ms+ and see timeouts. Kafka decouples reservation acknowledgment from payment — users see 'reserved' in 40ms while payment settles asynchronously. Failed payments release stock back via Redis INCR.

Scale & Performance

Target RPS

500K queue joins/sec, 1K reservations/sec

Latency (p99)

<50ms reservation (token-gated)

Storage

~10 GB/month (Redis + PostgreSQL)

Availability

99.9% (replicated Redis, Kafka, multi-AZ DB)

Time & Space Complexity
OperationTimeSpaceNotes
Queue join (POST /queue)O(log N) Redis ZADD, N = queue sizeO(1) per entry (~200 bytes)Sub-millisecond at 10M entries. 500K ZADD/sec on 6-node cluster.
Token release (TokenWorker)O(log N) Redis ZPOPMIN per tokenO(1) per token (~100 bytes with TTL)1000 tokens/sec release rate. Configurable per sale.
Reservation (POST /reserve)O(1) Redis DECR + O(1) DB INSERT + O(1) Kafka publishO(1) per order (~500 bytes)Token-gated at 1K/sec. Total latency ~40ms (DECR + DB + Kafka).
Database Schema (HLD)
queue:{sale_id}:users (Redis Sorted Set)

Virtual FIFO queue implemented as a Redis sorted set, scored by join timestamp. Stores user queue positions and time-limited purchase tokens (5-minute TTL). ZADD for insertion, ZPOPMIN for token release. Expected to hold 10M entries at peak, auto-evicting after 600s TTL.

member: user_id (STRING, the user waiting in queue)score: join_timestamp (FLOAT, Unix epoch for FIFO ordering)token:{user_id} (STRING, purchase token with 300s TTL)

10M entries x 200 bytes = ~2GB working set. 6-node Redis cluster for 500K ZADD/sec throughput.

stock:{sale_id}:{item_id} (Redis Counter)

Atomic inventory counter using Redis DECR. Initialized to total stock at sale start. ReservationService does DECR — negative result means sold out (INCR to restore). Failed payments also trigger INCR to release stock.

value: remaining_stock (INTEGER, decremented atomically per reservation)

One key per item. Small working set (~1MB for 10K items). 3600s TTL for post-sale cleanup.

orders (PostgreSQL)

Stores flash sale order records. Status lifecycle: RESERVED -> CONFIRMED or RESERVED -> EXPIRED/FAILED. Written by ReservationService at 1K inserts/sec (token-gated). Updated by CheckoutWorker after payment processing. Partitioned by order_id.

order_id VARCHAR PK (UUID)user_id VARCHAR FK (purchasing user, indexed)sale_id VARCHAR FK (flash sale event ID)item_id VARCHAR FK (item purchased)status VARCHAR (RESERVED / CONFIRMED / EXPIRED / FAILED)reserved_at TIMESTAMPTZ (reservation timestamp)confirmed_at TIMESTAMPTZ (payment confirmation timestamp, nullable)

Indexes: idx_orders_user ON (user_id, reserved_at DESC), idx_orders_sale_status ON (sale_id, status)

16 partitions x 3 replicas. Strong consistency. 5-minute reservation TTL requires background expiry detection.

Event Contracts
checkout-eventscheckout-events

Emitted by ReservationService after successful inventory reservation (Redis DECR >= 0). Each event carries order details for async payment processing by CheckoutWorker. Partitioned by order_id for per-order ordering guarantee. Throughput: ~1K msg/sec (token-gated).

Key Schema

order_id: string (partition key for per-order ordering)

Value Schema

{ order_id: string, user_id: string, sale_id: string, item_id: string, reserved_at: string }

Solution Comparison
VariantTierLatencyThroughputCostComplexityReliability
Naive (Direct DB Lock)T150ms-5s purchase (lock contention)~200 concurrent purchases$300/month (single DB + service)Low — 4 components, no cache, no queue99% (single DB, no redundancy)
Queue (Virtual Queue + Token Bucket)T2<50ms reservation (token-gated)10M concurrent users, 1K reservations/sec$2,500/month (Redis + Kafka + workers)Medium — virtual queue, Redis DECR, Kafka99.9% (replicated components)
Tiered (Waiting Room + Bot Detection + CDN)T3<30ms reservation (post-admission)50M+ concurrent users, 5K reservations/sec$8,000/month (CDN + Lambda + full pipeline)High — 12 components, CDN edge, bot detection99.9% (defense-in-depth, multi-layer)

This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.

Frequently Asked Questions
Why a virtual queue instead of letting everyone hit inventory directly?

Without a queue, 10M users hammering inventory simultaneously creates a thundering herd. Even Redis at 10M ops/sec on a single key experiences hot-key degradation. The virtual queue shapes demand to a sustainable 1000 reservations/sec, reducing backend load by 10,000x while maintaining FIFO fairness. The queue absorbs the traffic burst while the backend processes at a steady rate.

How does the token bucket prevent thundering herd?

The TokenWorker releases exactly 1000 purchase tokens per second via ZPOPMIN from the queue head. Only users with valid, unexpired tokens can call the reservation endpoint. Even if 10M users are in the queue, the reservation backend never sees more than 1000 concurrent attempts. This converts an unpredictable burst into a predictable, steady stream.

What happens if a user's token expires before they use it?

Tokens have a 5-minute TTL. If the user does not call the reservation endpoint within 5 minutes (browser closed, network issue, indecision), the token expires in Redis. The slot is effectively lost — no DECR happens, so no stock is consumed. In a production system, expired tokens could trigger re-queuing, but this template lets them expire silently to keep the model simple.

Why Redis sorted set instead of a regular list for the queue?

Redis sorted sets support ZADD with timestamp score for O(log N) insertion and ZPOPMIN for O(log N) dequeue — both sub-millisecond at 10M entries. A regular list (LPUSH/RPOP) would work for basic FIFO but does not support priority-based ordering (needed if VIP tiers are added later) or efficient position lookups (ZRANK). The sorted set is the industry-standard data structure for virtual queues.

How does this compare to Ticketmaster's queue system?

Ticketmaster uses a similar virtual queue concept (they call it 'Smart Queue Technology') with randomized admission at sale start (lottery-based rather than pure FIFO) to prevent bots from gaming join time. The core architecture — queue at ingress, controlled token release, atomic inventory reservation — is the same. This template uses FIFO ordering for simplicity; Ticketmaster's lottery approach adds bot resistance at the cost of perceived fairness.

What is the bot vulnerability in this architecture?

Bots can join the queue microseconds after sale start using automated HTTP requests, securing positions ahead of human users who are still loading the page. Without CAPTCHA at queue join or device fingerprinting, the fastest HTTP client wins. The Tiered variant addresses this with a dedicated BotDetector service using canvas fingerprinting, WebGL analysis, and behavioral signals to filter automated requests before they enter the queue.

Related Templates

Discussion

Sign in to join the discussion.

Ready to design your own Flash Sale?

Open the simulator, place components on the canvas, wire them up, and run a traffic simulation to see how your architecture performs under real load.

Open Simulator