Production-grade flash sale with defense-in-depth: CloudFront CDN with Lambda@Edge virtual waiting room absorbs 90%+ of traffic at the edge, device fingerprinting blocks bots, VIP priority queue serves high-value customers first, and tiered inventory (Redis + PostgreSQL) guarantees zero overselling with durable reconciliation.
The tiered flash sale architecture is the production-grade approach used by platforms like Nike SNKRS, Ticketmaster, and Amazon Lightning Deals to handle 50M+ concurrent users competing for limited-quantity inventory. It solves two problems that the Queue variant leaves unaddressed: bot/scalper exploitation and traffic volume that overwhelms even application-level queues.
The first problem is bots. In the Queue variant, automated scripts join the virtual queue microseconds after sale start, securing top positions ahead of human users. Without bot mitigation, scalpers capture 30-50% of limited-edition inventory using purpose-built software. Device fingerprinting — analyzing browser canvas hash, WebGL renderer, timezone, installed fonts, mouse movement patterns, and request timing — detects bots with ~95% accuracy and ~1-2% false positive rate. The BotDetector service sits between the API Gateway and the Load Balancer, filtering automated requests before they reach inventory.
The second problem is scale. At 50M concurrent users, even the API Gateway receives 10M+ RPS at sale start. Application-level components cannot absorb this traffic economically. The solution is pushing admission control to the network edge: a CloudFront CDN with Lambda@Edge implements a virtual waiting room across 400+ global edge locations. The Lambda@Edge function assigns queue positions, issues admission tokens, and returns a waiting page — all without any traffic reaching the origin. Only admitted users (5K/sec globally) proceed to the API Gateway. This absorbs 90%+ of traffic at the CDN edge.
The tiered inventory architecture combines Redis for the fast reservation path with PostgreSQL for durable persistence. Redis atomic DECR provides sub-millisecond inventory checks with zero overselling. PostgreSQL stores the durable order records and provides a reconciliation baseline for Redis every 30 seconds. If Redis fails, the system falls back to PostgreSQL (with higher latency but maintained correctness). The tiered approach gives both speed and durability.
VIP priority adds a business dimension to the technical architecture. Lambda@Edge reads the user's loyalty tier from a signed JWT claim and assigns priority queue positions — Gold and Platinum customers skip ahead in the waiting room. This maximizes revenue per sale: VIP customers have 20x higher lifetime value and 3x higher conversion rates. The waiting room maintains FIFO ordering within each tier.
The async processing pipeline handles payment and confirmation via Kafka. PaymentWorker processes payments asynchronously (~200ms per transaction), updates order status, and releases stock back via Redis INCR on failure. ConfirmationWorker sends email and push notification confirmations. The entire post-reservation pipeline is decoupled from the purchase path.
The global distribution aspect adds another layer of complexity. With 50M users across multiple continents, the CDN edge waiting room must coordinate token release across 400+ edge locations. Lambda@Edge functions at each PoP maintain local counters that synchronize with a central budget allocator every 100ms. Each edge location receives a token budget (e.g., 50 tokens/sec for a US-East PoP, 5 tokens/sec for a Singapore PoP) proportional to its traffic share. This distributed admission control prevents any single region from consuming all available inventory while maintaining globally consistent release rates.
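A minimal sketch of how a central allocator might split the global release rate into per-PoP budgets. It assumes integer traffic weights (for example, recent request counts per PoP) and a largest-remainder rule for leftover tokens. The function name `allocate_token_budgets` and the PoP labels are hypothetical and are not part of the design described above.

```python
def allocate_token_budgets(global_rate: int, traffic_weights: dict[str, int]) -> dict[str, int]:
    """Split a global tokens/sec rate across PoPs in proportion to traffic weight."""
    total = sum(traffic_weights.values())
    if total <= 0:
        raise ValueError("traffic weights must sum to a positive number")
    # Integer floor share for each PoP (exact, no floating point rounding).
    budgets = {pop: global_rate * w // total for pop, w in traffic_weights.items()}
    # Hand leftover tokens to the PoPs with the largest remainders so the
    # budgets always add up to exactly global_rate.
    leftover = global_rate - sum(budgets.values())
    by_remainder = sorted(
        traffic_weights,
        key=lambda pop: (global_rate * traffic_weights[pop]) % total,
        reverse=True,
    )
    for pop in by_remainder[:leftover]:
        budgets[pop] += 1
    return budgets


if __name__ == "__main__":
    # Hypothetical weights: 50% / 30% / 20% of traffic.
    print(allocate_token_budgets(5000, {"us-east": 50, "eu-west": 30, "ap-southeast": 20}))
```

Because the budgets always sum to the global rate, re-running the allocation on each synchronization cycle keeps the combined release rate fixed even as regional traffic shares shift.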
The operational complexity of the tiered approach is significant. Monitoring dashboards must track metrics at every layer: CDN hit rate, Lambda@Edge cold start frequency, bot detection accuracy and false positive rate, admission token utilization, Redis DECR throughput, PostgreSQL reconciliation drift, Kafka consumer lag, and payment success rate. Alerting thresholds at each layer enable rapid diagnosis when the system degrades — the layered architecture makes it possible to isolate the failing component quickly.
This architecture represents the frontier of flash sale system design. Interviewers at FAANG companies expect senior candidates to discuss CDN-edge traffic absorption, bot detection trade-offs (accuracy vs false positives), VIP prioritization business justification, and tiered storage with reconciliation. The progression from Naive to Queue to Tiered demonstrates deepening understanding of the distributed systems challenges in high-contention inventory management.
The tiered flash sale system uses 12 components organized into four defense layers: CDN edge (Client, CDN, WaitingRoom), security and routing (ApiGateway, BotDetector, LoadBalancer), core transaction (SaleService, InventoryCache, InventoryDB), and async processing (OrderStream, PaymentWorker, ConfirmationWorker).
The first defense layer is the CDN edge. All traffic enters through CloudFront CDN, which serves cached product pages and availability data with a 2-second TTL (95% cache hit rate for GET requests). Purchase attempts trigger Lambda@Edge — the virtual waiting room. Lambda@Edge assigns a queue position based on arrival time and user tier (VIP users get priority positions), issues an admission token with 5-minute TTL, and returns a waiting page. The function executes at the edge location closest to the user, so no traffic crosses the internet to reach the origin. Lambda@Edge releases 5000 admission tokens per second globally — only token-holders proceed further.
The second defense layer is security and routing. Admitted users hit the API Gateway, which validates the admission token and JWT authentication (~3ms), enforces a 100K RPS rate limit, and routes purchase requests through the BotDetector. The BotDetector analyzes device fingerprint (canvas hash, WebGL renderer, fonts, timezone), behavioral signals (mouse movement, typing patterns), and request timing. Requests scoring above the 0.85 bot confidence threshold are rejected (429 Too Many Requests). Clean requests proceed to the Load Balancer, which distributes across SaleService pods via least-connections algorithm.
The third layer is core transaction processing. SaleService receives ~5K purchase requests per second (down from 10M at CDN edge — a 2000x reduction). It validates the admission token, performs an atomic DECR on InventoryCache (Redis) for instant stock reservation, persists the order to InventoryDB (PostgreSQL) for durability, and publishes an order event to OrderStream (Kafka). InventoryCache and InventoryDB are reconciled every 30 seconds to ensure consistency.
The fourth layer is async processing. OrderStream (Kafka, 64 partitions) carries order events to PaymentWorker and ConfirmationWorker. PaymentWorker calls the external payment gateway (~200ms), updates order status in InventoryDB, and triggers Redis INCR on payment failure to release reserved stock. ConfirmationWorker sends email confirmations via SES and push notifications via SNS.
The reconciliation process between InventoryCache (Redis) and InventoryDB (PostgreSQL) runs every 30 seconds. A background reconciliation job compares the Redis stock counter with the count of non-expired orders in PostgreSQL for each item. If drift is detected (e.g., a Redis INCR was lost due to a network partition during a payment failure), the reconciliation job corrects the Redis counter. This ensures that over time, Redis and PostgreSQL converge to the same stock count, even in the presence of transient failures.
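The comparison step of the reconciliation job can be sketched as a pure function. This is an illustration under stated assumptions: the `ItemSnapshot` type and the `reconcile` function are hypothetical names, the three inputs are assumed to have been read already from PostgreSQL and Redis, and reservations that land between taking the snapshot and writing the correction are ignored (a real job would need to handle that window).

```python
from dataclasses import dataclass


@dataclass
class ItemSnapshot:
    total_stock: int       # units allocated to the sale (durable store)
    active_orders: int     # non-expired orders in the durable store
    cached_remaining: int  # current value of the cache counter


def reconcile(snapshots: dict[str, ItemSnapshot]) -> dict[str, int]:
    """Return corrected counter values for items whose cache has drifted."""
    corrections = {}
    for item_id, snap in snapshots.items():
        # Expected remaining stock according to the durable store.
        expected = max(snap.total_stock - snap.active_orders, 0)
        if snap.cached_remaining != expected:
            corrections[item_id] = expected
    return corrections


if __name__ == "__main__":
    # sku-1 lost two INCRs (cache says 58, durable store implies 60).
    print(reconcile({
        "sku-1": ItemSnapshot(total_stock=100, active_orders=40, cached_remaining=58),
        "sku-2": ItemSnapshot(total_stock=50, active_orders=10, cached_remaining=40),
    }))
```

Only drifted items appear in the result, so the job writes to the cache only when a correction is actually needed.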
The ConfirmationWorker is the final component in the pipeline. It consumes order events from Kafka after PaymentWorker has updated the order status, and sends email confirmations via SES and push notifications via SNS. With 10 workers and 30ms processing time, it handles the confirmation volume with generous headroom. Confirmation delivery is best-effort — if SES or SNS fails, the worker retries with exponential backoff but does not block the pipeline.
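The retry behavior might look like the sketch below. The attempt count, base delay, cap, and the use of random jitter are assumptions for illustration and are not specified by the design. `send` stands in for whatever call delivers the email or push notification.

```python
import random
import time
from typing import Callable


def send_with_retry(
    send: Callable[[], None],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call send() until it succeeds or attempts run out; never raises."""
    for attempt in range(max_attempts):
        try:
            send()
            return True
        except Exception:
            if attempt == max_attempts - 1:
                # Best-effort delivery: give up so the pipeline keeps moving.
                return False
            # The delay ceiling doubles each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter spreads retries out when many workers fail at once.
            sleep(random.uniform(0, ceiling))
    return False
```

Returning `False` instead of raising matches the best-effort stance: the caller can log the failure and move on to the next event.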
The architecture provides defense-in-depth against both traffic overload and adversarial actors. Each layer reduces the traffic that reaches the next: the CDN edge absorbs about 99.5% of requests (10M RPS down to roughly 50K RPS at the origin, which includes availability cache misses), bot detection filters about 10% of the remainder (50K to 45K), and only 5K/sec of verified, admitted, human purchase attempts reach the inventory system. This layered reduction is the key architectural pattern.
This sequence diagram shows the four defense layers a purchase request passes through: (1) CDN edge waiting room for admission control, (2) API Gateway + bot detection for security, (3) SaleService + Redis for atomic reservation, (4) Kafka workers for async payment. Each layer reduces the traffic that reaches the next, from 10M RPS at the CDN to 5K at the reservation service.
Step-by-Step Walkthrough
Pseudocode
// Layer 1: CDN Edge Waiting Room (Lambda@Edge)
async function handleEdgeRequest(request):
    user_tier = verifyJWT(request.headers.authorization).tier
    admission_token = getAdmissionToken(request.cookies)
    if admission_token and isValid(admission_token):
        return forwardToOrigin(request) // admitted user
    // Assign queue position (VIP priority)
    position = assignPosition(user_tier) // Gold+ gets priority
    token = generateToken(position, ttl: 300)
    // 90% of users stay here, polling for admission
    return waitingPage(position, token)
// Layer 2: Bot Detection
async function verifyHuman(fingerprint, behavioral_signals):
    canvas_score = analyzeCanvas(fingerprint.canvas_hash)
    webgl_score = analyzeWebGL(fingerprint.webgl_renderer)
    timing_score = analyzeRequestTiming(behavioral_signals)
    bot_confidence = weightedAverage(canvas_score, webgl_score, timing_score)
    if bot_confidence > 0.85:
        return { status: 429, error: "BOT_DETECTED" }
    return { status: 200, verified: true }
// Layer 3: Atomic Reservation
async function purchase(admission_token, user_id, sale_id, item_id):
    validateAdmissionToken(admission_token) // issued by the waiting room
    stock_key = "stock:" + sale_id + ":" + item_id
    remaining = await redis.decr(stock_key)
    if remaining < 0:
        await redis.incr(stock_key) // restore the over-decrement
        return { status: 409, error: "SOLD_OUT" }
    order_id = generateUUID()
    await db.execute(
        "INSERT INTO orders (order_id, user_id, sale_id, item_id, status)
         VALUES ($1, $2, $3, $4, 'RESERVED')",
        [order_id, user_id, sale_id, item_id]
    )
    await kafka.publish("order-events", { order_id, user_id, sale_id, item_id })
    return { status: 200, order_id, message: "RESERVED" } // ~30ms total
Choice
Lambda@Edge at CloudFront for queue admission instead of application-level queue
Rationale
At 10M RPS, even an API Gateway would be overwhelmed. CloudFront has 400 Gbps capacity across 400+ PoPs. Lambda@Edge executes queue logic at the edge — no traffic reaches the origin until admitted. This absorbs 90% of traffic before it touches any backend infrastructure. An application-level queue requires all 10M requests to reach the origin, which itself becomes a scaling challenge.
Choice
Passive fingerprinting (canvas, WebGL, fonts) instead of CAPTCHA
Rationale
CAPTCHAs add 5-10 seconds of friction for legitimate users and are solvable by CAPTCHA farms ($2/1000). Device fingerprinting is invisible — zero user friction. Combined with behavioral analysis, it achieves ~95% bot detection with ~1-2% false positive rate. The false positive rate is the key trade-off: 1-2% of legitimate users are incorrectly blocked during the 60-second sale window.
Choice
Loyalty tier determines queue position within the waiting room
Rationale
VIP customers (Gold+) have 20x lifetime value and 3x conversion rate. Giving them priority access maximizes revenue per sale. Lambda@Edge reads the tier from a signed JWT claim — no database lookup at the edge. The waiting room maintains FIFO within each tier, so VIP users compete fairly with other VIPs.
Choice
Redis for fast-path reservation, PostgreSQL for durable persistence with 30-second reconciliation
Rationale
Redis DECR provides sub-millisecond atomic reservation with zero overselling. But Redis is volatile — node failure loses state. PostgreSQL provides durability. The 30-second reconciliation ensures Redis and PostgreSQL agree on stock counts. If Redis fails, the system falls back to PostgreSQL with higher latency (~50ms vs 1ms) but maintained correctness.
Choice
Dedicated BotDetector service rather than inline middleware
Rationale
Bot detection requires computational resources (fingerprint analysis, ML model inference) that would slow down the main purchase path if inline. A separate service allows independent scaling (10 pods for bot detection vs 15 for SaleService) and independent deployment of new detection models without risking the purchase flow.
Choice
Order events published to Kafka for async processing by PaymentWorker and ConfirmationWorker
Rationale
Payment gateway calls take 200-500ms and fail ~5% of the time. Kafka decouples reservation (~30ms response) from payment (200ms or more) and confirmation (30ms). Failed payments release stock via Redis INCR. The 64-partition topic provides a per-order ordering guarantee and 200K msg/sec capacity, which is 40x headroom above the 5K reservation rate.
Target RPS
10M RPS at CDN, 5K reservations/sec at origin
Latency (p99)
<30ms reservation (post-admission)
Storage
~50 GB/month (Redis + PostgreSQL + Kafka retention)
Availability
99.9% (defense-in-depth, multi-AZ, replicated)
| Operation | Time | Space | Notes |
|---|---|---|---|
| CDN edge waiting room (POST /join) | O(1) Lambda@Edge execution (~10ms) | O(1) per admission token (~200 bytes) | Executes at 400+ edge locations. 5K tokens/sec global release rate. |
| Bot detection (POST /verify) | O(1) fingerprint comparison + ML inference (~5ms) | O(1) per request (stateless) | ~95% detection accuracy, ~1-2% false positive rate. 50K RPS capacity. |
| Purchase reservation (POST /purchase) | O(1) Redis DECR + O(1) DB INSERT + O(1) Kafka publish (~30ms total) | O(1) per order (~500 bytes) | Only 5K/sec reach this point (down from 10M at CDN edge). |
| Payment processing (async) | O(1) payment gateway call (~200ms) + O(1) DB UPDATE | O(1) per order update | 30 workers. Failed payments trigger Redis INCR to release stock. |
Durable inventory store reconciled with Redis every 30 seconds. PostgreSQL is the source of truth for stock counts. Columns include sale_id, item_id, total_stock, remaining_stock, and reconciled_at. Queried during Redis failure fallback (SELECT FOR UPDATE) and reconciliation.
Indexes: idx_inventory_sale ON (sale_id, item_id)
Reconciliation runs every 30 seconds. During Redis failure, serves as fallback via SELECT FOR UPDATE at ~200 TPS.
Stores flash sale order records with full lifecycle. Status: RESERVED -> CONFIRMED or RESERVED -> EXPIRED/FAILED. Written by SaleService at reservation time (~5K inserts/sec), updated by PaymentWorker after payment processing. Includes admission_token for audit trail and user_tier for analytics.
Indexes: idx_orders_user ON (user_id, reserved_at DESC), idx_orders_sale_status ON (sale_id, status), idx_orders_admission ON (admission_token)
32 partitions x 3 replicas. Strong consistency. 5-minute reservation TTL with background expiry.
Atomic inventory counter for fast-path reservation. Initialized from PostgreSQL at sale start. DECR on each reservation, INCR on payment failure or reservation expiry. Reconciled with PostgreSQL every 30 seconds.
6-node Redis cluster. 3600s TTL for post-sale cleanup. ~1ms DECR latency, 100K ops/sec capacity.
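The decrement-then-restore pattern on this counter can be illustrated without a Redis server. The sketch below uses an in-memory `StockCounter` guarded by a lock as a stand-in for the atomic DECR/INCR commands. The class and the `try_reserve` and `release` helpers are hypothetical names, not a Redis client API.

```python
import threading


class StockCounter:
    """In-memory stand-in for an atomic counter (not a Redis client)."""

    def __init__(self, stock: int) -> None:
        self._value = stock
        self._lock = threading.Lock()

    def decr(self) -> int:
        with self._lock:
            self._value -= 1
            return self._value

    def incr(self) -> int:
        with self._lock:
            self._value += 1
            return self._value

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def try_reserve(counter: StockCounter) -> bool:
    """Reserve one unit; undo the decrement if stock was already exhausted."""
    if counter.decr() < 0:
        counter.incr()
        return False
    return True


def release(counter: StockCounter) -> None:
    """Return one unit after a payment failure or reservation expiry."""
    counter.incr()
```

Because each decrement is atomic and a reservation succeeds only when the post-decrement value is non-negative, the number of successful reservations can never exceed the initial stock, regardless of how many callers race.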
Admission tokens issued by Lambda@Edge waiting room. SET with NX and 300s TTL. Validated by SaleService before allowing purchase. Deleted after use to prevent replay.
5-minute TTL. One-time use (deleted after successful reservation). ~5K new tokens/sec globally.
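A small in-memory model of the token lifecycle, assuming the semantics described above: issue only if no live token exists (the effect of SET with NX and a TTL), and consume at most once. `AdmissionTokenStore` is a hypothetical class for illustration. In Redis itself, the check-and-delete step would need to be a single atomic operation (for example the GETDEL command) so that two concurrent requests cannot both consume the same token.

```python
import time
from typing import Callable


class AdmissionTokenStore:
    """In-memory model of one-time admission tokens with a TTL."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._expiry: dict[str, float] = {}

    def issue(self, token: str) -> bool:
        """Store the token unless a live copy already exists (NX semantics)."""
        now = self._clock()
        expires_at = self._expiry.get(token)
        if expires_at is not None and expires_at > now:
            return False
        self._expiry[token] = now + self._ttl
        return True

    def consume(self, token: str) -> bool:
        """Validate and delete in one step so a token cannot be replayed."""
        expires_at = self._expiry.pop(token, None)
        return expires_at is not None and expires_at > self._clock()
```

Injecting the clock keeps the expiry logic testable without waiting five minutes.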
Emitted by SaleService after successful inventory reservation. Consumed by PaymentWorker (payment processing) and ConfirmationWorker (email/push notification). Partitioned by order_id for per-order ordering. 64 partitions, 200K msg/sec capacity.
Key Schema
order_id: string (partition key for per-order ordering)
Value Schema
{ order_id: string, user_id: string, sale_id: string, item_id: string, user_tier: string, reserved_at: string }
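Keying by order_id gives per-order ordering because every event with the same key maps to the same partition. The sketch below shows the idea with CRC32 as a stand-in hash. This is an assumption for illustration only: Kafka's default partitioner uses a different hash function, and `partition_for` is a hypothetical helper.

```python
import zlib

NUM_PARTITIONS = 64  # matches the 64-partition topic described above


def partition_for(order_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition deterministically (illustrative hash only)."""
    return zlib.crc32(order_id.encode("utf-8")) % num_partitions
```

Events for different orders may land on different partitions and be processed in parallel, while all events for a single order stay in one partition and keep their relative order.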
| Variant | Tier | Latency | Throughput | Cost | Complexity | Reliability |
|---|---|---|---|---|---|---|
| Naive (Direct DB Lock) | T1 | 50ms-5s purchase (lock contention) | ~200 concurrent purchases | $300/month (single DB + service) | Low — 4 components, no cache, no queue | 99% (single DB, no redundancy) |
| Queue (Virtual Queue + Token Bucket) | T2 | <50ms reservation (token-gated) | 10M concurrent users, 1K reservations/sec | $2,500/month (Redis + Kafka + workers) | Medium — virtual queue, Redis DECR, Kafka | 99.9% (replicated components) |
| Tiered (Waiting Room + Bot Detection + CDN) | T3 | <30ms reservation (post-admission) | 50M+ concurrent users, 5K reservations/sec | $8,000/month (CDN + Lambda + full pipeline) | High — 12 components, CDN edge, bot detection | 99.9% (defense-in-depth, multi-layer) |
This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.
Lambda@Edge executes at CloudFront's 400+ edge locations, processing requests before they cross the internet to the origin. At sale start, roughly 10M requests per second hit the nearest edge locations. Lambda@Edge assigns queue positions and returns a waiting page: a lightweight HTML response with JavaScript polling. Only 5K admission tokens/sec are released globally, so the remaining 99.95% of requests are answered with a waiting page served entirely from the edge. Origin traffic: ~50K RPS (admitted users + availability cache misses).
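The percentages follow directly from the two rates. A short check, where `admission_funnel` is a hypothetical helper name:

```python
def admission_funnel(edge_rps: int, admitted_per_sec: int) -> tuple[float, int]:
    """Return (share of requests held at the edge, reduction factor)."""
    held_share = 1 - admitted_per_sec / edge_rps
    reduction = edge_rps // admitted_per_sec
    return held_share, reduction


if __name__ == "__main__":
    held, reduction = admission_funnel(10_000_000, 5_000)
    print(f"{held:.2%} held at the edge, {reduction}x reduction")
```

With 10M requests per second at the edge and 5K admissions per second, the held share is 99.95% and the reduction factor is 2000x, matching the figures quoted for the waiting room.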
Device fingerprinting has a ~1-2% false positive rate: legitimate users with unusual browser configurations (privacy extensions, Tor Browser, accessibility tools, corporate VPNs) may be flagged as bots. During a 60-second flash sale, there is no time for manual appeal. The mitigation is a 'soft block' mode: flagged users are moved to the back of the waiting-room queue rather than rejected outright. This reduces false-positive impact while still delaying bot traffic.
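One way to picture soft-block mode is a waiting room with two FIFO lanes, where flagged users are admitted only once the trusted lane is empty. This is a sketch under assumptions: `SoftBlockQueue` is a hypothetical class, and the 0.85 threshold is reused from the bot confidence threshold mentioned elsewhere in this design.

```python
from collections import deque


class SoftBlockQueue:
    """Waiting room that deprioritizes, rather than rejects, flagged users."""

    def __init__(self, bot_threshold: float = 0.85) -> None:
        self._threshold = bot_threshold
        self._trusted: deque[str] = deque()
        self._flagged: deque[str] = deque()

    def join(self, user_id: str, bot_confidence: float) -> None:
        # Flagged users still get a place in line, just in the slower lane.
        lane = self._flagged if bot_confidence > self._threshold else self._trusted
        lane.append(user_id)

    def admit(self, n: int) -> list[str]:
        """Admit up to n users, draining the trusted lane first."""
        admitted = []
        while len(admitted) < n and (self._trusted or self._flagged):
            lane = self._trusted or self._flagged
            admitted.append(lane.popleft())
        return admitted
```

A misclassified human still reaches the sale if inventory remains after trusted users are served, which is the trade-off soft-block mode is making.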
The user's loyalty tier (Bronze, Silver, Gold, Platinum) is encoded as a signed JWT claim at login time. Lambda@Edge verifies the JWT signature (public key cached at edge) and reads the tier claim — no database or API call needed. VIP users (Gold+) receive queue positions that sort ahead of non-VIP users in the waiting room. The JWT signature prevents clients from forging their tier.
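The ordering rule (VIP ahead of non-VIP, FIFO within each group) can be sketched with a heap keyed on a priority band and an arrival counter. `TieredWaitingRoom` is a hypothetical class, and the sketch assumes Gold and Platinum share a single VIP band. If Platinum should also sort ahead of Gold, the band mapping would need one more level.

```python
import heapq
import itertools

VIP_TIERS = {"Gold", "Platinum"}  # assumption: both tiers share one VIP band


class TieredWaitingRoom:
    """Priority queue: VIP band first, arrival order within each band."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._arrivals = itertools.count()

    def join(self, user_id: str, tier: str) -> None:
        band = 0 if tier in VIP_TIERS else 1
        # The arrival counter breaks ties, preserving FIFO inside a band.
        heapq.heappush(self._heap, (band, next(self._arrivals), user_id))

    def admit(self, n: int) -> list[str]:
        admitted = []
        while self._heap and len(admitted) < n:
            _, _, user_id = heapq.heappop(self._heap)
            admitted.append(user_id)
        return admitted
```

In this sketch the tier value is assumed to come from the already-verified JWT claim, so the queue itself never has to trust client-supplied input.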
If InventoryCache (Redis) becomes unavailable, the circuit breaker in SaleService triggers a fallback path: purchase attempts go directly to InventoryDB using SELECT FOR UPDATE (the naive approach). Latency increases from 1ms to 50ms per reservation, and throughput drops from 5K to ~200 TPS. The sale continues but at degraded capacity. The last reconciliation point (within 30 seconds) ensures InventoryDB has an accurate stock count. This graceful degradation is a key benefit of the tiered architecture.
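A compact sketch of the circuit breaker behavior described here. The failure threshold, the reset interval, and the `CircuitBreaker` class itself are assumptions for illustration. `primary` stands in for the Redis reservation path and `fallback` for the PostgreSQL path.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Route calls to a fallback while the primary path is failing."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._threshold = failure_threshold
        self._reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after:
                return fallback()  # open: skip the primary entirely
            # Otherwise half-open: let one trial call through below.
        try:
            result = primary()
        except ConnectionError:
            self._failures += 1
            if self._failures >= self._threshold or self._opened_at is not None:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            return fallback()
        # Success closes the breaker and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

While the breaker is open, requests skip the failing cache entirely and avoid paying a connection timeout on every purchase attempt.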
The reconciliation interval balances consistency against database load. At 5K reservations/sec, 30 seconds accumulates up to 150K state changes. The reconciliation query compares Redis stock counts against PostgreSQL order counts and corrects any drift (from failed INCR operations, network partitions, etc.). Shorter intervals mean more database load; longer intervals increase the window of potential inconsistency. This design uses 30 seconds as a middle ground between the two.
Nike SNKRS uses a similar waiting room concept ('drawing' system for limited releases) with bot detection powered by their partnership with Shape Security (now F5). Ticketmaster's 'Smart Queue Technology' uses randomized admission (lottery) rather than FIFO to prevent bots from gaming join time. Both use CDN-edge admission control and tiered inventory. This template captures the common architectural patterns — CDN waiting room, bot filtering, priority access, atomic inventory — while simplifying the ML-based detection models that production systems use.