Hard | 9 components | Interview: Very High

Live Sports Betting — Event-Sourced Ledger (WebSocket + Kafka)

Production-grade sportsbook using WebSocket streaming for real-time odds delivery, an event-sourced append-only ledger for bet recording, and Kafka for decoupled settlement processing. Sub-200ms bet acceptance with full audit trail.

Kafka, WebSocket, Event Sourcing, Real-time, Betting
Problem Statement

The event-sourced ledger approach to live sports betting represents the industry standard used by production sportsbooks at DraftKings, FanDuel, and Bet365. It solves the two fundamental problems with the naive polling architecture: odds staleness and database bottleneck from polling reads.

The key insight is separating the odds delivery path from the bet acceptance path. Odds arrive from external feeds (Sportradar, Betgenius) via a dedicated OddsWorker, are published to Kafka for durability and fan-out, and cached in Redis for sub-100ms delivery to clients via WebSocket push. This eliminates the polling read load entirely — clients receive odds updates the moment they change, with no periodic database queries. The Redis cache serves as the authoritative source of current odds, updated within milliseconds of feed changes.

Bet acceptance validates against the Redis odds cache (not the database), writes to an append-only DynamoDB ledger, and publishes bet events to Kafka. The append-only pattern creates a regulatory audit trail inherently — every bet is an immutable record that can never be modified after placement. This is the industry standard for sportsbooks because gambling regulators require a complete, tamper-proof history of every wager.

The architecture decouples bet acceptance from settlement via Kafka. When a sporting event completes, a settlement command is published to Kafka. Settlement workers consume bet events grouped by sporting event, apply outcomes, calculate payouts, and write settlement records to the ledger — all as new immutable events. This event-sourced approach means the complete lifecycle of every bet is recorded: placement, outcome determination, settlement, and payout.

The primary challenge in this architecture is managing the odds staleness window. Between a price changing at the upstream feed and that change landing in the Redis cache, there is a 50-500ms window in which the cache holds stale odds. Bets placed during this window may be validated against prices that have already changed upstream. Production sportsbooks handle this with an odds tolerance threshold (e.g., accept bets within 5% of current odds), a configurable trade-off between user experience (fewer rejections) and financial risk (accepting slightly stale prices).
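
The tolerance check described above can be sketched as a small pure function. This is a minimal illustration, not the template's actual implementation: the function name and the choice of a relative (percentage) comparison are assumptions based on the "within 5% of current odds" wording.

```python
def within_tolerance(requested_odds: float, current_odds: float,
                     tolerance: float = 0.05) -> bool:
    """Return True if the odds the client saw are within `tolerance`
    (relative, e.g. 0.05 = 5%) of the odds currently in the cache.

    Hypothetical helper: the name and the relative comparison are assumptions.
    """
    if current_odds <= 0:
        raise ValueError("current_odds must be positive")
    return abs(requested_odds - current_odds) / current_odds <= tolerance


# A small drift is accepted, a large move is rejected.
small_move_ok = within_tolerance(2.10, 2.05)   # about 2.4% difference
large_move_ok = within_tolerance(2.10, 1.90)   # about 10.5% difference
```

Raising the tolerance lowers the rejection rate but widens the range of stale prices the book is willing to honor, which is the trade-off the paragraph describes.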

This template is the most commonly discussed architecture in sportsbook system design interviews. Interviewers expect candidates to explain the WebSocket streaming model, reason about odds cache consistency, discuss the event-sourced ledger pattern for auditability, and analyze Kafka-based settlement for scalability. The comparison with the naive variant quantifies the improvement: 100x throughput, 10x lower latency, and near-zero odds-staleness rejections.

Architecture Overview

The event-sourced ledger architecture uses nine components organized into three logical layers: odds delivery, bet acceptance, and settlement processing.

The odds delivery layer handles the read-heavy path (60% of traffic). External odds feeds from providers like Sportradar push market updates to OddsWorker, which normalizes the data, publishes structured events to OddsStream (Kafka), and writes current prices directly to OddsCache (Redis). OddsService reads from OddsCache to serve client requests — either via WebSocket push for connected clients or via REST polling for simpler integrations. At 200K reads/sec peak, Redis handles this comfortably with a 6-node cluster. The key design decision is making Redis the authoritative source of current odds, not the database — this eliminates the naive approach's polling bottleneck entirely.

The bet acceptance layer handles the write-heavy path (30% of traffic). Bets arrive at the API Gateway (JWT auth, rate limiting), pass through the Load Balancer to BetService. BetService validates the bet against OddsCache (odds_version check in Redis, ~2ms), debits the user's balance, writes the bet to LedgerDB (DynamoDB append-only INSERT), and publishes a bet-placed event to BetStream (Kafka). The entire flow completes in under 200ms p99. The append-only ledger pattern means no bet record is ever modified — settlement and void events are new records referencing the original bet.

The settlement layer processes outcomes asynchronously. SettlementWorker consumes bet-placed events from BetStream, grouped by sporting event. When an event completes, the worker reads all open bets for that event from LedgerDB, applies the outcome, calculates winnings, credits user balances in OddsCache, and writes settlement records to LedgerDB. Settlement is sharded by event_id — multiple sporting events settle in parallel, with each event's bets processed as a batch. A single Premier League match with 500K bets settles in approximately 30 seconds.

Horizontal scaling is achieved at every layer independently. OddsService pods scale based on client connection count. BetService pods scale based on bet volume (CPU and thread utilization). OddsStream and BetStream scale via Kafka partition count (64 partitions each). SettlementWorker scales based on settlement queue depth — more workers are added during heavy settlement periods (Sunday evenings, major tournament finals).

The primary trade-off is single-region latency. This template runs in one AWS region. Users in other continents experience 100-200ms additional network latency for bet placement, which may cause odds-moved rejections for rapidly changing markets. The Global Compliant variant (V3) addresses this with multi-region deployment and jurisdiction-aware routing.

Request Flow — WebSocket Odds + Event-Sourced Bet Acceptance

The architecture splits into two completely independent data paths: the odds delivery path (OddsWorker -> Redis -> OddsService -> Client) and the bet acceptance path (Client -> BetService -> Redis validation -> DynamoDB ledger -> Kafka). These paths share the Redis OddsCache as a read/write coordination point but are otherwise independent, enabling independent scaling.

The key improvement over the naive approach is the elimination of database polling. Odds are pushed to clients via WebSocket within milliseconds of feed changes. The Redis cache serves as the authoritative source of current odds, updated by OddsWorker and read by both OddsService and BetService. No component reads odds from a database.


Step-by-Step Walkthrough

  1. External odds feed (Sportradar) pushes new prices to OddsWorker. The worker normalizes the data, writes to Redis OddsCache (~2ms), and publishes to Kafka OddsStream for durability
  2. OddsService pushes the update to connected clients via WebSocket within ~50ms of the feed change, 60x faster than the naive 3-second polling interval
  3. Client places a bet: POST /bets with selection_id, odds, stake. API Gateway authenticates (JWT, ~3ms) and rate limits (per-user caps)
  4. BetService validates the bet: reads current odds from Redis (~2ms) and compares odds_version. If odds have moved beyond tolerance, the bet is rejected. At <100ms staleness, rejection rate drops to <2% (vs 10-20% with polling)
  5. BetService writes the bet as an immutable event to DynamoDB (~30ms). This is an INSERT, never an UPDATE. The record includes full context: odds at placement, stake, user, timestamp
  6. BetService publishes a bet-placed event to Kafka BetStream (~5ms). This decouples bet acceptance from settlement: bets are accepted in real time and settled asynchronously
  7. When the sporting event ends, SettlementWorker consumes all bet-placed events for that event from Kafka. It applies the outcome, calculates payouts, and writes settlement events to DynamoDB as new immutable records
  8. User balances are credited in the odds cache (Redis) for immediate visibility. The DynamoDB ledger contains the complete, immutable history: bet placement -> outcome -> settlement -> payout

Pseudocode

// ODDS DELIVERY: push-based via OddsWorker
async function onOddsUpdate(market_id, new_odds):
    // Write to Redis (authoritative store for current odds)
    await redis.set("odds:" + market_id, serialize(new_odds))  // ~2ms

    // Publish to Kafka for durability + fan-out
    await kafka.produce("odds-updates", key=market_id, value=new_odds)

    // WebSocket push to all connected clients viewing this market
    websocket.broadcast(market_id, new_odds)  // ~50ms to all clients

// BET ACCEPTANCE: validate against Redis, write to DynamoDB
async function placeBet(user_id, market_id, selection_id, odds, stake, idempotency_key):
    // Step 1: Validate odds against Redis (~2ms)
    current = deserialize(await redis.get("odds:" + market_id))
    if current == null: return 409  // Market expired from cache (TTL) or unknown
    if current.status == "suspended": return 409  // Market suspended
    selection = find(current.selections, s => s.id == selection_id)
    // Relative tolerance, e.g. 0.05 for "within 5% of current odds"
    if abs(selection.odds - odds) / selection.odds > tolerance: return 409  // Odds moved

    // Step 2: Append to immutable ledger (~30ms)
    bet = {
        user_id,
        bet_id: snowflake_id(),
        event_id: current.event_id,
        market_id, selection_id, odds, stake,
        idempotency_key,  // stored so a retried request can be detected
        status: "PLACED",
        created_at: now()
    }
    await dynamodb.put(bet)

    // Step 3: Publish to Kafka for settlement (~5ms)
    await kafka.produce("bet-events", key=bet.event_id, value=bet)
    return 201  // ~50ms typical; the budget is <200ms p99

// SETTLEMENT: batch processing per event
async function settleEvent(event_id, outcome):
    bets = await dynamodb.query(
        index="event_id-index",
        KeyCondition: "event_id = :eid",
        Filter: "status = :placed"  // status is not a key attribute, so it is a filter
    )
    for bet in bets:
        result = applyOutcome(bet, outcome)  // WON/LOST/VOID
        // New record under its own sort key; the PLACED record is never overwritten.
        // original_bet_id is an illustrative name for the back-reference.
        await dynamodb.put({
            ...bet, bet_id: snowflake_id(), original_bet_id: bet.bet_id,
            status: result.status,
            payout: result.payout, settled_at: now()
        })
    // 500K bets in ~30 seconds (batch parallelism)
Data Model

The data model reflects a three-tier storage strategy: Redis for real-time odds (fast reads, ephemeral), DynamoDB for the bet ledger (durable, append-only, regulatory), and Kafka for event streaming (replay, fan-out, settlement coordination).

The key insight is that Redis is not a cache of a database — it is the primary store for current odds. OddsWorker writes directly to Redis, and all odds reads go to Redis. The database (DynamoDB) stores only the bet ledger. This clean separation eliminates cache invalidation complexity and allows each store to be optimized for its specific access pattern.


Step-by-Step Walkthrough

  1. Redis OddsCache stores current odds for all active markets. Written by OddsWorker within milliseconds of feed changes. Read by OddsService for client delivery and BetService for bet validation. TTL of 60 seconds auto-expires inactive markets
  2. DynamoDB bets table is the append-only regulatory ledger. Partition key: user_id (enables per-user audit queries). Sort key: bet_id (snowflake for chronological ordering). Global Secondary Index on event_id for settlement queries
  3. Kafka bet-events topic carries bet lifecycle events partitioned by event_id. SettlementWorker consumes events grouped by sporting event. 64 partitions provide settlement parallelism for concurrent sporting events
  4. The relationship between Redis and DynamoDB is validation: BetService reads current odds from Redis, validates the bet, then writes to DynamoDB. No data flows from DynamoDB to Redis; they are independent stores with different purposes
  5. Kafka receives bet events from BetService and delivers them to SettlementWorker. The Kafka topic is the coordination layer between real-time bet acceptance and asynchronous settlement

Pseudocode

// TIER 1: Redis Odds Cache (real-time, ~50MB)
// Write: OddsWorker SET on every feed update (50K/sec)
// Read: OddsService GET for client delivery (200K/sec)
// Read: BetService GET for bet validation (100K/sec)
SET "odds:match_123" '{"selections":[{"id":"home","odds":2.10}],"status":"active"}'
GET "odds:match_123"  // ~2ms — authoritative source of current odds

// TIER 2: DynamoDB Bet Ledger (durable, append-only)
// Write: BetService PUT on every bet (100K/sec peak)
// Read: SettlementWorker Query on event completion
// Read: Audit queries on user_id partition
dynamodb.put({
    user_id: "user_456",      // Partition key — per-user data locality
    bet_id: "bet_abc123",     // Sort key — chronological ordering
    event_id: "match_123",    // GSI — settlement queries
    odds: 2.10, stake: 50.00,
    status: "PLACED",
    created_at: "2026-05-27T15:30:00Z"
})

// TIER 3: Kafka Bet Events (streaming, 7-day retention)
// Write: BetService produce on every bet
// Read: SettlementWorker consume on event completion
kafka.produce("bet-events", key="match_123", value={
    bet_id: "bet_abc123", event_id: "match_123",
    user_id: "user_456", selection_id: "home",
    odds: 2.10, stake: 50.00
})
Key Design Decisions
WebSocket Streaming for Odds Delivery

Choice

Push-based odds updates via WebSocket instead of client polling

Rationale

WebSocket push removes the per-user polling load from the database. With polling, N users each generate 0.33 QPS (one poll per 3 seconds) whether or not prices moved, so 100K connected users produce about 33K reads/sec. With push, the 50K odds updates/sec arriving from the feed are written once to Redis and fanned out over open connections only to clients viewing the affected market, so no odds reads reach a database. This reduces database read load by 99%+ and delivers odds to clients within 50ms of changes (vs 3 seconds of staleness with polling).
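
The polling figure can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function names are hypothetical, and the push-side fan-out depends on how many clients watch each market, which the template does not specify, so `avg_viewers_per_market` is an assumed parameter.

```python
def polling_reads_per_sec(users: int, poll_interval_s: float) -> float:
    """Reads/sec generated by clients that poll on a fixed interval."""
    return users / poll_interval_s


def push_messages_per_sec(updates_per_sec: float,
                          avg_viewers_per_market: float) -> float:
    """Outbound WebSocket messages/sec when each update is fanned out
    to the clients viewing that market (assumed average audience size)."""
    return updates_per_sec * avg_viewers_per_market


# 100K users polling every 3 seconds
polling_load = polling_reads_per_sec(100_000, 3.0)
```

The point of the comparison is where the load lands: polling reads hit the odds store on every tick, while push messages are outbound socket writes that never touch a database.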

Redis as Authoritative Odds Cache

Choice

OddsWorker writes to Redis directly; services read from Redis, not the database

Rationale

Making Redis the authoritative source of current odds (rather than a cache in front of a database) eliminates cache invalidation complexity. OddsWorker writes to Redis within milliseconds of feed changes. OddsService and BetService read from Redis for all odds queries. The database is used only for the append-only bet ledger, not for odds reads. This separation allows odds reads to scale to 1M+ ops/sec on a Redis cluster while bet writes scale independently on DynamoDB.

Append-Only Bet Ledger (Event Sourcing)

Choice

DynamoDB with INSERT-only writes, no UPDATE on bet records

Rationale

Gambling regulations require a complete, immutable audit trail of every bet. An append-only ledger (never UPDATE, only INSERT) creates this trail inherently. Each bet lifecycle event (placed, settled, voided) is a separate record referencing the original bet_id. If a settlement bug is discovered, the events can be replayed to reconstruct correct state. This is the industry standard — DraftKings and FanDuel both use append-only ledgers for regulatory compliance.
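
A minimal in-memory sketch of the append-only idea follows. It is not the DynamoDB implementation; the class and field names are hypothetical, and a Python list stands in for the ledger table. The current state of a bet is derived by replaying its events rather than by reading a mutable row.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)  # frozen: a recorded event cannot be mutated
class LedgerEvent:
    bet_id: str
    event_type: str      # "PLACED", "WON", "LOST" or "VOID"
    payout: float = 0.0


class Ledger:
    """Append-only store: events are added, never updated or deleted."""

    def __init__(self) -> None:
        self._events: List[LedgerEvent] = []

    def append(self, event: LedgerEvent) -> None:
        self._events.append(event)

    def history(self, bet_id: str) -> List[LedgerEvent]:
        """Full lifecycle of one bet, in the order events were recorded."""
        return [e for e in self._events if e.bet_id == bet_id]

    def current_status(self, bet_id: str) -> Optional[str]:
        """Replay the bet's events; the last one determines its status."""
        events = self.history(bet_id)
        return events[-1].event_type if events else None


ledger = Ledger()
ledger.append(LedgerEvent("bet_abc123", "PLACED"))
ledger.append(LedgerEvent("bet_abc123", "WON", payout=105.0))
```

Because the PLACED event is still present after settlement, an auditor can see both what was wagered and how it was resolved.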

Kafka for Bet Event Streaming

Choice

Kafka with event_id partitioning for settlement parallelism

Rationale

BetStream decouples bet acceptance from settlement processing. Bets are accepted in real time (sub-200ms), but settlement happens asynchronously when events complete. Kafka's partition-by-event_id ensures all bets for one sporting event land on the same partition, enabling the settlement worker to process them as a batch without distributed coordination. Kafka's durability and replay capability also serve as a safety net — if a settlement worker crashes, events are reprocessed from the last committed offset.
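
The key-to-partition mapping can be illustrated with a stand-in hash. Kafka's Java client hashes the record key with murmur2; the sketch below uses CRC32 from the standard library purely for illustration, so the partition numbers it produces will not match a real cluster. The function name is hypothetical.

```python
import zlib


def partition_for(key: str, num_partitions: int = 64) -> int:
    """Map a record key to a partition deterministically.

    CRC32 is a stand-in for Kafka's real key hash (an assumption made
    so the example runs without a Kafka client).
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Every bet keyed by the same event_id lands on the same partition.
p1 = partition_for("match_123")
p2 = partition_for("match_123")
```

The property that matters for settlement is determinism: the same event_id always maps to the same partition, so one consumer sees every bet for that event in order.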

Separate OddsService and BetService

Choice

Independent microservices for reads and writes

Rationale

OddsService is read-heavy (200K/sec, simple Redis GET). BetService is write-heavy (100K/sec, complex validation + DB write + Kafka publish). Separating them allows independent scaling: 15 odds pods for reads vs 20 bet pods for writes. Different failure modes too — odds reads can serve stale-cached data during Redis issues; bet writes must be strongly consistent.

Batch Settlement per Sporting Event

Choice

Process all bets for one event as a batch, sharded by event_id

Rationale

A single Premier League match may have 500K open bets. Processing them individually would take hours. Batch settlement reads all bets for an event, computes outcomes in memory, and writes results in bulk — 500K bets settle in ~30 seconds. Sharding by event_id means multiple events settle in parallel (Sunday NFL with 15 games settles 15 batches concurrently).
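
The in-memory part of batch settlement might look like the sketch below. It assumes decimal odds (payout = stake x odds, stake included) and that a void bet refunds the stake; both are assumptions, as are the function and field names. Bulk reads and writes against the ledger are left out.

```python
from decimal import Decimal
from typing import Dict, List, Optional


def apply_outcome(bet: Dict, winning_selection: Optional[str]) -> Dict:
    """Build a settlement record for one bet.

    winning_selection=None is treated as a voided market (assumption).
    """
    stake = Decimal(bet["stake"])
    odds = Decimal(bet["odds"])
    if winning_selection is None:
        status, payout = "VOID", stake            # refund the stake
    elif bet["selection_id"] == winning_selection:
        status, payout = "WON", stake * odds      # decimal odds include the stake
    else:
        status, payout = "LOST", Decimal("0")
    # A new record that references the original bet instead of modifying it.
    return {"original_bet_id": bet["bet_id"], "status": status, "payout": payout}


def settle_batch(bets: List[Dict], winning_selection: Optional[str]) -> List[Dict]:
    """Compute settlement records for all bets on one event in memory."""
    return [apply_outcome(b, winning_selection) for b in bets]


bets = [
    {"bet_id": "b1", "selection_id": "home", "odds": "2.10", "stake": "50.00"},
    {"bet_id": "b2", "selection_id": "away", "odds": "3.40", "stake": "20.00"},
]
records = settle_batch(bets, winning_selection="home")
```

Amounts are held as `Decimal` rather than `float` so that payouts do not pick up binary rounding error.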

Scale & Performance

Target RPS

350K peak (100K bets/sec + 200K odds reads + 50K misc)

Latency (p99)

<100ms odds delivery, <200ms bet acceptance p99

Storage

~500 GB/year (append-only ledger + Kafka retention)

Availability

99.9% (multi-AZ, Kafka replication, Redis cluster)

Time & Space Complexity
Operation | Time | Space | Notes
Odds delivery (GET /odds via Redis) | O(1): single Redis GET by market_id | O(1): constant response size per market | 200K reads/sec on 6-node Redis cluster. No database involved, which removes the polling bottleneck entirely.
Bet acceptance (POST /bets) | O(1): Redis GET (odds check) + DynamoDB INSERT + Kafka produce | O(1): one new ledger record per bet | Total ~50ms: 2ms Redis + 30ms DynamoDB + 5ms Kafka + 10ms CPU processing. 100K bets/sec with 20 pods.
Settlement (batch per event) | O(B), where B is open bets for the event | O(B): loads all bets for the event into memory | 500K bets settle in ~30 seconds. Sharded by event_id: 15 NFL games settle in parallel with 20 workers.
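
The bet-acceptance figures in the table can be sanity-checked with a short calculation. The sketch below sums the per-step latencies and applies Little's law (concurrency = arrival rate x time in system) to estimate in-flight requests per pod; the helper names are hypothetical and an even spread of traffic across pods is assumed.

```python
# Per-step latency figures in milliseconds, taken from the table above.
LATENCY_BUDGET_MS = {
    "redis_odds_check": 2,
    "dynamodb_insert": 30,
    "kafka_produce": 5,
    "cpu_processing": 10,
}


def total_latency_ms(budget: dict) -> int:
    """Sum of the sequential steps on the bet-acceptance path."""
    return sum(budget.values())


def in_flight_per_pod(bets_per_sec: float, pods: int, latency_ms: float) -> float:
    """Little's law, assuming traffic is spread evenly across pods."""
    per_pod_rate = bets_per_sec / pods
    return per_pod_rate * latency_ms / 1000


total = total_latency_ms(LATENCY_BUDGET_MS)
concurrency = in_flight_per_pod(100_000, 20, total)
```

The steps sum to 47ms, which the table rounds to ~50ms, and each of the 20 pods would hold roughly 235 requests in flight at peak.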
Database Schema (HLD)
bets (DynamoDB)

Append-only bet ledger — every bet placement, settlement, and void is an immutable record. Partition key: user_id; sort key: bet_id (snowflake). Never UPDATE — only INSERT. This creates the regulatory audit trail required by gambling commissions. At 100K bets/sec peak, DynamoDB on-demand scales automatically. Secondary index on event_id supports settlement queries.

  • bet_id TEXT (sort key)
  • user_id TEXT (partition key)
  • event_id TEXT (GSI)
  • market_id TEXT
  • selection_id TEXT
  • odds DECIMAL
  • stake DECIMAL
  • status TEXT (PLACED/WON/LOST/SETTLED/VOID)
  • created_at TEXT (ISO 8601)

Partition: user_id

Indexes: PK on (user_id, bet_id), GSI on (event_id, bet_id) for settlement queries

Append-only: no UPDATEs or DELETEs ever. Settlement creates a new record (status=SETTLED) referencing the original bet_id. At 100K bets/sec peak, DynamoDB auto-scales — no capacity planning needed.

odds_cache (Redis)

In-memory cache of current odds for all active markets. Written by OddsWorker directly from external feed. Read by OddsService (for client delivery) and BetService (for bet validation). This is the authoritative source of current odds — not a cache of a database, but the primary read store. 100K markets x ~500 bytes = ~50MB odds data. 6-node cluster for throughput.

  • key: odds:{market_id}
  • selections: ARRAY (selection IDs + current odds)
  • status: TEXT (active/suspended)
  • updated_at: TEXT (ISO 8601)

TTL: 60 seconds (auto-expire stale markets). Write rate: 50K updates/sec from OddsWorker. Read rate: 200K reads/sec from OddsService + BetService. 6 Redis nodes handle this comfortably.

odds_snapshots (Kafka topic: odds-updates)

Durable stream of odds update events from external feeds. Partitioned by market_id (64 partitions) for ordering within a market. OddsWorker publishes; OddsService may consume for WebSocket push. 7-day retention for replay capability. Serves as the audit trail for odds changes — regulators can verify what odds were available at any point in time.

  • key: market_id
  • selections: ARRAY
  • status: TEXT
  • feed_timestamp: TEXT
  • provider: TEXT (Sportradar/Betgenius)

Partition: market_id

50K updates/sec from external feeds. 64 partitions provide ~780 events/sec per partition — well within Kafka's per-partition throughput limits.

Event Contracts
Odds Update (topic: odds-updates)

Published by OddsWorker when external odds feeds deliver new prices. Consumed by OddsService for WebSocket push; OddsWorker writes the same update to OddsCache directly. Partitioned by market_id for ordering within a market.

Key Schema

market_id (string)

Value Schema

{ market_id: string, selections: Array<{id: string, odds: number}>, status: "active"|"suspended", updated_at: string }

Bet Placed (topic: bet-events)

Published by BetService when a bet is successfully accepted. Consumed by SettlementWorker for batch settlement on event completion. Partitioned by event_id for settlement parallelism.

Key Schema

event_id (string)

Value Schema

{ bet_id: string, event_id: string, user_id: string, selection_id: string, odds: number, stake: number }

Bet Settled (topic: bet-events)

Published by SettlementWorker when a bet outcome is determined and payout calculated. Consumed by notification services and analytics. Same topic as bet-placed for unified bet lifecycle stream.

Key Schema

event_id (string)

Value Schema

{ bet_id: string, event_id: string, user_id: string, outcome: "WON"|"LOST"|"VOID", payout: number, settled_at: string }

What-If Scenarios

Odds change mid-bet (50-500ms staleness window)

Impact

Bet may be accepted at slightly stale odds. Financial risk is bounded by the staleness window duration and odds movement velocity. Typical impact: <0.1% of bets affected, with <5% price difference.

Mitigation

Odds tolerance threshold (e.g., accept bets within 5% of current odds). OddsWorker writes to Redis within milliseconds of feed changes, minimizing the window. Production sportsbooks accept this trade-off for better UX (fewer rejections).

Double settlement (SettlementWorker processes same event twice)

Impact

Users receive double payouts. Financial loss proportional to total payouts for the event. A major Premier League match could involve millions in double-paid winnings.

Mitigation

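Idempotent settlement using event sequence numbers. Because the ledger is DynamoDB, each settlement record is written with a conditional put (a condition expression such as attribute_not_exists on the record key), so a second attempt for the same bet fails instead of producing a second payout; this is the DynamoDB counterpart of SQL's INSERT ... ON CONFLICT DO NOTHING. SettlementWorker also checks whether a settlement record already exists before processing. Kafka consumer offset management limits reprocessing in normal operation, but events are re-delivered from the last committed offset after a crash, so the conditional write is the actual safeguard.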
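
The idempotency guard can be sketched with an in-memory store that mimics a conditional "insert if absent" write. This is an illustration under stated assumptions: the class, the key format, and the `credit` callback are hypothetical, and a dictionary stands in for the ledger table.

```python
from typing import Callable, Dict


class SettlementStore:
    """In-memory stand-in for a table that supports conditional inserts."""

    def __init__(self) -> None:
        self._records: Dict[str, dict] = {}

    def put_if_absent(self, key: str, record: dict) -> bool:
        """Insert only if the key does not exist; report whether it was written."""
        if key in self._records:
            return False
        self._records[key] = record
        return True


def settle_once(store: SettlementStore, bet_id: str, payout: float,
                credit: Callable[[float], None]) -> bool:
    """Credit the user only if this call created the settlement record."""
    key = f"{bet_id}#SETTLED"  # hypothetical key format
    if store.put_if_absent(key, {"bet_id": bet_id, "payout": payout}):
        credit(payout)
        return True
    return False  # already settled: a replayed event becomes a no-op


credits = []
store = SettlementStore()
first = settle_once(store, "bet_abc123", 105.0, credits.append)
second = settle_once(store, "bet_abc123", 105.0, credits.append)  # replay
```

The payout is tied to the outcome of the write rather than to a separate existence check, which avoids a race between two workers that both read "not settled yet".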

Network partition during live match (Kafka becomes unavailable)

Impact

BetService cannot publish bet events to Kafka. Bets can still be written to DynamoDB ledger (bet acceptance continues), but settlement will be delayed because SettlementWorker has no events to consume.

Mitigation

BetService implements a local buffer for Kafka-unavailable periods. Events are flushed to Kafka when connectivity resumes. SettlementWorker can also query DynamoDB directly (fallback path) for bets that missed the Kafka stream. DynamoDB ledger is the source of truth, not Kafka.
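
A local buffer of this kind could be sketched as follows. The class name and the `send` callable are hypothetical; `send` stands in for a Kafka producer call and is assumed to raise `ConnectionError` when the broker is unreachable. The buffer lives in process memory, so it is lost if the pod crashes, which is why the text treats the DynamoDB ledger as the source of truth.

```python
from collections import deque
from typing import Any, Callable, Deque


class BufferedPublisher:
    """Queue events locally and flush them in order when the broker is reachable."""

    def __init__(self, send: Callable[[Any], None], max_buffered: int = 10_000) -> None:
        self._send = send
        self._buffer: Deque[Any] = deque()
        self._max = max_buffered

    def publish(self, event: Any) -> bool:
        """Enqueue the event, then try to flush. True means nothing is pending."""
        if len(self._buffer) >= self._max:
            raise OverflowError("local buffer full")
        self._buffer.append(event)
        return self.flush()

    def flush(self) -> bool:
        while self._buffer:
            try:
                self._send(self._buffer[0])
            except ConnectionError:
                return False          # keep the event; retry on the next flush
            self._buffer.popleft()    # remove only after a successful send
        return True


sent = []
broker = {"up": False}


def send(event):
    if not broker["up"]:
        raise ConnectionError("kafka unavailable")
    sent.append(event)


pub = BufferedPublisher(send)
r1 = pub.publish("bet-1")   # buffered while the broker is down
r2 = pub.publish("bet-2")
broker["up"] = True
r3 = pub.publish("bet-3")   # flushes the backlog in order
```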

Regulatory audit request (all bets for a user over 6 months)

Impact

DynamoDB query on user_id partition key returns all bets in chronological order. No impact on live traffic because the query runs on the user's partition, not a shared resource.

Mitigation

DynamoDB's partition-by-user_id design supports this query natively. For cross-user aggregate queries (e.g., all bets on a specific market), the event_id GSI is used. For complex analytics, events are streamed from Kafka to a data warehouse (not modeled in this template).

Failure Modes & Resilience
Component | Failure | Impact | Mitigation
Redis OddsCache | Cluster node failure | Odds delivery degrades for markets hosted on the failed shard. BetService cannot validate odds for those markets, so bets on affected markets are rejected. | Redis Cluster with 6 nodes and built-in automatic failover: a replica of the failed primary is promoted and takes over its hash slots within seconds. OddsWorker detects the failover and re-publishes current odds to the promoted node.
Kafka (BetStream) | Broker failure | Bet events cannot be published. BetService must decide: accept the bet (write to DynamoDB but no Kafka event) or reject the bet (fail-safe). In production, accepting and buffering locally is preferred. | 3-broker MSK cluster with replication factor 3. A single broker failure has no impact because partitions are replicated. Multi-broker failure is handled by local buffering in BetService with async flush when Kafka recovers.
DynamoDB LedgerDB | Throttling (exceeded provisioned capacity) | Bet writes are rejected with ProvisionedThroughputExceededException. Bets fail even though odds validation passed. Users experience intermittent bet placement failures. | DynamoDB On-Demand mode (used in this template) has no provisioned capacity to exceed, although account-level and per-table quotas still apply. Ask AWS support to raise those limits proactively before major sporting events.
OddsWorker | External feed disconnection | Odds stop updating in Redis cache. OddsCache serves increasingly stale odds. Bets are accepted at stale prices, creating financial risk. | OddsWorker implements heartbeat monitoring on the feed connection. If no updates arrive for 5 seconds, the worker sets affected markets to 'suspended' in OddsCache, preventing bets on stale odds. Alert triggers for feed disconnection.
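
The feed heartbeat check in the last row can be sketched as a small watchdog. The class and method names are hypothetical, and timestamps are passed in explicitly (rather than read from the clock) so the logic is easy to test; a real worker would call it with a monotonic clock and then mark the returned markets as suspended in the cache.

```python
from typing import Dict, List


class FeedWatchdog:
    """Track the last update time per market and report markets gone quiet."""

    def __init__(self, timeout_s: float = 5.0) -> None:
        self._timeout_s = timeout_s
        self._last_seen: Dict[str, float] = {}

    def record_update(self, market_id: str, now: float) -> None:
        self._last_seen[market_id] = now

    def stale_markets(self, now: float) -> List[str]:
        """Markets with no update for longer than the timeout."""
        return sorted(
            market_id
            for market_id, seen in self._last_seen.items()
            if now - seen > self._timeout_s
        )


watchdog = FeedWatchdog(timeout_s=5.0)
watchdog.record_update("match_123", now=0.0)
watchdog.record_update("match_456", now=3.0)
to_suspend = watchdog.stale_markets(now=6.0)
```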
Scaling Strategy

Horizontal scaling at every layer:

  1. OddsService pods scale on WebSocket connection count: 1 pod per 10K connections, auto-scale at 8K.
  2. BetService pods scale on CPU utilization: auto-scale at 70%, scale-down at 30%.
  3. Kafka partitions scale manually by adding partitions (requires consumer rebalance).
  4. SettlementWorker scales on Kafka consumer lag: add workers when lag exceeds 50K events.
  5. Redis OddsCache scales by adding nodes to the cluster (resharding).
  6. DynamoDB auto-scales via on-demand mode.

Pre-scaling for major events: 2 hours before Super Bowl or Champions League final, manually scale BetService to 2x normal pods and SettlementWorker to 3x.
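
The connection-based rule for OddsService can be expressed as a short calculation. This is a sketch of the stated thresholds only (10K connections per pod, scale out at 8K per pod); the function names are hypothetical and it is not an autoscaler configuration.

```python
import math


def odds_pods_needed(connections: int, per_pod_capacity: int = 10_000) -> int:
    """Minimum pod count so no pod exceeds its connection capacity."""
    return max(1, math.ceil(connections / per_pod_capacity))


def should_scale_out(connections: int, pods: int,
                     threshold_per_pod: int = 8_000) -> bool:
    """True when average connections per pod exceed the scale-out threshold."""
    return connections / pods > threshold_per_pod


minimum_pods = odds_pods_needed(100_000)
```

Triggering at 8K rather than 10K leaves headroom for new pods to start before existing ones reach capacity.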

Monitoring & Alerting

Key metrics:

  1. Odds freshness: time between feed update and Redis cache write; alert if p99 > 500ms.
  2. Bet acceptance latency p99: should be < 200ms; alert at 300ms.
  3. Kafka consumer lag on BetStream: alert if lag exceeds 10,000 events (settlement falling behind).
  4. Redis OddsCache hit rate: should be > 99%; drops indicate cache node failures.
  5. Bet rejection rate by reason: ODDS_CHANGED should be < 2% (vs 10-20% in naive); alert at 5%.
  6. DynamoDB write latency: alert if p99 > 100ms (throttling indicator).
  7. Settlement duration per event: alert if > 5 minutes.

Dashboard: Grafana with panels for live odds update rate, bet acceptance throughput, Kafka consumer lag, Redis cluster health, and settlement progress.

SLIs: odds freshness p99 < 500ms, bet acceptance p99 < 200ms, settlement completion within 5 minutes.
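
The alert thresholds listed above can be captured in a small lookup table and evaluated against a metrics snapshot. The metric names below are hypothetical labels chosen for this sketch, and only the upper-bound alerts from the list are included; a real setup would express these as rules in the monitoring system.

```python
from typing import Dict, List

# Upper-bound alert thresholds from the list above (metric names are assumptions).
ALERT_THRESHOLDS: Dict[str, float] = {
    "odds_freshness_p99_ms": 500,
    "bet_acceptance_p99_ms": 300,
    "betstream_consumer_lag": 10_000,
    "odds_changed_rejection_rate": 0.05,
    "dynamodb_write_p99_ms": 100,
    "settlement_duration_s": 300,
}


def firing_alerts(metrics: Dict[str, float]) -> List[str]:
    """Names of metrics in the snapshot that exceed their threshold."""
    return sorted(
        name
        for name, limit in ALERT_THRESHOLDS.items()
        if name in metrics and metrics[name] > limit
    )


snapshot = {
    "odds_freshness_p99_ms": 120,
    "bet_acceptance_p99_ms": 340,
    "betstream_consumer_lag": 25_000,
    "odds_changed_rejection_rate": 0.01,
}
alerts = firing_alerts(snapshot)
```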

Cost Analysis

At 100K concurrent users: OddsCache Redis 6-node cluster (~$900/month), DynamoDB on-demand for bet ledger (~$800/month at 100K writes/day), MSK Kafka 3-broker cluster (~$600/month), ECS Fargate — OddsService 15 pods + BetService 20 pods + workers (~$1,200/month), API Gateway + ALB (~$100/month). Total: ~$3,600/month. This is 5x the naive approach's cost but handles 100x the concurrent users — the per-user cost drops from $0.73/user to $0.036/user. DynamoDB on-demand pricing scales linearly with bet volume; Kafka costs are fixed regardless of throughput (until partition limits are reached).
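
The totals quoted above follow directly from the line items. The sketch below just reproduces the arithmetic; the dictionary keys are shorthand labels chosen for this example.

```python
# Monthly line items in USD, as quoted in the paragraph above.
MONTHLY_COST_USD = {
    "redis_odds_cache": 900,
    "dynamodb_ledger": 800,
    "msk_kafka": 600,
    "ecs_fargate": 1_200,
    "api_gateway_alb": 100,
}

CONCURRENT_USERS = 100_000

total_monthly = sum(MONTHLY_COST_USD.values())
cost_per_user = total_monthly / CONCURRENT_USERS
```

The items sum to $3,600 per month, or $0.036 per concurrent user.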

Security Considerations

Authentication: JWT tokens validated at API Gateway (~3ms). Rate limiting: per-user bet rate limiting at API Gateway (max 20 bets/minute). Anti-fraud: OddsWorker monitors for suspicious betting patterns (large bets placed milliseconds before odds change — possible insider trading). Idempotency keys prevent duplicate bet submission from network retries. Data encryption: TLS 1.3 for all inter-service communication; DynamoDB encryption at rest (AES-256). Responsible gambling: deposit limits enforced via balance check in OddsCache; self-exclusion list checked during authentication. PII handling: user_id is a pseudonymous UUID; real identity stored separately in a KYC database (not modeled in this template).
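
The per-user cap of 20 bets per minute could be enforced with a sliding-window counter like the sketch below. This is an in-process illustration with hypothetical names; an API gateway would normally provide this as configuration, and a multi-instance deployment would need shared state.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque


class BetRateLimiter:
    """Sliding-window limiter: at most `max_bets` per `window_s` seconds per user."""

    def __init__(self, max_bets: int = 20, window_s: float = 60.0) -> None:
        self._max_bets = max_bets
        self._window_s = window_s
        self._events: DefaultDict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str, now: float) -> bool:
        timestamps = self._events[user_id]
        # Drop timestamps that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self._window_s:
            timestamps.popleft()
        if len(timestamps) >= self._max_bets:
            return False
        timestamps.append(now)
        return True


limiter = BetRateLimiter()
# One bet per second for 20 seconds: all accepted.
accepted = [limiter.allow("user_456", now=float(t)) for t in range(20)]
rejected = limiter.allow("user_456", now=20.0)    # 21st bet inside the window
later = limiter.allow("user_456", now=60.0)       # the first bet has expired
```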

Deployment Strategy

Blue/green deployment for OddsService and BetService: traffic is shifted from blue to green at the ALB level after health checks pass. The workers (SettlementWorker, OddsWorker) use rolling restarts; while one SettlementWorker instance restarts, Kafka rebalances its partitions to the remaining consumers in the group, so no partition is left unprocessed. DynamoDB schema changes are backward-compatible (additive fields only). Redis OddsCache is not redeployed with the services; it is a shared infrastructure component that persists across deployments. Rollback: flip the ALB target group back to blue within 30 seconds if green shows elevated error rates.

Real-World Examples
  • DraftKings uses a microservices architecture with Kafka-based event streaming and an append-only bet ledger for their real-time sportsbook platform
  • FanDuel's sportsbook platform uses WebSocket streaming for live odds delivery and event-sourced bet recording for regulatory compliance across multiple US states
  • Bet365 processes millions of bets per day using a distributed architecture with in-memory odds caching and asynchronous settlement, similar to this template's design
Solution Comparison
Variant | Tier | Latency | Throughput | Cost | Complexity | Reliability
V0: Naive (Polling + Monolith) | T1 | 50-200ms odds, 100-300ms bet | ~2K RPS total | $730/month | Low | 99% (single DB)
V1: Event-Sourced Ledger (WebSocket + Kafka) | T2 | <100ms odds, <200ms bet | 350K RPS peak | $3,600/month | Medium | 99.9% (multi-AZ)
V3: Global Compliant (Event-Sourced + Jurisdiction Router) | T4 | <100ms odds, <250ms bet | 350K RPS peak | $8,000/month | Very High | 99.99% (multi-region)

This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.

Frequently Asked Questions
Why WebSocket instead of Server-Sent Events (SSE) for odds streaming?

WebSocket is preferred over SSE for live betting because it supports bidirectional communication: the same connection can deliver odds updates (server to client) and carry bet placement commands (client to server). SSE is unidirectional (server to client only), so bet placement always needs a separate HTTP request. This template follows the common production split of WebSocket for the odds stream and REST/HTTP (POST /bets) for bet placement. Where bets are also sent over the WebSocket, the shared connection reduces connection count and simplifies client-side code.

What happens during the odds staleness window (50-500ms between feed and cache)?

During this window, the Redis cache contains odds that are slightly behind the external feed. Bets placed during this window are validated against the cached (slightly stale) odds. If the cache odds have not changed, the bet is accepted at the cached price. Production sportsbooks add an odds tolerance threshold — accept bets within 5% of current odds. This reduces rejection rates during the staleness window while limiting financial risk from large price movements.

How does the append-only ledger handle settlement disputes?

Because every bet lifecycle event is an immutable record in DynamoDB, settlement disputes can be resolved by replaying the event history. The sequence is: bet_placed (original bet with odds at placement) -> outcome_determined (game result) -> bet_settled (payout calculation). If a user claims their bet was settled incorrectly, support can reconstruct the exact sequence: the odds at placement, the game outcome, and the settlement calculation. No records were modified — the history is tamper-proof.

How does the system handle market suspension (VAR review)?

When a market is suspended (e.g., VAR review in football), OddsWorker sets a suspended flag in OddsCache (Redis). BetService checks this flag before accepting any bet — suspended markets reject immediately with a clear error. The flag is pushed to connected clients via WebSocket within milliseconds, so users see the market as suspended instantly (vs up to 3 seconds with polling). The flag is lifted when the odds feed resumes.

Why DynamoDB instead of PostgreSQL for the bet ledger?

DynamoDB's on-demand mode handles 100K writes/sec without capacity planning. PostgreSQL would require careful tuning (connection pooling, partitioning, vacuuming) at this write volume. DynamoDB's append-only pattern (partition key: user_id, sort key: bet_id) naturally supports the regulatory audit pattern — read all bets for a user in chronological order. The trade-off is query flexibility: DynamoDB requires careful key design, while PostgreSQL supports arbitrary SQL queries. Settlement queries use a secondary index on event_id.
