Production-grade payment platform: multi-region active-active, PCI-compliant card vault with HSM-backed tokenization, ML-based fraud scoring, double-entry accounting ledger, distributed saga orchestration, and webhook delivery with HMAC signatures. Handles 200K+ TPS across regions.
The multi-region distributed payment system follows the style of production architecture used by platforms that process billions of dollars in daily transaction volume, such as Stripe, Adyen, PayPal, and Square. It addresses three critical limitations of the single-region saga approach: vulnerability to a single-region failure, the absence of fraud detection, and the lack of PCI-compliant card data isolation.
The most impactful addition is the FraudService, which scores every charge using a gradient-boosted ML model retrained daily on historical fraud data. The model evaluates velocity features (how many charges on this card in the last hour), geo anomaly features (is the card being used in a different country than its billing address), amount deviation features (is this charge significantly larger than the cardholder's typical spend), and device fingerprint features (has this device been associated with previous fraud). Each scoring takes 20-50ms, an acceptable trade-off when it blocks 2-5% of charges as fraudulent. At a $50 average charge, every million charges scored keeps $1M-$2.5M of fraudulent volume from being authorized (assuming the blocked charges are genuinely fraudulent), and at 100K TPS the platform scores a million charges every 10 seconds. The ROI on 30ms of latency is extraordinary.
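The velocity feature (how many charges on this card in the last hour) can be kept in memory as a per-card sliding window of timestamps. The sketch below is a minimal illustration, assuming timestamps in seconds that arrive in non-decreasing order for each card token; `VelocityCounter` and `record_and_count` are hypothetical names, not the FraudService API.

```python
from collections import deque
from typing import Deque, Dict


class VelocityCounter:
    """Per-card count of charges inside a sliding time window (in memory)."""

    def __init__(self, window_seconds: float = 3600.0) -> None:
        self.window_seconds = window_seconds
        self._events: Dict[str, Deque[float]] = {}

    def record_and_count(self, card_token: str, now: float) -> int:
        """Record a charge at `now` and return the number of charges for this
        card within the last `window_seconds`, including this one."""
        events = self._events.setdefault(card_token, deque())
        cutoff = now - self.window_seconds
        # Drop timestamps that have fallen out of the window (oldest first).
        while events and events[0] <= cutoff:
            events.popleft()
        events.append(now)
        return len(events)


counter = VelocityCounter(window_seconds=3600)
print(counter.record_and_count("tok_a", 0))     # 1
print(counter.record_and_count("tok_a", 600))   # 2
print(counter.record_and_count("tok_a", 3700))  # 2: the charge at t=0 has aged out
```

A deque keeps eviction amortized O(1) per event, but memory grows with the number of charges per card inside the window; a production feature store would also need to expire idle cards.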
PCI-DSS Level 1 compliance requires that any system storing, processing, or transmitting raw card numbers undergo a full compliance audit, which is expensive, time-consuming, and tends to expand in scope. The CardVault solves this by isolating all card data in a dedicated service with HSM-backed encryption. Raw card PANs enter CardVault and never leave; all other services handle only tokens, which cannot be mapped back to a PAN outside the vault. This means only CardVault (one small service) is in PCI scope, while the rest of the payment platform operates outside PCI scope using tokens. This architectural decision saves millions in annual compliance costs.
The double-entry accounting ledger ensures financial integrity. Instead of recording a single 'charge' entry, every financial movement creates two entries: a debit from the customer's account and a credit to the merchant's account. The sum of all entries in the ledger is always zero — any imbalance is a critical alert indicating a bug or fraud. This is not optional for regulated financial services; it is a legal requirement in most jurisdictions. The trade-off is 2x write amplification: 100K charges per second becomes 200K ledger entries per second.
Multi-region active-active deployment ensures the system survives a full region failure. Traffic is routed to the nearest healthy region by the API Gateway. LedgerDB uses cross-region async replication with conflict resolution. The idempotency layer (Redis) prevents duplicate charges even when the same request is routed to different regions during a failover. The cost is operational complexity: 12 components across 2 regions with distributed saga coordination, ML model deployment, and cross-region data consistency.
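The idempotency check that guards against duplicate charges can be sketched as "claim the key atomically, process once, store the response for replays". This is a minimal sketch: `InMemoryIdempotencyStore` is a hypothetical stand-in for the Redis layer (with Redis, the atomic claim would typically be `SET key value NX EX <ttl>`), and scoping keys by merchant is an assumption.

```python
import threading
from typing import Any, Callable, Dict

Response = Dict[str, Any]


class InMemoryIdempotencyStore:
    """Hypothetical stand-in for the Redis idempotency layer."""

    def __init__(self) -> None:
        self._data: Dict[str, Response] = {}
        self._lock = threading.Lock()

    def set_if_absent(self, key: str, value: Response) -> bool:
        """Atomically store `value` only if `key` is unused (like Redis SET ... NX)."""
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def get(self, key: str) -> Response:
        with self._lock:
            return self._data[key]

    def set(self, key: str, value: Response) -> None:
        with self._lock:
            self._data[key] = value

    def delete(self, key: str) -> None:
        with self._lock:
            self._data.pop(key, None)


def handle_charge(
    store: InMemoryIdempotencyStore,
    merchant_id: str,
    idempotency_key: str,
    request: Dict[str, Any],
    process: Callable[[Dict[str, Any]], Response],
) -> Response:
    key = f"idem:{merchant_id}:{idempotency_key}"
    # Only the first request with this key wins the claim and runs process().
    if not store.set_if_absent(key, {"status": "in_progress"}):
        # Duplicate: replay the stored response (or the in-progress marker).
        return store.get(key)
    try:
        response = process(request)
    except Exception:
        # Release the claim so the merchant can retry; assumes process()
        # failed before committing any side effect.
        store.delete(key)
        raise
    store.set(key, response)
    return response
```

The guarantee only holds within one store. Whether it survives a cross-region failover depends on how the Redis layer replicates keys between regions.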
This is the architecture senior candidates are expected to propose in system design interviews at payment companies. The progression from naive (synchronous, no safety) to saga (async, idempotent) to distributed (multi-region, fraud-aware, PCI-compliant) demonstrates the layered thinking that distinguishes senior engineers from mid-level candidates.
The distributed payment system is built from 11 core components, plus the Redis idempotency layer, organized into a synchronous saga path (MerchantClient, ApiGateway, MainLB, PaymentOrchestrator, CardVault, FraudService, LedgerDB) and an asynchronous processing pipeline (PaymentStream/Kafka, NetworkWorker, WebhookWorker, ReconciliationDB). The architecture is deployed active-active across us-east-1 and us-west-2.
The charge critical path begins at the ApiGateway, which authenticates the merchant API key, enforces rate limits (250K RPS), and routes to the nearest healthy region. MainLB distributes requests across PaymentOrchestrator pods. The orchestrator executes a four-step distributed saga. Step 1: send the raw card PAN to CardVault for tokenization. CardVault encrypts the PAN with AES-256-GCM using HSM-managed keys and returns a token that cannot be mapped back to the PAN outside the vault. Step 2: send charge details to FraudService for ML scoring. The gradient-boosted model returns a risk score (0-100) and a decision (allow/block/review). Charges scoring above 80 are blocked immediately, and the saga compensates by releasing the card token. Step 3: write the double-entry ledger entries to LedgerDB (DEBIT customer_account and CREDIT merchant_account) in a single SQL transaction. Step 4: publish a charge_authorized event to PaymentStream (Kafka). The HTTP response then returns with a payment_id and status 'pending'.
The asynchronous pipeline processes the card network authorization. NetworkWorker consumes charge_authorized events, calls CardVault's detokenize endpoint to get the card BIN for network routing, and makes the external card network call (200-500ms). On success, it updates LedgerDB and publishes payment_result. WebhookWorker delivers events to merchant webhook URLs with HMAC-SHA256 signatures for authenticity verification. ReconciliationDB ingests daily card network settlement files and runs batch matching against LedgerDB records — flagging discrepancies (amount mismatches, chargebacks, missing charges) for manual review.
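The HMAC-SHA256 signature that WebhookWorker attaches can be illustrated with Python's standard `hmac` module. In this sketch the signed string is `<timestamp>.<body>` and receivers reject deliveries more than five minutes old; both the message format and the tolerance are assumptions, not a documented header scheme.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_webhook(secret: bytes, body: bytes, timestamp: int) -> str:
    """Hex HMAC-SHA256 over '<timestamp>.<body>' (message format is an assumption)."""
    message = str(timestamp).encode("ascii") + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_webhook(
    secret: bytes,
    body: bytes,
    timestamp: int,
    signature: str,
    tolerance_seconds: int = 300,
    now: Optional[int] = None,
) -> bool:
    """Receiver-side check: fresh timestamp and matching signature."""
    current = int(time.time()) if now is None else now
    # Reject stale deliveries to limit replay of captured requests.
    if abs(current - timestamp) > tolerance_seconds:
        return False
    expected = sign_webhook(secret, body, timestamp)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature)
```

Signing the timestamp together with the body means a captured request cannot be replayed later with a fresh timestamp without the merchant's secret.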
LedgerDB is sharded by merchant_id across 128 partitions with 3 synchronous replicas per shard. The double-entry journal (ledger_entries table) is the authoritative financial record — every debit has a corresponding credit, and the sum of all entries is always zero. Cross-region async replication enables read queries from both regions, while writes are routed to the primary region for the shard. Conflict resolution uses idempotency keys for charge deduplication and last-writer-wins for status updates.
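Routing a write to one of the 128 LedgerDB shards by merchant_id can be as simple as a stable hash modulo the shard count. A minimal sketch, assuming CRC32 as the hash (the text does not specify one); `shard_for_merchant` is a hypothetical helper.

```python
import zlib

NUM_SHARDS = 128


def shard_for_merchant(merchant_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a merchant to a LedgerDB shard with a stable hash.

    zlib.crc32 gives the same value in every process, unlike Python's
    built-in hash() for strings, which is randomized per process.
    """
    return zlib.crc32(merchant_id.encode("utf-8")) % num_shards
```

Because both entries of a charge land on the merchant's shard, they can commit in one local transaction. Plain modulo hashing remaps most merchants if the shard count changes, which is why resharding schemes often add a lookup directory or consistent hashing.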
CardVault is a deliberately small, tightly scoped service. It handles only two operations: tokenize (raw PAN in, token out) and detokenize (token in, BIN and network out — never the full PAN). It runs in an isolated VPC with no internet access, mTLS on all connections, and HSM-backed key management. This isolation means only CardVault undergoes PCI-DSS audit, saving the rest of the platform from PCI scope.
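CardVault's two operations can be sketched as follows. Assumptions: a keyed HMAC fingerprint of the PAN provides the "same PAN always returns same token" behavior, tokens are random so they reveal nothing about the PAN, and detokenize returns only the first six digits (the traditional BIN length) plus a rough network guess. The AES-256-GCM encryption of the stored PAN under an HSM key is omitted; `TokenVault` and `card_network` are hypothetical names.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Tuple


def card_network(pan: str) -> str:
    """Rough network detection from well-known IIN prefixes (illustration only)."""
    if pan.startswith("4"):
        return "visa"
    if pan[:2] in {"34", "37"}:
        return "amex"
    if 51 <= int(pan[:2]) <= 55 or 2221 <= int(pan[:4]) <= 2720:
        return "mastercard"
    return "unknown"


class TokenVault:
    def __init__(self, fingerprint_key: bytes) -> None:
        # In the described design this key would be managed by the HSM.
        self._fingerprint_key = fingerprint_key
        self._token_by_fingerprint: Dict[str, str] = {}
        self._meta_by_token: Dict[str, Tuple[str, str]] = {}

    def _fingerprint(self, pan: str) -> str:
        # Keyed HMAC: the same PAN always yields the same lookup key.
        return hmac.new(self._fingerprint_key, pan.encode("ascii"), hashlib.sha256).hexdigest()

    def tokenize(self, pan: str) -> str:
        fp = self._fingerprint(pan)
        token = self._token_by_fingerprint.get(fp)
        if token is None:
            # Random token: it carries no information about the PAN.
            token = "tok_" + secrets.token_hex(16)
            self._token_by_fingerprint[fp] = token
            # Only routing metadata is kept here; the encrypted PAN is omitted.
            self._meta_by_token[token] = (pan[:6], card_network(pan))
        return token

    def detokenize(self, token: str) -> Dict[str, str]:
        """Return the BIN and network for routing, never the full PAN."""
        bin_digits, network = self._meta_by_token[token]
        return {"bin": bin_digits, "network": network}
```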
FraudService runs on 10 pods with 100 threads each, processing 100K+ scorings per second. The ML model (gradient-boosted trees, not deep learning — inference runs on CPU without GPU) is retrained daily on a labeled dataset of confirmed fraud cases. Velocity features are computed from an in-memory sliding window; geo features use a Redis-backed IP geolocation cache. The 30ms average scoring time is the primary latency addition compared to the Saga variant.
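Pod and thread counts like these can be sanity-checked with Little's law, which says the number of requests in flight equals throughput multiplied by time in system. The sketch assumes each thread handles one scoring at a time and stays busy for the full 30ms average; the helper names are hypothetical.

```python
def required_concurrency(target_per_sec: int, latency_ms: int) -> int:
    """Little's law: requests in flight = throughput x time in system (rounded up)."""
    return (target_per_sec * latency_ms + 999) // 1000


def max_throughput(workers: int, latency_ms: int) -> int:
    """Requests/sec ceiling when each worker is busy latency_ms per request."""
    return workers * 1000 // latency_ms


print(required_concurrency(100_000, 30))  # 3000 scorings in flight
print(max_throughput(10 * 100, 30))       # 33333 scorings/sec
```

Under that assumption, 10 pods x 100 threads (1,000 workers) sustain about 33K scorings/sec, and 100K scorings/sec at 30ms needs about 3,000 scorings in flight: more threads or pods, or feature lookups that do not block a thread for the whole 30ms.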
Choice: Dedicated service with HSM-backed encryption for all card data
Rationale: PCI-DSS Level 1 audits cost $100K-$500K per service annually. By isolating card data in CardVault, only one service is in PCI scope, saving millions in compliance costs. All other services handle only tokens, which are not considered cardholder data under PCI-DSS. This is exactly how Stripe, Adyen, and Square architect their card data isolation.
Choice: FraudService scores every charge with a gradient-boosted model
Rationale: Adding about 30ms to the charge path lets the platform block 2-5% of charges as fraudulent; at a $50 average charge, that keeps $1M-$2.5M of fraudulent volume per million charges scored from being authorized. The model evaluates velocity, geo anomalies, amount deviation, and device fingerprint. Charges scoring above threshold 80 are blocked immediately. The fraud savings exceed the infrastructure and latency cost by 1,000x or more.
Choice: Every financial movement creates debit + credit entries summing to zero
Rationale: Single-entry ledgers cannot prove the books balance. Double-entry guarantees every debit has a corresponding credit. A non-zero balance is a critical alert. This is a regulatory requirement for financial services and the gold standard for auditable record-keeping. The trade-off is 2x write amplification (200K entries/sec at 100K charges/sec).
Choice: Active-active across us-east-1 and us-west-2
Rationale: A single-region failure stops all payments; merchants lose revenue every second. Active-active ensures that traffic routes to the surviving region during a failure. LedgerDB uses cross-region async replication; idempotency keys prevent duplicate charges during failover. The operational cost is high, but the availability improvement from 99.9% to 99.99% is critical for payment infrastructure.
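The difference between 99.9% and 99.99% is easiest to see as a yearly downtime budget. A small helper, assuming a 365-day year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Maximum yearly downtime allowed by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


for target in (99.9, 99.99):
    print(f"{target}% -> {downtime_minutes_per_year(target):.1f} min/year")
```

That is roughly 8.8 hours versus 53 minutes of allowed downtime per year.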
Choice: Tokenize, fraud score, ledger write, Kafka publish, each with compensation
Rationale: The four-step saga ensures that partial failures are always cleaned up. If fraud scoring blocks a charge, the token is released. If the ledger write fails, the token is released and logged. If the Kafka publish fails, a reconciliation job detects the orphaned ledger entry. Two-phase commit would require all four participants to be available simultaneously, which is too brittle for a payment system.
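The four steps and their compensations can be sketched as a function that records an undo action for each completed step and runs them in reverse on failure. The callables (`tokenize`, `release_token`, `score`, `write_ledger`, `publish`) are hypothetical stand-ins for CardVault, FraudService, LedgerDB, and Kafka, and logging is omitted.

```python
from typing import Any, Callable, Dict, List

FRAUD_BLOCK_THRESHOLD = 80  # scores above 80 are blocked


class ChargeBlocked(Exception):
    """Raised when the fraud score exceeds the block threshold."""


def run_charge_saga(
    request: Dict[str, Any],
    tokenize: Callable[[str], str],
    release_token: Callable[[str], None],
    score: Callable[[str, int], int],
    write_ledger: Callable[[str, str, int], str],
    publish: Callable[[str], None],
) -> Dict[str, str]:
    compensations: List[Callable[[], None]] = []
    try:
        # Step 1: tokenize the PAN; the undo action releases the token.
        token = tokenize(request["pan"])
        compensations.append(lambda: release_token(token))

        # Step 2: fraud scoring; a block unwinds step 1.
        if score(token, request["amount"]) > FRAUD_BLOCK_THRESHOLD:
            raise ChargeBlocked(token)

        # Step 3: DEBIT customer / CREDIT merchant in one local transaction.
        payment_id = write_ledger(token, request["merchant_id"], request["amount"])
    except Exception:
        # Undo completed steps in reverse order, then surface the error.
        for undo in reversed(compensations):
            undo()
        raise

    # Step 4: publish charge_authorized. A failure here is not compensated;
    # the reconciliation job finds the orphaned ledger entry instead.
    publish(payment_id)
    return {"payment_id": payment_id, "status": "pending"}
```

No step holds a lock while waiting on another service, so a slow participant delays only the charge it is serving.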
Choice: Read-replica augmented with settlement data, isolated from production path
Rationale: Settlement reconciliation involves joining millions of settlement records with charge records: heavy batch queries that would impact production charge latency if run on LedgerDB. ReconciliationDB is a dedicated read-replica augmented with card network settlement file data, keeping batch processing completely isolated from the critical charge path.
| Metric | Target |
|---|---|
| Target RPS | 200K+ charges/sec (cross-region) |
| Latency (p99) | <300ms charge API (tokenize + fraud + ledger + publish) |
| Storage | ~100 TB/year (sharded ledger + Kafka + settlement data) |
| Availability | 99.99% (multi-region active-active) |
| Operation | Time | Space | Notes |
|---|---|---|---|
| Charge saga (POST /api/v1/payments/charge) | O(1) tokenize + O(1) fraud score + O(1) ledger INSERT x 2 + O(1) Kafka publish | O(1) per payment (one payment record + two ledger entries) | Total wall time ~100ms: tokenize (5ms) + fraud (30ms) + ledger (40ms) + Kafka (5ms) + overhead (20ms). The card network call is async. |
| Fraud ML scoring (inline) | O(F) feature computation + O(D) tree inference, where F = features, D = tree depth | O(1) per scoring request | F ~ 50 features, D ~ 10 tree depth. Feature computation dominates (velocity lookups). Total ~30ms. 10 pods handle 100K+ scorings/sec. |
| Settlement reconciliation (daily batch) | O(N log N) sort-merge join on network_ref | O(N) settlement file buffer | N = daily charge count (~8.6B at 100K TPS). Run on ReconciliationDB, fully isolated from production. Completes in 2-4 hours. |
| Card tokenization (inline) | O(1) HSM encryption + O(1) vault DB INSERT | O(1) per card token | HSM encryption is constant time (~2ms). Idempotent: same PAN always returns same token. 8 pods handle 100K+ tokenizations/sec. |
ledger_entries (LedgerDB): Double-entry accounting journal. Every financial movement creates two entries (debit + credit) summing to zero. Sharded across 128 partitions by merchant_id. Write-heavy: 200K entries/sec at peak (100K charges x 2). Retained for 7 years per financial regulations.
Indexes: idx_ledger_payment ON (payment_id), idx_ledger_account ON (account_id, created_at DESC)
128 shards x 3 replicas. Balance check: SUM(CASE WHEN entry_type='debit' THEN amount ELSE -amount END) = 0 always.
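The journal and its balance check can be tried out with SQLite from Python's standard library. This is a single-node sketch, not the sharded LedgerDB: the `entry_id` column, integer minor-unit amounts, and the helper names are assumptions, while `payment_id`, `account_id`, `entry_type`, `amount`, `created_at`, and the two indexes follow the description above.

```python
import sqlite3
import uuid


def open_ledger() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE ledger_entries (
            entry_id   TEXT PRIMARY KEY,
            payment_id TEXT NOT NULL,
            account_id TEXT NOT NULL,
            entry_type TEXT NOT NULL CHECK (entry_type IN ('debit', 'credit')),
            amount     INTEGER NOT NULL CHECK (amount > 0),  -- minor units, e.g. cents
            created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
        """
    )
    conn.execute("CREATE INDEX idx_ledger_payment ON ledger_entries (payment_id)")
    conn.execute(
        "CREATE INDEX idx_ledger_account ON ledger_entries (account_id, created_at DESC)"
    )
    return conn


def post_charge(conn: sqlite3.Connection, payment_id: str,
                customer_account: str, merchant_account: str, amount: int) -> None:
    insert = (
        "INSERT INTO ledger_entries (entry_id, payment_id, account_id, entry_type, amount) "
        "VALUES (?, ?, ?, ?, ?)"
    )
    # `with conn` commits both rows together, or rolls both back on any error.
    with conn:
        conn.execute(insert, (str(uuid.uuid4()), payment_id, customer_account, "debit", amount))
        conn.execute(insert, (str(uuid.uuid4()), payment_id, merchant_account, "credit", amount))


def ledger_balance(conn: sqlite3.Connection) -> int:
    """The balance check from the text: debits minus credits must be 0."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(CASE WHEN entry_type = 'debit' THEN amount ELSE -amount END), 0) "
        "FROM ledger_entries"
    ).fetchone()
    return total
```

Because debits count as positive and credits as negative, every posted charge adds zero to the sum, and a failed insert rolls back both entries rather than leaving half a movement behind.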
Payments (LedgerDB): Charge metadata including fraud score, card token, and saga state. Status lifecycle: pending -> fraud_blocked | succeeded | failed, then succeeded -> refunded.
Indexes: idx_payments_merchant ON (merchant_id, created_at DESC), idx_payments_status ON (status) WHERE status = 'pending'
Same shard as ledger_entries for transactional consistency. Fraud score stored for audit trail.
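A small transition table makes the status lifecycle explicit and rejects illegal jumps. A sketch, assuming a refund is only possible after a charge has succeeded and that fraud_blocked, failed, and refunded are terminal; the names are hypothetical.

```python
from typing import Dict, Set

ALLOWED_TRANSITIONS: Dict[str, Set[str]] = {
    "pending": {"fraud_blocked", "succeeded", "failed"},
    "succeeded": {"refunded"},
    "fraud_blocked": set(),  # terminal
    "failed": set(),         # terminal
    "refunded": set(),       # terminal
}


def transition(current: str, new: str) -> str:
    """Return the new status, or raise if the lifecycle does not allow the move."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal payment status transition: {current} -> {new}")
    return new
```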
Settlements (ReconciliationDB): Card network settlement records ingested daily (T+2). Matched against LedgerDB payments by network_ref. 99.95% auto-match rate; 0.05% flagged for manual review.
Indexes: idx_settlements_ref ON (network_ref), idx_settlements_status ON (status) WHERE status != 'matched'
Separate from LedgerDB to isolate batch reconciliation queries from production charge path.
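The sort-merge matching on network_ref can be sketched in a few lines. Assumptions: each side has at most one record per network_ref, amounts are integers in minor units, and chargebacks are out of scope; the category names are hypothetical. A settlement with no ledger record is reported as a missing charge, and a ledger record with no settlement (which can simply mean it has not settled yet at T+2) as a missing settlement.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, int]  # (network_ref, amount in minor units)


def reconcile(ledger: List[Record], settlements: List[Record]) -> Dict[str, List[str]]:
    """Sort-merge join of ledger payments and settlement records on network_ref."""
    left = sorted(ledger)  # O(N log N)
    right = sorted(settlements)
    out: Dict[str, List[str]] = {
        "matched": [], "amount_mismatch": [], "missing_settlement": [], "missing_charge": [],
    }
    i = j = 0
    # Single linear pass over both sorted lists.
    while i < len(left) and j < len(right):
        l_ref, l_amount = left[i]
        r_ref, r_amount = right[j]
        if l_ref == r_ref:
            out["matched" if l_amount == r_amount else "amount_mismatch"].append(l_ref)
            i += 1
            j += 1
        elif l_ref < r_ref:
            out["missing_settlement"].append(l_ref)  # in ledger, not (yet) settled
            i += 1
        else:
            out["missing_charge"].append(r_ref)  # settled, but no ledger record
            j += 1
    # Whatever remains on either side has no counterpart.
    out["missing_settlement"].extend(ref for ref, _ in left[i:])
    out["missing_charge"].extend(ref for ref, _ in right[j:])
    return out
```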
| Variant | Tier | Latency | Throughput | Cost | Complexity | Reliability |
|---|---|---|---|---|---|---|
| Naive (Single Service + SQL) | T1 | 350ms-3s charges | ~450 RPS | $300/month (single DB + 3 pods) | Low — no cache, no workers, no Kafka | 99% (single DB, no failover) |
| Idempotent Ledger + Saga (Kafka) | T2 | <200ms charges (async card network) | 100K+ RPS | $5,000/month (Kafka + Redis + sharded DB) | Medium — Kafka, Redis, 3 worker types | 99.9% (replicated, saga compensation) |
| Multi-Region Distributed (Active-Active) | T3 | <300ms charges (fraud + tokenization) | 200K+ RPS | $20,000/month (multi-region, 12 components) | High — card vault, fraud ML, double-entry | 99.99% (multi-region, active-active) |
This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.
PCI-DSS Level 1 compliance requires annual on-site audits by a Qualified Security Assessor, costing $100K-$500K per service in scope. Every service that stores, processes, or transmits raw card numbers must be audited. By isolating all card data in CardVault, only one service undergoes the full PCI audit. The rest of the platform handles tokens (not cardholder data) and operates outside PCI scope. This architectural decision can save millions in annual compliance costs and dramatically reduces the security surface area.
The model uses gradient-boosted decision trees (XGBoost/LightGBM), not deep learning. Tree-based inference runs on CPU in microseconds per prediction — the 30ms latency comes from feature computation (velocity lookups from in-memory sliding windows, geo checks against Redis). The model is retrained daily on confirmed fraud labels. At 100K+ scorings per second across 10 pods, each pod handles ~10K inferences/sec — well within CPU capacity for tree models.
The ApiGateway detects the region failure via health checks and routes all traffic to the surviving region within 30-60 seconds. The surviving region has full read-write capability (active-active). LedgerDB cross-region replication may have a few seconds of lag, but the idempotency layer (Redis, also cross-region) prevents duplicate charges. Any charges in-flight during the failover are either completed by the surviving region's workers or retried by merchants (idempotency prevents duplicates). Settlement reconciliation catches any edge cases.
Single-entry accounting records events (a charge happened for $100) but cannot prove the books balance. Double-entry records movements: DEBIT customer $100, CREDIT merchant $100. The sum of all entries is always zero. If it is not zero, something is wrong — a missed refund, an incorrect amount, or fraud. This self-checking property is why every regulated financial institution uses double-entry accounting. It is a legal requirement, not an optimization.
Two-phase commit (2PC) requires all participants to vote 'yes' during the prepare phase while holding locks. If any participant is slow or unavailable, the entire transaction blocks. In a payment saga spanning CardVault, FraudService, LedgerDB, and Kafka, a slow fraud scoring would block all concurrent charges. The saga pattern allows each step to complete independently — if fraud scoring blocks the charge, the saga compensates by releasing the token. No locks are held across services, and each step is independently retryable.
This template closely mirrors Stripe's publicly documented architecture: HSM-backed card vault for PCI isolation, ML-based fraud scoring (Stripe Radar), idempotency keys before processing, async card network routing, webhook delivery with signatures, and multi-region deployment. The main simplifications are: one ML model instead of Stripe's ensemble of models, simplified settlement reconciliation (Stripe's is significantly more complex with partner banks), and no treasury/banking integration.