Hard | 4 components | Interview: Very High

Payment System — Naive (Single Service + SQL Transactions)

Q: Why is payment system design asked so frequently in interviews?

Payment systems combine financial correctness (at-most-once charging, double-entry accounting), distributed systems challenges (idempotency, saga compensation, eventual consistency), security requirements (PCI compliance, fraud detection), and scaling constraints (async processing, sharding). Companies like Stripe, PayPal, Square, Adyen, Amazon, and Shopify ask it because it maps directly to their core business. Interviewers expect candidates to start with the naive synchronous approach, identify the blocking bottleneck and idempotency gap, and propose async processing with idempotency keys as the first optimization.

Q: Why does the synchronous card network call cap throughput at ~450 RPS?

Each card network call takes 200-500ms (300ms average). During this time, the request thread is blocked — it cannot serve other requests. With 3 pods running 50 threads each (150 threads total), maximum throughput is 150 threads / 300ms = 500 RPS theoretical, ~450 RPS practical. The Saga variant makes card network calls asynchronous via Kafka, so the charge API returns in ~50ms after a ledger write and Kafka publish — the thread is freed immediately. This increases throughput to 100K+ RPS with the same number of threads.

Q: How do double charges happen without idempotency?

A merchant sends a charge request. The payment system processes it successfully and the card is charged. But the HTTP response is lost due to a network timeout between the payment API and the merchant. The merchant's retry logic sends the same charge request again. Without idempotency keys, the system has no way to know this is a retry — it processes the request as a new charge, resulting in a double charge. The Saga variant stores an idempotency key in Redis (SETNX) before any processing, so retries return the original result.

Q: What is the first optimization an interviewer expects?

Make the card network call asynchronous. Instead of blocking the request thread for 300ms, write the charge to the database with status 'pending', publish an event to Kafka, and return immediately. A background worker consumes the event and calls the card network. This drops charge API latency from 300ms to ~50ms and eliminates the thread pool bottleneck. The second optimization is idempotency keys to prevent double charges on retries.
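The steps above can be sketched in a few lines. This is only an illustration under stated assumptions, not the Saga variant's implementation: an in-process `queue.Queue` stands in for Kafka, a dict stands in for the payments table, and `call_card_network` is a hypothetical stub for the external call.

```python
import queue
import threading
import uuid

payments = {}            # stand-in for the payments table (assumption)
events = queue.Queue()   # stand-in for a Kafka topic (assumption)
STOP = None              # sentinel that tells the worker to exit


def call_card_network(amount_cents):
    """Hypothetical stub for the slow external authorization call."""
    return "succeeded" if amount_cents > 0 else "failed"


def charge(amount_cents):
    """API handler: record a pending charge, publish an event, return at once."""
    payment_id = str(uuid.uuid4())
    payments[payment_id] = {"amount": amount_cents, "status": "pending"}
    events.put(payment_id)
    return payment_id


def worker():
    """Background worker: consume events and call the card network."""
    while True:
        payment_id = events.get()
        if payment_id is STOP:
            break
        record = payments[payment_id]
        record["status"] = call_card_network(record["amount"])


def run_demo():
    payment_id = charge(5000)
    status_at_return = payments[payment_id]["status"]  # worker not started yet
    thread = threading.Thread(target=worker)
    thread.start()
    events.put(STOP)   # queued behind the charge event, so the charge is processed first
    thread.join()
    return status_at_return, payments[payment_id]["status"]
```

The handler never waits on the card network, so the merchant sees `pending` first and has to learn the final status later (for example by polling the status endpoint).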

Q: Why is the absence of fraud detection a critical flaw?

Card-testing attacks involve fraudsters submitting thousands of small charges using stolen card numbers to identify which cards are valid. Without fraud detection (velocity checks, geo anomaly detection, ML risk scoring), every card-testing charge succeeds. At scale, this exposes the system to millions in fraudulent charges per day. The Distributed variant adds ML-based fraud scoring that blocks 2-5% of charges. At a $50 average charge, that prevents $500K-$1.25M in fraud for every 500K charges processed, and proportionally more at 100K TPS.

Q: Why does this template use a single database instead of sharding?

At the naive tier (~450 RPS), a single PostgreSQL instance handles the load comfortably. Sharding adds operational complexity (routing, cross-shard queries, rebalancing) that is unnecessary at this scale. The Saga variant shards the ledger by merchant_id across 64 partitions when traffic reaches 100K TPS. The progression from single DB to sharded DB is a natural part of the scaling discussion interviewers expect.
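As a rough illustration of what routing by merchant_id across 64 partitions could look like, the sketch below hashes the merchant ID to a stable partition index. The hashing scheme and the function name `partition_for` are assumptions made for this example; the source only states the shard key and the partition count.

```python
import hashlib

NUM_PARTITIONS = 64  # partition count quoted for the Saga variant


def partition_for(merchant_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a merchant_id to a stable partition index (hypothetical scheme)."""
    digest = hashlib.sha256(merchant_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo the partition count.
    return int.from_bytes(digest[:8], "big") % num_partitions
```

A cryptographic hash is used here only because it gives the same result on every host and every run. A plain modulo scheme like this remaps most merchants if the partition count changes, which is one source of the rebalancing cost mentioned above.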

The simplest payment system: one service handling charges synchronously. Calls the card network directly (200-500ms blocking), writes the result to PostgreSQL, returns. No idempotency, no fraud detection, no async processing. Demonstrates why synchronous card network calls cap throughput at ~450 RPS.

Payments, Beginner, Bottleneck Analysis, Synchronous

Problem Statement

Designing a payment processing system is one of the most critical system design interview questions because it forces candidates to reason about financial correctness, idempotency, and the fundamental trade-off between synchronous simplicity and asynchronous throughput. The naive approach is where every candidate should start — it establishes the baseline that makes the improvements in the saga and distributed variants measurable and concrete.

The core challenge is processing a payment charge: receive a request from a merchant, authorize the charge with the card network (Visa, Mastercard, Amex), record the result in a database, and return the outcome. In the naive approach, this entire flow is synchronous. When a charge request arrives, PaymentService calls the card network directly — an external HTTP call that takes 200-500ms — then writes the result to PostgreSQL in a single SQL transaction. The merchant receives the final result (succeeded or failed) in the HTTP response. No pending state, no async processing, no webhooks.

The synchronous card network call is the fatal bottleneck. Each request thread is blocked for 200-500ms waiting for the card network response. With 3 pods running 50 threads each (150 threads total), maximum throughput is approximately 150 threads divided by 300ms average latency, which equals roughly 500 requests per second. During Black Friday traffic spikes, this ceiling is hit within minutes. Thread pool exhaustion causes charge requests to queue, latency spikes to seconds, and merchants experience timeouts that trigger retries — which brings us to the second critical flaw.

Without idempotency keys, network timeouts between the merchant and the payment API cause duplicate charges. If the first charge request succeeded but the HTTP response was lost to a network timeout, the merchant retries with an identical request. The naive system processes it as a new charge, creating a second charge for the same purchase. At a 0.1% timeout rate and 500 RPS, and assuming every timed-out request had already been charged and is retried, that is 0.5 duplicates per second, or one double charge every two seconds. This is a catastrophic failure mode for a payment system where every cent must be accounted for exactly once.
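The difference between the two behaviours can be shown with a small in-memory sketch. The class names, the `idempotency_key` parameter, and the dict used in place of Redis are assumptions for illustration; unlike Redis SETNX, the check-then-store here is not atomic and is only safe single-threaded.

```python
import uuid


class NaiveProcessor:
    """Treats every request as a new charge, as the naive design does."""

    def __init__(self):
        self.charges = []

    def charge(self, merchant_id, amount_cents):
        payment_id = str(uuid.uuid4())
        self.charges.append((payment_id, merchant_id, amount_cents))
        return payment_id


class IdempotentProcessor:
    """Deduplicates on a client-supplied key (hypothetical sketch)."""

    def __init__(self):
        self.charges = []
        self._results = {}  # idempotency_key -> payment_id, stand-in for Redis

    def charge(self, idempotency_key, merchant_id, amount_cents):
        if idempotency_key in self._results:
            # A retry: replay the original result instead of charging again.
            return self._results[idempotency_key]
        payment_id = str(uuid.uuid4())
        self.charges.append((payment_id, merchant_id, amount_cents))
        self._results[idempotency_key] = payment_id
        return payment_id
```

Sending the same request twice to `NaiveProcessor` records two charges, while `IdempotentProcessor` records one and returns the same `payment_id` for both calls.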

The absence of fraud detection is equally dangerous. A production payment system blocks 2-5% of charges as fraudulent using velocity checks (too many charges per card per minute), geo anomaly detection (card used in two countries within an hour), and ML-based risk scoring. The naive approach accepts every charge indiscriminately, making it vulnerable to card-testing attacks where fraudsters systematically test thousands of stolen card numbers against the API. At $50 average charge, undetected fraud at scale costs millions per day.
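Of the three techniques listed, the velocity check is the simplest to sketch. The example below keeps a sliding window of recent charge timestamps per card; the class name, the `card_fingerprint` identifier, and the default threshold of 5 charges per 60 seconds are assumptions for illustration, not values from the template.

```python
from collections import defaultdict, deque


class VelocityCheck:
    """Sliding-window limit on charges per card (hypothetical sketch)."""

    def __init__(self, max_charges=5, window_seconds=60.0):
        self.max_charges = max_charges
        self.window_seconds = window_seconds
        self._seen = defaultdict(deque)  # card fingerprint -> recent timestamps

    def allow(self, card_fingerprint, now):
        """Return True if the charge is within the limit, recording it if so."""
        recent = self._seen[card_fingerprint]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_charges:
            return False
        recent.append(now)
        return True
```

A per-card limit alone would not stop an attacker who tries each stolen card only once, which is why the text pairs it with geo anomaly detection and ML-based risk scoring.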

This template makes these failure modes visible and quantifiable. Run the simulation at 1,000 RPS and watch thread pool utilization spike to 100% while charge latency climbs past 3 seconds. The comparison with the Saga variant — where async Kafka processing drops charge latency to 50ms and idempotency keys eliminate double charges — provides the concrete evidence for why production payment systems never use synchronous card network calls.

Architecture Overview

The naive payment system is a four-component linear architecture: MerchantClient, Load Balancer, PaymentService, and PaymentDB (PostgreSQL). There is no API Gateway, no cache, no message queue, no fraud service, and no separation between the charge processing and card network interaction.

All traffic enters through the Load Balancer (AWS ALB), which distributes requests across PaymentService pods using round-robin. The LB adds approximately 1.5ms of routing latency and supports up to 5,000 RPS — well above the system's actual ceiling of approximately 450 RPS imposed by the synchronous card network bottleneck. The LB is not the constraint; the card network call is.

PaymentService is a stateless REST API running on 3 pods with 50 threads each (150 threads total). It handles four operations: (1) charge — receive the request, call the card network synchronously (200-500ms), write the result to PaymentDB, return; (2) refund — look up the original charge, call the card network to reverse it (another 200-500ms blocking call), write the refund record; (3) status check — read from PaymentDB; (4) reporting — SQL GROUP BY queries against PaymentDB. The charge and refund paths are dominated by the synchronous card network call. Status checks and reports compete with charge writes for the same database connection pool.

PaymentDB is a single PostgreSQL primary with no read replicas and no sharding. It stores three tables: payments (charge records with payment_id, amount, status, card_network), refunds (reversal records linked to original payments), and audit_log (append-only event history). The connection pool (200 connections) is shared between charge writes and status reads. At peak load, charge writes consume the majority of connections, causing status checks and reports to queue. There is no partitioning; the primary-key B-tree index on payment_id serves all status lookups.

The system has zero redundancy at the data layer. If the PostgreSQL primary fails, both charges and status checks are unavailable. There is no cache to serve stale reads, no read replica for reporting queries, and no multi-AZ deployment for database failover. The entire system is a single-region, single-AZ deployment where any component failure causes total downtime.

The concrete scaling ceiling is approximately 450 charge requests per second. At 300ms average card network latency, each thread handles roughly 3.3 charges per second. With 150 threads (3 pods times 50 threads), the theoretical maximum is 500 RPS, and roughly 450 RPS in practice (allowing for the database write and per-request overhead). Beyond this, the thread pool is exhausted, requests queue in the LB, and charge latency climbs from 350ms to multiple seconds. At 1,000 RPS, the system is fully saturated with a growing queue that eventually causes LB connection timeouts.
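The ceiling follows directly from the thread count and the time each thread is blocked. The short calculation below reproduces it from the figures in the text (3 pods, 50 threads per pod, 300ms average card network latency); the function names are illustrative, and the ~450 RPS practical capacity is taken from the text rather than derived.

```python
def max_throughput_rps(pods, threads_per_pod, avg_latency_ms):
    """Upper bound on requests/sec when every thread blocks for avg_latency_ms."""
    return pods * threads_per_pod * 1000 / avg_latency_ms


def backlog_growth_rps(arrival_rps, capacity_rps):
    """Rate at which queued requests accumulate once arrivals exceed capacity."""
    return max(0.0, arrival_rps - capacity_rps)


theoretical = max_throughput_rps(pods=3, threads_per_pod=50, avg_latency_ms=300)
print(theoretical)                     # 500.0
print(backlog_growth_rps(1000, 450))   # 550: queued requests added each second
```

At 1,000 RPS against a practical capacity of about 450 RPS, the queue grows by roughly 550 requests every second, which is why latency keeps climbing instead of settling at a higher plateau.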

Key Design Decisions

Synchronous Card Network Calls

Choice

Call the card network inline during the HTTP request

Rationale

The simplest possible approach: receive request, call external API, write result, return. No Kafka, no workers, no async state management. The cost is that each thread is blocked for 200-500ms per charge, capping throughput at ~450 RPS. The Saga variant moves card network calls to async Kafka workers, reducing charge API latency from 300ms to 50ms.

No Idempotency Keys

Choice

No deduplication mechanism for retry safety

Rationale

Without idempotency keys, the system has no way to distinguish a retry from a new charge. A network timeout between the merchant and the API causes the merchant to retry, creating a duplicate charge. The Saga variant adds Redis SETNX idempotency keys checked before any processing begins, guaranteeing at-most-once charging.

No Fraud Detection

Choice

Accept all charges without risk scoring

Rationale

A production payment system uses ML models to score every charge for fraud risk (velocity checks, geo anomalies, behavioral signals). The naive approach skips fraud entirely, making it vulnerable to card-testing attacks. The Distributed variant adds a dedicated FraudService with gradient-boosted ML scoring on the critical path.

Single PostgreSQL Database

Choice

One database for payments, refunds, and audit log

Rationale

A single PostgreSQL primary eliminates sharding complexity, replication lag, and consistency issues. ACID transactions guarantee that the charge and its audit entry are atomic. The cost is that charge writes, status reads, and report queries compete for the same connection pool and buffer cache — mutual degradation at load.
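The atomicity claim can be illustrated with a small sketch. It uses Python's built-in `sqlite3` with an in-memory database as a stand-in for PostgreSQL, and trimmed-down versions of the payments and audit_log tables; the helper names `open_db` and `record_charge` are assumptions for this example.

```python
import sqlite3
import uuid


def open_db():
    """Create an in-memory database with simplified payments and audit_log tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE payments (
            payment_id  TEXT PRIMARY KEY,
            merchant_id TEXT NOT NULL,
            amount      INTEGER NOT NULL,   -- cents
            status      TEXT NOT NULL       -- succeeded | failed
        );
        CREATE TABLE audit_log (
            event_id   TEXT PRIMARY KEY,
            payment_id TEXT NOT NULL,
            action     TEXT NOT NULL
        );
        """
    )
    return conn


def record_charge(conn, merchant_id, amount_cents, status):
    """Write the charge and its audit entry in one transaction."""
    payment_id = str(uuid.uuid4())
    with conn:  # commits on success, rolls back if either INSERT raises
        conn.execute(
            "INSERT INTO payments VALUES (?, ?, ?, ?)",
            (payment_id, merchant_id, amount_cents, status),
        )
        conn.execute(
            "INSERT INTO audit_log VALUES (?, ?, ?)",
            (str(uuid.uuid4()), payment_id, status),
        )
    return payment_id
```

If the audit insert fails for any reason, the payment row is rolled back with it, so the two tables cannot disagree about whether a charge was recorded.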

No Async Processing

Choice

No Kafka, no workers, no event-driven pipeline

Rationale

The entire charge lifecycle (validate, authorize, record, respond) happens within a single HTTP request-response cycle. This eliminates eventual consistency — the merchant knows the final charge status immediately. The trade-off is that the 200-500ms card network call blocks the thread, limiting throughput to ~3 charges/sec/thread.

Scale & Performance

Target RPS

~450 sustained (thread pool ceiling)

Latency (p99)

350ms-3s charge requests (dominated by card network call)

Storage

~2 GB/month at modest scale

Availability

~99% (single DB, no redundancy)

Time & Space Complexity

Operation	Time	Space	Notes
Charge (POST /api/v1/payments/charge)	O(1) card network call + O(1) DB INSERT	O(1) per payment record (~500 bytes)	Constant time but 200-500ms wall clock due to synchronous card network call. Thread is blocked the entire time.
Status check (GET /api/v1/payments/{id})	O(log N) index seek on payment_id	O(1) single row read	Fast in isolation (~10ms), but competes with charge writes for DB connections at load.
Refund (POST /api/v1/payments/{id}/refund)	O(log N) lookup + O(1) card network call + O(1) DB INSERT	O(1) per refund record	Same synchronous card network blocking as charges. At lower volume (10% of charges), less impactful.
Report (GET /api/v1/payments/report)	O(N) full scan with GROUP BY per merchant	O(M) aggregation buffer, M = distinct groups	Heavy query that competes with charge writes. No caching — every report hits the database.

Database Schema (HLD)

payments

Stores all payment charge records. Written synchronously after the card network responds — status is always final (succeeded or failed), never pending. The hottest table: INSERTs on every charge and SELECTs on every status check compete for the same connection pool.

payment_id VARCHAR PK (UUID)
merchant_id VARCHAR (indexed for reporting)
amount INTEGER (cents)
currency VARCHAR (ISO 4217)
status VARCHAR (succeeded|failed)
card_network VARCHAR (visa|mastercard|amex)
network_ref VARCHAR (card network reference ID)
created_at TIMESTAMPTZ (indexed)

Indexes: idx_payments_merchant ON (merchant_id, created_at DESC)

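No partitioning. The ~1.5 GB/month growth estimate (with indexes) assumes modest average volume, on the order of a few million charges per month. Sustained writes at the ~450/sec ceiling would instead add roughly 580 GB/month at ~500 bytes per row, before indexes.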

refunds

Stores refund records linked to original payments. Written synchronously after the card network processes the reversal. Status is always final.

refund_id VARCHAR PK (UUID)
payment_id VARCHAR FK (indexed)
amount INTEGER (cents, may be partial)
status VARCHAR (succeeded|failed)
created_at TIMESTAMPTZ

Indexes: idx_refunds_payment ON (payment_id)

Refund volume is ~10% of charge volume. Small table relative to payments.

audit_log

Append-only log of payment events. Each charge and refund appends an entry. Read for dispute investigation and basic reporting.

event_id VARCHAR PK (UUID)
payment_id VARCHAR (indexed)
action VARCHAR (created|succeeded|failed|refunded)
timestamp TIMESTAMPTZ

Indexes: idx_audit_payment ON (payment_id)

Write-only during normal operation. No retention policy in the naive approach.

Solution Comparison

Variant	Tier	Latency	Throughput	Cost	Complexity	Reliability
Naive (Single Service + SQL)	T1	350ms-3s charges	~450 RPS	$300/month (single DB + 3 pods)	Low — no cache, no workers, no Kafka	99% (single DB, no failover)
Idempotent Ledger + Saga (Kafka)	T2	<200ms charges (async card network)	100K+ RPS	$5,000/month (Kafka + Redis + sharded DB)	Medium — Kafka, Redis, 3 worker types	99.9% (replicated, saga compensation)
Multi-Region Distributed (Active-Active)	T3	<300ms charges (fraud + tokenization)	200K+ RPS	$20,000/month (multi-region, 12 components)	High — card vault, fraud ML, double-entry	99.99% (multi-region, active-active)
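To make the Reliability column more concrete, the availability percentages can be converted into allowed downtime per year. The snippet below is a simple arithmetic sketch; the function name is illustrative and a 365-day year is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def downtime_hours_per_year(availability_pct):
    """Hours per year a system may be down at the given availability."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


for tier, pct in (("T1", 99.0), ("T2", 99.9), ("T3", 99.99)):
    print(f"{tier}: {pct}% -> {downtime_hours_per_year(pct):.2f} hours/year")
```

At 99% the naive tier may be down for about 87.6 hours a year, compared with roughly 8.8 hours at 99.9% and under one hour at 99.99%.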

This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.

Related Templates

Payment System — Idempotent Ledger + Saga
Payment System — Multi-Region Distributed
