FAANG-scale ad click aggregation using dual processing paths: a real-time speed layer for dashboard freshness and a batch reconciliation layer for billing-grade accuracy. Handles 200K+ RPS peak.
The lambda architecture for ad click aggregation exists because of an irreconcilable tension in distributed systems: you cannot simultaneously optimize for real-time freshness and billing-grade accuracy. The stream processing variant provides near-real-time dashboards but accepts approximate counts — events near window boundaries may be miscounted, late arrivals may be dropped, and consumer rebalances create brief gaps. For campaign monitoring dashboards, these approximations are acceptable. For billing — where every click represents money — they are not.
The lambda architecture resolves this by running two parallel processing paths over the same data. The speed layer (identical to the stream processing variant) produces fast approximate counts for dashboards. The batch layer periodically reprocesses the complete raw event log to produce exact counts for billing. The serving layer merges results from both paths, using the speed layer for recent data and the batch layer for reconciled historical data.
This dual-path pattern is associated with the largest ad platforms in the world. Google's ad billing system is described as using a similar dual-path approach with MillWheel (speed) and Flume (batch). Meta's ad analytics pipeline is described as using streaming aggregation for real-time dashboards and periodic MapReduce jobs for billing reconciliation. The pattern appears at FAANG scale because, with millions of dollars per day in ad spend, the cost of billing errors makes the operational complexity of dual processing paths worthwhile.
The key design challenge is managing the divergence between the speed and batch layers. During the window between batch reconciliation runs (typically hourly), the speed layer's approximate counts and the batch layer's exact counts may differ by 0.1-2% (0.1-0.5% in normal operation). The serving layer must handle this gracefully, typically by showing the speed layer's count with a freshness indicator and updating to the batch layer's count when reconciliation completes. This reconciliation dance is a rich source of interview discussion: How do you handle the switchover? What if the batch layer finds more clicks than the speed layer? How do you communicate billing corrections to advertisers?
This template is the most complex variant (12 components) and targets the FAANG-level interview discussion. Candidates are expected to articulate the dual-path rationale, explain the reconciliation strategy, reason about the cost of operating two full processing pipelines, and identify when the added complexity is justified versus when the simpler stream processing approach is sufficient.
The lambda architecture ad click aggregator uses 12 components organized into three layers: an ingestion layer, parallel speed and batch processing layers, and a unified serving layer.
The ingestion layer is shared between both processing paths. Clicks arrive at an API Gateway (handling authentication, rate limiting at 200K RPS, and request validation), pass through a Load Balancer to the Click Ingestion Service, and are written to Kafka. Critically, the ingestion service also writes raw events to Object Storage (S3-compatible) in partitioned Parquet format — this is the batch layer's input. The dual write (Kafka + Object Storage) happens asynchronously: the Kafka produce is the primary path (ack in 3-5ms), and the Object Storage write is fire-and-forget with a local buffer for batching.
The speed layer consumes events from Kafka via a real-time stream processor (identical to the Stream Processing variant). It performs 1-minute tumbling window aggregation and writes results to a Real-time Database optimized for recent queries (last 1-6 hours). The speed layer prioritizes freshness over accuracy — it counts all events it sees within the window, accepts the small error from window boundary effects, and does not handle late arrivals beyond a 30-second grace period. Dashboard queries for recent data hit this layer.
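The windowing rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the stream processor's actual implementation: the `TumblingAggregator` class and its method names are hypothetical, timestamps are plain epoch seconds, and lateness is judged by arrival time, whereas a real stream processor would typically use watermarks.

```python
from collections import defaultdict

WINDOW_SECONDS = 60   # 1-minute tumbling windows, as described above
GRACE_SECONDS = 30    # late events are accepted up to 30 s after the window ends


def window_start(event_time: int) -> int:
    """Map an epoch-second timestamp to the start of its tumbling window."""
    return event_time - (event_time % WINDOW_SECONDS)


class TumblingAggregator:
    """Counts clicks per (campaign_id, window) and drops events past the grace period."""

    def __init__(self) -> None:
        self.counts = defaultdict(int)
        self.dropped_late = 0

    def add(self, campaign_id: str, event_time: int, arrival_time: int) -> bool:
        start = window_start(event_time)
        deadline = start + WINDOW_SECONDS + GRACE_SECONDS
        if arrival_time > deadline:
            # Too late for the speed layer; the batch layer still sees this event.
            self.dropped_late += 1
            return False
        self.counts[(campaign_id, start)] += 1
        return True


agg = TumblingAggregator()
agg.add("camp_123", event_time=125, arrival_time=126)  # window [120, 180)
agg.add("camp_123", event_time=179, arrival_time=205)  # late, but before the 210 s deadline
agg.add("camp_123", event_time=179, arrival_time=215)  # past the grace period: dropped
```

The dropped event is exactly the kind of discrepancy the batch layer later corrects, because the raw event is still present in Object Storage.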
The batch layer runs periodically (hourly by default). A Batch Processor reads the raw events from Object Storage, performs a full MapReduce-style aggregation over the complete hour's data, deduplicates using exact click_id sets (no Bloom filter approximation), and writes reconciled totals to a Billing Database (durable, ACID-compliant store). The batch layer is exact with respect to the event log: it processes every archived event exactly once from immutable files, so its totals are only as complete as the Object Storage archive. The trade-off is latency: results are only available after the batch job completes (10-30 minutes after the hour ends).
The Query Service acts as the serving layer, merging results from both paths. For recent hours that the batch layer has not yet reconciled (typically the last 1-2 hours), it reads from the Real-time Database (speed layer output) and flags the values as 'preliminary'. For historical data (older than 2 hours), it reads from the Billing Database (batch layer output). When both layers hold data for the same hour, the batch layer wins: the displayed value switches from the speed layer's count to the batch layer's count as soon as the reconciliation job for that hour completes. Redis provides a cache layer for the most frequently accessed campaign dashboards.
The 12-component architecture provides maximum resilience. If the speed layer fails, dashboards degrade to hourly-refresh (batch data only). If the batch layer fails, dashboards continue with approximate real-time data. If Kafka fails, the Object Storage writer continues (events buffered locally), ensuring the batch layer still receives all data for reconciliation. The system handles 200K+ RPS at peak with horizontal scaling at every layer: Kafka partitions, consumer instances, Object Storage throughput, and batch processor parallelism.
The primary trade-off is operational complexity. Twelve components means twelve things to monitor, scale, and debug. Two processing pipelines double the compute cost. The reconciliation logic adds application complexity. This cost is justified when billing accuracy is worth more than the infrastructure overhead — typically at $1M+ per day in ad spend.
The lambda architecture processes every click event through two independent pipelines simultaneously. The speed layer (Kafka → Stream Processor → Real-time DB) prioritizes freshness — dashboards update within 30-90 seconds. The batch layer (Object Storage → Batch Processor → Billing DB) prioritizes accuracy — exact deduplication and reconciled totals for billing.
The serving layer merges results from both paths: recent data comes from the speed layer (labeled 'preliminary'), while historical data comes from the batch layer (labeled 'reconciled'). The switchover happens transparently as batch jobs complete — the dashboard user sees the label change from 'preliminary' to 'reconciled' without any manual intervention.
Pseudocode
// INGESTION: Async dual write (Kafka + Object Storage)
async function handleClick(campaign_id, ad_id, user_id):
    event = { campaign_id, ad_id, user_id, click_id: uuid(), event_time: now() }
    // Primary path: Kafka produce (fast, durable)
    await kafka.produce("ad-clicks", key=campaign_id, value=event) // ~3ms
    // Secondary path: buffer for Object Storage (async, fire-and-forget)
    parquet_buffer.append(event) // Local buffer, no network call
    if parquet_buffer.size >= 10_000 or parquet_buffer.age >= 30s:
        flush_to_s3(parquet_buffer) // Background thread, non-blocking
    return 202 // ~8ms total (gateway + Kafka ack)
// SPEED LAYER: Real-time stream processing (identical to V1)
// Declared async because it awaits the database upsert
async function onWindowClose(window_start, window_end, accumulated_events):
    for campaign_id, events in group_by(accumulated_events, "campaign_id"):
        await realtime_db.upsert("realtime_aggregates", {
            campaign_id, window_start,
            click_count: events.length,
            unique_users: distinct(events, "user_id").size,
            status: "preliminary"
        })
// BATCH LAYER: Hourly reconciliation (exact dedup)
async function reconcileHour(hour_bucket):
    // Read all raw events for this hour from Object Storage
    events = s3.readParquet("clicks/year=.../month=.../day=.../hour=...")
    // Raw per-campaign counts, kept so the number of removed duplicates can be reported
    // (count_by is a hypothetical helper returning { campaign_id: count })
    raw_counts = count_by(events, "campaign_id")
    // Exact dedup: no Bloom filter, no approximation
    seen_click_ids = new Set()
    deduped_events = []
    for event in events:
        if event.click_id not in seen_click_ids:
            seen_click_ids.add(event.click_id)
            deduped_events.append(event)
    // Write reconciled totals to Billing DB
    for campaign_id, campaign_events in group_by(deduped_events, "campaign_id"):
        // get_cpc_rate is a hypothetical lookup of the campaign's CPC rate
        cpc_rate = get_cpc_rate(campaign_id)
        await billing_db.upsert("billing_aggregates", {
            campaign_id, hour_bucket,
            exact_click_count: campaign_events.length,
            deduped_count: raw_counts[campaign_id] - campaign_events.length,
            billable_amount: campaign_events.length * cpc_rate,
            status: "reconciled"
        })
    // Signal that this hour is now reconciled
    await meta_db.update("reconciliation_meta",
        { hour_bucket, batch_reconciled: true, reconciled_at: now() })
// SERVING LAYER: Merge results from both paths
async function getDashboard(campaign_id, time_range):
    results = []
    for hour in time_range:
        meta = await meta_db.get("reconciliation_meta", hour)
        // A missing meta row means the hour has not been reconciled yet
        if meta and meta.batch_reconciled:
            row = await billing_db.get(campaign_id, hour)
            results.append({ ...row, freshness: "reconciled" })
        else:
            // Assumes realtime_db.get sums the hour's 1-minute windows
            row = await realtime_db.get(campaign_id, hour)
            results.append({ ...row, freshness: "preliminary" })
    return results
The lambda architecture uses four storage tiers, each optimized for a different purpose. Object Storage is the immutable archive: the raw truth that can always be reprocessed. The Real-time DB serves dashboards with fast, approximate data. The Billing DB serves financial queries with exact, reconciled data. The reconciliation_meta table is the coordination point that tells the serving layer which source to trust for each time window.
This multi-tier design means that losing any single database is recoverable. If the Real-time DB crashes, dashboards degrade to hourly-refresh from the Billing DB. If the Billing DB crashes, it can be rebuilt from Object Storage by replaying the batch job. Object Storage itself is replicated across availability zones — it is the system's ultimate safety net.
Pseudocode
// TIER 1: Object Storage (immutable archive)
// Write pattern: append-only Parquet files, partitioned by time
s3.putObject("clicks/year=2025/month=01/day=15/hour=12/batch_001.parquet",
    parquet_encode([
        { campaign_id, click_id, ad_id, event_time, schema_version: 2 },
        ... // ~10K events per file, flushed every 30 seconds
    ])
)
// Storage cost: ~$0.023/GB/month. At 50K clicks/sec: ~2 TB/year raw
// Parquet compression: 5-10x smaller than JSON
// TIER 2: Real-time DB (speed layer output, approximate)
// Write pattern: batch upsert every 60 seconds (1 row per campaign per window)
INSERT INTO realtime_aggregates (campaign_id, window_start, click_count, status)
VALUES ('camp_123', '2025-01-15 12:00', 847, 'preliminary')
ON CONFLICT (campaign_id, window_start)
DO UPDATE SET click_count = excluded.click_count;
// TIER 3: Billing DB (batch layer output, exact)
// Write pattern: batch upsert every hour (1 row per campaign per hour)
INSERT INTO billing_aggregates
(campaign_id, hour_bucket, exact_click_count, deduped_count, billable_amount, status)
VALUES ('camp_123', '2025-01-15 12:00', 50847, 312, 50847 * 0.03, 'reconciled');
// 312 duplicates removed out of 51159 raw events (0.6% dupe rate)
// TIER 4: Reconciliation meta (coordination)
UPDATE reconciliation_meta
SET batch_reconciled = true, reconciled_at = now()
WHERE hour_bucket = '2025-01-15 12:00';
// After this UPDATE, the Query Service reads from Billing DB for this hour
Choice
Parallel speed layer (stream) + batch layer (MapReduce)
Rationale
The speed layer provides real-time dashboard freshness (30-90 second lag) but with approximate counts. The batch layer provides exact billing-grade counts but with hourly latency. Neither alone satisfies both requirements — real-time freshness for advertiser dashboards AND exact accuracy for billing. The dual-path approach is the standard FAANG solution, trading infrastructure cost for the ability to serve both use cases from a single data stream.
Choice
S3-compatible storage in partitioned Parquet format
Rationale
Object storage provides durable, immutable, and infinitely scalable archival of raw events. Parquet format enables efficient columnar reads for batch processing. Unlike Kafka (which has finite retention), Object Storage retains events indefinitely for audit, replay, and regulatory compliance. The batch processor reads directly from Object Storage without putting load on Kafka.
Choice
Full reprocessing every hour with exact deduplication
Rationale
Hourly batches balance accuracy latency with compute cost. More frequent batches (every 5 minutes) would reduce the reconciliation window but increase compute cost linearly. Less frequent batches (daily) would reduce cost but leave billing data stale for too long. The hour boundary aligns with typical advertiser reporting granularity and billing cycle expectations.
Choice
Time-series DB for speed layer, ACID DB for batch layer
Rationale
The speed layer writes frequently (every minute) with high throughput but tolerates approximate data — a time-series database optimized for recent writes and time-range queries is ideal. The batch layer writes infrequently (hourly) but requires ACID guarantees for billing accuracy — a traditional relational or strongly-consistent database is appropriate. Merging both into one database would force trade-offs in either direction.
Choice
Dedicated API Gateway with rate limiting and auth
Rationale
At 200K+ RPS, the ingestion endpoint is an attractive target for DDoS and click fraud. An API Gateway provides rate limiting per API key, request validation, and authentication without burdening the ingestion service. The 3ms overhead is negligible compared to the protection it provides. The Gateway also enables A/B testing of ingestion logic by routing traffic percentages to different service versions.
Target RPS
200K+ peak (FAANG-scale)
Latency (p99)
30-90s dashboard, hourly billing reconciliation
Storage
~2 TB/year (raw events + aggregates)
Availability
99.99% (dual-path redundancy)
| Operation | Time | Space | Notes |
|---|---|---|---|
| Click ingestion (Kafka + S3 dual write) | O(1) — one Kafka produce (~3ms) + async S3 buffer append | O(1) per event — one Kafka message + one Parquet row (buffered locally) | The S3 write is amortized: events are buffered locally and flushed in 10K-event batches every 30 seconds. Per-event cost is negligible. |
| Speed layer aggregation (tumbling window) | O(1) amortized per event — hash map increment; O(C) per window flush | O(C × U) — campaigns × unique users in the current window | Identical to V1 stream processing. Window flush produces one row per campaign per minute. |
| Batch reconciliation (hourly MapReduce) | O(N log N) — sort + group by campaign_id over N events in the hour; dedup via hash set | O(N) — must hold all unique click_ids in memory for exact deduplication | At 200K RPS, one hour = ~720M events. Full dedup set requires ~23 GB memory. Parallelized across campaign_id partitions to reduce per-worker memory. |
| Dashboard query (serving layer merge) | O(1) for cache hit; O(log N) for DB read + O(1) reconciliation_meta check | O(W) — where W is the number of time windows in the query range | The serving layer checks reconciliation_meta to decide which database to query — adds ~2ms overhead but ensures correct freshness labels. |
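The memory figure in the batch reconciliation row can be reproduced with a short calculation. The sketch below assumes roughly 32 bytes per stored click_id, a value inferred from the table's own numbers (about 23 GB for about 720M IDs) rather than stated by the source. A general-purpose in-memory set would normally use more than that per entry, so treat the result as a lower bound.

```python
def events_per_hour(rps: int) -> int:
    """Number of events produced in one hour at a sustained request rate."""
    return rps * 3600


def dedup_set_gb(num_ids: int, bytes_per_id: int = 32) -> float:
    """Approximate size of an exact dedup set, assuming a fixed cost per click_id."""
    return num_ids * bytes_per_id / 1e9


peak_events = events_per_hour(200_000)       # 720,000,000 events in the peak hour
peak_memory_gb = dedup_set_gb(peak_events)   # about 23 GB for the whole hour

# Partitioning by campaign_id spreads the set across workers (even split assumed).
per_worker_gb = peak_memory_gb / 16
```

With 16 evenly loaded workers, each one would hold under 1.5 GB of IDs, which is why the table recommends partitioning the job by campaign_id.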
Immutable archive of every raw click event, stored as columnar Parquet files partitioned by year/month/day/hour. This is the batch layer's input — the source of truth for billing reconciliation. Parquet format enables efficient columnar reads (only scan columns needed for aggregation). Unlike Kafka (finite retention), Object Storage retains events indefinitely for audit, replay, and regulatory compliance. Write path: the ingestion service buffers events locally and flushes Parquet files every 30 seconds.
Partition: year/month/day/hour (Hive-style partitioning)
Storage cost: ~$0.023/GB/month (S3 Standard). At 50K clicks/sec, ~2 TB/year of raw events. Parquet compression reduces this by 5-10x vs JSON.
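The Hive-style partition layout can be derived directly from an event's timestamp. The following sketch is illustrative: the function name `partition_prefix` is hypothetical, and partitioning in UTC is an assumption, since the source does not say which time zone the partitions use.

```python
from datetime import datetime, timezone


def partition_prefix(event_time: datetime, root: str = "clicks") -> str:
    """Build the year/month/day/hour prefix for a timezone-aware event time."""
    t = event_time.astimezone(timezone.utc)  # assumption: partitions are in UTC
    return (
        f"{root}/year={t.year:04d}/month={t.month:02d}"
        f"/day={t.day:02d}/hour={t.hour:02d}/"
    )


example = partition_prefix(datetime(2025, 1, 15, 12, 34, tzinfo=timezone.utc))
```

Because every file for a given hour shares one prefix, the batch processor can list and read a single prefix to obtain the complete input for that hour.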
Time-series optimized store for the speed layer's approximate aggregates. Written every minute by the stream processor with tumbling window results. Dashboard queries for recent data (last 1-6 hours) read from this table. Values are labeled 'preliminary' until the batch layer reconciles them. Optimized for fast time-range scans with columnar indexes.
Indexes: PK on (campaign_id, window_start), idx_recent ON (window_start DESC)
Speed layer counts may diverge from batch by 0.1-0.5% due to window boundaries, late arrivals, and duplicate events not yet deduped.
ACID-compliant store for exact billing-grade aggregates. Written hourly by the batch processor after full deduplication using exact click_id sets (no Bloom filter approximation). Dashboard queries for historical data (older than 2 hours) read from this table. Each row includes the exact deduplicated count and the billable amount computed from the advertiser's CPC rate.
Indexes: PK on (campaign_id, hour_bucket), idx_billing_status ON (status, reconciled_at)
Batch job processes ~180M events/hour at 50K clicks/sec (~720M events/hour at the 200K RPS peak). Full dedup via exact click_id set requires ~5.7 GB memory for 180M unique IDs (~23 GB at peak).
Tracks which hourly windows have been reconciled by the batch layer. The Query Service checks this table to decide whether to serve data from the speed layer (preliminary) or batch layer (reconciled). When a batch job completes, it updates this table, and subsequent queries automatically switch to reconciled data. This is the key to the lambda architecture's serving layer merge logic.
Switchover is transparent to dashboard users except for a label change from 'preliminary' to 'reconciled'.
Raw ad click events produced by the Ingestion Service. Dual-written to both Kafka (speed layer input) and Object Storage (batch layer input). Partitioned by campaign_id.
Key Schema
campaign_id (STRING)
Value Schema
{ click_id: STRING, campaign_id: STRING, ad_id: STRING, user_id: STRING, publisher_id: STRING, event_time: TIMESTAMP, schema_version: INT }
Preliminary aggregated results from the speed layer's 1-minute tumbling windows. Used for real-time dashboard updates. Labeled as preliminary until the batch layer reconciles.
Key Schema
campaign_id (STRING)
Value Schema
{ campaign_id: STRING, window_start: TIMESTAMP, window_end: TIMESTAMP, click_count: BIGINT, unique_users: BIGINT, status: STRING }
Emitted when the hourly batch processor completes reconciliation for a time window. Signals the serving layer to switch from speed layer data to exact batch data for the reconciled hour.
Key Schema
hour_bucket (STRING)
Value Schema
{ hour_bucket: TIMESTAMP, campaigns_reconciled: INT, total_clicks: BIGINT, total_deduped: BIGINT, reconciled_at: TIMESTAMP }
Batch reconciliation job fails mid-way through hourly processing
Impact
The reconciliation_meta table is not updated for that hour. The serving layer continues to serve speed layer data (preliminary) for the affected hour. Billing data for that hour is delayed. If the failure is not detected, the next batch job may skip the failed hour, leaving a permanent gap in billing-grade data.
Mitigation
Implement checkpoint-based recovery: the batch job writes progress checkpoints to DynamoDB. On retry, it resumes from the last checkpoint rather than restarting. Alert on batch job failure with PagerDuty integration. Idempotent writes to the Billing DB ensure retries do not double-count.
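The idempotent-write part of this mitigation can be illustrated with an upsert that stores the absolute total for an hour instead of incrementing it. The sketch uses Python's built-in sqlite3 module purely as a stand-in for the Billing DB; the table is trimmed to three columns and `write_reconciled` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE billing_aggregates (
        campaign_id TEXT NOT NULL,
        hour_bucket TEXT NOT NULL,
        exact_click_count INTEGER NOT NULL,
        PRIMARY KEY (campaign_id, hour_bucket)
    )
    """
)


def write_reconciled(conn, campaign_id, hour_bucket, exact_click_count):
    """Upsert the absolute total, so replaying the same hour cannot double-count."""
    conn.execute(
        """
        INSERT INTO billing_aggregates (campaign_id, hour_bucket, exact_click_count)
        VALUES (?, ?, ?)
        ON CONFLICT (campaign_id, hour_bucket)
        DO UPDATE SET exact_click_count = excluded.exact_click_count
        """,
        (campaign_id, hour_bucket, exact_click_count),
    )
    conn.commit()


# The first attempt writes the row; a retry after a crash overwrites it with the same value.
write_reconciled(conn, "camp_123", "2025-01-15 12:00", 50847)
write_reconciled(conn, "camp_123", "2025-01-15 12:00", 50847)
rows = conn.execute(
    "SELECT campaign_id, hour_bucket, exact_click_count FROM billing_aggregates"
).fetchall()
```

The key design choice is that the batch job recomputes the full hourly total and overwrites the row. An increment-style update would not be safe to retry.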
Speed and batch layers diverge by >2% for a high-spend campaign
Impact
The advertiser's real-time dashboard shows a click count that differs significantly from their billing invoice. This creates confusion, advertiser complaints, and potential contract disputes. At $0.50 CPC with 1M daily clicks, a 2% divergence means $10K billing discrepancy per day.
Mitigation
Automated divergence monitoring: compare speed and batch layer counts per campaign after each reconciliation. Alert if divergence >1%. Investigate root cause (late arrivals, consumer rebalance, duplicate events). The serving layer displays a disclaimer during the preliminary window.
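A minimal sketch of the divergence check, assuming per-campaign counts for one reconciled hour are already available as dictionaries. The function names and sample counts are illustrative; the 1% threshold is the alert level stated above.

```python
def divergence_pct(speed_count: int, batch_count: int) -> float:
    """Absolute difference between the two layers as a percentage of the batch count."""
    if batch_count == 0:
        return 0.0 if speed_count == 0 else float("inf")
    return abs(speed_count - batch_count) / batch_count * 100.0


def campaigns_to_alert(speed: dict, batch: dict, threshold_pct: float = 1.0) -> list:
    """Return the campaigns whose speed-layer count diverges beyond the threshold."""
    alerts = []
    for campaign_id, batch_count in batch.items():
        pct = divergence_pct(speed.get(campaign_id, 0), batch_count)
        if pct > threshold_pct:
            alerts.append(campaign_id)
    return sorted(alerts)


speed = {"camp_a": 50_100, "camp_b": 10_300}   # illustrative speed-layer counts
batch = {"camp_a": 50_000, "camp_b": 10_000}   # illustrative reconciled counts
alerts = campaigns_to_alert(speed, batch)       # camp_a is 0.2% off, camp_b is 3% off
```

The batch count is used as the denominator because it is the billing source of truth.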
Object Storage (S3) becomes temporarily unavailable during peak traffic
Impact
The Ingestion Service's local Parquet buffer fills up. If S3 is down for >5 minutes at 200K RPS, the local buffer (~60M events, ~20 GB) may exhaust disk. The speed layer is unaffected (Kafka is still operating). The batch layer misses events for the duration of the S3 outage, causing the hourly reconciliation to be incomplete.
Mitigation
Configure the Ingestion Service with a large local buffer (100 GB SSD). Implement S3 write retries with exponential backoff. If the buffer fills, fall back to writing to a secondary S3 bucket or EBS volume. Post-recovery, replay missed events from Kafka (7-day retention) to backfill S3.
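The buffer sizing can be checked with simple arithmetic. The sketch below derives the per-event size from the figures quoted in the impact description (about 20 GB for about 60M events) and then estimates how long a 100 GB buffer would last at peak. It treats the buffer as a single fleet-wide pool, which is a simplification: in practice the capacity is split across ingestion pods.

```python
def buffered_events(rps: int, outage_seconds: int) -> int:
    """Events accumulated locally while Object Storage is unavailable."""
    return rps * outage_seconds


def minutes_until_full(buffer_gb: float, rps: int, event_bytes: float) -> float:
    """How long a buffer of the given size lasts at a sustained request rate."""
    return buffer_gb * 1e9 / (rps * event_bytes) / 60


events = buffered_events(200_000, 5 * 60)   # 60,000,000 events in a 5-minute outage
event_bytes = 20e9 / events                  # roughly 333 bytes per buffered event
headroom_min = minutes_until_full(100, 200_000, event_bytes)  # about 25 minutes
```

Under these assumptions, a 100 GB buffer turns a 5-minute tolerance into roughly 25 minutes, after which the secondary bucket or Kafka replay path is needed.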
API Gateway rate limiting triggers during a legitimate traffic spike (Super Bowl ad campaign)
Impact
Legitimate click events above the 200K RPS rate limit are rejected with 429 errors. Rejected clicks are permanently lost — they were never written to Kafka or S3. The advertiser sees fewer clicks than expected, leading to under-billing and revenue loss for the ad platform.
Mitigation
Implement adaptive rate limiting that auto-scales the limit based on authenticated API key tier. Premium advertisers get higher limits. Add a spillover queue (SQS) for rate-limited requests that can be processed after the spike subsides. Pre-provision capacity for known large events.
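One common way to implement tier-based limits is a token bucket per API key, with the refill rate chosen by tier. The source does not name an algorithm, so the token bucket here is a swapped-in technique, and the tier names and limits are hypothetical and far smaller than production values. Time is passed in explicitly to keep the sketch deterministic.

```python
class TokenBucket:
    """Token bucket: refills at `rate_per_sec`, allows bursts up to `burst` requests."""

    def __init__(self, rate_per_sec: float, burst: float) -> None:
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = burst
        self.last = 0.0

    def allow(self, now: float) -> bool:
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # the caller would return 429 or enqueue to a spillover queue


# Hypothetical per-tier limits as (requests per second, burst size).
TIER_LIMITS = {"standard": (2.0, 2.0), "premium": (10.0, 10.0)}


def bucket_for(tier: str) -> TokenBucket:
    rate, burst = TIER_LIMITS[tier]
    return TokenBucket(rate, burst)


standard = bucket_for("standard")
premium = bucket_for("premium")
standard_allowed = sum(standard.allow(now=0.0) for _ in range(5))  # only the burst passes
premium_allowed = sum(premium.allow(now=0.0) for _ in range(5))    # all five pass
```

Requests rejected by `allow` are the ones the mitigation routes to the spillover queue instead of dropping.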
| Component | Failure | Impact | Mitigation |
|---|---|---|---|
| Kafka cluster | Full cluster outage (all brokers down) | Speed layer ingestion stops completely. Click events queue in the Ingestion Service's memory buffer (limited). The S3 writer may still operate via local buffering, preserving data for the batch layer. Dashboards go stale as no new speed layer data arrives. | Multi-AZ MSK deployment with min.insync.replicas=2. If full outage occurs, the Ingestion Service switches to S3-only mode (batch layer continues, speed layer recovers on Kafka restart). Kafka events are replayed from S3 post-recovery. |
| Batch Processor | OOM during large-hour deduplication | The batch job crashes when the click_id dedup set exceeds available memory. The affected hour is not reconciled — the serving layer continues showing preliminary data. Billing reports for that hour are delayed until the job is fixed and retried. | Partition the dedup job by campaign_id so each worker handles a subset. Use external sort on S3 for campaigns with >10M clicks/hour. Size batch processor instances at 2x expected peak memory (r7g.2xlarge with 64 GB for 720M events/hour). |
| Real-time Database (speed layer) | Primary failover during window flush | The stream processor's batch INSERT fails. In-memory window state is retained and retried on the next flush cycle. Dashboard queries return stale data (from the last successful flush) during failover. No data loss since events remain in Kafka. | RDS Multi-AZ with 30-second automatic failover. Stream processor retries flush with exponential backoff. Query Service serves cached data from Redis during DB failover. |
| API Gateway | Misconfigured rate limit blocks all traffic | 100% of click events rejected with 429 errors. Both speed and batch layers receive zero events. All clicks during the misconfiguration window are permanently lost. Dashboards go stale; billing data has a gap. | Canary deployment for rate limit configuration changes — apply to 5% of traffic for 10 minutes before full rollout. Automated rollback if error rate exceeds 5%. Store rate limit configs in versioned S3 with one-click revert. |
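The Batch Processor mitigation (partitioning the dedup job by campaign_id) can be sketched as follows. It assumes that every duplicate of a click carries the same campaign_id as the original, so all copies land in the same partition and each worker can deduplicate independently. The function names are illustrative, and the workers run sequentially here for simplicity.

```python
import zlib
from collections import defaultdict


def partition_of(campaign_id: str, num_workers: int) -> int:
    """Stable partition assignment (crc32 is consistent across processes)."""
    return zlib.crc32(campaign_id.encode("utf-8")) % num_workers


def split_by_campaign(events: list, num_workers: int) -> dict:
    parts = defaultdict(list)
    for event in events:
        parts[partition_of(event["campaign_id"], num_workers)].append(event)
    return parts


def dedup_partition(events: list) -> dict:
    """Exact dedup within one partition; returns unique clicks per campaign."""
    seen = set()
    counts = defaultdict(int)
    for event in events:
        if event["click_id"] not in seen:
            seen.add(event["click_id"])
            counts[event["campaign_id"]] += 1
    return dict(counts)


events = [
    {"campaign_id": "camp_a", "click_id": "c1"},
    {"campaign_id": "camp_a", "click_id": "c1"},  # duplicate of the first event
    {"campaign_id": "camp_a", "click_id": "c2"},
    {"campaign_id": "camp_b", "click_id": "c3"},
]
totals = {}
for part in split_by_campaign(events, num_workers=4).values():
    totals.update(dedup_partition(part))  # each campaign lives in exactly one partition
```

Each worker only holds the click_ids of its own campaigns, so peak memory per worker shrinks roughly in proportion to the number of partitions, apart from skew caused by very large campaigns.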
Ingestion: auto-scale 8 → 32 pods on CPU >60%. Kafka: 5 → 10 brokers with partition count increase (32 → 128). Stream Processor: match pod count to Kafka partition count for maximum parallelism. Batch Processor: scale horizontally via EMR auto-scaling (add worker nodes for larger hours). S3: infinitely scalable with no provisioning. API Gateway: auto-scales to any throughput. The ceiling is approximately 500K RPS with 128 Kafka partitions and proportional consumer scaling. Beyond this, consider multi-region deployment with regional Kafka clusters and cross-region batch reconciliation.
Key metrics: (1) Speed-to-batch divergence per campaign — alert if >1% after reconciliation. (2) Batch job duration — alert if hourly job takes >30 minutes (normal is 10-20 minutes). (3) Batch job completion — alert if any hour is un-reconciled for >2 hours. (4) Kafka consumer lag (speed layer) — alert if >10K messages for 5 minutes. (5) S3 write buffer size — alert if local buffer exceeds 50 GB. (6) API Gateway 429 rate — alert if >0.1% of requests are rate-limited. (7) Reconciliation_meta freshness — alert if last reconciled hour is >90 minutes behind current time. Dashboard: Grafana with speed vs batch comparison charts per campaign, batch job progress timeline, Kafka lag heatmap, S3 buffer gauge, and rate limit rejection rate. SLIs: click ingestion p99 <10ms, batch reconciliation within 30 minutes of hour end, speed-batch divergence <0.5%.
At ~200K RPS target: Amazon MSK 5-broker kafka.m7g.xlarge ($1,350/month), S3 Object Storage ~2 TB/year ($50/month), Batch Processor EC2 r7g.2xlarge spot instances ($320/month), Real-time DB RDS db.r7g.xlarge ($370/month), Billing DB RDS db.r7g.large ($185/month), ElastiCache Redis ($150/month), ECS Fargate Ingestion 8 pods ($520/month), Stream Processor 32 pods ($1,040/month), Query Service 6 pods ($390/month), API Gateway ($200/month), ALB ($25/month). Total: ~$4,600/month. Cost per million clicks: ~$0.009. The dual-pipeline overhead (batch processor + billing DB) adds ~$505/month over V1, but provides billing-grade accuracy worth millions in prevented billing disputes.
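The totals above can be verified with a short calculation. The line items are copied from the paragraph. The per-million figure assumes the 200K RPS rate is sustained for a 30-day month, which is an assumption made here to reproduce the quoted number: real traffic averages below peak, so the actual cost per million clicks would be higher.

```python
MONTHLY_COST_USD = {
    "msk_kafka": 1350,
    "s3_object_storage": 50,
    "batch_processor": 320,
    "realtime_db": 370,
    "billing_db": 185,
    "redis": 150,
    "ingestion": 520,
    "stream_processor": 1040,
    "query_service": 390,
    "api_gateway": 200,
    "alb": 25,
}

total = sum(MONTHLY_COST_USD.values())
dual_path_overhead = MONTHLY_COST_USD["batch_processor"] + MONTHLY_COST_USD["billing_db"]

clicks_per_month = 200_000 * 86_400 * 30              # sustained-peak assumption
cost_per_million = total / (clicks_per_month / 1e6)   # USD per million clicks
```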
API Gateway: OAuth 2.0 API key authentication with per-advertiser rate limiting. Request validation rejects malformed payloads before they enter the pipeline. Kafka: SASL/SCRAM + TLS; topic-level ACLs restrict which services can produce/consume. S3: server-side encryption (SSE-S3), bucket policies restrict access to the Ingestion Service (write) and batch processor (read) IAM roles only. Click fraud: the batch layer's exact dedup removes every duplicate click_id, which protects billing against replayed events that reuse a click_id. IP hashing: SHA-256 applied at the Ingestion Service before events enter Kafka or S3. GDPR: S3 lifecycle policy deletes raw events after 1 year; billing data retained for 7 years per tax compliance. SOC 2 audit trail: the immutable S3 archive provides tamper-proof evidence.
Canary deployment for the Ingestion Service and API Gateway — route 5% of traffic to the new version for 30 minutes, monitor error rate and latency, then shift to 100%. Stream Processor uses rolling Kafka consumer group deployment. Batch Processor: deploy new version and run the next hourly job with both old and new versions in parallel — compare outputs before switching. Database migrations use online DDL (pg_osc or pt-online-schema-change) to avoid downtime. Rollback: ALB target group switch for Ingestion, consumer group rollback for Stream Processor, batch job version revert in Step Functions.
| Variant | Tier | Latency | Throughput | Cost | Complexity | Reliability |
|---|---|---|---|---|---|---|
| V0: Naive (Single Service + SQL) | T1 | ~25ms click ingestion | ~500 RPS | $385/month | Low (4 components) | 99% (single DB) |
| V1: Stream Processing (Kafka + Windowed) | T2 | <5ms click ack | ~50K RPS | $2,200/month | Medium (8 components) | 99.9% (multi-AZ) |
| V2: Lambda Architecture (Speed + Batch) | T3 | <8ms click ack | ~200K RPS | $4,600/month | High (12 components) | 99.99% (dual-path) |
| V3: CQRS + Event Sourcing | T4 | 5-8ms write | ~100K RPS | $4,500/month | Very High (10 components) | 99.99% (independent paths) |
This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.
Kafka Streams and Flink both offer exactly-once processing guarantees, but these guarantees have caveats: they depend on idempotent producers, transactional writes read by read_committed consumers, and checkpoint-based recovery, and they do not cover side effects in external systems that sit outside the transaction. In practice, under extreme conditions (network partitions, consumer group rebalances, clock skew) or with a misconfigured sink, the effective guarantee can degrade to at-least-once. For billing, 'usually exactly-once' is insufficient. The batch layer provides a stronger, auditable result by reprocessing the complete immutable event log with exact deduplication.
The Query Service checks a metadata table that tracks which hourly windows have been reconciled. For reconciled windows, it reads from the Billing Database (exact counts). For un-reconciled windows, it reads from the Real-time Database (approximate counts) and labels the values as 'preliminary.' When a batch job completes, it updates the metadata table, and subsequent queries automatically switch to the reconciled data. The switchover is transparent to dashboard users except for the label change.
In normal operation, the speed layer's approximate counts diverge from the batch layer's exact counts by 0.1-0.5%. The main sources of divergence are: (1) late-arriving events that miss the speed layer's grace period, (2) window boundary effects where a click falls in different windows, and (3) duplicate events from producer retries that the speed layer counts twice but the batch layer deduplicates. At 50K clicks/hour per campaign, this translates to 50-250 click difference.
Lambda architecture is overkill when: (1) billing accuracy is not critical (internal analytics, non-financial metrics), (2) traffic volume is under 10K RPS (stream processing alone handles this), (3) the team cannot staff the operational complexity (12 components, dual pipelines), or (4) the cost of infrastructure exceeds the cost of billing errors. For most ad platforms under $100K/day in ad spend, the stream processing variant with periodic batch audits is sufficient.
Raw events in Object Storage are stored in a schema-versioned Parquet format. The batch processor includes a schema registry that maps event versions to processing logic. When the event schema changes (e.g., adding a new field), new events use the new schema while old events retain their original schema. The batch processor handles both versions transparently, applying default values for fields added in newer schemas. This backward-compatible approach (new code reads old data) avoids rewriting historical data when the schema evolves.
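The default-filling step can be sketched as a small upgrade function. Which field arrived in which version is hypothetical here (the source only shows a `schema_version` field), as are the default value and the names `FIELD_DEFAULTS_BY_VERSION` and `upgrade`.

```python
# Hypothetical: fields introduced by each schema version, with their defaults.
FIELD_DEFAULTS_BY_VERSION = {
    2: {"publisher_id": "unknown"},
}
LATEST_VERSION = 2


def upgrade(event: dict) -> dict:
    """Return a copy of the event at the latest schema, filling missing fields."""
    upgraded = dict(event)
    version = upgraded.get("schema_version", 1)
    for v in range(version + 1, LATEST_VERSION + 1):
        for field, default in FIELD_DEFAULTS_BY_VERSION.get(v, {}).items():
            upgraded.setdefault(field, default)
    upgraded["schema_version"] = LATEST_VERSION
    return upgraded


old_event = {"click_id": "c1", "campaign_id": "camp_123", "schema_version": 1}
new_event = {
    "click_id": "c2",
    "campaign_id": "camp_123",
    "publisher_id": "pub_9",
    "schema_version": 2,
}
upgraded_old = upgrade(old_event)
upgraded_new = upgrade(new_event)
```

The stored files are never modified. The upgrade happens at read time, so the aggregation logic only ever sees the latest shape.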