Event-sourced ad click aggregation where every click is an immutable event in a durable log. Separate command and query models enable independent scaling, full auditability, and projection replay.
CQRS (Command Query Responsibility Segregation) with Event Sourcing represents a fundamentally different approach to ad click aggregation — one that prioritizes auditability, replayability, and schema flexibility over raw throughput. In this architecture, the source of truth is not a database table of current state, but an immutable, append-only log of every event that ever occurred. Current state (campaign click counts, aggregated metrics) is derived from this event log through projections — materialized views that can be rebuilt from scratch at any time.
This approach is particularly compelling for ad click aggregation because of the billing accuracy requirement. When an advertiser disputes a charge ('You billed me for 1 million clicks but I only received 800K'), the system must be able to reconstruct exactly what happened: which clicks were received, when they were processed, how they were aggregated, and whether any were duplicated or missed. With a traditional database, answering these questions requires forensic log analysis. With event sourcing, the answer is definitive: replay the event log through the aggregation logic and produce the exact count.
The CQRS pattern separates the system into a command side (accepting and validating click events) and a query side (serving aggregated metrics). The command side is optimized for write throughput — appending events to a durable log is fast and simple. The query side is optimized for read patterns — projections build exactly the data structures that dashboard queries need. This separation enables independent scaling: add more command processors during traffic spikes, add more query replicas during reporting periods.
Event sourcing also provides a powerful schema evolution story. When you need a new aggregation dimension (e.g., clicks by device type, which was not originally tracked), you add the field to new events and create a new projection that processes both old and new event schemas. The projection replays the entire event log, extracting device information from IP-based heuristics for old events and from the explicit field for new events. No database migration, no ETL pipeline, no data loss — the event log is the single source of truth.
The trade-offs are real: event sourcing introduces eventual consistency between the command and query sides (seconds of lag), projection rebuilds can take hours for large event logs, and the programming model requires understanding event replay semantics (idempotent event handlers, event versioning, snapshot optimization). This template is designed for candidates who want to demonstrate advanced distributed systems knowledge and for teams building billing-critical systems where auditability is a hard requirement.
The CQRS + Event Sourcing architecture for ad click aggregation uses 10 components organized into three tiers: the command side (write path), the event store, and the query side (read path with projections).
The command side handles click ingestion. Clicks arrive at a Load Balancer and are routed to the Command Service, which performs validation (required fields, timestamp bounds, API key authentication) and writes a ClickReceived event to the Event Log. The Event Log is an append-only, durable store — conceptually similar to Kafka's compacted topic or AWS EventBridge's event bus, but implemented as a dedicated event store with sequence numbering, timestamp indexing, and partition-by-aggregate support. Each event receives a monotonically increasing sequence number and is immutable once written. The Command Service returns HTTP 202 Accepted as soon as the event is durably persisted (ack in 5-8ms).
The Event Log is the single source of truth for the entire system. Every click event ever received is stored here, indefinitely. Events are partitioned by campaign_id for processing locality and indexed by timestamp for time-range replays. The log supports two read patterns: tail reading (projections consuming new events in real time) and replay reading (rebuilding projections from the beginning or from a snapshot).
The query side consists of two independent projections, each building a different materialized view from the same event stream. The Dashboard Projector consumes events in real time and maintains a time-bucketed click count per campaign in the Dashboard Database (optimized for recent queries, time-series indexes). The Billing Projector consumes the same events but applies strict deduplication (maintaining a set of seen click_ids) and writes reconciled totals to the Billing Database (ACID-compliant, optimized for exact aggregation queries). Both projectors can be rebuilt from scratch by replaying the Event Log from the beginning — this is the key event sourcing guarantee.
A Redis cache sits in front of the Query Service, caching the most frequently accessed campaign dashboards for sub-2ms reads. The Query Service reads from the appropriate database based on the query type: dashboard queries hit the Dashboard Database (or Redis cache), billing queries hit the Billing Database. Both databases are eventually consistent with the Event Log — the Dashboard Projector has a lag of 1-5 seconds, and the Billing Projector has a lag of 5-30 seconds (it does more processing per event for deduplication).
The architecture enables several powerful operational capabilities. Projection rebuild: if you discover a bug in the Dashboard Projector's aggregation logic, you fix the code and replay the Event Log from the beginning — the Dashboard Database is rebuilt with the correct logic, no data loss. Schema evolution: when you add a new aggregation dimension, you create a new projection and replay the log — no ETL, no migration. A/B testing: run two versions of a projection side by side against the same event stream and compare outputs. Audit: any billing dispute can be resolved by replaying the Event Log through the Billing Projector and producing a deterministic, reproducible count.
Scaling is independent per tier. The Command Service scales horizontally via the Load Balancer (more pods for higher write throughput). The Event Log scales via partitioning (more partitions for higher write throughput and parallelism). Each projector scales independently — the Dashboard Projector can have more instances than the Billing Projector because dashboard freshness is more latency-sensitive. The Query Service scales based on dashboard query volume, independent of write traffic.
Event sourcing inverts the traditional database model: instead of storing current state and losing history, it stores every state change as an immutable event and derives current state through projections. The write path is trivially simple — validate the click and append it to the log. All 'intelligence' lives on the read path, where projections build exactly the data structures that queries need.
The most powerful capability is projection rebuild: if you discover a bug in your aggregation logic, you fix the code and replay the entire event log through the corrected projection. The Dashboard DB is rebuilt with the correct logic — no data migration, no ETL pipeline, no data loss. This is the core event sourcing guarantee that makes it compelling for billing-critical systems.
Step-by-Step Walkthrough
Pseudocode
// COMMAND SIDE — Validate + Append (simple by design)
async function handleClickCommand(campaign_id, ad_id, user_id):
// Validate only — no aggregation, no dedup, no side effects
if not campaign_id: throw ValidationError("campaign_id required")
if not ad_id: throw ValidationError("ad_id required")
event = ClickReceived {
aggregate_id: campaign_id,
schema_version: 2,
payload: { campaign_id, ad_id, user_id, click_id: uuid() },
event_time: now()
}
// Append to immutable log — gets monotonic sequence number
seq = await event_log.append(event) // ~5ms, durable
return 202 // Accepted — not "processed"
// DASHBOARD PROJECTOR — Fast, approximate (1-5s lag)
class DashboardProjector:
last_seq = 0 // Tracks position in event log
async function processEvent(event):
if event.type != "ClickReceived": return
campaign_id = event.payload.campaign_id
window = truncate_to_minute(event.event_time)
// Upsert time-bucketed counter
await dashboard_db.execute(
"INSERT INTO dashboard_projection (campaign_id, window_start, click_count)
VALUES ($1, $2, 1)
ON CONFLICT (campaign_id, window_start)
DO UPDATE SET click_count = click_count + 1",
[campaign_id, window]
)
last_seq = event.sequence_number
// Push to Redis for hot campaigns
await redis.setex("campaign:" + campaign_id, 60, latest_aggregate)
// BILLING PROJECTOR — Exact dedup, ACID (5-30s lag)
class BillingProjector:
last_seq = 0
async function processEvent(event):
click_id = event.payload.click_id
// Exact dedup via click_id set (no approximation)
exists = await dedup_db.exists("dedup_set", click_id)
if exists: return // Duplicate — skip silently
await dedup_db.insert("dedup_set", { click_id, first_seen: now() })
// Count only first occurrence
campaign_id = event.payload.campaign_id
hour = truncate_to_hour(event.event_time)
await billing_db.execute(
"INSERT INTO billing_projection (campaign_id, hour_bucket, exact_clicks)
VALUES ($1, $2, 1)
ON CONFLICT (campaign_id, hour_bucket)
DO UPDATE SET exact_clicks = exact_clicks + 1",
[campaign_id, hour]
)
last_seq = event.sequence_number
// PROJECTION REBUILD — The core event sourcing guarantee
async function rebuildProjection(projector, from_seq = 0):
// Optionally start from latest snapshot
snapshot = await snapshots_db.getLatest(projector.name)
if snapshot:
projector.loadState(snapshot.state_data)
from_seq = snapshot.snapshot_seq
// Replay every event from that point forward
events = event_log.readFrom(from_seq) // Iterator, not full load
for event in events:
projector.processEvent(event)
// Full projection is now rebuilt with current logic
log("Rebuilt from seq " + from_seq + " to " + projector.last_seq)This schema embodies the event sourcing principle: the Event Log is the only table that stores original facts. Everything else — dashboard_projection, billing_projection, dedup_set, projection_snapshots — is derived and can be deleted and rebuilt from the Event Log at any time.
The last_seq_processed field in each projection table is the bookmark that tracks how far the projection has consumed from the Event Log. If a projector restarts, it resumes from its last_seq_processed rather than replaying from the beginning. Snapshots further optimize this by periodically checkpointing the entire projection state, allowing rebuilds to start from a recent snapshot rather than sequence #0.
Step-by-Step Walkthrough
Pseudocode
// EVENT LOG — Immutable, append-only (the only "real" table)
// Write: O(1) append, ~5ms per event
// Read: sequential scan (tail) or range scan (replay)
event_log.append({
sequence_number: next_seq(), // Monotonic, gap-free
event_type: "ClickReceived",
aggregate_id: "camp_123", // Partition key for locality
schema_version: 2, // Forward-compatible evolution
payload: { campaign_id, ad_id, user_id, click_id },
event_time: "2025-01-15T12:00:03Z",
stored_at: now()
})
// NEVER: UPDATE event_log SET ...
// NEVER: DELETE FROM event_log WHERE ...
// DASHBOARD PROJECTION — Derived, rebuildable
// Projector reads from event_log WHERE seq > last_seq_processed
dashboard_projection = {
campaign_id: "camp_123",
window_start: "2025-01-15 12:00",
click_count: 847,
unique_users: 612,
ctr: 0.034,
last_seq_processed: 15_847_293 // Bookmark for resume
}
// BILLING PROJECTION — Derived, with exact dedup
// Projector checks dedup_set before counting
if not dedup_set.contains(click_id):
dedup_set.add(click_id)
billing_projection[campaign_id][hour].exact_clicks += 1
billing_projection[campaign_id][hour].billable_amount =
exact_clicks * cpc_rate
// SNAPSHOT — Periodic checkpoint for fast rebuild
projection_snapshots.save({
projection_name: "dashboard_v2",
snapshot_seq: 15_000_000, // Resume from here, not seq#0
state_data: serialize(dashboard_projection), // ~500 MB
created_at: now()
})
// Rebuild time: ~17min from seq#0 vs ~3min from latest snapshotChoice
Immutable event log as single source of truth
Rationale
Traditional CRUD databases store current state — once a row is updated, the previous value is lost. Event sourcing stores every state change as an immutable event, enabling full reconstruction of any historical state. For ad billing, this means any dispute can be resolved by replaying events. The trade-off is storage cost (all events stored forever) and query complexity (current state must be computed from events).
Choice
Two independent projections from the same event stream
Rationale
Dashboard and billing queries have fundamentally different requirements. Dashboards need freshness (1-5 second lag acceptable) and time-series access patterns. Billing needs accuracy (exact dedup, ACID guarantees) and aggregation access patterns. Building both from the same event stream ensures consistency while allowing each to optimize for its access pattern. If a third access pattern emerges (e.g., fraud detection), you add a third projection without changing the write path.
Choice
Command Service validates but does not aggregate
Rationale
The command side's only job is to validate click events and append them to the event log. It does not compute aggregates, check deduplication, or query current state. This keeps the write path fast (5-8ms) and simple. All 'intelligence' lives in the projections on the query side, where it can be changed, versioned, and replayed independently of the write path.
Choice
All projections can be rebuilt from the event log
Rationale
This is the core event sourcing guarantee. If a projection has a bug, you fix the code and rebuild from scratch. If you need a new aggregation dimension, you create a new projection and replay. If a database is corrupted, you rebuild from the event log. The cost is rebuild time — at 100M events, a full replay takes 1-4 hours depending on projection complexity. Snapshots (periodic checkpoints) reduce this to minutes for common cases.
Choice
Cache layer between Query Service and databases
Rationale
Even with optimized projections, database queries have 5-20ms latency. Redis reduces this to sub-2ms for the most popular campaigns. The cache is populated by the Dashboard Projector (push-based) rather than by the Query Service (pull-based), ensuring the cache is always warm for active campaigns. Cache invalidation is event-driven — the projector updates the cache as it processes new events.
Target RPS
100K+ (write path), unlimited read scaling
Latency (p99)
5-8ms write, 1-5s dashboard lag, sub-2ms cached reads
Storage
~3 TB/year (immutable event log + projections)
Availability
99.99% (independent write/read paths)
| Operation | Time | Space | Notes |
|---|---|---|---|
| Click command (append to Event Log) | O(1) — single append to immutable log | O(1) — one event record per click | 5-8ms per write. The command side does zero aggregation — validation only. This is the fastest write path among all variants for its simplicity. |
| Dashboard projection (real-time tail read) | O(1) per event — hash map increment in the projector's in-memory state | O(C × W) — campaigns × active time windows in the projection | 1-5 second lag from Event Log to Dashboard DB. The projector processes events sequentially per partition — throughput limited by single-thread processing speed (~100K events/sec per consumer). |
| Billing projection (with deduplication) | O(1) amortized per event — hash set lookup for click_id dedup + counter increment | O(D) — where D is the number of unique click_ids in the dedup set (~3.2 GB for 100M IDs) | 5-30 second lag due to dedup overhead. The dedup set is the memory bottleneck — periodically trimmed to IDs from the last 7 days. |
| Projection rebuild (full replay) | O(E) — where E is the total number of events in the log (sequential scan) | O(P) — full projection state must fit in memory during rebuild | At 100M events, full rebuild takes ~17 minutes at 100K events/sec. Snapshots reduce this to ~3 minutes by starting from a recent checkpoint. |
The single source of truth for the entire system — an immutable, append-only log of every ClickReceived event. Each event gets a monotonically increasing sequence number and is never modified after write. Partitioned by campaign_id (aggregate_id) for processing locality. Supports two read patterns: tail reading (projections consuming new events in real time) and replay reading (rebuilding projections from seq#0 or from a snapshot). Events are versioned for schema evolution — old events retain their original schema version while new events use the latest.
Partition: aggregate_id (campaign_id)
Indexes: PK on sequence_number, idx_aggregate ON (aggregate_id, sequence_number), idx_event_time ON (event_time)
Immutable after write — no UPDATEs or DELETEs ever. At 100K events/sec write throughput, replay speed ~100K events/sec per consumer. Storage grows ~3 TB/year.
Materialized view built by the Dashboard Projector from the event log. Stores time-bucketed click counts per campaign, optimized for recent time-range queries (last 1-6 hours). Updated in real time as the projector processes new events (1-5 second lag). Can be rebuilt from scratch by replaying the event log from sequence #0 — this is the core event sourcing guarantee. If the projection logic has a bug, fix the code and rebuild.
Indexes: PK on (campaign_id, window_start), idx_recent ON (window_start DESC)
Rebuilding from 100M events takes ~17 minutes at 100K events/sec. Snapshots reduce this to minutes for incremental rebuilds.
Materialized view built by the Billing Projector with strict deduplication. Maintains exact click counts per campaign per hour for billing accuracy. The projector tracks every click_id it has ever seen (via the dedup_set table) to prevent double-counting. Processing lag is 5-30 seconds (more work per event than the dashboard projector). Can also be fully rebuilt from the event log.
Indexes: PK on (campaign_id, hour_bucket)
Billing disputes are resolved by replaying the event log through this projector — deterministic, reproducible count every time.
Tracks every click_id the Billing Projector has processed. Used for exact deduplication — if a click_id appears twice in the event log (from producer retries or network issues), only the first occurrence is counted. Unlike Bloom filters (which have false positives), this set provides exact dedup at the cost of memory. At 100M unique click_ids, the set requires ~3.2 GB of memory.
Indexes: PK on click_id
Periodically trimmed: click_ids older than 7 days are removed (matching Kafka retention) since duplicates beyond that window are astronomically unlikely.
Periodic checkpoints of projection state used to speed up rebuilds. Instead of replaying from sequence #0 (hours for large event logs), a projection can resume from the latest snapshot. Each snapshot captures the full projection state at a specific sequence number. Created automatically every N events or on a schedule. Multiple snapshots are retained for rollback capability.
Snapshot size: ~500 MB for the dashboard projection at 100M events. Created every 10M events by default. Retention: last 5 snapshots.
Immutable event appended to the Event Log when a click passes command-side validation. This is the single source of truth — all projections derive their state from this stream.
Key Schema
aggregate_id / campaign_id (STRING)
Value Schema
{ sequence_number: BIGINT, event_type: STRING, aggregate_id: STRING, schema_version: INT, payload: { click_id: STRING, campaign_id: STRING, ad_id: STRING, user_id: STRING }, event_time: TIMESTAMP }
Emitted when a projector updates its materialized view. Used to trigger cache warming in Redis and to notify downstream systems that fresh data is available.
Key Schema
projection_name + campaign_id (STRING)
Value Schema
{ projection_name: STRING, campaign_id: STRING, window_start: TIMESTAMP, click_count: BIGINT, unique_users: BIGINT, last_seq_processed: BIGINT }
Emitted when the Billing Projector finishes processing all events up to a given sequence number. Signals that billing data is current and accurate through that point.
Key Schema
hour_bucket (STRING)
Value Schema
{ hour_bucket: TIMESTAMP, last_seq_processed: BIGINT, campaigns_processed: INT, total_exact_clicks: BIGINT, total_deduped: BIGINT, completed_at: TIMESTAMP }
Dashboard Projector has a bug that under-counts clicks for campaigns starting with 'camp_A'
Impact
All campaigns matching the pattern show incorrect click counts on dashboards. Advertisers notice discrepancies and file support tickets. The billing projection is unaffected (different code path), so invoices remain correct, but the dashboard data is wrong.
Mitigation
This is where event sourcing shines: fix the bug in the Dashboard Projector code, then replay the entire Event Log through the corrected projector. The Dashboard DB is rebuilt with accurate counts — no data loss, no manual correction, deterministic and reproducible. The replay takes ~17 minutes for 100M events or ~3 minutes from the latest snapshot.
Event Log storage fills up (disk exhaustion on the event store)
Impact
The Command Service cannot append new events — all click ingestion fails with 503 errors. Both projections continue processing their backlog but receive no new events. The entire system effectively halts for new data while existing data remains accessible.
Mitigation
Monitor Event Log disk usage with alerts at 70% capacity. Implement log compaction or archival to S3 for events older than 30 days. Scale storage horizontally by adding partitions across additional disks or nodes. The Event Log should be on provisioned IOPS storage (io2) with auto-expanding volume policy.
Both projectors fall behind the Event Log by >1 hour during a traffic spike
Impact
Dashboard data is over 1 hour stale — advertisers cannot see recent campaign performance. Billing data is similarly delayed, potentially affecting real-time budget enforcement (campaigns may overspend their daily budget before the billing projection catches up).
Mitigation
Auto-scale projector instances based on consumer lag (add instances when lag >5 minutes). For the Dashboard Projector, implement a fast-forward mode that skips detailed per-event processing and reads pre-aggregated batches from the Event Log. For the Billing Projector, increase instance memory to handle larger dedup sets without GC pauses.
Schema version 3 events are published but the Billing Projector only handles v1 and v2
Impact
The Billing Projector encounters unknown schema version events and either crashes (if no error handling) or skips them (if defensive). Skipped events mean under-counted clicks in billing — advertisers are under-billed. If it crashes, the projection falls behind and billing data goes stale.
Mitigation
Schema Registry with forward compatibility enforcement — v3 schema must be registered and compatible before producers can publish v3 events. All projectors include a default handler for unknown schema versions that logs the event and applies a best-effort extraction of known fields. Alert on unknown schema version encounters.
| Component | Failure | Impact | Mitigation |
|---|---|---|---|
| Event Log (event store) | Write unavailability | All click ingestion fails — the Command Service cannot append events. Both projections continue processing existing events but receive nothing new. Complete write-path outage until the event store recovers. | Deploy the Event Log on a distributed store (e.g., EventStoreDB cluster, Kafka with replication.factor=3, or Aurora PostgreSQL Multi-AZ). Cross-AZ replication ensures a single node failure does not cause write unavailability. |
| Dashboard Projector | Crash during event processing | Dashboard data goes stale — the last_seq_processed bookmark stops advancing. The Dashboard DB still serves the last written data but does not update with new clicks. Billing projection is unaffected (independent process). | The projector tracks last_seq_processed in the Dashboard DB. On restart, it resumes from the bookmark — no data loss, just a brief increase in dashboard lag. Auto-restart via Kubernetes liveness probe within 30 seconds. |
| Billing Projector | OOM from dedup set growth | The Billing Projector crashes when the dedup set exceeds available memory. Billing data goes stale at the point of crash. Dashboard projection continues normally. Billing reports for the affected period are delayed. | Size instances at 2x expected dedup set (r7g.2xlarge with 64 GB for 100M+ unique click_ids). Implement periodic dedup set trimming (remove click_ids older than 7 days). Use RocksDB-backed state store instead of pure in-memory set for larger-than-RAM dedup sets. |
| Redis Cache | Node failure / split brain | Dashboard queries bypass cache and hit the Dashboard DB directly. Latency increases from ~2ms to 5-20ms. If many campaigns are queried simultaneously, the Dashboard DB connection pool may saturate. | ElastiCache Multi-AZ with automatic failover. The Dashboard Projector repopulates the cache as it processes events — cache warm-up is automatic. The Query Service implements a local in-memory fallback cache (30-second TTL) for the top 100 campaigns. |
Command Service: CPU-based auto-scaling from 6 to 24 pods (target 60% utilization). Event Log: scale via partitioning — more partitions enable more parallel projector consumers. Dashboard Projector: one instance per Event Log partition (match partition count). Billing Projector: scale independently based on dedup set size and processing lag. Query Service: request-rate auto-scaling from 4 to 16 pods. Redis: vertical scaling or cluster mode for larger working sets. The write path ceiling is limited by Event Log throughput (~100K RPS per partition with 16 partitions). The read path scales independently and has no practical ceiling since projections can be replicated.
Key metrics: (1) Event Log write latency — alert if p99 >20ms (normal is 5-8ms). (2) Dashboard Projector lag (last_seq_processed vs Event Log head) — alert if >10K events behind for 3 minutes. (3) Billing Projector lag — alert if >50K events behind for 5 minutes. (4) Dedup set size — alert if >80% of allocated memory. (5) Projection rebuild duration — track and alert if >30 minutes (snapshot may be stale). (6) Redis cache hit rate — alert if <60%. (7) Event schema version distribution — alert on unknown versions. Dashboard: Grafana with panels for Event Log write rate, projector lag gauges (dashboard vs billing), dedup set memory usage, cache hit rate, and projection rebuild history. SLIs: event append p99 <15ms, dashboard freshness <10 seconds, billing projection lag <60 seconds, projection rebuild <20 minutes.
At ~100K RPS target: Event Log store (EventStoreDB or MSK 3-broker kafka.m7g.xlarge, $810/month), Dashboard DB RDS db.r7g.large ($185/month), Billing DB RDS db.r7g.large ($185/month), ElastiCache Redis cache.r7g.large ($150/month), ECS Fargate Command Service 6 pods ($390/month), Dashboard Projector 16 pods ($520/month), Billing Projector 8 pods ($260/month), Query Service 4 pods ($260/month), ALB ($25/month). Total: ~$2,785/month. Cost per million clicks: ~$0.011. The event store and projector pods are the largest cost drivers. Snapshot storage adds ~$15/month (S3). The total is lower than V2 Lambda ($5,800) because there is no batch processor infrastructure, but higher than V1 ($2,200) due to the dual-projection overhead and event store requirements.
Command-side authentication: API key validated before event append — unauthenticated events never enter the Event Log. Event Log immutability: no UPDATE or DELETE operations permitted at the application or database level (enforce via IAM policies and DB roles). Click fraud: the Billing Projector's exact dedup set prevents all duplicate-based fraud. Audit trail: the immutable Event Log serves as a complete, tamper-proof audit trail for regulatory compliance (SOC 2, ad industry TAG certification). IP hashing: SHA-256 applied in the Command Service before the event is appended — raw IPs are never stored. GDPR right-to-erasure: handled via crypto-shredding — user_id is encrypted with a per-user key; deleting the key renders all that user's events unreadable without modifying the immutable log.
Rolling deployment for the Command Service (stateless, simple pod replacement). Projectors use a blue-green pattern: deploy the new projector version alongside the old one, both reading from the Event Log. Verify the new version produces identical outputs for a 10-minute window, then decommission the old version. Event Log schema changes go through the Schema Registry with compatibility checks. Projection rebuilds are triggered post-deployment if the projection logic changed. Rollback: revert Command Service to previous image; for projectors, revert to previous version and trigger a rebuild from the latest snapshot.
| Variant | Tier | Latency | Throughput | Cost | Complexity | Reliability |
|---|---|---|---|---|---|---|
| V0: Naive (Single Service + SQL) | T1 | ~25ms click ingestion | ~500 RPS | $385/month | Low (4 components) | 99% (single DB) |
| V1: Stream Processing (Kafka + Windowed) | T2 | <5ms click ack | ~50K RPS | $2,200/month | Medium (8 components) | 99.9% (multi-AZ) |
| V2: Lambda Architecture (Speed + Batch) | T3 | <8ms click ack | ~200K RPS | $5,800/month | High (12 components) | 99.99% (dual-path) |
| V3: CQRS + Event Sourcing | T4 | 5-8ms write | ~100K RPS | $4,500/month | Very High (10 components) | 99.99% (independent paths) |
This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.
CQRS (Command Query Responsibility Segregation) is a pattern that separates the write model from the read model. You can do CQRS without event sourcing — e.g., a write database and a read replica. Event Sourcing is a pattern that stores state changes as immutable events rather than current state. You can do event sourcing without CQRS — e.g., one event log with one projection. In practice, they are often combined because event sourcing naturally produces an append-only write model and derived read models (projections), which is exactly the CQRS separation.
Projection rebuild time depends on event volume and projection complexity. For a simple counter projection (sum clicks by campaign), the bottleneck is event log read throughput — approximately 100K events/second per consumer, so 100M events takes ~17 minutes. For a complex projection with deduplication (maintaining a set of seen click_ids), memory becomes the bottleneck — at 100M unique click_ids, the set requires ~3.2 GB of memory. Snapshots (periodic checkpoints of projection state) can reduce rebuild time to minutes by starting from the latest snapshot rather than the beginning.
Events in the log are versioned. When the ClickReceived event schema changes (e.g., adding a device_type field), new events use v2 while old events remain v1. Projections include a schema registry that maps event versions to processing logic. The v2 projection handler extracts device_type directly; the v1 handler applies a default or heuristic. This forward-compatible approach means old events never need to be migrated — they are processed as-is by version-aware handlers.
The command side is strongly consistent — once a write is acknowledged, the event is durably persisted and will never be lost. The query side is eventually consistent — projections process events with a lag of 1-30 seconds depending on the projection's processing speed. This means a user might click an ad and not see the count increment for a few seconds on the dashboard. For billing, this lag is irrelevant because billing reports are generated hourly/daily. For dashboards, 1-5 second lag is acceptable.
CQRS + Event Sourcing is overkill when: (1) you do not need full audit trails or billing dispute resolution, (2) your team is not experienced with event-driven architectures (the learning curve is steep), (3) traffic is under 10K RPS (stream processing is simpler and sufficient), or (4) you do not anticipate needing new aggregation dimensions or projection changes over time. The Lambda Architecture variant provides similar billing accuracy with a more familiar batch processing model.
Sign in to join the discussion.
Ready to design your own Ad Click Aggregator?
Open the simulator, place components on the canvas, wire them up, and run a traffic simulation to see how your architecture performs under real load.
Open Simulator