Hard10 componentsInterview: High

Logging Pipeline — Kafka + Indexing Pipeline (Hot/Warm/Cold)

Q: Why tiered storage instead of indexing everything in Elasticsearch?

At 10 PB per day, keeping all data in Elasticsearch would cost approximately $500,000 per day in compute and storage ($2,300/TB/month). S3 storage costs approximately $23/TB/month (Standard) to $0.004/TB/month (Glacier). The 7-day hot tier indexes approximately 70 PB, while the remaining years of data sit in S3 at 100x lower cost. Since 95 percent of searches hit the last 24 hours, the hot tier handles the vast majority of queries.

Q: How does the roll-off process work?

RolloffWorker runs continuously with 20 parallel workers. Each worker queries IndexDB for data older than 7 days, reads the documents, compresses them into Parquet files organized by tenant/date/hour, and writes to S3. After successful S3 upload confirmation, the worker deletes the corresponding Elasticsearch documents. At terabytes per hour, the roll-off keeps the hot tier at a stable 7-day size.

Q: What happens when Elasticsearch is down for maintenance?

Kafka continues accepting log batches from agents — the deep buffer (50M messages) provides hours of retention at normal ingest rates. ParserWorker pauses consumption, and Kafka consumer lag grows. When Elasticsearch comes back online, workers resume processing and work through the backlog. No log data is lost. During the outage, search is unavailable but ingest continues uninterrupted.

Q: Why does this variant lack streaming alerting?

This pipeline indexes logs and pre-computes dashboard metrics but does not implement streaming alerting (e.g., alert if 5xx rate exceeds 1 percent in a 5-minute window). Detecting anomalies requires either polling Elasticsearch periodically (minutes of delay) or monitoring Redis metrics (limited to pre-defined aggregations). The Multi-Tier variant (v2) adds a dedicated AlertEngine that evaluates rules on every parsed log record in real-time, providing sub-5-second alert latency.

Industry-standard logging pipeline: Kafka for async ingest buffering, parser workers for log enrichment, Elasticsearch for hot-tier full-text search, S3 for cold-tier archival. Handles 100M log lines per second with tiered cost optimization.

ObservabilityKafkaElasticsearchTiered StorageProduction

Try in Simulator

Problem Statement

The Kafka-based logging pipeline is the industry-standard architecture used by every major observability platform — Datadog, Splunk, Elastic, and Grafana all use variations of this pattern. It solves the two fundamental problems that make the naive direct-to-database approach unworkable at scale: the write throughput ceiling and the search performance degradation.

The first problem is write throughput. Production systems generate log data at rates that no single database can absorb synchronously. A fleet of 10,000 hosts generating 10,000 log lines per second each produces 100 million lines per second at peak. The naive approach's 500 lines per second ceiling is seven orders of magnitude too low. The solution is asynchronous ingest via Kafka: agents publish log batches to Kafka topics, which provides sub-5ms acknowledgment regardless of downstream processing capacity. Kafka acts as a deep buffer (50 million messages) that absorbs traffic bursts during incidents and deployments while parser workers process at a sustainable rate.

The second problem is search performance. SQL LIKE on a PostgreSQL text column is an O(N) sequential scan that becomes unusable beyond a few million rows. The solution is Elasticsearch (or OpenSearch), which maintains an inverted index mapping every term to the documents containing it. This enables sub-second full-text search across billions of documents. The trade-off is that indexing is not instant — there is a delay between ingest and searchability while parser workers consume from Kafka, parse and enrich the logs, and bulk-index into Elasticsearch.

The third problem is storage economics. At 100 million lines per second with an average 4KB per batch, the system generates approximately 10 petabytes per day. Storing all of this in Elasticsearch would cost approximately $500,000 per day in compute and storage. The solution is tiered storage: a hot tier (Elasticsearch, 7 days, ~$2,300/TB/month) for recent searchable data, and a cold tier (S3, 7 years, ~$0.01/TB/month) for long-term archival. Roll-off workers continuously move expired data from hot to cold, achieving a 100x cost reduction for 95 percent of the data.

The architecture splits into two paths: the ingest path (agents to Kafka to parser workers to Elasticsearch and S3) and the query path (engineers through an API gateway to a search service that queries Elasticsearch for recent data and S3 for historical data). Redis caches pre-computed dashboard metrics to avoid expensive Elasticsearch aggregation queries on every dashboard refresh.

This architecture handles the scale requirements that the naive approach cannot: 100M lines per second ingest, sub-5-second search on recent data, 60-second search on historical data, and 7-year retention at manageable cost. The operational complexity is significantly higher (10 components versus 3), requiring expertise in Kafka partition management, Elasticsearch cluster sizing, S3 lifecycle policies, and worker auto-scaling.

Logging pipeline design at this level appears in interviews at Datadog, Splunk, Elastic, AWS CloudWatch, Google Cloud Logging, and Grafana Labs. Interviewers expect candidates to explain the Kafka buffering pattern, the Elasticsearch inverted index advantage, and the tiered storage cost optimization.

Architecture Overview

The Kafka-based logging pipeline uses 10 components organized into separate ingest and query paths with shared data layers. The ingest path handles the write-heavy workload (100M lines per second); the query path handles search and dashboard reads (60K queries per second).

The ingest path starts with AgentClient, representing millions of log agents streaming batched log lines via gRPC. Batches of approximately 1,000 lines in protobuf format (~4KB per batch) are sent to IngestLB (AWS NLB), which distributes across IngestService pods using round-robin. IngestService validates the batch, adds tenant_id and ingest_timestamp, and publishes to IngestStream (Kafka). The HTTP response returns immediately — indexing is fully async. At 100M lines per second, Kafka's 256 partitions and 10M msg/sec capacity provide massive headroom.

ParserWorker consumes from IngestStream. For each batch, it parses structured (JSON) and unstructured (grok pattern) logs, enriches with metadata (service name, host, trace_id), computes sliding-window metrics for MetricsCache, and bulk-indexes into IndexDB (Elasticsearch/OpenSearch). 100 workers process in parallel with bulk batches of 1,000 documents per index call. The hot tier retains 7 days of indexed data.

RolloffWorker runs continuously, reading expired data (older than 7 days) from IndexDB and archiving it to ArchiveStore (S3) in compressed Parquet format. After successful archive, the hot tier data is deleted. Cold tier retains data for 7 years. 20 workers handle the continuous roll-off at terabytes per hour.

The query path starts with QueryClient (engineers and dashboards). Requests flow through ApiGateway (AWS API Gateway — auth, rate limiting) to SearchService. For recent queries (last 7 days), SearchService queries IndexDB (Elasticsearch, ~500ms p99). For historical queries, it reads from ArchiveStore (S3 Select on Parquet, ~30s p99). MetricsCache (Redis) stores pre-computed aggregations (error rates, p99 latencies) for dashboard reads (~2ms).

The architecture provides clear separation of concerns: Kafka handles burst absorption, ParserWorker handles parsing and indexing, Elasticsearch handles search, S3 handles archival, and Redis handles dashboard metrics. Each component scales independently. The primary limitation is the lack of streaming alerting — anomaly detection requires polling Elasticsearch or building metrics from the Redis cache, adding minutes of delay compared to the streaming approach in the Multi-Tier variant.

Architecture Preview

Loading architecture preview...

Open in Simulator

Key Design Decisions

Kafka Ingest Buffer

Choice

Async publish to Kafka with sub-5ms ack, decoupled from indexing

Rationale

Log traffic is extremely bursty — a deployment or incident can spike ingest 10x in seconds. Kafka absorbs the burst (deep buffer of 50M messages) while parser workers process at a sustainable rate. Without Kafka, index workers would need to auto-scale instantly with traffic, which takes minutes and causes data loss during the ramp-up. Kafka also provides replay capability if workers crash.

Elasticsearch Hot Tier

Choice

Inverted index for 7 days of full-text searchable log data

Rationale

Full-text search on structured log data (service:checkout AND level:ERROR) is Elasticsearch's core strength. The inverted index enables sub-second search across billions of documents. 95 percent of searches hit the last 24 hours, so the 7-day hot tier handles the vast majority of queries. The trade-off is cost: ~$2,300/TB/month versus ~$0.01/TB/month for S3.

S3 Cold Tier with Parquet

Choice

Compressed columnar Parquet files in S3 for multi-year retention

Rationale

Parquet is columnar — scanning only the message column of 1TB of logs reads approximately 10GB instead of 1TB, reducing S3 scan cost 100x. S3 Select and Athena can query Parquet in-place without loading into a database. Combined with S3 Glacier for data older than 90 days, the total cost is approximately $0.004/GB/month for long-term retention.

Pre-Computed Metrics in Redis

Choice

ParserWorker writes sliding-window aggregations to Redis during indexing

Rationale

Dashboard queries (error rate, p99 latency, throughput by service) would hammer Elasticsearch if computed on every refresh. ParserWorker computes these aggregations as it processes logs and writes them to MetricsCache. Dashboards read from Redis in ~2ms instead of running 5-second Elasticsearch aggregation queries.

Separate Ingest and Query Paths

Choice

IngestLB and ApiGateway are separate entry points for write and read traffic

Rationale

Log ingest (100M lines/sec, 85% of traffic) and search queries (60K/sec, 15% of traffic) have completely different latency requirements and traffic patterns. Separating them prevents a burst of ingest traffic from starving search queries, and prevents expensive search queries from affecting ingest acknowledgment latency.

Multi-Tenant Isolation via Per-Tenant Indices

Choice

Each tenant's logs stored in a separate Elasticsearch index (logs-{tenant_id}-{date})

Rationale

Per-tenant indices enable: (1) tenant-level rate limiting on IngestService, (2) per-tenant Kafka partitions for processing isolation, (3) per-tenant index lifecycle management, and (4) tenant-scoped search without cross-tenant data leakage. This prevents a chatty microservice from causing ingestion lag for other tenants.

Scale & Performance

Target RPS

100M lines/sec ingest, 60K queries/sec

Latency (p99)

<10ms ingest ack, <5s hot search, <60s cold search

Storage

~10 PB/day (hot: 7d indexed, cold: 7y archived)

Availability

99.9% (replicated Kafka, ES, S3)

Time & Space Complexity

Operation	Time	Space	Notes
Ingest log batch (POST /api/v1/logs)	O(1) Kafka publish per batch (~5ms)	O(B) per batch, B = batch size (~4KB)	Kafka publish is fire-and-forget from the ingest endpoint. Actual indexing is async via ParserWorker.
Search hot tier (POST /api/v1/search)	O(log N) inverted index lookup + O(K) result fetch	O(K) result buffer, K = matching documents	N = total documents in Elasticsearch. Inverted index provides sub-second search across billions of docs.
Search cold tier (POST /api/v1/search, historical)	O(P * C) where P = Parquet partitions scanned, C = column scan per partition	O(K) result buffer	S3 Select on Parquet. Columnar format scans only needed columns. p99 latency ~30s depending on data volume.
Dashboard read (GET /api/v1/metrics/dashboard)	O(1) Redis GET	O(1) per metric value	Pre-computed aggregations in Redis. Sub-2ms latency. No Elasticsearch query required.

Database Schema (HLD)

logs (Elasticsearch/OpenSearch)

Hot-tier search index storing 7 days of parsed, enriched log data. Per-tenant indices (logs-{tenant_id}-{date}) for multi-tenant isolation. Write-heavy from ParserWorker bulk indexing (1000 docs per batch). Read path serves full-text search queries in sub-second for most queries. RolloffWorker expires data older than 7 days to the cold tier.

log_id STRING (UUID, document ID)tenant_id STRING (partition field for per-tenant indices)service STRING (source service name, keyword field)level STRING (DEBUG/INFO/WARN/ERROR, keyword field)message TEXT (full-text indexed with standard analyzer)timestamp DATE (log event time, date field for range queries)trace_id STRING (distributed trace correlation, keyword field)

Inverted index on message enables sub-second full-text search. 3 data nodes, 2 replicas per shard, 64 shards total. Bulk index API for high-throughput ingestion.

log_archive (S3 Parquet)

Cold-tier S3 archive storing compressed Parquet log files organized by tenant/date/hour. Written by RolloffWorker at terabytes per hour. Queried by SearchService via S3 Select for historical queries. Standard-IA for 90 days, Glacier for 7 years.

archive_key STRING (S3 key: tenant/date/hour/shard.parquet)tenant_id STRING (partition column in Parquet)date STRING (date partition for efficient range scans)size_bytes BIGINT (Parquet file size)

Columnar Parquet format enables scanning only needed columns, reducing S3 scan cost 100x. Total storage grows at ~10 PB/day.

dashboard_metrics (Redis)

Pre-computed sliding-window aggregations written by ParserWorker during log indexing. Keys follow the pattern metric:{tenant_id}:{metric}:{bucket}. SearchService reads for dashboard queries at ~2ms. 86400s TTL covers 24 hours of minute-granularity buckets.

value NUMBER (metric value — rate, count, percentile)count NUMBER (sample count in bucket)bucket STRING (time bucket identifier)

10K metrics x 1440 minutes/day x ~100 bytes = ~1.4 GB working set. LRU eviction with 86400s TTL.

Solution Comparison

Variant	Tier	Latency	Throughput	Cost	Complexity	Reliability
Naive (Direct-to-Database)	T1	2ms ingest, 2-30s search	~500 lines/sec	$200/month (single RDS)	Low — 3 components, no async	99% (single DB, no failover)
Kafka + Indexing Pipeline	T2	<10ms ingest, <5s hot search	100M lines/sec	$15,000/month (Kafka + ES + S3)	Medium — Kafka, ES, workers	99.9% (replicated components)
Multi-Tier with Alerting	T3	<10ms ingest, <1s hot, <30s warm	100M+ lines/sec	$25,000/month (3-tier + alerting)	High — 11 components, 3 tiers	99.9% (replicated, multi-tier)

This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.

Frequently Asked Questions

Why Kafka instead of directly writing to Elasticsearch?

Log traffic is extremely bursty — a deployment or incident can spike ingest 10x in seconds. Without Kafka, Elasticsearch bulk indexing would need to absorb the full burst immediately, requiring massive over-provisioning or risking index rejection and data loss. Kafka acts as a deep buffer: agents get sub-5ms ack, and parser workers consume at a sustainable rate regardless of burst size. Kafka also provides replay capability — if Elasticsearch is down for maintenance, no data is lost.

Why tiered storage instead of indexing everything in Elasticsearch?

At 10 PB per day, keeping all data in Elasticsearch would cost approximately $500,000 per day in compute and storage ($2,300/TB/month). S3 storage costs approximately $23/TB/month (Standard) to $0.004/TB/month (Glacier). The 7-day hot tier indexes approximately 70 PB, while the remaining years of data sit in S3 at 100x lower cost. Since 95 percent of searches hit the last 24 hours, the hot tier handles the vast majority of queries.

How does the roll-off process work?

RolloffWorker runs continuously with 20 parallel workers. Each worker queries IndexDB for data older than 7 days, reads the documents, compresses them into Parquet files organized by tenant/date/hour, and writes to S3. After successful S3 upload confirmation, the worker deletes the corresponding Elasticsearch documents. At terabytes per hour, the roll-off keeps the hot tier at a stable 7-day size.

What happens when Elasticsearch is down for maintenance?

Kafka continues accepting log batches from agents — the deep buffer (50M messages) provides hours of retention at normal ingest rates. ParserWorker pauses consumption, and Kafka consumer lag grows. When Elasticsearch comes back online, workers resume processing and work through the backlog. No log data is lost. During the outage, search is unavailable but ingest continues uninterrupted.

Why does this variant lack streaming alerting?

This pipeline indexes logs and pre-computes dashboard metrics but does not implement streaming alerting (e.g., alert if 5xx rate exceeds 1 percent in a 5-minute window). Detecting anomalies requires either polling Elasticsearch periodically (minutes of delay) or monitoring Redis metrics (limited to pre-defined aggregations). The Multi-Tier variant (v2) adds a dedicated AlertEngine that evaluates rules on every parsed log record in real-time, providing sub-5-second alert latency.

Related Templates

Logging Pipeline — Naive (Direct-to-Database)Logging Pipeline — Multi-Tier with Alerting & Distributed Query

Discussion

Ready to design your own Logging Pipeline?

Open the simulator, place components on the canvas, wire them up, and run a traffic simulation to see how your architecture performs under real load.

Open Simulator