Full production-grade search autocomplete with sharded Redis trie for sub-millisecond lookups, real-time trending detection via Kafka stream processing, ML-based re-ranking using user context and history, and per-user personalization. The architecture used by Google, Amazon, and LinkedIn at scale.
The streaming trending + ML ranking approach represents the state of the art for search autocomplete systems at companies like Google, Amazon, LinkedIn, and Twitter. It builds on the trie-based approach (V1) by adding three critical capabilities: real-time trending detection, ML-based personalized re-ranking, and per-user search history personalization.
The first limitation of the V1 approach is trending latency. The trie is rebuilt every 5 minutes, so a brand-new query — breaking news, a viral meme, a product launch — takes up to 5 minutes to appear in suggestions. The Redis trending boost layer in V1 reduces this to 1-2 minutes, but for truly real-time trending (surface within seconds), a streaming architecture is required. TrendAggregator consumes search logs from Kafka in real-time, maintains sliding-window counters at 10-second, 1-minute, and 5-minute granularities, and pushes trending boosts to TrieCache (Redis) as soon as a trend is detected. Detection threshold: a prefix's 10-second count exceeding 2x its 1-hour baseline. This surfaces breaking queries within 10-30 seconds of the trend starting.
The second limitation is personalization. The V1 trie returns the same suggestions for everyone — global popularity ranking only. A developer typing 'react' should see 'react hooks tutorial' while a chemistry student should see 'reaction kinetics formula.' The MLScorer addresses this by re-ranking the top-100 trie candidates using a gradient-boosted decision tree (GBDT) model with features: (1) global popularity score from trie, (2) trending boost score from Redis, (3) user search history similarity — cosine similarity between candidate and user's last 20 searches, (4) geographic relevance — does the candidate have local intent matching the user's location, (5) time-of-day signal — some queries are diurnal (e.g., 'breakfast restaurants' peaks in the morning), (6) device type — mobile users prefer shorter suggestions. The GBDT model inference takes ~2ms for 100 candidates — fast enough for inline scoring.
The third limitation is per-user context. The ML scorer needs user features to personalize. PersonalizationCache (Redis) stores each user's last 20 completed searches with a 30-day TTL. When a user types a prefix, SuggestionService fetches their history in parallel with the trie lookup — both are sub-millisecond Redis operations. The history is passed to the MLScorer as a feature vector. For anonymous users (no user_id), the ML scorer falls back to global ranking (equivalent to V1 behavior).
The trie itself is stored in a sharded Redis cluster rather than in-memory per pod. At 1B queries with top-100 per node, the serialized trie is approximately 40-60 GB — too large for a single pod's memory. Sharding across 8 Redis nodes (each holding ~8 GB of trie data) distributes the memory requirement. The trade-off is a network hop per lookup (~1ms) versus zero-network in-memory traversal (<0.1ms in V1). At 500K RPS, the 1ms Redis cost is acceptable and buys unlimited horizontal scaling — adding Redis nodes extends the trie capacity without changing the service code.
The architecture uses 10 components: Client, ApiGateway, SuggestionLB, SuggestionService (30 pods), TrieCache (Redis, 8 shards), PersonalizationCache (Redis, 4 nodes), MLScorer (20 workers), SearchLogStream (Kafka, 128 partitions), TrendAggregator (30 workers), TrieRebuilder (5 workers), and QueryDB (DynamoDB). The total infrastructure cost is approximately $8,000/month — 3x the V1 approach — justified by personalized suggestions that improve search click-through rate by 15-25% (a direct revenue driver for search-dependent businesses).
Interviewers asking about the advanced variant expect candidates to discuss real-time stream processing for trend detection, ML feature engineering for search ranking, the trade-off between personalization quality and latency, and the operational complexity of a 10-component system with multiple failure modes.
The streaming trending + ML ranking system uses ten main components organized into four layers: the request path (Client, ApiGateway, SuggestionLB, SuggestionService), the data stores (TrieCache/Redis sharded, PersonalizationCache/Redis, QueryDB/DynamoDB), the ML layer (MLScorer), and the async pipeline (SearchLogStream/Kafka, TrendAggregator, TrieRebuilder).
The request path handles 500K autocomplete requests per second at peak. Client sends GET /api/v2/autocomplete?q=prefix&limit=10&user_id=abc. ApiGateway authenticates, rate-limits at 600K RPS (headroom for traffic spikes), and routes to SuggestionLB. SuggestionLB uses least-connections algorithm — this accounts for variable ML scorer latency. SuggestionService (30 pods x 300 threads) orchestrates the three-phase lookup.
Phase 1 (trie lookup): SuggestionService performs HGET on the sharded TrieCache to retrieve the top-100 candidates for the given prefix. The trie is sharded across 8 Redis nodes by consistent hash on the prefix. Sub-millisecond lookup. This phase also fetches any trending boost from TrieCache (separate key pattern 'trend:{prefix}'). Phase 2 (personalization): In parallel with Phase 1, SuggestionService fetches the user's search history from PersonalizationCache via LRANGE user:{user_id}:history 0 19. This returns the last 20 searches. Cache miss (anonymous user) is a no-op. Phase 3 (ML re-ranking): SuggestionService sends the 100 candidates plus user context (history, location, device, time) to MLScorer via gRPC. MLScorer runs the GBDT model (~2ms inference), scores each candidate, and returns the re-ranked top-10. Total Phase 3 latency: 5-10ms including gRPC overhead.
The fallback path is critical for reliability. If MLScorer latency exceeds 15ms or the scorer is unhealthy (circuit breaker open), SuggestionService returns the static trie top-10 without ML re-ranking. This degrades quality (no personalization) but preserves latency (< 5ms total). If TrieCache is down (catastrophic Redis failure), the system returns empty suggestions — there is no in-memory fallback because the trie is too large for individual pods.
The async pipeline has two consumers: TrendAggregator and TrieRebuilder, both consuming from SearchLogStream (Kafka, 128 partitions). TrendAggregator runs 30 workers, each handling ~4 partitions. It maintains in-memory sliding-window counters and detects trends by comparing short-window counts against long-window baselines. When a trend is detected, it writes a boost to TrieCache — visible to the next autocomplete request. TrendAggregator also batch-writes updated counts to QueryDB every 30 seconds for long-term persistence. TrieRebuilder runs 5 workers on a 10-minute cron. It reads all counts from DynamoDB, builds the trie with top-100 per node, and bulk-writes to TrieCache using shadow-key-swap for zero-downtime updates.
PersonalizationCache handles write-behind updates: when a user completes a search (POST /api/v2/search/log), SuggestionService both publishes to Kafka (for trending) and writes to PersonalizationCache (LPUSH + LTRIM to maintain last 20). The 30-day TTL auto-expires inactive users, keeping storage bounded at approximately 20 GB for 100M users.
This sequence diagram traces the three-phase autocomplete flow: trie lookup, personalization fetch, and ML re-ranking. Phases 1 and 2 execute in parallel (both are sub-millisecond Redis operations), and Phase 3 (ML scoring) processes the combined results. The fallback path is also shown — if the ML scorer is slow or unavailable, the system returns static trie results.
The async pipeline shows the real-time trending detection flow: search logs published to Kafka, consumed by TrendAggregator with sliding-window counters, and trending boosts pushed to TrieCache within seconds of detection.
Step-by-Step Walkthrough
Pseudocode
// THREE-PHASE AUTOCOMPLETE LOOKUP
async function autocomplete(prefix: string, userId?: string, location?: LatLng):
// Phase 1 + Phase 2: parallel Redis lookups (<1ms each)
[trieResult, trendingBoost, userHistory] = await Promise.all([
trieCache.hget(`trie:${shard(prefix)}:${prefix}`), // Top-100 candidates
trieCache.get(`trend:${prefix}`), // Trending boost
userId ? personalizationCache.lrange(`user:${userId}:history`, 0, 19) : []
])
if (!trieResult) return [] // No suggestions for this prefix
candidates = deserialize(trieResult) // 100 candidates with scores
// Phase 3: ML re-ranking (5-10ms)
try {
features = {
candidates,
trending: trendingBoost,
history: userHistory,
location,
timeOfDay: new Date().getHours(),
deviceType: detectDevice()
}
ranked = await mlScorer.score(features) // gRPC call, 2ms inference
return ranked.slice(0, 10)
} catch (err) {
// Fallback: circuit breaker open or ML timeout
// Return static trie results (V1 behavior)
return candidates.slice(0, 10).map(c => c.query)
}
// TREND DETECTION — TrendAggregator (real-time Kafka consumer)
function onSearchLogged(event: SearchEvent):
prefix = event.query.substring(0, 3) // Track 3-char prefixes
counters[prefix].window_10s.increment()
counters[prefix].window_1min.increment()
counters[prefix].window_5min.increment()
// Check trending threshold every 10 seconds
if counters[prefix].window_10s.count > 2 * counters[prefix].baseline_per_10s:
trieCache.set(`trend:${prefix}`, {
boost_score: counters[prefix].window_10s.count / counters[prefix].baseline_per_10s,
trending_queries: getTopQueriesForPrefix(prefix),
detected_at: new Date().toISOString()
}, { ttl: 600 }) // Auto-expire in 10 minutesChoice
Trie stored in a sharded 8-node Redis cluster, not in pod memory
Rationale
The V1 approach stores the full trie (10 GB for top-10 per node) in each pod's memory. The V2 approach stores top-100 per node, inflating the trie to 40-60 GB — too large for individual pods. Sharding across 8 Redis nodes distributes this storage. The cost is a 1ms network hop per lookup (vs <0.1ms in-memory), but this enables the ML scorer to receive 100 candidates instead of 10, dramatically improving re-ranking quality. The Redis cluster also provides persistence and replication, eliminating the pod-restart cold-start problem from V1.
Choice
GBDT model with 6 features, 2ms inference, inline scoring
Rationale
Deep learning models (transformers, BERT) provide superior relevance but require 50-100ms inference — too slow for inline autocomplete scoring. GBDT models inference in approximately 2ms for 100 candidates, fitting within the 100ms p99 SLO. The 6 features (popularity, trending, history similarity, geo-relevance, time-of-day, device type) capture the most impactful personalization signals. The model is trained offline on click-through data (did the user select the suggestion?) and deployed as a lightweight binary — no GPU required.
Choice
10-second, 1-minute, 5-minute sliding windows with 2x baseline threshold
Rationale
The V1 approach detects trends on a 5-minute batch cycle. For breaking events (earthquake, election result, product launch), users start searching immediately and expect relevant suggestions. TrendAggregator's 10-second sliding window detects surges within 10-30 seconds. The 2x threshold (10-second count > 2x the per-10-second average from the 1-hour window) balances sensitivity and false positive rate. Lower thresholds (1.5x) generate too many false trends from normal traffic variance.
Choice
Dedicated Redis cluster for user history, separate from trie cache
Rationale
Trie data is query-centric (keyed by prefix, read on every request), while personalization data is user-centric (keyed by user_id, read only for logged-in users). Mixing them creates hot-key imbalance: prefix 'how' might get 1M reads/sec while individual user keys get 1 read/session. Separate clusters allow independent scaling, eviction policies (trie uses LRU, personalization uses LFU), and failure domains — TrieCache failure breaks autocomplete, but PersonalizationCache failure only degrades personalization (fallback to global ranking).
Choice
Write new trie to shadow keys, RENAME atomically
Rationale
TrieRebuilder writes new trie data to keys prefixed with 'trie_new:{shard}:{prefix}' while the live trie uses 'trie:{shard}:{prefix}'. When the rebuild completes, all shadow keys are RENAMEd to live keys atomically (Redis RENAME is O(1)). This ensures SuggestionService never reads a partially-rebuilt trie. The trade-off is temporary double memory usage during the swap window (~10 seconds), requiring the Redis cluster to have 2x headroom.
Choice
gRPC with Protobuf instead of REST with JSON
Rationale
MLScorer is called on every autocomplete request (400K/sec). gRPC with Protobuf reduces serialization overhead by approximately 60% compared to JSON. HTTP/2 connection multiplexing enables high throughput over fewer connections — 30 SuggestionService pods each open 4 gRPC channels to 20 MLScorer workers, totaling 120 multiplexed streams instead of 120K HTTP/1.1 connections. The 3ms round-trip includes serialization, network, and deserialization — JSON REST would add ~3ms from serialization alone.
Target RPS
500K peak (30 pods x ~17K RPS each)
Latency (p99)
<1ms trie + <1ms personalization + 5-10ms ML scoring = 15-25ms total
Storage
~60 GB trie (Redis), ~20 GB personalization (Redis), ~100 GB counts (DynamoDB)
Availability
99.95% (multi-AZ, ML fallback path, degraded-mode serving)
| Operation | Time | Space | Notes |
|---|---|---|---|
| Trie lookup (Redis HGET) | O(1) — Redis hash table lookup by prefix key | O(K) — K=100 candidates returned | Sub-millisecond. The Redis shard is selected by consistent hash on the prefix. HGET is O(1) amortized. Payload: ~5 KB for 100 suggestions. |
| ML scoring (GBDT inference) | O(K * D) — K candidates, D tree depth (typically 8-12) | O(K) — feature vectors for K candidates | K=100, D=10 average tree depth, 200 trees in ensemble. Total: 100 * 10 * 200 = 200K comparisons. GBDT is optimized for batch inference — 100 candidates scored in ~2ms on a single CPU core. |
| Trend detection (TrendAggregator) | O(1) per message — increment counter in hash map | O(Q) — Q unique query prefixes in sliding window | Each search log message increments 3 counters (10s, 1min, 5min window). Q is approximately 10M unique prefixes. Hash map: ~500 MB per worker. 30 workers total, each handling ~4 Kafka partitions. |
| Trie rebuild (TrieRebuilder) | O(N * L) — N queries, L average prefix length | O(N * L) — full trie in memory during build | 1B queries, L=5 average. Build takes approximately 5 minutes on 5 parallel workers. Shadow-key-swap to Redis adds approximately 2 minutes for bulk write pipeline. Total rebuild cycle: ~7 minutes within the 10-minute schedule. |
Sharded Redis hash storing top-100 suggestions per prefix. Each entry is a serialized list of (query_string, popularity_score) pairs. 8 Redis nodes, each holding ~8 GB of trie data. Written by TrieRebuilder every 10 minutes via shadow-key-swap. Read by SuggestionService on every autocomplete request.
Indexes: Redis hash table — O(1) key lookup by prefix, Consistent hash ring for shard selection by prefix
At 1B queries with average prefix depth of 5, there are approximately 500M unique prefix entries. Distributed across 8 shards: ~62M entries per shard. Each entry averages ~100 bytes (100 suggestions compressed with string interning), totaling ~6-8 GB per shard.
Redis list storing each user's last 20 completed searches. Written on every search completion (LPUSH + LTRIM). Read on every autocomplete request for logged-in users. 30-day TTL auto-expires inactive users.
Indexes: Redis hash table — O(1) key lookup by user_id
100M users x ~200 bytes per user (20 queries x 10 bytes avg) = ~20 GB total. LFU eviction policy ensures frequently active users stay cached while dormant users are evicted. 85% cache hit rate for logged-in users.
Redis hash entries storing real-time trending boost scores per prefix. Written by TrendAggregator within seconds of trend detection. Read by SuggestionService and passed to MLScorer as a feature. TTL 600s — trends auto-expire if not sustained.
Indexes: Redis hash table — O(1) key lookup
Approximately 10K-100K active trending entries at any time. Memory footprint: ~100 MB. Updated in real-time by TrendAggregator — a new trend is visible to the next autocomplete request within seconds.
Persistent store for aggregated query counts across multiple time windows. Written by TrendAggregator in 30-second batches. Read by TrieRebuilder via full scan every 10 minutes. On-Demand capacity absorbs rebuild spikes.
Indexes: Primary key: (prefix, query)
1B items x ~120 bytes = ~120 GB. The 2-character prefix partition key distributes data across ~1,296 DynamoDB partitions. On-Demand capacity auto-scales for periodic full-scan reads during trie rebuild.
Emitted by SuggestionService on every completed search. Two consumer groups: TrendAggregator (real-time trend detection) and TrieRebuilder (periodic count aggregation). Partitioned by query_prefix for processing locality. 128 partitions for 500K peak throughput.
Key Schema
query_prefix (string — first 2 characters of query)
Value Schema
{ query: string, user_id?: string, location?: string, timestamp: string }
Major breaking news event — 100K users search the same query within 30 seconds
Impact
TrendAggregator detects the surge within 10 seconds (10-second sliding window count exceeds 2x baseline). Trending boost is written to TrieCache within 15 seconds. The next autocomplete request for that prefix returns the trending query in the top-10. Users see relevant suggestions within 15-30 seconds of the event — compared to 5+ minutes in V1.
Mitigation
Already handled by design. The 10-second detection window is tuned for exactly this scenario. If the event is so large that Kafka consumer lag spikes, TrendAggregator workers auto-scale based on lag metric.
ML scorer fleet goes down completely (all 20 workers crash)
Impact
Circuit breaker opens on SuggestionService within 5 seconds (after 5 consecutive gRPC failures). All autocomplete requests fall back to static trie results from Redis (no ML re-ranking, no personalization). Quality degrades to V1 level — global popularity ranking only. Latency actually improves (skip the 5-10ms ML call). Users do not see errors — they see slightly less relevant suggestions.
Mitigation
Auto-scaling replaces crashed workers within 2-3 minutes (ECS Fargate). Circuit breaker retries every 30 seconds. On recovery, ML scoring resumes automatically. Monitor ML scorer fleet health with >1 worker healthy per AZ.
PersonalizationCache Redis cluster loses a node (1 of 4)
Impact
25% of user history lookups fail — those users receive non-personalized suggestions (global ranking). The ML scorer still runs but with an empty history feature, producing V1-equivalent quality for affected users. No errors visible to users — just slightly less relevant suggestions.
Mitigation
Redis Cluster automatic failover promotes a replica within 15-30 seconds. During failover, SuggestionService catches the Redis exception and passes an empty history to MLScorer (graceful degradation). 30-day TTL means no data loss for the affected shard — history rebuilds naturally as users continue searching.
Coordinated bot attack sends 50K queries/sec for a spam term to trigger false trending
Impact
TrendAggregator detects the spike and writes a trending boost for the spam term. Autocomplete users see the spam term in their suggestions — a reputation and trust violation.
Mitigation
Three layers: (1) ApiGateway rate-limits per-IP at 100 req/sec — a single bot IP is throttled immediately. (2) TrendAggregator minimum threshold: a prefix must receive queries from at least 1,000 unique user_ids in 10 seconds to qualify as trending (not just raw query volume). (3) Spam blocklist: known spam patterns are checked before boosting. Post-incident: add the spam term to the blocklist and flush the trending boost from TrieCache.
| Component | Failure | Impact | Mitigation |
|---|---|---|---|
| TrieCache (Redis — 8-node sharded cluster) | Single shard node failure | Prefixes hashed to the failed shard return empty results — approximately 12.5% of autocomplete requests return no suggestions. Other shards are unaffected. Users notice missing suggestions for specific prefixes (e.g., all prefixes starting with 'sys' if that shard fails). | Redis Cluster with 1 replica per shard. Automatic failover promotes replica within 15-30 seconds. During failover, SuggestionService retries with exponential backoff (50ms, 100ms, 200ms). After 3 retries, return empty suggestions for affected prefixes. |
| MLScorer (20 gRPC workers) | Latency spike from model serving issues (cold start, GC pause) | ML scoring p99 latency exceeds 15ms. Circuit breaker opens on SuggestionService. Autocomplete falls back to static trie results. Quality degrades but latency SLO is preserved. The fallback is invisible to users — they see slightly less personalized suggestions. | Pre-warm MLScorer workers (load model on startup, run inference on test data before accepting traffic). Use connection pooling on gRPC channels. Auto-scale based on p99 latency: if > 10ms, add workers. |
| SearchLogStream (Kafka — 128 partitions) | Kafka cluster unavailable | Search logs are lost — TrendAggregator stops detecting new trends, and TrieRebuilder stops receiving new data. Existing trie and trending boosts continue serving (stale but correct). Personalization continues working (writes go directly to PersonalizationCache, not through Kafka). Autocomplete hot path is unaffected. | Kafka multi-AZ deployment with replication factor 3. If Kafka is completely unavailable, SuggestionService buffers logs locally (10K messages per pod) and replays on recovery. TrendAggregator switches to degraded mode — serves from existing trending boosts without detecting new trends. |
| TrendAggregator (30 workers) | Consumer lag exceeds 5 minutes | Trending detection is delayed — new trends take 5+ minutes to surface instead of 10-30 seconds. The system degrades to V1-equivalent trending behavior. Autocomplete hot path is unaffected — existing trending boosts in Redis continue serving until TTL expiration (600s). | Auto-scale workers based on Kafka consumer lag metric. Add workers when lag > 60 seconds. Increase Kafka partition count from 128 to 256 to enable more parallel consumers. Alert at lag > 2 minutes, critical at lag > 5 minutes. |
Key metrics: (1) End-to-end autocomplete latency p50/p99 — should be <25ms. Alert at p99 > 60ms, critical at p99 > 100ms (SLO breach). (2) ML scorer latency p50/p99 — should be <5ms. Alert at p99 > 10ms, circuit breaker opens at p99 > 15ms. (3) Trend detection latency — time from query surge to trending boost written to Redis. Should be <30 seconds. Alert at >60 seconds. (4) Personalization cache hit rate — expected 85% for logged-in users. Drop below 70% indicates cache eviction issues. (5) TrieCache shard health — all 8 shards healthy. Alert on any shard failure. (6) Kafka consumer lag — TrendAggregator should be <30 seconds. Alert at 2 minutes. TrieRebuilder can tolerate higher lag (batch consumer). (7) ML fallback rate — percentage of requests served without ML re-ranking (circuit breaker open). Should be <1%. Alert at >5%. Dashboard: Grafana with panels for end-to-end latency histogram, ML scorer latency, trending detection timeline, Kafka consumer lag, Redis shard health, and personalization cache hit rate.
At 500K RPS peak: SuggestionService 30 pods (4 vCPU, 8 GB) ~$2,200/month, TrieCache Redis (8 nodes, 64 GB total) ~$2,400/month, PersonalizationCache Redis (4 nodes, 26 GB total) ~$800/month, MLScorer 20 workers (4 vCPU, 8 GB) ~$1,500/month, QueryDB DynamoDB On-Demand ~$600/month, Kafka MSK (6 brokers, 128 partitions) ~$1,200/month, ApiGateway ~$300/month, TrendAggregator 30 workers ~$600/month, TrieRebuilder 5 workers ~$100/month. Total: ~$9,700/month. The dominant costs are the Redis clusters (TrieCache + PersonalizationCache = $3,200/month) and the SuggestionService pods ($2,200/month). The ML investment ($1,500/month for MLScorer) is justified by a 15-25% improvement in search click-through rate, which translates to millions in revenue for search-dependent businesses.
User privacy: PersonalizationCache stores per-user search history — GDPR requires right to erasure (DELETE user:{user_id}:history on request), data export (LRANGE full history), and purpose limitation (history used only for autocomplete personalization, not sold to advertisers). Search history must be encrypted at rest (Redis encryption) and in transit (TLS). ML model privacy: the GBDT model is trained on aggregated click-through data, not individual user sessions — no user-level data is embedded in the model weights. Trending manipulation: rate limiting per-IP and per-user prevents bot-driven trending attacks. Blocklist of offensive/spam terms prevents harmful content from being boosted into suggestions. Autocomplete suggestions must never expose PII from other users' searches — all suggestions come from anonymized, aggregated query popularity data.
| Variant | Tier | Latency | Throughput | Cost | Complexity | Reliability |
|---|---|---|---|---|---|---|
| V0: Naive (Database LIKE Query) | T1 | 8-15ms query, 50-200ms+ under load | ~10K RPS | $730/month | Low | 99% (single DB) |
| V1: Trie + Top-K (In-Memory Trie) | T2 | <1ms trie + 2ms trending merge | 200K RPS | $2,500/month | Medium | 99.9% (multi-AZ) |
| V2: Streaming Trending + ML Ranking | T3 | <1ms trie + 5-10ms ML scoring | 500K RPS | $8,000/month | Very High | 99.95% (multi-AZ, ML fallback) |
This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.
For users with no PersonalizationCache entry (new users or anonymous), the ML scorer receives an empty history feature vector. The model is trained with this case explicitly — approximately 30% of training data has empty history. Without history, the model falls back to popularity, trending, geo-relevance, and time-of-day features. This produces the same quality as V1 (global popularity ranking) for cold-start users, with personalization improving as the user accumulates search history.
SuggestionService has a circuit breaker on the MLScorer gRPC call. After 5 consecutive failures or if p99 latency exceeds 15ms, the circuit opens and SuggestionService returns static trie results (no ML re-ranking) for 30 seconds before retrying. This fallback path is the same quality as V1 — personalization degrades but autocomplete continues working. The circuit breaker prevents cascading failures from MLScorer to the hot path.
The GBDT model is trained offline on click-through data: for each autocomplete session, the training data records which suggestion (if any) the user clicked. Positive labels are clicked suggestions; negative labels are displayed but not clicked. Features are computed from the trie (popularity, trending score), user history (cosine similarity), and context (location, time, device). The model is retrained weekly on the latest 30 days of click data and deployed as a binary artifact. A/B testing validates that the new model improves click-through rate before full rollout.
Three layers of defense: (1) ApiGateway rate-limits per-IP and per-user — bots sending more than 100 requests/second are blocked. (2) TrendAggregator applies a minimum absolute threshold — a prefix must receive at least 500 queries in 10 seconds to be considered trending, regardless of the ratio to baseline. This filters out low-volume statistical noise. (3) A blocklist of known spam prefixes (adult content, phishing terms) is checked before boosting. Despite these defenses, sophisticated bot attacks can still trigger false trends — this is an ongoing adversarial game at all major search platforms.
Diminishing returns. The GBDT model's ability to surface relevant results decreases beyond position 100 — the popularity gap between position 100 and position 1000 is so large that even a heavily personalized signal rarely promotes a result from position 1000 to the top 10. Top-100 provides sufficient diversity for personalization while keeping the Redis payload manageable (~5 KB per prefix lookup versus ~50 KB for top-1000). At 500K lookups/sec, this difference is 2.5 GB/sec versus 25 GB/sec of Redis network traffic.
Sign in to join the discussion.
Ready to design your own Search Autocomplete?
Open the simulator, place components on the canvas, wire them up, and run a traffic simulation to see how your architecture performs under real load.
Open Simulator