The simplest possible search autocomplete architecture: every keystroke triggers a SQL LIKE query against PostgreSQL. No caching, no trie, no precomputation. SELECT ... WHERE query LIKE 'prefix%' ORDER BY popularity DESC LIMIT 10 on every character typed. Demonstrates why in-memory data structures and caching become essential as autocomplete traffic grows.
Search autocomplete (typeahead) is one of the most frequently asked system design interview questions because it tests a candidate's understanding of data structures, latency optimization, caching strategies, and real-time data processing. Companies like Google, Amazon, Yelp, Booking.com, LinkedIn, and Twitter all ask variants of this question because autocomplete is a user-facing feature where latency directly impacts user experience — every millisecond of delay is perceptible because the user is actively typing and expecting instant feedback.
The naive approach uses the simplest possible implementation: a single PostgreSQL database storing all query strings with their popularity counts. Every keystroke the user types triggers a fresh SQL query: SELECT query, popularity FROM queries WHERE query LIKE 'prefix%' ORDER BY popularity DESC LIMIT 10. The B-tree index on the query column supports left-anchored prefix matching efficiently — PostgreSQL can use the index to scan only the lexicographic range starting with the prefix. However, the ORDER BY popularity DESC clause is the hidden cost: PostgreSQL must collect all matching rows, sort them by popularity, and return the top 10. For a short prefix like 's', this may match 50 million rows out of 1 billion total queries.
The fundamental scaling problem is that every character typed generates a separate database round-trip. A user typing 'system design' produces 13 sequential queries: 's', 'sy', 'sys', 'syst', 'syste', 'system', 'system ', 'system d', 'system de', 'system des', 'system desi', 'system desig', 'system design'. Each query holds a database connection for 5-15ms. At 10K concurrent users typing simultaneously, the database handles 100K+ queries per second — each one scanning the B-tree index, fetching matching rows, sorting by popularity, and returning results. The PostgreSQL connection pool (typically 200 max connections) exhausts when connection hold time exceeds the arrival rate.
There is no caching in this approach — not even application-level memoization. The prefix 'how' is queried thousands of times per second by different users, and each instance executes the full LIKE scan independently. There is no precomputation — the ORDER BY popularity sort happens at query time for every request. There is no debouncing — the client fires on every keystroke without waiting for the user to pause. And there is no typo tolerance — a user typing 'sarch' instead of 'search' gets zero suggestions because LIKE 'sarch%' matches nothing.
Popularity counters are updated synchronously: when a user completes a search (selects a suggestion or presses Enter), the service executes UPDATE queries SET popularity = popularity + 1 WHERE query = 'term'. These writes compete with the LIKE reads for database connections and row-level locks. At peak traffic, 1,500 popularity UPDATEs/sec compete with 8,500 LIKE reads/sec for the same 200-connection pool.
This template exists to make the database bottleneck visible and measurable. Run the simulation at increasing RPS and watch connection pool utilization climb. At 5K RPS the system is healthy (60% pool utilization). At 8K RPS, p99 latency spikes as connections queue. At 10K RPS, the pool is exhausted and requests start failing. The comparison with the Trie variant (V1) quantifies the improvement: in-memory trie lookup takes 0.1ms versus 8-15ms for the database LIKE query — an 80-150x speedup with zero database load on the hot path.
Interviewers expect candidates to identify the per-keystroke database query as the primary bottleneck, propose caching or in-memory data structures (trie, sorted prefix lists) as the solution, discuss the trade-off between freshness and performance, and reason about client-side optimizations like debouncing.
The naive search autocomplete system is a four-component architecture: Client, SearchLB (Load Balancer), AutocompleteService, and QueryDB (PostgreSQL). There is no cache, no event stream, no background worker, and no in-memory data structure. Every autocomplete request is a direct database query.
All traffic arrives at SearchLB (AWS ALB), which distributes requests across AutocompleteService pods using round-robin. The load balancer adds approximately 1.5ms of routing latency and can handle up to 20K RPS — well above the system's actual limits, which are constrained by the database. The load balancer is never the bottleneck; the database is.
AutocompleteService is a stateless service with 5 pods, each running 100 threads. On every request, it parses the prefix from the query parameter, opens a database connection from the pool, executes the LIKE query, waits for results, and returns the JSON response. Processing time in the service is approximately 2ms (parsing + JSON serialization); the remaining 6-13ms is database round-trip time. The service has zero in-memory state — it is purely a pass-through to PostgreSQL.
QueryDB is a single PostgreSQL instance (db.r7g.xlarge: 4 vCPU, 32 GB RAM) storing the queries table with approximately 1 billion rows. Each row contains a query string (average 50 bytes), a popularity counter (integer), and timestamps. The table occupies approximately 80GB on disk with the B-tree index on the query column adding another 50GB. The buffer pool (shared_buffers = 8GB) caches frequently accessed index pages, so hot prefixes like 'a', 'the', and 'how' are served from memory. Cold prefixes (rare multi-word queries) require disk I/O, adding 5-20ms of latency variance.
The autocomplete query flow is: Client types a character -> SearchLB routes to a pod -> AutocompleteService acquires a DB connection -> executes SELECT query, popularity FROM queries WHERE query LIKE 'prefix%' ORDER BY popularity DESC LIMIT 10 -> PostgreSQL scans the B-tree index for the prefix range, fetches matching rows, sorts by popularity, returns top 10 -> AutocompleteService returns JSON response -> Client displays suggestions. Total end-to-end latency: 10-25ms at low load, degrading to 50-200ms+ as the connection pool saturates.
Popularity updates follow the same path: Client completes a search -> AutocompleteService executes UPDATE queries SET popularity = popularity + 1 WHERE query = 'term' -> the row-level lock is held for approximately 15ms. If the query does not exist, an INSERT creates it with popularity = 1. These writes share the same connection pool as reads, creating mutual degradation under concurrent load.
This sequence diagram traces the autocomplete flow for a user typing 'sys'. Each of the three characters generates an independent database query. The critical insight is that each query is a full LIKE scan + popularity sort — there is no incremental refinement. The results of 's' are not reused when querying 'sy'; the database starts from scratch. The diagram also shows the popularity UPDATE on search completion, which competes with LIKE reads for the same connection pool.
The second insight is the connection pool as the bottleneck. Each LIKE query holds a connection for 5-15ms. At 10K concurrent users, 100K queries per second need 100K * 0.01s = 1,000 concurrent connections — far exceeding the pool limit of 200.
Step-by-Step Walkthrough
Pseudocode
// AUTOCOMPLETE QUERY — every keystroke
async function autocomplete(prefix: string, limit: number = 10):
// Acquire connection from pool (blocks if pool exhausted)
conn = await pool.acquire() // May wait 0-5000ms under contention
results = await conn.query(
"SELECT query, popularity FROM queries " +
"WHERE query LIKE $1 ORDER BY popularity DESC LIMIT $2",
[prefix + '%', limit]
)
// prefix='s': scans 50M rows, sorts all, returns 10 — ~120ms
// prefix='sys': scans 200K rows, sorts all, returns 10 — ~8ms
conn.release()
return results
// POPULARITY UPDATE — on search completion
async function recordSearch(query: string):
conn = await pool.acquire()
await conn.query(
"INSERT INTO queries (query, popularity) VALUES ($1, 1) " +
"ON CONFLICT (query) DO UPDATE SET popularity = queries.popularity + 1, updated_at = now()",
[query]
) // Row lock ~15ms — contention on hot queries
conn.release()Choice
SELECT ... WHERE query LIKE 'prefix%' ORDER BY popularity DESC LIMIT 10
Rationale
LIKE 'prefix%' with a B-tree index is the simplest possible implementation for prefix matching. The B-tree index is sorted lexicographically, so 'sys%' scans only the range starting with 'sys' — this is efficient for the index lookup phase. The problem is the ORDER BY popularity DESC clause: PostgreSQL must collect all matching rows from the prefix range, then sort them by a different column (popularity) to find the top 10. For short prefixes like 's' matching 50M rows, this sort dominates query time. A composite index on (query, popularity) does not help because the query column is used as a range predicate (LIKE), not an equality predicate.
Choice
Every request hits PostgreSQL directly — no Redis, no application cache
Rationale
Adding a cache (Redis or application-level LRU) for popular prefixes would reduce database load by 80-90%. The prefix 'how' is queried thousands of times per second — caching its results for 5 seconds would eliminate 99.8% of database hits for that prefix. The naive approach deliberately omits caching to make the raw database cost visible. This is the most impactful single optimization a candidate can propose in an interview — cache the top 1,000 prefixes and database load drops from 10K QPS to under 2K QPS.
Choice
UPDATE popularity counter inline on every completed search
Rationale
When a user selects a suggestion, the service immediately increments the popularity counter in PostgreSQL. This is the simplest write model but creates contention: the UPDATE acquires a row-level lock for 15ms, and hot queries like 'weather' or 'amazon' receive hundreds of concurrent UPDATEs per second, creating lock queuing. An async approach (batch UPDATEs via Kafka) would decouple writes from the user path, but the naive approach keeps everything synchronous for simplicity.
Choice
Every keystroke fires a new HTTP request immediately
Rationale
Debouncing (waiting 100-200ms after the last keystroke before sending) would reduce query volume by 50-70%. A user typing 'system' at 150ms per character would fire 6 requests without debouncing versus 1-2 with 200ms debounce. The naive approach sends on every keystroke to illustrate the worst-case database load. Even optimized systems benefit from client-side throttling, but the trie approach's sub-millisecond latency makes it less critical — the server can handle the volume even without debouncing.
Target RPS
~10K sustained (ceiling at database connection pool)
Latency (p99)
8-15ms at low load, 50-200ms+ under contention
Storage
~130 GB (80 GB data + 50 GB B-tree index for 1B queries)
Availability
~99% (single DB instance, no redundancy)
| Operation | Time | Space | Notes |
|---|---|---|---|
| Autocomplete query (LIKE 'prefix%' ORDER BY popularity) | O(log N + M log M) — B-tree lookup + sort M matching rows | O(M) — buffer for sorting M matches | N = 1B total queries, M = matching rows for the prefix. Short prefixes: M can be 50M+ (prefix 's'). Long prefixes: M drops to hundreds. The sort is the bottleneck for short prefixes — PostgreSQL allocates work_mem (typically 4-64 MB) for the sort, spilling to disk if M is very large. |
| Popularity increment (UPDATE ... SET popularity = popularity + 1) | O(log N) — B-tree index lookup + row update | O(1) — single row update | Each UPDATE acquires a row-level lock (~15ms hold time). Hot queries like 'weather' receive hundreds of concurrent UPDATEs, creating lock queuing. Write-ahead log (WAL) write adds ~2ms for durability. |
| Full prefix scan (prefix length = 1 character) | O(N/26) — approximately 1/26th of all rows match a single letter | O(N/26) — all matching rows loaded for sort | The worst case for ORDER BY. Prefix 's' matches ~50M rows. PostgreSQL must sort 50M rows by popularity to find the top 10. This takes 200ms+ and dominates p99 latency. The trie approach pre-computes this sort at build time. |
Stores all unique query strings with their all-time popularity counts. The single table hit by every autocomplete request via LIKE 'prefix%' scan and every completed search via popularity UPDATE. B-tree index on the query column enables efficient prefix range scans. A separate index on popularity could accelerate sorting but PostgreSQL cannot combine a range scan on one index with a sort on another.
Indexes: PK on query_id, UNIQUE idx_queries_text ON (query) — also serves as B-tree for LIKE 'prefix%', idx_queries_popularity ON (popularity DESC) — not used with LIKE range scan
The B-tree index on query is the critical structure. LIKE 'prefix%' uses the index for left-anchored prefix matching — scanning only rows in the lexicographic range [prefix, prefix_next). However, ORDER BY popularity DESC cannot use this index (different sort order than the B-tree), forcing PostgreSQL to fetch all matching rows and sort in memory. At 1B rows, the table is approximately 80 GB with the B-tree index adding 50 GB.
Viral event causes thousands of users to search the same new query simultaneously
Impact
The query does not exist in the database yet, so LIKE returns it with zero popularity (or not at all if not yet INSERTed). Users see irrelevant suggestions. Once users start completing the search, hundreds of concurrent INSERT ON CONFLICT UPDATE popularity statements create lock contention on the same row. The new query's popularity counter lags behind actual demand by minutes.
Mitigation
The V1 trie approach handles this via Kafka-based search log aggregation with sliding time windows. The V2 streaming approach detects the trend within seconds via real-time stream processing and boosts the query immediately.
Database connection pool exhaustion during peak hours
Impact
All autocomplete requests fail with connection timeout errors. Users see empty suggestion lists. The AutocompleteService pods queue requests waiting for connections, eventually hitting their own queue limits. Load balancer health checks fail, and the ALB marks pods as unhealthy.
Mitigation
Increase max_connections from 200 to 500 as an emergency measure (increases memory usage by ~5 MB per connection). Add PgBouncer in transaction mode to multiplex connections. Long-term: add a Redis cache for hot prefixes to reduce database load by 80%.
User types a single character (maximum prefix fan-out)
Impact
LIKE 'a%' matches approximately 40M rows (4% of 1B queries). PostgreSQL must sort 40M rows by popularity to find the top 10. This query takes 200-500ms, consuming a database connection for that entire duration. During peak hours, single-character queries consume a disproportionate share of the connection pool.
Mitigation
Pre-compute and cache single-character results — there are only 36 possible single-character prefixes (a-z, 0-9). Cache these in Redis with a 30-second TTL. This eliminates the most expensive queries entirely. The trie approach pre-computes all prefix results at build time.
B-tree index becomes fragmented after millions of INSERT/UPDATE operations
Impact
Index bloat increases the on-disk size from 50 GB to 80+ GB. Index scans slow down as pages are no longer contiguous. Buffer pool hit rate drops because the inflated index no longer fits in shared_buffers. LIKE query latency increases by 30-50%.
Mitigation
Run REINDEX CONCURRENTLY on the query column index during low-traffic windows. Enable autovacuum with aggressive settings (autovacuum_vacuum_scale_factor = 0.01) to prevent dead tuple accumulation. Monitor pg_stat_user_indexes for index efficiency.
| Component | Failure | Impact | Mitigation |
|---|---|---|---|
| QueryDB (PostgreSQL) | Connection pool exhaustion from concurrent LIKE queries | All autocomplete requests fail — users see empty suggestion lists. Popularity UPDATEs also fail, so search completion data is lost. Total feature outage because the single database is the only data path. | PgBouncer connection pooling in transaction mode. Increase max_connections as stopgap. Long-term: add Redis cache to eliminate 80% of database queries. |
| AutocompleteService | Thread starvation from slow LIKE queries blocking all threads | All 500 threads (5 pods x 100 threads) occupied by queries waiting for database connections. New requests queue at the load balancer, causing ALB to return 503 errors. Health checks fail and pods are marked unhealthy. | Set query timeout (statement_timeout = 500ms) to prevent slow single-character queries from holding threads indefinitely. Use circuit breaker pattern — if database error rate exceeds 50%, fail fast with cached stale results. |
| SearchLB (Load Balancer) | All backend health checks fail simultaneously | ALB returns 502 Bad Gateway for all requests. Complete autocomplete outage. Search still works (separate system), but users get no suggestions. | Multi-AZ deployment with at least 2 pods per AZ. Configure health check threshold: 3 consecutive failures before marking unhealthy. Return degraded response (empty suggestions) instead of 500 error on database failure. |
Key metrics: (1) LIKE query latency p50/p99 — primary performance indicator. Alert at p99 > 30ms, critical at p99 > 100ms. (2) PostgreSQL active connections — alert at 70% of max (140/200), critical at 85% (170/200). (3) Connection wait time — time spent waiting for a free connection. Alert if mean > 5ms. (4) Queries per second by prefix length — single-character prefixes are 10-100x more expensive than 4+ character prefixes. (5) Popularity UPDATE contention — row lock wait time on hot queries. Alert if p99 lock wait > 50ms. (6) Buffer pool hit rate — should be > 95%. Drop below 90% indicates index bloat or working set exceeding shared_buffers. Dashboard: Grafana with panels for LIKE query latency histogram, connection pool utilization, QPS by prefix length, and PostgreSQL buffer pool hit rate.
At low traffic (< 5K RPS): PostgreSQL db.r7g.xlarge (~$350/month), ECS Fargate 5 pods ($350/month), ALB (~$30/month). Total: ~$730/month. This is the cheapest variant but collapses beyond 10K RPS. Scaling vertically to db.r7g.2xlarge ($700/month) extends the ceiling to ~15K RPS but does not solve the fundamental per-keystroke-DB-query problem. The V1 Trie variant at 200K RPS costs approximately $2,500/month but handles 20x the traffic — the per-request cost drops from $0.0001 to $0.00001 as you scale beyond the naive approach's ceiling.
Query logging: search queries may contain sensitive information (medical conditions, financial queries, personal names). Logs must be anonymized or access-controlled. SQL injection: the LIKE pattern must be parameterized — never concatenate user input into the SQL string. A malicious prefix like '%'; DROP TABLE queries;--' would be catastrophic without parameterized queries. Rate limiting: per-IP and per-user rate limiting to prevent dictionary enumeration attacks (systematically querying all prefixes to extract the full query corpus). PII in autocomplete: ensure suggestions do not leak other users' personal queries — the queries table should only contain anonymized, aggregated query strings, not per-user search history.
| Variant | Tier | Latency | Throughput | Cost | Complexity | Reliability |
|---|---|---|---|---|---|---|
| V0: Naive (Database LIKE Query) | T1 | 8-15ms query, 50-200ms+ under load | ~10K RPS | $730/month | Low | 99% (single DB) |
| V1: Trie + Top-K (In-Memory Trie) | T2 | <1ms trie + 2ms trending merge | 200K RPS | $2,500/month | Medium | 99.9% (multi-AZ) |
| V2: Streaming Trending + ML Ranking | T3 | <1ms trie + 5-10ms ML scoring | 500K RPS | $8,000/month | Very High | 99.95% (multi-AZ, ML fallback) |
This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.
Search autocomplete combines five core distributed systems challenges: (1) data structure selection — trie vs hash map vs sorted set for prefix matching, (2) latency optimization — every millisecond is perceptible because the user is actively typing, (3) caching strategy — high hit rate due to Zipfian prefix distribution (popular prefixes like 'how', 'what', 'why' dominate), (4) real-time data processing — trending queries must surface quickly, and (5) scale — 500K+ RPS at Google/Amazon scale. Google, Amazon, Yelp, Booking.com, LinkedIn, Twitter, and Netflix all ask this question because autocomplete is their core product feature.
Two reasons: connection pool exhaustion and sort overhead. At 10K concurrent users each typing 10 characters, the database receives 100K queries per second. Each query holds a connection for 5-15ms. With a 200-connection pool, the maximum sustainable QPS is approximately 200 / 0.01 = 20K — and that assumes zero write contention. The sort overhead compounds the problem: short prefixes like 's' match millions of rows, and PostgreSQL must sort all of them by popularity to find the top 10. A B-tree index on (query) only accelerates the prefix scan, not the popularity sort.
Add a Redis cache for the top 10K most queried prefixes with a 5-10 second TTL. Autocomplete traffic follows a Zipfian distribution — the top 1% of prefixes account for approximately 60% of all queries. Caching these hot prefixes in Redis (sub-millisecond lookups) reduces database load by 60% immediately. Combined with client-side debouncing (200ms), total database load drops from 10K QPS to under 2K QPS. This buys significant headroom without any architectural change.
Elasticsearch with a completion suggester or edge-ngram tokenizer would provide faster prefix matching than SQL LIKE, plus fuzzy matching and relevance scoring. However, Elasticsearch adds significant operational complexity (cluster management, shard rebalancing, index mapping) for a naive approach. PostgreSQL LIKE is simpler to reason about for an interview and makes the O(N) bottleneck more explicit. In production, many companies do use Elasticsearch for autocomplete — but the optimal solution is an in-memory trie (V1 approach) that avoids network round-trips entirely.
It does not. LIKE 'sarch%' returns zero results because there is no query in the database starting with 'sarch'. B-tree prefix matching is exact — there is no edit distance computation, no phonetic matching, and no fuzzy logic. A user who misspells a prefix sees an empty suggestion list and must manually correct their input. The V2 streaming variant addresses this with ML-based scoring that can compute edit distance and semantic similarity between the typed prefix and stored queries.
Sign in to join the discussion.
Ready to design your own Search Autocomplete?
Open the simulator, place components on the canvas, wire them up, and run a traffic simulation to see how your architecture performs under real load.
Open Simulator