1A team is building a system with complex, unpredictable query requirements and moderate scale (1TB, 5K TPS). Which database approach is most appropriate?
The SQL vs NoSQL decision is one of the most consequential choices in system design, affecting data modeling flexibility, query capabilities, scalability patterns, and operational complexity. This deep dive provides a decision framework based on access patterns, consistency requirements, and scaling needs rather than hype or fashion.
The SQL vs NoSQL debate has matured significantly since the NoSQL movement of the late 2000s. The original NoSQL pitch -- that relational databases cannot scale -- has been disproven by systems like Google Spanner, CockroachDB, and Vitess-sharded MySQL, which scale relationally across thousands of nodes. Meanwhile, NoSQL databases have added features that were once exclusively relational: MongoDB has multi-document ACID transactions, DynamoDB has PartiQL (SQL-like queries), and Cassandra has materialized views. The lines are blurring, but fundamental architectural trade-offs remain.
The core trade-off is between query flexibility and access pattern optimization. SQL databases store data in normalized, relational tables and use a query optimizer to efficiently answer arbitrary queries (joins, aggregations, subqueries). You can add a new query without changing the schema or data layout. NoSQL databases (particularly key-value and wide-column stores) optimize for specific access patterns by denormalizing data and co-locating related records. Queries against the optimized pattern are extremely fast, but queries outside that pattern are slow or impossible without restructuring data.
Schema flexibility is another axis. SQL databases enforce schemas at write time (schema-on-write), catching data quality issues early but making schema evolution expensive for large tables (ALTER TABLE locks, data migrations). NoSQL databases enforce schemas at read time (schema-on-read) or not at all, making it easy to store heterogeneous data and evolve schemas, but pushing data validation responsibility to application code. In practice, most mature NoSQL applications end up with informal schemas enforced in application code -- the schema exists; it is just not in the database.
The scalability trade-off is also more nuanced than commonly presented. Single-node PostgreSQL handles millions of rows and thousands of transactions per second -- many applications never outgrow this. When you do need horizontal scaling, SQL databases offer read replicas, partitioning, and distributed SQL (Spanner, CockroachDB). NoSQL databases offer easier horizontal scaling via hash partitioning, but this ease comes at the cost of cross-partition operations. The question is not 'can SQL scale?' but 'what is the operational cost of scaling SQL to my requirements vs NoSQL?'
User Orders: Two Data Models Compared
Consider a system storing users and their orders. In SQL (PostgreSQL): a 'users' table and an 'orders' table with a foreign key user_id. To get a user with their orders: SELECT * FROM users JOIN orders ON users.id = orders.user_id WHERE users.id = 123. To find orders by date range across all users: SELECT * FROM orders WHERE created_at BETWEEN '2026-01-01' AND '2026-06-01'. Both queries are efficient with proper indexes. In NoSQL (DynamoDB): a single table with partition key user_id and sort key order_id. Getting a user's orders is a single partition query (extremely fast). But finding orders by date range across all users requires a Global Secondary Index or a full scan -- the data is not organized for that access pattern. If this cross-user query is important, you might need to duplicate data into a second table keyed by date. The SQL approach handles both patterns; the NoSQL approach optimizes one pattern and pays for the other.
Uber
Uber migrated from PostgreSQL to a custom Schemaless (MySQL-backed) NoSQL layer for their trip and marketplace data, citing the need for horizontal scalability, flexible schemas for rapidly evolving data models, and multi-datacenter replication. However, they kept PostgreSQL for financial ledger data where ACID transactions and complex queries (reconciliation, reporting) were essential. This polyglot approach -- NoSQL for high-scale operational data, SQL for transactional financial data -- is representative of mature large-scale architectures.
Airbnb
Airbnb uses MySQL (via Vitess sharding) for their core booking and payments data, where ACID transactions and relational queries are critical. They use DynamoDB for session management and user activity tracking (high write volume, simple key-value access patterns). Search uses Elasticsearch. This architecture shows that even at massive scale, SQL remains the right choice for transactional workloads, while NoSQL handles high-volume, pattern-specific workloads more efficiently.
Discord
Discord migrated from MongoDB to Cassandra for message storage, then later from Cassandra to ScyllaDB (a Cassandra-compatible database). Messages have a clear access pattern (get messages by channel_id, sorted by timestamp) that maps perfectly to a wide-column model with channel_id as partition key and message_id as clustering key. The relational alternative (PostgreSQL with channel_id index) would have required sharding much earlier due to the extreme write volume (millions of messages per second). The narrow, predictable access pattern made the NoSQL trade-off clearly favorable.
| Aspect | Description |
|---|---|
| Query Flexibility vs Access Pattern Optimization | SQL databases can answer any query expressible in SQL without schema changes -- the query optimizer handles it. This is invaluable for analytics, reporting, and evolving requirements. NoSQL databases are optimized for specific access patterns and are extremely fast for those patterns, but adding a new access pattern often requires creating a new table, index, or data pipeline. If your access patterns are known and stable, NoSQL wins. If they are evolving or unpredictable, SQL wins. |
| Transactional Integrity vs Write Throughput | SQL databases provide ACID transactions across multiple rows and tables, simplifying correctness for transactional workloads (financial transfers, inventory management, booking systems). This coordination has a throughput cost: distributed SQL transactions require 2PC or Paxos. NoSQL databases (especially Cassandra, DynamoDB) offer higher single-record write throughput by avoiding cross-record coordination, but push transactional logic to the application layer. If you need multi-record atomicity, SQL is dramatically simpler. |
| Schema Rigidity vs Data Quality | SQL's schema-on-write catches data quality issues at insertion time (wrong type, missing required field, violated constraint). Schema evolution (ALTER TABLE) can be slow for large tables. NoSQL's schema-on-read allows rapid iteration and heterogeneous data, but data quality bugs (misspelled field names, wrong types) reach production and are discovered only when reading. Most mature NoSQL applications end up implementing schema validation in application code, recreating what SQL provides natively. |
| Operational Simplicity vs Scaling Ceiling | A single PostgreSQL instance is operationally simple: backups, monitoring, and upgrades are well-understood. It handles most workloads up to ~10TB and ~50K TPS. Beyond that, SQL scaling (sharding, distributed SQL) adds significant operational complexity. NoSQL databases are designed for horizontal scaling from the start, making it operationally straightforward to scale from 3 nodes to 300 nodes. But the simpler operations come at the cost of a less powerful query model. Evaluate your actual scale requirements before choosing based on theoretical scalability. |
Twitter's Timeline: From SQL to Hybrid Architecture
Scenario
Twitter's home timeline (showing tweets from people you follow) is one of the classic system design challenges. In the early days, Twitter generated timelines on-read using SQL: SELECT * FROM tweets WHERE user_id IN (SELECT following_id FROM follows WHERE follower_id = ?) ORDER BY created_at DESC LIMIT 200. As the user base grew to hundreds of millions, this fan-out-on-read query became untenable -- users following thousands of accounts required joining millions of rows per timeline request.
Solution
Twitter moved to a fan-out-on-write architecture using Redis (NoSQL) for timeline caching. When a user tweets, the tweet ID is written to the timeline cache (a Redis sorted set) of each of their followers. Reading a timeline becomes a simple Redis ZRANGE -- no joins, no SQL. However, this approach failed for high-follower accounts (celebrities with 50M+ followers): fan-out-on-write for a single tweet would require 50M Redis writes. The solution was a hybrid: fan-out-on-write for normal users, fan-out-on-read for celebrity accounts, with tweets merged at read time. The core tweet and user data remained in MySQL (partitioned by user_id) for its transactional and query capabilities.
Outcome
The hybrid architecture served 300K+ timeline reads per second with P99 latency under 100ms. MySQL stored the source-of-truth tweet and user data with ACID guarantees; Redis stored precomputed timelines for fast reads. This architecture demonstrates that the SQL vs NoSQL choice is often a false dichotomy -- the best systems use SQL for what SQL does well (transactional storage, complex queries) and NoSQL for what NoSQL does well (precomputed results, high-throughput simple reads). The key lesson is that data modeling and access pattern analysis drive the choice, not a technology preference.
See SQL vs NoSQL in action
Explore system design templates that use sql vs nosql and run traffic simulations to see how these concepts perform under real load.
Browse Templates1A team is building a system with complex, unpredictable query requirements and moderate scale (1TB, 5K TPS). Which database approach is most appropriate?
2What is the primary disadvantage of denormalization in NoSQL data modeling?