CQRS (Command Query Responsibility Segregation)

Understand CQRS, a pattern that separates read and write models into distinct paths, enabling independent optimization, scaling, and evolution of each side.

Overview

In a traditional CRUD architecture, the same data model serves both reads and writes. A single 'Order' table is used to insert new orders, update their status, and query order history. This simplicity works well when read and write patterns are similar, but becomes a bottleneck when they diverge. A system might need complex normalized schemas for write integrity but flat, denormalized views for fast reads. It might need to handle 100x more reads than writes, or vice versa. CQRS addresses this by splitting the architecture into two separate paths.

The command side handles all state mutations. Commands are imperative instructions -- 'PlaceOrder', 'CancelSubscription', 'UpdateShippingAddress' -- that go through validation, business rule enforcement, and persistence. The command model is optimized for write consistency: normalized schemas, strong transactions, and domain logic enforcement. It does not need to support arbitrary queries because it is never read by end users directly.
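
The validate, enforce, persist sequence described above can be sketched as a small command handler. This is a minimal illustration, not a reference implementation: `PlaceOrder` comes from the text, while `OrderWriteModel`, `CommandRejected`, and the in-memory dictionary standing in for a transactional database are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PlaceOrder:
    """Command: an imperative request to change state."""
    order_id: str
    customer_id: str
    quantity: int


class CommandRejected(Exception):
    """Raised when a command fails validation or a business rule."""


@dataclass
class OrderWriteModel:
    """Hypothetical in-memory stand-in for the write-side store."""
    orders: dict = field(default_factory=dict)

    def handle_place_order(self, cmd: PlaceOrder) -> None:
        # Validation: reject malformed input before touching state.
        if cmd.quantity <= 0:
            raise CommandRejected("quantity must be positive")
        # Business rule: an order ID may only be placed once.
        if cmd.order_id in self.orders:
            raise CommandRejected(f"order {cmd.order_id} already exists")
        # Persistence: record the new state.
        self.orders[cmd.order_id] = {
            "customer_id": cmd.customer_id,
            "quantity": cmd.quantity,
            "status": "PLACED",
        }


write_model = OrderWriteModel()
write_model.handle_place_order(PlaceOrder("o-1", "c-42", 3))
```

Note that the handler returns nothing: a command either succeeds or is rejected, and callers who want to see the resulting state go through the query side.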

The query side handles all data retrieval. Read models are purpose-built materialized views, denormalized and pre-computed for specific query patterns. An order dashboard might have a flat read model with all the data for a single-page view pre-joined, while an analytics pipeline reads from a columnar store optimized for aggregations. Each read model can use a different storage technology -- Elasticsearch for full-text search, Redis for real-time dashboards, PostgreSQL for relational queries -- all derived from the same source of truth on the write side.
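
To make the idea of purpose-built read models concrete, the sketch below derives two different views from the same normalized records: a flat, pre-joined dashboard row per order and a pre-aggregated spend total per customer. The sample data, field names, and function names are all hypothetical, and plain dictionaries stand in for whatever storage technology each view would actually use.

```python
# Normalized write-side records (hypothetical sample data).
customers = {"c-42": {"name": "Ada"}}
orders = [
    {"order_id": "o-1", "customer_id": "c-42", "total_cents": 1500},
    {"order_id": "o-2", "customer_id": "c-42", "total_cents": 2500},
]


def build_dashboard_view(orders, customers):
    """Flat, pre-joined rows: one lookup per order, no join at read time."""
    return {
        o["order_id"]: {
            "order_id": o["order_id"],
            "customer_name": customers[o["customer_id"]]["name"],
            "total_cents": o["total_cents"],
        }
        for o in orders
    }


def build_spend_view(orders):
    """Pre-aggregated totals per customer for analytics-style queries."""
    totals = {}
    for o in orders:
        totals[o["customer_id"]] = totals.get(o["customer_id"], 0) + o["total_cents"]
    return totals


dashboard_view = build_dashboard_view(orders, customers)
spend_view = build_spend_view(orders)
```

Each view answers exactly one kind of question cheaply, which is the point: the join and the aggregation are paid once at projection time rather than on every read.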

The two sides are connected by a synchronization mechanism. In the simplest case, the command side writes to a database and the query side reads from read replicas of the same database. In more sophisticated implementations, the command side emits domain events that are consumed by projectors which update dedicated read stores. This event-driven synchronization introduces eventual consistency: after a command is processed, there is a brief window (typically milliseconds to seconds) before the read models reflect the change. This trade-off is acceptable for most use cases but must be carefully considered for scenarios where read-your-own-writes consistency is required.
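
The event-driven variant and its consistency window can be sketched in a few lines. This is an illustration under simplifying assumptions: a `deque` stands in for a message broker, the projector is called by hand rather than running continuously, and the names `place_order`, `run_projector`, and `OrderPlaced` are invented for the example.

```python
from collections import deque

event_log = deque()   # stand-in for a message broker
write_store = {}      # command side's source of truth
read_store = {}       # query side's denormalized view


def place_order(order_id, item, quantity):
    """Command side: persist the change, then emit a domain event."""
    write_store[order_id] = {"item": item, "quantity": quantity}
    event_log.append({"type": "OrderPlaced", "order_id": order_id,
                      "item": item, "quantity": quantity})


def run_projector():
    """Query side: drain pending events into the read store."""
    while event_log:
        event = event_log.popleft()
        if event["type"] == "OrderPlaced":
            read_store[event["order_id"]] = f'{event["quantity"]} x {event["item"]}'


place_order("o-1", "keyboard", 2)
visible_before = "o-1" in read_store   # False: projector has not run yet
run_projector()
visible_after = "o-1" in read_store    # True: read model has caught up
```

The gap between `visible_before` and `visible_after` is the eventual-consistency window; in a real deployment its length depends on broker and projector latency rather than on an explicit function call.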

Key Points
  1. CQRS does not require event sourcing. The simplest CQRS implementation uses the same database with separate read and write repositories. Event sourcing is a common companion pattern but is an independent architectural decision.
  2. Read models are disposable and rebuildable. Because they are derived from the write side's events or data, any read model can be deleted and reconstructed from scratch. This makes schema migrations on the read side trivial.
  3. Each query use case can have its own optimized read model. A product catalog might have a search-optimized Elasticsearch index, a price-comparison Redis cache, and an analytics-optimized ClickHouse table -- all derived from the same write events.
  4. The write side can enforce complex business invariants without worrying about query performance. Domain logic lives exclusively in the command handlers, keeping it focused and testable.
  5. Eventual consistency between write and read models is the primary trade-off. The synchronization lag must be communicated to users through UX patterns like optimistic updates or 'processing' indicators.
  6. CQRS adds architectural complexity that is only justified when read and write workloads have fundamentally different characteristics -- different schemas, different scaling requirements, or different performance SLOs.
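
The first key point, CQRS without event sourcing on a single database, might look like the sketch below. It uses Python's built-in `sqlite3` module with an in-memory database; the `orders` table, the two repository classes, and their method names are assumptions chosen for illustration.

```python
import sqlite3


class OrderWriteRepository:
    """Write path: enforces rules and mutates state."""

    def __init__(self, conn):
        self.conn = conn

    def place(self, order_id, customer, total_cents):
        if total_cents < 0:
            raise ValueError("total must be non-negative")
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO orders (id, customer, total_cents) VALUES (?, ?, ?)",
                (order_id, customer, total_cents),
            )


class OrderReadRepository:
    """Read path: query-only methods shaped for the caller."""

    def __init__(self, conn):
        self.conn = conn

    def totals_by_customer(self):
        rows = self.conn.execute(
            "SELECT customer, SUM(total_cents) FROM orders "
            "GROUP BY customer ORDER BY customer"
        )
        return dict(rows.fetchall())


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id TEXT PRIMARY KEY, customer TEXT, total_cents INTEGER)"
)
writes, reads = OrderWriteRepository(conn), OrderReadRepository(conn)
writes.place("o-1", "ada", 1500)
writes.place("o-2", "ada", 2500)
writes.place("o-3", "bob", 700)
totals = reads.totals_by_customer()
```

Because both repositories share one connection here, reads are immediately consistent with writes. The separation is purely in the code, which keeps the door open to moving the read repository onto a replica or a dedicated store later.
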
Simple Example

The Library Card Catalog Analogy

A library has two systems: the back-office system where librarians catalog, acquire, and process books (the write model), and the public card catalog where patrons search for books (the read model). The card catalog is organized for quick lookups -- alphabetically by author, by subject, by title -- in ways that would make no sense for the acquisition workflow. When a new book arrives, the librarian processes it in the back office (write), and later the card catalog is updated (eventual consistency). If the card catalog gets damaged, it can be rebuilt entirely from the back-office records. Different branches might organize their card catalogs differently (multiple read models) while sharing the same central acquisition system (single write model).
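
The "rebuild the card catalog from back-office records" step can be sketched as replaying a history of events through a projection function. The event names `BookAcquired` and `BookWithdrawn` and the sample titles are hypothetical, chosen only to match the analogy.

```python
def project(events):
    """Fold the full event history into a title -> status catalog."""
    catalog = {}
    for event in events:
        if event["type"] == "BookAcquired":
            catalog[event["title"]] = "available"
        elif event["type"] == "BookWithdrawn":
            catalog.pop(event["title"], None)
    return catalog


# Hypothetical back-office history (the write side's record).
history = [
    {"type": "BookAcquired", "title": "Dune"},
    {"type": "BookAcquired", "title": "Emma"},
    {"type": "BookWithdrawn", "title": "Emma"},
]

catalog = project(history)
catalog.clear()              # simulate a damaged card catalog
rebuilt = project(history)   # replay from scratch
```

Since `project` is a pure function of the history, a branch that wants a differently organized catalog only needs a different projection function over the same events.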

Real-World Examples

Microsoft (Azure DevOps)

Azure DevOps uses CQRS extensively across its services. The work item tracking system uses a normalized SQL write model for enforcing workflow rules and a denormalized read model optimized for complex queries, filtering, and sorting across millions of work items. Read model updates happen asynchronously, with a typical lag of 200-500ms. This architecture allows the query side to scale independently to handle the 10:1 read-to-write ratio.

Booking.com

Booking.com applies CQRS principles to its hotel availability system. The write model processes 1 million+ booking transactions per day with strong consistency guarantees. The read model serves hotel search results from pre-computed availability caches optimized for geographic and date-range queries. The read model tolerates brief staleness (a room might appear available for a few seconds after it is booked), which is resolved at checkout through write-side validation.
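
The pattern of tolerating a stale read and catching the conflict at checkout can be sketched generically. This is not Booking.com's implementation: the names `search`, `checkout`, `refresh_cache`, and `RoomUnavailable` are invented, and in-memory structures stand in for the availability cache and the transactional booking store.

```python
class RoomUnavailable(Exception):
    """Raised by the write side when a room is already booked."""


booked = set()                            # write side: authoritative state
availability_cache = {"room-101": True}   # read side: may be stale


def search(room_id):
    """Query side: fast answer from the cache, possibly out of date."""
    return availability_cache.get(room_id, False)


def checkout(room_id):
    """Command side: re-validate against authoritative state before booking."""
    if room_id in booked:
        raise RoomUnavailable(room_id)
    booked.add(room_id)


def refresh_cache():
    """Projection step: bring the cache back in line with the write side."""
    for room_id in availability_cache:
        availability_cache[room_id] = room_id not in booked


checkout("room-101")
stale_result = search("room-101")   # cache has not been refreshed yet
try:
    checkout("room-101")
    second_booking_succeeded = True
except RoomUnavailable:
    second_booking_succeeded = False
refresh_cache()
fresh_result = search("room-101")
```

The stale search result is harmless because the write side never trusts the cache: the second checkout is rejected even though the room still appeared available.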

Twitter (now X)

Twitter's timeline architecture is a classic CQRS example. The write model processes incoming tweets and stores them in a normalized store. The read model pre-computes materialized timelines for each user using a fan-out-on-write strategy, stored in Redis for sub-millisecond read latency. Celebrity accounts with millions of followers use fan-out-on-read instead to avoid the write amplification of materializing millions of timelines for a single tweet.
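
The hybrid of fan-out-on-write and fan-out-on-read can be illustrated with a toy model. This is a sketch of the general technique, not Twitter's code: the `FANOUT_LIMIT` threshold, the account names, and the in-memory dictionaries (standing in for the tweet store and the Redis timelines) are all assumptions.

```python
from collections import defaultdict

FANOUT_LIMIT = 2  # hypothetical threshold; real systems use far larger values

followers = {"alice": ["carol"], "celeb": ["carol", "dave", "erin"]}
following = {"carol": ["alice", "celeb"], "dave": ["celeb"], "erin": ["celeb"]}
tweets_by_author = defaultdict(list)   # write model
timelines = defaultdict(list)          # materialized read model


def post_tweet(author, text, ts):
    """Store the tweet; push it to followers only for small accounts."""
    tweet = (ts, author, text)
    tweets_by_author[author].append(tweet)
    if len(followers.get(author, [])) <= FANOUT_LIMIT:
        for follower in followers.get(author, []):
            timelines[follower].append(tweet)   # fan-out-on-write


def read_timeline(user):
    """Merge the materialized timeline with tweets from large accounts."""
    merged = list(timelines[user])
    for author in following.get(user, []):
        if len(followers.get(author, [])) > FANOUT_LIMIT:
            merged.extend(tweets_by_author[author])   # fan-out-on-read
    return sorted(merged, reverse=True)   # newest first


post_tweet("alice", "hello", 1)
post_tweet("celeb", "big news", 2)
carol_timeline = read_timeline("carol")
```

The tweet from `celeb` is written once and merged at read time, so posting costs one write instead of one per follower, at the price of extra work on each timeline read.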

Trade-Offs
Aspect | Description
Read Performance vs Consistency | CQRS enables highly optimized, denormalized read models that deliver sub-millisecond query times. The trade-off is eventual consistency -- read models may lag behind the write model by milliseconds to seconds. Systems requiring strict read-your-own-writes semantics need additional mechanisms like version-stamped reads or synchronous projections.
Independent Scaling vs Operational Overhead | Read and write sides can scale independently -- critical when read traffic is 10-100x write traffic. However, each read model is an additional system to deploy, monitor, and maintain. A system with five specialized read models has five additional failure points and synchronization pipelines.
Schema Flexibility vs Synchronization Complexity | Read models can be redesigned and rebuilt without touching the write model, enabling rapid iteration on query patterns. However, the synchronization logic (event handlers, projectors) must be maintained for every read model and kept in sync with write-side schema changes.
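
One way to realize the version-stamped reads mentioned under Read Performance vs Consistency is sketched below. It assumes each command returns a monotonically increasing version that the client sends back on its next read; the `VersionedStore` class and the fallback to the write model are illustrative choices (a real system might instead wait briefly or retry).

```python
class VersionedStore:
    """Toy write model and read model with a shared version counter."""

    def __init__(self):
        self.version = 0        # write-side version counter
        self.write_data = {}
        self.read_version = 0   # last version the projection has applied
        self.read_data = {}

    def handle_command(self, key, value):
        """Apply a write and return its version as a token for the client."""
        self.version += 1
        self.write_data[key] = value
        return self.version

    def project(self):
        """Bring the read model up to date with the write model."""
        self.read_data = dict(self.write_data)
        self.read_version = self.version

    def read(self, key, min_version=0):
        """Serve from the read model only if it has caught up to min_version."""
        if self.read_version >= min_version:
            return self.read_data.get(key), "read_model"
        return self.write_data.get(key), "write_model"


store = VersionedStore()
token = store.handle_command("address", "12 Elm St")
before = store.read("address", min_version=token)   # read model is behind
store.project()
after = store.read("address", min_version=token)    # read model has caught up
```

Readers that pass no token keep the fast path, so only the user who just wrote pays for the stronger guarantee.
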
Case Study

E-Commerce Platform Scaling with CQRS

Scenario

A mid-size e-commerce platform with 5 million daily active users was struggling with database performance. The product catalog had 50 million items and received 50,000 reads per second but only 500 writes per second, a 100:1 read-to-write ratio. Complex search queries (filter by price, category, brand, rating, availability) required expensive JOINs across 8 normalized tables, resulting in p99 latencies exceeding 2 seconds. Adding read replicas spread the load but did not reduce per-query cost, because each search still ran the same expensive JOINs against the normalized schema.

Solution

The team implemented CQRS by separating the write model (PostgreSQL with fully normalized product data, strict validation, and ACID transactions) from purpose-built read models. Product search was moved to Elasticsearch with documents pre-denormalized for the most common filter combinations. A Redis-based read model served product detail pages with sub-millisecond latency. An event pipeline using Kafka connected the write model to both read models, with projectors transforming normalized write events into denormalized read documents.
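
A projector of the kind described here might look like the sketch below, which turns one normalized write event into a search document and a detail-page cache entry. The event shape, field names, and key format are hypothetical since the case study does not specify them, and plain dictionaries stand in for Elasticsearch and Redis (no Kafka consumer is shown).

```python
import json

search_index = {}   # stand-in for an Elasticsearch index
detail_cache = {}   # stand-in for a Redis key-value store


def project_product_updated(event):
    """Turn one normalized write event into two denormalized read documents."""
    product = event["product"]
    # Search document: only the fields the filters need, already joined.
    search_index[product["id"]] = {
        "name": product["name"],
        "brand": event["brand"]["name"],
        "category": event["category"]["name"],
        "price_cents": product["price_cents"],
        "in_stock": event["inventory"]["quantity"] > 0,
    }
    # Detail-page entry: one serialized blob per product, fetched by key.
    detail_cache[f"product:{product['id']}"] = json.dumps({
        "name": product["name"],
        "brand": event["brand"]["name"],
        "price_cents": product["price_cents"],
        "quantity": event["inventory"]["quantity"],
    })


# Hypothetical event shape; the source does not specify one.
event = {
    "product": {"id": "p-1", "name": "Trail Shoe", "price_cents": 8900},
    "brand": {"name": "Acme"},
    "category": {"name": "Footwear"},
    "inventory": {"quantity": 12},
}
project_product_updated(event)
```

Because the projector overwrites each document by product ID, replaying the same event twice leaves both read models unchanged, which simplifies recovery after a consumer restart.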

Outcome

Search query p99 latency dropped from 2 seconds to 50ms. Product detail page loads decreased from 150ms to 8ms. The write model, freed from serving read traffic, could handle 3x more catalog updates without scaling. The Elasticsearch read model independently scaled to handle Black Friday traffic (10x normal) while the write model remained at steady state. Total infrastructure cost increased 30% but revenue per session improved 15% due to faster page loads, more than offsetting the cost.

Common Mistakes
  • ⚠ Applying CQRS everywhere instead of selectively. CQRS adds significant complexity and is only justified when read and write models have fundamentally different requirements. Simple CRUD domains with balanced read/write ratios should use a single model.
  • ⚠ Assuming CQRS requires event sourcing. Many successful CQRS implementations use simple database change events or even polling-based synchronization. Adding event sourcing on top of CQRS compounds complexity and should be a separate, justified decision.
  • ⚠ Not handling the eventual consistency gap in the user experience. After a user submits a form (write), immediately redirecting to a page that reads from the eventually-consistent read model can show stale data. Use optimistic UI updates or version-aware reads to bridge the gap.
  • ⚠ Building a single, generic read model instead of purpose-built models per use case. The power of CQRS comes from having read models tailored to specific query patterns. A single denormalized table that tries to serve all queries defeats the purpose.
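
The polling-based synchronization mentioned in the second mistake can be sketched without any event-sourcing machinery. The example assumes a hypothetical `changes` table with an auto-incrementing sequence column that the projector uses as a cursor; it uses Python's built-in `sqlite3` module, and a real projector would call `poll_once` on a timer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE changes ("
    "seq INTEGER PRIMARY KEY AUTOINCREMENT, sku TEXT, price_cents INTEGER)"
)

read_model = {}                 # sku -> latest price
state = {"last_seen_seq": 0}    # projector's cursor into the change table


def record_change(sku, price_cents):
    """Write side: append a row describing the change."""
    with conn:
        conn.execute(
            "INSERT INTO changes (sku, price_cents) VALUES (?, ?)",
            (sku, price_cents),
        )


def poll_once():
    """Projector: apply every change newer than the cursor, in order."""
    rows = conn.execute(
        "SELECT seq, sku, price_cents FROM changes WHERE seq > ? ORDER BY seq",
        (state["last_seen_seq"],),
    ).fetchall()
    for seq, sku, price_cents in rows:
        read_model[sku] = price_cents
        state["last_seen_seq"] = seq
    return len(rows)


record_change("sku-a", 100)
record_change("sku-a", 120)
record_change("sku-b", 50)
applied = poll_once()        # picks up all three pending changes
applied_again = poll_once()  # nothing new since the last poll
```

The polling interval sets an upper bound on staleness, which makes this approach easy to reason about when sub-second freshness is not required.
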
Test Your Understanding

1. What is the primary motivation for applying CQRS to a system?

2. What happens to stale read models in a CQRS architecture if they become corrupted or need a schema change?
