Medium · 10 components · Interview: High

Online Code Editor — Container + OT Collaboration

Industry-standard online code editor architecture that combines container-per-session isolation (Firecracker/Docker) with Operational Transform for real-time collaborative editing. A REST service handles file operations and a separate WebSocket service handles OT collaboration. Execution runs through a Kafka-based pipeline backed by a container worker pool.

Tags: Firecracker, OT, WebSocket, Kafka, Real-time, Code Editor

Problem Statement

The container-per-session with Operational Transform approach represents the industry-standard architecture for production online code editors. It solves the two fundamental problems with the naive architecture: the lack of execution isolation and the absence of real-time collaboration.

The key insight is separating two fundamentally different concerns: code execution isolation and collaborative editing. Container-per-session provides kernel-level isolation by running each user's code in a dedicated Firecracker microVM, or a Docker container sandboxed with gVisor, under strict resource limits (1 vCPU, 128MB RAM, restricted network). This removes the shared process pool's security risks: a fork bomb or malicious program cannot affect other users because each session's guest kernel enforces isolation independently of the host.

Operational Transform (OT) enables real-time collaborative editing where multiple users edit the same file simultaneously. Each keystroke generates an OT operation (insert, delete, retain) sent to CollabService via WebSocket. CollabService transforms the operation against any concurrent edits from other users using the OT algorithm, applies the transformed operation to the canonical document state in DocCache (Redis), and broadcasts it to all connected session participants. The transformation ensures all clients converge to the same document state despite concurrent, conflicting edits.

The architecture separates REST operations from real-time WebSocket operations. EditorService handles project CRUD, file save/load, and execution triggers via standard REST endpoints. CollabService handles WebSocket connections for OT operations. These services have fundamentally different scaling characteristics: EditorService scales with request volume (throughput-bound), while CollabService scales with concurrent connections (connection-bound). Separating them allows independent scaling — 8 EditorService pods for 80K RPS versus 10 CollabService pods for 100K concurrent WebSocket connections.

Kafka decouples execution requests from container lifecycle management. When a user clicks Run, EditorService publishes an execution request to the exec-request topic. ContainerWorker consumes the request, allocates a container, mounts project files from S3, executes the code, and streams stdout/stderr back to the client via CollabService's WebSocket connection. If ContainerWorker is at capacity, requests queue in Kafka with backpressure rather than failing immediately.

The primary trade-off is operational complexity: 10 components instead of 5, Kafka cluster management, Redis cluster monitoring, WebSocket connection management, and container orchestration. The OT algorithm also requires CollabService as a central coordination point — if CollabService goes down, collaborative editing stops (though solo editing can continue with local state). The V2 CRDT variant addresses this by replacing OT with a conflict-free replicated data type that does not require central coordination.

Interviewers expect candidates to explain why container-per-session provides better isolation than shared processes, describe the OT algorithm for conflict resolution, discuss Kafka's role in decoupling execution from container management, and reason about the separation of REST and WebSocket services.

Architecture Overview

The container-per-session with OT architecture uses ten main server-side components plus the browser client (EditorClient), organized into four layers: edge (EditorClient, ApiGateway, MainLB), application services (EditorService, CollabService), data stores (SessionCache, DocCache, ProjectDB, FileStorage), and execution pipeline (ExecStream/Kafka, ContainerWorker).

The REST path handles file operations and execution triggers. Requests flow through ApiGateway (authentication, rate limiting at 60K RPS) to MainLB (AWS ALB, content-based routing) to EditorService (8 pods, 80 threads each). EditorService reads/writes project metadata from ProjectDB (PostgreSQL), file content from FileStorage (S3), and session state from SessionCache (Redis). For code execution, EditorService publishes a request to ExecStream (Kafka) and returns 202 Accepted immediately.

The WebSocket path handles real-time collaboration. EditorClient establishes a WebSocket connection to CollabService (10 pods, 100 threads each, roughly 10K connections per pod for 100K concurrent connections). Each keystroke generates an OT operation sent via WebSocket. CollabService reads the canonical document state from DocCache (Redis), transforms the incoming operation against concurrent edits, writes the transformed state back to DocCache, and broadcasts the result to all session participants. Round-trip latency target: under 100ms.
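The read, transform, write, broadcast cycle described above can be sketched as follows. This is a minimal in-memory illustration, not the CollabService implementation: it handles inserts only, keeps an unbounded log, and the names `InsertOp`, `DocState`, and `receive` are hypothetical. In the architecture described here the state would live in DocCache (Redis) rather than in a Python object.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class InsertOp:
    pos: int
    text: str
    base_revision: int  # document revision the client was editing against


@dataclass
class DocState:
    content: str = ""
    revision: int = 0
    # op_log[i] is the operation that took the document from revision i to i + 1
    op_log: List[InsertOp] = field(default_factory=list)

    def receive(self, op: InsertOp) -> InsertOp:
        """Transform, apply, and return the operation to broadcast."""
        if not 0 <= op.base_revision <= self.revision:
            raise ValueError("client revision is outside the known history")
        pos = op.pos
        # Transform against every operation applied since the client's revision.
        for applied in self.op_log[op.base_revision:]:
            if applied.pos <= pos:  # assumed tie-break: earlier-applied op stays put
                pos += len(applied.text)
        transformed = InsertOp(pos, op.text, self.revision)
        self.content = self.content[:pos] + op.text + self.content[pos:]
        self.op_log.append(transformed)
        self.revision += 1
        return transformed


doc = DocState(content="abc")
doc.receive(InsertOp(1, "X", base_revision=0))            # user A
broadcast = doc.receive(InsertOp(2, "Y", base_revision=0))  # user B, concurrent with A
```

User B intended to insert between `b` and `c`; after transformation the insert still lands there even though user A's edit was applied first.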

The execution pipeline is asynchronous. ExecStream (Kafka, 16 partitions) carries execution requests from EditorService to ContainerWorker (50 workers). ContainerWorker spins up Firecracker microVMs with strict resource limits, mounts project files from FileStorage (S3), executes code, and streams stdout/stderr back to the client via CollabService's WebSocket connection. Containers are torn down after execution completion or 5-minute idle timeout.

Data persistence uses a three-tier strategy. ProjectDB (PostgreSQL) stores project metadata and file version history with strong consistency. FileStorage (S3) stores file content with 99.999999999% durability and built-in versioning. SessionCache and DocCache (Redis) provide sub-millisecond access to active session state and document state respectively.

Horizontal scaling is independent per component. EditorService scales with REST request volume. CollabService scales with concurrent WebSocket connections. Kafka scales via partition count. ContainerWorker scales with concurrent execution demand. Redis scales by adding nodes to the cluster.

Key Design Decisions

Container-per-Session Isolation

Choice

Firecracker microVM or Docker+gVisor per coding session instead of shared process pool

Rationale

Untrusted user code can do anything: fork bombs, network attacks, filesystem access, kernel exploits. Container isolation (Firecracker microVM) provides a dedicated Linux kernel per session with strict CPU/memory/network limits enforced at the hypervisor level. A fork bomb in one session cannot affect any other session. The trade-off is higher per-session cost (128MB RAM per microVM) and cold start latency (1-2 seconds for container spin-up versus near-instant for pre-warmed shared processes).

Operational Transform for Collaboration

Choice

Server-authoritative OT algorithm instead of CRDT or lock-based editing

Rationale

OT requires a central server (CollabService) to order operations but is simpler to implement correctly for text editing than CRDT. Each operation is transformed against concurrent edits using a well-understood mathematical framework (insert/delete composition). Google Docs uses OT at massive scale. CRDTs (V2 variant) are decentralized but have higher per-character memory overhead (1.5x document size for metadata) and more complex garbage collection. At 100K concurrent sessions, the server cost of OT is acceptable.

Separate EditorService and CollabService

Choice

Independent microservices for REST operations and WebSocket collaboration

Rationale

REST and WebSocket have fundamentally different resource profiles. EditorService handles short-lived request/response cycles (file save in roughly 100-200ms, project list in 20ms) and scales with throughput. CollabService maintains long-lived WebSocket connections (minutes to hours) and scales with connection count. Combining them means a spike in REST traffic (project imports) could starve WebSocket connections, or a burst of OT operations could delay file saves.

Kafka for Execution Pipeline

Choice

Async execution requests via Kafka instead of synchronous RPC to ContainerWorker

Rationale

Container spin-up takes 1-2 seconds and can fail (resource exhaustion, image pull timeout). Kafka decouples the execution request from container lifecycle. If ContainerWorker is at capacity, requests queue in Kafka with backpressure rather than failing with 503. Kafka also provides replay for debugging execution issues and natural ordering per project (partitioned by project_id).
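Per-project ordering follows from every request for a project landing on the same partition. The sketch below shows the idea with a stable hash over the key. It is an illustration only: Kafka's default Java partitioner uses a murmur2 hash of the key bytes, and CRC32 stands in for it here. The function name `partition_for` is hypothetical, and the partition count of 16 is the figure quoted for ExecStream.

```python
import zlib

NUM_PARTITIONS = 16  # ExecStream partition count from the architecture overview


def partition_for(project_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a project to a partition with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(project_id.encode("utf-8")) % num_partitions


# Two runs for the same project always share a partition, so a single
# consumer sees them in publish order.
first = partition_for("project-123")
second = partition_for("project-123")
```

Ordering is only guaranteed within a partition, so requests for different projects may still be processed in any relative order.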

S3 for File Storage

Choice

Amazon S3 instead of PostgreSQL TEXT columns or EBS volumes

Rationale

Code files range from tiny (100 bytes) to large (10MB+ for dependencies). S3 handles arbitrary file sizes efficiently with 99.999999999% durability. Built-in versioning provides file history without custom implementation. PostgreSQL TEXT columns (V0 approach) bloat the database and complicate backups. EBS volumes at 100K sessions would exceed AWS account limits.

Redis for Document State (DocCache)

Choice

Redis for canonical OT document state instead of PostgreSQL or in-memory

Rationale

The OT server needs fast read/write access to the canonical document. Each OT operation requires read + transform + write — this must complete in under 10ms for the 100ms round-trip target. Redis provides sub-millisecond operations. In-memory state on CollabService would be lost on pod restart. PostgreSQL adds 10-35ms per operation, blowing the latency budget.

Scale & Performance

Target RPS

~50K REST + 200K WS msg/sec peak

Latency (p99)

<100ms OT round-trip, <2s execution cold start, <200ms file save

Storage

~5 TB/year (S3 file content + versions)

Availability

99.9% (multi-AZ, no multi-region)

Time & Space Complexity

Operation	Time	Space	Notes
OT transform (CollabService)	O(K), where K is the number of concurrent operations since the client's revision	O(K) for the transform history in the current revision window	Typical K is 1-3, since users rarely submit edits within the same round-trip window. Worst case K = 100 (batch paste from multiple users) is still under 1ms. The bottleneck is the Redis round-trip, not transform computation.
Container allocation (ContainerWorker)	O(1): pick a pre-warmed VM from the pool	O(1): single VM assignment	The pre-warmed pool provides O(1) allocation. On a cold start (no pre-warmed VM available) the cost is dominated by runtime startup time, 1-5 seconds depending on language.
File save (EditorService → S3)	O(S): one S3 PUT of S bytes of file content	O(S), where S is the file content size	Files under 5MB go up as a single-part upload, so the request count per save is constant. ProjectDB version INSERT is O(log N) on the index. Total latency: ~100-200ms.
Kafka produce (execution request)	O(1) — append to partition log	O(1) — single message (~1KB)	Kafka append is O(1) amortized. Partition selection by project_id hash is O(1). Publish latency: ~5ms.
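The O(1) container allocation row can be illustrated with a per-language queue of pre-warmed VM IDs. This is a sketch under assumptions: `WarmPool`, `allocate`, and the VM ID strings are hypothetical, and the cold-start path is a placeholder for booting a real microVM.

```python
from collections import deque
from typing import Dict, List, Tuple


class WarmPool:
    def __init__(self, warm_vms: Dict[str, List[str]]):
        # One FIFO queue of ready VM IDs per language.
        self._pools = {lang: deque(vms) for lang, vms in warm_vms.items()}

    def allocate(self, language: str) -> Tuple[str, str]:
        """Return (vm_id, 'warm' | 'cold'). Popping from a deque is O(1)."""
        pool = self._pools.get(language)
        if pool:  # non-empty queue: hand out a pre-warmed VM
            return pool.popleft(), "warm"
        return self._cold_start(language), "cold"

    def _cold_start(self, language: str) -> str:
        # Placeholder: a real worker would boot a microVM here, which is
        # where the multi-second cold start latency comes from.
        return f"cold-{language}"


pool = WarmPool({"python": ["vm-1", "vm-2"]})
first = pool.allocate("python")
second = pool.allocate("python")
third = pool.allocate("python")   # pool exhausted, falls back to cold start
other = pool.allocate("haskell")  # no pool kept for this language
```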

Database Schema (HLD)

projects

Project metadata stored in PostgreSQL. Write-once on creation, read on every project open. Indexed on owner_id for dashboard queries. Partitioned by project_id hash across 8 shards.

project_id UUID PK
name TEXT
language TEXT
owner_id UUID FK
created_at TIMESTAMPTZ

Indexes: PK on project_id, idx_projects_owner ON (owner_id, created_at)

Small table, low write volume. Read replicas serve dashboard queries. Not a performance concern.

files

File metadata tracking path, version number, and S3 object key. Content is stored in S3, not in this table. Every file save inserts a new row (incremented version + new s3_key). Indexed on (project_id, path) for file tree queries.

file_id UUID PK
project_id UUID FK
path TEXT
version INTEGER (auto-incremented on save)
s3_key TEXT (S3 object key for content)
updated_at TIMESTAMPTZ

Indexes: PK on file_id, idx_files_project_path ON (project_id, path), UNIQUE idx_files_version ON (project_id, path, version) for history queries

Write-heavy during active editing (auto-save every 5 seconds). Each save creates a new version row (not an update), enabling version history. Growth proportional to total save count across all projects.
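A row-per-save history can be sketched as below. The example uses SQLite in memory as a stand-in for PostgreSQL, an integer key instead of a UUID, and hypothetical S3 key names. `save_file` and `latest` are illustrative helpers, not part of the source design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE files (
        file_id    INTEGER PRIMARY KEY,  -- UUID in the schema above
        project_id TEXT NOT NULL,
        path       TEXT NOT NULL,
        version    INTEGER NOT NULL,
        s3_key     TEXT NOT NULL,
        UNIQUE (project_id, path, version)
    )
""")


def save_file(conn, project_id, path, s3_key):
    """Insert a new version row instead of updating the previous one."""
    row = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM files"
        " WHERE project_id = ? AND path = ?",
        (project_id, path),
    ).fetchone()
    next_version = row[0] + 1
    conn.execute(
        "INSERT INTO files (project_id, path, version, s3_key)"
        " VALUES (?, ?, ?, ?)",
        (project_id, path, next_version, s3_key),
    )
    return next_version


def latest(conn, project_id, path):
    """Return (version, s3_key) of the newest save, or None."""
    return conn.execute(
        "SELECT version, s3_key FROM files WHERE project_id = ? AND path = ?"
        " ORDER BY version DESC LIMIT 1",
        (project_id, path),
    ).fetchone()


v1 = save_file(conn, "p1", "main.py", "p1/main.py/v1")
v2 = save_file(conn, "p1", "main.py", "p1/main.py/v2")
```

With concurrent writers, two saves could compute the same next version. The unique constraint on (project_id, path, version) would reject the second insert, and the caller would retry.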

sessions

Active session records linking users to containers. Created when a user opens a project, updated on container lifecycle events. Used for session recovery and container cleanup.

session_id UUID PK
project_id UUID FK
user_id UUID FK
container_id TEXT (Firecracker VM ID)
status TEXT (active | terminated)
created_at TIMESTAMPTZ
last_activity TIMESTAMPTZ

Indexes: PK on session_id, idx_sessions_project ON (project_id, status) WHERE status = 'active', idx_sessions_container ON (container_id) WHERE status = 'active'

Moderate write volume: created on session start, updated on activity, marked terminated on session end. Used by cleanup jobs to terminate orphaned containers (sessions with no activity for > 5 minutes).
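The orphan check used by the cleanup job amounts to a filter on status and last activity. A minimal sketch, assuming sessions have already been loaded into memory; `Session` and `find_orphans` are hypothetical names, and a real job would run the equivalent query against the sessions table.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List

IDLE_TIMEOUT = timedelta(minutes=5)  # idle timeout from the text


@dataclass
class Session:
    session_id: str
    container_id: str
    status: str  # "active" or "terminated"
    last_activity: datetime


def find_orphans(sessions: List[Session], now: datetime) -> List[Session]:
    """Active sessions whose last activity is older than the idle timeout."""
    return [
        s for s in sessions
        if s.status == "active" and now - s.last_activity > IDLE_TIMEOUT
    ]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
sessions = [
    Session("s1", "vm-1", "active", now - timedelta(minutes=1)),
    Session("s2", "vm-2", "active", now - timedelta(minutes=9)),
    Session("s3", "vm-3", "terminated", now - timedelta(hours=2)),
]
orphans = find_orphans(sessions, now)
```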

collab_documents (DocCache/Redis)

Canonical OT document state stored in Redis for sub-millisecond access. Each key holds document text, revision number, and operation log. Not a PostgreSQL table — included for schema completeness.

key: doc:{project_id}:{file_path}
content TEXT (full document text)
revision INTEGER (current version number)
op_log ARRAY (last 1000 OT operations for catch-up)
TTL: 600 seconds

95% hit rate for actively edited documents. Documents not edited for 10 minutes are flushed to S3 and evicted. Sub-millisecond read/write latency is critical for the 100ms OT round-trip target.
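The bounded op_log implies a catch-up window: a client that is at most 1000 operations behind can replay the missing operations, while an older client has to reload the full content. The sketch below illustrates that window with an in-memory deque. `OpLog` and `ops_since` are hypothetical names, and the reload-on-miss behaviour is an assumption about how a late joiner would be handled.

```python
from collections import deque
from typing import List, Optional


class OpLog:
    def __init__(self, max_ops: int = 1000):
        self._ops = deque(maxlen=max_ops)  # oldest entries drop off automatically
        self.revision = 0                  # total operations ever appended

    def append(self, op: str) -> None:
        self._ops.append(op)
        self.revision += 1

    def ops_since(self, client_revision: int) -> Optional[List[str]]:
        """Operations a client needs to reach the current revision.

        Returns None when the client is older than the retained window and
        must fetch the full document content instead.
        """
        if client_revision > self.revision:
            raise ValueError("client is ahead of the server")
        oldest_retained = self.revision - len(self._ops)
        if client_revision < oldest_retained:
            return None
        return list(self._ops)[client_revision - oldest_retained:]


log = OpLog(max_ops=3)  # small window to keep the example readable
for i in range(5):
    log.append(f"op{i}")
```

Here the log retains `op2` through `op4`, so a client at revision 3 receives `op3` and `op4`, and a client at revision 1 gets `None`.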

Event Contracts

exec_request (topic: exec-request)

Execution requests published by EditorService when a user clicks Run. Consumed by ContainerWorker for container allocation and code execution. Partitioned by project_id for per-project ordering.

Key Schema

project_id (string)

Value Schema

{ project_id: string, language: string, session_id: string, file_path: string, timestamp: number }
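One way to represent this contract in code is a small typed record with explicit encode and decode steps. The sketch assumes JSON as the wire format and epoch milliseconds for `timestamp`; neither is stated in the contract above. `ExecRequest`, `encode`, and `decode` are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class ExecRequest:
    project_id: str
    language: str
    session_id: str
    file_path: str
    timestamp: int  # assumed to be epoch milliseconds


def encode(req: ExecRequest) -> Tuple[bytes, bytes]:
    """Return (key, value) bytes; the key is project_id, as in the key schema."""
    key = req.project_id.encode("utf-8")
    value = json.dumps(asdict(req)).encode("utf-8")
    return key, value


def decode(value: bytes) -> ExecRequest:
    """Rebuild the record; unknown or missing fields raise TypeError."""
    return ExecRequest(**json.loads(value))


req = ExecRequest("p1", "python", "s1", "src/main.py", 1_700_000_000_000)
key, value = encode(req)
roundtrip = decode(value)
```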

Solution Comparison

Variant	Tier	Latency	Throughput	Cost	Complexity	Reliability
V0: Naive (Shared Process Pool)	T1	~50ms exec start, 0-2s output delivery	~5K RPS total	$1,100/month	Low	99% (single DB)
V1: Container + OT (Firecracker + Operational Transform)	T2	<2s cold start, <100ms collab	~50K RPS peak	$3,500/month	Medium	99.9% (multi-AZ)
V2: Container + CRDT (Firecracker + Yjs/Automerge)	T3	<2s cold start, <100ms collab	~50K RPS + 200K WS msg/sec	$15K-20K/month	Very High	99.95% (multi-AZ, CRDT resilience)

This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.

Frequently Asked Questions

Why is OT preferred over CRDT for the standard variant?

OT is simpler to implement correctly and has lower per-document memory overhead. OT stores the document text plus a bounded operation log (the last 1000 operations, kept for late-joining catch-up). A CRDT stores per-character metadata (client ID, sequence number, tombstones) that adds about 1.5x the document size on top of the text itself. At 100K concurrent documents this difference is significant: the OT DocCache needs approximately 5GB versus roughly 12.5GB for CRDT. Google Docs has shown that OT works at massive scale. The trade-off is that OT requires CollabService as a central coordination point: if it fails, collaboration stops. The CRDT variant (V2) removes this single point of failure.
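The two memory figures are related by simple arithmetic, shown here as a sketch. The per-document footprint of 50KB is not stated in the text; it is derived from 5GB spread over 100K documents, using decimal units.

```python
DOCS = 100_000               # concurrent documents
OT_TOTAL_GB = 5.0            # OT DocCache estimate from the text
CRDT_METADATA_FACTOR = 1.5   # metadata added on top of the document itself

# Implied average footprint per document under OT (decimal units).
per_doc_kb = OT_TOTAL_GB * 1_000_000 / DOCS

# CRDT keeps the same content plus 1.5x as much metadata.
crdt_total_gb = OT_TOTAL_GB * (1 + CRDT_METADATA_FACTOR)
```

This gives 50KB per document and 12.5GB in total for the CRDT variant, matching the figures quoted above.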

How does the OT algorithm handle concurrent edits from multiple users?

Each OT operation carries a revision number (the document version the client was editing against). When CollabService receives an operation, it checks whether any other operations have been applied since that revision. If so, it transforms the incoming operation against each intervening operation using the OT transformation function. For example, if User A inserts at position 5 and User B's deletion of one character at position 3 is applied first, User A's insert is transformed to position 4 (the deletion ahead of it shifts every later position left by 1). The transformed operation is applied to DocCache and broadcast to all clients.
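The position shift in this example can be written out as a small transformation function. This is a minimal sketch covering only the case of transforming an incoming insert. The types `Insert` and `Delete`, the helper `apply`, and the tie-break for equal positions are assumptions made for illustration, and a complete OT implementation also needs the delete-side transforms.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str


@dataclass(frozen=True)
class Delete:
    pos: int
    length: int = 1


Op = Union[Insert, Delete]


def transform_insert(op: Insert, applied: Op) -> Insert:
    """Rewrite `op` so it can be applied after `applied` has already run."""
    if isinstance(applied, Insert):
        # Assumed tie-break: the already-applied insert keeps its position.
        if applied.pos <= op.pos:
            return Insert(op.pos + len(applied.text), op.text)
        return op
    if applied.pos >= op.pos:
        return op  # deletion starts at or after the insert point: no shift
    # Characters removed before the insert point shift it left; an insert
    # that fell inside the deleted range lands at the start of that range.
    removed_before = min(applied.length, op.pos - applied.pos)
    return Insert(op.pos - removed_before, op.text)


def apply(doc: str, op: Op) -> str:
    if isinstance(op, Insert):
        return doc[:op.pos] + op.text + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + op.length:]


# The example from the text: A inserts at 5, B's delete at 3 is applied first.
a, b = Insert(5, "X"), Delete(3)
server_doc = apply(apply("abcdefgh", b), transform_insert(a, b))
```

Applying A first and then B's untransformed delete yields the same string, which is the convergence property the transformation is meant to preserve.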

What happens when CollabService goes down?

Collaborative editing stops: no OT operations can be processed or broadcast. Users can keep editing locally in the browser, where local state is preserved. When CollabService recovers, clients reconnect and send their accumulated local operations, and CollabService transforms and merges them against the operations it receives from other clients. Local copies diverge while the service is down, particularly when two users edit the same region, but OT brings every client back to the same document once all buffered operations have been transformed and applied.

Why not use WebContainer (in-browser execution) instead of server-side containers?

WebContainer (used by StackBlitz) runs a full Node.js runtime inside the browser via WebAssembly. This eliminates server-side execution cost entirely: no Firecracker VMs, no container orchestration, no Kafka pipeline. The limitation is language support: WebContainer targets JavaScript/TypeScript and languages that compile to WASM. Python, Java, and Go projects generally depend on native runtimes, toolchains, and system packages that do not run fully in the browser. For a multi-language code editor (Replit-style), server-side containers are necessary.

How does the system handle 100K concurrent containers?

100K Firecracker microVMs at 128MB RAM each require about 12.8TB of aggregate memory. This is spread across a fleet of 200-400 bare-metal hosts: roughly 200 hosts at 64GB or 400 at 32GB, before host overhead and headroom. ContainerWorker manages the fleet via a container orchestration layer (Kubernetes or custom scheduler). Idle timeout (5 minutes) and snapshot/restore (V2 feature) reduce active container count by 40-60% during off-peak hours. Pre-warmed container pools absorb traffic spikes without cold start delays for popular languages.
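The fleet size follows from dividing aggregate microVM memory by usable memory per host. A rough sizing sketch in decimal units: 100,000 VMs at 128MB is about 12.8TB. The `usable_percent` parameter is an assumption added to show the effect of reserving memory for the host OS and headroom, and `hosts_needed` is a hypothetical helper.

```python
def hosts_needed(vms: int, vm_mb: int, host_gb: int, usable_percent: int = 100) -> int:
    """Hosts required to fit `vms` microVMs, rounding up (decimal units)."""
    total_mb = vms * vm_mb
    usable_mb_per_host = host_gb * 1000 * usable_percent // 100
    return -(-total_mb // usable_mb_per_host)  # ceiling division


total_tb = 100_000 * 128 / 1_000_000            # aggregate memory in TB
large_hosts = hosts_needed(100_000, 128, 64)    # every byte of 64GB hosts used
small_hosts = hosts_needed(100_000, 128, 32)    # every byte of 32GB hosts used
with_headroom = hosts_needed(100_000, 128, 64, usable_percent=80)
```

With no headroom this gives 200 hosts at 64GB or 400 at 32GB. Reserving 20% of each 64GB host raises the count to 250.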

Related Templates

Online Code Editor — Naive (Shared Process Pool)
Online Code Editor — Container + CRDT Collaboration
Real-Time Chat — Kafka Fan-Out + WebSocket
