Medium · 12 components · Interview: High

Online Code Editor — Container + CRDT Collaboration

Q: What is the difference between OT and CRDT for collaborative editing?

Operational Transform (OT) requires a central server to order operations. When two users type simultaneously, the server determines which edit came first and transforms the second edit to account for the first. This works well but creates a single point of coordination failure. CRDTs (Conflict-free Replicated Data Types) use a mathematical structure that guarantees convergence without central ordering. Each character has a unique ID (client ID + sequence number), and the merge operation is commutative, associative, and idempotent: applying the same set of edits in any order, or more than once, produces the same result. CRDTs can work offline and peer-to-peer. The trade-off is memory: CRDTs keep per-character metadata and tombstones (roughly 1.5x the raw document size on top of the text itself) for the life of the document, whereas an OT server keeps the document plus an operation log that it can truncate once all clients have acknowledged.
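To make the convergence argument concrete, here is a toy state-based sequence CRDT. It is a minimal sketch under simplifying assumptions, not how Yjs works internally (Yjs uses the YATA integration algorithm): each character carries a unique (client, clock) ID plus a numeric position chosen by the inserting client, and merge is a union of items in which a tombstone always wins. The names `insertChar`, `deleteChar`, `merge`, and `render` are illustrative.

```typescript
// Toy state-based sequence CRDT (illustrative only; Yjs uses YATA).
// Each character has a unique (client, clock) ID and a position chosen by its
// inserting client; deletions set a tombstone flag instead of erasing the item.

interface ItemId {
  client: number;
  clock: number;
}

interface Item {
  id: ItemId;
  pos: number; // hypothetical fractional position, picked between neighbours
  char: string;
  deleted: boolean; // tombstone
}

type Replica = { [key: string]: Item };

const keyOf = (id: ItemId): string => `${id.client}:${id.clock}`;

// Merge = union of items with the deletion flag OR-ed. Union and OR are
// commutative, associative, and idempotent, so replicas converge in any order.
function merge(a: Replica, b: Replica): Replica {
  const out: Replica = {};
  for (const k of Object.keys(a)) out[k] = { ...a[k] };
  for (const k of Object.keys(b)) {
    const mine = out[k];
    if (mine) mine.deleted = mine.deleted || b[k].deleted;
    else out[k] = { ...b[k] };
  }
  return out;
}

function insertChar(r: Replica, client: number, clock: number, pos: number, char: string): Replica {
  const next = merge(r, {}); // copy, so the input replica is untouched
  const id = { client, clock };
  next[keyOf(id)] = { id, pos, char, deleted: false };
  return next;
}

function deleteChar(r: Replica, id: ItemId): Replica {
  const next = merge(r, {});
  const item = next[keyOf(id)];
  if (item) item.deleted = true; // record the deletion, keep the item
  return next;
}

// Visible text: sort by position, break ties by unique ID, hide tombstones.
function render(r: Replica): string {
  return Object.keys(r)
    .map((k) => r[k])
    .sort((x, y) => x.pos - y.pos || x.id.client - y.id.client || x.id.clock - y.id.clock)
    .filter((item) => !item.deleted)
    .map((item) => item.char)
    .join("");
}

// Usage: two users edit "ac" concurrently, then exchange states in either order.
let base: Replica = {};
base = insertChar(base, 1, 1, 1, "a");
base = insertChar(base, 1, 2, 3, "c");
const alice = insertChar(base, 2, 1, 2, "b"); // Alice types "b" between a and c
const bob = insertChar(deleteChar(base, { client: 1, clock: 2 }), 3, 1, 4, "!"); // Bob deletes "c", appends "!"
const aliceView = render(merge(alice, bob)); // "ab!"
const bobView = render(merge(bob, alice)); // "ab!"
```

Because merge only takes unions and ORs flags, neither replica needs to know which edit happened first, which is exactly the ordering decision an OT server has to make.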

Q: How does session hibernation via Firecracker snapshot/restore work?

Firecracker supports pausing a microVM and writing its state to files: a microVM state file (vCPU registers and emulated device state) and a guest memory file. Disk contents are not part of the snapshot, so the root filesystem and project disk images must be persisted alongside it. ContainerOrchestrator monitors session activity via SessionCache TTLs. When a session has been idle for 10 minutes, the orchestrator pauses the VM, calls Firecracker's snapshot API, uploads the resulting files (50-200MB) to S3, and releases the VM resources. When the user returns, the orchestrator downloads the snapshot from S3, loads it into a fresh Firecracker process, and resumes execution where it paused, including running processes and their open file descriptors. Connections to external hosts may have been dropped by the remote side during the pause, so applications should expect to reconnect. Total restore time: 1-2 seconds.
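The hibernate and restore steps map onto a handful of Firecracker API calls. The sketch below only builds the requests (the orchestrator would send them as JSON over the Firecracker API socket). The endpoint and field names follow Firecracker's snapshot documentation as we understand it and should be checked against the Firecracker version actually deployed, since the snapshot API has changed across releases; the file paths are hypothetical.

```typescript
// Firecracker API calls for hibernate/restore, expressed as request descriptors.
// The orchestrator would send each one as HTTP/JSON over the Firecracker API socket.

interface FcRequest {
  method: "PUT" | "PATCH";
  path: string;
  body: { [field: string]: unknown };
}

// Hibernate: pause the vCPUs, then write microVM state + guest memory to files.
// Disk images are NOT in the snapshot; they must be uploaded separately.
function hibernateRequests(statePath: string, memPath: string): FcRequest[] {
  return [
    { method: "PATCH", path: "/vm", body: { state: "Paused" } },
    {
      method: "PUT",
      path: "/snapshot/create",
      body: { snapshot_type: "Full", snapshot_path: statePath, mem_file_path: memPath },
    },
  ];
}

// Restore: in a fresh Firecracker process, load the snapshot and resume at once.
function restoreRequests(statePath: string, memPath: string): FcRequest[] {
  return [
    {
      method: "PUT",
      path: "/snapshot/load",
      body: {
        snapshot_path: statePath,
        mem_backend: { backend_type: "File", backend_path: memPath },
        resume_vm: true,
      },
    },
  ];
}

// Wire form of one request (sent with Content-Type: application/json).
function toWire(req: FcRequest): string {
  return `${req.method} ${req.path} ${JSON.stringify(req.body)}`;
}

// Usage (hypothetical paths):
// toWire(hibernateRequests("/srv/s1/vmstate", "/srv/s1/mem")[0])
//   -> 'PATCH /vm {"state":"Paused"}'
```

Uploading the two files (and the disk images, which the snapshot does not contain) to S3, and downloading them before the load request, is left to the orchestrator.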

Q: Why separate CRDTService from CollabGateway?

CollabGateway manages WebSocket connections (200K concurrent) — its scaling dimension is connection count. CRDTService performs CRDT merge operations (200K ops/sec) — its scaling dimension is CPU for merge computation and Redis throughput for state read/write. Separating them allows: (1) scaling CollabGateway for more connections without adding unnecessary CRDT CPU, (2) scaling CRDTService for more merge throughput without allocating WebSocket connection memory, (3) deploying CRDT algorithm updates without disrupting WebSocket connections.

Q: How does CRDT garbage collection work and why is it important?

Yjs CRDTs keep a tombstone for every deleted character: a record that the item inserted with ID (client Y, clock Z) has been deleted, retained because concurrent edits may still reference it as an insertion anchor. Without garbage collection, a heavily edited document (thousands of inserts and deletes) accumulates tombstones that grow the CRDT state far beyond the visible text size; a 10KB document could carry 1MB of tombstones after extensive editing. CRDTService runs GC every 24 hours: it identifies tombstones older than 24 hours, removes them from DocCache, and writes a compacted CRDT state snapshot to S3. Age alone does not make pruning safe, because a client that has been disconnected longer than that could still send updates referencing a pruned tombstone, so GC must first confirm that all clients have synced past the GC point.
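The age cutoff and the client-coordination requirement combine into a single safe point. A minimal sketch, assuming CRDTService records when it received each deletion (Yjs itself does not store deletion times) and tracks how far each connected client has synced; `gcSafePoint`, `planGc`, and the field names are hypothetical. Offline clients are not in the list, so in this sketch they would have to resync from the compacted snapshot when they return.

```typescript
// Hypothetical GC planner for CRDTService.

interface Tombstone {
  id: string;
  deletedAt: number; // ms since epoch, server receive time of the deletion
}

interface ClientSync {
  clientId: string;
  syncedUpTo: number; // ms since epoch: latest server time this client has synced past
}

// A tombstone is prunable only if it is older than the age cutoff AND every
// connected client has synced past it, so the safe point is the minimum of both.
function gcSafePoint(now: number, maxAgeMs: number, clients: ClientSync[]): number {
  let safe = now - maxAgeMs;
  for (const c of clients) safe = Math.min(safe, c.syncedUpTo);
  return safe;
}

function planGc(tombstones: Tombstone[], safePoint: number): { prune: string[]; keep: string[] } {
  const prune: string[] = [];
  const keep: string[] = [];
  for (const t of tombstones) {
    if (t.deletedAt <= safePoint) prune.push(t.id);
    else keep.push(t.id);
  }
  return { prune, keep };
}

// Usage: client "b" lags at hour 70, so t2 is kept even though it is older than 24h.
const HOUR = 60 * 60 * 1000;
const safe = gcSafePoint(100 * HOUR, 24 * HOUR, [
  { clientId: "a", syncedUpTo: 90 * HOUR },
  { clientId: "b", syncedUpTo: 70 * HOUR }
]);
const plan = planGc(
  [
    { id: "t1", deletedAt: 50 * HOUR },
    { id: "t2", deletedAt: 72 * HOUR },
    { id: "t3", deletedAt: 80 * HOUR }
  ],
  safe
); // prune: ["t1"], keep: ["t2", "t3"]
```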

Q: How does the pre-warmed container pool handle language popularity skew?

The pool maintains 125 VMs per language (Python, Node.js, Java, Go) by default, but allocation patterns are skewed: Python and Node.js account for approximately 70% of sessions. ContainerOrchestrator monitors allocation rates per language and adjusts pool ratios dynamically. If Python allocations spike (e.g., a popular tutorial goes viral), the orchestrator shifts standby capacity: 200 Python, 150 Node.js, 100 Java, 50 Go. Pool replenishment prioritizes the language with the lowest standby-to-demand ratio.
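Two small helpers capture this policy: split the fixed standby budget in proportion to recent demand, and replenish whichever language has the lowest standby-to-demand ratio first. This is a sketch; the demand signal (allocations per minute), the per-language `floor` parameter, and the function names are assumptions rather than part of the design above.

```typescript
// Hypothetical pool-rebalancing helpers for ContainerOrchestrator.
type Lang = "python" | "node" | "java" | "go";
const LANGS: Lang[] = ["python", "node", "java", "go"];

// Give each language `floor` VMs, then share the rest in proportion to demand.
// Largest-remainder rounding keeps the total exactly equal to `total`.
function rebalance(total: number, demand: Record<Lang, number>, floor: number): Record<Lang, number> {
  const out: Record<Lang, number> = { python: floor, node: floor, java: floor, go: floor };
  const spare = total - floor * LANGS.length;
  if (spare <= 0) return out;
  const demandSum = LANGS.reduce((s, l) => s + demand[l], 0);
  // With no demand signal, fall back to an even split.
  const shares = LANGS.map((l) => (demandSum > 0 ? (spare * demand[l]) / demandSum : spare / LANGS.length));
  let assigned = 0;
  LANGS.forEach((l, i) => {
    const whole = Math.floor(shares[i]);
    out[l] += whole;
    assigned += whole;
  });
  const byRemainder = LANGS.map((l, i) => ({ l, frac: shares[i] - Math.floor(shares[i]) })).sort(
    (a, b) => b.frac - a.frac
  );
  for (let k = 0; k < spare - assigned; k++) out[byRemainder[k].l] += 1;
  return out;
}

// Replenish the language whose standby pool covers the least of its demand.
function nextToReplenish(standby: Record<Lang, number>, demand: Record<Lang, number>): Lang {
  let best: Lang = LANGS[0];
  let bestRatio = Infinity;
  for (const l of LANGS) {
    const ratio = demand[l] === 0 ? Infinity : standby[l] / demand[l];
    if (ratio < bestRatio) {
      best = l;
      bestRatio = ratio;
    }
  }
  return best;
}

const shifted = rebalance(500, { python: 40, node: 30, java: 20, go: 10 }, 0);
// shifted: { python: 200, node: 150, java: 100, go: 50 }
```

With demand split 40/30/20/10 and no floor, the 500-VM budget reproduces the 200/150/100/50 split above; a non-zero floor keeps rarely used languages from draining to zero.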

A full production online code editor that runs a Firecracker microVM per session for kernel-level isolation and uses CRDT-based collaborative editing (Yjs/Automerge style) for conflict-free multi-user editing. It adds a pre-warmed container pool for sub-2s cold starts, WebSocket gateways for collaboration and stdio streaming, and S3-backed persistence with session snapshot/restore for hibernation.

Tags: Firecracker, CRDT, WebSocket, Microservices, Real-time, Code Editor


Problem Statement

The container-per-session with CRDT approach is the most advanced of the architectures in this series for production online code editors. It addresses the two remaining limitations of the OT-based variant (V1): OT's central coordination bottleneck and the lack of session hibernation for cost optimization.

The key advancement over V1 is replacing Operational Transform with Conflict-free Replicated Data Types (CRDTs). OT requires CollabService as a central authority to order operations — if CollabService fails, collaboration stops. CRDTs are mathematically guaranteed to converge without a central ordering authority. Each client maintains a local CRDT replica that can be edited independently. When updates are synced via CRDTService, they merge commutatively and idempotently — the order of application does not matter, and applying the same update twice has no effect. This means if one CollabGateway node fails, clients reconnect to another node and their local state merges seamlessly with zero data loss.

The architecture introduces three new components compared to V1. ContainerOrchestrator replaces the simpler ContainerWorker, adding lifecycle management for Firecracker microVMs: allocation from a pre-warmed pool, file mounting from S3, execution dispatch, snapshot/restore for hibernation, and resource limit enforcement. CollabGateway is a dedicated WebSocket gateway for CRDT traffic, separated from CRDTService (the merge engine) for independent scaling. StdioRelay is a dedicated WebSocket bridge between client browsers and container stdin/stdout, separated from CollabGateway because execution I/O has different traffic patterns (variable payloads, point-to-point instead of fan-out).

Session hibernation is a critical cost optimization. At 100K concurrent sessions with Firecracker microVMs at 128MB RAM each, the system consumes 12.5TB of aggregate memory. Many sessions are idle: users open a project, write some code, then switch to another tab. ContainerOrchestrator detects idle sessions (no activity for 10 minutes), snapshots the microVM state to S3 (50-200MB per snapshot), and releases the VM's memory and CPU. The used VM is destroyed rather than returned to the pre-warmed pool, because it has run one user's untrusted code and must not be handed to another user. When the user returns, the snapshot is restored in under 2 seconds, giving the appearance of an always-on environment. This reduces active container count by 40-60% during off-peak hours.
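The idle check and the memory figure are easy to state in code. A minimal sketch, assuming session records carry the `status` and `last_activity` fields from the sessions table; `sessionsToHibernate` and `residentMemoryGb` are hypothetical names.

```typescript
// Hypothetical hibernation-scanner logic plus the memory arithmetic above.
interface SessionRecord {
  sessionId: string;
  status: "active" | "hibernated" | "terminated";
  lastActivity: number; // ms since epoch
}

const IDLE_LIMIT_MS = 10 * 60 * 1000; // 10 minutes without activity

function sessionsToHibernate(sessions: SessionRecord[], now: number, idleLimitMs: number = IDLE_LIMIT_MS): string[] {
  return sessions
    .filter((s) => s.status === "active" && now - s.lastActivity > idleLimitMs)
    .map((s) => s.sessionId);
}

// RAM held by `count` resident VMs, in GB computed as MB / 1024, which is how
// 100K x 128MB comes out to the 12.5TB (12,500GB) figure above.
function residentMemoryGb(count: number, vmMb: number): number {
  return (count * vmMb) / 1024;
}

const allResident = residentMemoryGb(100000, 128); // 12500
const fortyPercentHibernated = residentMemoryGb(60000, 128); // 7500
```

With 40-60% of sessions hibernated, resident memory drops from 12,500GB to between 7,500GB and 5,000GB.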

The pre-warmed container pool is essential for cold start performance. Firecracker microVM boot time is approximately 125ms, but loading a language runtime adds 1-5 seconds. The pool maintains 500 standby VMs (125 per language: Python, Node.js, Java, Go) with runtimes pre-loaded. Session creation pulls a pre-warmed VM and mounts project files — total cold start under 2 seconds. The cost is 64GB of reserved memory for standby VMs.

CRDT garbage collection is a non-trivial operational concern. Yjs-style CRDTs keep tombstones for deleted characters, because concurrent updates may still reference a deleted character as their insertion point; deletions are recorded rather than erased. Over time, a heavily edited document accumulates tombstones that grow the CRDT state far beyond the visible text size. CRDTService runs periodic garbage collection to prune tombstones older than 24 hours, but this requires coordinating with all connected clients to ensure no client holds a reference to the pruned state.

Interviewers expect candidates to explain the advantages of CRDT over OT (no central coordination, offline-capable, merge-by-default), discuss session hibernation via snapshot/restore as a cost optimization, reason about pre-warmed pool sizing and replenishment, and analyze the separation of CollabGateway (fan-out) from StdioRelay (point-to-point).

Architecture Overview

The container-per-session with CRDT architecture uses twelve server-side components plus the EditorClient, organized into five layers: client (EditorClient), edge (ApiGateway, EditorLB), application services (SessionService, CollabGateway, CRDTService, StdioRelay), data stores (SessionCache, DocCache, ProjectDB, SnapshotStore), and container management (ContainerOrchestrator, ContainerPool).

The REST path handles session management and file operations. Requests flow through ApiGateway (authentication, rate limiting at 80K RPS) to EditorLB (AWS ALB) to SessionService (10 pods, 80 threads each). SessionService manages the full session lifecycle: checks SessionCache (Redis) for existing sessions, loads project metadata from ProjectDB, file content from SnapshotStore (S3), and requests microVMs from ContainerOrchestrator. For returning users with hibernated sessions, SessionService triggers snapshot restore via ContainerOrchestrator.

The CRDT collaboration path handles real-time editing. EditorClient establishes a WebSocket connection to CollabGateway (15 pods, 100 threads each, about 200K concurrent connections in total, or roughly 13K per pod; at that ratio connections must be multiplexed over non-blocking I/O rather than given a thread each). Each keystroke generates a Yjs-compatible CRDT update sent via WebSocket. CollabGateway forwards the update to CRDTService (8 pods, 60 threads each), which merges it into the canonical document state in DocCache (Redis) and returns the merged update for broadcast to all session participants. Round-trip latency target: under 100ms. CRDT operations are commutative, so no central ordering is needed.

The stdio path handles execution I/O. When a user runs code, StdioRelay (8 pods, 50 threads) establishes a bidirectional WebSocket bridge between the client and the Firecracker microVM's stdin/stdout file descriptors. Output streams in real-time with sub-100ms latency. This is separated from CollabGateway because execution I/O has different traffic patterns: variable payloads (1 byte to 64KB per chunk), point-to-point (one client to one container), and bursty (output comes in chunks, not continuous keystrokes).
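On the stdio channel, a large burst of output is sent as a sequence of bounded frames. A minimal sketch, assuming a 64KB frame cap taken from the chunk sizes above; `toFrames` is a hypothetical helper, and the WebSocket send itself is omitted.

```typescript
// Hypothetical StdioRelay helper: split a burst of program output into frames
// no larger than 64KB, each sent as its own WebSocket message on the stdio channel.
const MAX_FRAME_BYTES = 64 * 1024;

function toFrames(output: Uint8Array, maxBytes: number = MAX_FRAME_BYTES): Uint8Array[] {
  if (maxBytes <= 0) throw new Error("maxBytes must be positive");
  const result: Uint8Array[] = [];
  for (let offset = 0; offset < output.length; offset += maxBytes) {
    // subarray shares the underlying buffer, so framing copies nothing
    result.push(output.subarray(offset, Math.min(offset + maxBytes, output.length)));
  }
  return result;
}
```

A program that prints 1MB becomes 16 frames of 64KB. On a dedicated StdioRelay connection those frames queue only behind other output, never in front of CRDT updates.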

Container management uses a two-tier architecture. ContainerOrchestrator (6 pods) handles VM lifecycle: allocation from ContainerPool, file mounting from S3, execution dispatch, and snapshot/restore for hibernation. ContainerPool (100 workers) maintains a pre-warmed fleet of 500 Firecracker microVMs with language runtimes pre-loaded. Pool replenishment is asynchronous — after a VM is allocated to a session, a background process boots a replacement VM to maintain the target pool size.
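The allocate-then-replenish behaviour can be sketched as a per-language standby list with a background queue. This is an illustrative model, not ContainerOrchestrator's actual code: `WarmPool`, `allocate`, `replenishOne`, and the `boot` placeholder (standing in for starting a Firecracker microVM and loading the runtime) are hypothetical.

```typescript
// Hypothetical ContainerPool model: O(1) allocation from per-language standby
// lists, cold-boot fallback when a list is empty, asynchronous replenishment.
interface Vm {
  id: string;
  lang: string;
  prewarmed: boolean; // false = cold boot on the request path (runtime load 1-5s)
}

class WarmPool {
  private readonly target: { [lang: string]: number };
  private readonly standby: { [lang: string]: Vm[] } = {};
  private readonly replenishQueue: string[] = [];
  private nextId = 0;

  constructor(target: { [lang: string]: number }) {
    this.target = target;
    for (const lang of Object.keys(target)) {
      this.standby[lang] = [];
      for (let i = 0; i < target[lang]; i++) this.standby[lang].push(this.boot(lang, true));
    }
  }

  // Hot path: pop a warm VM and enqueue a replacement; only an empty list
  // forces a cold boot.
  allocate(lang: string): Vm {
    const list = this.standby[lang];
    const warm = list ? list.pop() : undefined;
    this.replenishQueue.push(lang);
    return warm || this.boot(lang, false);
  }

  // Background worker step: boot one replacement if the language is below target.
  replenishOne(): boolean {
    const lang = this.replenishQueue.shift();
    if (lang === undefined) return false;
    const list = this.standby[lang] || (this.standby[lang] = []);
    if (list.length < (this.target[lang] || 0)) list.push(this.boot(lang, true));
    return true;
  }

  depth(lang: string): number {
    const list = this.standby[lang];
    return list ? list.length : 0;
  }

  // Placeholder for: start a Firecracker microVM and load the language runtime.
  private boot(lang: string, prewarmed: boolean): Vm {
    this.nextId += 1;
    return { id: `vm-${this.nextId}`, lang, prewarmed };
  }
}
```

The request path never waits for a replacement boot; only an empty list forces a cold start, which is the pool-depletion case covered in the what-if scenarios.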

Data persistence follows a three-tier strategy. ProjectDB (PostgreSQL with read replicas) stores project metadata, file version history, and session records with strong consistency. SnapshotStore (S3) stores file content, CRDT state snapshots, and Firecracker VM snapshots with 99.999999999% durability. SessionCache and DocCache (separate Redis clusters) provide sub-millisecond access to active session state and CRDT document state respectively.


Request Flow — CRDT Collaboration + Code Execution

This sequence diagram traces the two primary real-time flows: CRDT collaborative editing and code execution with stdio streaming. The critical insight is that CRDT operations are commutative — concurrent edits from multiple users converge automatically without a central ordering authority. If a CollabGateway node fails, clients reconnect and their local CRDT state merges seamlessly.

The second insight is the separation of WebSocket channels: CollabGateway handles high-frequency, small-payload CRDT updates (200K msg/sec, ~200 bytes each), while StdioRelay handles lower-frequency, variable-payload execution I/O (30K msg/sec, up to 64KB chunks). This prevents execution output bursts from delaying collaboration updates.


Step-by-Step Walkthrough

1. User types in the editor. Each keystroke generates a Yjs CRDT update (~200 bytes) sent via WebSocket to CollabGateway. Updates are commutative: order does not matter.
2. CollabGateway forwards the update to CRDTService, which reads the current CRDT state from DocCache (Redis, ~0.5ms), merges the update (Yjs merge, microseconds), and writes the merged state back.
3. CRDTService returns the merged update to CollabGateway, which broadcasts it to all connected session participants. Each participant applies the update to their local CRDT replica. Total round-trip: ~40-80ms.
4. User clicks Run. SessionService requests a microVM from ContainerOrchestrator, which pulls a pre-warmed VM from ContainerPool (under 2 seconds including file mounting).
5. Client establishes a WebSocket connection to StdioRelay, which bridges to the container's stdin/stdout. Output streams in real time as the program executes. Interactive programs can read from stdin via the same bridge.

Pseudocode

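// CRDT MERGE: commutative, idempotent, no central ordering needed
async function mergeCrdtUpdate(docId, clientUpdate):
    key = "doc:" + docId
    loop:
        // WATCH the key so a concurrent merge on another CRDTService pod aborts our write
        await redis.watch(key)
        // Read current CRDT state from DocCache (~0.5ms); on a miss, restore it from the
        // S3 snapshot (loadCrdtSnapshot is a placeholder; returns an empty update for a new doc)
        currentState = (await redis.get(key)) ?? (await loadCrdtSnapshot(docId))

        // Yjs merge: commutative (order doesn't matter), idempotent (duplicate-safe)
        mergedState = Yjs.mergeUpdates([currentState, clientUpdate])  // microseconds

        // Conditional write back to DocCache; exec() is aborted if the key changed since WATCH
        committed = await redis.multi().set(key, mergedState, { ex: 600 }).exec()  // ~0.5ms
        if (committed): break
        // Another pod merged first: loop, re-read its state, and merge again

    // Broadcast only what the stored state lacked. diffUpdate takes a state vector,
    // not a second update, so derive one from the previous state
    return Yjs.diffUpdate(mergedState, Yjs.encodeStateVectorFromUpdate(currentState))

// SESSION CREATION: reuse a running VM, restore a hibernated one, or allocate from the pool
async function createSession(projectId, language, userId):
    // Read and write the same (project, user) key so a returning user finds their session
    sessionKey = "session:" + projectId + ":" + userId
    // Redis holds the hot copy; hibernated sessions outlive the 600s TTL, so fall back to
    // ProjectDB (findLatestSession is a placeholder query on the sessions table)
    existing = JSON.parse(await redis.get(sessionKey)) ?? await db.findLatestSession(projectId, userId)

    if (existing?.status === 'active'):
        return existing  // VM is still running: reconnect to it
    if (existing?.status === 'hibernated'):
        // Restore from snapshot, keeping the same sessionId
        vm = await orchestrator.restoreSnapshot(existing.snapshotKey)  // ~1-2s
        session = { ...existing, containerId: vm.id, status: 'active', snapshotKey: null }
    else:
        // Allocate pre-warmed VM
        vm = await orchestrator.allocateVm(language)  // ~200ms from pool
        await orchestrator.mountFiles(vm.id, projectId)  // ~500ms from S3
        session = { sessionId: uuid(), projectId, userId, containerId: vm.id, status: 'active' }

    await redis.set(sessionKey, JSON.stringify(session), { ex: 600 })
    await db.upsert("sessions", session)
    return session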
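Because the read, merge, and write are separate DocCache round trips, the write must be conditional; otherwise two CRDTService pods merging into the same document at once could each overwrite the other's result. Redis offers WATCH/MULTI/EXEC for this kind of optimistic concurrency. The sketch below models the retry loop with a hypothetical in-memory `VersionedStore` in place of Redis, and represents CRDT state as a set of update IDs, with set union standing in for `Yjs.mergeUpdates`; all names are illustrative.

```typescript
// Hypothetical stand-in for DocCache: every write bumps a version, and a write
// carrying a stale version is rejected (the effect WATCH/MULTI/EXEC provides).
class VersionedStore<T> {
  private readonly data: { [key: string]: { value: T; version: number } } = {};

  read(key: string): { value: T | undefined; version: number } {
    const entry = this.data[key];
    return entry ? { value: entry.value, version: entry.version } : { value: undefined, version: 0 };
  }

  compareAndSet(key: string, expectedVersion: number, value: T): boolean {
    const entry = this.data[key];
    const current = entry ? entry.version : 0;
    if (current !== expectedVersion) return false; // someone else wrote first
    this.data[key] = { value, version: current + 1 };
    return true;
  }
}

// CRDT state modelled as a sorted set of update IDs; union is commutative and
// idempotent, like a CRDT merge.
type DocState = string[];

function mergeStates(a: DocState, b: DocState): DocState {
  const out = a.slice();
  for (const u of b) if (out.indexOf(u) < 0) out.push(u);
  return out.sort();
}

function mergeWithRetry(store: VersionedStore<DocState>, docId: string, update: DocState, maxAttempts: number = 5): DocState {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const snapshot = store.read(docId);
    const merged = mergeStates(snapshot.value || [], update);
    if (store.compareAndSet(docId, snapshot.version, merged)) return merged;
    // Lost the race: loop to re-read the newer state and merge against it.
  }
  throw new Error(`merge for ${docId} conflicted ${maxAttempts} times`);
}
```

An alternative that avoids retries entirely is to route every update for a given document to one owning CRDTService pod, at the cost of reassigning ownership when pods come and go.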

Database Schema (ER Diagram)

The schema reflects the advanced architecture's separation of concerns: PostgreSQL for metadata with strong consistency, S3 for content with high durability, and Redis for real-time state with sub-millisecond access. The sessions table tracks the Firecracker microVM lifecycle including hibernation. The collab_documents table provides a durability layer for CRDT state that lives primarily in Redis.

The critical relationship is between sessions and collab_documents: a session belongs to a project and holds at most one container, while the project may have multiple collaboratively edited documents (collab_documents rows are keyed by project_id and file path). When a session hibernates, the container ID is cleared and a snapshot key is set. When a CRDT document's DocCache entry is evicted, CRDTService restores it from the S3 snapshot referenced in collab_documents.


Key Design Decisions

CRDT Instead of OT

Choice

Yjs-compatible CRDT for collaborative editing instead of Operational Transform

Rationale

CRDTs do not require a central server for operation ordering. Each client maintains a local replica that can be edited offline, and updates merge commutatively when synced. This eliminates OT's single point of coordination failure: if one CollabGateway node fails, clients reconnect to another node and their local state merges seamlessly. The trade-off is higher per-character memory overhead: Yjs stores a client ID and sequence number per inserted item, plus tombstones for deletions, adding roughly 1.5x the raw document size on top of the text itself. At 100K concurrent documents averaging 50KB (5GB of raw text), DocCache needs approximately 12.5GB versus OT's 5GB.
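The DocCache estimate is simple arithmetic, shown here so the assumptions are explicit: 50KB average documents, metadata overhead of 1.5x on top of the raw text, and the 75% memory-utilization threshold from the scaling strategy as the provisioning ceiling. Per-key Redis overhead is ignored, and the function names are illustrative.

```typescript
// Back-of-envelope DocCache sizing (decimal units).
function docCacheGb(docs: number, avgDocKb: number, metadataOverhead: number): number {
  const rawGb = (docs * avgDocKb) / 1e6; // KB -> GB
  return rawGb * (1 + metadataOverhead); // text plus CRDT metadata and tombstones
}

// Capacity to provision so steady-state usage stays under a utilization ceiling.
function provisionedGb(neededGb: number, maxUtilization: number): number {
  return neededGb / maxUtilization;
}

const crdtGb = docCacheGb(100000, 50, 1.5); // 12.5
const otGb = docCacheGb(100000, 50, 0); // 5
const crdtProvisioned = provisionedGb(crdtGb, 0.75); // about 16.7 at the 75% threshold
```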

Firecracker MicroVM Per Session

Choice

Dedicated Firecracker microVM per coding session instead of shared containers or processes

Rationale

Firecracker provides kernel-level isolation via a lightweight VMM that runs each microVM with its own Linux kernel. Boot time is approximately 125ms with 5MB memory overhead — far lighter than traditional VMs (QEMU/KVM at 100MB+). This is the same technology used by AWS Lambda for running untrusted code. Docker containers share the host kernel, so a kernel exploit in user code could escape the container. Firecracker's threat model explicitly defends against kernel exploits via the VMM boundary.

Pre-Warmed Container Pool

Choice

500 standby Firecracker microVMs with pre-loaded language runtimes

Rationale

Firecracker boot time is 125ms, but language runtime loading adds 1-5 seconds (Python interpreter: 1s, Node.js V8: 1.5s, Java JVM: 3-5s, Go: 0.5s). Pre-warming eliminates this latency. The pool maintains 125 VMs per language, sufficient for 95th percentile burst allocation. Cost: 500 x 128MB = 64GB of reserved memory (~$200/month at Fargate-equivalent memory pricing; Fargate itself cannot host Firecracker, which needs bare-metal or nested-virtualization hosts). The alternative is 3-5 second cold starts that severely degrade the user experience.

Separate CollabGateway and StdioRelay

Choice

Independent WebSocket services for CRDT collaboration and execution I/O

Rationale

CRDT collaboration traffic is high-frequency, small-payload, many-to-many fan-out (200K msg/sec, 100-500 bytes each, broadcast to all session participants). Execution I/O traffic is lower-frequency, variable-payload, point-to-point (30K msg/sec, 1 byte to 64KB each, one client to one container). Combining them means a burst of execution output (a program printing 1MB) would congest the WebSocket connection and delay CRDT updates, causing visible editing lag for collaborators.

Session Hibernation via Snapshot/Restore

Choice

Firecracker snapshot to S3 for idle sessions instead of keeping containers running

Rationale

At 100K sessions, many are idle: users open a project, write code, then switch tabs. Keeping all 100K microVMs resident ties up 12.5TB of RAM, much of it held by idle sessions. Snapshot/restore writes the VM's guest memory and vCPU/device state to S3 once a session has been idle for more than 10 minutes (the disk images are persisted alongside it, since they are not part of the snapshot), reducing active containers by 40-60% during off-peak. Restore takes under 2 seconds (stream snapshot from S3 + restore Firecracker state). The user sees a brief loading indicator but then resumes exactly where they left off, including running processes.

S3 for All Durable Storage

Choice

S3 for file content, CRDT snapshots, and VM snapshots instead of EBS or database storage

Rationale

At 100K sessions, EBS would require 100K attached volumes (exceeding AWS limits) at approximately $10K/month minimum. S3 provides unlimited object storage at $0.023/GB/month with 99.999999999% durability. File content (5TB/year), CRDT snapshots (periodic, ~500GB), and VM snapshots (50-200MB each, ~10TB total) all fit S3's access patterns: write-once, read-on-demand. The trade-off is higher restore latency (200ms for small files, 2s for VM snapshots) compared to EBS's block-level access.
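The storage volumes above translate into a monthly bill as follows. This sketch covers storage only, at the $0.023/GB-month rate quoted in this paragraph and with the volumes listed there; request and transfer charges are excluded, and the function name is illustrative.

```typescript
// Storage-only S3 cost; request and transfer charges excluded.
const S3_USD_PER_GB_MONTH = 0.023;

interface StoredData {
  label: string;
  tb: number;
}

function s3MonthlyStorageUsd(items: StoredData[], usdPerGbMonth: number = S3_USD_PER_GB_MONTH): number {
  const totalGb = items.reduce((sum, item) => sum + item.tb * 1000, 0);
  return totalGb * usdPerGbMonth;
}

const monthlyStorage = s3MonthlyStorageUsd([
  { label: "file content (first year)", tb: 5 },
  { label: "CRDT snapshots", tb: 0.5 },
  { label: "VM snapshots", tb: 10 }
]); // about 356.5 USD/month for 15.5TB
```

About $356/month for 15.5TB, which leaves headroom within the ~$500/month SnapshotStore line in the cost analysis for request charges.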

Scale & Performance

Target RPS

~50K REST + 200K WS msg/sec + 2K exec starts/sec

Latency (p99)

<100ms CRDT round-trip, <2s cold start, <2s snapshot restore

Storage

~15 TB/year (S3: files + CRDT snapshots + VM snapshots)

Availability

99.95% (multi-AZ, CRDT resilience to node failure)

Time & Space Complexity

Operation	Time	Space	Notes
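CRDT merge (CRDTService)	O(U) to integrate an update into an in-memory Yjs document; O(D + U) when merging against the encoded state read from DocCache (U = update size in characters, D = stored state size)	O(D + U): merged state stored in DocCache	Integrating an update costs roughly in proportion to the update size, but merging encoded updates, as the pseudocode above does, also scans the stored state. A single keystroke update (U=1) merges in microseconds. A paste of 10K characters takes approximately 1ms. The bottleneck is Redis round-trip (~0.5ms), not merge computation.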
Container allocation (ContainerOrchestrator)	O(1) — pull from pre-warmed pool	O(1) — single VM assignment	Pre-warmed pool provides O(1) allocation in typical case. Cold path (pool empty): O(S) where S is runtime startup time (1-5 seconds). Pool replenishment is asynchronous.
Snapshot/restore (ContainerOrchestrator → S3)	O(M) — M is the VM memory size (128-512MB)	O(M) — snapshot file stored in S3	Snapshot serialization: ~500ms for 128MB VM. S3 upload: ~1s for 128MB. Restore download: ~1s. Total restore: ~2s. Snapshot size is proportional to allocated VM memory, not used memory.
CRDT garbage collection	O(T) — T is the number of tombstones to prune	O(D) — D is the compacted document size	GC runs every 24 hours per document. Typical T is 1K-10K tombstones for actively edited documents. GC duration: 10-100ms per document. Must coordinate with connected clients to ensure sync past GC point.

Database Schema (HLD)

projects

Project metadata stored in PostgreSQL with strong consistency. Write-once on creation, read on project open. Partitioned by project_id hash across 8 shards.

Columns: project_id UUID PK, name TEXT, language TEXT (python, node, java, go), owner_id UUID FK, created_at TIMESTAMPTZ

Indexes: PK on project_id, idx_projects_owner ON (owner_id, created_at)

Small, low-write table. Not a performance concern.

files

File metadata with S3 references. Content stored in SnapshotStore (S3), not in this table. Each save creates a new version row for history tracking.

Columns: file_id UUID PK, project_id UUID FK, path TEXT (e.g., src/main.py), version INTEGER (auto-incremented), s3_key TEXT (S3 object key for content), updated_at TIMESTAMPTZ

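Indexes: PK on file_id, UNIQUE idx_files_version ON (project_id, path, version DESC). Uniqueness must include version: each save inserts a new row for the same (project_id, path), so a unique index on (project_id, path) alone would reject the second save.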

Write-heavy during active editing (auto-save every 30 seconds). Version history enables undo/diff. Old versions can be pruned after 30 days to control storage growth.

sessions

Session records tracking active and hibernated coding sessions. Each session maps to a Firecracker microVM (active) or an S3 snapshot (hibernated). Used for session recovery, billing, and container cleanup.

Columns: session_id UUID PK, project_id UUID FK, user_id UUID FK, container_id TEXT (null if hibernated), status TEXT (active | hibernated | terminated), snapshot_key TEXT (S3 key for VM snapshot, null if active), created_at TIMESTAMPTZ, last_activity TIMESTAMPTZ

Indexes: PK on session_id, idx_sessions_project ON (project_id, status) WHERE status = 'active', idx_sessions_idle ON (last_activity) WHERE status = 'active' for hibernation scanner

The hibernation scanner runs every minute, using idx_sessions_idle to find active sessions idle for more than 10 minutes. Those sessions are snapshotted to S3 and hibernated, reducing the active container count.

collab_documents

CRDT collaboration document records in PostgreSQL, tracking active CRDT sessions and their DocCache/S3 references. Used for CRDT state recovery if DocCache evicts or crashes.

Columns: doc_id TEXT PK (project_id:file_path), project_id UUID FK, file_path TEXT, s3_snapshot_key TEXT (CRDT state backup in S3), collaborator_count INTEGER, last_snapshot TIMESTAMPTZ

Indexes: PK on doc_id, idx_collab_project ON (project_id)

CRDT state backup is written to S3 every 30 seconds. On DocCache eviction or crash, CRDTService restores from the latest S3 snapshot. This means up to 30 seconds of CRDT operations could be lost on cache failure — clients resend their local state on reconnect to close the gap.

What-If Scenarios

CollabGateway node failure (1 of 15 pods goes down)

Impact

Approximately 13K WebSocket connections (200K / 15) are dropped. Clients reconnect to other CollabGateway pods within 2-5 seconds. Because CRDT merges are commutative and idempotent, any local edits made during disconnection merge seamlessly on reconnect: zero data loss, zero conflicts. Compare with V1 (OT): CollabService failure causes all collaboration to halt until recovery.

Mitigation

Kubernetes restarts the failed pod within 30 seconds. During reconnection, clients send their local CRDT state, which CRDTService merges with the canonical state in DocCache. No manual intervention needed.

DocCache (Redis) eviction under memory pressure

Impact

CRDT document state for least-recently-used documents is evicted. When a user edits that document next, CRDTService must restore from the latest S3 snapshot (up to 30 seconds stale). Any edits made after the last snapshot are lost from the server state — but clients hold local CRDT state that includes those edits. On reconnect, client state is merged to close the gap.

Mitigation

Increase DocCache memory or reduce TTL. Monitor eviction rate and alert if it exceeds 1% of active documents per minute. Reduce snapshot interval from 30 seconds to 10 seconds for high-value documents.

Pre-warmed container pool depletion (viral tutorial causes Python session spike)

Impact

All 125 pre-warmed Python VMs are allocated within seconds. New Python sessions fall back to cold start: Firecracker boot (125ms) + Python runtime load (1s) + file mount (500ms) = roughly 1.6 seconds. User experience degrades from near-instant to a noticeable loading spinner.

Mitigation

ContainerOrchestrator detects pool depletion and triggers emergency replenishment (boot 50 VMs simultaneously). Dynamic pool rebalancing shifts capacity from underused languages (Java, Go) to Python. Auto-scaling adds more ContainerPool workers based on pool depletion rate.

User runs a cryptocurrency miner inside their Firecracker microVM

Impact

The miner consumes the full 1 vCPU allocated to the session. Other sessions are unaffected: the microVM cannot use more than its configured vCPU count, and host cgroup quotas (which Firecracker's jailer can apply) bound its share of the physical CPU. The miner runs at the session's allocated CPU rate, providing negligible mining revenue. The user's session times out after 30 seconds of continuous CPU usage.

Mitigation

CPU is capped by the microVM's vCPU count (1 vCPU) and host cgroup quotas; Firecracker's built-in rate limiters apply to network and block-device I/O, not CPU. Network restrictions block connections to mining pools. Billing by CPU-second makes sustained mining unprofitable. Automated detection: sessions consuming >90% CPU for >5 minutes are flagged for review.
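The review rule (>90% CPU for >5 minutes) can be checked over periodic utilization samples. A minimal sketch, assuming the orchestrator collects time-ordered samples per session; `sustainedHighCpu` and `CpuSample` are hypothetical.

```typescript
// Hypothetical detector for the "> 90% CPU for > 5 minutes" review rule.
interface CpuSample {
  t: number; // ms since epoch; samples are assumed to be in time order
  util: number; // 0..1 share of the session's vCPU
}

function sustainedHighCpu(samples: CpuSample[], threshold: number = 0.9, windowMs: number = 5 * 60 * 1000): boolean {
  if (samples.length === 0) return false;
  const last = samples[samples.length - 1].t;
  // Walk back from the newest sample while utilization stays above the threshold.
  let streakStart = last;
  for (let i = samples.length - 1; i >= 0 && samples[i].util > threshold; i--) {
    streakStart = samples[i].t;
  }
  return last - streakStart > windowMs;
}
```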

Failure Modes & Resilience

Component	Failure	Impact	Mitigation
CRDTService	All pods down — CRDT merge operations fail	CollabGateway cannot forward updates to CRDTService for merging. Clients continue editing locally (CRDT state is maintained in the browser), but changes are not synced to other collaborators or to DocCache. When CRDTService recovers, all buffered updates are merged — CRDT guarantees convergence regardless of ordering.	Multi-AZ deployment with at least 3 pods per AZ. CollabGateway buffers updates locally during CRDTService downtime (up to 1000 updates per document). Kubernetes restarts failed pods within 30 seconds.
ContainerOrchestrator	VM allocation failures due to host capacity exhaustion	New sessions cannot be created — users see an error when opening projects. Existing sessions continue running (VMs are already allocated). Hibernation and snapshot/restore also fail, meaning idle sessions continue consuming resources instead of being reclaimed.	Auto-scale host fleet based on VM allocation rate. Reserve 20% excess host capacity for burst allocation. Implement graceful degradation: queue allocation requests with 10-second timeout instead of failing immediately.
SnapshotStore (S3)	S3 unavailable or high latency	File saves fail: auto-save writes to S3 time out. Session hibernation fails: snapshots cannot be uploaded. Session restore fails: snapshots cannot be downloaded. CRDT state snapshots fail: recovery from DocCache eviction becomes impossible. Active sessions continue working (state in Redis and local microVM), but durability is temporarily lost.	S3 Standard is designed for 99.99% availability (its SLA commits to 99.9%). During rare outages: buffer file saves in DocCache with extended TTL, defer hibernation (keep containers running), and retry S3 operations with exponential backoff.
DocCache (Redis cluster)	Redis cluster failure — all CRDT state lost	Active collaboration sessions lose server-side CRDT state. Clients hold local CRDT state, which is the most recent version. On recovery, CRDTService restores from S3 snapshots (up to 30 seconds stale) and clients resend local state to close the gap. Brief interruption in collaboration sync.	Redis Cluster with automatic failover (replica promotion in 15-30 seconds). Reduce S3 snapshot interval to 10 seconds for critical documents. Clients maintain full local CRDT state as a safety net.

Scaling Strategy

Independent horizontal scaling per component. SessionService: auto-scale on CPU utilization > 70%. CollabGateway: auto-scale on WebSocket connection count > 12K per pod. CRDTService: auto-scale on merge operation throughput > 20K/sec per pod. ContainerOrchestrator: auto-scale on allocation queue depth > 100. ContainerPool: auto-scale host fleet based on pre-warmed pool depth (minimum 100 standby VMs per language). DocCache: add Redis nodes when memory utilization exceeds 75%. The system scales from 10K to 500K concurrent sessions without architectural changes — only infrastructure expansion.
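Dividing peak load by each tier's scale-out threshold gives the pod count the autoscaler will converge to. This sketch uses only figures from this document (200K connections and 200K merge ops/sec at peak, 12K connections and 20K ops/sec per pod); `podsNeeded` is an illustrative helper.

```typescript
// Pods needed so that no pod exceeds its scale-out threshold.
interface TierLoad {
  tier: string;
  load: number; // peak connections or ops/sec
  perPodThreshold: number; // scale-out trigger from the scaling strategy
}

function podsNeeded(tiers: TierLoad[]): { [tier: string]: number } {
  const out: { [tier: string]: number } = {};
  for (const t of tiers) out[t.tier] = Math.ceil(t.load / t.perPodThreshold);
  return out;
}

const peakPods = podsNeeded([
  { tier: "CollabGateway", load: 200000, perPodThreshold: 12000 }, // WebSocket connections
  { tier: "CRDTService", load: 200000, perPodThreshold: 20000 } // merge ops/sec
]);
// peakPods: { CollabGateway: 17, CRDTService: 10 }
```

At the full 200K peak these thresholds call for 17 CollabGateway and 10 CRDTService pods, slightly more than the 15 and 8 listed in the architecture overview, so at peak the autoscaler would add pods beyond the baseline counts.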

Monitoring & Alerting

Key metrics to monitor:
1. CRDT merge latency (p50, p99): the primary collaboration performance indicator. Alert at p99 > 50ms, critical at p99 > 100ms.
2. Container pool depth: number of pre-warmed VMs available per language. Alert if any language drops below 20 standby VMs.
3. WebSocket connection count: total across CollabGateway pods. Alert at 80% of capacity (160K/200K).
4. Session hibernation rate: percentage of sessions hibernated per hour. Expected 40-60% during off-peak; a low hibernation rate indicates the idle timeout is too long.
5. Snapshot restore latency (p99): should be under 2 seconds. Alert at p99 > 3 seconds.
6. DocCache eviction rate: percentage of CRDT documents evicted per minute. Alert if it exceeds 1%.
7. CRDT GC tombstone count: average tombstones per document. Alert if it exceeds 50K (indicates GC is not running or is failing).

Cost Analysis

At 100K concurrent sessions: ContainerPool Firecracker VMs (~$8,000/month at 128MB each on Fargate-equivalent pricing), SessionService 10 pods (~$625/month), CollabGateway 15 pods (~$935/month), CRDTService 8 pods (~$500/month), StdioRelay 8 pods (~$500/month), ContainerOrchestrator 6 pods (~$750/month), SessionCache Redis 3 nodes (~$300/month), DocCache Redis 4 nodes (~$600/month), ProjectDB PostgreSQL (~$200/month), SnapshotStore S3 (~$500/month), ApiGateway (~$100/month), ALB (~$50/month). Total: approximately $13K-20K/month depending on actual session density and hibernation rate. Session hibernation reduces active container cost by 40-60% during off-peak hours, saving $3K-5K/month. Note that at the ~$3.1/GB-month rate implied by the standby-pool estimate ($200/month for 64GB), keeping all 12.5TB of session memory resident around the clock would cost roughly $39K/month, so the $8,000 VM figure assumes that most sessions are hibernated or that memory is heavily overcommitted at any given time.

Security Considerations

Firecracker microVMs provide one of the strongest practical isolation models for running untrusted code. Each VM runs its own Linux kernel, so user code's system calls are handled by the guest kernel; what the host exposes is the KVM interface and Firecracker's small device model rather than its full kernel syscall surface. The Firecracker VMM emulates only a minimal set of devices (chiefly virtio-net, virtio-block, and virtio-vsock, plus a serial console): no GPU, no USB, no PCI passthrough. Network is restricted to a private VLAN with no outbound internet access by default (configurable per plan). CPU and memory limits are enforced outside the guest (vCPU count and memory size set in the VMM, plus host cgroups), so they cannot be bypassed from within the VM. Seccomp filters on the Firecracker process restrict system calls to the minimum required set. This is the same security model used by AWS Lambda.

Deployment Strategy

Blue-green deployment for SessionService, CRDTService, and ContainerOrchestrator: these services hold no client WebSocket connections, so traffic can be switched to the new version at once without disrupting editors. CollabGateway uses rolling deployment with connection draining: existing WebSocket connections are maintained on old pods while new pods accept new connections. StdioRelay uses the same connection-draining approach. ContainerPool VMs are never restarted during deployment; only the orchestration layer is updated. Database migrations run during low-traffic windows with online DDL (no downtime).

Real-World Examples

•Replit uses Firecracker microVMs for per-session isolation with a pre-warmed pool and WebSocket-based collaborative editing
•GitHub Codespaces uses custom container orchestration on Azure with VS Code's collaboration protocol for multi-user editing
•CodeSandbox uses Firecracker for sandboxed execution with Yjs-based CRDT for real-time collaboration
•Google Cloud Shell uses gVisor-sandboxed containers with a custom collaboration layer for shared terminal sessions

Solution Comparison

Variant	Tier	Latency	Throughput	Cost	Complexity	Reliability
V0: Naive (Shared Process Pool)	T1	~50ms exec start, 0-2s output delivery	~5K RPS total	$1,100/month	Low	99% (single DB)
V1: Container + OT (Firecracker + Operational Transform)	T2	<2s cold start, <100ms collab	~50K RPS peak	$3,500/month	Medium	99.9% (multi-AZ)
V2: Container + CRDT (Firecracker + Yjs/Automerge)	T3	<2s cold start, <100ms collab	~50K RPS + 200K WS msg/sec	$13K-20K/month	Very High	99.95% (multi-AZ, CRDT resilience)

This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.

Frequently Asked Questions

What is the difference between OT and CRDT for collaborative editing?

Operational Transform (OT) typically relies on a central server to order operations. When two users type simultaneously, the server decides which edit it applies first and transforms the second edit against the first so that positions still line up. This works well, but every edit must pass through that server, which becomes a single point of coordination and of failure. CRDTs (Conflict-free Replicated Data Types) instead use a data structure whose merge is guaranteed to converge without central ordering. Each character has a unique ID (client ID + sequence number), and concurrent operations commute: replicas that apply the same set of edits in different orders end up with the same document. This lets CRDTs work offline and peer-to-peer. The trade-off is memory: CRDTs store per-character metadata (approximately 1.5x the document size), whereas an OT server only keeps an operation log that can be truncated once clients have acknowledged it.
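The convergence property can be demonstrated with a simplified RGA (Replicated Growable Array), a classic sequence CRDT. This is a minimal sketch, not the YATA algorithm Yjs actually uses, and it identifies characters by a Lamport counter plus client ID rather than a per-client sequence number. It assumes causal delivery (an insert arrives only after the character it follows), and all class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical character ID: (lamport_counter, client_id). Tuples compare
# element by element, so all IDs fall into one total order.
CharId = Tuple[int, str]

@dataclass(frozen=True)
class Insert:
    id: CharId
    after: Optional[CharId]  # ID of the character to the left; None = document start
    char: str

class RGA:
    def __init__(self) -> None:
        self.elems: List[Tuple[CharId, str]] = []

    def _index_of(self, target: Optional[CharId]) -> int:
        if target is None:
            return -1
        for i, (cid, _) in enumerate(self.elems):
            if cid == target:
                return i
        # Causal delivery is assumed: the left neighbor must already exist.
        raise KeyError(target)

    def apply(self, op: Insert) -> None:
        i = self._index_of(op.after) + 1
        # Skip elements with larger IDs: they are concurrent inserts at the
        # same spot that win the tie, or characters inserted after those.
        while i < len(self.elems) and self.elems[i][0] > op.id:
            i += 1
        self.elems.insert(i, (op.id, op.char))

    def text(self) -> str:
        return "".join(ch for _, ch in self.elems)

def replay(ops: List[Insert]) -> str:
    doc = RGA()
    for op in ops:
        doc.apply(op)
    return doc.text()

base = [Insert((1, "alice"), None, "a"), Insert((2, "alice"), (1, "alice"), "b")]
# Alice and Bob both type right after "a" at the same moment.
x = Insert((3, "alice"), (1, "alice"), "X")
y = Insert((3, "bob"), (1, "alice"), "Y")
print(replay(base + [x, y]), replay(base + [y, x]))  # prints: aYXb aYXb
```

The two replicas receive the concurrent inserts in opposite orders yet produce identical text; no server decides which edit came first, because the ID comparison alone breaks the tie.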

How does session hibernation via Firecracker snapshot/restore work?

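Firecracker can pause a microVM and write its state to disk as two files: a snapshot file holding the vCPU registers and emulated device state, and a memory file holding guest RAM. The disk image is not part of the snapshot, so the session's root filesystem must be persisted alongside it. ContainerOrchestrator monitors session activity via SessionCache TTLs. When a session has been idle for 10 minutes, the orchestrator pauses the VM, calls Firecracker's snapshot API, uploads the resulting files (50-200MB) to S3, and releases the VM resources. When the user returns, the orchestrator downloads the snapshot from S3, loads it into a fresh Firecracker process, and resumes execution where it paused, including running processes and open file descriptors inside the guest. TCP connections to external peers may not survive, because the remote end sees minutes of silence and usually times out. Total restore time is 1-2 seconds, most of it spent downloading from S3; Firecracker's own snapshot load is much faster.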
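The Firecracker side of hibernation comes down to a few calls on the VM's API socket. The sketch below builds those requests and sends them over the Unix socket using only the Python standard library. Request paths and body fields reflect our reading of Firecracker's snapshot API in recent releases (older releases used `mem_file_path` instead of `mem_backend` when loading), so check the API spec of the version you deploy. The helper names and the idle constant are illustrative assumptions, and the S3 upload and download are left out.

```python
import http.client
import json
import socket
from typing import Any, Dict, List, Tuple

Request = Tuple[str, str, Dict[str, Any]]  # (HTTP method, API path, JSON body)

IDLE_TIMEOUT_S = 10 * 60  # hibernate after 10 minutes without activity

def should_hibernate(last_activity_s: float, now_s: float) -> bool:
    return now_s - last_activity_s >= IDLE_TIMEOUT_S

def hibernate_requests(snapshot_path: str, mem_path: str) -> List[Request]:
    """Pause the VM, then write a full snapshot (VM state + guest memory)."""
    return [
        ("PATCH", "/vm", {"state": "Paused"}),
        ("PUT", "/snapshot/create", {
            "snapshot_type": "Full",
            "snapshot_path": snapshot_path,  # vCPU and device state
            "mem_file_path": mem_path,       # guest RAM
        }),
    ]

def restore_requests(snapshot_path: str, mem_path: str) -> List[Request]:
    """Load a snapshot into a freshly started, unconfigured Firecracker process."""
    return [
        ("PUT", "/snapshot/load", {
            "snapshot_path": snapshot_path,
            "mem_backend": {"backend_type": "File", "backend_path": mem_path},
            "resume_vm": True,
        }),
    ]

class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP client that talks to the Firecracker API over a Unix socket."""

    def __init__(self, socket_path: str) -> None:
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self.socket_path)
        self.sock = sock

def send_all(socket_path: str, requests: List[Request]) -> None:
    for method, path, body in requests:
        conn = UnixHTTPConnection(socket_path)
        conn.request(method, path, body=json.dumps(body),
                     headers={"Content-Type": "application/json"})
        resp = conn.getresponse()
        detail = resp.read()
        conn.close()
        if resp.status >= 300:
            raise RuntimeError(f"{method} {path} failed: {resp.status} {detail!r}")
```

Keeping request construction separate from sending lets the orchestrator log or test the exact call sequence; after `send_all` succeeds for hibernation, the paused Firecracker process can be killed and its resources returned to the pool.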

Why separate CRDTService from CollabGateway?

CollabGateway manages WebSocket connections (200K concurrent) — its scaling dimension is connection count. CRDTService performs CRDT merge operations (200K ops/sec) — its scaling dimension is CPU for merge computation and Redis throughput for state read/write. Separating them allows: (1) scaling CollabGateway for more connections without adding unnecessary CRDT CPU, (2) scaling CRDTService for more merge throughput without allocating WebSocket connection memory, (3) deploying CRDT algorithm updates without disrupting WebSocket connections.

How does CRDT garbage collection work and why is it important?

Sequence CRDTs such as Yjs keep a tombstone for deleted content: the deleted item's ID stays in the document structure so that concurrent edits positioned relative to it can still be merged correctly. Yjs already discards the deleted text itself by default and run-length encodes its delete set, but the structural metadata still grows with editing history. A heavily edited document (thousands of inserts and deletes) can therefore carry CRDT state far larger than its visible text; a 10KB document could accumulate 1MB of tombstone metadata after extensive editing. CRDTService runs GC every 24 hours: it removes tombstones older than 24 hours from DocCache and writes a compacted CRDT state snapshot to S3. Age alone is only a heuristic, since a client that has been offline for more than 24 hours could still send edits that reference a removed item. The safe condition is that every replica has synced past the deletion, which is why GC must coordinate with all connected clients before compacting.
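The "every replica has synced past the deletion" condition (causal stability) can be expressed in a few lines. In this minimal sketch each replica, including offline clients the server still tracks, reports a state vector with the highest clock it has seen from each client, and a tombstone created by delete operation `(client, clock)` is removable only when every state vector covers it. The data shapes and function names are hypothetical and are not Yjs's internal encoding.

```python
from typing import Dict, List, Tuple

OpId = Tuple[str, int]        # (client_id, clock) of the delete operation
StateVector = Dict[str, int]  # highest clock seen from each client

def is_stable(op: OpId, replicas: Dict[str, StateVector]) -> bool:
    """A delete is causally stable once every known replica has observed it."""
    client, clock = op
    return all(sv.get(client, 0) >= clock for sv in replicas.values())

def collect(tombstones: List[OpId],
            replicas: Dict[str, StateVector]) -> Tuple[List[OpId], List[OpId]]:
    """Split tombstones into (removable, must_keep)."""
    removable = [t for t in tombstones if is_stable(t, replicas)]
    keep = [t for t in tombstones if not is_stable(t, replicas)]
    return removable, keep

replicas = {
    "server": {"alice": 5, "bob": 3},
    "bob-laptop": {"alice": 4, "bob": 3},  # has not yet seen alice's clock 5
}
removable, keep = collect([("alice", 4), ("alice", 5), ("bob", 2)], replicas)
```

A single lagging replica holds back GC for every tombstone it has not yet seen, which is exactly the case a purely age-based cutoff misses.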

How does the pre-warmed container pool handle language popularity skew?

By default the pool keeps 125 standby VMs per language (Python, Node.js, Java, Go; 500 in total), but allocation patterns are skewed: Python and Node.js account for approximately 70% of sessions. ContainerOrchestrator monitors allocation rates per language and adjusts pool ratios dynamically. If Python allocations spike (e.g., a popular tutorial goes viral), the orchestrator shifts standby capacity, for example to 200 Python, 150 Node.js, 100 Java, and 50 Go. Pool replenishment prioritizes the language with the lowest standby-to-demand ratio.
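The rebalancing rule can be sketched as proportional allocation plus a replenishment picker. This is a hypothetical illustration: the function names, the optional per-language `floor`, and the largest-remainder rounding are assumptions, not the orchestrator's documented behavior. With demand split 40/30/20/10 across the four languages, a 500-VM pool comes out to the 200/150/100/50 split in the example, and equal demand gives the default 125 each.

```python
from typing import Dict

def allocate_pool(total: int, demand: Dict[str, int], floor: int = 0) -> Dict[str, int]:
    """Split `total` standby VMs across languages in proportion to demand.

    Every language first gets `floor` VMs; the rest is divided with the
    largest-remainder method so the counts always sum to `total`.
    """
    langs = sorted(demand)
    alloc = {lang: floor for lang in langs}
    remaining = total - floor * len(langs)
    if remaining < 0:
        raise ValueError("floor * number of languages exceeds pool size")
    total_demand = sum(demand.values())
    if total_demand == 0:
        weights = {lang: 1 for lang in langs}  # no demand signal yet: split evenly
        total_demand = len(langs)
    else:
        weights = demand
    remainders: Dict[str, int] = {}
    for lang in langs:
        share, rem = divmod(remaining * weights[lang], total_demand)
        alloc[lang] += share
        remainders[lang] = rem
    leftover = total - sum(alloc.values())
    # Hand leftover VMs to the largest fractional shares (ties broken by name).
    for lang in sorted(langs, key=lambda name: (-remainders[name], name))[:leftover]:
        alloc[lang] += 1
    return alloc

def next_to_replenish(standby: Dict[str, int], demand: Dict[str, int]) -> str:
    """Pick the language with the lowest standby-to-demand ratio."""
    candidates = [lang for lang in demand if demand[lang] > 0]
    return min(candidates, key=lambda lang: standby.get(lang, 0) / demand[lang])
```

Integer `divmod` keeps the split exact, and a nonzero `floor` would keep a minimum warm pool for rarely used languages such as Go.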

Related Templates

•Online Code Editor — Naive (Shared Process Pool)
•Online Code Editor — Container + OT Collaboration
•Real-Time Chat — Multi-Region WebSocket + CRDT
