Immutable Infrastructure

Immutable infrastructure treats servers and deployments as disposable artifacts that are replaced rather than modified. Instead of patching a running server (mutable), you build a new image with the changes and replace the old instance entirely, ensuring consistency and eliminating configuration drift.

Overview

The traditional approach to server management is mutable infrastructure: you provision a server, then iteratively modify it over its lifetime -- applying OS patches, updating application code, changing configuration files, installing hotfixes. Over time, each server becomes a unique snowflake shaped by its individual history of modifications. Two servers that were identical at provisioning time may diverge significantly after months of patches, manual fixes, and ad-hoc changes. This divergence -- configuration drift -- is the root cause of 'it works on server A but not server B' failures and makes disaster recovery unreliable.
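Drift becomes easier to reason about once each server's state is reduced to a comparable fingerprint. The following is a minimal sketch, assuming a server's state can be summarized as a dictionary of package versions and settings; the `fingerprint` and `find_drift` helpers and the sample fleet are hypothetical and only illustrate the idea.

```python
import hashlib
import json


def fingerprint(state: dict) -> str:
    """Return a stable SHA-256 digest of a server's state dictionary."""
    # sort_keys makes the digest independent of key insertion order.
    canonical = json.dumps(state, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def find_drift(baseline: dict, servers: dict) -> list:
    """Return the names of servers whose state differs from the baseline."""
    expected = fingerprint(baseline)
    return sorted(
        name for name, state in servers.items()
        if fingerprint(state) != expected
    )


# Hypothetical fleet: both servers started from the same baseline.
baseline = {"openssl": "3.0.13", "app": "1.4.2", "max_connections": 200}
servers = {
    "server-a": dict(baseline),
    # server-b received a manual tweak some time after provisioning.
    "server-b": {**baseline, "max_connections": 500},
}

drifted = find_drift(baseline, servers)
print(drifted)
```

In a mutable fleet a check like this has to run continuously. With immutable servers the fingerprint is fixed at image build time, so every instance launched from one image starts from the same state.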

Immutable infrastructure inverts this model. Servers are never modified after deployment. When a change is needed -- a new application version, an OS security patch, a configuration update -- a new machine image is built from scratch incorporating the change, tested in staging, and deployed by replacing the running instances. The old instances are terminated, not patched. This is the 'phoenix server' pattern: instead of trying to keep a server alive forever (a pet), you burn it down and rebuild it from a known-good image (cattle).

Containers are the purest expression of immutable infrastructure. A Docker image is a frozen, read-only filesystem. Every deployment creates new containers from the image; running containers are not entered (over SSH or with tools such as docker exec or kubectl exec) and modified in place. Kubernetes supports this with declarative Deployments: updating the image tag in the pod template triggers a rolling replacement of all pods. Earlier revisions are retained, so a rollback redeploys the previous image without a rebuild.
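The replace-rather-than-modify behaviour can be illustrated with a toy model. This sketch is not the Kubernetes controller or its API; the `Pod` class, the `rolling_update` function, and the image names are hypothetical, and the real Deployment controller has more knobs (for example, how many pods may be replaced at once).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    """A running pod; frozen so it cannot be modified after creation."""
    name: str
    image: str


def rolling_update(pods, new_image):
    """Replace pods one at a time, yielding the fleet after each step."""
    current = list(pods)
    for i, old in enumerate(current):
        # A new pod is created from the new image; the old one is discarded.
        current[i] = Pod(name=f"{old.name}-next", image=new_image)
        yield tuple(current)


fleet = [Pod("web-0", "shop:1.4.2"), Pod("web-1", "shop:1.4.2")]
steps = list(rolling_update(fleet, "shop:1.4.3"))
for step in steps:
    print([p.image for p in step])
```

Because `Pod` is frozen, an attempt to assign `fleet[0].image` raises an error, which mirrors the rule that a change reaches production only as a new pod built from a new image.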

At the VM level, tools like Packer (HashiCorp) bake machine images (AMIs for AWS, VM images for GCP/Azure) from a base OS plus application code and configuration. The resulting image is versioned, tested, and deployed via auto-scaling groups. Blue-green deployments create an entirely new set of instances running the new image alongside the old set, then switch traffic once verified. Canary deployments route a small percentage of traffic to the new image before full rollout. In both cases, rollback means switching traffic back to the old image -- no need to reverse-engineer what patches were applied.
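Both deployment styles reduce to moving a traffic pointer between image versions. The sketch below is a simplified model under that assumption; the `Router` class and the labels `ami-v41` and `ami-v42` are hypothetical placeholders, not real AMI IDs or a load balancer API.

```python
import random


class Router:
    """Toy traffic router that points at immutable, versioned images."""

    def __init__(self, active: str):
        self.active = active      # image receiving normal traffic
        self.previous = None      # image kept around for rollback
        self.canary = None        # optional image under evaluation
        self.canary_weight = 0.0  # fraction of traffic sent to the canary

    def start_canary(self, image: str, weight: float) -> None:
        if not 0.0 <= weight <= 1.0:
            raise ValueError("weight must be between 0 and 1")
        self.canary, self.canary_weight = image, weight

    def promote(self, image: str) -> None:
        """Blue-green switch: send all traffic to the new image."""
        self.previous, self.active = self.active, image
        self.canary, self.canary_weight = None, 0.0

    def rollback(self) -> None:
        """Switch traffic back to the previous image, if there is one."""
        if self.previous is None:
            raise RuntimeError("no previous image to roll back to")
        self.active, self.previous = self.previous, self.active

    def route(self, rng: random.Random) -> str:
        """Pick the image that serves one request."""
        if self.canary is not None and rng.random() < self.canary_weight:
            return self.canary
        return self.active


router = Router(active="ami-v41")
router.start_canary("ami-v42", weight=0.05)
rng = random.Random(0)
hits = sum(router.route(rng) == "ami-v42" for _ in range(10_000))
print(f"canary share: {hits / 10_000:.3f}")

router.promote("ami-v42")  # full cutover to the new image
router.rollback()          # problem found: traffic returns to ami-v41
print(router.active)
```

Rollback here is a pointer swap with no state to reconstruct, which only works because neither image has changed since it was built.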

Key Points
  1. No SSH, no patches, no manual changes. If a server needs modification, build a new image and replace it. This eliminates configuration drift and makes every deployment reproducible.
  2. Container images are the canonical example: read-only, versioned, immutable filesystems. Kubernetes Deployments enforce immutability by replacing pods rather than modifying them.
  3. Packer builds VM images (AMIs) from declarative templates. With pinned inputs (base image, package versions), the same template produces equivalent images on every build, enabling consistent environments across dev, staging, and production.
  4. Blue-green and canary deployments depend on immutable infrastructure: you can only safely switch traffic between two versions if each version is a known, tested, unchanging image.
  5. Rollback is trivial: redeploy the previous image version. No need to reverse patches, revert config changes, or debug partial updates.
  6. Configuration that must vary per environment (secrets, endpoints, feature flags) is injected at runtime via environment variables, ConfigMaps, or secret managers -- never baked into the image.
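The last point, runtime injection, might look like the following in application code. This is a sketch under stated assumptions: the variable names `DATABASE_URL` and `FEATURE_NEW_CHECKOUT`, the `Settings` class, and the connection strings are hypothetical examples.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    feature_new_checkout: bool


def load_settings(env=None) -> Settings:
    """Build settings from environment variables, failing fast if one is missing."""
    env = os.environ if env is None else env
    try:
        database_url = env["DATABASE_URL"]
    except KeyError:
        raise RuntimeError("DATABASE_URL must be set at runtime") from None
    # Optional flag; defaults to off when the variable is absent.
    flag = env.get("FEATURE_NEW_CHECKOUT", "false").lower() == "true"
    return Settings(database_url=database_url, feature_new_checkout=flag)


# The same image runs in both environments; only the injected values differ.
staging = load_settings({"DATABASE_URL": "postgres://staging-db/shop"})
production = load_settings({
    "DATABASE_URL": "postgres://prod-db/shop",
    "FEATURE_NEW_CHECKOUT": "true",
})
print(staging)
print(production)
```

Calling `load_settings()` with no argument reads from the process environment. Failing at startup when a required value is missing surfaces a misconfigured deployment immediately instead of at the first database call.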
Simple Example

Printed Book vs. Wiki Analogy

Mutable infrastructure is like a wiki page: anyone can edit it at any time, and over months of edits, the page may diverge from its original intent in unpredictable ways. It is hard to know the exact state without reading every revision. Immutable infrastructure is like a printed book: once published, a copy cannot be modified. If you find a typo, you publish a new edition (a new image) and recall the old one. Every copy of the same edition is guaranteed identical. You can always go back to a previous edition (rollback). The trade-off is that publishing a new edition takes longer than editing a wiki page (image build time), but you gain consistency, reproducibility, and auditability.

Real-World Examples

Netflix

Netflix pioneered immutable infrastructure at scale with their 'Bake and Deploy' pipeline. Every deployment bakes a new AMI using Packer (originally Aminator), tests it in a staging environment, and deploys it via red-black (blue-green) deployment to production auto-scaling groups. No instance is ever patched in place. Netflix estimates this approach reduced configuration-related incidents by over 80% compared to their previous mutable model.

Heroku

Heroku's slug-based deployment model is fundamentally immutable. A 'git push heroku main' triggers a build that produces a slug (a compressed filesystem snapshot). The slug is immutable and versioned. Deployment creates new dynos from the slug; rollback deploys a previous slug. Dynos have an ephemeral filesystem, so any change made inside a running dyno is discarded when the dyno restarts, and lasting changes must go through a new build. This model shaped the Twelve-Factor App methodology, which was written by Heroku engineers, and is often described as a precursor to container-based deployment.

GitLab

GitLab runs their SaaS platform on GCP using immutable infrastructure. Every change to GitLab.com infrastructure is a new VM image built by CI/CD, tested, and deployed via rolling replacement. They maintain a 'no SSH to production' policy, enforced by removing SSH keys from production instances. Debugging is done via centralized logging and monitoring, not by connecting to individual servers.

Trade-Offs
Aspect | Description
Consistency vs. Deployment Speed | Building a new image for every change ensures consistency but adds build time (2-10 minutes for container images, 10-30 minutes for VM images). Mutable updates (rsync a binary, restart a process) are faster but create drift risk. The trade-off is acceptable for most workloads; for emergency hotfixes, pre-built images or fast container builds mitigate the delay.
Immutable Servers vs. Stateful Data | Immutable infrastructure works cleanly for stateless application servers. Stateful systems (databases, message brokers, caches) cannot be freely replaced without data migration. The solution is to separate compute (immutable) from storage (persistent volumes, managed databases), but this separation adds architectural complexity.
Security Patching Cadence vs. Operational Overhead | In mutable infrastructure, applying an OS patch is one command across all servers. In immutable infrastructure, it requires rebuilding the base image, rebuilding all dependent application images, testing, and redeploying every service. This is more work but guarantees the patch is applied consistently. Automated image pipelines that rebuild nightly on base image changes reduce this burden.
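The rebuild chain described in the last row can be planned mechanically when image dependencies are known. The sketch below assumes a hypothetical dependency map (`IMAGE_PARENTS`) and a hypothetical `rebuild_plan` helper; it uses the standard-library `graphlib` module (Python 3.9+) to order the builds.

```python
from graphlib import TopologicalSorter

# Hypothetical image graph: each image maps to the images it is built from.
IMAGE_PARENTS = {
    "ubuntu-base": set(),
    "python-runtime": {"ubuntu-base"},
    "java-runtime": {"ubuntu-base"},
    "checkout-api": {"python-runtime"},
    "search-api": {"java-runtime"},
}


def rebuild_plan(changed: str, parents: dict) -> list:
    """Return the changed image and everything downstream, in build order."""
    # Collect every image that depends, directly or transitively, on `changed`.
    affected = {changed}
    grew = True
    while grew:
        grew = False
        for image, deps in parents.items():
            if image not in affected and deps & affected:
                affected.add(image)
                grew = True
    # Order the affected images so each is built after its parents.
    subgraph = {img: parents[img] & affected for img in affected}
    return list(TopologicalSorter(subgraph).static_order())


plan = rebuild_plan("python-runtime", IMAGE_PARENTS)
print(plan)
```

A patched `ubuntu-base` would put all five images in the plan, with the base first and each service after its runtime. A pipeline could run the plan nightly or whenever the base image changes.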
Case Study

UK Government Digital Service's Move to Immutable Infrastructure

Scenario

The UK Government Digital Service (GDS) operated GOV.UK -- the single website for UK government services serving 4+ billion requests per year -- on hand-configured VMs managed with Puppet. Despite configuration management, drift accumulated: security patches were applied inconsistently, Puppet runs occasionally failed silently, and 'snowflake' servers with manual tweaks existed across the fleet. Recreating the production environment from scratch required 2-3 days of manual work.

Solution

GDS migrated GOV.UK to an immutable infrastructure model on AWS. Application servers are deployed as Docker containers on ECS, with container images built from Dockerfiles in CI/CD. Infrastructure is provisioned via Terraform. VM-level components (where containers are not feasible) use Packer-built AMIs deployed to auto-scaling groups. A strict 'no SSH' policy is enforced: production instances have no SSH keys, and all debugging is done through centralized logging (ELK stack) and monitoring (Prometheus/Grafana).

Outcome

Full environment recreation time dropped from 2-3 days to 45 minutes (terraform apply + deploy). Configuration drift incidents were eliminated entirely. Security patching became automated: base image updates trigger automatic rebuilds of all dependent images and rolling redeployments. The team reduced their on-call burden by 40% because immutable infrastructure eliminated an entire class of 'server-specific' issues that previously required SSH investigation.

Common Mistakes
  • Opening a shell in a running container or pod (over SSH, docker exec, or kubectl exec) for 'quick fixes.' Any change made inside a running container is lost when it restarts. Worse, it creates a divergence between the running state and the image -- the next deployment reverts the fix. All changes must go through the image build pipeline.
  • Baking secrets or environment-specific configuration into images. An image with hardcoded database credentials cannot be reused across environments and exposes secrets to anyone with image access. Inject secrets at runtime via environment variables, Kubernetes Secrets, or Vault.
  • Not automating base image updates. If your base OS image (Amazon Linux, Ubuntu) is built once and never rebuilt, it accumulates unpatched CVEs. Automate nightly base image rebuilds with security patches, and trigger downstream application image rebuilds via dependency chains.
  • Treating immutability as all-or-nothing. You can adopt immutable infrastructure incrementally: start with container images for application code, then move to Packer-built AMIs for infrastructure components, and finally enforce 'no SSH' policies. Full immutability is a destination, not a prerequisite.
Test Your Understanding

1. What is the primary problem that immutable infrastructure solves?

2. How should environment-specific configuration (e.g., database URLs, API keys) be handled in immutable infrastructure?
