
Kubernetes Orchestration

Kubernetes (K8s) is the industry-standard container orchestration platform that automates deployment, scaling, and management of containerized workloads. Its declarative model and reconciliation loop ensure actual cluster state continuously converges to desired state.

Overview

Kubernetes, originally designed by Google and informed by roughly 15 years of experience running production workloads on its internal Borg and Omega systems, is the dominant container orchestration system. Its core insight is declarative infrastructure management: instead of scripting imperative steps ('start container A, then create load balancer B, then configure DNS C'), operators declare the desired end state in YAML manifests, and Kubernetes controllers continuously work to make reality match that declaration. If a container crashes, the kubelet restarts it. If a node dies, controllers create replacement pods and the scheduler places them on healthy nodes. This reconciliation loop is the foundation of Kubernetes' self-healing capability.
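The reconciliation idea can be sketched in a few lines of Python. This is a toy model, not Kubernetes code: `FakeCluster`, `start_pod`, `stop_pod`, and `reconcile` are hypothetical names invented for illustration, and real controllers watch the API server rather than an in-memory set.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class FakeCluster:
    """In-memory stand-in for cluster state (hypothetical, not a Kubernetes API)."""
    running: set = field(default_factory=set)
    _ids: count = field(default_factory=count)

    def start_pod(self) -> str:
        name = f"pod-{next(self._ids)}"
        self.running.add(name)
        return name

    def stop_pod(self, name: str) -> None:
        self.running.discard(name)


def reconcile(cluster: FakeCluster, desired_replicas: int) -> None:
    """One pass of the loop: observe actual state, diff against desired, act."""
    diff = desired_replicas - len(cluster.running)
    if diff > 0:
        # Too few pods: start the missing ones.
        for _ in range(diff):
            cluster.start_pod()
    elif diff < 0:
        # Too many pods: stop the surplus.
        for name in sorted(cluster.running)[:-diff]:
            cluster.stop_pod(name)


cluster = FakeCluster()
reconcile(cluster, desired_replicas=3)   # initial rollout: 0 -> 3 pods
crashed = next(iter(cluster.running))
cluster.stop_pod(crashed)                # simulate a crash
reconcile(cluster, desired_replicas=3)   # the next pass restores the count
print(len(cluster.running))              # 3
```

The caller never says "start one more pod"; it only restates the desired count, and the loop works out the action from the difference.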

The Kubernetes architecture separates the control plane from the data plane. The control plane consists of the API server (the single point of interaction for all cluster operations), etcd (a distributed key-value store holding all cluster state), the scheduler (which assigns pods to nodes based on resource requirements, affinity rules, and constraints), and controller managers (which run reconciliation loops for Deployments, ReplicaSets, Jobs, and other resources). The data plane consists of worker nodes, each running a kubelet (agent that manages pods on the node) and kube-proxy (which implements service networking via iptables or IPVS rules).
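As a rough illustration of what "assigning pods to nodes" involves, the sketch below filters out nodes that cannot host a pod and then scores the rest. The `Node` and `PodSpec` classes and the scoring rule (prefer the node with the most free capacity after placement) are simplifying assumptions; the real scheduler runs a configurable set of filter and score plugins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    cpu_capacity_m: int        # CPU capacity in millicores
    mem_capacity_mi: int       # memory capacity in MiB
    cpu_allocated_m: int = 0   # sum of CPU requests already placed
    mem_allocated_mi: int = 0  # sum of memory requests already placed
    tainted: bool = False


@dataclass
class PodSpec:
    name: str
    cpu_request_m: int
    mem_request_mi: int
    tolerates_taint: bool = False


def feasible(node: Node, pod: PodSpec) -> bool:
    """Filter step: reject nodes that are tainted or lack free resources."""
    if node.tainted and not pod.tolerates_taint:
        return False
    cpu_free = node.cpu_capacity_m - node.cpu_allocated_m
    mem_free = node.mem_capacity_mi - node.mem_allocated_mi
    return cpu_free >= pod.cpu_request_m and mem_free >= pod.mem_request_mi


def score(node: Node, pod: PodSpec) -> float:
    """Score step (assumed policy): fraction of capacity left free after placement."""
    cpu_left = (node.cpu_capacity_m - node.cpu_allocated_m - pod.cpu_request_m) / node.cpu_capacity_m
    mem_left = (node.mem_capacity_mi - node.mem_allocated_mi - pod.mem_request_mi) / node.mem_capacity_mi
    return (cpu_left + mem_left) / 2


def schedule(pod: PodSpec, nodes: list) -> Optional[str]:
    """Return the chosen node name, or None if the pod would stay Pending."""
    candidates = [n for n in nodes if feasible(n, pod)]
    if not candidates:
        return None
    best = max(candidates, key=lambda n: score(n, pod))
    best.cpu_allocated_m += pod.cpu_request_m
    best.mem_allocated_mi += pod.mem_request_mi
    return best.name


nodes = [
    Node("node-a", 4000, 8192, cpu_allocated_m=3500, mem_allocated_mi=4096),
    Node("node-b", 4000, 8192, cpu_allocated_m=1000, mem_allocated_mi=2048),
    Node("node-c", 8000, 16384, tainted=True),
]
print(schedule(PodSpec("web", 1000, 1024), nodes))  # node-b
print(schedule(PodSpec("big", 6000, 1024), nodes))  # None
```

The second pod fits only on the tainted node, so without a toleration it is not placed. This is the same reason a real pod can sit in Pending while the cluster appears to have spare capacity.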

Kubernetes introduces several key abstractions. A Pod is the smallest deployable unit -- one or more containers that share a network namespace and storage volumes. A Deployment manages a set of identical pods, handling rolling updates and rollbacks. A Service provides a stable virtual IP and DNS name for a set of pods, enabling service discovery. An Ingress (or Gateway API) manages external HTTP traffic routing. ConfigMaps and Secrets inject configuration and credentials without rebuilding images. PersistentVolumeClaims abstract storage provisioning across cloud providers.
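To make the Service abstraction more concrete, the following sketch shows the two things a Service conceptually provides: a predictable DNS name and a set of backing pods chosen by label selector. The pod records, IP addresses, and function names are made up for illustration; in a real cluster the endpoints are maintained by the control plane, not computed by application code.

```python
def service_dns_name(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Build the in-cluster DNS name for a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def select_endpoints(pods: list, selector: dict) -> list:
    """Return IPs of ready pods whose labels match every key/value in the selector."""
    return [
        pod["ip"]
        for pod in pods
        if pod["ready"]
        and all(pod["labels"].get(key) == value for key, value in selector.items())
    ]


# Hypothetical pod inventory for the example.
pods = [
    {"name": "web-1", "ip": "10.0.0.11", "labels": {"app": "web"}, "ready": True},
    {"name": "web-2", "ip": "10.0.0.12", "labels": {"app": "web"}, "ready": False},
    {"name": "db-1", "ip": "10.0.0.21", "labels": {"app": "db"}, "ready": True},
]

print(service_dns_name("web", "shop"))         # web.shop.svc.cluster.local
print(select_endpoints(pods, {"app": "web"}))  # ['10.0.0.11']
```

Clients address the stable name, while the set of pod IPs behind it changes as pods come and go or become unready.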

At scale, a single Kubernetes cluster is designed to support up to 5,000 nodes and 150,000 pods. Approaching those limits introduces complexity: etcd performance degrades with too many objects, the scheduler becomes a bottleneck with complex affinity rules, and networking overlay overhead can add 5-10% latency. Large organizations (Google, Airbnb, Spotify) therefore run multiple smaller clusters rather than one mega-cluster, using federation or fleet management tools to maintain consistency across them.

Key Points
  • 1. Declarative model: you specify desired state (for example, a Deployment with 5 replicas); controllers reconcile actual state to match. The reconciliation loop runs continuously, so self-healing happens automatically rather than through manual intervention.
  • 2. Pod is the atomic unit: one or more co-located containers sharing network (same IP, localhost communication) and storage (shared volumes). Sidecar patterns (logging, proxying, security) are implemented as additional containers in the pod.
  • 3. The scheduler places pods onto nodes by filtering and scoring candidates on CPU/memory requests, node affinity/anti-affinity, taints/tolerations, and topology spread constraints. Poor resource requests lead to either waste (over-provisioning) or eviction (under-provisioning).
  • 4. Services provide stable networking: ClusterIP (internal), NodePort (external access through a static port on every node), LoadBalancer (cloud LB integration). DNS-based service discovery (my-service.my-namespace.svc.cluster.local) is built in.
  • 5. Rolling updates are the default Deployment strategy: old pods are replaced by new ones incrementally, within the limits set by maxSurge (extra pods allowed above the desired count) and maxUnavailable (pods allowed to be missing). Rollbacks are fast because Kubernetes retains previous ReplicaSet specs and can roll back to them.
  • 6. Horizontal Pod Autoscaler (HPA) scales pods based on CPU, memory, or custom metrics. Cluster Autoscaler adds nodes when pods are pending and removes underutilized ones. Together they enable elastic scaling from a handful of pods to thousands; the standard HPA keeps at least one replica, so scaling to zero requires additional tooling such as KEDA or Knative.
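The HPA's core calculation is simple enough to sketch. The Kubernetes documentation describes it as desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), skipped when the ratio is within a tolerance of 1.0 (0.1 by default). The function name, parameter names, and the min/max defaults below are illustrative assumptions, and the real controller also handles missing metrics, unready pods, and stabilization windows.

```python
import math


def hpa_desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Simplified HPA scaling decision for a single metric."""
    ratio = current_metric / target_metric
    # Within tolerance: leave the replica count alone to avoid flapping.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


# Target: 60% average CPU utilization per pod.
print(hpa_desired_replicas(4, current_metric=90, target_metric=60))   # 6
print(hpa_desired_replicas(4, current_metric=63, target_metric=60))   # 4
print(hpa_desired_replicas(4, current_metric=15, target_metric=60))   # 1
print(hpa_desired_replicas(8, current_metric=200, target_metric=50))  # 10
```

The last call would ask for 32 replicas but is capped at `max_replicas`. At that point the Cluster Autoscaler matters only if the capped pods still cannot be scheduled.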
Simple Example

Thermostat Analogy

A home thermostat is a perfect model for Kubernetes' reconciliation loop. You set the desired temperature (desired state) to 72 degrees F. The thermostat continuously measures the actual temperature (actual state) and turns the heater on or off to converge. If someone opens a window (a container crashes), the thermostat detects the temperature drop and compensates automatically. You never tell the thermostat 'turn on for 10 minutes then off for 5' (imperative). You just declare '72 degrees' and the control loop handles it. Kubernetes works the same way: you declare '3 healthy replicas' and the Deployment controller continuously ensures exactly 3 are running, restarting crashed pods and rescheduling from failed nodes.

Real-World Examples

Airbnb

Airbnb migrated from a monolithic Ruby on Rails application to over 1,000 microservices on Kubernetes. They run 20+ Kubernetes clusters across AWS regions, managing over 50,000 pods. Their internal platform (OneTouch) provides developers with self-service deployment, canary releases, and automated rollback. The migration reduced deployment time from hours to minutes and improved resource utilization from 15% to 55%.

Spotify

Spotify runs over 2,000 microservices on GKE (Google Kubernetes Engine) clusters. They process 4+ million requests per second across their clusters. Their developer platform (Backstage, now a CNCF project) abstracts Kubernetes complexity -- engineers deploy via a web UI without writing YAML. Custom operators manage Spotify-specific resources like ML model deployments and A/B test configurations.

CERN

CERN uses Kubernetes to orchestrate data processing for the Large Hadron Collider, managing 500,000+ containers across 12,000 nodes. Physics analysis jobs run as Kubernetes batch workloads (Jobs and CronJobs), processing petabytes of collision data. The declarative model enables physicists to submit analysis pipelines without understanding infrastructure, while cluster autoscaling handles burst workloads during beam collision periods.

Trade-Offs
Aspect | Description
Operational Complexity vs. Automation | Kubernetes automates deployment, scaling, and self-healing, but introduces significant operational complexity: control plane management, etcd backup/restore, RBAC policies, network policies, storage provisioning, and upgrade planning. Managed services (EKS, GKE, AKS) reduce operational burden but limit customization and add vendor lock-in.
Resource Requests vs. Utilization Efficiency | Setting resource requests too high wastes capacity (pods reserve CPU/memory they never use). Setting them too low causes throttling, OOM kills, and eviction under pressure. Right-sizing requires continuous profiling with tools like Vertical Pod Autoscaler (VPA) or Goldilocks. Most organizations start at 30-40% utilization and optimize toward 60-70%.
Single Large Cluster vs. Multi-Cluster | A single cluster simplifies service discovery and reduces networking overhead but creates a blast radius (etcd corruption or API server outage affects everything) and hits scaling limits (etcd at ~8 GB, 5,000 nodes per cluster). Multi-cluster architectures isolate failures and enable region-local deployments but require service mesh or DNS-based cross-cluster service discovery.
Abstraction vs. Transparency | Internal developer platforms (IDPs) that abstract Kubernetes behind a simple UI accelerate onboarding but hide operational details. When things break, developers cannot debug without understanding pods, logs, events, and resource constraints. The best platforms provide a 'graduated complexity' model: simple by default, full Kubernetes access when needed.
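The requests-versus-utilization trade-off comes down to simple arithmetic, sketched below. This is not how VPA or Goldilocks compute recommendations; it is a deliberately naive heuristic (a usage percentile plus fixed headroom), and the sample data, function names, and default percentages are all assumptions chosen for illustration.

```python
import math


def percentile(samples: list, pct: int) -> int:
    """Nearest-rank percentile of integer samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def recommend_request(samples: list, pct: int = 90, headroom_pct: int = 15) -> int:
    """Suggest a request: the chosen usage percentile plus a headroom margin."""
    return math.ceil(percentile(samples, pct) * (100 + headroom_pct) / 100)


def utilization(samples: list, request: int) -> float:
    """Average usage as a fraction of the requested amount."""
    return sum(samples) / len(samples) / request


# Hypothetical CPU usage samples for one pod, in millicores.
cpu_usage_m = [120, 150, 130, 400, 160, 140, 155, 145, 135, 150]

current_request_m = 1000
print(round(utilization(cpu_usage_m, current_request_m), 4))  # 0.1685
suggested = recommend_request(cpu_usage_m)
print(suggested)                                              # 184
print(round(utilization(cpu_usage_m, suggested), 4))          # 0.9158
```

The single 400m spike sits above the suggested request. A tight request raises utilization but leaves bursts dependent on spare node capacity, which is the tension the table row describes.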
Case Study

Pinterest's Migration to Kubernetes

Scenario

Pinterest operated 2,000+ services on a custom deployment system built on EC2 instances and Chef. Deployments were slow (20-45 minutes), rollbacks required manual intervention, and resource utilization was below 20%. Each service team maintained its own deployment pipeline, resulting in inconsistent practices and frequent deployment-related outages.

Solution

Pinterest built an internal platform on top of Amazon EKS, providing a standardized deployment abstraction. Services were containerized with multi-stage builds, deployed via Kubernetes Deployments with automated canary analysis. The platform implemented resource recommendation (based on historical usage) to right-size pod requests, Horizontal Pod Autoscaler for traffic-based scaling, and PodDisruptionBudgets for safe node maintenance. A custom admission controller enforced security policies (non-root, resource limits, image registry restrictions).

Outcome

Deployment time dropped from 20-45 minutes to under 5 minutes. Rollbacks became instant (one-click ReplicaSet rollback). Resource utilization improved from below 20% to 55%, saving $15M+ annually in EC2 costs. Deployment-related incidents decreased by 60% thanks to automated canary analysis and instant rollback. Developer satisfaction scores for deployment experience increased from 3.2/5 to 4.5/5.

Common Mistakes
  • Not setting resource requests and limits. Without requests, the scheduler cannot bin-pack effectively, and pods may be scheduled on overcommitted nodes. Without limits, a memory leak in one pod can exhaust node memory and cause other pods on the same node to be OOM-killed or evicted. Always set both, and use VPA to right-size them over time.
  • Using Deployments for stateful workloads. Deployments treat pods as interchangeable; StatefulSets provide stable network identities (pod-0, pod-1) and ordered startup/shutdown required by databases and distributed systems like Kafka, ZooKeeper, and Elasticsearch.
  • Skipping liveness and readiness probes. Without readiness probes, Kubernetes sends traffic to pods that are not ready (still initializing, loading caches). Without liveness probes, a deadlocked container is never restarted and, unless a readiness probe fails, stays in the Service rotation indefinitely. Misconfigured probes (overly aggressive timeouts) cause cascading restart storms.
  • Ignoring Pod Disruption Budgets. Without PDBs, a cluster upgrade or node drain can terminate all replicas of a service simultaneously. PDBs declare 'at least 2 of 3 replicas must always be available', ensuring graceful node maintenance.
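The effect of a Pod Disruption Budget during a node drain can be sketched as a budget check before each eviction. The sketch assumes a `minAvailable`-style budget per application and uses hypothetical names (`plan_evictions`, the pod and app names); a real drain retries blocked evictions as replacement pods become healthy elsewhere.

```python
def plan_evictions(pods_on_node: list, healthy: dict, min_available: dict):
    """Decide which pods on a draining node may be evicted right now.

    pods_on_node:  list of (pod_name, app) tuples on the node being drained
    healthy:       app -> number of currently healthy replicas cluster-wide
    min_available: app -> PDB minAvailable (apps without a PDB are unprotected)
    """
    remaining = dict(healthy)  # work on a copy; do not mutate the caller's view
    evict, blocked = [], []
    for pod_name, app in pods_on_node:
        budget = min_available.get(app, 0)
        if remaining[app] - 1 >= budget:
            remaining[app] -= 1
            evict.append(pod_name)
        else:
            # Evicting this pod would drop the app below its budget.
            blocked.append(pod_name)
    return evict, blocked


pods_on_node = [("web-0", "web"), ("web-1", "web"), ("cache-0", "cache")]
healthy = {"web": 3, "cache": 1}
min_available = {"web": 2}  # 'at least 2 of 3 replicas must always be available'

evict, blocked = plan_evictions(pods_on_node, healthy, min_available)
print(evict)    # ['web-0', 'cache-0']
print(blocked)  # ['web-1']
```

The unprotected `cache` app loses its only replica immediately, while `web` keeps two replicas serving until a replacement for `web-0` is healthy.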
Test Your Understanding

1. What is the primary role of the Kubernetes scheduler?

2. Why should StatefulSets be used instead of Deployments for databases?
