Service discovery enables services in a distributed system to find and communicate with each other without hardcoded addresses. As containers and pods are created and destroyed dynamically, a discovery mechanism maintains an up-to-date registry of available service instances and their locations.
In traditional monolithic architectures, service locations are static: the database is at 10.0.1.5:5432, the cache is at 10.0.1.6:6379. These addresses are hardcoded in configuration files and rarely change. In cloud-native architectures with containers and auto-scaling, service instances are ephemeral -- they are created, destroyed, and rescheduled across hosts continuously. A service that had 3 instances at 10.0.1.{5,6,7} five minutes ago might now have 5 instances at completely different IPs. Service discovery is the mechanism that maintains an accurate, real-time map of service names to instance locations.
There are two fundamental patterns: client-side discovery and server-side discovery. In client-side discovery, each service queries a service registry (Consul, etcd, ZooKeeper) to get the list of available instances for a target service, then selects one using a client-side load-balancing algorithm (round-robin, least-connections, consistent hashing). Netflix Eureka and Ribbon popularized this pattern. The advantage is full control over load balancing; the disadvantage is coupling every service to the registry client library.
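Here is a minimal sketch of the client-side pattern. It uses an in-memory dictionary in place of a real registry. `InMemoryRegistry`, `RoundRobinClient`, and the sample addresses are hypothetical and are not the API of Eureka, Ribbon, or Consul. The point is that both the lookup and the round-robin choice happen inside the calling process.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    host: str
    port: int


class InMemoryRegistry:
    """Hypothetical stand-in for a real registry (Consul, etcd, ZooKeeper)."""

    def __init__(self) -> None:
        self._services: dict[str, list[Instance]] = {}

    def register(self, service: str, instance: Instance) -> None:
        self._services.setdefault(service, []).append(instance)

    def lookup(self, service: str) -> list[Instance]:
        # Return a copy so callers cannot mutate registry state.
        return list(self._services.get(service, []))


class RoundRobinClient:
    """Client-side discovery: ask the registry, then choose an instance locally."""

    def __init__(self, registry: InMemoryRegistry) -> None:
        self._registry = registry
        self._counters: dict[str, itertools.count] = {}

    def pick(self, service: str) -> Instance:
        # A production client would typically cache this list between lookups.
        instances = self._registry.lookup(service)
        if not instances:
            raise LookupError(f"no instances registered for {service!r}")
        counter = self._counters.setdefault(service, itertools.count())
        # Modulo the *current* list length, so instances that join or leave
        # are reflected on the next call.
        return instances[next(counter) % len(instances)]


registry = InMemoryRegistry()
for ip in ("10.0.1.5", "10.0.1.6", "10.0.1.7"):
    registry.register("orders", Instance(ip, 8080))

client = RoundRobinClient(registry)
print([client.pick("orders").host for _ in range(4)])
# ['10.0.1.5', '10.0.1.6', '10.0.1.7', '10.0.1.5']
```

Swapping round-robin for least-connections or consistent hashing would only change `pick`. That flexibility is the "full control" advantage, and it comes at the cost of shipping this logic in every caller.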
In server-side discovery, the calling service sends requests to a stable endpoint (a load balancer, reverse proxy, or Kubernetes Service), which routes them to healthy backend instances. The caller does not need to know about the registry or the load-balancing logic. Kubernetes Services (a ClusterIP backed by kube-proxy iptables/IPVS rules), AWS ALB target groups, and service meshes such as Istio (Envoy-based) and Linkerd (which uses its own Rust-based proxy) implement server-side discovery. This is simpler for service developers, but it adds an extra routing step and centralizes the routing logic. The extra step is a network hop through a load balancer, or a local proxy or in-kernel NAT step in the case of sidecars and kube-proxy.
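The routing decision a server-side load balancer makes can be sketched in a few lines. This sketch assumes a least-connections policy over a single service's backend pool. `LeastConnectionsProxy` and `Backend` are hypothetical names, and health status is set by hand here rather than by real probes. Callers never see this logic; they only talk to the proxy's stable address.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    address: str
    healthy: bool = True  # in a real proxy, driven by health checks
    active: int = 0       # requests currently in flight


class LeastConnectionsProxy:
    """Sketch of the routing decision inside a server-side load balancer."""

    def __init__(self, backends: list[Backend]) -> None:
        self.backends = backends

    def acquire(self) -> Backend:
        candidates = [b for b in self.backends if b.healthy]
        if not candidates:
            raise RuntimeError("no healthy backends")
        # Fewest in-flight requests wins; min() keeps the first on ties.
        chosen = min(candidates, key=lambda b: b.active)
        chosen.active += 1
        return chosen

    def release(self, backend: Backend) -> None:
        backend.active -= 1


proxy = LeastConnectionsProxy([
    Backend("10.0.2.1:8080"),
    Backend("10.0.2.2:8080"),
    Backend("10.0.2.3:8080", healthy=False),  # never selected
])
first = proxy.acquire()   # 10.0.2.1 (tie broken by list order)
second = proxy.acquire()  # 10.0.2.2
proxy.release(first)
third = proxy.acquire()   # 10.0.2.1 again: 0 in flight vs. 1
print(first.address, second.address, third.address)
```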
DNS-based discovery is a universal approach that works with both patterns. Kubernetes provides automatic DNS entries (my-service.my-namespace.svc.cluster.local) that resolve to the Service's ClusterIP. Consul provides DNS interfaces where my-service.service.consul resolves to healthy instance IPs. The challenge with DNS is TTL caching: clients, OS resolvers, and intermediary caches may serve stale records for seconds to minutes after an instance dies, causing connection failures. Low TTLs (5-10 seconds) mitigate this but increase DNS query load.
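To make the TTL problem concrete, here is a sketch of a caching resolver with an injectable clock. The upstream lookup function, the fake clock, and the addresses are assumptions for illustration. Real OS resolvers and language runtimes each apply their own caching rules, and some do not honor a record's TTL exactly.

```python
import time
from typing import Callable


class TtlCachingResolver:
    """Sketch of a caching resolver that trusts a record until its TTL expires."""

    def __init__(
        self,
        upstream: Callable[[str], list[str]],  # hypothetical authoritative lookup
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._upstream = upstream
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: dict[str, tuple[float, list[str]]] = {}

    def resolve(self, name: str) -> list[str]:
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now < cached[0]:
            return cached[1]  # may be stale: upstream is not consulted
        records = self._upstream(name)
        self._cache[name] = (now + self._ttl, records)
        return records


# Simulated registry contents and a fake clock make the staleness window visible.
records = {"orders.service.consul": ["10.0.1.5", "10.0.1.6"]}
now = [0.0]
resolver = TtlCachingResolver(lambda n: list(records[n]), ttl_seconds=10, clock=lambda: now[0])

print(resolver.resolve("orders.service.consul"))  # ['10.0.1.5', '10.0.1.6']
records["orders.service.consul"] = ["10.0.1.6", "10.0.1.9"]  # .5 died, .9 joined
now[0] = 8.0
print(resolver.resolve("orders.service.consul"))  # still ['10.0.1.5', '10.0.1.6']
now[0] = 10.0
print(resolver.resolve("orders.service.consul"))  # ['10.0.1.6', '10.0.1.9']
```

For the 8 seconds between the instance dying and the cache entry expiring, a caller can still be handed 10.0.1.5. Lowering `ttl_seconds` shrinks that window but sends more queries upstream.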
Phone Directory Analogy
Service discovery works like a company phone directory. In the old world (monoliths), everyone had a fixed desk phone with a number that never changed -- you memorized it or wrote it on a sticky note (hardcoded config). In the modern world (microservices), employees are mobile -- they work from different desks, offices, or remotely every day. A phone directory service (the registry) tracks everyone's current number and location. Client-side discovery is like looking up a colleague in the directory app and calling them directly. Server-side discovery is like calling the company switchboard (load balancer), which connects you to the right person. Both need the directory to be accurate -- a stale entry means your call goes to an empty desk (dead instance).
Netflix
Netflix built Eureka, an open-source service discovery system, for their 1,000+ microservices on AWS. Each service registers with Eureka on startup and sends heartbeats every 30 seconds. Clients use the Ribbon library for client-side load balancing with Eureka as the registry. Eureka's AP design (availability over consistency) ensures services can still discover each other during network partitions, at the cost of potentially stale registrations.
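The heartbeat-and-lease mechanism can be sketched as follows. This is not Eureka's API. `LeaseRegistry` is a hypothetical class, and the 90-second lease mirrors Eureka's default lease expiration (clients renew every 30 seconds by default). The example shows the AP trade-off in miniature: a crashed instance stays discoverable until its lease lapses and an eviction pass runs.

```python
import time
from typing import Callable


class LeaseRegistry:
    """Heartbeat-based registry in the spirit of Eureka (not Eureka's API)."""

    def __init__(self, lease_seconds: float = 90.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lease = lease_seconds
        self._clock = clock
        # (service, instance_id) -> time after which the lease has lapsed
        self._deadlines: dict[tuple[str, str], float] = {}

    def register(self, service: str, instance_id: str) -> None:
        self._deadlines[(service, instance_id)] = self._clock() + self._lease

    def renew(self, service: str, instance_id: str) -> bool:
        """Heartbeat. Returning False signals the client to register again."""
        key = (service, instance_id)
        if key not in self._deadlines:
            return False
        self._deadlines[key] = self._clock() + self._lease
        return True

    def evict_expired(self) -> list[str]:
        now = self._clock()
        expired = [k for k, deadline in self._deadlines.items() if deadline <= now]
        for k in expired:
            del self._deadlines[k]
        return [instance_id for _, instance_id in expired]

    def instances(self, service: str) -> list[str]:
        return sorted(i for (s, i) in self._deadlines if s == service)


now = [0.0]
reg = LeaseRegistry(lease_seconds=90, clock=lambda: now[0])
reg.register("orders", "i-1")
reg.register("orders", "i-2")
for ts in (30.0, 60.0):  # i-1 heartbeats every 30 s; i-2 has crashed
    now[0] = ts
    reg.renew("orders", "i-1")
now[0] = 90.0
print(reg.evict_expired())      # ['i-2']
print(reg.instances("orders"))  # ['i-1']
```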
HashiCorp / Consul Users
HashiCorp Consul is used by organizations like Stripe, Criteo, and Ticketmaster for multi-datacenter service discovery. Consul combines a service registry with health checking (TCP, HTTP, gRPC, script-based), DNS and HTTP discovery interfaces, and a key-value store for configuration. Its gossip protocol (built on Serf) detects node failures within seconds. Raft-based consensus keeps the registry consistent among the server nodes of each datacenter. Datacenters are federated rather than sharing one Raft log: each runs its own server cluster, and queries aimed at another datacenter are forwarded to that datacenter's servers.
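Here is a sketch of self-registration against a local Consul agent, using only the Python standard library. The endpoints `PUT /v1/agent/service/register` and `GET /v1/health/service/<name>` belong to Consul's HTTP API. The agent address, the `/health` probe path, and the check intervals are assumptions, and the helper function names are hypothetical. Only the pure helper runs at the bottom, so the example works without a Consul agent.

```python
import json
import urllib.parse
import urllib.request

CONSUL_ADDR = "http://127.0.0.1:8500"  # assumption: local agent on the default HTTP port


def registration_payload(name: str, instance_id: str, address: str, port: int) -> dict:
    """JSON body for the agent endpoint PUT /v1/agent/service/register."""
    return {
        "ID": instance_id,
        "Name": name,
        "Address": address,
        "Port": port,
        "Check": {
            # The /health path is hypothetical; point it at the service's real probe.
            "HTTP": f"http://{address}:{port}/health",
            "Interval": "10s",
            "Timeout": "2s",
            "DeregisterCriticalServiceAfter": "1m",
        },
    }


def register(payload: dict) -> None:
    req = urllib.request.Request(
        f"{CONSUL_ADDR}/v1/agent/service/register",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req):
        pass


def healthy_addresses(entries: list[dict]) -> list[str]:
    """Convert /v1/health/service/<name> entries into host:port strings."""
    result = []
    for entry in entries:
        svc = entry["Service"]
        # An empty Service.Address means "use the node's address".
        host = svc.get("Address") or entry["Node"]["Address"]
        result.append(f"{host}:{svc['Port']}")
    return result


def discover(name: str) -> list[str]:
    # passing=true asks Consul to return only instances whose checks pass.
    url = f"{CONSUL_ADDR}/v1/health/service/{urllib.parse.quote(name)}?passing=true"
    with urllib.request.urlopen(url) as resp:
        return healthy_addresses(json.load(resp))


sample = [
    {"Node": {"Address": "10.0.3.1"}, "Service": {"Address": "", "Port": 8080}},
    {"Node": {"Address": "10.0.3.2"}, "Service": {"Address": "10.0.3.20", "Port": 9090}},
]
print(healthy_addresses(sample))  # ['10.0.3.1:8080', '10.0.3.20:9090']
```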
Uber
Uber built Hyperbahn, a custom service discovery and routing layer using TChannel (their RPC protocol). Each service registers with Hyperbahn on startup, and all inter-service calls route through the Hyperbahn mesh, which handles discovery, load balancing, and circuit breaking. This server-side discovery model means individual services have zero knowledge of the service topology, simplifying service development at the cost of additional network hops.
| Aspect | Description |
|---|---|
| Client-Side vs. Server-Side Discovery | Client-side discovery (Eureka, Consul SDK) gives services full control over load balancing, routing, and failover. However, every service must include the discovery client library, which means maintaining an SDK for each language and coupling services to the registry. Server-side discovery (K8s Services, ALB, Envoy) is language-agnostic and simpler for services, but it adds a network hop, centralizes routing logic, and makes custom load balancing harder. |
| Consistency vs. Availability in Registry | A CP registry (ZooKeeper, etcd, Consul's servers) provides consistent views but may become unavailable during partitions: nodes cut off from the quorum cannot accept writes, so new instances cannot register. An AP registry (Eureka) stays available during partitions but may serve stale entries (an instance that has deregistered or died can still be returned). The usual argument for AP, and Eureka's choice, is that serving slightly stale data is better than returning no data at all. CP-backed systems often soften the trade-off with client-side caches or relaxed reads; Consul, for example, offers a "stale" consistency mode for reads. |
| DNS vs. API-Based Discovery | DNS is universal (every language and framework supports it) but limited. Records carry little beyond addresses (SRV records add only port, priority, and weight), TTL caching delays propagation, and load balancing usually amounts to the resolver rotating record order (round-robin). API-based discovery (Consul HTTP API, Eureka REST) provides rich metadata (health status, zone, version, weight) but requires a client library. |
| Self-Registration vs. Third-Party Registration | Self-registration (the service registers itself on startup) is simple, but the service must know the registry address and handle re-registration after restarts and deregistration on shutdown. Third-party registration (a separate registrar watches for new containers and registers them) decouples services from the registry but adds a component that must be monitored and maintained. |
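
Here is a sketch of third-party registration, assuming the registrar can list running containers. The `list_running` callable and the plain `set` standing in for a registry are hypothetical. On every pass, the registrar compares desired state (what is running) with recorded state (what is registered). This also repairs entries that a missed event would otherwise leave stale. Kubernetes' EndpointSlice controller plays this third-party role for Services.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Endpoint:
    service: str
    address: str


def reconcile(running: set[Endpoint], registered: set[Endpoint]) -> tuple[set[Endpoint], set[Endpoint]]:
    """Return (to_register, to_deregister) that make the registry match reality."""
    return running - registered, registered - running


class Registrar:
    """Third-party registration: this process, not the services, writes the registry."""

    def __init__(self, list_running: Callable[[], Iterable[Endpoint]],
                 registry: set[Endpoint]) -> None:
        self._list_running = list_running  # hypothetical container-runtime query
        self._registry = registry          # hypothetical stand-in for a registry client

    def sync_once(self) -> tuple[set[Endpoint], set[Endpoint]]:
        to_add, to_remove = reconcile(set(self._list_running()), set(self._registry))
        self._registry.update(to_add)
        self._registry.difference_update(to_remove)
        return to_add, to_remove


registry = {Endpoint("orders", "10.0.1.5:8080"), Endpoint("orders", "10.0.1.6:8080")}
running = {Endpoint("orders", "10.0.1.6:8080"), Endpoint("orders", "10.0.1.9:8080")}
added, removed = Registrar(lambda: running, registry).sync_once()
print(added)    # {Endpoint(service='orders', address='10.0.1.9:8080')}
print(removed)  # {Endpoint(service='orders', address='10.0.1.5:8080')}
```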
Airbnb's Migration from DNS to Envoy-Based Service Discovery
Scenario
Airbnb initially used DNS-based service discovery with Route53 for their microservices on AWS. As the number of services grew to 1,000+, DNS TTL caching caused recurring issues: when a service instance was replaced (auto-scaling, deployment), clients held stale DNS records for up to 60 seconds, sending requests to terminated instances. This resulted in ~0.5% error rate during deployments and scaling events.
Solution
Airbnb migrated to an Envoy-based service mesh with a custom control plane. Each service runs an Envoy sidecar that receives real-time endpoint updates from the control plane via the xDS APIs, which push updates over a stream instead of relying on DNS lookups. Health checking runs at the Envoy level: unhealthy instances are removed from the load-balancing pool within 5 seconds. The control plane integrates with their Kubernetes clusters, EC2 auto-scaling groups, and legacy services, providing a unified discovery layer.
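Envoy's xDS APIs (EDS is the endpoint portion) stream DiscoveryResponse messages over gRPC, and the sidecar acknowledges each one using the version_info and nonce fields. The toy below keeps only the push-and-version idea. `ControlPlane`, `Sidecar`, and `Snapshot` are hypothetical names, and there is no networking. A sidecar simply ignores any snapshot older than the one it already holds.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Snapshot:
    version: int
    endpoints: dict[str, list[str]]  # cluster name -> endpoint addresses


class ControlPlane:
    """Toy push-based control plane (hypothetical; no gRPC, no real xDS messages)."""

    def __init__(self) -> None:
        self._version = 0
        self._endpoints: dict[str, list[str]] = {}
        self._subscribers: list[Callable[[Snapshot], None]] = []

    def subscribe(self, on_update: Callable[[Snapshot], None]) -> None:
        self._subscribers.append(on_update)
        on_update(self._snapshot())  # new subscribers get the current state at once

    def set_endpoints(self, cluster: str, addresses: list[str]) -> None:
        """Called when health checks or schedulers change a cluster's members."""
        self._endpoints[cluster] = list(addresses)
        self._version += 1
        snapshot = self._snapshot()
        for push in self._subscribers:
            push(snapshot)  # pushed immediately; no polling, no TTL to wait out

    def _snapshot(self) -> Snapshot:
        return Snapshot(self._version, {c: list(a) for c, a in self._endpoints.items()})


class Sidecar:
    """Applies snapshots, ignoring any older than the one it already holds."""

    def __init__(self) -> None:
        self.current = Snapshot(-1, {})

    def on_update(self, snapshot: Snapshot) -> None:
        if snapshot.version > self.current.version:
            self.current = snapshot


control_plane = ControlPlane()
sidecar = Sidecar()
control_plane.subscribe(sidecar.on_update)
control_plane.set_endpoints("orders", ["10.0.1.5:8080", "10.0.1.6:8080"])
control_plane.set_endpoints("orders", ["10.0.1.6:8080"])  # .5 failed its health check
print(sidecar.current)
# Snapshot(version=2, endpoints={'orders': ['10.0.1.6:8080']})
```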
Outcome
Deployment-related error rates dropped from ~0.5% to near zero. Endpoint propagation time decreased from 60 seconds (DNS TTL) to under 5 seconds (real-time xDS push). The Envoy sidecar also enabled traffic shifting for canary deployments (1% -> 5% -> 25% -> 100%), circuit breaking for failing dependencies, and automatic retries with exponential backoff -- capabilities that previously required application-level code in each service.
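Envoy configures traffic shifting declaratively, for example with weighted clusters in its route configuration. The sketch below is not Envoy's code; it only illustrates the arithmetic of a percentage split. The sticky, hash-based bucketing and the function names are our own assumptions. They were chosen so that a given request key (such as a user ID) stays on the same version as the canary share ramps from 1% to 100%.

```python
import hashlib


def bucket(request_key: str) -> int:
    """Map a key (for example a user ID) to a stable bucket in [0, 10000)."""
    digest = hashlib.sha256(request_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def route(request_key: str, canary_percent: float) -> str:
    # 10,000 buckets give 0.01% granularity, fine enough for a 1% first step.
    threshold = round(canary_percent * 100)
    return "canary" if bucket(request_key) < threshold else "stable"


for pct in (1, 5, 25, 100):
    share = sum(route(f"user-{i}", pct) == "canary" for i in range(10_000)) / 10_000
    print(f"{pct:>3}% configured -> {share:.2%} observed")
```

Because the threshold only grows, raising the percentage never moves a key that is already on the canary back to the stable version.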