
Zero Trust Architecture

Zero Trust eliminates the concept of a trusted network perimeter. Every request is authenticated, authorized, and encrypted regardless of its origin -- whether from the public internet, the corporate network, or between internal microservices.

Overview

Traditional network security follows the 'castle and moat' model: everything inside the corporate network is trusted, and the perimeter (firewalls, VPN) keeps attackers out. This model fails in modern architectures for three reasons: cloud services exist outside the perimeter, remote work means users are outside the perimeter, and a single compromised service inside the perimeter can move laterally to any other service.

Zero Trust, formalized by John Kindervag at Forrester in 2010 and implemented at scale by Google as BeyondCorp, replaces perimeter trust with per-request verification. The core principle is: network location grants zero trust. A request from a pod on the same Kubernetes cluster is treated with the same skepticism as a request from the public internet. Every request must carry proof of identity, every identity's permissions are checked against policy, and all communication is encrypted.

In a microservice architecture, Zero Trust is implemented through several complementary mechanisms. Mutual TLS (mTLS) provides transport-level authentication: each service has a TLS certificate that proves its identity. When service A calls service B, both present their certificates, establishing mutual authentication. Service meshes (Istio, Linkerd) automate mTLS by injecting sidecar proxies that handle certificate issuance, rotation, and mutual verification transparently.
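
As a rough illustration of the transport-level piece, the sketch below builds a server-side TLS context with Python's standard `ssl` module that rejects any caller that does not present a certificate. The function name and the three file-path parameters are hypothetical placeholders; in a service mesh, the sidecar proxy would perform the equivalent configuration on the service's behalf.

```python
import ssl


def build_mtls_server_context(cert_file=None, key_file=None, ca_file=None):
    """Return a server-side TLS context that requires a client certificate.

    The paths are hypothetical placeholders for material issued by an
    internal CA. They are optional only so the sketch can be constructed
    without real key files.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # CERT_REQUIRED makes the handshake fail unless the client presents a
    # certificate that chains to a trusted CA. This is the "mutual" part.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file:
        # The server's own identity, presented to the caller.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if ca_file:
        # The CA that signs peer service certificates.
        ctx.load_verify_locations(cafile=ca_file)
    return ctx
```

A server would pass this context to `ctx.wrap_socket(sock, server_side=True)`. The calling service would use a client context with its own certificate loaded, so both sides prove their identity during the handshake.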

At the application level, identity-aware proxies (Google IAP, Cloudflare Access, Pomerium) sit at the edge and authenticate every user request against an identity provider. Instead of VPN access granting blanket network access, the proxy grants per-application, per-user access based on identity, device posture, and context (IP, time, MFA status). This eliminates VPN as a trust boundary.
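
The per-application, per-user decision described above can be sketched as a small policy function. This is a minimal illustration, not how any particular proxy is implemented: the `AccessRequest` fields, the `APP_POLICY` table, and the three decision strings are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    user: str
    groups: frozenset
    app: str
    device_managed: bool
    disk_encrypted: bool
    mfa_passed: bool


# Hypothetical per-application policy: which groups may reach the app and
# whether the app is sensitive enough to demand MFA.
APP_POLICY = {
    "wiki": {"groups": {"employees"}, "require_mfa": False},
    "payroll": {"groups": {"finance"}, "require_mfa": True},
}


def decide(req: AccessRequest) -> str:
    """Return "allow", "step_up", or "deny" for a single request."""
    policy = APP_POLICY.get(req.app)
    if policy is None:
        return "deny"  # default deny: unknown applications are unreachable
    if not (req.groups & policy["groups"]):
        return "deny"  # identity check: user is not in an allowed group
    if not (req.device_managed and req.disk_encrypted):
        return "deny"  # device posture check
    if policy["require_mfa"] and not req.mfa_passed:
        return "step_up"  # context check: ask for MFA before granting access
    return "allow"
```

Note that network location never appears in the decision: a finance user on an unmanaged laptop is denied whether they are in the office or not.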

Microsegmentation divides the network into isolated zones with explicit policies governing communication between zones. Instead of a flat network where any pod can reach any other pod, network policies (Kubernetes NetworkPolicy, AWS security groups) restrict which services can communicate. If a service is compromised, the attacker's lateral movement is limited to the connections that policy explicitly allows for that service; every other connection attempt is rejected.
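
To make the effect on lateral movement concrete, the sketch below models segmentation as an allowlist of (source, destination) pairs and computes which services an attacker could reach from a compromised starting point. The service names and edges are hypothetical, and the reachability figure is a worst case: it assumes every service along an allowed path can itself be compromised.

```python
from collections import deque

# Hypothetical services and allowlist of (source, destination) pairs.
SERVICES = {"frontend", "orders", "payments", "inventory", "analytics"}
ALLOWED_EDGES = {
    ("frontend", "orders"),
    ("orders", "payments"),
    ("orders", "inventory"),
}


def can_connect(src, dst, allowed=ALLOWED_EDGES):
    """Default deny: a connection is permitted only if it is listed."""
    return (src, dst) in allowed


def blast_radius(compromised, services=SERVICES, allowed=None):
    """Services reachable from `compromised` by following permitted edges.

    allowed=None models a flat network in which every pair may connect.
    """
    seen = {compromised}
    queue = deque([compromised])
    while queue:
        src = queue.popleft()
        for dst in services:
            permitted = True if allowed is None else (src, dst) in allowed
            if permitted and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen - {compromised}
```

With these example edges, `blast_radius("inventory")` on a flat network returns all four other services, while `blast_radius("inventory", allowed=ALLOWED_EDGES)` returns an empty set because `inventory` has no outbound edges.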

The 'assume breach' principle means designing every component as if adjacent systems are already compromised. Encrypt data at rest and in transit. Log and audit every access. Use short-lived credentials. Implement anomaly detection that flags unusual access patterns. These practices limit the blast radius of any single compromise.
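
Short-lived credentials are one of these practices that is easy to show in code. The sketch below issues and verifies an HMAC-signed token with an expiry time, using only the standard library. The token format, the `SECRET` value, and the function names are assumptions made for illustration; a real deployment would more likely use an established format such as JWT or X.509 certificates, with keys held in a secrets manager.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-only-secret"  # hypothetical; never hard-code a real key


def issue_token(subject, ttl_seconds, now=None):
    """Return "<payload>.<signature>" that expires ttl_seconds from now."""
    now = time.time() if now is None else now
    payload = json.dumps({"sub": subject, "exp": now + ttl_seconds}).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode() + "."
            + base64.urlsafe_b64encode(sig).decode())


def verify_token(token, now=None):
    """Return the subject if the token is authentic and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:
        return None  # malformed token
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return None  # payload was altered or signed with another key
    claims = json.loads(payload)
    if now >= claims["exp"]:
        return None  # expired: a stolen token stops working on its own
    return claims["sub"]
```

The shorter the TTL, the less time a stolen credential remains useful, at the cost of more frequent re-issuance.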

Key Points
  1. Network location grants zero trust. A request from inside the VPN is not more trustworthy than one from the internet. Every request is authenticated, authorized, and encrypted regardless of origin.
  2. mTLS provides service-to-service authentication. Each service proves its identity via TLS certificate. Service meshes (Istio, Linkerd) automate certificate lifecycle and mutual verification.
  3. Identity-aware proxies replace VPNs. Instead of 'on the network = trusted', each application has its own access policy based on user identity, device health, and context. Users access only the applications they need.
  4. Microsegmentation limits lateral movement. Kubernetes NetworkPolicies, AWS security groups, and service mesh policies restrict which services can communicate, containing breaches to a small blast radius.
  5. Least privilege per request. Access is scoped to the minimum needed for each specific action, not granted as a broad role at login time. Policies evaluate user, resource, action, and context for every request.
  6. Assume breach as a design principle. Encrypt all data at rest and in transit. Use short-lived credentials. Log all access. Implement anomaly detection. Design as if every adjacent system could be compromised.
Simple Example

Airport Security vs. Castle and Moat

The castle-and-moat model is like a gated community: once you pass the guard at the gate (VPN), you can walk into any house (service). Zero Trust is like airport security: every person (request) is checked at every checkpoint (service boundary). You show your boarding pass (identity) and go through screening (authorization) to access the gate area (application), even if you are already inside the terminal (network). If you try to enter the pilots-only lounge (admin service), you need additional credentials regardless of your location.

Real-World Examples

Google (BeyondCorp)

Google's BeyondCorp is the original enterprise-scale Zero Trust implementation. Google eliminated its corporate VPN entirely. All employee access to internal applications goes through an identity-aware proxy that checks: (1) user identity via SSO, (2) device certificate and health (patched, encrypted disk, managed by IT), (3) per-application access policy. A Google engineer on a personal laptop at a coffee shop and one on a corp laptop in the office go through the same verification.

Netflix

Netflix uses a Zero Trust approach for its 1,000+ microservices. All service-to-service communication uses mTLS via their custom service mesh. Services authenticate via short-lived certificates issued by an internal CA (using SPIFFE/SPIRE for identity). Authorization policies are enforced at the sidecar level, and every service call is logged for audit. Network-level access is restricted by security groups as defense-in-depth.

Microsoft (Azure AD Conditional Access)

Microsoft's enterprise Zero Trust solution uses Conditional Access in Azure AD (since renamed Microsoft Entra ID) to evaluate user identity, device compliance, location, app sensitivity, and real-time risk signals for every authentication. High-risk sign-ins (for example, a new device combined with an unusual location) require step-up MFA. Continuous Access Evaluation (CAE) lets participating services revoke sessions in near-real-time when risk signals change, instead of waiting for the access token to expire.

Trade-Offs
| Aspect | Description |
| --- | --- |
| Security Posture vs. Complexity | Zero Trust adds authentication, authorization, and encryption at every boundary, increasing system complexity and operational overhead. Teams need expertise in mTLS, service meshes, policy engines, and certificate management. Simpler architectures may not need full Zero Trust. |
| Latency vs. Continuous Verification | Per-request policy evaluation adds latency (1-5 ms per hop). Caching policy decisions improves performance but creates a window where revoked access is still honored. The trade-off between freshness and speed must be tuned per service. |
| User Experience vs. Verification Depth | Continuous verification (re-check identity, device, context on every action) is the most secure but creates friction. Step-up authentication (prompt for MFA only on sensitive actions) balances security with usability. |
| mTLS Everywhere vs. Operational Overhead | mTLS for all service communication is the ideal but requires certificate management at scale (issuance, rotation, revocation for every pod). Service meshes automate this but add sidecar resource overhead (10-50 MB per pod) and operational complexity. |
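
The latency trade-off above can be seen in a few lines. The sketch below wraps a policy lookup in a time-to-live cache: repeated requests skip the lookup, but a revoked grant keeps being honored until its cached entry expires. The class name, the `check` callback, and the injectable `clock` are assumptions made for the example.

```python
import time


class CachedPolicy:
    """Cache allow/deny decisions for ttl_seconds to avoid a lookup per request."""

    def __init__(self, check, ttl_seconds, clock=time.monotonic):
        self._check = check      # the authoritative (slower) policy lookup
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the behavior can be tested
        self._cache = {}         # (principal, resource) -> (decision, expires_at)

    def is_allowed(self, principal, resource):
        key = (principal, resource)
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now < hit[1]:
            return hit[0]        # fast path; may be stale after a revocation
        decision = self._check(principal, resource)
        self._cache[key] = (decision, now + self._ttl)
        return decision
```

With a 30-second TTL, access revoked at second 5 is still granted until second 30. Shortening the TTL narrows that window and raises load on the policy engine, which is why the value is usually tuned per service.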
Case Study

Google Eliminates the Corporate VPN with BeyondCorp

Scenario

In the early 2010s, Google's engineers used a corporate VPN to access internal tools. The VPN was a bottleneck: slow during peak hours, unreliable on poor networks, and a single point of failure. More critically, once inside the VPN, a compromised device could access any internal application. The 2009 Aurora attack (attributed to Chinese state actors) demonstrated that perimeter security was insufficient -- attackers who breached the perimeter could move laterally across Google's network.

Solution

Google built BeyondCorp: an identity-aware proxy layer that authenticates every request to internal applications. Users connect directly to applications over the internet (no VPN). The proxy checks: user identity (SSO), device identity (certificate), device health (OS version, disk encryption, endpoint protection), and application-specific access policy. All traffic is encrypted. The VPN was decommissioned.

Outcome

Google's 100,000+ employees access internal applications without a VPN, from any device, on any network. A compromised device cannot access applications it is not authorized for, even on the corporate network. The approach was published as a series of papers beginning in 2014 and has become the blueprint for enterprise Zero Trust. The BeyondCorp model influenced products like Google Cloud IAP, Cloudflare Access, and Zscaler Private Access.

Common Mistakes
  • ⚠ Treating Zero Trust as a product rather than an architecture. Zero Trust is a set of principles (verify identity, enforce least privilege, assume breach), not a single product to purchase. A vendor selling a 'Zero Trust appliance' is marketing, not architecture. Implementation requires changes to identity, network, policy, and observability layers.
  • ⚠ Keeping VPN as the 'backup' path. Maintaining a VPN alongside Zero Trust means maintaining two security models. Users and attackers will find the weaker path. Commit fully to identity-based access or the perimeter model; hybrid approaches have the complexity of both and the security of neither.
  • ⚠ Only implementing mTLS without authorization. mTLS proves identity (this is the inventory-service) but does not check permissions (can the inventory-service call the payment-service?). mTLS without policy enforcement is authentication without authorization -- a common gap.
  • ⚠ Not considering device posture. Authenticating a user is necessary but not sufficient. A valid user on a compromised device (malware, unpatched OS, no disk encryption) should receive restricted access. Continuous device health evaluation is a core Zero Trust principle.
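
The gap between authentication and authorization in the mTLS item above can be sketched as follows. Assume the TLS layer has already verified the peer certificate and extracted a SPIFFE-style identity URI from it; the code then asks the separate question of whether that caller may reach the target. The trust domain, the `/svc/<name>` path layout, and the `ALLOWED_CALLERS` table are hypothetical choices made for this example.

```python
from urllib.parse import urlparse

TRUST_DOMAIN = "example.org"  # hypothetical trust domain

# Hypothetical policy: target service -> services allowed to call it.
ALLOWED_CALLERS = {
    "payment-service": {"order-service"},
    "inventory-service": {"order-service", "frontend"},
}


def caller_name(spiffe_id):
    """Extract <name> from spiffe://example.org/svc/<name>, else None."""
    parsed = urlparse(spiffe_id)
    if parsed.scheme != "spiffe" or parsed.netloc != TRUST_DOMAIN:
        return None  # wrong scheme or an identity from another trust domain
    parts = parsed.path.strip("/").split("/")
    if len(parts) != 2 or parts[0] != "svc" or not parts[1]:
        return None  # path does not follow the assumed layout
    return parts[1]


def authorize(spiffe_id, target):
    """Authentication says who is calling; this decides whether they may."""
    name = caller_name(spiffe_id)
    return name is not None and name in ALLOWED_CALLERS.get(target, set())
```

Here `authorize("spiffe://example.org/svc/inventory-service", "payment-service")` returns `False`: the inventory-service has a perfectly valid identity, but no policy permits it to call the payment-service.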
Test Your Understanding

1. What is the core principle of Zero Trust architecture?

2. Why is mTLS alone insufficient for Zero Trust between microservices?
