GitOps

GitOps uses Git as the single source of truth for declarative infrastructure and application configuration. Changes are made via pull requests, and automated agents continuously reconcile cluster state to match the desired state defined in the Git repository.

Overview

GitOps, coined by Weaveworks in 2017, is an operational framework that applies the principles of Git-based collaboration to infrastructure and application delivery. The core idea is simple but powerful: the desired state of your entire system -- Kubernetes manifests, Helm charts, Kustomize overlays, Terraform configurations -- lives in a Git repository. An automated agent running inside the cluster (Argo CD, Flux) watches the repository and continuously reconciles the actual cluster state to match what is declared in Git. If the repository says 'Deployment X should have 3 replicas running image v2.1.0,' the agent ensures exactly that.
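
The reconciliation idea above can be illustrated with a minimal sketch. It is not how Argo CD or Flux are implemented; the `State` model, the `plan_reconcile` and `apply_plan` functions, and the resource names are all hypothetical, and the cluster is reduced to a plain dictionary so the example stays self-contained.

```python
from typing import Dict, List, Tuple

# Hypothetical model: a resource name mapped to its spec (replicas, image).
State = Dict[str, Dict[str, object]]


def plan_reconcile(desired: State, actual: State) -> List[Tuple[str, str]]:
    """Return the (action, resource) steps that make `actual` match `desired`."""
    steps: List[Tuple[str, str]] = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            steps.append(("create", name))
        elif actual[name] != spec:
            steps.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            steps.append(("delete", name))  # not declared in Git, so remove it
    return steps


def apply_plan(desired: State, actual: State) -> State:
    """Apply the plan to a copy of `actual` and return the converged state."""
    converged = {name: dict(spec) for name, spec in actual.items()}
    for action, name in plan_reconcile(desired, actual):
        if action == "delete":
            del converged[name]
        else:
            converged[name] = dict(desired[name])
    return converged


# Desired state as declared in Git (illustrative values).
desired = {"deployment-x": {"replicas": 3, "image": "app:v2.1.0"}}

# Actual state observed in the cluster.
actual = {
    "deployment-x": {"replicas": 5, "image": "app:v2.1.0"},  # changed by hand
    "debug-pod": {"replicas": 1, "image": "busybox"},        # never committed
}

print(plan_reconcile(desired, actual))
print(apply_plan(desired, actual) == desired)
```

Running the plan against an already converged cluster yields an empty list, which is what makes the loop safe to repeat on every interval. In real tools, deleting undeclared resources is usually a configurable option rather than unconditional behavior.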

The key distinction between GitOps and traditional CI/CD is the push vs. pull model. In traditional CI/CD, the pipeline pushes changes to the cluster: after building an image, the CI system runs 'kubectl apply' or 'helm upgrade' against the cluster. This requires the CI system to have cluster credentials and creates a one-way flow -- the pipeline knows what it deployed, but nothing continuously ensures the cluster matches the intent. In GitOps, the agent inside the cluster pulls changes from Git. The CI pipeline's only job is to build and push the container image, then update the image tag in the Git repository. The GitOps agent detects the change and reconciles.

This pull-based model provides several critical advantages. First, security: cluster credentials never leave the cluster. The GitOps agent has read access to Git and write access to the cluster, but CI pipelines and developers never directly access production clusters. Second, auditability: every change to the cluster is a Git commit with an author, timestamp, reviewer (from the PR), and diff. Third, self-healing: if someone manually modifies the cluster (kubectl edit, console change), the agent detects the drift and reverts it, continuously enforcing the desired state. Fourth, rollback: reverting a bad deployment is 'git revert' -- the agent reconciles the cluster to the previous state.
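
The rollback point can be made concrete with a toy model. This is a sketch under simplifying assumptions: `ConfigRepo` is a hypothetical in-memory stand-in for a Git repository, a snapshot is just a mapping from service name to image tag, and "reconciling" is reduced to copying the snapshot at HEAD.

```python
from typing import Dict, List

Snapshot = Dict[str, str]  # hypothetical model: service name -> image tag


class ConfigRepo:
    """Toy append-only history standing in for a Git config repository."""

    def __init__(self, initial: Snapshot) -> None:
        self.history: List[Snapshot] = [dict(initial)]

    def commit(self, changes: Snapshot) -> None:
        # A new commit is the previous snapshot plus the changed keys.
        self.history.append({**self.history[-1], **changes})

    def revert_last(self) -> None:
        # Like `git revert` on the latest commit, this adds a NEW commit that
        # restores the prior snapshot, so the bad commit stays in the history.
        if len(self.history) < 2:
            raise ValueError("nothing to revert")
        self.history.append(dict(self.history[-2]))

    def head(self) -> Snapshot:
        return dict(self.history[-1])


repo = ConfigRepo({"checkout": "v2.0.0"})
repo.commit({"checkout": "v2.1.0"})  # the bad release
cluster = repo.head()                # agent reconciles to HEAD

repo.revert_last()                   # rollback is just another commit
cluster = repo.head()                # agent reconciles again

print(cluster, len(repo.history))
```

The history grows to three entries instead of shrinking to one, which is why a revert keeps the audit trail intact: the bad release and its rollback both remain visible.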

GitOps implementations typically separate the application repository (source code, Dockerfiles) from the configuration repository (Kubernetes manifests, Helm values). The CI pipeline builds the application, pushes the image, and opens a PR to the config repository updating the image tag. After PR approval and merge, the GitOps agent deploys the change. This separation enables independent release cycles for application code and infrastructure configuration, and prevents CI pipeline permissions from escalating to cluster access.
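
As a rough illustration of the "update the image tag" step, the sketch below rewrites the tag in a manifest held as text. The manifest is abbreviated, and the registry name, service name, and `bump_image_tag` helper are hypothetical. A real pipeline would more likely use a purpose-built command such as `kustomize edit set image`, then commit the result to a branch and open the pull request.

```python
import re

# Abbreviated, illustrative manifest as it might appear in the config repo.
MANIFEST = """\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout
spec:
  template:
    spec:
      containers:
        - name: checkout
          image: registry.example.com/checkout:v2.0.0
"""


def bump_image_tag(manifest: str, repository: str, new_tag: str) -> str:
    """Replace the tag on every `image:` line that uses `repository`."""
    pattern = re.compile(
        r"^(\s*(?:-\s*)?image:\s*" + re.escape(repository) + r"):\S+$",
        flags=re.MULTILINE,
    )
    # A function replacement avoids backslash escapes in `new_tag` being
    # interpreted as regex group references.
    updated, count = pattern.subn(lambda m: f"{m.group(1)}:{new_tag}", manifest)
    if count == 0:
        raise ValueError(f"no image line found for {repository}")
    return updated


updated = bump_image_tag(MANIFEST, "registry.example.com/checkout", "v2.1.0")
print(updated.splitlines()[-1].strip())
```

Failing loudly when no line matches is deliberate: a silent no-op would produce an empty pull request and a deployment that never happens.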

Key Points
  1. Git is the single source of truth: the desired state of the cluster is fully declared in a Git repository. If it is not in Git, it should not be in the cluster. Manual changes are automatically reverted.
  2. Pull-based reconciliation: the GitOps agent (Argo CD, Flux) runs inside the cluster, pulls desired state from Git, and applies it. This is safer than push-based CI/CD because cluster credentials never leave the cluster.
  3. Continuous drift detection: the agent periodically (every 30s-5min) compares actual cluster state to the Git-declared state. Drift (manual changes, partial failures) is detected and corrected automatically.
  4. Rollback via git revert: reverting a bad deployment is a Git operation, not a Kubernetes operation. The agent reconciles the cluster to the reverted commit's state. No need to remember which kubectl commands to undo.
  5. Separation of concerns: application repo (code + Dockerfile + tests) is separate from config repo (K8s manifests + Helm values). CI builds images; GitOps deploys them. This prevents CI from needing cluster credentials.
  6. Progressive delivery: GitOps tools integrate with Argo Rollouts and Flagger for canary deployments, blue-green switches, and automated rollback based on metrics (error rate, latency) -- all declared in Git.
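
The metric-based rollback mentioned in the last point can be sketched as a simple gate. This is not the Argo Rollouts or Flagger API; `CanaryMetrics`, `Thresholds`, `next_step`, and every threshold value are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanaryMetrics:
    error_rate: float      # fraction of failed requests, 0.0 to 1.0
    p99_latency_ms: float  # 99th percentile latency in milliseconds


@dataclass(frozen=True)
class Thresholds:
    max_error_rate: float = 0.01       # assumed limit, not a tool default
    max_p99_latency_ms: float = 500.0  # assumed limit, not a tool default


def next_step(weight: int, metrics: CanaryMetrics, limits: Thresholds,
              step: int = 20) -> int:
    """Return the next canary traffic weight in percent; 0 means roll back."""
    healthy = (metrics.error_rate <= limits.max_error_rate
               and metrics.p99_latency_ms <= limits.max_p99_latency_ms)
    if not healthy:
        return 0
    return min(100, weight + step)


limits = Thresholds()
print(next_step(20, CanaryMetrics(0.002, 310.0), limits))  # healthy: 40
print(next_step(40, CanaryMetrics(0.050, 310.0), limits))  # errors too high: 0
print(next_step(90, CanaryMetrics(0.000, 100.0), limits))  # capped at 100
```

In a GitOps setup the thresholds and step size would themselves be declared in the repository, so a change to the rollout policy is reviewed like any other change.
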
Simple Example

Spreadsheet Sync Analogy

Imagine a shared spreadsheet (Git) that defines every item in your warehouse (cluster): 'Shelf A: 10 units of Product X, Shelf B: 5 units of Product Y.' A warehouse manager (GitOps agent) checks the spreadsheet every minute and ensures the physical warehouse matches exactly. If someone moves items without updating the spreadsheet (manual kubectl change), the manager puts them back (drift correction). To add a new product, you update the spreadsheet (Git commit), and the manager stocks it automatically. To undo a mistake, you revert the spreadsheet edit (git revert), and the manager adjusts the warehouse. Every change is tracked with who made it and when (git log). No one needs keys to the warehouse (cluster credentials) -- they just need access to update the spreadsheet.

Real-World Examples

Intuit

Intuit (TurboTax, QuickBooks) adopted Argo CD for GitOps delivery across 100+ Kubernetes clusters managing 3,000+ applications. Their platform team built a self-service developer portal where engineers define deployments in Git, and Argo CD reconciles across multiple clusters (dev, staging, production). The GitOps model reduced deployment-related incidents by 50% and provided complete audit trails for SOC 2 compliance -- every production change traces to a reviewed pull request.

Weaveworks and Flux

Weaveworks coined the term GitOps and built Flux, now a CNCF graduated project. Flux runs as a set of Kubernetes controllers that watch Git repositories and reconcile cluster state. Organizations like SAP, D2iQ, and Deutsche Telekom use Flux to manage multi-cluster, multi-tenant environments. Flux's OCI artifact support enables storing Kubernetes manifests alongside container images in OCI registries.

Tesla

Tesla uses GitOps patterns for their manufacturing infrastructure and vehicle software delivery platform. Changes to production configurations go through pull requests with automated policy checks (OPA) and multi-party approval. The GitOps agent ensures every factory Kubernetes cluster runs the exact configuration declared in Git, enabling them to replicate identical software environments across global manufacturing sites within minutes.

Trade-Offs
Aspect | Description
Security (Pull) vs. Simplicity (Push) | Pull-based GitOps (Argo CD, Flux inside the cluster) means CI pipelines never need cluster credentials, reducing attack surface. Push-based CD (CI runs kubectl apply) is simpler to set up but requires storing kubeconfig in CI, creating a credential management and rotation burden. For production, pull-based is strongly preferred.
Strict Reconciliation vs. Operational Flexibility | Strict drift correction (revert all manual changes) ensures consistency but can be frustrating during debugging (an engineer's temporary change is immediately reverted). Most teams allow manual overrides in non-production environments and enforce strict reconciliation only in production.
Monorepo vs. Polyrepo for Config | A single config repository for all services simplifies cross-cutting changes (cluster-wide policy updates) but creates merge conflicts and couples deployment cycles. Per-service config repos enable independent deployments but fragment visibility and make fleet-wide changes harder. ApplicationSets (Argo CD) help manage many applications in a monorepo.
Manifest Generation vs. Static Manifests | Storing plain YAML in Git is simple and auditable. Storing Helm charts or Kustomize overlays is more flexible but means the actual applied YAML is generated at reconciliation time, making diffs harder to review. Rendering manifests in CI and committing the output (the 'rendered manifests' pattern) combines auditability with flexibility.
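
The 'rendered manifests' pattern from the last row can be sketched as follows. Python's `string.Template` stands in for a real renderer here; in practice the CI job would run something like `helm template` or `kustomize build`. The template, the environment values, the output paths, and the `render_all` helper are all hypothetical.

```python
from string import Template

# Stand-in template; a real setup would render a Helm chart or a
# Kustomize overlay instead.
DEPLOYMENT_TEMPLATE = Template("""\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: $name
  namespace: $namespace
spec:
  replicas: $replicas
""")

# Illustrative per-environment values.
ENVIRONMENTS = {
    "staging": {"name": "checkout", "namespace": "staging", "replicas": 1},
    "production": {"name": "checkout", "namespace": "production", "replicas": 3},
}


def render_all(template: Template, environments: dict) -> dict:
    """Render one static manifest per environment, keyed by output path."""
    return {
        f"rendered/{env}/deployment.yaml": template.substitute(values)
        for env, values in sorted(environments.items())
    }


rendered = render_all(DEPLOYMENT_TEMPLATE, ENVIRONMENTS)
print(rendered["rendered/production/deployment.yaml"])
```

Committing the rendered files means a reviewer sees the exact YAML that will be applied, while the template and values remain the place where changes are actually authored.
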
Case Study

Chick-fil-A's Edge GitOps for 3,000+ Restaurant Clusters

Scenario

Chick-fil-A runs Kubernetes clusters in each of their 3,000+ restaurants for edge computing (order processing, kitchen display systems, drive-through optimization). Managing that many individual clusters with traditional push-based CI/CD was operationally impractical. Configuration changes needed to roll out incrementally (not to all restaurants at once), and each cluster needed to recover on its own from intermittent internet connectivity.

Solution

Chick-fil-A adopted a GitOps model using Argo CD in each restaurant cluster. A central Git repository defines the desired state for each restaurant (with restaurant-specific overrides via Kustomize). Each cluster's Argo CD instance pulls its configuration from the repository. Progressive rollout is managed by updating Git for a small set of restaurants first (canary), monitoring metrics, then expanding to all restaurants. Clusters that lose internet connectivity continue running their last-known-good configuration and reconcile when connectivity returns.
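
One way to picture the progressive rollout is as a partition of the fleet into waves. The sketch below is an assumption-laden illustration: the source only says that a small set of restaurants is updated first, so the 1%, 10%, 100% percentages, the cluster names, and the `rollout_waves` helper are all hypothetical.

```python
from typing import List, Sequence


def rollout_waves(clusters: Sequence[str],
                  percents: Sequence[int] = (1, 10, 100)) -> List[List[str]]:
    """Split clusters into waves that cumulatively cover each percentage."""
    waves: List[List[str]] = []
    total = len(clusters)
    done = 0
    for pct in percents:
        # Integer ceiling of total * pct / 100, capped at the fleet size.
        target = min(total, (total * pct + 99) // 100)
        if target > done:
            waves.append(list(clusters[done:target]))
            done = target
    return waves


clusters = [f"restaurant-{i:04d}" for i in range(3000)]
waves = rollout_waves(clusters)
print([len(wave) for wave in waves])
```

Each wave would correspond to one Git change that updates the configuration for that group of restaurants, with metrics checked before the next wave is merged.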

Outcome

The team manages 3,000+ clusters with a platform team of under 20 engineers. Configuration changes propagate to all restaurants within 4 hours (progressive rollout). Clusters self-heal from connectivity disruptions without manual intervention. Every change to every restaurant's configuration is traceable to a Git commit, satisfying audit requirements. Rollback of a bad configuration is a single 'git revert' that propagates automatically.

Common Mistakes
  • Storing cluster credentials in CI pipelines for 'kubectl apply'. This is push-based CD, not GitOps. It requires managing and rotating credentials in CI, and creates a security risk if the CI system is compromised. Use a pull-based agent (Argo CD, Flux) inside the cluster instead.
  • Not separating application code repos from config repos. If the application source code and Kubernetes manifests live in the same repository, every code commit (including non-deployed changes) triggers a reconciliation attempt. Separate repos enable independent release cycles and cleaner audit trails.
  • Disabling drift detection. Some teams disable Argo CD's auto-sync to allow manual changes during debugging, then forget to re-enable it. Production clusters should always have drift detection and correction enabled. Use non-production environments for ad-hoc experimentation.
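  • Committing plaintext secrets to Git. Kubernetes Secrets stored as plain YAML in Git are only base64-encoded, not encrypted, so anyone with read access to the repository can recover the values. Use Sealed Secrets (Bitnami) or SOPS (originally from Mozilla, now a CNCF project) to encrypt secrets before committing them, or use External Secrets Operator to reference values held in external stores (Vault, AWS Secrets Manager).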
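
The last mistake is easy to demonstrate: base64 is an encoding, not encryption, so it can be reversed without any key. The secret value below is made up for illustration.

```python
import base64

# Hypothetical secret value, for illustration only.
plaintext = "s3cr3t-db-password"

# This is the form a value takes in the `data:` field of a Secret manifest.
encoded = base64.b64encode(plaintext.encode("utf-8")).decode("ascii")

# Anyone who can read the repository can reverse it; no key is involved.
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded)
print(decoded == plaintext)
```
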
Test Your Understanding

1. What is the key difference between GitOps and traditional push-based CI/CD?

2. Why should Kubernetes Secrets not be stored as plain YAML in a GitOps repository?
