1What is the primary purpose of model lineage tracking in a model registry?
An ML model registry is a centralized store for versioned model artifacts, metadata, and lifecycle state. It is the 'source of truth' that connects training pipelines to serving infrastructure, enabling reproducibility, auditability, and governance across the model lifecycle from experiment to production to retirement.
A model registry is to ML what a container registry is to software: the canonical store from which production systems pull artifacts. Without a registry, models are ad-hoc files on S3, shared via Slack, with no versioning, no lineage, and no rollback capability. When a production model misbehaves, the questions 'which version is running?', 'what data was it trained on?', and 'what was the previous version?' become impossible to answer.
The core abstraction in a model registry is the registered model, which is a named entity (e.g., 'search-ranking-v2') with multiple versions. Each version is an immutable artifact bundle: the serialized model weights, the preprocessing code or pipeline, a model signature (input/output schema), and metadata including training data snapshot ID, hyperparameters, evaluation metrics, the code commit hash, and the training pipeline run ID. Immutability is critical -- you must never overwrite a version, only create new ones.
Lifecycle management tracks each version through stages: 'Experimental' (created during development), 'Staging' (passed automated quality gates, awaiting human review), 'Production' (actively serving traffic), and 'Archived' (retired, retained for audit). Promotion between stages can be automated (if evaluation metrics exceed thresholds) or require manual approval (for high-risk models like credit scoring). This mirrors the software release process but adds ML-specific gates: accuracy, calibration, fairness metrics, and data lineage checks.
Lineage tracking is the model registry's most valuable feature for regulated industries. A complete lineage record traces from a production prediction back to: the model version, the training run, the training data snapshot, the feature transformations, and the code commit. In financial services, healthcare, and other regulated domains, this lineage is required for audit and compliance. Even in unregulated domains, lineage is essential for debugging: 'this model's accuracy degraded because training data version 47 contained a schema change that corrupted the user_age feature.'
Deploying a New Recommendation Model
A data scientist trains a new recommendation model and registers it as 'reco-model' version 14 in the model registry. The registration includes the model weights (2.3 GB), the feature preprocessing pipeline (Python pickle), evaluation metrics (NDCG@10 = 0.42, up from 0.40 in version 13), and metadata (trained on data snapshot 2026-06-01, code commit abc123). An automated pipeline promotes version 14 to 'Staging' because NDCG improved. A human reviewer approves promotion to 'Production' after checking fairness metrics. The serving infrastructure detects the new Production version, loads it into a canary replica, and gradually shifts traffic from version 13 to version 14 over 2 hours while monitoring prediction quality.
Uber (Michelangelo)
Uber's Michelangelo model registry stores thousands of model versions across hundreds of models for ride pricing, ETA estimation, fraud detection, and driver matching. Every model version includes a pointer to the training data snapshot, the feature store version, and the evaluation report. Models require automated evaluation and team lead approval before promotion to Production. Rollback to the previous version takes under 60 seconds via traffic routing.
Netflix
Netflix's model registry integrates with Metaflow (training), Meson (scheduling), and their A/B testing platform. Each model version records not just metrics but also the A/B test results that validated it. Models go through a staged rollout: shadow mode (predictions logged but not shown), canary (5% traffic), and full rollout. The registry retains the full history, enabling Netflix to analyze how model quality has evolved over years.
Google (Vertex AI Model Registry)
Vertex AI Model Registry supports model versioning, deployment to endpoints with traffic splitting, and integration with Vertex AI Pipelines for automated training-to-deployment workflows. It stores model artifacts in Google Cloud Storage, supports custom metadata labels for governance, and provides an API for querying model lineage. Google uses a similar internal registry for serving models across Search, Ads, and YouTube.
| Aspect | Description |
|---|---|
| Strict Governance vs. Iteration Speed | Requiring manual approval and extensive evaluation for every model version increases safety but slows deployment. Fast-moving teams (recommendations, personalization) may need automated promotion with only metric gates. High-risk models (fraud, credit, medical) require human review and documentation. A tiered governance model matches rigor to risk. |
| Centralized vs. Federated Registry | A single organization-wide registry enables cross-team model discovery and consistent governance but can become a bottleneck with complex access controls. Per-team registries are simpler but prevent reuse and create governance blind spots. Most organizations use a centralized registry with team-level namespaces. |
| Artifact Size vs. Reproducibility | Storing full model artifacts (7-140 GB for LLMs) in the registry is expensive but enables instant deployment. Storing only metadata + a pointer to external storage (S3, GCS) is cheaper but adds dependency on the external store's availability and consistency. |
Airbnb's ML Model Governance Platform
Scenario
Airbnb had 100+ ML models in production with no centralized tracking. A pricing model update caused a revenue drop of $2M before the team realized the new model had been trained on a data snapshot with a known data quality issue. There was no way to quickly identify which model version was running, what data it was trained on, or how to rollback.
Solution
Airbnb built a model governance platform centered on a registry that tracked every model version's lineage (data snapshot, feature store version, code commit, evaluation metrics). Promotion to Production required passing automated quality gates (accuracy, calibration, fairness) and manager approval. The serving layer maintained the previous version warm for instant rollback.
Outcome
Model-related incidents decreased 70%. Mean time to rollback dropped from 2 hours (manual redeployment) to 90 seconds (traffic routing switch). The data quality issue that caused the $2M loss would have been caught by the automated data snapshot validation gate. Cross-team model reuse increased because data scientists could discover and understand existing models through the registry's catalog.
See ML Model Registry in action
Explore system design templates that use ml model registry and run traffic simulations to see how these concepts perform under real load.
Browse Templates1What is the primary purpose of model lineage tracking in a model registry?
2Why should model versions in a registry be immutable?