OpenTelemetry (OTel) is the CNCF standard for vendor-neutral observability instrumentation. It provides a unified API, SDK, and Collector for generating, processing, and exporting logs, metrics, and traces from any application to any backend.
OpenTelemetry emerged in 2019 from the merger of two competing open-source projects: OpenTracing (a vendor-neutral tracing API hosted by the CNCF) and OpenCensus (Google's instrumentation library for traces and metrics). The merger resolved the fragmentation that had plagued the observability ecosystem, where library authors had to choose between two incompatible instrumentation APIs. Today, OpenTelemetry is the second-most-active CNCF project after Kubernetes and the de facto standard for application instrumentation.
OTel's architecture has three layers. The API layer defines the interfaces for creating spans, recording metrics, and emitting logs. Library authors instrument against this API, which has minimal dependencies and negligible overhead when no SDK is configured. The SDK layer implements the API with configurable samplers, processors, and exporters. Application operators configure the SDK, for example to sample traces at 5%, batch metrics every 15 seconds, and export everything via OTLP. The Collector layer is a standalone process (deployed as a sidecar, DaemonSet, or gateway) that receives telemetry from SDKs, applies transformations (add attributes, filter sensitive data, sample), and exports to one or more backends.
Auto-instrumentation is OTel's killer feature. For Java, a single -javaagent flag instruments popular HTTP clients (Apache, OkHttp), web frameworks (Spring, JAX-RS), database access layers (JDBC, Hibernate), messaging (Kafka, RabbitMQ), and gRPC calls without any code changes. Python, Node.js, and .NET have similar auto-instrumentation packages; Go's equivalent is eBPF-based and less mature. This means a team can go from zero observability to traces across most of their microservice fleet in hours, not weeks.
The OTel Collector is the Swiss Army knife of telemetry pipelines. It can receive data in OTLP, Jaeger, Zipkin, Prometheus, or StatsD format. It can process data (add Kubernetes metadata, tail-sample traces, drop sensitive attributes). And it can export to 50+ backends. Running a Collector decouples instrumentation from backend choice -- you can switch from Jaeger to Tempo or from Prometheus to Datadog by changing the Collector config, not your application code.
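The receive, process, export flow can be modeled in a few lines of plain Python. This is a conceptual sketch only, not the Collector's implementation or its configuration format: the Span class, the processor helpers, and the two backend lists are all hypothetical stand-ins. It also treats one batch as a set of complete traces, whereas a real tail sampler has to buffer spans until each trace finishes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Span:
    trace_id: str
    name: str
    duration_ms: float
    error: bool = False
    attributes: Dict[str, str] = field(default_factory=dict)


Processor = Callable[[List[Span]], List[Span]]
Exporter = Callable[[List[Span]], None]


def drop_attributes(*keys: str) -> Processor:
    """Remove sensitive attribute keys from every span."""
    def process(spans: List[Span]) -> List[Span]:
        for span in spans:
            for key in keys:
                span.attributes.pop(key, None)
        return spans
    return process


def tail_sample(slow_ms: float) -> Processor:
    """Keep whole traces that contain an error or a slow span."""
    def process(spans: List[Span]) -> List[Span]:
        keep = {s.trace_id for s in spans if s.error or s.duration_ms >= slow_ms}
        return [s for s in spans if s.trace_id in keep]
    return process


def run_pipeline(spans: List[Span], processors: List[Processor],
                 exporters: List[Exporter]) -> List[Span]:
    for processor in processors:
        spans = processor(spans)
    for export in exporters:  # fan out the same data to every backend
        export(list(spans))
    return spans


# Two in-memory lists stand in for real backends.
backend_a: List[Span] = []
backend_b: List[Span] = []

received = [
    Span("t1", "GET /orders", 40.0),
    Span("t1", "SELECT orders", 12.0),
    Span("t2", "GET /orders", 900.0),                      # slow
    Span("t3", "POST /orders", 35.0, error=True,
         attributes={"user.email": "a@example.com"}),      # error + sensitive data
]

run_pipeline(
    received,
    processors=[drop_attributes("user.email"), tail_sample(slow_ms=500.0)],
    exporters=[backend_a.extend, backend_b.extend],
)
print([s.name for s in backend_a])
```

Switching or adding a backend here means editing the exporters list only. The spans, and the application that produced them, are unchanged, which mirrors the decoupling described above.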
OTLP (OpenTelemetry Protocol) is the wire format that ties everything together. It is a gRPC/HTTP protocol optimized for batched telemetry: spans, metrics, and logs share a common resource model (service.name, service.version, deployment.environment) and are encoded in Protobuf for efficiency, with a JSON encoding also available over HTTP. OTLP is natively supported by most major observability backends.
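To make the shared resource model concrete, the sketch below builds a single-span trace payload in OTLP's JSON encoding using only the Python standard library. The field names follow the OTLP trace schema as I understand it; the service values, scope name, and span name are illustrative, and nothing is sent over the network.

```python
import json
import secrets
import time


def attr(key: str, value: str) -> dict:
    """One OTLP key/value attribute with a string value."""
    return {"key": key, "value": {"stringValue": value}}


def build_trace_payload(span_name: str, duration_ns: int) -> dict:
    end_ns = time.time_ns()
    return {
        "resourceSpans": [{
            # The resource block is the part shared with metrics and logs.
            "resource": {"attributes": [
                attr("service.name", "orders-api"),
                attr("service.version", "1.4.2"),
                attr("deployment.environment", "staging"),
            ]},
            "scopeSpans": [{
                "scope": {"name": "manual-example"},  # hypothetical scope name
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 16 bytes, hex-encoded
                    "spanId": secrets.token_hex(8),    # 8 bytes, hex-encoded
                    "name": span_name,
                    "kind": 2,  # SPAN_KIND_SERVER
                    # 64-bit integers are carried as strings in the JSON encoding.
                    "startTimeUnixNano": str(end_ns - duration_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }]
    }


payload = build_trace_payload("GET /orders", duration_ns=25_000_000)
body = json.dumps(payload)
print(body[:80] + "...")
```

A body like this would typically be POSTed to a Collector's OTLP/HTTP traces endpoint (by default /v1/traces on port 4318) with a JSON content type. In practice the SDK exporters build and send these payloads for you, usually in Protobuf.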
Adding OTel to a Python FastAPI Service
Install opentelemetry-distro, opentelemetry-instrumentation, and opentelemetry-exporter-otlp (the exporter package is what sends data over OTLP). Run 'opentelemetry-bootstrap -a install' to auto-detect and install instrumentation packages for the libraries already in your environment, such as FastAPI, httpx, SQLAlchemy, and Redis. Set environment variables: OTEL_SERVICE_NAME=orders-api, OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317 (4317 is the default OTLP/gRPC port). Start the app with 'opentelemetry-instrument python main.py'. Every incoming HTTP request now generates a trace with spans for FastAPI routing, SQLAlchemy queries, and Redis calls -- zero code changes. For business logic, obtain a tracer with trace.get_tracer(__name__) and add @tracer.start_as_current_span('validate_order') to custom functions.
eBay
eBay migrated from a proprietary tracing system to OpenTelemetry across 3,000+ microservices. The migration used a phased approach: first deploying OTel Collectors as sidecars that accepted both the legacy format and OTLP, then gradually switching services to OTel SDKs. Full migration took 12 months. Post-migration, eBay reduced observability vendor costs by 30% by routing telemetry through the Collector's processing pipeline (tail sampling, attribute filtering).
Skyscanner
Skyscanner uses OpenTelemetry auto-instrumentation for all Java and Python services, with manual instrumentation for critical business paths (flight search, price calculation). Their OTel Collector runs as a Kubernetes DaemonSet that enriches spans with pod metadata, applies tail-based sampling (keep errors and slow requests), and exports to Grafana Tempo. The standardized instrumentation enabled them to build a fleet-wide dependency map automatically.
GitHub
GitHub adopted OpenTelemetry for their Ruby on Rails monolith and surrounding Go microservices. They contributed the Ruby auto-instrumentation package back to the OTel project. By using the Collector as a telemetry gateway, they can A/B test observability backends (evaluating Honeycomb vs. Datadog) by duplicating telemetry to both backends without changing any application code.
| Aspect | Description |
|---|---|
| Vendor-Neutral vs. Vendor-Optimized | OTel provides portability but may not expose vendor-specific features (Datadog's profiling integration, Honeycomb's BubbleUp). Some teams use OTel for base instrumentation and add vendor-specific agents for advanced features. |
| Auto-Instrumentation vs. Manual Control | Auto-instrumentation is zero-effort but generates framework-level spans that may be noisy. Manual instrumentation adds business context but requires developer effort and ongoing maintenance as code changes. |
| Collector Sidecar vs. Gateway | Sidecar deployment (one Collector per pod) provides isolation but consumes resources per pod. Gateway deployment (shared Collector pool) is efficient but can become a centralized bottleneck. Many teams use a DaemonSet (one Collector per node) as a compromise. |
| SDK Maturity Across Languages | Java and Go OTel SDKs are stable and battle-tested. Python and Node.js are GA but less mature. Ruby, PHP, and Rust are still evolving. Teams with polyglot stacks may find inconsistent instrumentation quality across languages. |
Zalando Migrates 500 Services to OpenTelemetry in 6 Months
Scenario
Zalando, Europe's largest online fashion platform, ran a proprietary tracing library that required manual integration in each service. With 500+ microservices and 200+ developers, keeping the library updated was a full-time job.
Solution
They migrated to OpenTelemetry Java auto-instrumentation, deploying the OTel Java agent as a default JVM argument in their Kubernetes base image. 80% of services were instrumented automatically with no code changes. For the remaining 20% (custom protocols, native code), they added manual instrumentation. The Collector runs as a DaemonSet with tail-based sampling.
Outcome
500 services instrumented in 6 months (vs. 3 years for the previous library), 40% reduction in mean time to resolve incidents, and the ability to switch trace backends by changing a Collector config file.