Distributed Tracing in Spring Boot: OpenTelemetry + Zipkin/Tempo (2026 Guide)

In a monolith, a slow request shows up in a profiler — you find the slow method and fix it. In a microservices architecture, a slow request might span an API gateway, an auth service, a product service, two DB calls, and a Redis lookup. Without distributed tracing, you get a 500ms response time and no idea which service caused it. With tracing, you see the full call tree and the exact milliseconds spent in each service.

Setup: Spring Boot 3 + Micrometer Tracing

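Spring Boot 3 supports Micrometer Tracing, a vendor-neutral facade that bridges to a tracer implementation such as OpenTelemetry. The tracing auto-configuration lives in Spring Boot Actuator, so you need the actuator starter alongside a bridge and an exporter: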

<!-- Micrometer Tracing with OpenTelemetry bridge -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-otel</artifactId>
</dependency>

<!-- Export to Zipkin (or swap for Tempo, Jaeger) -->
<dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-exporter-zipkin</artifactId>
</dependency>

<!-- Actuator supplies the tracing auto-configuration and the observations
     that instrument Spring MVC, WebClient and RestClient -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

# application.properties
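# Sample 100% of requests in dev; use something like 0.1 in prod.
# Keep comments on their own line: in a .properties file, a trailing "# ..." becomes part of the value.
management.tracing.sampling.probability=1.0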
management.zipkin.tracing.endpoint=http://zipkin:9411/api/v2/spans

# Correlate logs with traces
logging.pattern.level=%5p [${spring.application.name:},%X{traceId:-},%X{spanId:-}]

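With these dependencies, Spring Boot instruments the following, some of it only after a small amount of extra configuration:

Incoming HTTP requests (creates root span)
Outgoing WebClient / RestClient calls (propagates trace context, provided the client is built from the auto-configured WebClient.Builder or RestClient.Builder)
@KafkaListener methods (continues trace from message headers once spring.kafka.listener.observation-enabled=true is set)
JDBC queries (DB spans need an additional instrumentation library such as datasource-micrometer; Spring Boot does not create them on its own)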

What You Get Automatically

Trace: GET /api/orders/dashboard  [total: 347ms]
├── http GET /api/orders/dashboard          [12ms]  — API Gateway
│   ├── auth-service: validateToken         [8ms]   — Auth Service  
│   └── order-service: getDashboard         [327ms] — Order Service ← SLOW
│       ├── jdbc: SELECT * FROM orders      [180ms] ← BOTTLENECK
│       ├── redis: GET user:42:prefs        [2ms]
│       └── http GET /inventory/bulk        [145ms] — Inventory Service
│           └── jdbc: SELECT * FROM stock   [140ms]

One glance tells you: the slow JDBC query in order-service is the problem. Without tracing, you'd spend hours adding logs and reproducing.
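
The "one glance" reading of the tree can also be done mechanically by computing each span's self time, that is, its duration minus the time spent in its direct children. The sketch below is an illustration only: SpanRecord and BottleneckFinder are hypothetical names, not part of Micrometer or Zipkin, and it assumes every duration is a wall-clock total, with the root span taking the 347ms trace total.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified span: real exported spans carry many more fields. */
record SpanRecord(String id, String parentId, String name, long durationMs) {}

final class BottleneckFinder {

    /** Self time = a span's duration minus the summed durations of its direct children. */
    static Map<String, Long> selfTimes(List<SpanRecord> spans) {
        Map<String, Long> childTotals = new HashMap<>();
        for (SpanRecord s : spans) {
            if (s.parentId() != null) {
                childTotals.merge(s.parentId(), s.durationMs(), Long::sum);
            }
        }
        Map<String, Long> self = new HashMap<>();
        for (SpanRecord s : spans) {
            long children = childTotals.getOrDefault(s.id(), 0L);
            // Clamp at zero: children that run in parallel can sum to more than the parent.
            self.put(s.id(), Math.max(0L, s.durationMs() - children));
        }
        return self;
    }

    /** Returns the span with the largest self time, i.e. where the time was actually spent. */
    static SpanRecord bottleneck(List<SpanRecord> spans) {
        Map<String, Long> self = selfTimes(spans);
        SpanRecord worst = null;
        for (SpanRecord s : spans) {
            if (worst == null || self.get(s.id()) > self.get(worst.id())) {
                worst = s;
            }
        }
        return worst;
    }

    /** The example trace from the tree above, modelled with total durations. */
    static List<SpanRecord> exampleTrace() {
        return List.of(
            new SpanRecord("1", null, "http GET /api/orders/dashboard", 347),
            new SpanRecord("2", "1", "auth-service: validateToken", 8),
            new SpanRecord("3", "1", "order-service: getDashboard", 327),
            new SpanRecord("4", "3", "jdbc: SELECT * FROM orders", 180),
            new SpanRecord("5", "3", "redis: GET user:42:prefs", 2),
            new SpanRecord("6", "3", "http GET /inventory/bulk", 145),
            new SpanRecord("7", "6", "jdbc: SELECT * FROM stock", 140));
    }
}
```

Calling BottleneckFinder.bottleneck(BottleneckFinder.exampleTrace()) picks the orders query. The order-service span, despite its 327ms total, has no self time at all, because its duration is fully accounted for by its three children.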

Adding Custom Spans

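import io.micrometer.tracing.Span;
import io.micrometer.tracing.Tracer;
import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Service;

@Service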
@RequiredArgsConstructor
public class ReportService {

    private final Tracer tracer;

    public Report generateReport(Long userId) {
        // Create a custom span for the expensive operation
        Span span = tracer.nextSpan()
            .name("report.generate")
            .tag("userId", userId.toString())
            .tag("reportType", "monthly")
            .start();

        try (Tracer.SpanInScope ws = tracer.withSpan(span)) {
            Report report = expensiveReportGeneration(userId);
            span.tag("report.rows", String.valueOf(report.getRowCount()));
            return report;
        } catch (Exception e) {
            span.error(e);
            throw e;
        } finally {
            span.end();
        }
    }
}

Custom spans appear in the trace tree with your tags — you can filter by userId or reportType in Zipkin/Grafana Tempo.

Trace Propagation Across Kafka

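Spring Kafka 3.x + Micrometer Tracing propagates trace context in message headers once observation is enabled on both sides. It is off by default, so set spring.kafka.template.observation-enabled=true and spring.kafka.listener.observation-enabled=true: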

// Producer — trace context injected into message headers automatically
@Service
@RequiredArgsConstructor // generates the constructor that injects the final kafkaTemplate field
public class OrderProducer {
    private final KafkaTemplate<String, OrderEvent> kafkaTemplate;

    public void publishOrderCreated(OrderEvent event) {
        kafkaTemplate.send("orders", event.getOrderId().toString(), event);
        // Micrometer injects: traceparent, tracestate headers
    }
}

// Consumer — trace resumed from headers automatically
@KafkaListener(topics = "orders")
public void onOrder(OrderEvent event) {
    // This method runs in the same trace as the producer
    // Kafka consumer span appears as child of the producer span
    processOrder(event);
}

The full trace shows: HTTP request → Kafka produce → Kafka consume → DB write — one continuous trace across async boundaries.
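
To make the propagation less magical, here is a minimal sketch of what a traceparent header carries, following the W3C Trace Context layout for version 00: version, 32-hex trace ID, 16-hex parent span ID, and flags. The TraceParent type is a hypothetical illustration, not Micrometer's or OpenTelemetry's propagator, and it ignores tracestate and future header versions.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical holder for the four fields of a W3C traceparent header. */
record TraceParent(String version, String traceId, String parentSpanId, boolean sampled) {

    // version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex), lowercase only
    private static final Pattern FORMAT =
        Pattern.compile("^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$");

    static Optional<TraceParent> parse(String header) {
        if (header == null) {
            return Optional.empty();
        }
        Matcher m = FORMAT.matcher(header.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        // The spec treats version "ff" and all-zero IDs as invalid.
        boolean invalid = m.group(1).equals("ff")
            || m.group(2).equals("0".repeat(32))
            || m.group(3).equals("0".repeat(16));
        if (invalid) {
            return Optional.empty();
        }
        int flags = Integer.parseInt(m.group(4), 16);
        // Bit 0 of the flags byte is the "sampled" flag.
        return Optional.of(new TraceParent(m.group(1), m.group(2), m.group(3), (flags & 0x01) != 0));
    }

    /**
     * Header value a service would send downstream: same trace ID, with its own
     * span ID as the new parent. Simplification: only the sampled flag is preserved.
     */
    String forChild(String childSpanId) {
        return version + "-" + traceId + "-" + childSpanId + "-" + (sampled ? "01" : "00");
    }
}
```

The trace ID stays constant across the producer and the consumer while the parent ID changes at each hop, which is what lets the backend stitch the Kafka consumer span under the producer span.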

Sampling Strategy for Production

# Dev: trace everything
management.tracing.sampling.probability=1.0

# Prod: sample 10% of requests
management.tracing.sampling.probability=0.1

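The probability setting is head-based sampling: the keep-or-drop decision is made when the trace starts and travels downstream with the trace context. As a rough guide, always-on tracing adds about 2-5% overhead, and 10% sampling brings that down to around 0.5%; the real cost depends on span volume and exporter configuration, so measure it for your own service. To debug specific slow requests, raise the head-based rate temporarily, or use tail-based sampling (trace all requests, export only slow ones) if your collector supports it.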
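
The two strategies differ in when the decision is made, which the following sketch tries to show. It is an illustration under stated assumptions, not the code of any real sampler: SamplingSketch, sampleHead and keepTail are hypothetical names, and deriving the head decision from the low 64 bits of the trace ID is one common way to make it reproducible for a given trace.

```java
final class SamplingSketch {

    /**
     * Head-based: decided up front from the trace ID alone, before anything
     * is known about how the request will turn out.
     */
    static boolean sampleHead(String traceIdHex, double probability) {
        if (probability >= 1.0) {
            return true;
        }
        if (probability <= 0.0) {
            return false;
        }
        // Treat the last 16 hex digits (low 64 bits) of the trace ID as a random number.
        int start = Math.max(0, traceIdHex.length() - 16);
        long low = Long.parseUnsignedLong(traceIdHex.substring(start), 16);
        long nonNegative = low & Long.MAX_VALUE;             // drop the sign bit
        long bound = (long) (probability * Long.MAX_VALUE);  // share of the ID space to keep
        return nonNegative < bound;
    }

    /**
     * Tail-based: decided after the trace has finished, so it can look at the
     * outcome. The threshold is a hypothetical parameter for this sketch.
     */
    static boolean keepTail(long durationMs, boolean hadError, long slowThresholdMs) {
        return hadError || durationMs >= slowThresholdMs;
    }
}
```

The trade-off is visible in the signatures: sampleHead is cheap but blind to latency and errors, while keepTail needs every span of the trace to be recorded and buffered before anything can be dropped, which is why it usually lives in a collector rather than in the application.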

Correlating Logs with Traces

# Console log pattern that includes trace/span IDs from the MDC
logging.pattern.console=%d{HH:mm:ss} %highlight(%-5level) [%blue(%X{traceId})/%blue(%X{spanId})] %logger{36} - %msg%n

With this pattern, every log line includes the trace and span IDs:

14:32:01 ERROR [4bf92f3577b34da6a3ce929d0e0e4736/00f067aa0ba902b7] o.s.w.s.OrderService - Failed to load order 42

Now you can:

Get an error from your logging system (Loki, Datadog, CloudWatch)
Copy the traceId
Look it up in Zipkin/Tempo to see the full call tree for that exact request
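
The copy-and-look-up step is easy to script. The sketch below pulls the trace ID out of a log line written with the console pattern shown earlier and builds a link to the trace. LogTraceExtractor is a hypothetical helper; the /zipkin/traces/ path is an assumption about the Zipkin UI route that you should check against your deployment, and the regex assumes ANSI colour codes have been stripped from the shipped logs.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LogTraceExtractor {

    // Matches "[<traceId>/<spanId>]": trace IDs are 32 (or, in 64-bit setups, 16) hex chars.
    private static final Pattern IDS =
        Pattern.compile("\\[([0-9a-f]{32}|[0-9a-f]{16})/([0-9a-f]{16})\\]");

    /** Returns the trace ID from the first "[traceId/spanId]" group in the line, if any. */
    static Optional<String> traceId(String logLine) {
        Matcher m = IDS.matcher(logLine);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Builds a trace link; the "/zipkin/traces/" route is assumed, not guaranteed. */
    static String zipkinUrl(String zipkinBaseUrl, String traceId) {
        return zipkinBaseUrl + "/zipkin/traces/" + traceId;
    }
}
```

Lines logged outside of a request, such as startup messages, have empty IDs and therefore yield Optional.empty() rather than a bogus link.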

Common Mistakes to Avoid

sampling.probability=1.0 in production — 100% sampling on a high-traffic service adds noticeable overhead and generates enormous trace data; use 5-10% in prod
Not propagating Authorization header to downstream services — the trace spans the services but security context doesn't; pass the JWT explicitly
Creating spans inside tight loops — each span has serialization overhead; span individual user requests, not iterations over a list
Ignoring baggage — Micrometer Tracing Baggage propagates arbitrary key-value pairs across service boundaries; use it for userId, tenantId instead of repeating them as tags on every span
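
On the baggage point: in Spring Boot, the fields to propagate are listed under management.tracing.baggage.remote-fields, and with W3C propagation they travel in a baggage header as comma-separated key=value pairs. The sketch below shows roughly what that header looks like on the wire. BaggageHeader is a hypothetical helper written for illustration, not the propagator Micrometer or OpenTelemetry actually uses, and it skips the size limits and per-entry properties handling a real implementation needs.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class BaggageHeader {

    /** Encodes entries as "k1=v1,k2=v2" with percent-encoded values. */
    static String encode(Map<String, String> entries) {
        StringJoiner joined = new StringJoiner(",");
        for (Map.Entry<String, String> e : entries.entrySet()) {
            // URLEncoder writes spaces as '+'; percent-encoding expects %20.
            // Literal '+' characters are already encoded as %2B, so this replace is safe.
            String value = URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8).replace("+", "%20");
            joined.add(e.getKey() + "=" + value);
        }
        return joined.toString();
    }

    /** Decodes a header produced by encode(); optional ";property" suffixes are dropped. */
    static Map<String, String> decode(String header) {
        Map<String, String> entries = new LinkedHashMap<>();
        if (header == null || header.isBlank()) {
            return entries;
        }
        for (String member : header.split(",")) {
            String keyValue = member.split(";", 2)[0].trim();
            int eq = keyValue.indexOf('=');
            if (eq <= 0) {
                continue; // skip malformed members instead of failing the whole request
            }
            String key = keyValue.substring(0, eq).trim();
            String value = URLDecoder.decode(keyValue.substring(eq + 1).trim(), StandardCharsets.UTF_8);
            entries.put(key, value);
        }
        return entries;
    }
}
```

Because this header is attached to every outgoing call, baggage is best kept to a handful of short identifiers; it is also visible to every downstream service, so it is not the place for secrets.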

Summary

Distributed tracing with Spring Boot 3 + Micrometer Tracing + OpenTelemetry is largely automatic: HTTP calls are instrumented out of the box, Kafka messages once observation is enabled, and DB queries once a JDBC instrumentation library is added. Add custom spans for domain-significant operations, correlate with logs via MDC trace IDs, and tune sampling to 5-10% in production. The result is full visibility into cross-service latency: you find the bottleneck by reading the trace, not by adding more logs.
