✨ Hands-On Tutorial

.NET 8 Essentials: Observability with OpenTelemetry — Tracing, Metrics & Structured Logging

Your Orders API emits traces to Jaeger in milliseconds. Prometheus scrapes latency histograms every 15 seconds. Grafana alerts when error rates breach your SLO. This isn't magic. It's OpenTelemetry wired into .NET 8.

Most .NET apps log to files and hope for the best. When production breaks, you grep logs for hours, guess at database bottlenecks, and rebuild mental models of what happened three services ago. You don't have to guess. Wire tracing, metrics, and structured logs into one pipeline. See the entire request waterfall. Correlate errors with spans. Alert on latency before users complain. This tutorial shows you how.

What You'll Build

You'll build a production-ready Orders API wired with OpenTelemetry (traces, metrics, logs), EF Core (SQLite), HttpClient (external call), and a background queue (Channel<T>). You'll run an OTel Collector and visualize traces in Jaeger and metrics/logs in Grafana/Prometheus:

  • Distributed tracing that follows requests across HTTP, database, and async queues
  • Custom spans and attributes with semantic conventions and error recording
  • RED metrics (Rate, Errors, Duration) with histograms and low-cardinality labels
  • Structured logs that correlate with traces via TraceId/SpanId injection
  • Sampling strategies (head-based, parent-based, tail-sampling in the Collector)
  • OTLP export to Jaeger (traces), Prometheus (metrics), and Grafana (logs)
  • SLO monitoring with burn-rate alerts and health checks

Why Observability?

Observability answers three questions: Is the system working? Why is it slow? What broke during that deploy? Logs tell you what happened. Metrics tell you how often. Traces tell you where.

Traditional monitoring tracks known failure modes. You write alerts for CPU, memory, disk. Observability handles unknown unknowns. A new query pattern murders your database. A third-party API starts timing out. A cache invalidation bug cascades across services. You didn't predict these. Observability lets you debug them anyway.

Golden Signals

Google's SRE book defines four golden signals: latency, traffic, errors, and saturation. Track these and you catch 90% of production issues. Latency measures response time. Traffic counts requests per second. Errors tracks failure rate. Saturation shows resource limits (CPU, memory, connections).

OpenTelemetry captures all four. Traces record latency per request. Metrics aggregate traffic and errors. Resource attributes expose saturation. Wire them together and you get complete visibility.
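
A quick sketch shows how the traffic and saturation signals map onto .NET metric instruments. The meter and instrument names below are illustrative, not part of the Orders API you'll build: a counter becomes requests per second once the backend rates it over time, and observable gauges sample resource pressure on every collection. Latency and errors use the histogram and counter pattern shown in the next section.

Golden Signals (sketch)
using System.Diagnostics.Metrics;
using System.Threading;

// Illustrative meter; the tutorial's real meter is "DotNetGuide.Orders"
var meter = new Meter("DotNetGuide.GoldenSignals");

// Traffic: a counter, turned into requests/sec by rate() in the backend
var requests = meter.CreateCounter<long>("app.requests", description: "Total requests");
requests.Add(1, new KeyValuePair<string, object?>("endpoint", "/orders"));

// Saturation: observable gauges sampled on each metric collection
meter.CreateObservableGauge(
    "process.threadpool.queue.length",
    () => ThreadPool.PendingWorkItemCount,
    description: "Work items waiting for a thread pool thread");

meter.CreateObservableGauge(
    "process.memory.managed",
    () => GC.GetTotalMemory(forceFullCollection: false),
    unit: "By",
    description: "Managed heap size in bytes");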

SLOs and Cardinality

Service Level Objectives define acceptable performance. Example: 99.9% of requests complete under 300ms. You measure this with metrics. But cardinality kills metric systems. Every unique combination of labels creates a new time series. Label by userId and you explode storage. Label by endpoint and you stay sane.

SLO Definition
// Define an SLO: 99.9% availability, p95 latency ≤ 300ms
public class OrderServiceSLO
{
    public const double TargetAvailability = 0.999; // 99.9%
    public const int TargetLatencyMs = 300;         // p95 ≤ 300ms
    public const int ErrorBudgetMinutes = 43;       // 43 min/month downtime
}

// Track with metrics
private readonly Histogram<double> _requestDuration;
private readonly Counter<long> _requestErrors;

public OrderService(IMeterFactory meterFactory)
{
    var meter = meterFactory.Create("DotNetGuide.Orders");

    _requestDuration = meter.CreateHistogram<double>(
        "http.server.request.duration",
        unit: "ms",
        description: "Request duration in milliseconds");

    _requestErrors = meter.CreateCounter<long>(
        "http.server.request.errors",
        description: "Total request errors");
}
Cardinality Risks

Never use high-cardinality values (userId, requestId, timestamp) as metric labels. Each unique label combination creates a new time series. A million users means a million time series. Backends like Prometheus choke on this. Use low-cardinality labels: endpoint, HTTP method, status code. Put high-cardinality IDs in span attributes or events, not metric labels.
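
The sketch below shows the split in practice. It reuses the _requestDuration histogram from the SLO snippet above; the method name and its elapsedMs/userId/requestId parameters are placeholders for whatever your handler already has in scope.

Labels vs. Span Attributes (sketch)
using System.Diagnostics;

public void RecordRequest(double elapsedMs, string userId, string requestId)
{
    // Metric side: only low-cardinality labels, a handful of series per endpoint/method
    _requestDuration.Record(elapsedMs,
        new KeyValuePair<string, object?>("endpoint", "/orders"),
        new KeyValuePair<string, object?>("http.method", "POST"));

    // Trace side: high-cardinality IDs go on the current span, not the metric
    Activity.Current?.SetTag("user.id", userId);
    Activity.Current?.AddEvent(new ActivityEvent("request.received",
        tags: new ActivityTagsCollection { ["request.id"] = requestId }));
}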

Project Setup

OpenTelemetry works in any .NET 8 project. We'll use a Minimal API to keep setup simple. Add the SDK, configure exporters, and set resource attributes so traces, metrics, and logs correlate across services.

Create the Project

Start with a Web API template. Install OpenTelemetry packages for tracing, metrics, logs, and OTLP export.

Terminal
dotnet new web -o OrdersApi
cd OrdersApi

# Core OpenTelemetry packages
dotnet add package OpenTelemetry.Extensions.Hosting
dotnet add package OpenTelemetry.Exporter.OpenTelemetryProtocol

# Instrumentation libraries
dotnet add package OpenTelemetry.Instrumentation.AspNetCore
dotnet add package OpenTelemetry.Instrumentation.Http
dotnet add package OpenTelemetry.Instrumentation.EntityFrameworkCore

# EF Core for database demo
dotnet add package Microsoft.EntityFrameworkCore.Sqlite
dotnet add package Microsoft.EntityFrameworkCore.Design

Configure OpenTelemetry in Program.cs

Wire tracing, metrics, and logging into the dependency injection container. Set a ResourceBuilder with service name, version, and environment. These attributes appear in every signal, letting you filter and correlate across services.

Program.cs (Basic Setup)
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;
using OpenTelemetry.Metrics;
using OpenTelemetry.Logs;

var builder = WebApplication.CreateBuilder(args);

// Configure resource (shared across traces, metrics, logs)
var resourceBuilder = ResourceBuilder.CreateDefault()
    .AddService(
        serviceName: "OrdersApi",
        serviceVersion: "1.0.0",
        serviceInstanceId: Environment.MachineName)
    .AddAttributes(new Dictionary<string, object>
    {
        ["deployment.environment"] = builder.Environment.EnvironmentName
    });

// Add OpenTelemetry: Tracing
builder.Services.AddOpenTelemetry()
    .ConfigureResource(resource => resource
        .AddService(
            serviceName: "OrdersApi",
            serviceVersion: "1.0.0",
            serviceInstanceId: Environment.MachineName)
        .AddAttributes(new Dictionary<string, object>
        {
            ["deployment.environment"] = builder.Environment.EnvironmentName
        }))
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddEntityFrameworkCoreInstrumentation()
        .AddOtlpExporter(opts =>
        {
            opts.Endpoint = new Uri("http://localhost:4317"); // Collector gRPC
        }))
    .WithMetrics(metrics => metrics
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddOtlpExporter(opts =>
        {
            opts.Endpoint = new Uri("http://localhost:4317");
        }));

// Add logging with OpenTelemetry
builder.Logging.AddOpenTelemetry(logging =>
{
    logging.SetResourceBuilder(resourceBuilder);
    logging.AddOtlpExporter(opts =>
    {
        opts.Endpoint = new Uri("http://localhost:4317");
    });
});

var app = builder.Build();
app.MapGet("/", () => "Orders API with OpenTelemetry");
app.Run();

Docker Compose for Collector, Jaeger & Prometheus

The OpenTelemetry Collector receives OTLP data and fans it out to Jaeger (traces) and Prometheus (metrics). Grafana visualizes both. This decouples your app from backend-specific formats.

docker-compose.yml
version: '3.8'

services:
  # OpenTelemetry Collector
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"   # OTLP gRPC receiver
      - "4318:4318"   # OTLP HTTP receiver
      - "8889:8889"   # Prometheus metrics exporter

  # Jaeger (traces)
  jaeger:
    image: jaegertracing/all-in-one:latest
    environment:
      - COLLECTOR_OTLP_ENABLED=true # Collector sends OTLP to jaeger:4317 over the compose network
    ports:
      - "16686:16686" # Jaeger UI

  # Prometheus (metrics)
  prometheus:
    image: prom/prometheus:latest
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"   # Prometheus UI

  # Grafana (visualization)
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"   # Grafana UI
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

exporters:
  # Recent Collector releases removed the dedicated jaeger exporter;
  # send OTLP straight to Jaeger instead (Jaeger ingests OTLP natively)
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true

  prometheus:
    endpoint: "0.0.0.0:8889"

  debug:
    verbosity: detailed

processors:
  batch:
    timeout: 10s

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger, debug]

    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus, debug]

    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]

Run docker-compose up -d and visit http://localhost:16686 (Jaeger), http://localhost:9090 (Prometheus), and http://localhost:3000 (Grafana).

Distributed Tracing

Distributed tracing follows a request across service boundaries. Each operation creates a span. Spans nest into a trace. You see the entire waterfall: HTTP handler → database query → external API call → background job.

Create an ActivitySource

.NET uses Activity for tracing. Create an ActivitySource for your service. Start activities (spans) for important operations. OpenTelemetry instruments ASP.NET Core, HttpClient, and EF Core automatically, but custom spans capture business logic.

ActivitySource Setup
using System.Diagnostics;

public class OrderService
{
    private static readonly ActivitySource ActivitySource =
        new("DotNetGuide.Orders", "1.0.0");

    public async Task<Order> CreateOrderAsync(CreateOrderRequest request)
    {
        // Start a custom span
        using var activity = ActivitySource.StartActivity("OrderService.CreateOrder");

        // Add attributes (key-value pairs)
        activity?.SetTag("order.customer_id", request.CustomerId);
        activity?.SetTag("order.total", request.Total);

        // Simulate order creation
        var order = new Order
        {
            Id = Guid.NewGuid(),
            CustomerId = request.CustomerId,
            Total = request.Total,
            CreatedAt = DateTime.UtcNow
        };

        // Record an event (milestone within the span)
        activity?.AddEvent(new ActivityEvent("Order validated"));

        await Task.Delay(50); // Simulate work

        activity?.AddEvent(new ActivityEvent("Order persisted"));

        return order;
    }
}

// Register the ActivitySource with OpenTelemetry
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddSource("DotNetGuide.Orders") // Required to capture custom spans
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddOtlpExporter());

Span Links and Baggage

Span links connect traces that aren't parent-child. Example: An HTTP request creates an order, then enqueues background work. The background span links to the request span. Baggage propagates cross-cutting keys (like tenantId) across services without adding them to every span.

Links and Baggage
using System.Diagnostics;
using OpenTelemetry; // Baggage lives in the OpenTelemetry namespace

// Add baggage in the HTTP handler
app.MapPost("/orders", async (CreateOrderRequest req) =>
{
    using var activity = ActivitySource.StartActivity("POST /orders");

    // Set baggage (propagates to all child spans)
    Baggage.SetBaggage("tenant.id", req.TenantId);
    Baggage.SetBaggage("user.id", req.UserId);

    var order = await orderService.CreateOrderAsync(req);

    // Enqueue background work
    await backgroundQueue.EnqueueAsync(order.Id);

    return Results.Created($"/orders/{order.Id}", order);
});

// Background worker creates a linked span
public async Task ProcessOrderAsync(Guid orderId, ActivityContext parentContext)
{
    // Create a span linked to the original request
    var links = new[] { new ActivityLink(parentContext) };
    using var activity = ActivitySource.StartActivity(
        "ProcessOrder",
        ActivityKind.Internal,
        parentContext: default,
        links: links);

    // Baggage is still accessible
    var tenantId = Baggage.GetBaggage("tenant.id");
    activity?.SetTag("tenant.id", tenantId);

    // Process order...
}

Links show non-hierarchical relationships. Baggage avoids repeating the same attributes on every span. Use baggage sparingly—it travels with every trace context.

HTTP & DB Instrumentation

OpenTelemetry auto-instruments ASP.NET Core, HttpClient, and EF Core. You get spans for every HTTP request, database query, and outgoing HTTP call. No manual code required. Just register the instrumentation libraries.

Enable Automatic Instrumentation

Call AddAspNetCoreInstrumentation(), AddHttpClientInstrumentation(), and AddEntityFrameworkCoreInstrumentation(). Configure options to enrich spans or redact sensitive data.

Instrumentation Configuration
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        // ASP.NET Core: HTTP requests
        .AddAspNetCoreInstrumentation(opts =>
        {
            // Enrich spans with additional data
            opts.EnrichWithHttpRequest = (activity, request) =>
            {
                activity.SetTag("http.client_ip", request.HttpContext.Connection.RemoteIpAddress);
            };

            // Filter out health check endpoints
            opts.Filter = ctx => !ctx.Request.Path.StartsWithSegments("/health");
        })

        // HttpClient: Outgoing HTTP calls
        .AddHttpClientInstrumentation(opts =>
        {
            opts.EnrichWithHttpRequestMessage = (activity, request) =>
            {
                activity.SetTag("http.url.full", request.RequestUri?.ToString());
            };
        })

        // EF Core: Database queries
        .AddEntityFrameworkCoreInstrumentation(opts =>
        {
            // Enable detailed query logging (sanitizes parameters by default)
            opts.SetDbStatementForText = true;
            opts.SetDbStatementForStoredProcedure = true;
        })

        .AddOtlpExporter());

Database Span Example

When you query the database, EF Core instrumentation creates spans with SQL statements. Parameters are sanitized to avoid leaking PII. You see query duration and connection info in Jaeger.

Database Query Trace
// This query automatically creates an EF Core span
public async Task<Order?> GetOrderAsync(Guid orderId)
{
    using var activity = ActivitySource.StartActivity("GetOrder");
    activity?.SetTag("order.id", orderId);

    // EF Core span appears as child:
    // db.system: sqlite
    // db.statement: SELECT * FROM Orders WHERE Id = ?
    // db.operation: ExecuteReader
    return await dbContext.Orders.FindAsync(orderId);
}

// The resulting trace waterfall:
// └─ GET /orders/{id}              (200ms)
//    ├─ GetOrder                   (180ms)
//    │  └─ sqlite ExecuteReader    (150ms)  ← Auto-instrumented
//    └─ Serialize response         (10ms)

SQL statements appear in the db.statement attribute. Connection strings are redacted. Slow queries show up in the trace with exact durations.

Custom Spans & Attributes

Auto-instrumentation covers framework code. Custom spans capture business logic. Add attributes to describe what the code does. Use semantic conventions when they exist. Record errors with status codes and exception events.

Naming Conventions

Name spans with a verb and noun: OrderService.CreateOrder, Inventory.CheckStock, Payment.Authorize. Use dot notation for hierarchy. Span names should be low-cardinality—don't include IDs or user-specific data.

Custom Span with Attributes
public async Task<bool> ValidateInventoryAsync(Guid orderId, List<OrderItem> items)
{
    using var activity = ActivitySource.StartActivity(
        "Inventory.ValidateStock",
        ActivityKind.Internal);

    // Add semantic attributes (follow OpenTelemetry conventions)
    activity?.SetTag("order.id", orderId);
    activity?.SetTag("inventory.item_count", items.Count);

    foreach (var item in items)
    {
        // Record events for important milestones
        activity?.AddEvent(new ActivityEvent($"Checking stock for SKU {item.Sku}"));

        var available = await CheckStockAsync(item.Sku, item.Quantity);
        if (!available)
        {
            // Record error semantics
            activity?.SetStatus(ActivityStatusCode.Error, "Insufficient inventory");
            activity?.SetTag("inventory.failed_sku", item.Sku);
            return false;
        }
    }

    activity?.SetStatus(ActivityStatusCode.Ok);
    return true;
}

Error Recording

When an exception occurs, record it with RecordException() and set status to Error. This marks the span red in Jaeger and lets you filter for errors in queries.

Exception Recording
public async Task<Order> ProcessOrderAsync(Guid orderId)
{
    using var activity = ActivitySource.StartActivity("ProcessOrder");
    activity?.SetTag("order.id", orderId);

    try
    {
        var order = await dbContext.Orders.FindAsync(orderId);
        if (order == null)
        {
            throw new OrderNotFoundException(orderId);
        }

        // Process order...
        activity?.SetStatus(ActivityStatusCode.Ok);
        return order;
    }
    catch (Exception ex)
    {
        // Record exception event with stack trace
        activity?.RecordException(ex);

        // Set span status to Error
        activity?.SetStatus(ActivityStatusCode.Error, ex.Message);

        // Re-throw to let middleware handle it
        throw;
    }
}

// Exception events appear in Jaeger with:
// - exception.type: OrderNotFoundException
// - exception.message: Order 123 not found
// - exception.stacktrace: (full stack trace)

Always call SetStatus() to mark success or failure. Backends use this to calculate error rates and filter broken traces.

Metrics

Metrics aggregate numbers over time. Counters track totals. Histograms measure distributions. UpDownCounters track values that go up and down. Use Views to control cardinality and aggregation.
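
Counters and histograms get full treatment in the RED example below; the UpDownCounter deserves a quick sketch of its own. It tracks a value that rises and falls, such as queue depth or active connections. The instrument name here is illustrative.

UpDownCounter (sketch)
using System.Diagnostics.Metrics;

public class QueueMetrics
{
    private readonly UpDownCounter<long> _queueDepth;

    public QueueMetrics(IMeterFactory meterFactory)
    {
        var meter = meterFactory.Create("DotNetGuide.Orders");

        // UpDownCounter: current depth of the background order queue
        _queueDepth = meter.CreateUpDownCounter<long>(
            "orders.queue.depth",
            description: "Orders currently waiting in the background queue");
    }

    public void OrderEnqueued() => _queueDepth.Add(1);
    public void OrderDequeued() => _queueDepth.Add(-1);
}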

RED Metrics Pattern

Track Rate (requests/sec), Errors (error count), and Duration (latency). This gives you 80% of what you need to debug production. Add USE metrics (Utilization, Saturation, Errors) for infrastructure.

RED Metrics
using System.Diagnostics.Metrics;

public class OrderMetrics
{
    private readonly Counter<long> _orderCount;
    private readonly Counter<long> _orderErrors;
    private readonly Histogram<double> _orderDuration;

    public OrderMetrics(IMeterFactory meterFactory)
    {
        var meter = meterFactory.Create("DotNetGuide.Orders");

        // Rate: Total orders created
        _orderCount = meter.CreateCounter<long>(
            "orders.created",
            description: "Total orders created");

        // Errors: Failed orders
        _orderErrors = meter.CreateCounter<long>(
            "orders.errors",
            description: "Total order errors");

        // Duration: Order processing time
        _orderDuration = meter.CreateHistogram<double>(
            "orders.duration",
            unit: "ms",
            description: "Order processing duration");
    }

    public void RecordOrderCreated(string endpoint, double durationMs)
    {
        _orderCount.Add(1, new("endpoint", endpoint));
        _orderDuration.Record(durationMs, new("endpoint", endpoint));
    }

    public void RecordOrderError(string endpoint, string errorType)
    {
        _orderErrors.Add(1,
            new("endpoint", endpoint),
            new("error.type", errorType));
    }
}

Views for Cardinality Control

Views let you aggregate or drop high-cardinality labels. Example: Drop userId labels, keep only endpoint. Configure Views in AddMetrics().

Metrics with Views
builder.Services.AddOpenTelemetry()
    .WithMetrics(metrics => metrics
        .AddMeter("DotNetGuide.Orders")

        // Define a View to control histogram buckets
        .AddView(
            instrumentName: "orders.duration",
            new ExplicitBucketHistogramConfiguration
            {
                // Custom buckets for latency (ms)
                Boundaries = new double[] { 10, 50, 100, 200, 500, 1000, 2000, 5000 }
            })

        // Drop high-cardinality labels (example: drop user.id)
        .AddView(
            instrumentName: "orders.created",
            new MetricStreamConfiguration
            {
                TagKeys = new[] { "endpoint" } // Keep only endpoint label
            })

        .AddOtlpExporter());
Cardinality Control

Every unique combination of label values creates a new time series. Label by endpoint and you get 10-50 series. Label by userId and you get millions. Prometheus and most backends choke on high cardinality. Use Views to drop or aggregate labels. For user-specific data, use traces or logs instead of metrics.

Histograms and Percentiles

Histograms record distributions. They let you calculate p50, p95, p99 latency. OpenTelemetry histograms use buckets. Configure buckets in Views to match your expected latency range.

Histogram Usage
// Record latency for percentile calculation
var stopwatch = Stopwatch.StartNew();
try
{
    await ProcessOrderAsync(order);
    _orderDuration.Record(stopwatch.ElapsedMilliseconds,
        new("endpoint", "/orders"),
        new("status", "success"));
}
catch (Exception ex)
{
    _orderDuration.Record(stopwatch.ElapsedMilliseconds,
        new("endpoint", "/orders"),
        new("status", "error"));
    throw;
}

// Query in Prometheus:
// histogram_quantile(0.95, rate(orders_duration_bucket[5m]))
// → p95 latency over 5 minutes

Logs

Structured logs record discrete events. OpenTelemetry's logging provider injects TraceId and SpanId into every log. This correlates logs with traces. You see log lines inline with spans in Jaeger or Grafana.

Structured Logging with ILogger

Use ILogger with structured parameters. Don't concatenate strings. Log semantic keys like orderId, customerId, errorType. OpenTelemetry exports these as attributes.

Structured Logging
public class OrderService
{
    private readonly ILogger<OrderService> _logger;

    public OrderService(ILogger<OrderService> logger)
    {
        _logger = logger;
    }

    public async Task<Order> CreateOrderAsync(CreateOrderRequest request)
    {
        // Structured log with placeholders (not string interpolation!)
        _logger.LogInformation(
            "Creating order for customer {CustomerId} with total {Total}",
            request.CustomerId, request.Total);

        try
        {
            var order = await SaveOrderAsync(request);

            _logger.LogInformation(
                "Order created successfully: {OrderId}",
                order.Id);

            return order;
        }
        catch (Exception ex)
        {
            _logger.LogError(ex,
                "Failed to create order for customer {CustomerId}",
                request.CustomerId);
            throw;
        }
    }
}

// Logs include TraceId and SpanId automatically:
// {
//   "timestamp": "2026-02-02T10:30:00Z",
//   "level": "Information",
//   "message": "Order created successfully: 123",
//   "OrderId": "123",
//   "TraceId": "4bf92f3577b34da6a3ce929d0e0e4736",
//   "SpanId": "00f067aa0ba902b7"
// }

Log Correlation with Traces

Enable the OpenTelemetry logging provider. It uses log scopes to inject TraceId and SpanId. Backends like Grafana Tempo link logs to traces. Click a span, see its logs. Click a log, jump to its trace.

Log Correlation Setup
// Program.cs: Enable OpenTelemetry logging
builder.Logging.AddOpenTelemetry(logging =>
{
    logging.SetResourceBuilder(resourceBuilder);

    // Include formatted message in logs
    logging.IncludeFormattedMessage = true;

    // Include log scopes (contains TraceId/SpanId)
    logging.IncludeScopes = true;

    logging.AddOtlpExporter(opts =>
    {
        opts.Endpoint = new Uri("http://localhost:4317");
    });
});

// Logs now include:
// - TraceId: correlates with the active trace
// - SpanId: correlates with the active span
// - Scopes: additional context (user, tenant, etc.)

// Example: View correlated logs in Grafana
// 1. Open trace in Grafana Tempo
// 2. Click on a span
// 3. See all logs with matching TraceId/SpanId inline

Correlated logs turn debugging from guesswork into precision. See the trace waterfall, then read the exact logs from that request. No more grepping millions of lines.

Sampling Strategies

Sampling reduces trace volume. Head-based sampling decides at the start of a trace. Tail-based sampling decides after the trace completes. Parent-based sampling ensures child spans follow the parent's decision.

Head-Based Sampling

Sample a percentage of traces. Example: 10% in dev, 1% in production. This cuts costs but might drop important traces (errors, slow requests). Configure sampling in the SDK.

Head-Based Sampling
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        // Option A: sample 10% of traces (pure head-based)
        // .SetSampler(new TraceIdRatioBasedSampler(0.1))

        // Option B (preferred): parent-based ratio sampling.
        // Only the last SetSampler call takes effect, so pick one.
        .SetSampler(new ParentBasedSampler(
            new TraceIdRatioBasedSampler(0.1)))

        .AddAspNetCoreInstrumentation()
        .AddOtlpExporter());

// Production: lower the ratio to reduce costs
// builder.Environment.IsProduction() ? 0.01 : 0.1

Tail-Based Sampling in the Collector

Tail sampling keeps all errors and slow requests, even if head sampling dropped them. Configure this in the OpenTelemetry Collector, not the SDK. The Collector buffers traces, applies policies, then exports.

Collector Tail Sampling
# otel-collector-config.yaml (tail sampling)
processors:
  batch:
    timeout: 10s

  tail_sampling:
    policies:
      # Always sample errors
      - name: errors
        type: status_code
        status_code:
          status_codes: [ERROR]

      # Always sample slow requests (>500ms)
      - name: slow-requests
        type: latency
        latency:
          threshold_ms: 500

      # Sample 5% of everything else
      - name: baseline
        type: probabilistic
        probabilistic:
          sampling_percentage: 5

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling, batch]
      exporters: [otlp/jaeger]
Sampling Tradeoffs

Head sampling is cheap and fast but might drop critical traces. Tail sampling captures errors and outliers but costs more CPU and memory (Collector must buffer traces). Start with head-based sampling at 10% in dev. In production, use 1-5% head sampling + tail sampling for errors/latency. Parent-based sampling ensures distributed traces stay complete.
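
One way to wire those tradeoffs into code is to pick the head-sampling ratio per environment. Here's a minimal sketch; the Otel:SamplingRatio configuration key is an assumption, not part of the tutorial's config.

Environment-Based Sampling (sketch)
using OpenTelemetry.Trace;

// Full sampling outside production; a low, configurable ratio in production
var samplingRatio = builder.Environment.IsProduction()
    ? builder.Configuration.GetValue("Otel:SamplingRatio", 0.05)
    : 1.0;

builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        // Parent-based wrapper keeps distributed traces complete across services
        .SetSampler(new ParentBasedSampler(new TraceIdRatioBasedSampler(samplingRatio)))
        .AddAspNetCoreInstrumentation()
        .AddOtlpExporter());

Tail sampling for errors and slow requests still lives in the Collector config above.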

Propagation

Propagation carries trace context across process boundaries. HTTP calls use W3C TraceContext headers. Message queues embed context in message metadata. This keeps distributed traces connected.

W3C TraceContext over HTTP

ASP.NET Core and HttpClient propagate context automatically using traceparent and tracestate headers. You don't write code. Just enable instrumentation and it works.

HTTP Propagation
// Service A: Makes HTTP call to Service B
app.MapGet("/orders/{id}", async (Guid id, HttpClient http) =>
{
    // Current trace context automatically propagated via headers:
    // traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
    var response = await http.GetAsync($"https://httpbin.org/delay/1");

    // Service B receives headers and continues the trace
    return Results.Ok(await response.Content.ReadAsStringAsync());
});

// No manual header management needed!
// OpenTelemetry.Instrumentation.Http handles:
// - Extracting context from incoming requests
// - Injecting context into outgoing requests
// - Creating child spans for HttpClient calls

Propagation in Message Queues

Message queues don't have HTTP headers. Store trace context in message properties or metadata. Extract it when consuming. Use ActivityLinks to connect async work back to the original request.

Queue Propagation
using System.Diagnostics;

// Producer: Serialize trace context into message metadata
public class OrderMessage
{
    public Guid OrderId { get; set; }
    public string TraceParent { get; set; } = string.Empty;
    public string? TraceState { get; set; }
}

app.MapPost("/orders", async (CreateOrderRequest req) =>
{
    var order = await orderService.CreateOrderAsync(req);

    // Capture current trace context
    var activity = Activity.Current;
    var message = new OrderMessage
    {
        OrderId = order.Id,
        TraceParent = activity?.Id ?? string.Empty,
        TraceState = activity?.TraceStateString
    };

    // Enqueue with context
    await backgroundQueue.EnqueueAsync(message);
    return Results.Created($"/orders/{order.Id}", order);
});

// Consumer: Extract context and create linked span
public async Task ProcessMessageAsync(OrderMessage message)
{
    // Parse trace context from message
    ActivityContext parentContext = default;
    if (ActivityContext.TryParse(message.TraceParent, message.TraceState, out var ctx))
    {
        parentContext = ctx;
    }

    // Create span linked to original request
    var links = parentContext != default
        ? new[] { new ActivityLink(parentContext) }
        : null;

    using var activity = ActivitySource.StartActivity(
        "ProcessOrder",
        ActivityKind.Consumer,
        parentContext: default,
        links: links);

    // Process order...
    await ProcessOrderAsync(message.OrderId);
}

Links show the relationship without making the consumer a child span. This avoids inflating the original trace duration with async work that happens later.

Visualizing

Visualization turns raw telemetry into insights. Jaeger shows trace waterfalls. Prometheus graphs metric trends. Grafana combines traces, metrics, and logs in one dashboard.

Jaeger Trace Waterfall

Open Jaeger at http://localhost:16686. Search for traces by service, operation, or tags. Click a trace to see the waterfall. Each span shows duration, tags, and events. Red spans indicate errors.

Example: Search for service=OrdersApi operation=POST /orders. See the HTTP request span, child DB query spans, and external HTTP call spans. Click a span to view attributes like order.id, http.status_code, db.statement.

Prometheus Metrics Queries

Open Prometheus at http://localhost:9090. Query metrics with PromQL. Example: rate(orders_created[5m]) shows orders per second. histogram_quantile(0.95, rate(orders_duration_bucket[5m])) calculates p95 latency.

prometheus.yml (Scrape Config)
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'otel-collector'
    static_configs:
      - targets: ['otel-collector:8889']

  # Optional: scrape app metrics directly (if using Prometheus exporter)
  - job_name: 'orders-api'
    static_configs:
      - targets: ['host.docker.internal:5000']

Grafana Dashboards

Open Grafana at http://localhost:3000 (admin/admin). Add Prometheus as a data source. Add Jaeger or Tempo for traces. Create a dashboard with RED metrics: request rate, error rate, latency percentiles. Add trace links to jump from metrics to traces.

Example panel: Graph histogram_quantile(0.95, rate(http_server_request_duration_bucket[5m])) to show p95 latency over time. When latency spikes, click the timestamp and view traces from that period.

Health, Alerts & SLOs

SLOs define acceptable service levels. Alerts notify when you risk breaking SLOs. Health checks export readiness and liveness as metrics. This completes the observability loop: measure, alert, respond.

Health Checks as Metrics

ASP.NET Core health checks report service status. Export them as OpenTelemetry metrics. Prometheus scrapes them. Grafana alerts when health degrades.

Health Check Metrics
// Add health checks
builder.Services.AddHealthChecks()
    .AddDbContextCheck<OrderDbContext>("database")   // Microsoft.Extensions.Diagnostics.HealthChecks.EntityFrameworkCore
    .AddUrlGroup(new Uri("https://httpbin.org/status/200"), "external-api"); // AspNetCore.HealthChecks.Uris

// Export health as metrics (resolve HealthCheckService from DI after Build(),
// and register the meter via .AddMeter("DotNetGuide.Health") so it gets exported)
var healthCheckService = app.Services.GetRequiredService<HealthCheckService>();
var healthMeter = new Meter("DotNetGuide.Health");
var healthGauge = healthMeter.CreateObservableGauge(
    "service.health.status",
    () =>
    {
        var healthReport = healthCheckService.CheckHealthAsync().Result;
        return healthReport.Status switch
        {
            HealthStatus.Healthy => 1.0,
            HealthStatus.Degraded => 0.5,
            HealthStatus.Unhealthy => 0.0,
            _ => 0.0
        };
    });

// Map health endpoints
app.MapHealthChecks("/health/ready");
app.MapHealthChecks("/health/live");

Burn-Rate Alerts for SLOs

SLOs define error budgets. Example: 99.9% availability allows 43 minutes downtime per month. Burn-rate alerts warn when you're spending budget too fast. Set fast alerts (5m window) and slow alerts (1h window) to catch both spikes and trends.

Prometheus Alert Rules
# prometheus-alerts.yml
groups:
  - name: slo-alerts
    interval: 30s
    rules:
      # Fast burn: error rate > 5% over 5 minutes
      - alert: HighErrorRateFast
        expr: |
          rate(http_server_request_errors_total[5m])
          / rate(http_server_request_total[5m]) > 0.05
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected (fast burn)"
          description: "Error rate {{ $value | humanizePercentage }} over 5m"

      # Slow burn: error rate > 1% over 1 hour
      - alert: HighErrorRateSlow
        expr: |
          rate(http_server_request_errors_total[1h])
          / rate(http_server_request_total[1h]) > 0.01
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Elevated error rate (slow burn)"
          description: "Error rate {{ $value | humanizePercentage }} over 1h"

      # Latency SLO breach: p95 > 300ms
      - alert: LatencySLOBreach
        expr: |
          histogram_quantile(0.95,
            rate(http_server_request_duration_bucket[5m])) > 300
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "p95 latency exceeds SLO (300ms)"
          description: "p95 latency: {{ $value }}ms"

Configure alerts in Prometheus or Grafana. Send notifications to Slack, PagerDuty, or email. Track error budget burn in dashboards. Adjust SLOs based on real usage patterns.

Production Hardening

Production observability needs PII scrubbing, cardinality budgets, retention policies, and cost controls. Don't ship everything. Sample intelligently. Redact sensitive data. Compress and archive old traces.

PII Scrubbing

Never log passwords, credit cards, or PII. Use processors to redact sensitive attributes. Configure scrubbing in the Collector or SDK.

PII Redaction
// Redact sensitive attributes in the SDK
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation(opts =>
        {
            opts.EnrichWithHttpRequest = (activity, request) =>
            {
                // Remove sensitive headers
                if (request.Headers.ContainsKey("Authorization"))
                {
                    activity.SetTag("http.request.header.authorization", "[REDACTED]");
                }
            };
        })
        .AddProcessor(new PiiRedactionProcessor()));

// Custom processor to scrub PII
public class PiiRedactionProcessor : BaseProcessor<Activity>
{
    private readonly string[] _sensitiveKeys = { "password", "ssn", "credit_card" };

    public override void OnEnd(Activity activity)
    {
        // Snapshot the tags first: SetTag mutates the collection being enumerated
        foreach (var tag in activity.Tags.ToList())
        {
            if (_sensitiveKeys.Any(k => tag.Key.Contains(k, StringComparison.OrdinalIgnoreCase)))
            {
                activity.SetTag(tag.Key, "[REDACTED]");
            }
        }
    }
}

Cardinality Budgets

Enforce cardinality limits. Drop or aggregate high-cardinality labels. Example: Allow endpoint, method, status_code. Drop userId, customerId, requestId.

Cardinality Control
// Define allowed metric labels
builder.Services.AddOpenTelemetry()
    .WithMetrics(metrics => metrics
        .AddView(
            instrumentName: "http.server.request.duration",
            new MetricStreamConfiguration
            {
                // Only keep low-cardinality labels
                TagKeys = new[] { "http.request.method", "http.route", "http.response.status_code" }
            }));

// Collector processor for cardinality limiting (otel-collector-config.yaml)
// processors:
//   attributes:
//     actions:
//       - key: user.id
//         action: delete
//       - key: request.id
//         action: delete

Retention and Cost Control

Traces and logs grow fast. Set retention policies. Compress old data. Archive to S3 or blob storage. Downsample metrics after 30 days. Drop noisy logs (keep only warnings and errors in production).

Cost Controls

Observability gets expensive at scale. Sample more aggressively in production (1-5% head + tail for errors). Drop debug/info logs; keep warn/error. Compress traces before archiving. Set retention: 7 days hot, 30 days warm, 90 days archived. Monitor your monitoring costs—don't let telemetry cost more than the infrastructure it monitors.
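
Log volume is the easiest lever to pull from the SDK side. Here's a minimal sketch of dropping low-severity logs in production; the category prefix is an assumption, and retention or downsampling stays in your backend configuration.

Production Log Filtering (sketch)
// In Program.cs, before builder.Build()
if (builder.Environment.IsProduction())
{
    // Default: keep only warnings and errors
    builder.Logging.SetMinimumLevel(LogLevel.Warning);

    // Optionally allow Information from your own namespaces (prefix is illustrative)
    builder.Logging.AddFilter("OrdersApi", LogLevel.Information);
}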

End-to-End Demo

This complete example wires everything together: tracing, metrics, logs, EF Core, HttpClient, and a background queue. You'll see a single request flow through the API, database, external call, and async processing—all visible in one trace.

EF Core Model and DbContext

Models.cs
// Order entity
public class Order
{
    public Guid Id { get; set; }
    public string CustomerId { get; set; } = string.Empty;
    public decimal Total { get; set; }
    public DateTime CreatedAt { get; set; }
    public string Status { get; set; } = "Pending";
}

// DbContext
public class OrderDbContext : DbContext
{
    public OrderDbContext(DbContextOptions<OrderDbContext> options)
        : base(options) { }

    public DbSet<Order> Orders => Set<Order>();
}

// Request DTO
public record CreateOrderRequest(string CustomerId, decimal Total);

Complete Program.cs with Full Instrumentation

Program.cs (Complete)
using System.Diagnostics;
using System.Diagnostics.Metrics;
using System.Threading.Channels;
using Microsoft.EntityFrameworkCore;
using OpenTelemetry.Logs;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);

// Resource (shared across signals)
var resourceBuilder = ResourceBuilder.CreateDefault()
    .AddService("OrdersApi", serviceVersion: "1.0.0")
    .AddAttributes(new Dictionary<string, object>
    {
        ["deployment.environment"] = builder.Environment.EnvironmentName
    });

// OpenTelemetry: Traces, Metrics, Logs
builder.Services.AddOpenTelemetry()
    .ConfigureResource(resource => resource
        .AddService("OrdersApi", serviceVersion: "1.0.0")
        .AddAttributes(new Dictionary<string, object>
        {
            ["deployment.environment"] = builder.Environment.EnvironmentName
        }))
    .WithTracing(tracing => tracing
        .AddSource("DotNetGuide.Orders")
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddEntityFrameworkCoreInstrumentation()
        .AddOtlpExporter(opts => opts.Endpoint = new Uri("http://localhost:4317")))
    .WithMetrics(metrics => metrics
        .AddMeter("DotNetGuide.Orders")
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddOtlpExporter(opts => opts.Endpoint = new Uri("http://localhost:4317")));

builder.Logging.AddOpenTelemetry(logging =>
{
    logging.SetResourceBuilder(resourceBuilder);
    logging.IncludeFormattedMessage = true;
    logging.IncludeScopes = true;
    logging.AddOtlpExporter(opts => opts.Endpoint = new Uri("http://localhost:4317"));
});

// EF Core SQLite
builder.Services.AddDbContext<OrderDbContext>(opts =>
    opts.UseSqlite("Data Source=orders.db"));

// HttpClient for external API
builder.Services.AddHttpClient("external", client =>
{
    // Use httpbin.org by default; override with config
    client.BaseAddress = new Uri(
        builder.Configuration["ExternalApi:BaseUrl"] ?? "https://httpbin.org");
});

// Background queue
builder.Services.AddSingleton(Channel.CreateUnbounded<Guid>());
builder.Services.AddHostedService<OrderProcessorService>();

var app = builder.Build();

// Ensure DB created
using (var scope = app.Services.CreateScope())
{
    var db = scope.ServiceProvider.GetRequiredService<OrderDbContext>();
    db.Database.EnsureCreated();
}

// ActivitySource and Meter
var activitySource = new ActivitySource("DotNetGuide.Orders", "1.0.0");
var meter = new Meter("DotNetGuide.Orders", "1.0.0");
var orderCounter = meter.CreateCounter<long>("orders.created");
var orderDuration = meter.CreateHistogram<double>("orders.duration", unit: "ms");

app.MapPost("/orders", async (
    CreateOrderRequest req,
    OrderDbContext db,
    IHttpClientFactory httpFactory,
    Channel<Guid> queue,
    ILogger<Program> logger) =>
{
    using var activity = activitySource.StartActivity("POST /orders");
    activity?.SetTag("order.customer_id", req.CustomerId);

    var stopwatch = Stopwatch.StartNew();

    try
    {
        logger.LogInformation("Creating order for {CustomerId}", req.CustomerId);

        // Save to DB
        var order = new Order
        {
            Id = Guid.NewGuid(),
            CustomerId = req.CustomerId,
            Total = req.Total,
            CreatedAt = DateTime.UtcNow
        };
        db.Orders.Add(order);
        await db.SaveChangesAsync();

        activity?.AddEvent(new ActivityEvent("Order saved to database"));

        // Call external API (httpbin.org/delay simulates latency)
        var http = httpFactory.CreateClient("external");
        var response = await http.GetAsync("/delay/1");
        activity?.AddEvent(new ActivityEvent("External API call completed"));

        // Enqueue background work
        await queue.Writer.WriteAsync(order.Id);
        activity?.AddEvent(new ActivityEvent("Background job enqueued"));

        logger.LogInformation("Order {OrderId} created successfully", order.Id);

        orderCounter.Add(1, new("endpoint", "/orders"), new("status", "success"));
        orderDuration.Record(stopwatch.ElapsedMilliseconds, new("endpoint", "/orders"));

        return Results.Created($"/orders/{order.Id}", order);
    }
    catch (Exception ex)
    {
        logger.LogError(ex, "Failed to create order for {CustomerId}", req.CustomerId);
        activity?.RecordException(ex);
        activity?.SetStatus(ActivityStatusCode.Error, ex.Message);

        orderCounter.Add(1, new("endpoint", "/orders"), new("status", "error"));
        orderDuration.Record(stopwatch.ElapsedMilliseconds, new("endpoint", "/orders"));

        throw;
    }
});

app.MapGet("/orders/{id:guid}", async (Guid id, OrderDbContext db) =>
{
    var order = await db.Orders.FindAsync(id);
    return order is not null ? Results.Ok(order) : Results.NotFound();
});

app.Run();

// Background worker
public class OrderProcessorService : BackgroundService
{
    private readonly Channel<Guid> _queue;
    private readonly IServiceProvider _services;
    private static readonly ActivitySource ActivitySource = new("DotNetGuide.Orders");

    public OrderProcessorService(Channel<Guid> queue, IServiceProvider services)
    {
        _queue = queue;
        _services = services;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        await foreach (var orderId in _queue.Reader.ReadAllAsync(stoppingToken))
        {
            using var activity = ActivitySource.StartActivity(
                "ProcessOrder",
                ActivityKind.Consumer);
            activity?.SetTag("order.id", orderId);

            using var scope = _services.CreateScope();
            var db = scope.ServiceProvider.GetRequiredService<OrderDbContext>();
            var logger = scope.ServiceProvider.GetRequiredService<ILogger<OrderProcessorService>>();

            logger.LogInformation("Processing order {OrderId}", orderId);

            var order = await db.Orders.FindAsync(orderId);
            if (order is not null)
            {
                // Simulate validation + inventory check
                await Task.Delay(200);
                activity?.AddEvent(new ActivityEvent("Inventory validated"));

                order.Status = "Confirmed";
                await db.SaveChangesAsync();

                logger.LogInformation("Order {OrderId} confirmed", orderId);
            }
        }
    }
}

Run the Demo

Terminal
# Start Collector, Jaeger, Prometheus
docker-compose up -d

# Run the API
dotnet run

# Create an order
curl -X POST http://localhost:5000/orders \
  -H "Content-Type: application/json" \
  -d '{"customerId":"user123","total":99.99}'

# View in Jaeger: http://localhost:16686
# - Search for service "OrdersApi"
# - See trace waterfall: HTTP → DB → External API → Background job
# - Click spans to view attributes, events, logs

# Query metrics in Prometheus: http://localhost:9090
# - rate(orders_created_total[5m])
# - histogram_quantile(0.95, rate(orders_duration_bucket[5m]))

You'll see one trace spanning the HTTP request, database save, external API call (httpbin.org), and background processing. Logs from each step appear inline with the trace in Grafana. Metrics show request rate and latency distribution.

Frequently Asked Questions

Do I need all three signals (traces, metrics, logs)?

Yes. Each signal answers different questions. Traces show request flow across services. Metrics reveal patterns and aggregates. Logs capture granular events. Start with traces and RED metrics (Rate, Errors, Duration). Add correlated logs once tracing works. All three together give complete visibility.

How much should I sample in production?

Start with 10% head-based sampling in dev. In production, use 1-5% head sampling plus tail sampling in the Collector to keep all errors and slow requests. Parent-based sampling ensures child spans follow the parent decision. Tail sampling costs more CPU but captures important traces you'd miss with pure head sampling.

Where should I put custom attributes?

Add attributes to the span closest to the business action. Use semantic conventions (http.*, db.*, messaging.*) when they exist. For custom attributes, keep the keys stable and the key set small. High-cardinality values like userId or requestId are fine as span attributes or event tags, but keep them out of span names and metric labels.

Which exporter should I use?

Use OTLP (gRPC or HTTP) to export to the OpenTelemetry Collector. Let the Collector handle fan-out to Jaeger, Prometheus, Grafana, or cloud backends. This decouples your app from vendor-specific formats and lets you change backends without redeploying. Console exporter works for local dev only.

How do I correlate logs with traces?

Enable the OpenTelemetry logging provider. It automatically injects TraceId and SpanId into log scopes. Use structured logging with ILogger. When you view a trace in Jaeger or Grafana, logs with matching TraceId appear inline. This connects what happened (logs) with where it happened (spans).
