.NET 8 BackgroundService: Cancellation Tokens, Retry Policies & Graceful Shutdown Patterns

The Survival Kit for BackgroundService in Production

The BackgroundService base class makes starting a long-running background task in .NET trivially easy: inherit, override ExecuteAsync, register with AddHostedService. What it does not make obvious is everything that can go wrong between that first working implementation and a service that handles restarts, transient failures, and shutdown signals correctly in production.

A BackgroundService that ignores the cancellation token blocks graceful shutdown and gets force-killed by Kubernetes, corrupting in-flight work. One that swallows exceptions and retries without backoff hammers a failing dependency until the circuit breaks everywhere. One that uses async void internally crashes the process with no stack trace. One that drains its queue indefinitely on shutdown holds up a deployment for minutes. None of these are exotic failure modes — they are the predictable collision points between a minimal BackgroundService implementation and what production actually demands. This article gives you the patterns that survive all four.

Cancellation Token Propagation: The One Rule That Cannot Be Broken

The stoppingToken passed to ExecuteAsync is the host's mechanism for telling your service to stop. It is triggered by application shutdown, a SIGTERM from the OS or container orchestrator, or an explicit IHostApplicationLifetime.StopApplication() call. Every async operation inside ExecuteAsync must receive this token. Every one. A single await SomeOperation() without a cancellation token is a potential shutdown blocker — if that operation is in progress when the host signals stop, it will run to completion (or timeout) before ExecuteAsync can return, holding up the entire shutdown sequence.

CancellationPropagation.cs — Correct vs Incorrect Token Propagation Patterns
// ════════════════════════════════════════════════════════════════════════════
// WRONG: stoppingToken not propagated — shutdown blocks until operation completes
// ════════════════════════════════════════════════════════════════════════════

public class NaiveWorker : BackgroundService
{
    private readonly IMessageQueue _queue;
    private readonly ILogger<NaiveWorker> _logger;

    public NaiveWorker(IMessageQueue queue, ILogger<NaiveWorker> logger)
        => (_queue, _logger) = (queue, logger);

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            // WRONG: no cancellation token — blocks shutdown if queue is empty
            // and the dequeue call has a long internal timeout
            var message = await _queue.DequeueAsync();

            // WRONG: no cancellation token — blocks shutdown for the full
            // duration of message processing regardless of shutdown signal
            await ProcessMessageAsync(message);

            // WRONG: no cancellation token — delay runs to completion before
            // the loop condition is checked again
            await Task.Delay(TimeSpan.FromSeconds(5));
        }
    }

    private async Task ProcessMessageAsync(Message message) { /* ... */ }
}


// ════════════════════════════════════════════════════════════════════════════
// RIGHT: stoppingToken propagated to every async call
// ════════════════════════════════════════════════════════════════════════════

public class CorrectWorker : BackgroundService
{
    private readonly IMessageQueue _queue;
    private readonly ILogger<CorrectWorker> _logger;

    public CorrectWorker(IMessageQueue queue, ILogger<CorrectWorker> logger)
        => (_queue, _logger) = (queue, logger);

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        _logger.LogInformation("Worker started.");

        // OperationCanceledException is the normal exit path when stoppingToken
        // fires. Catch it here, log the shutdown, and return cleanly.
        // Do NOT catch it inside the loop — let it propagate to ExecuteAsync.
        try
        {
            while (!stoppingToken.IsCancellationRequested)
            {
                // stoppingToken cancels the dequeue wait immediately on shutdown
                var message = await _queue.DequeueAsync(stoppingToken);

                // stoppingToken propagated — processing is interrupted on shutdown
                await ProcessMessageAsync(message, stoppingToken);
            }
        }
        catch (OperationCanceledException)
        {
            // Normal shutdown path — stoppingToken was cancelled.
            // Log and return. Do not re-throw — the host already knows it is stopping.
            _logger.LogInformation("Worker stopping — cancellation requested.");
        }

        _logger.LogInformation("Worker stopped.");
    }

    private async Task ProcessMessageAsync(Message message, CancellationToken ct)
    {
        // Pass ct to every async call within processing.
        // (_dbContext, _httpClient and content are illustrative stand-ins
        //  for whatever dependencies your handler actually uses.)
        await _dbContext.SaveChangesAsync(ct);
        await _httpClient.PostAsync("/api/notify", content, ct);
        // ... etc.
    }
}


// ════════════════════════════════════════════════════════════════════════════
// PER-OPERATION TIMEOUT WITH LINKED TOKEN SOURCE
// ════════════════════════════════════════════════════════════════════════════
// When you need a per-operation timeout AND host shutdown cancellation,
// link both into a single token. The operation is cancelled by whichever
// fires first — the timeout or the host shutdown.

protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
    try
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            var message = await _queue.DequeueAsync(stoppingToken);

            // Per-operation timeout: 30 seconds or host shutdown, whichever is first.
            // CancelAfter on the linked source avoids leaking an undisposed
            // CancellationTokenSource for the timeout.
            using var operationCts =
                CancellationTokenSource.CreateLinkedTokenSource(stoppingToken);
            operationCts.CancelAfter(TimeSpan.FromSeconds(30));

            try
            {
                await ProcessMessageAsync(message, operationCts.Token);
            }
            catch (OperationCanceledException) when (!stoppingToken.IsCancellationRequested)
            {
                // The operation-level timeout fired, not the host shutdown.
                // Log it, move to the next message.
                _logger.LogWarning("Message {Id} processing timed out.", message.Id);
            }
            // If stoppingToken fired, OperationCanceledException propagates
            // to the outer try/catch and exits the loop cleanly.
        }
    }
    catch (OperationCanceledException)
    {
        _logger.LogInformation("Worker stopping — cancellation requested.");
    }
}


// ════════════════════════════════════════════════════════════════════════════
// async void: THE PATTERN THAT KILLS YOUR SERVICE SILENTLY
// ════════════════════════════════════════════════════════════════════════════

// WRONG: async void — exceptions escape to the thread pool and crash the process
// or are silently swallowed depending on runtime version
private async void FireAndForgetWork() // ← never do this in a BackgroundService
{
    await Task.Delay(1000);
    throw new InvalidOperationException("This kills the process with no useful stack trace.");
}

// RIGHT: async Task — exceptions are observable, awaitable, and loggable
private async Task DoWorkAsync(CancellationToken ct)
{
    await Task.Delay(1000, ct);
    // If this throws, the exception propagates to the awaiting caller in ExecuteAsync
    // where it can be caught, logged, and handled correctly.
}
// In ExecuteAsync: await DoWorkAsync(stoppingToken);  — observable, correct.

The when (!stoppingToken.IsCancellationRequested) filter on the inner OperationCanceledException catch is the detail that makes per-operation timeouts composable with host shutdown. Without it, catching OperationCanceledException inside the loop swallows the host shutdown signal — the outer try/catch never fires, the loop continues, and your service ignores the shutdown entirely. The filter distinguishes between "the operation timed out" (handle it, continue the loop) and "the host is shutting down" (let it propagate, exit the loop). This pattern appears whenever you combine two cancellation dimensions and is worth internalising as a standard.
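
One way to internalise it is to wrap the linked-token-plus-filter pattern in a small reusable helper. The sketch below is illustrative (the class and method names are not from any library): it runs an operation under a per-operation timeout linked to the host's stopping token, returns false on timeout, and lets only the host-shutdown cancellation propagate.

```csharp
// Sketch of a reusable helper (names illustrative): per-operation timeout
// linked to the host stopping token, with "timeout" and "shutdown" kept distinct.
public static class CancellationHelpers
{
    /// <returns>true if the operation completed; false if the per-operation
    /// timeout fired. Throws OperationCanceledException only on host shutdown.</returns>
    public static async Task<bool> TryRunWithTimeoutAsync(
        Func<CancellationToken, Task> operation,
        TimeSpan timeout,
        CancellationToken stoppingToken)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(stoppingToken);
        cts.CancelAfter(timeout);

        try
        {
            await operation(cts.Token);
            return true;
        }
        catch (OperationCanceledException) when (!stoppingToken.IsCancellationRequested)
        {
            return false;   // per-operation timeout — caller decides what to do next
        }
        // If stoppingToken fired, the exception propagates: shutdown wins.
    }
}
```

With a helper like this, the inner try/catch in the loop collapses to a single boolean check, and the filter logic lives in exactly one place.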

Retry Policies: Backoff That Does Not Block Shutdown

A BackgroundService that processes messages from a queue or calls an external dependency will encounter transient failures. The correct response to a transient failure is a retry with exponential backoff — not an immediate retry that hammers a failing dependency, and not a bare catch that swallows the exception and moves on. The critical constraint is that every delay in the retry cycle must be cancellable by the stoppingToken. A retry delay of 64 seconds that ignores the stopping token means a shutdown during that delay waits the full 64 seconds before the host can proceed.

RetryPolicy.cs — Manual Exponential Backoff & Polly v8 ResiliencePipeline
// ════════════════════════════════════════════════════════════════════════════
// OPTION A: Manual exponential backoff respecting stoppingToken
// ════════════════════════════════════════════════════════════════════════════

public class RetryWorker : BackgroundService
{
    private const int MaxRetryAttempts = 5;
    private readonly ILogger<RetryWorker> _logger;
    private readonly IExternalService _service;
    private readonly IMessageQueue _queue;
    private readonly IMessageQueue _deadLetterQueue;

    public RetryWorker(
        IExternalService service,
        IMessageQueue queue,
        IMessageQueue deadLetterQueue,
        ILogger<RetryWorker> logger)
        => (_service, _queue, _deadLetterQueue, _logger)
            = (service, queue, deadLetterQueue, logger);

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        try
        {
            while (!stoppingToken.IsCancellationRequested)
            {
                var message = await _queue.DequeueAsync(stoppingToken);
                await ProcessWithRetryAsync(message, stoppingToken);
            }
        }
        catch (OperationCanceledException)
        {
            _logger.LogInformation("Worker stopping.");
        }
    }

    private async Task ProcessWithRetryAsync(Message message, CancellationToken ct)
    {
        var attempt = 0;

        while (true)
        {
            try
            {
                await _service.ProcessAsync(message, ct);
                return;   // success — exit retry loop
            }
            catch (OperationCanceledException)
            {
                throw;   // host shutdown — do not retry, propagate immediately
            }
            catch (Exception ex) when (attempt < MaxRetryAttempts)
            {
                attempt++;

                // Exponential backoff: 1s, 2s, 4s, 8s, 16s — capped at 60s
                var delay = TimeSpan.FromSeconds(Math.Min(Math.Pow(2, attempt - 1), 60));

                _logger.LogWarning(ex,
                    "Processing attempt {Attempt}/{Max} failed. Retrying in {Delay}s.",
                    attempt, MaxRetryAttempts, delay.TotalSeconds);

                // CRITICAL: pass ct to Task.Delay — shutdown cancels the wait immediately
                // Task.Delay(delay) without ct blocks shutdown for the full delay duration
                await Task.Delay(delay, ct);
            }
            catch (Exception ex)
            {
                // Max retries exceeded — log and move on (or dead-letter the message)
                _logger.LogError(ex,
                    "Message {Id} failed after {Max} attempts. Moving to dead-letter.",
                    message.Id, MaxRetryAttempts);

                await _deadLetterQueue.EnqueueAsync(message, ct);
                return;
            }
        }
    }
}


// ════════════════════════════════════════════════════════════════════════════
// OPTION B: Polly v8 ResiliencePipeline — declarative retry with backoff
// dotnet add package Polly
// dotnet add package Microsoft.Extensions.Http.Resilience (for HTTP clients)
// ════════════════════════════════════════════════════════════════════════════

// Register the pipeline in DI (Program.cs). The two-argument configure overload
// exposes the service provider so the OnRetry callback can resolve a logger.
builder.Services.AddResiliencePipeline("worker-retry", (pipelineBuilder, context) =>
{
    var logger = context.ServiceProvider.GetRequiredService<ILogger<PollyWorker>>();
    pipelineBuilder.AddRetry(new RetryStrategyOptions
    {
        MaxRetryAttempts   = 5,
        BackoffType        = DelayBackoffType.Exponential,
        Delay              = TimeSpan.FromSeconds(1),
        MaxDelay           = TimeSpan.FromSeconds(60),
        UseJitter          = true,   // randomises each delay — prevents synchronised retry storms
        ShouldHandle       = new PredicateBuilder()
            .Handle<HttpRequestException>()   // illustrative transient exception types —
            .Handle<TimeoutException>(),      // substitute the ones your dependency throws
        OnRetry = args =>
        {
            logger.LogWarning(
                "Retry {Attempt} after {Delay}ms due to: {Exception}",
                args.AttemptNumber,
                args.RetryDelay.TotalMilliseconds,
                args.Outcome.Exception?.Message);
            return ValueTask.CompletedTask;
        }
    });

    // Add timeout per attempt — combined with retry gives bounded total time
    pipelineBuilder.AddTimeout(TimeSpan.FromSeconds(30));
});

// Use in BackgroundService:
public class PollyWorker : BackgroundService
{
    private readonly ResiliencePipeline _pipeline;
    private readonly IMessageQueue _queue;
    private readonly IExternalService _service;
    private readonly ILogger<PollyWorker> _logger;

    public PollyWorker(
        ResiliencePipelineProvider<string> pipelineProvider,
        IMessageQueue queue,
        IExternalService service,
        ILogger<PollyWorker> logger)
    {
        _pipeline = pipelineProvider.GetPipeline("worker-retry");
        _queue    = queue;
        _service  = service;
        _logger   = logger;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        try
        {
            while (!stoppingToken.IsCancellationRequested)
            {
                var message = await _queue.DequeueAsync(stoppingToken);

                // stoppingToken passed to ExecuteAsync — Polly aborts retry cycle on shutdown
                await _pipeline.ExecuteAsync(
                    async ct => await _service.ProcessAsync(message, ct),
                    stoppingToken);
            }
        }
        catch (OperationCanceledException)
        {
            _logger.LogInformation("Worker stopping.");
        }
    }
}

// ── Jitter note ────────────────────────────────────────────────────────────
// UseJitter = true is not optional when you have multiple worker instances.
// Without jitter, all instances fail at the same time and retry at the same
// time — creating synchronised retry waves that overwhelm the recovering
// dependency. Jitter distributes retry attempts across time, giving the
// dependency a chance to recover without a thundering herd.

The choice between manual retry and Polly is a maintenance question, not a correctness question — both approaches work when implemented correctly. Polly's advantage is that the retry semantics are declared in one place and are reusable across multiple services registered against the same pipeline name. The manual approach's advantage is zero additional dependencies and explicit control over every decision point. For teams already using Polly for HTTP resilience, using ResiliencePipeline in background services keeps resilience policy consistent across the codebase. For teams without Polly, the manual pattern is correct and readable without adding a dependency.
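
If you keep the manual pattern but run multiple worker instances, jitter is still worth adding by hand. A minimal sketch of the delay calculation (the method name and defaults are illustrative; Random.Shared requires .NET 6 or later):

```csharp
// Exponential backoff with jitter, manual version — tune base and cap
// to your dependency's recovery characteristics.
public static class RetryDelays
{
    public static TimeSpan BackoffWithJitter(
        int attempt, double baseSeconds = 1, double capSeconds = 60)
    {
        // 1s, 2s, 4s, ... capped — then randomised by ±25% so that concurrent
        // workers do not retry in lockstep against a recovering dependency.
        var exponential = Math.Min(baseSeconds * Math.Pow(2, attempt - 1), capSeconds);
        var jitterFactor = 0.75 + Random.Shared.NextDouble() * 0.5;   // 0.75 .. 1.25
        return TimeSpan.FromSeconds(Math.Min(exponential * jitterFactor, capSeconds));
    }
}
```

Substituting this for the fixed `Math.Pow` delay in ProcessWithRetryAsync gives the manual pattern the same storm-avoidance property that UseJitter gives the Polly pipeline.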

Graceful Shutdown: Drain the Queue or Stop Immediately

When the host signals shutdown, your BackgroundService faces a decision about work already in progress. Stop immediately means cancel all in-flight processing, return from ExecuteAsync, and let the queue or caller retry. Drain means stop accepting new work but complete everything already dequeued before returning. The right answer depends on whether your work items are idempotent, whether your queue provides at-least-once delivery, and how expensive duplicate processing is relative to the cost of a longer shutdown. Neither is universally correct — the decision belongs to the service's domain, not the framework.

GracefulShutdown.cs — Stop-Immediately & Drain Patterns With Shutdown Timeout
// ════════════════════════════════════════════════════════════════════════════
// HOST SHUTDOWN TIMEOUT CONFIGURATION
// Default: 5 seconds. Extend when your work items need more time to drain.
// Must be shorter than Kubernetes terminationGracePeriodSeconds (default: 30s).
// ════════════════════════════════════════════════════════════════════════════

builder.Services.Configure<HostOptions>(options =>
{
    // Allow 25 seconds for all hosted services to stop.
    // Kubernetes terminationGracePeriodSeconds should be set to 30 or higher.
    options.ShutdownTimeout = TimeSpan.FromSeconds(25);
});


// ════════════════════════════════════════════════════════════════════════════
// PATTERN A: Stop immediately — correct for idempotent, at-least-once queues
// ════════════════════════════════════════════════════════════════════════════
// When your queue (SQS, Service Bus, RabbitMQ) provides at-least-once delivery
// and your message handler is idempotent, the simplest shutdown strategy is
// correct: let OperationCanceledException propagate, return from ExecuteAsync.
// The message was not acknowledged — the queue redelivers it to another instance.

public class StopImmediatelyWorker : BackgroundService
{
    private readonly IMessageQueue _queue;
    private readonly ILogger<StopImmediatelyWorker> _logger;

    public StopImmediatelyWorker(IMessageQueue queue, ILogger<StopImmediatelyWorker> logger)
        => (_queue, _logger) = (queue, logger);
    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        try
        {
            while (!stoppingToken.IsCancellationRequested)
            {
                var message = await _queue.DequeueAsync(stoppingToken);
                await ProcessAsync(message, stoppingToken);
                await _queue.AcknowledgeAsync(message, stoppingToken);
                // If stoppingToken fires before Acknowledge, the message is
                // redelivered by the queue broker — correct behaviour.
            }
        }
        catch (OperationCanceledException)
        {
            _logger.LogInformation("Worker stopped. In-flight message will be redelivered.");
        }
    }
}


// ════════════════════════════════════════════════════════════════════════════
// PATTERN B: Drain in-flight work — correct for non-idempotent or costly work
// ════════════════════════════════════════════════════════════════════════════
// Use when: messages have external side effects (payment, email, third-party API)
// or when deduplication is expensive and redelivery is undesirable.

public class DrainingWorker : BackgroundService
{
    private readonly ILogger<DrainingWorker> _logger;
    private readonly IMessageQueue _queue;
    private readonly IMessageProcessor _processor;

    // Maximum time to spend draining in-flight work after shutdown signal.
    // Must fit within the host ShutdownTimeout configured above.
    private static readonly TimeSpan DrainTimeout = TimeSpan.FromSeconds(20);

    public DrainingWorker(
        IMessageQueue queue, IMessageProcessor processor, ILogger<DrainingWorker> logger)
        => (_queue, _processor, _logger) = (queue, processor, logger);

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        // Track all in-flight processing tasks so we can await them on shutdown
        var inFlight = new ConcurrentBag<Task>();

        try
        {
            while (!stoppingToken.IsCancellationRequested)
            {
                var message = await _queue.DequeueAsync(stoppingToken);

                // Start processing — do NOT await here, so the dequeue loop continues.
                // CancellationToken.None: processing runs to completion even after shutdown.
                // The stoppingToken stops dequeuing, not processing.
                var processingTask = ProcessAndAcknowledgeAsync(message, CancellationToken.None);
                inFlight.Add(processingTask);

                // Prune completed tasks to avoid unbounded growth of the bag
                // (in production, use a more structured concurrent collection)
            }
        }
        catch (OperationCanceledException)
        {
            _logger.LogInformation(
                "Shutdown signalled. Draining {Count} in-flight messages.",
                inFlight.Count(t => !t.IsCompleted));
        }

        // ── Drain phase ────────────────────────────────────────────────────
        // Wait for all in-flight tasks, bounded by the drain timeout.
        // After DrainTimeout, any remaining tasks are abandoned — the host
        // will force-terminate after ShutdownTimeout regardless.
        using var drainCts = new CancellationTokenSource(DrainTimeout);

        try
        {
            await Task.WhenAll(inFlight).WaitAsync(drainCts.Token);
            _logger.LogInformation("Drain complete. All in-flight messages processed.");
        }
        catch (OperationCanceledException)
        {
            var remaining = inFlight.Count(t => !t.IsCompleted);
            _logger.LogWarning(
                "Drain timeout reached. {Count} message(s) abandoned — will be redelivered.",
                remaining);
        }
    }

    private async Task ProcessAndAcknowledgeAsync(Message message, CancellationToken ct)
    {
        try
        {
            await _processor.ProcessAsync(message, ct);
            await _queue.AcknowledgeAsync(message, ct);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Failed to process message {Id} during drain.", message.Id);
            await _queue.NegativeAcknowledgeAsync(message, ct);
        }
    }
}


// ════════════════════════════════════════════════════════════════════════════
// SHUTDOWN DECISION REFERENCE
// ════════════════════════════════════════════════════════════════════════════
//
// Condition                                          → Pattern
// ─────────────────────────────────────────────────  ──────────────────────
// Queue provides at-least-once + handler idempotent  → Stop immediately (A)
// Processing has non-idempotent side effects         → Drain (B)
// Processing is fast (< 1s per message)              → Stop immediately (A)
// Processing is slow or batched                      → Drain (B)
// Redelivery is cheap                                → Stop immediately (A)
// Redelivery causes user-visible duplicates         → Drain (B)
// Kubernetes pod rolling update                      → Either, but drain
//                                                       reduces visible errors

The drain timeout is as important as the drain logic itself. Without a bounded drain time, a processing spike during a deployment rollout could cause ExecuteAsync to block long enough that the host exceeds its ShutdownTimeout and force-terminates anyway — defeating the purpose of draining. Set DrainTimeout to a value comfortably inside your ShutdownTimeout, and set ShutdownTimeout comfortably inside your Kubernetes terminationGracePeriodSeconds. The layered timeout structure — per-operation timeout, drain timeout, host shutdown timeout, Kubernetes termination grace period — gives every layer a chance to clean up before the next layer forces termination.
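
The layering can be made explicit and checked at startup. The sketch below uses the values from this article (class name and structure are illustrative, not a framework type):

```csharp
// Illustrative timeout budget — each shutdown layer must fit inside the next.
// Values match the examples in this article; adjust to your deployment.
public static class ShutdownBudget
{
    public static readonly TimeSpan PerOperationTimeout       = TimeSpan.FromSeconds(30);
    public static readonly TimeSpan DrainTimeout              = TimeSpan.FromSeconds(20);
    public static readonly TimeSpan HostShutdownTimeout       = TimeSpan.FromSeconds(25);
    public static readonly TimeSpan K8sTerminationGracePeriod = TimeSpan.FromSeconds(30);

    // Call once at startup — fail fast if the layers are misconfigured.
    public static void Validate()
    {
        if (DrainTimeout >= HostShutdownTimeout)
            throw new InvalidOperationException(
                "DrainTimeout must fit inside HostShutdownTimeout.");
        if (HostShutdownTimeout >= K8sTerminationGracePeriod)
            throw new InvalidOperationException(
                "HostShutdownTimeout must fit inside terminationGracePeriodSeconds.");
    }
}
```

Centralising the four values in one place makes the "each layer fits inside the next" invariant visible in code review rather than scattered across Program.cs, the worker class, and the deployment manifest.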

Exception Behaviour & .NET 8 Host Configuration

A BackgroundService that throws an unhandled exception from ExecuteAsync behaves differently depending on the .NET version and the BackgroundServiceExceptionBehavior host option. Understanding this behaviour is the difference between a silently dead worker (the old default) and a host that correctly surfaces the failure. In .NET 8, the default is to stop the host — which is the correct production behaviour, but it means unhandled exceptions that were previously silent will now take the application down.

HostConfiguration.cs — Exception Behaviour, Logging & Startup Validation
// ════════════════════════════════════════════════════════════════════════════
// BACKGROUND SERVICE EXCEPTION BEHAVIOUR
// ════════════════════════════════════════════════════════════════════════════

builder.Services.Configure<HostOptions>(options =>
{
    // .NET 8 default: StopHost — unhandled exception in any BackgroundService
    // triggers IHost.StopAsync, bringing down the entire application.
    // This is correct production behaviour — a dead worker should not silently
    // run alongside healthy services without anyone knowing.
    options.BackgroundServiceExceptionBehavior =
        BackgroundServiceExceptionBehavior.StopHost;

    // Alternative: Ignore — the worker stops but the host continues.
    // Only use if your BackgroundService is truly optional and the application
    // is meaningful without it. Document why explicitly in code.
    // options.BackgroundServiceExceptionBehavior =
    //     BackgroundServiceExceptionBehavior.Ignore;

    options.ShutdownTimeout = TimeSpan.FromSeconds(25);
});


// ════════════════════════════════════════════════════════════════════════════
// STRUCTURED LOGGING FOR BACKGROUND SERVICE LIFECYCLE
// ════════════════════════════════════════════════════════════════════════════

public class WellLoggedWorker : BackgroundService
{
    private readonly ILogger<WellLoggedWorker> _logger;

    public WellLoggedWorker(ILogger<WellLoggedWorker> logger)
        => _logger = logger;

    public override async Task StartAsync(CancellationToken cancellationToken)
    {
        _logger.LogInformation("Worker starting.");
        await base.StartAsync(cancellationToken);   // base.StartAsync invokes ExecuteAsync; it does not wait for it to finish
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        _logger.LogInformation("Worker execute loop started.");

        try
        {
            while (!stoppingToken.IsCancellationRequested)
            {
                // ... work loop ...
                await Task.Delay(TimeSpan.FromSeconds(1), stoppingToken);
            }
        }
        catch (OperationCanceledException)
        {
            // Normal — log and return
            _logger.LogInformation("Worker execute loop cancelled.");
        }
        catch (Exception ex)
        {
            // Unhandled exception — log with full context before it propagates
            // to the host and triggers StopHost behaviour.
            _logger.LogCritical(ex, "Worker execute loop failed with unhandled exception.");
            throw;   // re-throw — let the host react per BackgroundServiceExceptionBehavior
        }
    }

    public override async Task StopAsync(CancellationToken cancellationToken)
    {
        _logger.LogInformation("Worker stopping.");
        await base.StopAsync(cancellationToken);
        _logger.LogInformation("Worker stopped.");
    }
}


// ════════════════════════════════════════════════════════════════════════════
// STARTUP VALIDATION — fail fast on missing dependencies
// ════════════════════════════════════════════════════════════════════════════
// If your BackgroundService requires a configuration value or external
// dependency that must be present at startup, validate in StartAsync rather
// than discovering the problem at the first loop iteration.

public class ValidatedWorker : BackgroundService
{
    private readonly WorkerOptions _options;
    private readonly ILogger<ValidatedWorker> _logger;

    public ValidatedWorker(IOptions<WorkerOptions> options, ILogger<ValidatedWorker> logger)
    {
        _options = options.Value;
        _logger  = logger;
    }

    public override Task StartAsync(CancellationToken cancellationToken)
    {
        // Validate required configuration before the execute loop starts.
        // Throws immediately at startup — visible in logs, not buried in loop iteration 1.
        if (string.IsNullOrWhiteSpace(_options.QueueConnectionString))
            throw new InvalidOperationException(
                "WorkerOptions.QueueConnectionString is required but not configured.");

        if (_options.MaxConcurrency < 1 || _options.MaxConcurrency > 32)
            throw new InvalidOperationException(
                $"WorkerOptions.MaxConcurrency must be between 1 and 32. Got: {_options.MaxConcurrency}");

        _logger.LogInformation(
            "Worker validated. Queue: {Queue}, MaxConcurrency: {Concurrency}",
            _options.QueueConnectionString, _options.MaxConcurrency);

        return base.StartAsync(cancellationToken);
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        // Configuration is guaranteed valid here — StartAsync validated it.
        // ... execute loop ...
        await Task.CompletedTask;
    }
}

The startup validation pattern in StartAsync is worth applying to every BackgroundService that depends on external configuration. An invalid connection string or an out-of-range configuration value discovered at iteration one of the work loop produces a log entry buried among normal startup messages, possibly with a confusing inner exception. The same validation in StartAsync throws before ExecuteAsync is called, appears prominently in the startup log sequence, and stops the application immediately with a clear error message — the exact behaviour you want when configuration is wrong. Fail fast, fail loudly, fail before the service has pretended to start successfully.
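
An alternative to hand-rolled checks in StartAsync is the options-validation pipeline built into Microsoft.Extensions.Options. A sketch, assuming the same WorkerOptions shape as above (the "Worker" configuration section name is illustrative):

```csharp
// Program.cs — declarative equivalent of the StartAsync checks above.
// ValidateOnStart() (available since .NET 6) runs every registered validation
// when the host starts, before any hosted service begins executing.
builder.Services.AddOptions<WorkerOptions>()
    .Bind(builder.Configuration.GetSection("Worker"))
    .Validate(o => !string.IsNullOrWhiteSpace(o.QueueConnectionString),
              "WorkerOptions.QueueConnectionString is required but not configured.")
    .Validate(o => o.MaxConcurrency is >= 1 and <= 32,
              "WorkerOptions.MaxConcurrency must be between 1 and 32.")
    .ValidateOnStart();
```

The trade-off is the same as manual-versus-Polly for retries: the declarative form keeps validation rules in one place next to the binding, while the StartAsync form keeps them next to the worker that depends on them. Either way, the service fails at startup instead of at loop iteration one.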

What Developers Want to Know

What is the difference between the stoppingToken passed to ExecuteAsync and a manually created CancellationToken?

The stoppingToken passed to ExecuteAsync is triggered when IHost.StopAsync is called — by application shutdown, a SIGTERM signal, or IHostApplicationLifetime.StopApplication(). It is the authoritative shutdown signal for your service's work loop. A manually created token gives you a second cancellation dimension — useful for per-operation timeouts or retry abort conditions. Always link manually created tokens to the stoppingToken using CancellationTokenSource.CreateLinkedTokenSource so that a host shutdown immediately cancels any in-flight operation regardless of where it is in the retry or timeout cycle.

Why does my BackgroundService silently stop running without throwing any exception?

The most common cause is an unhandled exception escaping ExecuteAsync. In .NET 6 and earlier, this caused the BackgroundService to stop silently while the host continued running — no visible indication the worker had died. From .NET 8, the default BackgroundServiceExceptionBehavior.StopHost causes an unhandled exception in any hosted service to bring down the entire host. Check the logs for the exception before the host shutdown entries. The second common cause is async void — a fire-and-forget async void method that throws will crash the process on older runtimes with no useful stack trace.

How long does BackgroundService have to complete work on shutdown before it is forcibly terminated?

By default, the .NET generic host allows 5 seconds for all hosted services to stop after StopAsync is called. Configure this via HostOptions.ShutdownTimeout. In containerised environments, Kubernetes sends SIGTERM and waits for terminationGracePeriodSeconds (default 30s) before sending SIGKILL. Your .NET shutdown timeout must be shorter than the Kubernetes termination grace period — a typical safe configuration is a 25-second .NET shutdown timeout with a 30-second Kubernetes grace period, giving the host a 5-second margin to complete its own teardown after all services have stopped.

Should I use Polly or manual retry logic in a BackgroundService?

Either works correctly when implemented properly — the critical requirement is that your retry logic respects the stoppingToken. Polly's ResiliencePipeline (Polly v8) accepts a CancellationToken and aborts the retry cycle on shutdown. Manual retry loops with Task.Delay(backoff, stoppingToken) achieve the same result. The risk with manual retry is accidentally omitting the cancellation token from Task.Delay — a retry in a 64-second backoff delay without the token blocks shutdown for the full duration. For teams already using Polly for HTTP resilience, using ResiliencePipeline in workers keeps resilience policy consistent across the codebase.

What is the drain-or-stop decision and how do I implement draining?

Drain-or-stop determines what happens to work already in progress when the shutdown signal arrives. Stop means cancel immediately and let the queue redeliver — correct for idempotent handlers with at-least-once delivery. Drain means stop accepting new work but complete everything already dequeued — correct for non-idempotent side effects like payment processing or email sending. Implement draining by using the stoppingToken to exit the dequeue loop but passing CancellationToken.None to the processing of already-dequeued items. Always bound drain time with a separate CancellationTokenSource timeout that fits inside your host ShutdownTimeout.
