The Silent Error You Ship on Every Deploy
Rolling deploys are supposed to be invisible to users. Kubernetes spins up a new pod, drains the old one, and traffic shifts without a blip. In practice, the old pod receives a SIGTERM and Kestrel stops accepting connections while the load balancer is still routing traffic to it — late-arriving requests get a connection reset, in-flight ones risk being cut off mid-response, the user sees an error, and your metrics show a spike of 5xx responses clustered precisely around deploy time.
The failure is predictable and completely preventable. It happens because graceful shutdown requires three things working in concert: the process must delay long enough for the load balancer to stop sending it traffic, in-flight requests must be given time to complete, and background jobs must checkpoint their work before the process exits. Most ASP.NET Core applications implement none of the three correctly by default.
This article walks through each layer — what breaks, why, and the exact code that fixes it. The patterns apply whether you're running on Kubernetes, Azure App Service, or bare VMs behind a load balancer.
Why Requests Drop During Rolling Deploys
Understanding the failure requires following the shutdown sequence step by step. When Kubernetes decides to terminate a pod, it does two things simultaneously: it sends SIGTERM to the process, and it removes the pod from the Endpoints object so no new traffic is routed to it. The problem is that "simultaneously" is not actually simultaneous — propagating the endpoint removal through kube-proxy and any external load balancer takes several seconds.
During that propagation window, the load balancer still considers the pod healthy and continues sending it traffic. But the process has already received SIGTERM and begun shutting down. Kestrel, by default, stops accepting new connections the moment shutdown is triggered. Requests that arrive in this window get a TCP reset — no HTTP response, just a connection failure. From the client's perspective, the server vanished mid-request.
// t=0s Kubernetes sends SIGTERM to pod
// Kubernetes simultaneously removes pod from Endpoints list
// t=0s ASP.NET Core host receives SIGTERM
// Default behaviour: starts shutdown sequence immediately
// Kestrel stops accepting NEW connections at t=0s
// t=0–5s Load balancer propagation delay
// kube-proxy, iptables rules, external LB — all still routing to pod
// New requests arrive at pod → TCP RESET (connection refused)
// In-flight requests: may or may not complete before timeout
// t=5s Default shutdown timeout expires
// Any in-flight requests still running are forcefully terminated
// Process exits
// ─── What we want instead ────────────────────────────────────────────────
// t=0s SIGTERM received
// preStop hook fires: sleep 10s (lets LB drain traffic away)
// t=10s preStop hook completes
// Kestrel stops accepting new connections (traffic already gone)
// In-flight requests run to completion
// t=10s+ All in-flight requests finish (within shutdown timeout)
// Background services checkpoint and stop cleanly
// Process exits — zero dropped requests
The two-part fix is: a preStop sleep to cover the LB propagation delay, and a shutdown timeout long enough for in-flight requests to complete. Neither alone is sufficient. Both together eliminate the race condition.
The Kubernetes preStop Hook
The preStop hook runs before Kubernetes sends SIGTERM to the container. A simple sleep command buys the control plane time to propagate the endpoint removal and drain traffic away from the pod before the process begins shutting down.
# YAML — paste into your Deployment spec (spec.template.spec)
spec:
  containers:
    - name: api
      image: yourcompany/api:latest
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh", "-c", "sleep 10"]
  # terminationGracePeriodSeconds must exceed preStop sleep + shutdown timeout,
  # with a few seconds of buffer, or Kubernetes sends SIGKILL too early.
  # Example: 10s sleep + 25s drain + 5s buffer = 40s
  terminationGracePeriodSeconds: 40
// In Program.cs — set the host shutdown timeout to match.
// (Kestrel drains in-flight connections within this window. Kestrel's
// KeepAliveTimeout only governs idle keep-alive connections — it is not
// a shutdown drain setting.)
builder.Services.Configure<HostOptions>(options =>
{
    // How long the host waits for in-flight requests and hosted services
    // to finish after shutdown begins. Size it to your 99th-percentile
    // request latency, not your worst-case outlier.
    // terminationGracePeriodSeconds must exceed preStop sleep + this value.
    options.ShutdownTimeout = TimeSpan.FromSeconds(25);
});
The terminationGracePeriodSeconds in the pod spec is the hard ceiling. Kubernetes sends SIGKILL when it expires, regardless of what the process is doing. Always set it larger than preStop sleep + ShutdownTimeout with a few seconds of buffer. If terminationGracePeriodSeconds is shorter than your preStop hook, Kubernetes will kill the pod before the hook finishes — defeating the entire purpose.
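The preStop sleep handles the old pod; the other half of a zero-blip deploy is that the new pod must not receive traffic before it can actually serve it, which is what the readiness probe governs. A minimal sketch, assuming the app exposes a health endpoint at /healthz/ready (path and port are illustrative):

```yaml
readinessProbe:
  httpGet:
    path: /healthz/ready
    port: 8080
  periodSeconds: 5
  failureThreshold: 1   # drop out of rotation after one failed check
```

Without this, Kubernetes considers the new pod ready the moment its container starts, and the rolling update shifts traffic onto a process that may still be warming up.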
IHostApplicationLifetime: Reacting to Shutdown Signals
IHostApplicationLifetime exposes three CancellationToken properties that fire at different points in the application lifetime. Registering callbacks on them lets you flush telemetry, release external leases, notify service registries, or log a structured shutdown event — all before the process exits.
var app = builder.Build();
// Resolve lifetime after build — safe to use in Program.cs top-level statements
var lifetime = app.Services.GetRequiredService<IHostApplicationLifetime>();
var logger = app.Services.GetRequiredService<ILogger<Program>>();
// ApplicationStarted: fires after the host is fully started and ready for traffic
lifetime.ApplicationStarted.Register(() =>
{
logger.LogInformation(
"Application started. Pod ready to serve traffic. PID={Pid}",
Environment.ProcessId);
// Example: register with a service discovery system
// serviceRegistry.RegisterAsync(instanceId, address, port);
});
// ApplicationStopping: fires when shutdown is triggered (SIGTERM received)
// The process is still alive — in-flight requests are still running
// This is the right place to stop accepting new work (e.g., pause a queue consumer)
lifetime.ApplicationStopping.Register(() =>
{
logger.LogWarning(
"Shutdown signal received. Draining in-flight requests. " +
"New requests will be rejected after Kestrel drain completes.");
// Example: pause a message queue consumer so no new jobs start
// queueConsumer.PauseAsync();
});
// ApplicationStopped: fires after all hosted services have stopped
// The process is about to exit — safe for final cleanup only
lifetime.ApplicationStopped.Register(() =>
{
logger.LogInformation(
"Application stopped. All hosted services completed shutdown.");
// Example: deregister from service discovery
// serviceRegistry.DeregisterAsync(instanceId);
// Example: flush buffered telemetry (OpenTelemetry, AppInsights)
// telemetryClient.Flush();
// Thread.Sleep(2000); // give flush time to complete
});
app.Run();
Register callbacks on ApplicationStopping rather than ApplicationStopped for anything that needs meaningful time to complete. By the time ApplicationStopped fires, hosted services have already stopped and the process is moments from exit — only fast, final cleanup belongs there. ApplicationStopping fires first, while in-flight work is still draining and there is still time to do real work.
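The ordering is easy to observe in isolation with a minimal generic-host console sketch (assumes only the Microsoft.Extensions.Hosting package — no web stack needed):

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var events = new List<string>();
var host = Host.CreateApplicationBuilder().Build();
var lifetime = host.Services.GetRequiredService<IHostApplicationLifetime>();

lifetime.ApplicationStarted.Register(() => events.Add("started"));
lifetime.ApplicationStopping.Register(() => events.Add("stopping"));
lifetime.ApplicationStopped.Register(() => events.Add("stopped"));

await host.StartAsync();            // fires ApplicationStarted
lifetime.StopApplication();         // simulates SIGTERM — fires ApplicationStopping
await host.WaitForShutdownAsync();  // fires ApplicationStopped after services stop

Console.WriteLine(string.Join(" -> ", events)); // started -> stopping -> stopped
```

StopApplication() is exactly what the host's SIGTERM handler calls internally, so this reproduces the production sequence without a container runtime.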
Propagating CancellationToken Through Every Handler
Graceful shutdown only works if your request handlers actually stop when cancelled. A handler that ignores its CancellationToken runs until it completes or the shutdown timeout kills the process — whichever comes first. For long-running handlers, that means either a dropped request or a delayed shutdown that blocks other pods from starting.
The rule is simple: every async method that does I/O must accept and forward the CancellationToken. No exceptions. A database query with no token is a ticking delay on every shutdown.
// ── WRONG: ignores cancellation — handler runs regardless of shutdown ──────
app.MapGet("/orders/{id}", async (int id, IOrderService orders) =>
{
var order = await orders.GetByIdAsync(id); // no token — blocks shutdown
var lines = await orders.GetLinesAsync(id); // no token — blocks shutdown
return order is null ? Results.NotFound() : Results.Ok(new { order, lines });
});
// ── CORRECT: token flows through every async call ─────────────────────────
app.MapGet("/orders/{id}", async (
int id,
IOrderService orders,
CancellationToken ct) => // ASP.NET Core injects HttpContext.RequestAborted
{
// If shutdown triggers mid-request, ct is cancelled →
// GetByIdAsync throws OperationCanceledException → handler exits cleanly
var order = await orders.GetByIdAsync(id, ct);
if (order is null) return Results.NotFound();
var lines = await orders.GetLinesAsync(id, ct);
return Results.Ok(new { order, lines });
});
// ── Repository layer — must accept and forward the token ──────────────────
public class OrderRepository(AppDbContext db)
{
    public async Task<Order?> GetByIdAsync(int id, CancellationToken ct = default) =>
        await db.Orders
            .AsNoTracking()
            .FirstOrDefaultAsync(o => o.Id == id, ct); // EF Core respects ct

    public async Task<List<OrderLine>> GetLinesAsync(int id, CancellationToken ct = default) =>
        await db.OrderLines
            .Where(l => l.OrderId == id)
            .AsNoTracking()
            .ToListAsync(ct);
}
// ── Global exception handling — treat cancellation as a non-error ─────────
app.UseExceptionHandler(exApp => exApp.Run(async ctx =>
{
var ex = ctx.Features.Get<IExceptionHandlerFeature>()?.Error;
// Client disconnected or shutdown triggered — not a server error, don't log as one
if (ex is OperationCanceledException)
{
ctx.Response.StatusCode = 499; // Client Closed Request (nginx convention)
return;
}
ctx.Response.StatusCode = 500;
await ctx.Response.WriteAsJsonAsync(new
{
type = "about:blank", // RFC 7807 default type for generic errors
title = "Internal Server Error",
status = 500
});
}));
ASP.NET Core automatically cancels HttpContext.RequestAborted when the client disconnects or when the server is shutting down. Using it as your handler's CancellationToken means you get correct cancellation behaviour for both scenarios with no extra wiring.
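RequestAborted covers client disconnects and server shutdown, but not a dependency that simply hangs. One common complement — sketched here with a hypothetical WithTimeout helper, not a framework API — is to link the request token with a per-call timeout so the operation is cancelled by whichever fires first:

```csharp
public static class CancellationTokenExtensions
{
    // Cancels the operation on client disconnect, server shutdown,
    // OR the timeout — whichever happens first.
    public static async Task<T> WithTimeout<T>(
        this CancellationToken requestToken,
        TimeSpan timeout,
        Func<CancellationToken, Task<T>> operation)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(requestToken);
        cts.CancelAfter(timeout);
        return await operation(cts.Token);
    }
}

// Usage inside a handler:
// var order = await ct.WithTimeout(TimeSpan.FromSeconds(5),
//     token => orders.GetByIdAsync(id, token));
```

The linked source means the dependency call observes a single token; the handler doesn't need to know which of the three triggers fired.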
BackgroundService: Checkpoint Before the Process Exits
Background services are the most common source of data loss during shutdown. A job that processes items from a queue without observing the stoppingToken will be killed mid-item when the shutdown timeout expires — the item is lost, or worse, partially processed. The fix is checkpoint-based processing: only advance the queue cursor after an item is fully handled, and stop fetching new items the moment cancellation is requested.
public class OrderProcessingService(
    IServiceScopeFactory scopeFactory,
    ILogger<OrderProcessingService> logger)
: BackgroundService
{
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
logger.LogInformation("Order processing service started.");
// stoppingToken is cancelled when the host begins shutdown
while (!stoppingToken.IsCancellationRequested)
{
try
{
await ProcessBatchAsync(stoppingToken);
// Idle wait — passes stoppingToken so delay is cancelled on shutdown
// Without the token, Task.Delay blocks shutdown for its full duration
await Task.Delay(TimeSpan.FromSeconds(5), stoppingToken);
}
catch (OperationCanceledException)
{
// Shutdown requested — exit the loop cleanly, don't log as error
break;
}
catch (Exception ex)
{
logger.LogError(ex, "Order processing batch failed. Retrying in 30s.");
// Backoff is cancellable too — exit immediately if shutdown starts mid-wait
try { await Task.Delay(TimeSpan.FromSeconds(30), stoppingToken); }
catch (OperationCanceledException) { break; }
}
}
logger.LogInformation(
"Order processing service stopping. Completing current batch if any.");
}
private async Task ProcessBatchAsync(CancellationToken ct)
{
await using var scope = scopeFactory.CreateAsyncScope();
var db = scope.ServiceProvider.GetRequiredService<AppDbContext>();
// IOrderQueue stands in for this article's queue abstraction (name illustrative)
var queue = scope.ServiceProvider.GetRequiredService<IOrderQueue>();
// Fetch items — passes ct so the query cancels on shutdown
var items = await queue.DequeueBatchAsync(batchSize: 10, ct);
foreach (var item in items)
{
// Check before each item — stops mid-batch cleanly on shutdown
ct.ThrowIfCancellationRequested();
try
{
await ProcessOrderAsync(item, db, ct);
// ── Checkpoint: only advance cursor AFTER successful processing ──
await queue.AcknowledgeAsync(item.Id, ct);
}
catch (OperationCanceledException)
{
// Return item to queue so next instance picks it up
await queue.NackAsync(item.Id);
throw; // propagate to exit the batch loop
}
catch (Exception ex)
{
logger.LogError(ex,
"Failed to process order {OrderId}. Moving to dead-letter queue.",
item.OrderId);
await queue.DeadLetterAsync(item.Id);
}
}
}
public override async Task StopAsync(CancellationToken cancellationToken)
{
logger.LogWarning(
"Order processing service received stop signal. " +
"Finishing current item before exiting.");
// Give the base class time to complete the current iteration
await base.StopAsync(cancellationToken);
logger.LogInformation("Order processing service stopped cleanly.");
}
}
The ct.ThrowIfCancellationRequested() call at the top of each loop iteration is the most important line in the entire service. Without it, a batch of 100 items will process all 100 even after shutdown is requested — only stopping when the shutdown timeout kills the process. With it, the service stops after the current item completes, acknowledges that item, and exits — leaving the remaining items safely in the queue for the next pod instance to pick up.
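For completeness, a registration sketch for Program.cs. BackgroundServiceExceptionBehavior.StopHost is already the default on .NET 6+; it is shown explicitly here because relying on it silently is exactly the kind of assumption that bites during incident review:

```csharp
builder.Services.AddHostedService<OrderProcessingService>();
builder.Services.Configure<HostOptions>(options =>
{
    // An unhandled exception in any BackgroundService stops the host —
    // letting Kubernetes restart the pod — instead of failing silently
    // while the web server keeps serving traffic.
    options.BackgroundServiceExceptionBehavior =
        BackgroundServiceExceptionBehavior.StopHost;
});
```

A crashed background worker in a healthy-looking pod is worse than a restart: the queue backs up while every health check stays green.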