The Silent Error You Ship on Every Deploy
Rolling deploys are supposed to be invisible to users. Kubernetes spins up a new pod, drains the old one, and traffic shifts without a blip. In practice, the old pod receives a SIGTERM and Kestrel stops accepting connections while the load balancer is still routing traffic to it — late-arriving requests get a connection reset, in-flight ones risk being cut off mid-response, the user sees an error, and your metrics show a spike of 5xx responses clustered precisely around deploy time.
The failure is predictable and completely preventable. It happens because graceful shutdown requires three things working in concert: the process must delay long enough for the load balancer to stop sending it traffic, in-flight requests must be given time to complete, and background jobs must checkpoint their work before the process exits. Most ASP.NET Core applications implement none of the three correctly by default.
This article walks through each layer — what breaks, why, and the exact code that fixes it. The patterns apply whether you're running on Kubernetes, Azure App Service, or bare VMs behind a load balancer.
Why Requests Drop During Rolling Deploys
Understanding the failure requires following the shutdown sequence step by step. When Kubernetes decides to terminate a pod, it does two things simultaneously: it sends SIGTERM to the process, and it removes the pod from the Endpoints object so no new traffic is routed to it. The problem is that "simultaneously" is not actually simultaneous — propagating the endpoint removal through kube-proxy and any external load balancer takes several seconds.
During that propagation window, the load balancer still considers the pod healthy and continues sending it traffic. But the process has already received SIGTERM and begun shutting down. Kestrel, by default, stops accepting new connections the moment shutdown is triggered. Requests that arrive in this window get a TCP reset — no HTTP response, just a connection failure. From the client's perspective, the server vanished mid-request.
// t=0s Kubernetes sends SIGTERM to pod
// Kubernetes simultaneously removes pod from Endpoints list
// t=0s ASP.NET Core host receives SIGTERM
// Default behaviour: starts shutdown sequence immediately
// Kestrel stops accepting NEW connections at t=0s
// t=0–5s Load balancer propagation delay
// kube-proxy, iptables rules, external LB — all still routing to pod
// New requests arrive at pod → TCP RESET (connection refused)
// In-flight requests: may or may not complete before timeout
// t=5s Default shutdown timeout expires
// Any in-flight requests still running are forcefully terminated
// Process exits
// ─── What we want instead ────────────────────────────────────────────────
// t=0s SIGTERM received
// preStop hook fires: sleep 10s (lets LB drain traffic away)
// t=10s preStop hook completes
// Kestrel stops accepting new connections (traffic already gone)
// In-flight requests run to completion
// t=10s+ All in-flight requests finish (within shutdown timeout)
// Background services checkpoint and stop cleanly
// Process exits — zero dropped requests
The two-part fix is: a preStop sleep to cover the LB propagation delay, and a shutdown timeout long enough for in-flight requests to complete. Neither alone is sufficient. Both together eliminate the race condition.
The Kubernetes preStop Hook
The preStop hook runs before Kubernetes sends SIGTERM to the container. A simple sleep command buys the control plane time to propagate the endpoint removal and drain traffic away from the pod before the process begins shutting down.
# YAML — paste into your Deployment spec (spec.template.spec)
spec:
  containers:
    - name: api
      image: yourcompany/api:latest
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh", "-c", "sleep 10"]
  # terminationGracePeriodSeconds must exceed preStop sleep + shutdown timeout,
  # with a few seconds of buffer, or Kubernetes sends SIGKILL too early.
  # Example: 10s sleep + 25s drain + 5s buffer = 40s
  terminationGracePeriodSeconds: 40
// In Program.cs — set the host shutdown timeout to match.
// (Kestrel drains in-flight connections within this window. Kestrel's
// KeepAliveTimeout only governs idle keep-alive connections — it is not
// a shutdown drain setting.)
builder.Services.Configure<HostOptions>(options =>
{
    // How long the host waits for in-flight requests and hosted services
    // to finish after shutdown begins. Size it to your 99th-percentile
    // request latency, not your worst-case outlier.
    // terminationGracePeriodSeconds must exceed preStop sleep + this value.
    options.ShutdownTimeout = TimeSpan.FromSeconds(25);
});
The terminationGracePeriodSeconds in the pod spec is the hard ceiling. Kubernetes sends SIGKILL when it expires, regardless of what the process is doing. Always set it larger than preStop sleep + ShutdownTimeout with a few seconds of buffer. If terminationGracePeriodSeconds is shorter than your preStop hook, Kubernetes will kill the pod before the hook finishes — defeating the entire purpose.
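The preStop sleep handles the old pod; the other half of a zero-blip deploy is that the new pod must not receive traffic before it can actually serve it, which is what the readiness probe governs. A minimal sketch, assuming the app exposes a health endpoint at /healthz/ready (path and port are illustrative):

```yaml
readinessProbe:
  httpGet:
    path: /healthz/ready
    port: 8080
  periodSeconds: 5
  failureThreshold: 1   # drop out of rotation after one failed check
```

Without this, Kubernetes considers the new pod ready the moment its container starts, and the rolling update shifts traffic onto a process that may still be warming up.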
IHostApplicationLifetime: Reacting to Shutdown Signals
IHostApplicationLifetime exposes three CancellationToken properties that fire at different points in the application lifetime. Registering callbacks on them lets you flush telemetry, release external leases, notify service registries, or log a structured shutdown event — all before the process exits.
var app = builder.Build();
// Resolve lifetime after build — safe to use in Program.cs top-level statements
var lifetime = app.Services.GetRequiredService<IHostApplicationLifetime>();
var logger = app.Services.GetRequiredService<ILogger<Program>>();
// ApplicationStarted: fires after the host is fully started and ready for traffic
lifetime.ApplicationStarted.Register(() =>
{
logger.LogInformation(
"Application started. Pod ready to serve traffic. PID={Pid}",
Environment.ProcessId);
// Example: register with a service discovery system
// serviceRegistry.RegisterAsync(instanceId, address, port);
});
// ApplicationStopping: fires when shutdown is triggered (SIGTERM received)
// The process is still alive — in-flight requests are still running
// This is the right place to stop accepting new work (e.g., pause a queue consumer)
lifetime.ApplicationStopping.Register(() =>
{
logger.LogWarning(
"Shutdown signal received. Draining in-flight requests. " +
"New requests will be rejected after Kestrel drain completes.");
// Example: pause a message queue consumer so no new jobs start
// queueConsumer.PauseAsync();
});
// ApplicationStopped: fires after all hosted services have stopped
// The process is about to exit — safe for final cleanup only
lifetime.ApplicationStopped.Register(() =>
{
logger.LogInformation(
"Application stopped. All hosted services completed shutdown.");
// Example: deregister from service discovery
// serviceRegistry.DeregisterAsync(instanceId);
// Example: flush buffered telemetry (OpenTelemetry, AppInsights)
// telemetryClient.Flush();
// Thread.Sleep(2000); // give flush time to complete
});
app.Run();
Register callbacks on ApplicationStopping rather than ApplicationStopped for anything that needs meaningful time to complete. By the time ApplicationStopped fires, hosted services have already stopped and the process is moments from exit — only fast, final cleanup belongs there. ApplicationStopping fires first, while in-flight work is still draining and there is still time to do real work.
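The ordering is easy to observe in isolation with a minimal generic-host console sketch (assumes only the Microsoft.Extensions.Hosting package — no web stack needed):

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var events = new List<string>();
var host = Host.CreateApplicationBuilder().Build();
var lifetime = host.Services.GetRequiredService<IHostApplicationLifetime>();

lifetime.ApplicationStarted.Register(() => events.Add("started"));
lifetime.ApplicationStopping.Register(() => events.Add("stopping"));
lifetime.ApplicationStopped.Register(() => events.Add("stopped"));

await host.StartAsync();            // fires ApplicationStarted
lifetime.StopApplication();         // simulates SIGTERM — fires ApplicationStopping
await host.WaitForShutdownAsync();  // fires ApplicationStopped after services stop

Console.WriteLine(string.Join(" -> ", events)); // started -> stopping -> stopped
```

StopApplication() is exactly what the host's SIGTERM handler calls internally, so this reproduces the production sequence without a container runtime.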
Propagating CancellationToken Through Every Handler
Graceful shutdown only works if your request handlers actually stop when cancelled. A handler that ignores its CancellationToken runs until it completes or the shutdown timeout kills the process — whichever comes first. For long-running handlers, that means either a dropped request or a delayed shutdown that blocks other pods from starting.
The rule is simple: every async method that does I/O must accept and forward the CancellationToken. No exceptions. A database query with no token is a ticking delay on every shutdown.
// ── WRONG: ignores cancellation — handler runs regardless of shutdown ──────
app.MapGet("/orders/{id}", async (int id, IOrderService orders) =>
{
var order = await orders.GetByIdAsync(id); // no token — blocks shutdown
var lines = await orders.GetLinesAsync(id); // no token — blocks shutdown
return order is null ? Results.NotFound() : Results.Ok(new { order, lines });
});
// ── CORRECT: token flows through every async call ─────────────────────────
app.MapGet("/orders/{id}", async (
int id,
IOrderService orders,
CancellationToken ct) => // ASP.NET Core injects HttpContext.RequestAborted
{
// If shutdown triggers mid-request, ct is cancelled →
// GetByIdAsync throws OperationCanceledException → handler exits cleanly
var order = await orders.GetByIdAsync(id, ct);
if (order is null) return Results.NotFound();
var lines = await orders.GetLinesAsync(id, ct);
return Results.Ok(new { order, lines });
});
// ── Repository layer — must accept and forward the token ──────────────────
public class OrderRepository(AppDbContext db)
{
    public async Task<Order?> GetByIdAsync(int id, CancellationToken ct = default) =>
        await db.Orders
            .AsNoTracking()
            .FirstOrDefaultAsync(o => o.Id == id, ct); // EF Core respects ct

    public async Task<List<OrderLine>> GetLinesAsync(int id, CancellationToken ct = default) =>
        await db.OrderLines
            .Where(l => l.OrderId == id)
            .AsNoTracking()
            .ToListAsync(ct);
}
// ── Global exception handling — treat cancellation as a non-error ─────────
app.UseExceptionHandler(exApp => exApp.Run(async ctx =>
{
var ex = ctx.Features.Get<IExceptionHandlerFeature>()?.Error;
// Client disconnected or shutdown triggered — not a server error, don't log as one
if (ex is OperationCanceledException)
{
ctx.Response.StatusCode = 499; // Client Closed Request (nginx convention)
return;
}
ctx.Response.StatusCode = 500;
await ctx.Response.WriteAsJsonAsync(new
{
type = "about:blank", // RFC 7807 default type for generic errors
title = "Internal Server Error",
status = 500
});
}));
ASP.NET Core automatically cancels HttpContext.RequestAborted when the client disconnects or when the server is shutting down. Using it as your handler's CancellationToken means you get correct cancellation behaviour for both scenarios with no extra wiring.
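RequestAborted covers client disconnects and server shutdown, but not a dependency that simply hangs. One common complement — sketched here with a hypothetical WithTimeout helper, not a framework API — is to link the request token with a per-call timeout so the operation is cancelled by whichever fires first:

```csharp
public static class CancellationTokenExtensions
{
    // Cancels the operation on client disconnect, server shutdown,
    // OR the timeout — whichever happens first.
    public static async Task<T> WithTimeout<T>(
        this CancellationToken requestToken,
        TimeSpan timeout,
        Func<CancellationToken, Task<T>> operation)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(requestToken);
        cts.CancelAfter(timeout);
        return await operation(cts.Token);
    }
}

// Usage inside a handler:
// var order = await ct.WithTimeout(TimeSpan.FromSeconds(5),
//     token => orders.GetByIdAsync(id, token));
```

The linked source means the dependency call observes a single token; the handler doesn't need to know which of the three triggers fired.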
BackgroundService: Checkpoint Before the Process Exits
Background services are the most common source of data loss during shutdown. A job that processes items from a queue without observing the stoppingToken will be killed mid-item when the shutdown timeout expires — the item is lost, or worse, partially processed. The fix is checkpoint-based processing: only advance the queue cursor after an item is fully handled, and stop fetching new items the moment cancellation is requested.
public class OrderProcessingService(
    IServiceScopeFactory scopeFactory,
    ILogger<OrderProcessingService> logger)
: BackgroundService
{
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
logger.LogInformation("Order processing service started.");
// stoppingToken is cancelled when the host begins shutdown
while (!stoppingToken.IsCancellationRequested)
{
try
{
await ProcessBatchAsync(stoppingToken);
// Idle wait — passes stoppingToken so delay is cancelled on shutdown
// Without the token, Task.Delay blocks shutdown for its full duration
await Task.Delay(TimeSpan.FromSeconds(5), stoppingToken);
}
catch (OperationCanceledException)
{
// Shutdown requested — exit the loop cleanly, don't log as error
break;
}
catch (Exception ex)
{
logger.LogError(ex, "Order processing batch failed. Retrying in 30s.");
// Backoff is cancellable too — exit immediately if shutdown starts mid-wait
try { await Task.Delay(TimeSpan.FromSeconds(30), stoppingToken); }
catch (OperationCanceledException) { break; }
}
}
logger.LogInformation(
"Order processing service stopping. Completing current batch if any.");
}
private async Task ProcessBatchAsync(CancellationToken ct)
{
await using var scope = scopeFactory.CreateAsyncScope();
var db = scope.ServiceProvider.GetRequiredService<AppDbContext>();
// IOrderQueue stands in for this article's queue abstraction (name illustrative)
var queue = scope.ServiceProvider.GetRequiredService<IOrderQueue>();
// Fetch items — passes ct so the query cancels on shutdown
var items = await queue.DequeueBatchAsync(batchSize: 10, ct);
foreach (var item in items)
{
// Check before each item — stops mid-batch cleanly on shutdown
ct.ThrowIfCancellationRequested();
try
{
await ProcessOrderAsync(item, db, ct);
// ── Checkpoint: only advance cursor AFTER successful processing ──
await queue.AcknowledgeAsync(item.Id, ct);
}
catch (OperationCanceledException)
{
// Return item to queue so next instance picks it up
await queue.NackAsync(item.Id);
throw; // propagate to exit the batch loop
}
catch (Exception ex)
{
logger.LogError(ex,
"Failed to process order {OrderId}. Moving to dead-letter queue.",
item.OrderId);
await queue.DeadLetterAsync(item.Id);
}
}
}
public override async Task StopAsync(CancellationToken cancellationToken)
{
logger.LogWarning(
"Order processing service received stop signal. " +
"Finishing current item before exiting.");
// Give the base class time to complete the current iteration
await base.StopAsync(cancellationToken);
logger.LogInformation("Order processing service stopped cleanly.");
}
}
The ct.ThrowIfCancellationRequested() call at the top of each loop iteration is the most important line in the entire service. Without it, a batch of 100 items will process all 100 even after shutdown is requested — only stopping when the shutdown timeout kills the process. With it, the service stops after the current item completes, acknowledges that item, and exits — leaving the remaining items safely in the queue for the next pod instance to pick up.
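For completeness, a registration sketch for Program.cs. BackgroundServiceExceptionBehavior.StopHost is already the default on .NET 6+; it is shown explicitly here because relying on it silently is exactly the kind of assumption that bites during incident review:

```csharp
builder.Services.AddHostedService<OrderProcessingService>();
builder.Services.Configure<HostOptions>(options =>
{
    // An unhandled exception in any BackgroundService stops the host —
    // letting Kubernetes restart the pod — instead of failing silently
    // while the web server keeps serving traffic.
    options.BackgroundServiceExceptionBehavior =
        BackgroundServiceExceptionBehavior.StopHost;
});
```

A crashed background worker in a healthy-looking pod is worse than a restart: the queue backs up while every health check stays green.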