Every Public API Needs a Bouncer
Rate limiting is not a feature you bolt on after your first scraping incident or after a buggy client floods your login endpoint with ten thousand requests a minute. It is a first-line defence that belongs in the same architectural conversation as authentication and authorisation — before you write a single endpoint handler. Without it, one misbehaving caller can saturate your Kestrel thread pool, exhaust your database connection pool, and take your API offline for every other legitimate client.
ASP.NET Core 8 ships a production-ready rate limiting middleware in Microsoft.AspNetCore.RateLimiting — no third-party NuGet package, no Redis dependency for single-instance deployments. Four algorithms out of the box: Fixed Window, Sliding Window, Token Bucket, and Concurrency Limiter. Each solves a distinct traffic-shaping problem. The middleware integrates directly with the endpoint routing system, supports per-client partitioning by IP address, authenticated user identity, or any request property you choose, and gives you full control over the rejection response your callers receive.
This article walks through all four algorithms with production-ready code, explains which one to reach for in each scenario, shows how to partition limits per client rather than sharing a global counter, and demonstrates how to return a proper 429 Too Many Requests response with a Retry-After header so well-behaved clients know exactly when to retry.
Wiring Up the Middleware: AddRateLimiter and Pipeline Order
Rate limiting in ASP.NET Core 8 follows the same pattern as every other middleware feature: register services with AddRateLimiter, add the middleware to the pipeline with UseRateLimiter, and bind named policies to endpoints with .RequireRateLimiting or [EnableRateLimiting]. The pipeline order is not optional — the middleware must appear after routing and authentication so it can read both the matched endpoint and the authenticated user identity.
using Microsoft.AspNetCore.RateLimiting;
using System.Threading.RateLimiting;
var builder = WebApplication.CreateBuilder(args);
// ── Step 1: Register rate limiting services ───────────────────────────────
// AddRateLimiter is where you define named policies.
// You can register as many named policies as you need — one per logical
// group of endpoints with the same throttling requirements.
builder.Services.AddRateLimiter(options =>
{
// Global rejection status code — applies to ALL policies
// unless overridden in a per-policy OnRejected callback.
// Default without this line is 503 Service Unavailable — wrong signal.
// 429 Too Many Requests is the correct HTTP status for rate limiting.
options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;
// Named policies are registered here — see sections below for each algorithm.
// options.AddFixedWindowLimiter("fixed", opt => { ... });
// options.AddSlidingWindowLimiter("sliding", opt => { ... });
// options.AddConcurrencyLimiter("concurrent", opt => { ... });
// options.AddTokenBucketLimiter("token-bucket", opt => { ... });
});
var app = builder.Build();
// ── Step 2: Middleware pipeline order ─────────────────────────────────────
// UseRateLimiter MUST be placed:
// ✓ After UseRouting — needs matched endpoint to apply the correct policy
// ✓ After UseAuthentication — needs HttpContext.User for per-user partitioning
// ✓ After UseAuthorization — consistent placement with other security middleware
// ✓ Before endpoint execution (MapControllers, MapGet, etc.)
//
// Placing it BEFORE UseAuthentication is a silent bug — HttpContext.User
// will be unauthenticated and per-user partition keys will always fall
// back to the anonymous path without any error or warning.
app.UseRouting();
app.UseAuthentication();
app.UseAuthorization();
app.UseRateLimiter(); // ← correct position
// ── Step 3: Bind named policies to endpoints ──────────────────────────────
// Minimal APIs: fluent chain
app.MapGet("/api/products", () => Results.Ok())
.RequireRateLimiting("fixed");
// Controllers: attribute on the class (applies to all actions)
// [EnableRateLimiting("per-user")]
// public class OrdersController : ControllerBase { ... }
// Controllers: attribute on a single action (overrides class-level)
// [EnableRateLimiting("sliding")]
// public IActionResult ExpensiveSearch() { ... }
// Exempt an endpoint from ALL policies including GlobalLimiter:
// app.MapGet("/healthz", HealthHandler).DisableRateLimiting();
// [DisableRateLimiting]
// public IActionResult HealthCheck() { ... }
// ── CRITICAL: Never rate-limit health check endpoints ─────────────────────
// Kubernetes liveness and readiness probes interpret any non-2xx response
// as an unhealthy signal. A 429 from a throttled /healthz will cause your
// pod to be restarted in a loop — a self-inflicted outage.
app.MapGet("/healthz", () => Results.Ok("healthy")); // no RequireRateLimiting here
app.Run();
The RejectionStatusCode = 429 line is the first thing to set in any rate limiting configuration. Without it, ASP.NET Core returns 503 Service Unavailable by default — which tells monitoring systems and load balancers that your service is down, potentially triggering failover or pod restarts. 429 Too Many Requests is the semantically correct code and the one every well-behaved HTTP client library knows how to handle.
Fixed Window vs Sliding Window: Choosing the Right Time-Based Algorithm
Both algorithms enforce an "N requests per time window" rule. The difference is what happens at the window boundary. Fixed Window resets its counter to zero at the end of each window — a client can exhaust the limit in the final second of one window and immediately exhaust the next window's limit, effectively doubling their burst rate across the boundary. Sliding Window eliminates that boundary burst by dividing each window into segments and weighting the count across the current and previous segment, so the effective limit holds across any sliding time period of the configured duration.
The practical decision is straightforward: start with Sliding Window. The memory and CPU overhead over Fixed Window is negligible for typical API workloads, and you eliminate an entire class of abuse pattern without any additional configuration.
builder.Services.AddRateLimiter(options =>
{
options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;
// ── Fixed Window ──────────────────────────────────────────────────────
// Simple counter that resets to zero every Window duration.
// Use when: the boundary burst is acceptable for your workload.
// Lowest memory overhead of all four algorithms.
options.AddFixedWindowLimiter("fixed", opt =>
{
opt.PermitLimit = 100; // requests per window
opt.Window = TimeSpan.FromSeconds(60); // window duration
opt.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
opt.QueueLimit = 0;
// QueueLimit = 0 → reject immediately when limit is reached
// QueueLimit > 0 → hold up to N requests in a queue and process
// them as permits become available in the next window.
// Useful for bursty but low-latency-tolerant workloads.
});
// ── The boundary burst problem — why Sliding Window exists ────────────
//
// Fixed Window PermitLimit=100, Window=60s:
//
// [0:00 ──────────────── 0:59] [1:00 ──────────────── 1:59]
// ↑ 100 req ↑ 100 req
// at 0:58 at 1:00
//
// → 200 requests in 2 seconds. Technically within policy. Potentially devastating.
//
// Sliding Window with the same PermitLimit and SegmentsPerWindow=6:
// The limiter calculates a weighted count across segment boundaries.
// No 2-second burst of 200 is ever permitted.
// ── Sliding Window ────────────────────────────────────────────────────
// Divides each window into SegmentsPerWindow equal segments.
// Calculates a weighted request count across current + previous segments.
// Use when: boundary bursts are unacceptable — auth flows, payments, OTPs.
options.AddSlidingWindowLimiter("sliding", opt =>
{
opt.PermitLimit = 100;
opt.Window = TimeSpan.FromSeconds(60);
opt.SegmentsPerWindow = 6; // 6 segments × 10s each
// Each segment tracks its own sub-counter.
// Memory cost: SegmentsPerWindow × counter
// per partition key.
opt.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
opt.QueueLimit = 0;
});
});
// ── When to use Fixed Window ─────────────────────────────────────────────
// Public read endpoints, search APIs, non-critical CRUD operations.
// Generous limits (100–1000 per minute) where boundary bursts are tolerable.
// Single-instance deployments where memory is a meaningful constraint.
// ── When to use Sliding Window ────────────────────────────────────────────
// Login endpoints, password reset, OTP verification, payment processing.
// Any endpoint where a boundary burst would cause account lockouts,
// financial impact, or downstream resource saturation.
// Default recommendation when Fixed Window has no specific advantage.
app.MapGet("/api/catalogue", () => Results.Ok()).RequireRateLimiting("fixed");
app.MapPost("/api/auth/login", () => Results.Ok()).RequireRateLimiting("sliding");
app.MapPost("/api/auth/reset-password", () => Results.Ok()).RequireRateLimiting("sliding");
The one rule that makes the choice mechanical: if the endpoint touches money, credentials, or any resource where a short burst causes irreversible side effects, use Sliding Window. For everything else, Fixed Window is a perfectly valid default — don't add complexity without a concrete reason.
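The same options types also drive the standalone limiter classes in System.Threading.RateLimiting, which makes it easy to sanity-check a policy's numbers in a console app or unit test before wiring it into the middleware. A minimal sketch (the PermitLimit of 3 is illustrative, not a recommendation):

```csharp
using System;
using System.Threading.RateLimiting;

// Standalone FixedWindowRateLimiter — the same options type the "fixed" policy uses.
var limiter = new FixedWindowRateLimiter(new FixedWindowRateLimiterOptions
{
    PermitLimit = 3,                  // 3 requests per window
    Window = TimeSpan.FromMinutes(1),
    QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
    QueueLimit = 0,                   // reject immediately once exhausted
    AutoReplenishment = true
});

for (var i = 1; i <= 4; i++)
{
    using RateLimitLease lease = limiter.AttemptAcquire();
    Console.WriteLine($"request {i}: {(lease.IsAcquired ? "allowed" : "rejected")}");
}
// requests 1–3 are allowed; request 4 is rejected until the window resets
```

Both window limiters accept the same shape of options, so swapping the sketch to SlidingWindowRateLimiter only adds a SegmentsPerWindow setting.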
Token Bucket & Concurrency Limiter: Beyond Time Windows
Token Bucket and Concurrency Limiter solve problems that time-window algorithms cannot. Token Bucket enforces a sustained average throughput while explicitly permitting short bursts up to the bucket capacity — the right shape for APIs that proxy to third-party services with their own rate limits. The Concurrency Limiter is categorically different from the other three: it limits how many requests are executing simultaneously, not how many arrived in a time window. A request holds a permit for the entire duration of its processing and releases it when the response completes.
builder.Services.AddRateLimiter(options =>
{
options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;
// ── Token Bucket ──────────────────────────────────────────────────────
// Tokens are added to a bucket at a fixed replenishment rate.
// Each request consumes one token. Requests are rejected when the bucket is empty.
// The bucket capacity defines the maximum burst size.
//
// Use when: enforcing a steady average throughput with a controlled burst allowance.
// Classic scenario: proxying to a third-party API that allows 20 req/s
// but tolerates short bursts up to 100.
options.AddTokenBucketLimiter("token-bucket", opt =>
{
opt.TokenLimit = 100; // bucket capacity = max burst
opt.ReplenishmentPeriod = TimeSpan.FromSeconds(10);
opt.TokensPerPeriod = 20; // 20 tokens per 10s = 2 req/s average
opt.AutoReplenishment = true; // background timer refills the bucket
opt.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
opt.QueueLimit = 0;
// Sustained rate: 20 tokens ÷ 10s = 2 requests/second average
// Burst allowance: up to 100 requests when the bucket is full
// After a burst: the caller must wait for tokens to replenish
// before sending more requests
});
// ── Concurrency Limiter ───────────────────────────────────────────────
// Limits the number of requests executing IN PARALLEL — not per time window.
// A permit is acquired when the request enters the handler and
// released when the response is written. No time window involved.
//
// Use when: protecting a resource with a hard parallelism constraint.
// Examples: database connection pools, CPU-heavy report generation,
// external APIs that cap simultaneous connections.
options.AddConcurrencyLimiter("concurrent", opt =>
{
opt.PermitLimit = 10; // max 10 requests executing simultaneously
opt.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
opt.QueueLimit = 20; // queue up to 20 additional requests
// Total in-flight maximum: 10 + 20 = 30
// Request 31 receives an immediate 429
});
});
// ── Practical example: Concurrency Limiter for a report endpoint ──────────
// This endpoint calls a CPU-intensive PDF generation library.
// Allowing 50 parallel PDF renders would saturate the server's cores.
// A concurrency limit of 5 ensures the server stays responsive for all
// other endpoints even under heavy report load.
app.MapGet("/api/reports/generate", GenerateReportHandler)
.RequireRateLimiting("concurrent");
// ── Practical example: Token Bucket for a third-party SMS proxy ───────────
// The SMS provider allows 20 messages per 10 seconds with burst to 100.
// Token Bucket mirrors the provider's own policy exactly.
app.MapPost("/api/notifications/sms", SendSmsHandler)
.RequireRateLimiting("token-bucket");
// ── Algorithm selection guide ─────────────────────────────────────────────
// Fixed Window → simple N-per-minute, boundary burst tolerable, lowest overhead
// Sliding Window → same shape, no boundary burst — recommended default
// Token Bucket → steady average throughput + explicit burst allowance
// Concurrency → cap simultaneous in-flight requests, not arrival rate
The Concurrency Limiter is the most misunderstood of the four. Developers reach for it thinking "I want to limit requests" and then configure a PermitLimit of 100 — only to discover that under low traffic it never triggers, because 100 requests are rarely in flight at the same instant. It is not a throughput limiter. It is a parallelism limiter. If your database connection pool is 50 connections and one endpoint holds a connection open for the full request duration, a Concurrency Limiter set to 40 ensures the rest of your application always has connections available, regardless of how fast requests arrive.
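The hold-and-release semantics are easy to see with the standalone ConcurrencyLimiter class: a permit is held until the lease is disposed, not until a window elapses. A minimal sketch with an illustrative PermitLimit of 2:

```csharp
using System;
using System.Threading.RateLimiting;

var limiter = new ConcurrencyLimiter(new ConcurrencyLimiterOptions
{
    PermitLimit = 2,  // at most 2 leases held at once
    QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
    QueueLimit = 0    // no queueing — reject while both permits are held
});

RateLimitLease first = limiter.AttemptAcquire();   // allowed
RateLimitLease second = limiter.AttemptAcquire();  // allowed
RateLimitLease third = limiter.AttemptAcquire();   // rejected — 2 already in flight
Console.WriteLine($"third acquired: {third.IsAcquired}");   // False

first.Dispose();  // the "request" completes — its permit returns to the pool
RateLimitLease fourth = limiter.AttemptAcquire();  // allowed again
Console.WriteLine($"fourth acquired: {fourth.IsAcquired}"); // True
```

This is exactly why it never fires under low traffic: no amount of elapsed time frees a permit, only the completion (disposal) of an in-flight request does.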
Per-Client Rate Limiting: Partition Keys & User Identity
A single shared counter means one abusive caller burns through the limit for every other client on the same endpoint. Partitioned rate limiting gives each client its own independent counter — identified by IP address, authenticated user ID, API key, subscription tier, or any value you extract from the request context. One bad actor gets throttled; everyone else continues unaffected.
The partition key is the single most consequential decision in your rate limiting design. Partition by IP for unauthenticated public endpoints. Partition by user ID for endpoints behind [Authorize]. Partition by API key or subscription tier for multi-tenant developer APIs. Get the key wrong and you either throttle entire offices sharing a corporate NAT IP address, or you give each individual user a separate counter when a shared resource limit was the actual requirement.
using System.Security.Claims;
using System.Threading.RateLimiting;
builder.Services.AddRateLimiter(options =>
{
options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;
// ── Per-IP partitioned Fixed Window ───────────────────────────────────
// Each unique IP address gets its own 60-request-per-minute counter.
// One abusive IP is throttled; all other IPs are unaffected.
options.AddPolicy("per-ip", httpContext =>
RateLimitPartition.GetFixedWindowLimiter(
partitionKey: httpContext.Connection.RemoteIpAddress?.ToString()
?? "unknown", // RemoteIpAddress can be null (e.g. in-memory test servers).
// Behind a reverse proxy it is the PROXY's address unless the
// forwarded-headers middleware is configured to run first.
factory: _ => new FixedWindowRateLimiterOptions
{
PermitLimit = 60,
Window = TimeSpan.FromSeconds(60),
QueueLimit = 0
}));
// ── Per-user partitioned Sliding Window ───────────────────────────────
// Authenticated users get a higher limit keyed to their user ID.
// Anonymous callers get a stricter limit keyed to their IP address.
// This requires UseRateLimiter to appear after UseAuthentication
// in the pipeline — otherwise httpContext.User is always anonymous.
options.AddPolicy("per-user", httpContext =>
{
var userId = httpContext.User
.FindFirstValue(ClaimTypes.NameIdentifier);
// Authenticated path: generous limit keyed by stable user ID
if (!string.IsNullOrEmpty(userId))
{
return RateLimitPartition.GetSlidingWindowLimiter(
partitionKey: $"user:{userId}",
factory: _ => new SlidingWindowRateLimiterOptions
{
PermitLimit = 300,
Window = TimeSpan.FromSeconds(60),
SegmentsPerWindow = 6,
QueueLimit = 0
});
}
// Anonymous fallback: strict limit keyed by IP
return RateLimitPartition.GetSlidingWindowLimiter(
partitionKey: $"anon:{httpContext.Connection.RemoteIpAddress?.ToString() ?? "unknown"}",
factory: _ => new SlidingWindowRateLimiterOptions
{
PermitLimit = 30,
Window = TimeSpan.FromSeconds(60),
SegmentsPerWindow = 6,
QueueLimit = 0
});
});
// ── Per-API-key partitioned Token Bucket ──────────────────────────────
// For multi-tenant developer APIs — each API key gets its own token bucket.
// Extend this to implement subscription tiers: parse a "tier" claim
// from the token and return different TokenBucketRateLimiterOptions
// depending on whether the caller is on a Free, Pro, or Enterprise plan.
options.AddPolicy("per-apikey", httpContext =>
{
var apiKey = httpContext.Request.Headers["X-Api-Key"]
.FirstOrDefault()
?? httpContext.Connection.RemoteIpAddress?.ToString()
?? "unknown";
return RateLimitPartition.GetTokenBucketLimiter(
partitionKey: $"apikey:{apiKey}",
factory: _ => new TokenBucketRateLimiterOptions
{
TokenLimit = 200,
ReplenishmentPeriod = TimeSpan.FromSeconds(10),
TokensPerPeriod = 40,
AutoReplenishment = true,
QueueLimit = 0
});
});
});
// ── Bind partitioned policies to endpoints ────────────────────────────────
// Public unauthenticated endpoint — per-IP is the right key
app.MapGet("/api/catalogue", GetCatalogueHandler)
.RequireRateLimiting("per-ip");
// Authenticated endpoint — per authenticated user
app.MapPost("/api/orders", CreateOrderHandler)
.RequireRateLimiting("per-user");
// Developer API — per API key with token bucket burst control
app.MapGet("/api/v1/data", GetDataHandler)
.RequireRateLimiting("per-apikey");
// ── Optional: GlobalLimiter as a coarse backstop ──────────────────────────
// Applies to EVERY request, before any endpoint-specific limiter runs —
// an endpoint with a named policy must pass both the global limiter and
// its own policy. Endpoints marked DisableRateLimiting are exempt from both.
// (Set inside the AddRateLimiter call, alongside the named policies.)
//
// options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext>(ctx =>
// RateLimitPartition.GetFixedWindowLimiter(
// partitionKey: ctx.Connection.RemoteIpAddress?.ToString() ?? "global",
// factory: _ => new FixedWindowRateLimiterOptions
// {
// PermitLimit = 1000,
// Window = TimeSpan.FromSeconds(60),
// QueueLimit = 0
// }));
Always prefix your partition keys with a namespace string — "user:", "anon:", "apikey:" — to prevent accidental key collisions between policies that happen to share a backing store. A user ID of "42" and an IP-derived key of "42" should not share a counter. The prefix costs nothing and eliminates an entire class of subtle, hard-to-reproduce throttling bugs in deployments where policies share a store.
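Partition isolation can be verified with PartitionedRateLimiter.Create, the same primitive the middleware builds on. The sketch below keys partitions by plain strings to stay self-contained; the limit of 2 and the key names are illustrative:

```csharp
using System;
using System.Threading.RateLimiting;

// Each distinct string key gets its own independent fixed-window counter.
PartitionedRateLimiter<string> limiter = PartitionedRateLimiter.Create<string, string>(
    key => RateLimitPartition.GetFixedWindowLimiter(
        partitionKey: key,
        factory: _ => new FixedWindowRateLimiterOptions
        {
            PermitLimit = 2,
            Window = TimeSpan.FromMinutes(1),
            QueueLimit = 0
        }));

limiter.AttemptAcquire("user:42");  // allowed
limiter.AttemptAcquire("user:42");  // allowed — counter for "user:42" now full
bool userAgain = limiter.AttemptAcquire("user:42").IsAcquired;  // false
bool anon = limiter.AttemptAcquire("anon:42").IsAcquired;       // true — separate counter

Console.WriteLine($"user:42 throttled: {!userAgain}, anon:42 unaffected: {anon}");
```

Without the "user:" and "anon:" prefixes, both callers would have hashed to the same "42" partition and shared a single counter — the collision the closing paragraph above warns about.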
The 429 Response: Retry-After Headers & Structured Error Bodies
A bare 429 status code with no body and no headers is worse than useless in production. It tells the client they were rejected but gives them no information about when to retry, what policy they hit, or how to adjust their behaviour. The result is client developers implementing arbitrary exponential backoff guesses, support tickets from API consumers wondering why requests silently fail, and operations dashboards with no visibility into throttling events.
The OnRejected callback in AddRateLimiter is where you fix all of this at once: a structured RFC 9457 Problem Details body, a machine-readable Retry-After header, and a structured log event — without logging any sensitive values from the request.
builder.Services.AddRateLimiter(options =>
{
options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;
// ── OnRejected: runs every time a request is denied a permit ─────────
// This callback is shared across ALL named policies registered in this
// AddRateLimiter call. If you need policy-specific rejection bodies,
// read the policy name from httpContext.GetEndpoint()?.Metadata.
options.OnRejected = async (context, cancellationToken) =>
{
var httpContext = context.HttpContext;
var response = httpContext.Response;
response.StatusCode = StatusCodes.Status429TooManyRequests;
response.ContentType = "application/problem+json";
// ── Retry-After header ────────────────────────────────────────────
// The lease exposes RetryAfter metadata when the algorithm can
// calculate it — the window limiters and Token Bucket can; the
// Concurrency Limiter has no time dimension, so it provides none.
int? retryAfterSeconds = null;
if (context.Lease.TryGetMetadata(MetadataName.RetryAfter, out var retryAfter)
&& retryAfter > TimeSpan.Zero)
{
retryAfterSeconds = (int)retryAfter.TotalSeconds;
// Retry-After header value: integer seconds (RFC 7231)
response.Headers["Retry-After"] = retryAfterSeconds.Value.ToString();
}
// ── RFC 9457 Problem Details response body ────────────────────────
await response.WriteAsJsonAsync(new
{
type = "https://tools.ietf.org/html/rfc6585#section-4",
title = "Too Many Requests",
status = 429,
detail = "You have exceeded the allowed request rate. "
+ "Please wait before retrying.",
retryAfterSeconds
// Do NOT include the partition key or any request header values here —
// API keys, IPs, and user IDs should not appear in error response bodies.
}, cancellationToken: cancellationToken);
// ── Structured log event ──────────────────────────────────────────
// Log the rejection for observability — method, path, and IP only.
// Never log: Authorization headers, X-Api-Key values, query strings
// that might contain tokens, or cookie values.
var logger = httpContext.RequestServices
.GetRequiredService<ILoggerFactory>()
.CreateLogger("RateLimiting");
logger.LogWarning(
"Rate limit exceeded: {Method} {Path} from {RemoteIp} — RetryAfter {RetryAfterSeconds}s",
httpContext.Request.Method,
httpContext.Request.Path,
httpContext.Connection.RemoteIpAddress,
retryAfterSeconds?.ToString() ?? "unknown");
};
// Register your named policies below (fixed, sliding, per-user, etc.)
});
// ── What the client receives ──────────────────────────────────────────────
//
// HTTP/1.1 429 Too Many Requests
// Content-Type: application/problem+json
// Retry-After: 37
//
// {
// "type": "https://tools.ietf.org/html/rfc6585#section-4",
// "title": "Too Many Requests",
// "status": 429,
// "detail": "You have exceeded the allowed request rate. Please wait before retrying.",
// "retryAfterSeconds": 37
// }
//
// The Retry-After header is machine-readable — retry layers such as
// Polly in .NET and urllib3's Retry in Python can be configured to
// honour it and implement compliant back-off behaviour.
// The JSON body follows RFC 9457 for structured error handling in client code.
The OnRejected callback is not optional in a production API. It is the contract between your rate limiter and every client developer who integrates with your API. A callback that sets the status code, writes a structured body, and emits a Retry-After header takes fifteen minutes to implement and eliminates an entire category of integration friction — clients that retry immediately on 429, causing a retry storm, are almost always doing so because your API gave them no other guidance.
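On the client side, the header round-trips through HttpClient's typed headers: response.Headers.RetryAfter is a RetryConditionHeaderValue whose Delta property carries the delay. A hedged sketch of the kind of helper a well-behaved client might use (the GetRetryDelay name and the 5-second fallback are assumptions for illustration, not part of any API):

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical helper: turn a 429 response into a wait duration.
static TimeSpan GetRetryDelay(HttpResponseMessage response, TimeSpan fallback)
{
    // Retry-After can be delta-seconds (Delta) or an HTTP-date (Date).
    RetryConditionHeaderValue? retryAfter = response.Headers.RetryAfter;
    if (retryAfter?.Delta is TimeSpan delta && delta > TimeSpan.Zero)
        return delta;
    if (retryAfter?.Date is DateTimeOffset date)
    {
        TimeSpan untilDate = date - DateTimeOffset.UtcNow;
        if (untilDate > TimeSpan.Zero) return untilDate;
    }
    return fallback; // server gave no usable hint — back off conservatively
}

// Simulated 429 carrying "Retry-After: 37", as the OnRejected callback produces
var rejected = new HttpResponseMessage(HttpStatusCode.TooManyRequests);
rejected.Headers.RetryAfter = new RetryConditionHeaderValue(TimeSpan.FromSeconds(37));
Console.WriteLine(GetRetryDelay(rejected, TimeSpan.FromSeconds(5))); // 00:00:37
```

The fallback branch matters precisely because, as noted above, not every algorithm supplies RetryAfter metadata — a client must tolerate a 429 with no header at all.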