Why Your Work Queue Needs a Capacity Limit
System.Threading.Channels is the right in-process work queue primitive in .NET 8. It is fast, allocation-efficient, async-native, and composes cleanly with BackgroundService consumers. It also offers an unbounded mode — Channel.CreateUnbounded&lt;T&gt;() — that will silently consume all available memory when producers outrun consumers, and a bounded mode — Channel.CreateBounded&lt;T&gt;() — that most teams reach for without fully understanding what happens when the channel is full.
The gap between a channel that works in testing and one that survives production comes down to three decisions: what capacity limit is correct for the workload, what the channel does when it reaches that limit (backpressure strategy), and what happens when a job in the queue fails repeatedly without being acknowledged. Get these three decisions right and a Channel&lt;T&gt; is a robust, observable, production-grade work queue. Get them wrong and you have either an unbounded memory consumer, a queue that silently drops work under load, or a single poison job that stalls every consumer worker until the process is restarted.
Bounded vs Unbounded: The Memory Blowup Risk
An unbounded channel has no capacity limit. Items written to it are held in memory until a consumer reads them. If your producer writes faster than your consumer processes — a burst of incoming requests, a slow downstream dependency, a consumer crash — the channel buffer grows indefinitely. There is no backpressure signal. No error. The process eventually runs out of memory and is killed by the OS or container runtime, taking all queued work with it. Unbounded channels are appropriate only when the producer rate is provably bounded by the consumer rate — which is rarely true under production load.
// ════════════════════════════════════════════════════════════════════════════
// UNBOUNDED CHANNEL — memory blowup risk under producer/consumer rate mismatch
// ════════════════════════════════════════════════════════════════════════════
// WRONG for production workloads where producers can outpace consumers:
var unbounded = Channel.CreateUnbounded<JobItem>();
// No capacity limit. If 10,000 requests arrive during a downstream slowdown,
// 10,000 JobItems accumulate in memory. The process is OOM-killed.
// No error is surfaced. No backpressure reaches the HTTP caller.
// The pod is restarted. All 10,000 jobs are lost.
// ════════════════════════════════════════════════════════════════════════════
// BOUNDED CHANNEL — capacity limit with explicit full-mode strategy
// ════════════════════════════════════════════════════════════════════════════
// The three options for BoundedChannelFullMode:
//
// Wait — WriteAsync blocks until a slot is free.
// Backpressure propagates up the call stack to the producer.
// Correct when the producer can safely slow down.
//
// DropWrite — The incoming item is silently discarded.
// WriteAsync returns immediately with result false.
// Correct for best-effort, non-critical work (telemetry, audit logs).
//
// DropOldest — The oldest item in the channel is discarded to make room.
// Correct when newer work supersedes older work (live sensor readings,
// price tick updates where stale data is worthless).
// ── Production default: Wait with concurrency-safe reader/writer flags ──────
var jobChannel = Channel.CreateBounded<JobItem>(new BoundedChannelOptions(capacity: 500)
{
FullMode = BoundedChannelFullMode.Wait,
SingleWriter = false, // multiple HTTP request handlers enqueue concurrently
SingleReader = false, // multiple consumer workers dequeue concurrently
AllowSynchronousContinuations = false
// AllowSynchronousContinuations = false ensures continuations run on the thread pool,
// not inline on the writer's thread — prevents unexpected re-entrancy and stack overflows
// under high write throughput. Safe default. Set true only if profiling shows benefit.
});
// ── Registered as a singleton in DI so producers and consumers share the same instance:
builder.Services.AddSingleton(jobChannel);
builder.Services.AddSingleton(jobChannel.Reader); // consumers inject ChannelReader
builder.Services.AddSingleton(jobChannel.Writer); // producers inject ChannelWriter
// Separate Reader/Writer registration enforces the direction contract at the type level —
// a consumer that only needs to read cannot accidentally write, and vice versa.
// ════════════════════════════════════════════════════════════════════════════
// CAPACITY SIZING HEURISTIC
// ════════════════════════════════════════════════════════════════════════════
//
// Start with: burst_rate_per_second × burst_duration_seconds × 2
//
// Example: API receives up to 50 requests/second, each enqueuing one job.
// Consumer processes one job per 100ms with 4 concurrent workers
// → throughput: 40 jobs/second.
// A 10-second downstream slowdown produces 500 queued jobs.
// Capacity = 500 × 2 (headroom) = 1000.
//
// Monitor Channel.Reader.Count in production.
// Consistently near capacity → increase consumer concurrency or capacity.
// Consistently near zero → capacity is oversized for the workload.
The SingleWriter and SingleReader flags are optimisation hints that allow the channel implementation to skip certain thread-safety mechanisms when only one producer or consumer is active. Setting SingleWriter = true when multiple HTTP request handlers write concurrently is a correctness bug — concurrent writes without the thread-safety path produce corrupted channel state. Use the flags only when you can statically guarantee a single writer or reader. The default of false for both is always correct; the flags are opt-in performance optimisations for specific topologies, not defaults to override casually.
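When the topology does guarantee a single consumer — for example, exactly one long-lived loop draining the channel — the hint is safe to enable. A minimal self-contained sketch (names and capacity are illustrative, not from the text above):

```csharp
using System.Threading.Channels;

// Many concurrent writers, but statically exactly one reader loop:
// SingleReader = true is a valid optimisation hint here.
var channel = Channel.CreateBounded<string>(new BoundedChannelOptions(capacity: 100)
{
    FullMode = BoundedChannelFullMode.Wait,
    SingleWriter = false, // request handlers may still write concurrently
    SingleReader = true   // guaranteed: only the loop below ever reads
});

// The one and only reader — a single long-lived task.
var consumer = Task.Run(async () =>
{
    await foreach (var item in channel.Reader.ReadAllAsync())
    {
        Console.WriteLine($"processed {item}");
    }
});

await channel.Writer.WriteAsync("job-1");
channel.Writer.Complete(); // close the channel so ReadAllAsync completes
await consumer;            // drains remaining items, then exits
```

The guarantee must be structural — one reader task created in one place — not merely "we currently happen to start one worker"; a config change that adds a second consumer silently breaks the contract.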
Backpressure: Propagating the Signal Back to the Producer
Backpressure is what happens when a bounded channel is full and the producer tries to write. With FullMode.Wait, the WriteAsync call suspends until a slot opens — this is backpressure working correctly. The suspension propagates up the call stack: the HTTP handler awaiting the write waits, the ASP.NET Core thread that was handling the request waits, and the client connection is held open. In effect, the queue's capacity limit becomes a load-shedding signal that naturally throttles incoming request rate when the system is under load.
// ════════════════════════════════════════════════════════════════════════════
// PRODUCER PATTERN: FullMode.Wait — backpressure to HTTP caller
// ════════════════════════════════════════════════════════════════════════════
[ApiController]
[Route("api/jobs")]
public class JobController : ControllerBase
{
    private readonly ChannelWriter<JobItem> _writer;
    private readonly ILogger<JobController> _logger;
    public JobController(ChannelWriter<JobItem> writer, ILogger<JobController> logger)
        => (_writer, _logger) = (writer, logger);
[HttpPost]
    public async Task<IActionResult> EnqueueJob(
[FromBody] EnqueueJobRequest request,
CancellationToken cancellationToken)
{
var job = new JobItem
{
Id = Guid.NewGuid(),
Payload = request.Payload,
EnqueuedAt = DateTimeOffset.UtcNow,
Attempts = 0
};
// ── FullMode.Wait: WriteAsync suspends when channel is full ───────
// The HTTP request is held open. The client waits.
// When a consumer processes an item and frees a slot, WriteAsync completes.
        // cancellationToken: client disconnect or request timeout aborts the wait —
        // the job is never enqueued, so no orphaned work is created — correct behaviour.
try
{
await _writer.WriteAsync(job, cancellationToken);
}
catch (OperationCanceledException)
{
// Client disconnected or request timed out while waiting for a channel slot.
// The job was never enqueued — no orphaned work.
_logger.LogWarning("Job enqueue cancelled — channel full and client disconnected.");
return StatusCode(503, "Service temporarily at capacity. Retry after backoff.");
}
catch (ChannelClosedException)
{
// Channel was closed (application shutting down) — reject new work.
_logger.LogWarning("Job enqueue rejected — channel closed (shutdown in progress).");
return StatusCode(503, "Service shutting down. Retry after restart.");
}
_logger.LogInformation("Job {Id} enqueued.", job.Id);
return Accepted(new { job.Id });
}
}
// ════════════════════════════════════════════════════════════════════════════
// PRODUCER PATTERN: TryWrite — non-blocking with explicit drop handling
// ════════════════════════════════════════════════════════════════════════════
// Use when the producer cannot afford to wait — telemetry, audit events,
// fire-and-forget notifications where dropping under load is acceptable.
public class TelemetryPublisher
{
    private readonly ChannelWriter<TelemetryEvent> _writer;
    private readonly ILogger<TelemetryPublisher> _logger;
    private long _droppedCount;
    public TelemetryPublisher(ChannelWriter<TelemetryEvent> writer, ILogger<TelemetryPublisher> logger)
        => (_writer, _logger) = (writer, logger);
public void Publish(TelemetryEvent evt)
{
if (!_writer.TryWrite(evt))
{
// Channel is full — event is dropped.
// Increment counter for observability — surface via metrics endpoint.
var dropped = Interlocked.Increment(ref _droppedCount);
// Log at Warning, not Error — dropping telemetry is expected under load.
// Log periodically (not every drop) to avoid log flood.
if (dropped % 100 == 0)
{
_logger.LogWarning(
"Telemetry channel full. {Count} events dropped since startup.", dropped);
}
}
}
// Expose for metrics scraping (Prometheus, OpenTelemetry):
public long DroppedEventCount => Interlocked.Read(ref _droppedCount);
}
// ════════════════════════════════════════════════════════════════════════════
// BACKPRESSURE STRATEGY DECISION REFERENCE
// ════════════════════════════════════════════════════════════════════════════
//
// Work type FullMode Rationale
// ───────────────────────────────── ──────────────── ──────────────────────────────
// HTTP-initiated jobs (critical) Wait Backpressure to caller; 503 on
// timeout — correct load shedding
// Background polling tasks Wait Polling rate naturally throttled
// Telemetry / audit events DropWrite Dropping acceptable; never block
// Live data (sensor, price tick) DropOldest Newest data always preferred
// Email / notification dispatch Wait Losing messages unacceptable
The 503 response on OperationCanceledException from a full channel is not an error — it is the correct API contract under load. A client that receives a 503 with a Retry-After header knows the service is temporarily at capacity and can implement exponential backoff. A client whose request is held open indefinitely while the queue drains is holding a connection and a server thread for an unpredictable duration. Prefer a fast, honest 503 over a slow, silent wait when the queue is full and the client has a timeout shorter than your expected drain time. Configure your HTTP client's timeout on the caller side accordingly.
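To make the 503 actionable, the catch block in the controller above can set a Retry-After header before returning. A sketch of that extension — the 5-second value is an assumed placeholder, tune it to your expected drain time:

```csharp
catch (OperationCanceledException)
{
    _logger.LogWarning("Job enqueue cancelled — channel full and client disconnected.");
    // Retry-After tells well-behaved clients when to attempt the request again.
    // "5" seconds is illustrative — derive it from observed queue drain rate.
    Response.Headers["Retry-After"] = "5";
    return StatusCode(503, "Service temporarily at capacity. Retry after backoff.");
}
```

Clients that honour Retry-After get a concrete backoff hint instead of inventing their own; clients that ignore it lose nothing.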
Poison Job Handling: Isolation, Retry Limits & Dead-Letter
A poison job is one that fails every processing attempt — due to corrupt data, a bug in the handler, an incompatible payload shape, or a transient external dependency that has become permanently unavailable for that specific job. Without explicit poison job handling, the consumer retries the job indefinitely, consuming processing capacity while blocking or slowing the processing of every healthy job behind it in the queue. The correct pattern catches failures at the per-job level, tracks attempt counts in the job envelope, and routes jobs that exceed the retry threshold to a dead-letter channel where they can be inspected, requeued, or discarded without affecting the main processing pipeline.
// ════════════════════════════════════════════════════════════════════════════
// JOB ENVELOPE: carries retry state alongside the payload
// ════════════════════════════════════════════════════════════════════════════
public sealed record JobItem
{
public Guid Id { get; init; } = Guid.NewGuid();
public string Payload { get; init; } = "";
public DateTimeOffset EnqueuedAt { get; init; } = DateTimeOffset.UtcNow;
public int Attempts { get; init; } = 0;
public string? LastError { get; init; }
public DateTimeOffset? LastFailedAt { get; init; }
// Immutable retry: create a new envelope with incremented state
public JobItem WithFailure(Exception ex) => this with
{
Attempts = Attempts + 1,
LastError = ex.Message,
LastFailedAt = DateTimeOffset.UtcNow
};
}
// ════════════════════════════════════════════════════════════════════════════
// CONSUMER WITH POISON JOB ISOLATION
// ════════════════════════════════════════════════════════════════════════════
public class JobConsumerWorker : BackgroundService
{
private const int MaxAttempts = 3;
    private readonly ChannelReader<JobItem> _reader;
    private readonly ChannelWriter<JobItem> _writer; // re-enqueue for retry
    private readonly ChannelWriter<JobItem> _deadLetterWriter;
    private readonly IJobProcessor _processor;
    private readonly ILogger<JobConsumerWorker> _logger;
    public JobConsumerWorker(
        ChannelReader<JobItem> reader,
        ChannelWriter<JobItem> writer,
        // Keyed (.NET 8) so DI can tell this apart from the main ChannelWriter<JobItem>,
        // which is registered unkeyed — both channels carry JobItem.
        [FromKeyedServices("dead-letter")] ChannelWriter<JobItem> deadLetterWriter,
        IJobProcessor processor,
        ILogger<JobConsumerWorker> logger)
{
(_reader, _writer, _deadLetterWriter, _processor, _logger)
= (reader, writer, deadLetterWriter, processor, logger);
}
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
// ReadAllAsync yields items as they arrive — no polling, no busy-wait.
// Completes when the channel is closed AND empty (graceful shutdown path).
await foreach (var job in _reader.ReadAllAsync(stoppingToken))
{
await ProcessJobSafelyAsync(job, stoppingToken);
}
}
private async Task ProcessJobSafelyAsync(JobItem job, CancellationToken ct)
{
try
{
await _processor.ProcessAsync(job, ct);
_logger.LogInformation("Job {Id} processed successfully.", job.Id);
}
catch (OperationCanceledException)
{
// Host shutting down — do not retry, do not dead-letter.
// The job will be redelivered if the producer supports replay.
throw;
}
catch (Exception ex)
{
var failed = job.WithFailure(ex);
if (failed.Attempts >= MaxAttempts)
{
// ── Poison job: move to dead-letter channel ───────────────
_logger.LogError(ex,
"Job {Id} failed {Attempts} times. Moving to dead-letter.",
failed.Id, failed.Attempts);
                // TryWrite to dead-letter: if the dead-letter channel is also full,
                // log and discard rather than blocking the consumer loop.
if (!_deadLetterWriter.TryWrite(failed))
{
_logger.LogCritical(
"Dead-letter channel full. Job {Id} discarded permanently.",
failed.Id);
}
}
else
{
// ── Transient failure: re-enqueue with backoff delay ──────
var delay = TimeSpan.FromSeconds(Math.Pow(2, failed.Attempts));
_logger.LogWarning(ex,
"Job {Id} failed (attempt {Attempts}/{Max}). Retrying in {Delay}s.",
failed.Id, failed.Attempts, MaxAttempts, delay.TotalSeconds);
                // Delay before re-enqueue — pass ct so shutdown aborts the wait.
                // Note: this inline delay stalls this worker for the whole backoff;
                // with few workers, a scheduled-retry mechanism avoids that stall.
                await Task.Delay(delay, ct);
// Re-enqueue the failed job with updated attempt count.
// If the main channel is full, TryWrite returns false — log the loss.
if (!_writer.TryWrite(failed))
{
_logger.LogError(
"Main channel full during retry re-enqueue. Job {Id} lost.",
failed.Id);
}
}
}
}
}
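The ChannelClosedException path in the producer and the "closed AND empty" completion of ReadAllAsync both depend on something actually closing the channel at shutdown. One way to wire that — a sketch; the hosted-service name is hypothetical:

```csharp
// Completes the writer when the host begins stopping, so that:
//   - producers get ChannelClosedException and reject new work with 503, and
//   - ReadAllAsync in each consumer drains the remaining items, then completes.
public sealed class ChannelShutdownService : IHostedService
{
    private readonly ChannelWriter<JobItem> _writer;
    private readonly IHostApplicationLifetime _lifetime;

    public ChannelShutdownService(ChannelWriter<JobItem> writer, IHostApplicationLifetime lifetime)
        => (_writer, _lifetime) = (writer, lifetime);

    public Task StartAsync(CancellationToken cancellationToken)
    {
        // TryComplete is idempotent — safe even if another path closes the channel first.
        _lifetime.ApplicationStopping.Register(() => _writer.TryComplete());
        return Task.CompletedTask;
    }

    public Task StopAsync(CancellationToken cancellationToken) => Task.CompletedTask;
}

// Registration: builder.Services.AddHostedService<ChannelShutdownService>();
```

Pair this with a host shutdown timeout long enough for the consumers to drain the buffered items, or the remaining jobs are lost when the process exits.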
// ════════════════════════════════════════════════════════════════════════════
// DEAD-LETTER CHANNEL: registration & inspection worker
// ════════════════════════════════════════════════════════════════════════════
// Register a separate bounded channel for dead-lettered jobs
var deadLetterChannel = Channel.CreateBounded<JobItem>(new BoundedChannelOptions(100)
{
FullMode = BoundedChannelFullMode.DropWrite // drop rather than block the consumer
});
// Keyed registrations (.NET 8): both channels carry JobItem, so plain AddSingleton
// would collide with the main channel's ChannelWriter<JobItem>/ChannelReader<JobItem>.
builder.Services.AddKeyedSingleton("dead-letter", deadLetterChannel.Writer);
builder.Services.AddKeyedSingleton("dead-letter", deadLetterChannel.Reader);
// Dead-letter inspection worker — logs, persists to DB, or alerts on failures
public class DeadLetterWorker : BackgroundService
{
    private readonly ChannelReader<JobItem> _reader;
    private readonly IDeadLetterRepository _repository;
    private readonly ILogger<DeadLetterWorker> _logger;
    public DeadLetterWorker(
        // Keyed: inject the dead-letter channel's reader, not the main channel's.
        [FromKeyedServices("dead-letter")] ChannelReader<JobItem> reader,
        IDeadLetterRepository repository,
        ILogger<DeadLetterWorker> logger)
        => (_reader, _repository, _logger) = (reader, repository, logger);
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
await foreach (var job in _reader.ReadAllAsync(stoppingToken))
{
_logger.LogError(
"Dead-letter job {Id} — {Attempts} attempts, last error: {Error}",
job.Id, job.Attempts, job.LastError);
// Persist for manual inspection / requeue tooling
await _repository.SaveAsync(job, stoppingToken);
}
}
}
The TryWrite call on the dead-letter channel when it is full — and the explicit log of the loss — is the honest design choice in a bounded in-process queue. Unlike a durable message broker, there is no guaranteed delivery mechanism. If both the main channel and the dead-letter channel are full simultaneously, the job is lost and the only record is a log entry. This is the correct trade-off to acknowledge in code and in documentation: in-process channels trade durability for simplicity and speed. Teams with zero-loss requirements for failed jobs should persist to a database or route to a durable broker at the dead-letter stage, not rely on the in-process dead-letter channel as their only failure record.
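For teams that do need zero loss at the dead-letter stage, the TryWrite fallback can persist directly to storage instead of logging and discarding. A sketch of that variant of the consumer's poison-job branch, assuming an injected `_deadLetterRepository` field of the IDeadLetterRepository type used by the worker above (the field itself is hypothetical, not shown in the original consumer):

```csharp
// Fallback when the in-process dead-letter channel is full: write straight to
// durable storage rather than discarding. After this change, the only true loss
// path is the storage write itself failing.
if (!_deadLetterWriter.TryWrite(failed))
{
    try
    {
        await _deadLetterRepository.SaveAsync(failed, CancellationToken.None);
        _logger.LogWarning(
            "Dead-letter channel full. Job {Id} persisted directly to storage.", failed.Id);
    }
    catch (Exception persistEx)
    {
        _logger.LogCritical(persistEx,
            "Dead-letter channel full and persistence failed. Job {Id} lost.", failed.Id);
    }
}
```

CancellationToken.None is deliberate: a shutdown in progress should not abort the last-resort persistence of a failed job.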
Deduplication & Observability: Knowing What the Queue Is Doing
Two production concerns that are easy to defer until they become incidents: deduplication — preventing the same job from being enqueued multiple times when a retry-heavy producer fires repeatedly — and observability — knowing in real time how full the channel is, how many jobs have been processed, dropped, or dead-lettered, and whether consumers are keeping up. Neither requires complex infrastructure — both can be implemented against the Channel<T> API directly.
// ════════════════════════════════════════════════════════════════════════════
// DEDUPLICATION: idempotent enqueue with in-memory tracking set
// ════════════════════════════════════════════════════════════════════════════
// Suitable when: the dedup window fits in memory, and approximate dedup is
// acceptable (the tracking set is lost on restart).
// For exact dedup across restarts: check a Redis SET or database unique index
// before enqueuing, and remove the entry after successful processing.
public sealed class DeduplicatingJobQueue
{
    private readonly ChannelWriter<JobItem> _writer;
    private readonly ConcurrentDictionary<string, byte> _inflight;
    private readonly ILogger<DeduplicatingJobQueue> _logger;
    public DeduplicatingJobQueue(
        ChannelWriter<JobItem> writer,
        ILogger<DeduplicatingJobQueue> logger)
    {
        _writer = writer;
        _inflight = new ConcurrentDictionary<string, byte>(StringComparer.Ordinal);
_logger = logger;
}
    public async ValueTask<bool> TryEnqueueAsync(
        JobItem job,
        CancellationToken ct = default)
{
// Dedup key: use a natural business key (e.g., "report:user-123:2026-02-27")
// rather than the random Job.Id — Id-based dedup would never deduplicate.
var dedupKey = job.Payload; // replace with job.BusinessKey in real usage
if (!_inflight.TryAdd(dedupKey, 0))
{
_logger.LogDebug("Job {Key} already queued — duplicate suppressed.", dedupKey);
return false; // duplicate — not enqueued
}
try
{
await _writer.WriteAsync(job, ct);
return true;
}
catch
{
// Enqueue failed — remove from tracking so it can be retried
_inflight.TryRemove(dedupKey, out _);
throw;
}
}
// Called by the consumer after successful processing:
public void MarkComplete(string dedupKey)
=> _inflight.TryRemove(dedupKey, out _);
}
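Wiring sketch for the class above — the producer enqueues through the dedup wrapper, and the consumer releases the key only after successful processing. These are illustrative fragments: `dedupQueue`, `_processor`, `job`, and `ct` stand in for the instances in your producer and consumer:

```csharp
// Producer side: enqueue through the dedup wrapper.
var enqueued = await dedupQueue.TryEnqueueAsync(
    new JobItem { Payload = "report:user-123:2026-02-27" });
if (!enqueued)
{
    // Duplicate suppressed — still safe to return 202 to the caller:
    // the work is already queued or in flight.
}

// Consumer side: release the dedup key only after successful processing,
// so a duplicate of the same key arriving mid-flight is still suppressed.
await _processor.ProcessAsync(job, ct);
dedupQueue.MarkComplete(job.Payload); // must match the key the producer used
```

The ordering matters: calling MarkComplete before processing completes reopens the dedup window while the job is still in flight, which defeats the purpose.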
// ════════════════════════════════════════════════════════════════════════════
// OBSERVABILITY: channel metrics via IHostedService + Channel.Reader.Count
// ════════════════════════════════════════════════════════════════════════════
public class ChannelMetricsWorker : BackgroundService
{
    private readonly ChannelReader<JobItem> _reader;
    private readonly IMeterFactory _meterFactory;
    private readonly ILogger<ChannelMetricsWorker> _logger;
    // OpenTelemetry / System.Diagnostics.Metrics instruments
    private readonly ObservableGauge<int> _depthGauge;
    private readonly Counter<long> _processedCounter;
    private readonly Counter<long> _droppedCounter;
    public ChannelMetricsWorker(
        ChannelReader<JobItem> reader,
        IMeterFactory meterFactory,
        ILogger<ChannelMetricsWorker> logger)
{
_reader = reader;
_meterFactory = meterFactory;
_logger = logger;
var meter = meterFactory.Create("MyApp.JobQueue");
// Observable gauge: current depth — sampled when metrics are collected
_depthGauge = meter.CreateObservableGauge(
name: "job_queue_depth",
observeValue: () => _reader.Count,
unit: "{jobs}",
description: "Number of jobs currently waiting in the channel buffer.");
_processedCounter = meter.CreateCounter(
name: "job_queue_processed_total",
unit: "{jobs}",
description: "Total jobs successfully processed.");
_droppedCounter = meter.CreateCounter(
name: "job_queue_dropped_total",
unit: "{jobs}",
description: "Total jobs dropped due to full channel or dead-letter overflow.");
}
// Increment these from the consumer and producer respectively:
public void RecordProcessed() => _processedCounter.Add(1);
public void RecordDropped() => _droppedCounter.Add(1);
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
// Periodic depth logging as a fallback when metrics scraping is unavailable
using var timer = new PeriodicTimer(TimeSpan.FromSeconds(30));
while (await timer.WaitForNextTickAsync(stoppingToken))
{
var depth = _reader.Count;
if (depth > 0)
{
_logger.LogInformation("Job queue depth: {Depth} items pending.", depth);
}
}
}
}
// ── Alerting thresholds (configure in your metrics / alerting platform) ───
//
// job_queue_depth > capacity * 0.80 → Warning: consumers falling behind
// job_queue_depth == capacity → Critical: channel at limit, backpressure active
// job_queue_dropped_total rate > 0 → Warning: jobs being dropped (DropWrite mode)
// job_queue_processed_total rate → 0 → Critical: consumers have stopped processing
The ChannelReader&lt;T&gt;.Count property — available whenever CanCount is true, which it is for bounded channels — is the single most useful production signal for a work queue: it tells you at any instant how many items are waiting to be processed. An ObservableGauge over this value, scraped by Prometheus or emitted to OpenTelemetry, gives you a depth time series that shows queue buildup during consumer slowdowns, drain patterns after they recover, and the steady-state operating depth under normal load. Set an alert threshold at 80% of channel capacity — not 100%, because by the time a channel hits 100% it is already applying backpressure to producers, and you want to know before that happens so you can scale consumers or investigate the slowdown.