.NET 8 Channel-Based Queue: Bounded Capacity, Backpressure & Poison Job Handling

Why Your Work Queue Needs a Capacity Limit

System.Threading.Channels is the correct in-process work queue primitive in .NET 8. It is fast, allocation-efficient, async-native, and composes cleanly with BackgroundService consumers. It also has a default mode — Channel.CreateUnbounded<T>() — that will silently consume all available memory when producers outrun consumers, and a bounded mode — Channel.CreateBounded<T>() — that most teams reach for without fully understanding what happens when the channel is full.

The gap between a channel that works in testing and one that survives production comes down to three decisions: what capacity limit is correct for the workload, what the channel does when it reaches that limit (backpressure strategy), and what happens when a job in the queue fails repeatedly without being acknowledged. Get these three decisions right and a Channel<T> is a robust, observable, production-grade work queue. Get them wrong and you have either an unbounded memory consumer, a queue that silently drops work under load, or a single poison job that stalls every consumer worker until the process is restarted.

Bounded vs Unbounded: The Memory Blowup Risk

An unbounded channel has no capacity limit. Items written to it are held in memory until a consumer reads them. If your producer writes faster than your consumer processes — a burst of incoming requests, a slow downstream dependency, a consumer crash — the channel buffer grows indefinitely. There is no backpressure signal. No error. The process eventually runs out of memory and is killed by the OS or container runtime, taking all queued work with it. Unbounded channels are appropriate only when the producer rate is provably bounded by the consumer rate — which is rarely true under production load.

ChannelCreation.cs — Unbounded Risk vs Bounded Configuration
// ════════════════════════════════════════════════════════════════════════════
// UNBOUNDED CHANNEL — memory blowup risk under producer/consumer rate mismatch
// ════════════════════════════════════════════════════════════════════════════

// WRONG for production workloads where producers can outpace consumers:
var unbounded = Channel.CreateUnbounded<JobItem>();
// No capacity limit. If 10,000 requests arrive during a downstream slowdown,
// 10,000 JobItems accumulate in memory. The process is OOM-killed.
// No error is surfaced. No backpressure reaches the HTTP caller.
// The pod is restarted. All 10,000 jobs are lost.


// ════════════════════════════════════════════════════════════════════════════
// BOUNDED CHANNEL — capacity limit with explicit full-mode strategy
// ════════════════════════════════════════════════════════════════════════════

// The three options for BoundedChannelFullMode:
//
// Wait        — WriteAsync blocks until a slot is free.
//               Backpressure propagates up the call stack to the producer.
//               Correct when the producer can safely slow down.
//
// DropWrite   — The incoming item is silently discarded.
//               WriteAsync returns immediately with result false.
//               Correct for best-effort, non-critical work (telemetry, audit logs).
//
// DropOldest  — The oldest item in the channel is discarded to make room.
//               Correct when newer work supersedes older work (live sensor readings,
//               price tick updates where stale data is worthless).

// ── Production default: Wait + explicit concurrency flags ─────────────────
var jobChannel = Channel.CreateBounded<JobItem>(new BoundedChannelOptions(capacity: 500)
{
    FullMode          = BoundedChannelFullMode.Wait,
    SingleWriter      = false,   // multiple HTTP request handlers enqueue concurrently
    SingleReader      = false,   // multiple consumer workers dequeue concurrently
    AllowSynchronousContinuations = false
    // AllowSynchronousContinuations = false ensures continuations run on the thread pool,
    // not inline on the writer's thread — prevents unexpected re-entrancy and stack overflows
    // under high write throughput. Safe default. Set true only if profiling shows benefit.
});

// ── Registered as a singleton in DI so producers and consumers share the same instance:
builder.Services.AddSingleton(jobChannel);
builder.Services.AddSingleton(jobChannel.Reader);  // consumers inject ChannelReader
builder.Services.AddSingleton(jobChannel.Writer);  // producers inject ChannelWriter
// Separate Reader/Writer registration enforces the direction contract at the type level —
// a consumer that only needs to read cannot accidentally write, and vice versa.


// ════════════════════════════════════════════════════════════════════════════
// CAPACITY SIZING HEURISTIC
// ════════════════════════════════════════════════════════════════════════════
//
// Start with: burst_rate_per_second × burst_duration_seconds × 2
//
// Example: API receives up to 50 requests/second, each enqueuing one job.
//          Consumer processes one job per 100ms with 4 concurrent workers
//          → throughput: 40 jobs/second.
//          A 10-second downstream slowdown produces 500 queued jobs.
//          Capacity = 500 × 2 (headroom) = 1000.
//
// Monitor Channel.Reader.Count in production.
// Consistently near capacity → increase consumer concurrency or capacity.
// Consistently near zero     → capacity is oversized for the workload.

The SingleWriter and SingleReader flags are optimisation hints that allow the channel implementation to skip certain thread-safety mechanisms when only one producer or consumer is active. Setting SingleWriter = true when multiple HTTP request handlers write concurrently is a correctness bug — concurrent writes without the thread-safety path produce corrupted channel state. Use the flags only when you can statically guarantee a single writer or reader. The default of false for both is always correct; the flags are opt-in performance optimisations for specific topologies, not defaults to override casually.
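When the topology does statically guarantee a single consumer — one BackgroundService registered exactly once, reading in a single loop — the flag is safe to enable. A minimal sketch, reusing the JobItem and JobConsumerWorker names from this article (the capacity value is illustrative):

```csharp
// SingleReader = true is valid here only because exactly one ExecuteAsync loop
// will ever call ReadAsync/ReadAllAsync on this reader.
var singleConsumerChannel = Channel.CreateBounded<JobItem>(
    new BoundedChannelOptions(capacity: 500)
    {
        FullMode     = BoundedChannelFullMode.Wait,
        SingleWriter = false,  // many HTTP handlers still enqueue concurrently
        SingleReader = true    // statically guaranteed: one hosted service, one loop
    });

builder.Services.AddSingleton(singleConsumerChannel.Reader);
builder.Services.AddSingleton(singleConsumerChannel.Writer);
builder.Services.AddHostedService<JobConsumerWorker>();  // the sole reader
```

If a second consumer is later added for throughput, the flag must be flipped back to false in the same change — the compiler will not catch the mismatch; the channel will simply misbehave at runtime.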

Backpressure: Propagating the Signal Back to the Producer

Backpressure is what happens when a bounded channel is full and the producer tries to write. With FullMode.Wait, the WriteAsync call suspends until a slot opens — this is backpressure working correctly. The suspension propagates up the call stack: the HTTP handler awaiting the write waits, the ASP.NET Core thread that was handling the request waits, and the client connection is held open. In effect, the queue's capacity limit becomes a load-shedding signal that naturally throttles incoming request rate when the system is under load.

Backpressure.cs — Wait Strategy, Drop Strategy & Producer Patterns
// ════════════════════════════════════════════════════════════════════════════
// PRODUCER PATTERN: FullMode.Wait — backpressure to HTTP caller
// ════════════════════════════════════════════════════════════════════════════

[ApiController]
[Route("api/jobs")]
public class JobController : ControllerBase
{
    private readonly ChannelWriter<JobItem> _writer;
    private readonly ILogger<JobController> _logger;

    public JobController(ChannelWriter<JobItem> writer, ILogger<JobController> logger)
        => (_writer, _logger) = (writer, logger);

    [HttpPost]
    public async Task<IActionResult> EnqueueJob(
        [FromBody] EnqueueJobRequest request,
        CancellationToken cancellationToken)
    {
        var job = new JobItem
        {
            Id        = Guid.NewGuid(),
            Payload   = request.Payload,
            EnqueuedAt = DateTimeOffset.UtcNow,
            Attempts  = 0
        };

        // ── FullMode.Wait: WriteAsync suspends when channel is full ───────
        // The HTTP request is held open. The client waits.
        // When a consumer processes an item and frees a slot, WriteAsync completes.
        // cancellationToken: client disconnect or request timeout aborts the wait —
        // the job is not enqueued, the client receives 499/408 — correct behaviour.
        try
        {
            await _writer.WriteAsync(job, cancellationToken);
        }
        catch (OperationCanceledException)
        {
            // Client disconnected or request timed out while waiting for a channel slot.
            // The job was never enqueued — no orphaned work.
            _logger.LogWarning("Job enqueue cancelled — channel full and client disconnected.");
            return StatusCode(503, "Service temporarily at capacity. Retry after backoff.");
        }
        catch (ChannelClosedException)
        {
            // Channel was closed (application shutting down) — reject new work.
            _logger.LogWarning("Job enqueue rejected — channel closed (shutdown in progress).");
            return StatusCode(503, "Service shutting down. Retry after restart.");
        }

        _logger.LogInformation("Job {Id} enqueued.", job.Id);
        return Accepted(new { job.Id });
    }
}


// ════════════════════════════════════════════════════════════════════════════
// PRODUCER PATTERN: TryWrite — non-blocking with explicit drop handling
// ════════════════════════════════════════════════════════════════════════════
// Use when the producer cannot afford to wait — telemetry, audit events,
// fire-and-forget notifications where dropping under load is acceptable.

public class TelemetryPublisher
{
    private readonly ChannelWriter<TelemetryEvent> _writer;
    private readonly ILogger<TelemetryPublisher> _logger;
    private long _droppedCount;

    public TelemetryPublisher(ChannelWriter<TelemetryEvent> writer, ILogger<TelemetryPublisher> logger)
        => (_writer, _logger) = (writer, logger);

    public void Publish(TelemetryEvent evt)
    {
        if (!_writer.TryWrite(evt))
        {
            // Channel is full — event is dropped.
            // Increment counter for observability — surface via metrics endpoint.
            var dropped = Interlocked.Increment(ref _droppedCount);

            // Log at Warning, not Error — dropping telemetry is expected under load.
            // Log periodically (not every drop) to avoid log flood.
            if (dropped % 100 == 0)
            {
                _logger.LogWarning(
                    "Telemetry channel full. {Count} events dropped since startup.", dropped);
            }
        }
    }

    // Expose for metrics scraping (Prometheus, OpenTelemetry):
    public long DroppedEventCount => Interlocked.Read(ref _droppedCount);
}


// ════════════════════════════════════════════════════════════════════════════
// BACKPRESSURE STRATEGY DECISION REFERENCE
// ════════════════════════════════════════════════════════════════════════════
//
// Work type                          FullMode          Rationale
// ─────────────────────────────────  ────────────────  ──────────────────────────────
// HTTP-initiated jobs (critical)     Wait              Backpressure to caller; 503 on
//                                                      timeout — correct load shedding
// Background polling tasks           Wait              Polling rate naturally throttled
// Telemetry / audit events           DropWrite         Dropping acceptable; never block
// Live data (sensor, price tick)     DropOldest        Newest data always preferred
// Email / notification dispatch      Wait              Losing messages unacceptable

The 503 response on OperationCanceledException from a full channel is not an error — it is the correct API contract under load. A client that receives a 503 with a Retry-After header knows the service is temporarily at capacity and can implement exponential backoff. A client whose request is held open indefinitely while the queue drains is holding a connection and a server thread for an unpredictable duration. Prefer a fast, honest 503 over a slow, silent wait when the queue is full and the client has a timeout shorter than your expected drain time. Configure your HTTP client's timeout on the caller side accordingly.
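A sketch of that contract, reusing the JobController shape from above; the 5-second Retry-After value is illustrative and should be tuned to your expected drain time:

```csharp
    [HttpPost]
    public async Task<IActionResult> EnqueueJob(
        [FromBody] EnqueueJobRequest request, CancellationToken cancellationToken)
    {
        var job = new JobItem { Payload = request.Payload };
        try
        {
            await _writer.WriteAsync(job, cancellationToken);
        }
        catch (OperationCanceledException)
        {
            // Tell well-behaved clients when to come back instead of leaving
            // them to guess a backoff interval.
            Response.Headers["Retry-After"] = "5";
            return StatusCode(StatusCodes.Status503ServiceUnavailable,
                "Service temporarily at capacity. Retry after the indicated delay.");
        }
        return Accepted(new { job.Id });
    }
```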

Poison Job Handling: Isolation, Retry Limits & Dead-Letter

A poison job is one that fails every processing attempt — due to corrupt data, a bug in the handler, an incompatible payload shape, or a transient external dependency that has become permanently unavailable for that specific job. Without explicit poison job handling, the consumer retries the job indefinitely, consuming processing capacity while blocking or slowing the processing of every healthy job behind it in the queue. The correct pattern catches failures at the per-job level, tracks attempt counts in the job envelope, and routes jobs that exceed the retry threshold to a dead-letter channel where they can be inspected, requeued, or discarded without affecting the main processing pipeline.

PoisonJobHandling.cs — Job Envelope, Retry Tracking & Dead-Letter Channel
// ════════════════════════════════════════════════════════════════════════════
// JOB ENVELOPE: carries retry state alongside the payload
// ════════════════════════════════════════════════════════════════════════════

public sealed record JobItem
{
    public Guid            Id           { get; init; } = Guid.NewGuid();
    public string          Payload      { get; init; } = "";
    public DateTimeOffset  EnqueuedAt   { get; init; } = DateTimeOffset.UtcNow;
    public int             Attempts     { get; init; } = 0;
    public string?         LastError    { get; init; }
    public DateTimeOffset? LastFailedAt { get; init; }

    // Immutable retry: create a new envelope with incremented state
    public JobItem WithFailure(Exception ex) => this with
    {
        Attempts    = Attempts + 1,
        LastError   = ex.Message,
        LastFailedAt = DateTimeOffset.UtcNow
    };
}


// ════════════════════════════════════════════════════════════════════════════
// CONSUMER WITH POISON JOB ISOLATION
// ════════════════════════════════════════════════════════════════════════════

public class JobConsumerWorker : BackgroundService
{
    private const int MaxAttempts = 3;

    private readonly ChannelReader<JobItem> _reader;
    private readonly ChannelWriter<JobItem> _writer;           // re-enqueue for retry
    private readonly ChannelWriter<JobItem> _deadLetterWriter;
    private readonly IJobProcessor          _processor;
    private readonly ILogger<JobConsumerWorker> _logger;

    public JobConsumerWorker(
        ChannelReader<JobItem> reader,
        ChannelWriter<JobItem> writer,
        // Keyed (.NET 8): distinguishes this writer from the main channel's,
        // which is registered unkeyed under the same ChannelWriter<JobItem> type.
        [FromKeyedServices("dead-letter")] ChannelWriter<JobItem> deadLetterWriter,
        IJobProcessor processor,
        ILogger<JobConsumerWorker> logger)
    {
        (_reader, _writer, _deadLetterWriter, _processor, _logger)
            = (reader, writer, deadLetterWriter, processor, logger);
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        // ReadAllAsync yields items as they arrive — no polling, no busy-wait.
        // Completes when the channel is closed AND empty (graceful shutdown path).
        await foreach (var job in _reader.ReadAllAsync(stoppingToken))
        {
            await ProcessJobSafelyAsync(job, stoppingToken);
        }
    }

    private async Task ProcessJobSafelyAsync(JobItem job, CancellationToken ct)
    {
        try
        {
            await _processor.ProcessAsync(job, ct);
            _logger.LogInformation("Job {Id} processed successfully.", job.Id);
        }
        catch (OperationCanceledException)
        {
            // Host shutting down — do not retry, do not dead-letter.
            // The job will be redelivered if the producer supports replay.
            throw;
        }
        catch (Exception ex)
        {
            var failed = job.WithFailure(ex);

            if (failed.Attempts >= MaxAttempts)
            {
                // ── Poison job: move to dead-letter channel ───────────────
                _logger.LogError(ex,
                    "Job {Id} failed {Attempts} times. Moving to dead-letter.",
                    failed.Id, failed.Attempts);

                // TryWrite to dead-letter: if the dead-letter channel is also full,
                // log and discard rather than blocking the consumer worker.
                if (!_deadLetterWriter.TryWrite(failed))
                {
                    _logger.LogCritical(
                        "Dead-letter channel full. Job {Id} discarded permanently.",
                        failed.Id);
                }
            }
            else
            {
                // ── Transient failure: re-enqueue with backoff delay ──────
                var delay = TimeSpan.FromSeconds(Math.Pow(2, failed.Attempts));
                _logger.LogWarning(ex,
                    "Job {Id} failed (attempt {Attempts}/{Max}). Retrying in {Delay}s.",
                    failed.Id, failed.Attempts, MaxAttempts, delay.TotalSeconds);

                // Delay before re-enqueue — pass ct so shutdown aborts the wait.
                // Note: this pauses THIS worker's loop for the backoff window;
                // with multiple concurrent workers the queue keeps draining meanwhile.
                await Task.Delay(delay, ct);

                // Re-enqueue the failed job with updated attempt count.
                // If the main channel is full, TryWrite returns false — log the loss.
                if (!_writer.TryWrite(failed))
                {
                    _logger.LogError(
                        "Main channel full during retry re-enqueue. Job {Id} lost.",
                        failed.Id);
                }
            }
        }
    }
}


// ════════════════════════════════════════════════════════════════════════════
// DEAD-LETTER CHANNEL: registration & inspection worker
// ════════════════════════════════════════════════════════════════════════════

// Register a separate bounded channel for dead-lettered jobs
var deadLetterChannel = Channel.CreateBounded<JobItem>(new BoundedChannelOptions(100)
{
    FullMode = BoundedChannelFullMode.DropWrite   // drop rather than block the consumer
});
// Keyed registrations (.NET 8): the dead-letter endpoints are the same
// ChannelWriter<JobItem>/ChannelReader<JobItem> types as the main channel's,
// so unkeyed AddSingleton calls would collide with the registrations above.
builder.Services.AddKeyedSingleton("dead-letter", deadLetterChannel.Writer);
builder.Services.AddKeyedSingleton("dead-letter", deadLetterChannel.Reader);

// Dead-letter inspection worker — logs, persists to DB, or alerts on failures
public class DeadLetterWorker : BackgroundService
{
    private readonly ChannelReader<JobItem>    _reader;
    private readonly IDeadLetterRepository     _repository;
    private readonly ILogger<DeadLetterWorker> _logger;

    public DeadLetterWorker(
        [FromKeyedServices("dead-letter")] ChannelReader<JobItem> reader,
        IDeadLetterRepository repository,
        ILogger<DeadLetterWorker> logger)
        => (_reader, _repository, _logger) = (reader, repository, logger);

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        await foreach (var job in _reader.ReadAllAsync(stoppingToken))
        {
            _logger.LogError(
                "Dead-letter job {Id} — {Attempts} attempts, last error: {Error}",
                job.Id, job.Attempts, job.LastError);

            // Persist for manual inspection / requeue tooling
            await _repository.SaveAsync(job, stoppingToken);
        }
    }
}

The TryWrite call on the dead-letter channel when it is full — and the explicit log of the loss — is the honest design choice in a bounded in-process queue. Unlike a durable message broker, there is no guaranteed delivery mechanism. If both the main channel and the dead-letter channel are full simultaneously, the job is lost and the only record is a log entry. This is the correct trade-off to acknowledge in code and in documentation: in-process channels trade durability for simplicity and speed. Teams with zero-loss requirements for failed jobs should persist to a database or route to a durable broker at the dead-letter stage, not rely on the in-process dead-letter channel as their only failure record.
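A sketch of that durable fallback, extracted into its own small class for clarity rather than taken verbatim from the worker above. It reuses the article's JobItem and IDeadLetterRepository abstractions; wiring this sink into JobConsumerWorker in place of the bare TryWrite is an assumption:

```csharp
public sealed class DurableDeadLetterSink
{
    private readonly ChannelWriter<JobItem> _deadLetterWriter;
    private readonly IDeadLetterRepository  _repository;
    private readonly ILogger<DurableDeadLetterSink> _logger;

    public DurableDeadLetterSink(
        ChannelWriter<JobItem> deadLetterWriter,
        IDeadLetterRepository repository,
        ILogger<DurableDeadLetterSink> logger)
        => (_deadLetterWriter, _repository, _logger)
            = (deadLetterWriter, repository, logger);

    public async ValueTask SinkAsync(JobItem failed, CancellationToken ct)
    {
        // Fast path: the in-process dead-letter channel has room.
        if (_deadLetterWriter.TryWrite(failed)) return;

        try
        {
            // Fallback: the database survives a restart; the channel does not.
            await _repository.SaveAsync(failed, ct);
            _logger.LogWarning(
                "Dead-letter channel full. Job {Id} persisted to database.", failed.Id);
        }
        catch (Exception ex)
        {
            // Both sinks failed — this log entry is the only remaining record.
            _logger.LogCritical(ex,
                "Job {Id} could not be dead-lettered or persisted.", failed.Id);
        }
    }
}
```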

Deduplication & Observability: Knowing What the Queue Is Doing

Two production concerns that are easy to defer until they become incidents: deduplication — preventing the same job from being enqueued multiple times when a retry-heavy producer fires repeatedly — and observability — knowing in real time how full the channel is, how many jobs have been processed, dropped, or dead-lettered, and whether consumers are keeping up. Neither requires complex infrastructure — both can be implemented against the Channel<T> API directly.

DedupAndObservability.cs — Idempotent Enqueue & Channel Metrics
// ════════════════════════════════════════════════════════════════════════════
// DEDUPLICATION: idempotent enqueue with in-memory tracking set
// ════════════════════════════════════════════════════════════════════════════

// Suitable when: the dedup window fits in memory, and approximate dedup is
// acceptable (the tracking set is lost on restart).
// For exact dedup across restarts: check a Redis SET or database unique index
// before enqueuing, and remove the entry after successful processing.

public sealed class DeduplicatingJobQueue
{
    private readonly ChannelWriter<JobItem> _writer;
    private readonly ConcurrentDictionary<string, byte> _inflight;
    private readonly ILogger<DeduplicatingJobQueue> _logger;

    public DeduplicatingJobQueue(
        ChannelWriter<JobItem> writer,
        ILogger<DeduplicatingJobQueue> logger)
    {
        _writer   = writer;
        _inflight = new ConcurrentDictionary<string, byte>(StringComparer.Ordinal);
        _logger   = logger;
    }

    public async ValueTask<bool> TryEnqueueAsync(
        JobItem job,
        CancellationToken ct = default)
    {
        // Dedup key: use a natural business key (e.g., "report:user-123:2026-02-27")
        // rather than the random Job.Id — Id-based dedup would never deduplicate.
        var dedupKey = job.Payload; // replace with job.BusinessKey in real usage

        if (!_inflight.TryAdd(dedupKey, 0))
        {
            _logger.LogDebug("Job {Key} already queued — duplicate suppressed.", dedupKey);
            return false;   // duplicate — not enqueued
        }

        try
        {
            await _writer.WriteAsync(job, ct);
            return true;
        }
        catch
        {
            // Enqueue failed — remove from tracking so it can be retried
            _inflight.TryRemove(dedupKey, out _);
            throw;
        }
    }

    // Called by the consumer after successful processing:
    public void MarkComplete(string dedupKey)
        => _inflight.TryRemove(dedupKey, out _);
}


// ════════════════════════════════════════════════════════════════════════════
// OBSERVABILITY: channel metrics via IHostedService + Channel.Reader.Count
// ════════════════════════════════════════════════════════════════════════════

public class ChannelMetricsWorker : BackgroundService
{
    private readonly ChannelReader<JobItem> _reader;
    private readonly IMeterFactory          _meterFactory;
    private readonly ILogger<ChannelMetricsWorker> _logger;

    // OpenTelemetry / System.Diagnostics.Metrics instruments
    private readonly ObservableGauge<int> _depthGauge;
    private readonly Counter<long>        _processedCounter;
    private readonly Counter<long>        _droppedCounter;

    public ChannelMetricsWorker(
        ChannelReader<JobItem> reader,
        IMeterFactory          meterFactory,
        ILogger<ChannelMetricsWorker> logger)
    {
        _reader       = reader;
        _meterFactory = meterFactory;
        _logger       = logger;

        var meter = meterFactory.Create("MyApp.JobQueue");

        // Observable gauge: current depth — sampled when metrics are collected
        _depthGauge = meter.CreateObservableGauge<int>(
            name:        "job_queue_depth",
            observeValue: () => _reader.Count,
            unit:        "{jobs}",
            description: "Number of jobs currently waiting in the channel buffer.");

        _processedCounter = meter.CreateCounter<long>(
            name:        "job_queue_processed_total",
            unit:        "{jobs}",
            description: "Total jobs successfully processed.");

        _droppedCounter = meter.CreateCounter<long>(
            name:        "job_queue_dropped_total",
            unit:        "{jobs}",
            description: "Total jobs dropped due to full channel or dead-letter overflow.");
    }

    // Increment these from the consumer and producer respectively:
    public void RecordProcessed() => _processedCounter.Add(1);
    public void RecordDropped()   => _droppedCounter.Add(1);

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        // Periodic depth logging as a fallback when metrics scraping is unavailable
        using var timer = new PeriodicTimer(TimeSpan.FromSeconds(30));
        while (await timer.WaitForNextTickAsync(stoppingToken))
        {
            var depth = _reader.Count;
            if (depth > 0)
            {
                _logger.LogInformation("Job queue depth: {Depth} items pending.", depth);
            }
        }
    }
}

// ── Alerting thresholds (configure in your metrics / alerting platform) ───
//
// job_queue_depth > capacity * 0.80  → Warning: consumers falling behind
// job_queue_depth == capacity        → Critical: channel at limit, backpressure active
// job_queue_dropped_total rate > 0   → Warning: jobs being dropped (DropWrite mode)
// job_queue_processed_total rate → 0 → Critical: consumers have stopped processing

The Channel.Reader.Count property is the single most useful production signal for a bounded channel — it tells you at any instant how many items are waiting to be processed. An ObservableGauge over this value, scraped by Prometheus or emitted to OpenTelemetry, gives you a depth time series that shows queue buildup during consumer slowdowns, drain patterns after they recover, and the steady-state operating depth under normal load. Set an alert threshold at 80% of channel capacity — not 100%, because by the time a channel hits 100% it is already applying backpressure to producers, and you want to know before that happens so you can scale consumers or investigate the slowdown.

What Developers Want to Know

When should I use System.Threading.Channels instead of a message broker like RabbitMQ or Azure Service Bus?

System.Threading.Channels is an in-process queue — it lives in your application's memory and disappears when the process restarts. Use it when work is generated and consumed within the same process, durability across restarts is not required, and you need the lowest possible latency with zero infrastructure dependencies. Use a message broker when work must survive process restarts, producers and consumers run in separate services, or you need guaranteed at-least-once delivery with acknowledgement semantics. Many production systems use both — a Channel as an in-process buffer between the broker consumer and the processing pipeline.

What happens to jobs in a bounded Channel when the application shuts down?

They are lost unless you implement drain logic in your BackgroundService. A Channel<T> is an in-memory structure — when the process terminates, all items in the buffer that have not been processed are gone. To minimise loss on graceful shutdown, stop enqueuing new items (close the channel writer) when the stopping token fires, then continue processing items already in the buffer until it is empty or the shutdown timeout expires. For work that absolutely cannot be lost across restarts, a durable message broker with acknowledgement semantics is the correct infrastructure.
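A minimal sketch of that drain sequence, assuming the JobItem and IJobProcessor types from earlier sections (registration details omitted; DrainingConsumer is an illustrative name):

```csharp
public sealed class DrainingConsumer : BackgroundService
{
    private readonly Channel<JobItem> _channel;
    private readonly IJobProcessor    _processor;

    public DrainingConsumer(Channel<JobItem> channel, IJobProcessor processor)
        => (_channel, _processor) = (channel, processor);

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        // Step 1: when shutdown begins, close the writer so producers stop
        // enqueuing and ReadAllAsync can complete once the buffer is empty.
        stoppingToken.Register(() => _channel.Writer.TryComplete());

        // Step 2: drain WITHOUT passing stoppingToken to ReadAllAsync — we want
        // to keep processing already-buffered items after cancellation fires.
        // The host's ShutdownTimeout bounds how long this is allowed to run.
        await foreach (var job in _channel.Reader.ReadAllAsync())
        {
            await _processor.ProcessAsync(job, CancellationToken.None);
        }
    }
}
```

The key design choice is that the stopping token closes the writer rather than cancelling the read loop: ReadAllAsync then completes naturally when the channel is both closed and empty, which is exactly the graceful-drain semantics described above.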

What is the difference between BoundedChannelFullMode.Wait and DropWrite?

Wait blocks the producer — WriteAsync suspends until a consumer frees a slot, applying backpressure up the call stack. This is correct when the producer can safely slow down and losing work is unacceptable. DropWrite discards the incoming item immediately and returns without blocking — correct for best-effort buffers like telemetry where dropping under load is preferable to blocking. DropOldest is a third option that removes the oldest queued item to make room for the new one — useful when newer work supersedes older work, such as live price tick updates.

How do I prevent a single failing job from blocking all consumers?

Wrap every job execution in a try/catch inside the consumer loop — never let an exception escape to the ReadAllAsync loop, which would terminate the consumer. Catch failures at the per-job level, track attempt counts in the job envelope, and route jobs that exceed the retry threshold to a dead-letter channel. The consumer continues to the next item immediately. Without this isolation, a single poison job retried indefinitely starves all other jobs in the queue behind it.

How do I size the bounded channel capacity for my workload?

Start with: burst rate per second × burst duration in seconds × 2 for headroom. If your producer can burst at 50 jobs/second for up to 10 seconds and your consumers process 40 jobs/second, the burst adds 100 items — a capacity of 200 absorbs it with headroom. Monitor Channel.Reader.Count in production via an ObservableGauge. Consistently at 80%+ capacity means consumers cannot keep up — increase concurrency or investigate processing slowdowns. Consistently near zero means the capacity limit has no practical effect on the workload.
