Modern C# offers tools that eliminate heap allocations, slice buffers without copying, and process data with CPU vector instructions. If you're building parsers, network protocols, or data pipelines where performance matters, understanding Span<T>, Memory<T>, and System.IO.Pipelines changes what's possible. This isn't theoretical—by the end you'll have built a production-ready log ingestor that processes NDJSON streams with zero-allocation parsing, pooled buffers, and optional SIMD acceleration. We'll cover safety rules the compiler enforces, benchmarking with real numbers, and the production checklist that keeps allocations predictable.
This guide targets .NET 8 LTS with callouts for .NET 9 improvements where relevant. You'll see complete runnable code, not snippets. Each section builds toward the final ingestor, but includes focused micro-examples you can test in isolation. Whether you're optimizing hot paths identified by profiling or architecting high-throughput services from scratch, these patterns deliver measurable results.
Overview & Safety Model
High-performance C# revolves around avoiding heap allocations in hot loops. The garbage collector is fast, but allocating millions of short-lived objects creates pressure. Span<T> solves this by representing contiguous memory—arrays, stack buffers, or native memory—without allocation overhead. It's a ref struct, meaning it lives only on the stack and can't be boxed or stored in fields of regular classes.
🎯
Zero-Copy Slicing
Slice arrays, strings, and buffers without allocating new objects. Span wraps existing memory with length tracking.
🔒
Compiler Safety
ref struct lifetime rules prevent escaping references. You can't accidentally store stack pointers beyond valid scope.
⚡
Pooled Buffers
ArrayPool<T> recycles arrays across requests. Pair with Memory<T> for async-friendly buffer management.
🚀
SIMD Acceleration
Process multiple elements per instruction with Vector<T>. Works across x64, ARM, and other architectures.
Safety Rules You Need to Know
The compiler enforces strict lifetime rules for ref struct types like Span<T>. You can't return them from async methods, store them in heap-allocated objects, or box them into interfaces. This prevents dangling pointers and use-after-free bugs at compile time. If you need to cross async boundaries, use Memory<T> instead—it's a regular struct that wraps a managed reference and works everywhere.
Span Lifetime Rules
// ✅ Valid: Span in synchronous method
public void ProcessSync(ReadOnlySpan<byte> data)
{
    foreach (var b in data)
    {
        Console.Write(b);
    }
}
// ❌ Invalid: Span parameters aren't allowed in async methods
public async Task ProcessAsync(ReadOnlySpan<byte> data) // Compiler error CS4012
{
    await Task.Delay(100);
    // The stack frame could unwind across the await, so the compiler rejects this
}
// ✅ Valid: Memory works with async
public async Task ProcessAsyncCorrect(ReadOnlyMemory<byte> data)
{
    await Task.Delay(100);
    ProcessSync(data.Span); // Convert to Span only when needed
}
Why this restriction? Async methods compile to state machines stored on the heap. If Span<T> pointed to stack memory and you awaited, the stack frame could unwind, leaving a dangling pointer. Memory<T> holds a safe reference that survives across continuations.
GC Interaction Span<T> can wrap managed arrays without pinning them: the runtime tracks the span's interior pointer and updates it if the GC compacts the heap, so the array can still move safely. For unmanaged memory, construct a Span<T> over pointers obtained from Marshal.AllocHGlobal or NativeMemory.Alloc—just ensure the memory outlives the Span.
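The unmanaged case can be sketched as follows. This assumes .NET 6+ for NativeMemory and a project with AllowUnsafeBlocks enabled; the size and fill value are illustrative:

```csharp
using System;
using System.Runtime.InteropServices;

// Wrap native memory in a Span<byte>, use it, then free it.
// The span must never be used after NativeMemory.Free.
static unsafe byte FillNative(int size, byte value)
{
    byte* ptr = (byte*)NativeMemory.Alloc((nuint)size);
    try
    {
        var native = new Span<byte>(ptr, size); // span over unmanaged memory
        native.Fill(value);                     // write through the span
        return native[0];
    }
    finally
    {
        NativeMemory.Free(ptr);
    }
}

Console.WriteLine(FillNative(64, 0xAB)); // 171
```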
Span & ReadOnlySpan Fundamentals
Span<T> gives you a mutable view over contiguous memory. ReadOnlySpan<T> is the read-only variant. Both support slicing—extracting a subsection without copying bytes. This is huge for parsers: you can slice a line from a buffer, then slice tokens from that line, all without allocations.
Creating Spans
Span Creation Patterns
// From array
int[] numbers = { 1, 2, 3, 4, 5 };
Span<int> span = numbers.AsSpan();
// From array with range
Span<int> slice = numbers.AsSpan(1, 3); // [2, 3, 4]
// From stack (covered in stackalloc section)
Span<byte> buffer = stackalloc byte[256];
// From string (ReadOnlySpan<char>)
ReadOnlySpan<char> text = "Hello, World!".AsSpan();
ReadOnlySpan<char> word = text.Slice(0, 5); // "Hello"
Notice AsSpan() returns a view—the original array isn't copied. Changes through the span mutate the underlying array. For strings, you get ReadOnlySpan<char> because strings are immutable.
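A quick way to convince yourself the span is a view rather than a copy:

```csharp
using System;

int[] numbers = { 1, 2, 3, 4, 5 };
Span<int> middle = numbers.AsSpan(1, 3);
middle[0] = 42;                // writes through the view into numbers[1]
Console.WriteLine(numbers[1]); // 42
```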
Slicing Operations
Zero-Copy Slicing
ReadOnlySpan<char> log = "2025-01-13T10:15:30 INFO User logged in".AsSpan();
// Extract date without allocation
ReadOnlySpan<char> date = log.Slice(0, 10); // "2025-01-13"
// Extract level (starts at index 20, length 4)
ReadOnlySpan<char> level = log.Slice(20, 4); // "INFO"
// Extract message (from index 25 to end)
ReadOnlySpan<char> message = log.Slice(25); // "User logged in"
// Use range syntax (C# 8+)
ReadOnlySpan<char> dateRange = log[..10]; // Same as Slice(0, 10)
ReadOnlySpan<char> messageRange = log[25..]; // Same as Slice(25)
Each slice operation returns a new Span that points into the same memory. No string allocations, no copying. This pattern scales to megabyte buffers with the same constant-time performance.
Common Pitfalls
Span Can't Escape Method Scope You can't store Span<T> in a class field or return it from a method that might outlive the underlying buffer. The compiler blocks this. If you need longer-lived references, convert to Memory<T> or copy the data to a regular array with .ToArray().
Don't Do This
public class Parser
{
    private Span<byte> _buffer; // ❌ Compiler error CS8345
    // Span<T> can't be a field of a regular class
}
// ✅ Use Memory<T> for fields
public class FixedParser
{
    private Memory<byte> _buffer; // Valid
}
Practical Example: Tokenizing CSV
CSV Tokenizer with Span
public static List<string> ParseCsvLine(ReadOnlySpan<char> line)
{
    var tokens = new List<string>();
    int start = 0;
    for (int i = 0; i < line.Length; i++)
    {
        if (line[i] == ',')
        {
            tokens.Add(line.Slice(start, i - start).ToString());
            start = i + 1;
        }
    }
    // Add final token
    if (start < line.Length)
    {
        tokens.Add(line.Slice(start).ToString());
    }
    return tokens;
}
// Usage
ReadOnlySpan<char> csv = "Alice,30,Engineer".AsSpan();
var fields = ParseCsvLine(csv);
// ["Alice", "30", "Engineer"]
We slice without allocating intermediate strings until we call .ToString() on the final tokens. For high-throughput scenarios, you'd avoid even those allocations by processing spans directly instead of converting to strings.
Performance Tip Many BCL methods now accept ReadOnlySpan<char> overloads: int.Parse(), Guid.Parse(), DateTime.Parse(), etc. Use these instead of converting spans to strings first.
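A small illustration of those overloads, using a made-up record layout (the delimiter positions and field values are assumptions of this example):

```csharp
using System;
using System.Globalization;

// Parse three fields straight from slices of one span: no substrings.
ReadOnlySpan<char> record = "3f2504e0-4f89-11d3-9a0c-0305e82c3301|2025-01-13|777".AsSpan();
Guid id = Guid.Parse(record[..36]);                                   // 36-char GUID
DateTime day = DateTime.Parse(record[37..47], CultureInfo.InvariantCulture);
int count = int.Parse(record[48..]);
Console.WriteLine($"{id} {day:yyyy-MM-dd} {count}");
```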
Memory<T> & Pooling with IMemoryOwner
When you need to store buffer references beyond a single method or pass them through async calls, Memory<T> is the answer. It's a regular struct that wraps a reference to managed memory—think of it as a safe, GC-friendly pointer. You convert it to Span<T> only when you need to read or write.
Memory<T> Basics
Memory Usage Pattern
public class BufferManager
{
    private Memory<byte> _buffer;
    public void AllocateBuffer(int size)
    {
        _buffer = new byte[size];
    }
    public async Task ProcessAsync()
    {
        await FillBufferAsync(_buffer);
        // Convert to Span when doing actual work
        Span<byte> span = _buffer.Span;
        for (int i = 0; i < span.Length; i++)
        {
            span[i] = (byte)(span[i] ^ 0xFF); // Process data
        }
    }
    private async Task FillBufferAsync(Memory<byte> buffer)
    {
        await Task.Delay(10); // Simulate I/O
        Random.Shared.NextBytes(buffer.Span);
    }
}
Memory<T> stores the reference safely across the await boundary. You access .Span when you need direct memory manipulation, which is always synchronous and stack-confined.
ArrayPool for Zero-Allocation Buffer Reuse
Allocating buffers repeatedly hammers the garbage collector. ArrayPool<T>.Shared maintains a pool of reusable arrays. You rent an array, use it, then return it. This is critical for high-throughput services handling thousands of requests per second.
ArrayPool Pattern
using System.Buffers;
public void ProcessData()
{
    byte[] buffer = ArrayPool<byte>.Shared.Rent(4096);
    try
    {
        // Rent may return a larger array; slice to the size you need
        Span<byte> span = buffer.AsSpan(0, 4096);
        FillWithData(span);
        ProcessBuffer(span);
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(buffer);
    }
}
private void FillWithData(Span<byte> buffer)
{
    for (int i = 0; i < buffer.Length; i++)
    {
        buffer[i] = (byte)(i % 256);
    }
}
private void ProcessBuffer(ReadOnlySpan<byte> buffer)
{
    int sum = 0;
    foreach (var b in buffer)
    {
        sum += b;
    }
    Console.WriteLine($"Sum: {sum}");
}
Always use try-finally to ensure you return the buffer even if an exception occurs. Failing to return leaks pooled arrays, eventually exhausting the pool and forcing new allocations.
IMemoryOwner for Automatic Cleanup
IMemoryOwner with using Statement
using System.Buffers;
public void ProcessWithOwner()
{
    using IMemoryOwner<byte> owner = MemoryPool<byte>.Shared.Rent(4096);
    Memory<byte> memory = owner.Memory.Slice(0, 4096);
    // Use memory - it's automatically returned when disposed
    Span<byte> span = memory.Span;
    Random.Shared.NextBytes(span);
    Console.WriteLine($"Processed {span.Length} bytes");
} // owner.Dispose() returns buffer to pool
IMemoryOwner<T> implements IDisposable, so you can use using statements for automatic cleanup. This is cleaner than manual try-finally blocks and pairs well with async methods.
Don't Store Pooled References Never keep references to rented arrays beyond the scope where you called Return or Dispose. The pool may hand that same array to another caller, causing data corruption. If you need the data longer, copy it to a new array or keep the IMemoryOwner alive.
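One safe pattern, sketched below: copy the bytes you need into a fresh array before returning the rented one. The ReadAndKeep helper and its sizes are hypothetical:

```csharp
using System;
using System.Buffers;

// Rent, use, copy out, return: the caller owns the returned array outright.
static byte[] ReadAndKeep(int payloadLength)
{
    byte[] rented = ArrayPool<byte>.Shared.Rent(payloadLength);
    try
    {
        Random.Shared.NextBytes(rented.AsSpan(0, payloadLength));
        // Copy only the bytes we need into an array we own
        return rented.AsSpan(0, payloadLength).ToArray();
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(rented); // safe: we no longer reference it
    }
}

byte[] kept = ReadAndKeep(128);
Console.WriteLine(kept.Length); // 128
```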
Choosing Between Approaches
Span<T>: Synchronous methods, hot loops, no async
Memory<T>: Async methods, storing references, passing buffers around
ArrayPool<T>: Reusable buffers with manual lifetime management
IMemoryOwner<T>: Reusable buffers with automatic disposal
stackalloc & ref struct Patterns
For small, short-lived buffers, stackalloc allocates directly on the stack with zero GC overhead. It's blazing fast but limited by stack size (typically 1 MB per thread). Use it for buffers under 1 KB in methods that aren't deeply recursive.
Basic stackalloc Usage
Stack-Allocated Buffer
public static string ToHexString(ReadOnlySpan<byte> bytes)
{
    // Stack buffer: two hex chars per input byte (keep inputs small)
    Span<char> buffer = stackalloc char[bytes.Length * 2];
    for (int i = 0; i < bytes.Length; i++)
    {
        buffer[i * 2] = GetHexChar(bytes[i] >> 4);
        buffer[i * 2 + 1] = GetHexChar(bytes[i] & 0xF);
    }
    return new string(buffer);
}
private static char GetHexChar(int value) =>
    value < 10 ? (char)('0' + value) : (char)('A' + value - 10);
The buffer vanishes when the method returns. No allocation, no GC, no overhead. Perfect for temporary work buffers in tight loops.
Dynamic Threshold Pattern
What if the size varies? Use a threshold: stackalloc for small sizes, ArrayPool for large ones.
Threshold-Based Allocation
using System.Buffers;
public void ProcessDynamicSize(int size)
{
    const int StackThreshold = 512;
    byte[]? rentedArray = null;
    Span<byte> buffer = size <= StackThreshold
        ? stackalloc byte[StackThreshold]
        : (rentedArray = ArrayPool<byte>.Shared.Rent(size));
    try
    {
        Span<byte> slice = buffer.Slice(0, size);
        // Process buffer
        for (int i = 0; i < slice.Length; i++)
        {
            slice[i] = (byte)(i % 256);
        }
        Console.WriteLine($"Processed {slice.Length} bytes");
    }
    finally
    {
        if (rentedArray != null)
        {
            ArrayPool<byte>.Shared.Return(rentedArray);
        }
    }
}
This pattern gives you stack speed for small buffers and pool efficiency for large ones. The threshold of 512 bytes is conservative—you can go higher (up to ~1 KB) if you're confident about call depth.
Custom ref struct Types
You can write your own ref struct types to encapsulate stack-bound logic. Useful for parsers or state machines that must stay stack-allocated.
Custom ref struct
public ref struct SpanParser
{
    private ReadOnlySpan<char> _data;
    private int _position;
    public SpanParser(ReadOnlySpan<char> data)
    {
        _data = data;
        _position = 0;
    }
    public bool TryReadInt(out int value)
    {
        value = 0;
        int start = _position;
        while (_position < _data.Length && char.IsDigit(_data[_position]))
        {
            _position++;
        }
        if (_position == start) return false;
        return int.TryParse(_data.Slice(start, _position - start), out value);
    }
    public void SkipWhitespace()
    {
        while (_position < _data.Length && char.IsWhiteSpace(_data[_position]))
        {
            _position++;
        }
    }
}
// Usage
ReadOnlySpan<char> input = "42 99 123".AsSpan();
var parser = new SpanParser(input);
if (parser.TryReadInt(out int first))
{
    Console.WriteLine($"First: {first}");
    parser.SkipWhitespace();
    if (parser.TryReadInt(out int second))
    {
        Console.WriteLine($"Second: {second}");
    }
}
SpanParser is a ref struct so it can hold ReadOnlySpan<char> as a field. It can't escape the stack, which is exactly what we want for a parser that points into a temporary buffer.
.NET 9 Enhancement .NET 9 improves stackalloc with better codegen for initialization patterns. The runtime also expands Span support in more BCL APIs. The patterns here work identically on .NET 8, but .NET 9 might show minor performance improvements in benchmarks.
Parsing with Span<char> & Utf8Parser
String parsing traditionally allocates substring after substring. With ReadOnlySpan<char> and methods like int.TryParse(ReadOnlySpan<char>, out int), you can parse without intermediate allocations. For UTF-8 data, Utf8Parser works directly on byte spans.
Parsing Numbers from Spans
Zero-Allocation Number Parsing
public static bool TryParseLogLine(ReadOnlySpan<char> line, out int level, out DateTime timestamp)
{
    level = 0;
    timestamp = default;
    // Expected format: "2025-01-13T10:15:30 INFO 42 Some message"
    // Parse timestamp (first 19 chars); require enough room for the level too
    if (line.Length < 26 || !DateTime.TryParse(line.Slice(0, 19), out timestamp))
    {
        return false;
    }
    // Skip to level number (after "INFO ")
    int levelStart = 25;
    int levelEnd = line.Slice(levelStart).IndexOf(' ');
    if (levelEnd == -1) return false;
    return int.TryParse(line.Slice(levelStart, levelEnd), out level);
}
// Usage
ReadOnlySpan<char> log = "2025-01-13T10:15:30 INFO 42 Request completed".AsSpan();
if (TryParseLogLine(log, out int level, out DateTime ts))
{
    Console.WriteLine($"Level {level} at {ts}");
}
No substring allocations. DateTime.TryParse and int.TryParse both have ReadOnlySpan<char> overloads that parse in place.
UTF-8 Parsing with Utf8Parser
Many protocols (HTTP, JSON, Protobuf) work with UTF-8 bytes. System.Buffers.Text.Utf8Parser parses integers, floats, dates, and guids directly from ReadOnlySpan<byte>.
UTF-8 Byte Parsing
using System.Buffers.Text;
public static bool TryParseUtf8Int(ReadOnlySpan<byte> utf8Bytes, out int value)
{
    return Utf8Parser.TryParse(utf8Bytes, out value, out int bytesConsumed);
}
// Example: Parse "12345" as UTF-8 bytes
byte[] data = "12345"u8.ToArray();
if (TryParseUtf8Int(data, out int result))
{
    Console.WriteLine($"Parsed: {result}"); // 12345
}
// Parse ISO 8601 date
byte[] dateBytes = "2025-01-13T10:15:30Z"u8.ToArray();
if (Utf8Parser.TryParse(dateBytes, out DateTime dt, out _))
{
    Console.WriteLine($"Date: {dt:O}");
}
Utf8Parser is faster than converting to string first because it avoids UTF-8 to UTF-16 transcoding. If you're reading from a network stream or file, you already have UTF-8 bytes—parse them directly.
Tokenizing with IndexOf and Slicing
Tokenize CSV with Span
public static void ParseCsvRow(ReadOnlySpan<char> row, Span<int> destination)
{
    int fieldIndex = 0;
    int start = 0;
    while (start < row.Length && fieldIndex < destination.Length)
    {
        int commaIndex = row.Slice(start).IndexOf(',');
        int end = commaIndex == -1 ? row.Length : start + commaIndex;
        ReadOnlySpan<char> field = row.Slice(start, end - start);
        if (int.TryParse(field, out int value))
        {
            destination[fieldIndex++] = value;
        }
        start = end + 1;
    }
}
// Usage
ReadOnlySpan<char> csv = "10,20,30,40,50".AsSpan();
Span<int> values = stackalloc int[5];
ParseCsvRow(csv, values);
foreach (var v in values)
{
    Console.WriteLine(v);
}
We allocate nothing except the stack buffer for results. For high-frequency parsing (thousands of rows per second), this makes a measurable difference.
Culture-Aware Parsing Use overloads that accept NumberStyles and IFormatProvider when parsing user input. For logs and protocols you control, invariant culture is faster: int.TryParse(span, NumberStyles.Integer, CultureInfo.InvariantCulture, out result).
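For example, parsing with explicit invariant culture (the values here are illustrative):

```csharp
using System;
using System.Globalization;

// Invariant-culture span parsing, the fast path for machine-generated formats
ReadOnlySpan<char> intField = "-1234".AsSpan();
int.TryParse(intField, NumberStyles.Integer, CultureInfo.InvariantCulture, out int parsed);

// Invariant culture always expects '.' as the decimal separator,
// regardless of the host machine's regional settings
ReadOnlySpan<char> price = "3.14".AsSpan();
double.TryParse(price, NumberStyles.Float, CultureInfo.InvariantCulture, out double value);
Console.WriteLine($"{parsed} {value}");
```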
MemoryMarshal & Unsafe: When and How
System.Runtime.InteropServices.MemoryMarshal and System.Runtime.CompilerServices.Unsafe bypass safety checks for extreme performance. Use them only when profiling proves it's necessary and you understand the risks. Common scenarios: reinterpreting byte arrays as value types, or skipping bounds checks in tight loops.
MemoryMarshal.Cast for Type Reinterpretation
Reinterpret Bytes as Integers
using System.Runtime.InteropServices;
public static void ProcessBinaryData(ReadOnlySpan<byte> bytes)
{
    // Reinterpret bytes as int32 values (4 bytes per int)
    ReadOnlySpan<int> ints = MemoryMarshal.Cast<byte, int>(bytes);
    foreach (var value in ints)
    {
        Console.WriteLine(value);
    }
}
// Example: 8 bytes = 2 ints
byte[] data = { 0x01, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00 };
ProcessBinaryData(data);
// Output: 1, 2 (little-endian)
MemoryMarshal.Cast reinterprets memory without copying. Any leftover bytes that don't fill a whole element are silently dropped from the resulting span, so ensure the byte length is a multiple of the target type's size if every byte matters.
Endianness Matters MemoryMarshal.Cast assumes the platform's native byte order. On little-endian systems (x64, ARM), multi-byte values are stored least-significant byte first. If you're reading big-endian data (network protocols), use BinaryPrimitives.ReadInt32BigEndian instead.
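A minimal sketch of the big-endian case using BinaryPrimitives; the 4-byte packet is made up for illustration:

```csharp
using System;
using System.Buffers.Binary;

// Network protocols send multi-byte integers big-endian (most significant byte first)
byte[] packet = { 0x00, 0x00, 0x00, 0x2A }; // the value 42 in network order

// Native-order read gives the wrong answer on little-endian hosts
int nativeOrder = BitConverter.ToInt32(packet);
// Explicit big-endian read is correct on every platform
int networkOrder = BinaryPrimitives.ReadInt32BigEndian(packet);
Console.WriteLine($"{nativeOrder} vs {networkOrder}");
```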
When to Use Unsafe
Unsafe methods skip bounds checking and can manipulate raw pointers. Useful in inner loops where bounds checks dominate runtime, but dangerous because you can corrupt memory if you miscalculate indices.
Unsafe Loop Optimization
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
public static void UnsafeFillArray(Span<int> span, int value)
{
    ref int ptr = ref MemoryMarshal.GetReference(span);
    for (int i = 0; i < span.Length; i++)
    {
        Unsafe.Add(ref ptr, i) = value;
    }
}
// Usage
Span<int> buffer = stackalloc int[100];
UnsafeFillArray(buffer, 42);
Console.WriteLine(buffer[50]); // 42
This bypasses array bounds checks. The JIT can also vectorize such loops more aggressively. However, modern JIT compilers (RyuJIT in .NET 8) already eliminate bounds checks in many cases when it can prove safety, so profile before going unsafe.
Practical Example: Summing Integers
Safe Sum with Bounds Checks
public static long SafeSum(ReadOnlySpan<int> values)
{
    long sum = 0;
    for (int i = 0; i < values.Length; i++)
    {
        sum += values[i]; // Bounds check on each access
    }
    return sum;
}
Unsafe Sum (No Bounds Checks)
public static long UnsafeSum(ReadOnlySpan<int> values)
{
    long sum = 0;
    ref int ptr = ref MemoryMarshal.GetReference(values);
    for (int i = 0; i < values.Length; i++)
    {
        sum += Unsafe.Add(ref ptr, i);
    }
    return sum;
}
In .NET 8, the JIT often optimizes both versions identically. Benchmark before deciding unsafe is worth the risk. Exceptions: SIMD loops and interop scenarios where you're already working with unmanaged memory.
When Unsafe is Justified Platform interop (P/Invoke with fixed buffers), custom SIMD implementations not covered by Vector<T>, and hot paths identified by profiling where the JIT demonstrably fails to elide bounds checks. Always validate with BenchmarkDotNet before shipping unsafe code.
SIMD Quick Wins with Vectorization
SIMD (Single Instruction, Multiple Data) lets you process multiple elements per CPU instruction. Modern CPUs support 128-bit (SSE), 256-bit (AVX2), and 512-bit (AVX-512) vector operations. .NET exposes these via System.Numerics.Vector<T> (portable) and System.Runtime.Intrinsics (explicit intrinsics).
Portable SIMD with Vector<T>
Start with Vector<T>. It adapts to the CPU's vector width automatically—16 bytes on SSE2, 32 bytes on AVX2, potentially 64 bytes on AVX-512. The JIT generates appropriate instructions.
Portable Vector Sum
using System.Numerics;
public static int VectorSum(ReadOnlySpan<int> values)
{
    int vectorSize = Vector<int>.Count; // e.g., 8 on AVX2 (256-bit / 32-bit int)
    int sum = 0;
    int i = 0;
    // Process full vectors
    for (; i <= values.Length - vectorSize; i += vectorSize)
    {
        var vector = new Vector<int>(values.Slice(i, vectorSize));
        sum += Vector.Sum(vector);
    }
    // Process remaining elements
    for (; i < values.Length; i++)
    {
        sum += values[i];
    }
    return sum;
}
// Usage
int[] data = Enumerable.Range(1, 1000).ToArray();
int total = VectorSum(data);
Console.WriteLine($"Sum: {total}");
The vector loop processes 4–16 integers per iteration (depending on CPU). The scalar tail loop handles leftovers. On AVX2 hardware, this can be 4x faster than the scalar version.
SIMD for Delimiter Scanning
A common pattern: scanning a buffer for a delimiter character (newline, comma, null byte). SIMD can check 16–32 bytes at once.
SIMD Newline Scanner
using System.Numerics;
public static int FindNewline(ReadOnlySpan<byte> buffer)
{
    byte newline = (byte)'\n';
    int vectorSize = Vector<byte>.Count; // e.g., 32 on AVX2
    int i = 0;
    for (; i <= buffer.Length - vectorSize; i += vectorSize)
    {
        var vector = new Vector<byte>(buffer.Slice(i, vectorSize));
        var matches = Vector.Equals(vector, new Vector<byte>(newline));
        if (matches != Vector<byte>.Zero)
        {
            // Found a match in this vector; locate the exact index
            for (int j = 0; j < vectorSize; j++)
            {
                if (buffer[i + j] == newline)
                {
                    return i + j;
                }
            }
        }
    }
    // Scalar tail
    for (; i < buffer.Length; i++)
    {
        if (buffer[i] == newline) return i;
    }
    return -1;
}
// Test
byte[] data = "Hello\nWorld"u8.ToArray();
int index = FindNewline(data);
Console.WriteLine($"Newline at index: {index}"); // 5
Vector.Equals compares all bytes in one instruction. If any match, we scan that vector for the exact index. This scales to megabyte buffers where scanning byte-by-byte becomes a bottleneck.
Explicit Intrinsics for Control
When you need specific CPU instructions, use System.Runtime.Intrinsics. Guard with IsSupported checks to avoid crashes on older hardware.
Intrinsics with Fallback
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;
public static int SumWithIntrinsics(ReadOnlySpan<int> values)
{
    if (Avx2.IsSupported && values.Length >= 8)
    {
        return SumAvx2(values);
    }
    else if (Sse2.IsSupported && values.Length >= 4)
    {
        return SumSse2(values);
    }
    else
    {
        return SumScalar(values);
    }
}
private static int SumAvx2(ReadOnlySpan<int> values)
{
    // 256-bit vectors (8 ints)
    Vector256<int> sum256 = Vector256<int>.Zero;
    int i = 0;
    for (; i <= values.Length - 8; i += 8)
    {
        var vec = Vector256.Create(values.Slice(i, 8));
        sum256 = Avx2.Add(sum256, vec);
    }
    // Horizontal sum
    int sum = 0;
    for (int j = 0; j < 8; j++)
    {
        sum += sum256.GetElement(j);
    }
    // Tail
    for (; i < values.Length; i++)
    {
        sum += values[i];
    }
    return sum;
}
private static int SumSse2(ReadOnlySpan<int> values)
{
    // 128-bit vectors (4 ints)
    Vector128<int> sum128 = Vector128<int>.Zero;
    int i = 0;
    for (; i <= values.Length - 4; i += 4)
    {
        var vec = Vector128.Create(values.Slice(i, 4));
        sum128 = Sse2.Add(sum128, vec);
    }
    int sum = 0;
    for (int j = 0; j < 4; j++)
    {
        sum += sum128.GetElement(j);
    }
    for (; i < values.Length; i++)
    {
        sum += values[i];
    }
    return sum;
}
private static int SumScalar(ReadOnlySpan<int> values)
{
    int sum = 0;
    foreach (var v in values)
    {
        sum += v;
    }
    return sum;
}
This provides AVX2, SSE2, and scalar fallbacks. Ship this and it'll use the best path available at runtime.
Optional (AVX-512) .NET 8 introduced Vector512<T> and the Avx512F intrinsics; .NET 9 widens their use inside the BCL. The patterns are identical—just check Avx512F.IsSupported and process 64-byte vectors. Most servers don't have AVX-512 yet, so treat it as an extra fast path, not a requirement.
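A hedged sketch of what that extra fast path could look like, using Avx512F and the Vector512 helpers available since .NET 8 (the SumAvx512 name and the scalar fallback are this example's inventions):

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// AVX-512 fast path layered on the same pattern as the AVX2/SSE2 versions.
// Guarded by IsSupported; most current hardware falls through to the scalar path.
static int SumAvx512(ReadOnlySpan<int> values)
{
    if (!Avx512F.IsSupported || values.Length < 16)
        return SumScalarFallback(values);

    Vector512<int> acc = Vector512<int>.Zero;
    int i = 0;
    for (; i <= values.Length - 16; i += 16) // 16 ints per 512-bit vector
    {
        acc = Avx512F.Add(acc, Vector512.Create(values.Slice(i, 16)));
    }
    int sum = Vector512.Sum(acc);            // horizontal sum helper
    for (; i < values.Length; i++)           // scalar tail
    {
        sum += values[i];
    }
    return sum;
}

static int SumScalarFallback(ReadOnlySpan<int> values)
{
    int sum = 0;
    foreach (var v in values) sum += v;
    return sum;
}

Console.WriteLine(SumAvx512(new[] { 1, 2, 3, 4 })); // 10
```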
When SIMD Isn't Worth It
Small arrays (under 100 elements): setup cost dominates
Complex per-element logic: SIMD shines on arithmetic and comparisons, not branching
Profile first. Many "slow" loops are bottlenecked by cache misses or allocations, not instruction throughput. Fix those before reaching for SIMD.
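One low-ceremony way to check a suspected hot path for allocations before reaching for SIMD is to bracket it with GC.GetAllocatedBytesForCurrentThread (a sketch; the parsing snippet under test is illustrative):

```csharp
using System;

// Measure managed bytes allocated by the code between the two calls
long before = GC.GetAllocatedBytesForCurrentThread();

ReadOnlySpan<char> span = "10,20,30".AsSpan();
int comma = span.IndexOf(',');
int.TryParse(span[..comma], out int first);

long after = GC.GetAllocatedBytesForCurrentThread();
Console.WriteLine($"Allocated: {after - before} bytes, first = {first}");
```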
System.IO.Pipelines Primer
System.IO.Pipelines is a high-performance I/O abstraction that decouples reading and writing with built-in backpressure. It's what powers Kestrel's HTTP parser. You get a PipeReader for consuming data and a PipeWriter for producing it, both working with ReadOnlySequence<byte>—a type that handles fragmented buffers.
Why Pipelines?
Traditional streams require you to manage buffers, handle partial reads, and coordinate async cancellation. Pipelines do this automatically: the writer fills buffers, the reader consumes them, and the pipe handles flow control. Perfect for network protocols, file parsing, or any producer-consumer scenario.
Basic PipeReader Example
Reading Lines from a Pipe
using System.IO.Pipelines;
using System.Text;
public static async Task ReadLinesAsync(PipeReader reader)
{
    while (true)
    {
        ReadResult result = await reader.ReadAsync();
        ReadOnlySequence<byte> buffer = result.Buffer;
        // Process lines in buffer
        while (TryReadLine(ref buffer, out ReadOnlySequence<byte> line))
        {
            ProcessLine(line);
        }
        // Tell the pipe how much we consumed
        reader.AdvanceTo(buffer.Start, buffer.End);
        if (result.IsCompleted) break;
    }
    await reader.CompleteAsync();
}
private static bool TryReadLine(ref ReadOnlySequence<byte> buffer, out ReadOnlySequence<byte> line)
{
    SequencePosition? position = buffer.PositionOf((byte)'\n');
    if (position == null)
    {
        line = default;
        return false;
    }
    line = buffer.Slice(0, position.Value);
    buffer = buffer.Slice(buffer.GetPosition(1, position.Value));
    return true;
}
private static void ProcessLine(ReadOnlySequence<byte> line)
{
    string text = Encoding.UTF8.GetString(line);
    Console.WriteLine($"Line: {text}");
}
ReadAsync returns when data is available or the writer completes. AdvanceTo tells the pipe what you consumed and what you examined. This enables intelligent backpressure: if you don't consume data, the writer pauses instead of buffering unbounded memory.
PipeWriter for Producing Data
Writing to a Pipe
public static async Task WriteDataAsync(PipeWriter writer)
{
    for (int i = 0; i < 100; i++)
    {
        // Get memory from the pipe
        Memory<byte> memory = writer.GetMemory(256);
        // Write data
        string message = $"Message {i}\n";
        int bytesWritten = Encoding.UTF8.GetBytes(message, memory.Span);
        // Advance the writer
        writer.Advance(bytesWritten);
        // Flush periodically
        await writer.FlushAsync();
    }
    await writer.CompleteAsync();
}
GetMemory requests a buffer from the pipe (it may be pooled). You write to it, call Advance to commit the bytes, then FlushAsync to make them available to the reader.
Backpressure in Action
If the reader lags behind, FlushAsync pauses until buffer space frees up. This prevents unbounded memory growth. Contrast with BufferedStream or MemoryStream, which allocate until you run out of memory.
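If you need different thresholds than the defaults, Pipe accepts a PipeOptions; a minimal sketch (the byte counts are illustrative, not recommendations):

```csharp
using System.IO.Pipelines;

// Writer pauses in FlushAsync once 64 KB sits unread in the pipe,
// and resumes when the reader drains the backlog below 16 KB.
var pipe = new Pipe(new PipeOptions(
    pauseWriterThreshold: 64 * 1024,
    resumeWriterThreshold: 16 * 1024));
```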
Cancellation Support Pass a CancellationToken to ReadAsync and FlushAsync. When cancelled, the pipe raises OperationCanceledException and cleans up resources. No need for manual cleanup logic in your reader/writer loops.
Creating a Pipe
Pipe Setup
var pipe = new Pipe();
// Start reader and writer concurrently
Task writerTask = WriteDataAsync(pipe.Writer);
Task readerTask = ReadLinesAsync(pipe.Reader);
await Task.WhenAll(writerTask, readerTask);
Pipe creates a producer-consumer channel. Writer and reader run independently, coordinated by the pipe's internal buffer management.
Build the Log/CSV Ingestor
Now we assemble everything into a complete log ingestor. It reads NDJSON (newline-delimited JSON) or CSV from a stream, parses fields with Span<T>, uses ArrayPool for buffers, optionally scans delimiters with SIMD, and exposes metrics via an ASP.NET Core endpoint.
Core Ingestor with Pipelines
LogIngestor.cs
using System.Buffers;
using System.Buffers.Text;
using System.IO.Pipelines;
public class LogIngestor
{
    private long _linesProcessed;
    public long LinesProcessed => _linesProcessed;
    public async Task IngestAsync(Stream stream, CancellationToken ct = default)
    {
        var reader = PipeReader.Create(stream);
        while (!ct.IsCancellationRequested)
        {
            ReadResult result = await reader.ReadAsync(ct);
            ReadOnlySequence<byte> buffer = result.Buffer;
            while (TryReadLine(ref buffer, out ReadOnlySequence<byte> line))
            {
                ProcessLine(line);
                Interlocked.Increment(ref _linesProcessed);
            }
            reader.AdvanceTo(buffer.Start, buffer.End);
            if (result.IsCompleted) break;
        }
        await reader.CompleteAsync();
    }
    private bool TryReadLine(ref ReadOnlySequence<byte> buffer, out ReadOnlySequence<byte> line)
    {
        SequencePosition? position = buffer.PositionOf((byte)'\n');
        if (position == null)
        {
            line = default;
            return false;
        }
        line = buffer.Slice(0, position.Value);
        buffer = buffer.Slice(buffer.GetPosition(1, position.Value));
        return true;
    }
    private void ProcessLine(ReadOnlySequence<byte> line)
    {
        // Parse CSV: timestamp,level,message
        if (line.Length > 512) return; // skip oversized lines (or rent from ArrayPool instead)
        Span<byte> buffer = stackalloc byte[(int)line.Length];
        line.CopyTo(buffer);
        ReadOnlySpan<byte> span = buffer;
        // Find first comma
        int firstComma = span.IndexOf((byte)',');
        if (firstComma == -1) return;
        // Parse timestamp
        ReadOnlySpan<byte> timestampBytes = span.Slice(0, firstComma);
        // (For demo, skip parsing—just count)
        // Find second comma
        int secondComma = span.Slice(firstComma + 1).IndexOf((byte)',');
        if (secondComma == -1) return;
        // Parse level
        ReadOnlySpan<byte> levelBytes = span.Slice(firstComma + 1, secondComma);
        if (Utf8Parser.TryParse(levelBytes, out int level, out _))
        {
            // Process level (e.g., filter, aggregate)
        }
        // Message is the remainder
        ReadOnlySpan<byte> message = span.Slice(firstComma + secondComma + 2);
        // (Process message as needed)
    }
}
This ingestor reads lines via PipeReader, uses stackalloc for small buffers, and parses with Utf8Parser. Zero allocations per line in the common case (lines under 512 bytes).
Adding SIMD Delimiter Scanning
Replace PositionOf with a SIMD scanner for larger files where newline scanning dominates.
SIMD Newline Finder
using System.Numerics;
private bool TryReadLineSIMD(ref ReadOnlySequence<byte> buffer, out ReadOnlySequence<byte> line)
{
    if (buffer.IsSingleSegment)
    {
        ReadOnlySpan<byte> span = buffer.FirstSpan;
        int index = FindNewlineSIMD(span);
        if (index == -1)
        {
            line = default;
            return false;
        }
        line = buffer.Slice(0, index);
        buffer = buffer.Slice(buffer.GetPosition(index + 1));
        return true;
    }
    // Fallback for multi-segment buffers
    SequencePosition? position = buffer.PositionOf((byte)'\n');
    if (position == null)
    {
        line = default;
        return false;
    }
    line = buffer.Slice(0, position.Value);
    buffer = buffer.Slice(buffer.GetPosition(1, position.Value));
    return true;
}
private int FindNewlineSIMD(ReadOnlySpan<byte> span)
{
    byte newline = (byte)'\n';
    int vectorSize = Vector<byte>.Count;
    int i = 0;
    for (; i <= span.Length - vectorSize; i += vectorSize)
    {
        var vector = new Vector<byte>(span.Slice(i, vectorSize));
        var matches = Vector.Equals(vector, new Vector<byte>(newline));
        if (matches != Vector<byte>.Zero)
        {
            for (int j = 0; j < vectorSize; j++)
            {
                if (span[i + j] == newline) return i + j;
            }
        }
    }
    // Scalar tail
    for (; i < span.Length; i++)
    {
        if (span[i] == newline) return i;
    }
    return -1;
}
On AVX2 hardware, this scans 32 bytes per iteration instead of 1. For multi-megabyte logs, this cuts parsing time significantly.
Console Runner
Program.cs
using System.Diagnostics;
class Program
{
    static async Task Main(string[] args)
    {
        string filePath = args.Length > 0 ? args[0] : "sample.csv";
        var ingestor = new LogIngestor();
        var sw = Stopwatch.StartNew();
        await using var stream = File.OpenRead(filePath);
        await ingestor.IngestAsync(stream);
        sw.Stop();
        Console.WriteLine($"Processed {ingestor.LinesProcessed:N0} lines in {sw.ElapsedMilliseconds} ms");
        Console.WriteLine($"Throughput: {ingestor.LinesProcessed / sw.Elapsed.TotalSeconds:N0} lines/sec");
    }
}
Run this on a 1 GB log file and watch it process millions of lines with minimal allocations. Measure with dotnet-counters to verify GC pressure stays low.
Optional: ASP.NET Core Metrics Endpoint
WebHost.cs
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
public static class WebHost
{
    public static void AddMetricsEndpoint(WebApplication app, LogIngestor ingestor)
    {
        app.MapGet("/metrics", () =>
        {
            return Results.Ok(new
            {
                LinesProcessed = ingestor.LinesProcessed,
                Timestamp = DateTime.UtcNow
            });
        });
    }
}
// In Program.cs (web mode)
var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();
var ingestor = new LogIngestor();
// Start ingestor in background
_ = Task.Run(async () =>
{
    await using var stream = File.OpenRead("sample.csv");
    await ingestor.IngestAsync(stream);
});
WebHost.AddMetricsEndpoint(app, ingestor);
app.Run();
Now you can query GET /metrics to see live ingestion progress. This pattern extends to Prometheus exporters, health checks, or OpenTelemetry spans.
Complete Code A full runnable project with sample data, tests, and benchmarks is available at github.com/dotnet-guide-com/tutorials under csharp-performance/log-ingestor.
Benchmarks & Diagnostics
Speculation about performance is worthless. Measure. BenchmarkDotNet is the standard for micro-benchmarks in .NET. It handles warmup, statistical analysis, and memory diagnostics automatically.
Setting Up BenchmarkDotNet
Install BenchmarkDotNet
dotnet add package BenchmarkDotNet
Benchmark Example
using System.Buffers.Text;
using System.Text;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
[MemoryDiagnoser]
public class ParsingBenchmarks
{
private readonly byte[] _csvLine = "2025-01-13,42,Sample message"u8.ToArray();
[Benchmark(Baseline = true)]
public void ParseWithString()
{
string line = Encoding.UTF8.GetString(_csvLine);
string[] parts = line.Split(',');
int level = int.Parse(parts[1]);
}
[Benchmark]
public void ParseWithSpan()
{
ReadOnlySpan<byte> span = _csvLine;
int firstComma = span.IndexOf((byte)',');
int secondComma = span.Slice(firstComma + 1).IndexOf((byte)',');
ReadOnlySpan<byte> levelBytes = span.Slice(firstComma + 1, secondComma);
Utf8Parser.TryParse(levelBytes, out int level, out _);
}
}
class Program
{
static void Main(string[] args)
{
BenchmarkRunner.Run<ParsingBenchmarks>();
}
}
Run with dotnet run -c Release. BenchmarkDotNet outputs mean time, allocations, and statistical confidence intervals. Typical results: the Span version is 3–5x faster and allocates zero bytes.
Analyzing Results
Sample BenchmarkDotNet Output
| Method | Mean | Allocated |
|----------------- |---------:|----------:|
| ParseWithString | 245.3 ns | 152 B |
| ParseWithSpan | 48.7 ns | 0 B |
The Span version eliminates 152 bytes of allocations per invocation. Multiply by millions of log lines and you're saving gigabytes of GC pressure.
Low allocation rate (under 10 MB/sec) and infrequent Gen 2 collections indicate healthy memory usage. If you see Gen 2 collections every second, you're allocating too much—revisit your buffer pooling.
A CPU trace (for example, from dotnet-trace) identifies hot methods. If TryReadLine consumes 80% of CPU time, that's your optimization target. Add SIMD scanning there, re-profile, and verify the improvement.
Profiling Workflow 1. Identify slow scenario (e.g., "ingesting 1M lines takes 10 seconds"). 2. Profile with dotnet-trace to find hot methods. 3. Benchmark isolated hot path with BenchmarkDotNet. 4. Optimize (Span, SIMD, pooling). 5. Verify improvement with benchmark and end-to-end test. 6. Ship.
Production Checklist
High-performance code in production requires more than fast benchmarks. Here's what to verify before deploying.
Allocation Hygiene
✅ All hot paths use Span<T> or Memory<T> instead of string operations
✅ ArrayPool buffers are always returned in finally blocks
✅ No accidental boxing (check with [MemoryDiagnoser])
✅ stackalloc sizes stay under 1 KB
✅ No large object heap (LOH) allocations in loops (arrays > 85 KB)
Safety Checks
✅ No Span<T> crossing async boundaries
✅ Unsafe code isolated to reviewed methods with comments justifying use
✅ MemoryMarshal.Cast used only on aligned, size-compatible types
✅ SIMD intrinsics guarded by IsSupported checks with scalar fallbacks
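The MemoryMarshal.Cast item above deserves a concrete illustration. This sketch reinterprets a byte span as uints for word-at-a-time processing; note that the trailing bytes that don't fill a whole uint are not covered by the cast and must be handled separately:

```csharp
using System.Runtime.InteropServices;

static class CastExample
{
    public static int CountZeroWords(ReadOnlySpan<byte> bytes)
    {
        // Reinterprets the same memory as uints: no copy, length becomes bytes.Length / 4.
        ReadOnlySpan<uint> words = MemoryMarshal.Cast<byte, uint>(bytes);
        int zeros = 0;
        foreach (uint w in words)
            if (w == 0) zeros++;
        // Caveat: bytes.Length % 4 trailing bytes are not visible through 'words'.
        return zeros;
    }
}
```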
Exception Handling
Don't Throw in Hot Paths Exceptions allocate and unwind the stack. Use TryParse patterns and return error codes instead of throwing in loops that run millions of times. Reserve exceptions for truly exceptional cases (network failures, file corruption).
Error Handling Pattern
// ❌ Throws on every parse failure
public int ParseLevel(ReadOnlySpan<byte> bytes)
{
if (!Utf8Parser.TryParse(bytes, out int level, out _))
{
throw new FormatException("Invalid level");
}
return level;
}
// ✅ Returns error indicator
public bool TryParseLevel(ReadOnlySpan<byte> bytes, out int level)
{
return Utf8Parser.TryParse(bytes, out level, out _);
}
Culture Invariance
By default, parsing APIs use CultureInfo.CurrentCulture, which is appropriate for user input. Log and protocol parsing, however, should use CultureInfo.InvariantCulture for deterministic behavior across locales.
Invariant Parsing
// Use invariant culture for protocols
if (int.TryParse(span, NumberStyles.Integer, CultureInfo.InvariantCulture, out int value))
{
// Process value
}
Testing
Fuzz testing: feed random and malformed inputs to catch edge cases
Regression benchmarks: run BenchmarkDotNet in CI to catch performance regressions
CI Integration Run BenchmarkDotNet in CI with [SimpleJob] for faster execution. Compare results against baseline. Fail the build if allocations increase or throughput drops by >10%. This catches accidental perf regressions before they ship.
FAQ & Next Steps
When should I use Span<T> vs Memory<T>?
Use Span<T> for synchronous operations within a single method scope—it's a ref struct that can't cross async boundaries. Use Memory<T> when you need to store references across await points or return them from methods, as it's a regular struct that works everywhere. Convert Memory<T> to Span<T> via .Span when you need to process the data.
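A minimal sketch of the rule above (names are illustrative): hold a Memory<byte> across the await, and convert to Span<byte> only inside the synchronous parsing step.

```csharp
using System.Buffers;

static class SpanVsMemory
{
    public static async Task<int> ReadAndParseAsync(Stream stream)
    {
        byte[] rented = ArrayPool<byte>.Shared.Rent(4096);
        try
        {
            Memory<byte> buffer = rented;           // Memory<T> may cross awaits
            int read = await stream.ReadAsync(buffer);
            return Parse(buffer.Span[..read]);      // Span<T> only in sync code
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(rented);
        }
    }

    // Placeholder for real parsing work on the span.
    private static int Parse(ReadOnlySpan<byte> data) => data.Length;
}
```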
Is SIMD worth the complexity for most applications?
Only if profiling shows hot loops processing arrays or buffers. Start with Vector<T> for portable code. If that's insufficient, use Vector128/Vector256 with IsSupported guards. Most apps won't need SIMD—focus on reducing allocations first with Span<T> and ArrayPool. SIMD delivers 2–8x speedups in tight numeric loops, but allocation elimination often yields bigger wins.
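The IsSupported guard mentioned above looks like this in practice (a sketch assuming the .NET 7+ Vector128 APIs; the scalar loop doubles as both the tail handler and the full fallback):

```csharp
using System.Runtime.Intrinsics;

static class ContainsZero
{
    public static bool Run(ReadOnlySpan<byte> data)
    {
        int i = 0;
        // Hardware-accelerated path, guarded so a scalar fallback always exists.
        if (Vector128.IsHardwareAccelerated)
        {
            for (; i <= data.Length - Vector128<byte>.Count; i += Vector128<byte>.Count)
            {
                var v = Vector128.Create(data.Slice(i, Vector128<byte>.Count));
                if (Vector128.EqualsAny(v, Vector128<byte>.Zero))
                    return true;
            }
        }
        // Scalar tail (and complete fallback on unsupported hardware).
        for (; i < data.Length; i++)
            if (data[i] == 0) return true;
        return false;
    }
}
```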
How do I avoid memory leaks with ArrayPool?
Always pair Rent with Return in a try-finally block. Never store references to pooled arrays beyond the scope where you called Return. Consider using IMemoryOwner<T> from MemoryPool which implements IDisposable for automatic cleanup with using statements.
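The IMemoryOwner<T> approach mentioned above ties the pooled buffer's lifetime to a using scope, so the buffer returns to the pool even when an exception unwinds the method:

```csharp
using System.Buffers;

static class PoolOwnership
{
    public static async Task CopyAsync(Stream source, Stream destination)
    {
        // Dispose returns the buffer to the pool automatically.
        using IMemoryOwner<byte> owner = MemoryPool<byte>.Shared.Rent(8192);
        Memory<byte> buffer = owner.Memory;
        int read;
        while ((read = await source.ReadAsync(buffer)) > 0)
        {
            await destination.WriteAsync(buffer[..read]);
        }
    }
}
```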
Can I use System.IO.Pipelines with ASP.NET Core?
Yes. Pipelines power Kestrel internally. You can expose PipeReader from HttpRequest.BodyReader or wrap custom streams. Useful for streaming large uploads, protocol parsers, or real-time data processing endpoints without buffering entire payloads. See the ingestor example for a pattern you can adapt to web endpoints.
What's the largest safe size for stackalloc?
Keep stackalloc buffers under 1 KB. Larger allocations risk stack overflow, especially in recursive or deeply nested calls. For dynamic sizes, use ArrayPool<T>.Shared.Rent with a threshold check to decide between stack and pool. The default stack size is 1 MB per thread on most platforms, so leave headroom for other frames.
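The threshold pattern described above, sketched with the 1 KB cutoff used throughout this guide:

```csharp
using System.Buffers;

static class StackOrPool
{
    public static void Process(int size)
    {
        byte[]? rented = null;
        // Small buffers live on the stack; larger ones come from the pool.
        Span<byte> buffer = size <= 1024
            ? stackalloc byte[1024]
            : (rented = ArrayPool<byte>.Shared.Rent(size));
        try
        {
            Fill(buffer[..size]);
        }
        finally
        {
            if (rented is not null)
                ArrayPool<byte>.Shared.Return(rented);
        }
    }

    // Placeholder for the real work done on the buffer.
    private static void Fill(Span<byte> span) => span.Clear();
}
```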
Next Steps
You've built a production-ready log ingestor with zero-allocation parsing, SIMD scanning, and Pipelines. Here's where to go deeper:
Extend the ingestor: Add JSON parsing with System.Text.Json's Utf8JsonReader (also span-based)
Integrate OpenTelemetry: Add spans to track parsing latency and throughput
Distribute processing: Use System.Threading.Channels to parallelize line processing across cores
Add compression: Wrap streams with GZipStream or BrotliStream (both work with PipeReader)