Tail-based sampling for errors.
Head-based sampling makes the keep-or-drop decision before the trace completes. For errors, that decision is made blind: the root span cannot know whether a downstream span will fail. Tail-based sampling waits until the trace is done, inspects the full picture, and keeps every trace that contains an error. This guide covers how both approaches work, the math that makes head-based sampling dangerous for error tracking, and how to configure the OTel Collector to keep 100% of error traces without storing 100% of everything.
20 seconds. Head-based sampling decides at trace root whether to keep a trace. A 10% sampler drops 90% of all traces, including 90% of error traces. Tail-based sampling buffers spans, waits for the trace to complete, then decides. An error-aware tail sampler keeps 100% of error traces and applies the volume reduction only to clean ones.
60 seconds. Most services cannot keep every trace at full volume. A service at 1,000 requests per second generates too many spans to store indefinitely. Head-based sampling is the cheap solution: flip a coin at the root span and propagate the decision downstream via the W3C trace context. That works well for latency analysis, where a random sample is statistically representative. It is the wrong tool for errors, because errors are rare and the coin flip happens before any of them has occurred. An error on a payment span in a 20-span trace is invisible to the root span. A 10% head-based sampler discards that payment trace 90% of the time.
Tail-based sampling moves the decision to the end of the trace. The OTel Collector buffers every span for a configurable window, checks whether any span in the trace has status ERROR or carries an exception event, and keeps the full trace if it does. The cost is memory on the Collector: all in-flight spans sit in a buffer until the decision fires. The benefit is complete error coverage. The recommended architecture pairs a load-balancing front tier (to route all spans for a trace to the same Collector instance) with a stateful back tier that runs the tail sampler. urgentry receives the sampled output and groups exception events into issues.
The sampling decision problem
Every distributed tracing system eventually confronts the same constraint: you cannot keep everything. A service handling 500 requests per second, each producing a trace with 15 spans, generates 7,500 spans per second. At 1 KB per span (a conservative estimate including attributes and events), that is 7.5 MB per second of trace data. Over 24 hours that is 648 GB. Most teams cannot store 648 GB of traces per service per day.
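The arithmetic above can be reproduced in a few lines. This is a back-of-the-envelope sketch, not a capacity-planning tool: the function name `daily_trace_volume_gb` is hypothetical, and it assumes decimal units (1 KB = 1,000 bytes, 1 GB = 10^9 bytes), which is what makes the paragraph's figures line up.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_trace_volume_gb(requests_per_sec: int, spans_per_trace: int,
                          bytes_per_span: int) -> float:
    """Estimate raw, unsampled trace volume per day in decimal gigabytes."""
    spans_per_sec = requests_per_sec * spans_per_trace
    bytes_per_day = spans_per_sec * bytes_per_span * SECONDS_PER_DAY
    return bytes_per_day / 1_000_000_000


# The example from the text: 500 req/s, 15 spans per trace, about 1 KB per span.
volume = daily_trace_volume_gb(500, 15, 1_000)
print(f"{volume:.0f} GB/day")
```

Plugging in your own request rate and span size is usually enough to tell whether sampling is needed at all.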
The answer is sampling: keep a representative fraction. The question is how to choose which fraction to keep. Keep too little and you lose signal. Keep too much and you exceed your storage budget. Keep the wrong subset and you answer statistical questions about performance while missing the one trace that explains tonight’s incident.
For performance analysis, a random sample works. The median latency of a random 5% sample approximates the median latency of 100%. For error tracking, a random sample fails. Errors are rare events. A 5% sample of a service that errors on 0.1% of requests keeps roughly one error trace per 20,000 requests served, and which one it keeps is left to chance. The trace that explains a specific failure may never appear. The sampling strategy has to change.
Head-based sampling
Head-based sampling makes the keep-or-drop decision at the root span, the moment a trace begins. The decision propagates downstream via the W3C traceparent header. Every service in the call graph reads the sampled flag from the incoming header and either records its spans or skips recording them. No span is recorded or exported for a dropped trace. No data crosses the network to the Collector.
This is why head-based sampling is cheap. The dropped work happens at instrumentation time, before any serialization or network call. A 10% probabilistic sampler pays 10% of the export cost compared to keeping everything. The OTel SDK’s ParentBased sampler implements this: it reads the incoming sampling decision from the parent context and propagates it, deferring to a root sampler (such as TraceIdRatioBased) when there is no parent.
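A minimal sketch of that logic, written against the standard library rather than the OTel SDK. The function names are hypothetical, and `root_decision` is a simplified stand-in for a ratio-based sampler, not the SDK's implementation. The only parts taken from the W3C Trace Context format are the header layout (`version-traceid-spanid-flags`) and the sampled bit (`0x01`).

```python
import secrets
from typing import Optional, Tuple

SAMPLED_FLAG = 0x01  # lowest bit of the traceparent flags field


def root_decision(trace_id_hex: str, ratio: float) -> bool:
    """Deterministic coin flip: compare the low 64 bits of the trace ID to a bound."""
    bound = int(ratio * (1 << 64))
    return int(trace_id_hex[-16:], 16) < bound


def parent_based(traceparent: Optional[str], ratio: float) -> Tuple[str, bool]:
    """Honor the parent's decision if there is one, otherwise decide at the root."""
    if traceparent is None:
        trace_id = secrets.token_hex(16)  # new 128-bit trace ID
        return trace_id, root_decision(trace_id, ratio)
    _version, trace_id, _parent_span_id, flags = traceparent.split("-")
    return trace_id, bool(int(flags, 16) & SAMPLED_FLAG)


def make_traceparent(trace_id: str, span_id: str, sampled: bool) -> str:
    """Build the header a service would send to the next hop."""
    return f"00-{trace_id}-{span_id}-{SAMPLED_FLAG if sampled else 0:02x}"


# A downstream service never re-flips the coin when a parent exists.
incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-00"
trace_id, sampled = parent_based(incoming, ratio=1.0)
print(trace_id, sampled)
```

Nothing in either function looks at span status: the decision is fixed before any downstream work has run, which is the limitation the next paragraph describes.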
The fundamental flaw for error tracking: the root span does not know what any downstream span will do. A payment service root span starts processing a checkout request. Downstream, a card validation call hits a gateway timeout. The root span, which made the sampling decision 200 ms earlier, has no visibility into that failure. It either kept the trace (if the coin flip said yes) or dropped it (if the coin flip said no), independent of what happened downstream.
Head-based sampling is the right tool for performance percentile analysis. It is the wrong tool when every error matters.
Tail-based sampling
Tail-based sampling defers the decision until after the trace is complete. A component (the OTel Collector, in the standard architecture) receives spans as they are exported, groups them by trace ID, and holds them in memory. When the last span for a trace arrives, or when a timeout fires, the sampler applies its policies against the full set of spans and makes the keep-or-drop decision.
The OTel Collector’s tail_sampling processor implements this. The configuration parameter decision_wait controls how long the processor waits before treating a trace as complete. A 10-second decision_wait means every span is held in memory for up to 10 seconds. For a service at 1,000 spans per second, that buffer holds up to 10,000 spans at steady state.
The costs are real. Memory on the Collector grows with span volume and decision_wait duration. Latency increases because no span reaches the backend until the decision fires. A span exported 1 second after the trace starts will not appear in the backend for at least another 9 seconds if decision_wait is 10 seconds. The Collector becomes stateful, which means scaling it requires routing by trace ID rather than round-robin load balancing.
The benefit is complete trace visibility at decision time. Every policy runs against the full trace, including the span that errored at the tail end of a 20-hop call chain.
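The buffering behavior can be sketched in plain Python. This is an illustration of the idea only, not the Collector's code: `Span`, `TailBuffer`, and the explicit `now` argument are hypothetical, and the only policy shown is "keep the trace if any span has status ERROR".

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Span:
    trace_id: str
    name: str
    status: str = "OK"  # "OK" or "ERROR"


class TailBuffer:
    """Hold spans per trace and decide once decision_wait has elapsed."""

    def __init__(self, decision_wait: float) -> None:
        self.decision_wait = decision_wait
        self.spans: Dict[str, List[Span]] = defaultdict(list)
        self.first_seen: Dict[str, float] = {}

    def add(self, span: Span, now: float) -> None:
        # The wait clock starts when the first span of a trace arrives.
        self.first_seen.setdefault(span.trace_id, now)
        self.spans[span.trace_id].append(span)

    def flush(self, now: float) -> Dict[str, List[Span]]:
        """Decide every trace whose wait has expired; return the kept ones."""
        due = [t for t, ts in self.first_seen.items()
               if now - ts >= self.decision_wait]
        kept = {}
        for trace_id in due:
            spans = self.spans.pop(trace_id)
            del self.first_seen[trace_id]
            if any(s.status == "ERROR" for s in spans):
                kept[trace_id] = spans  # clean traces are simply dropped here
        return kept


buf = TailBuffer(decision_wait=10.0)
buf.add(Span("a", "checkout"), now=0.0)
buf.add(Span("a", "card-validation", status="ERROR"), now=1.0)
print(sorted(buf.flush(now=10.0)))
```

Every span stays in memory until its trace is decided, which is where the memory and latency costs described above come from.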
Why head-based sampling is wrong for errors
The math is direct. If a service has a 2% error rate and you run a 10% head-based sampler, the sampler keeps a random 10% of traces. The error rate in the kept set is still 2%, but you have discarded 90% of the traces. Of every 100 error traces your service produces, the sampler keeps 10.
For a service generating 1,000 requests per minute with a 2% error rate, that is 20 error traces per minute. The head-based sampler keeps 2 of them. The other 18 are gone before any backend sees them. You lose the specific spans, the exception stack traces, the request attributes, and the trace context that links the error to the call that triggered it. Your error tracker shows 2 occurrences of the issue. The actual occurrence count is 20.
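The same numbers as a small calculation. The function name is hypothetical, and the results are long-run averages, not guarantees for any single minute; integer percentages are used so the arithmetic is exact.

```python
from typing import Tuple


def error_traces_per_minute(requests_per_min: int, error_pct: int,
                            head_sample_pct: int) -> Tuple[int, int]:
    """Return (error traces produced, error traces kept) per minute, on average."""
    produced = requests_per_min * error_pct // 100
    # The head sampler is blind to errors, so it keeps the same share of them
    # as it keeps of everything else.
    kept = produced * head_sample_pct // 100
    return produced, kept


produced, kept = error_traces_per_minute(1_000, error_pct=2, head_sample_pct=10)
print(f"produced={produced} kept={kept} lost={produced - kept}")
```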
This is not a theoretical problem. It surfaces in practice as error counts that do not match application metrics, incidents that are harder to reproduce because the incriminating traces were dropped, and regression alarms that fire too late because the error frequency appears lower than it is.
The problem compounds for low-frequency errors. A bug that hits 0.01% of requests generates 10 error traces per 100,000 requests. A 10% head-based sampler keeps 1 of them. On a service at 100 requests per second, the bug fires about once every 100 seconds, but you see an error trace only about once every 1,000 seconds, roughly every 17 minutes. The bug is real and frequent; the sampler makes it look ten times rarer than it is.
What tail-based sampling gives you for errors
An error-aware tail sampler applies two policies:
- Keep any trace where at least one span has status ERROR or carries an exception event. Keep rate: 100%.
- Apply a probabilistic rate to all other traces. Keep rate: configurable, typically 1–10%.
The result: every error trace reaches the backend. Every exception span, with its full stack trace, attributes, and trace context, is available for grouping and investigation. The volume reduction applies to clean traces, which are statistically representative even at low sample rates.
For the service above (1,000 requests per minute, 2% error rate), a tail sampler with a 5% fallback for non-error traces keeps 20 error traces per minute (100%) and 49 clean traces per minute (5% of 980). Total trace volume drops 95% for clean traces while error coverage stays complete.
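The two policies and the resulting volumes can be sketched as follows. This is an illustration under stated assumptions: `keep_trace` and `expected_kept_per_minute` are hypothetical names, the fallback uses a SHA-256 bucket of the trace ID as a stand-in for whatever hashing a real sampler uses, and span status is reduced to a list of strings.

```python
import hashlib
from typing import List, Tuple


def keep_trace(trace_id: str, span_statuses: List[str], fallback_pct: int) -> bool:
    """Apply an error policy first, then a deterministic probabilistic fallback."""
    # Policy 1: one ERROR span anywhere keeps the whole trace.
    if "ERROR" in span_statuses:
        return True
    # Policy 2: map the trace ID to a bucket in [0, 100) and keep the low buckets.
    digest = hashlib.sha256(trace_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < fallback_pct


def expected_kept_per_minute(requests_per_min: int, error_pct: int,
                             fallback_pct: int) -> Tuple[int, int]:
    """Return (error traces kept, clean traces kept) per minute, on average."""
    errors = requests_per_min * error_pct // 100
    clean = requests_per_min - errors
    return errors, clean * fallback_pct // 100


print(keep_trace("trace-1", ["OK", "OK", "ERROR"], fallback_pct=5))
print(expected_kept_per_minute(1_000, error_pct=2, fallback_pct=5))
```

Keying the fallback on the trace ID rather than a fresh random number makes the decision repeatable for a given trace, which helps when debugging the sampler itself.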
The clean trace sample still serves latency analysis. The p50/p99 latency of a 5% random sample of non-error traces is statistically close to the p50/p99 of all non-error traces. You lose precision at extreme percentiles with very small samples, but for most teams the latency signal from non-error traces is secondary to having complete error coverage.
How to configure it in the OpenTelemetry Collector
The tail_sampling processor ships in the OTel Collector Contrib distribution. A configuration that preserves all error traces and samples 5% of non-error traces:
processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 100000
    expected_new_traces_per_sec: 1000
    policies:
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: keep-slow-traces
        type: latency
        latency:
          threshold_ms: 2000
      - name: keep-flagged-traces
        type: string_attribute
        string_attribute:
          key: sampling.priority
          values: ["1"]
      - name: probabilistic-fallback
        type: probabilistic
        probabilistic:
          sampling_percentage: 5
The processor evaluates every policy against each trace, and a trace is kept if any policy votes to sample it, so the order of the list does not change the outcome. keep-errors keeps any trace in which a span has status ERROR, at 100%. keep-slow-traces catches latency outliers that carry no error flag but represent performance incidents worth investigating. keep-flagged-traces is a manual escape hatch for SDK-side annotations (set sampling.priority=1 on a span to force its trace through). probabilistic-fallback samples 5% of all traces, which in practice decides the fate of the traces that matched none of the first three policies.
The num_traces parameter caps how many distinct traces the processor holds in memory at once. When this limit is reached, the oldest traces are evicted and a decision is forced. Set this to a value that comfortably exceeds your peak in-flight trace count (roughly new traces per second at peak, times decision_wait in seconds).
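One rough way to sanity-check num_traces, assuming each trace stays buffered for about decision_wait seconds. The function names are hypothetical and the estimate ignores bursts, so treat the headroom figure as a starting point rather than a rule.

```python
def in_flight_traces(new_traces_per_sec: int, decision_wait_sec: int) -> int:
    """Approximate number of traces buffered at steady state."""
    return new_traces_per_sec * decision_wait_sec


def headroom(num_traces: int, new_traces_per_sec: int,
             decision_wait_sec: int) -> float:
    """How many times over the steady-state estimate the cap is."""
    return num_traces / in_flight_traces(new_traces_per_sec, decision_wait_sec)


# Values from the configuration above: 1,000 new traces/s, 10 s wait, cap 100,000.
print(in_flight_traces(1_000, 10), headroom(100_000, 1_000, 10))
```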
Connect the processor in a pipeline:
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, tail_sampling, batch]
      exporters: [otlphttp]
Place memory_limiter before tail_sampling. As the Collector approaches its configured memory limit, memory_limiter starts refusing incoming spans before the tail sampler's buffer can exhaust the heap. This is the safer failure mode: drop new spans rather than crash the Collector and lose all buffered spans simultaneously.
The operational cost nobody mentions
Tail-based sampling shifts cost from storage to the Collector. The tradeoffs deserve an honest account.
Memory grows with buffer size. At a 10-second decision_wait, every in-flight span for every active trace sits in the Collector’s heap. At 200 bytes per span and 10,000 spans per second, that is 2 MB per second of new data, with up to 10 seconds of accumulation before decisions fire. Steady-state buffer: 20 MB. Add attribute payloads, stack traces in exception events, and longer trace durations, and the real number for a busy service can reach 500 MB to several GB on the Collector.
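The steady-state estimate is a single multiplication, sketched below so you can substitute your own span size. The function name is hypothetical, the result covers span payload only (no per-trace bookkeeping or runtime overhead), and the 2,000-byte figure in the second call is an illustrative assumption, not a measurement.

```python
def buffer_bytes(spans_per_sec: int, bytes_per_span: int,
                 decision_wait_sec: int) -> int:
    """Payload bytes held in the tail-sampling buffer at steady state."""
    return spans_per_sec * bytes_per_span * decision_wait_sec


# The example from the text: 10,000 spans/s at 200 bytes with a 10 s wait.
print(buffer_bytes(10_000, 200, 10) / 1_000_000, "MB")

# Same traffic with heavier spans (hypothetical 2,000 bytes each).
print(buffer_bytes(10_000, 2_000, 10) / 1_000_000, "MB")
```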
The decision_wait timer is a trade-off. A longer decision_wait gives stragglers more time to arrive before the decision fires, which reduces false drops (traces that are decided as complete while one span is still in transit). A longer wait also increases Collector memory and delays data landing in the backend. 10 seconds is a common setting; the processor's own default is 30 seconds. Services with async background jobs that open a trace on a web request and close child spans minutes later will need a much longer window or a separate pipeline.
Tail-based sampling requires a dedicated Collector tier. When a service runs on multiple instances, spans from the same trace arrive at different Collector instances under round-robin load balancing. The tail sampler on instance A sees the root span; instance B sees the error span. Neither has the full trace and neither can make a correct decision.
The solution is two tiers. The front tier receives spans from all service instances and uses the OTel Collector’s loadbalancingexporter to route spans to the back tier by trace ID. All spans sharing a trace ID land on the same back-tier instance. The back tier runs tail_sampling with the full trace visible.
# Front-tier collector: routes by trace ID, no sampling
exporters:
  loadbalancing:
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      static:
        hostnames:
          - tail-collector-1:4317
          - tail-collector-2:4317
          - tail-collector-3:4317
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter]
      exporters: [loadbalancing]
The back-tier instances run the full tail_sampling config from the previous section. This two-tier pattern is standard for any multi-instance service. It adds operational complexity (two Collector deployments to maintain) but is the only correct architecture for distributed tail sampling.
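The property the front tier must provide is that the backend choice is a pure function of the trace ID. A minimal sketch of that property, using plain modulo hashing as a swapped-in simplification: `pick_backend` is a hypothetical name and this is not the exporter's algorithm. Modulo hashing reshuffles most traces whenever the backend list changes, which is the problem consistent hashing is designed to limit.

```python
import hashlib
from typing import List

# Hostnames taken from the front-tier configuration above.
BACKENDS = [
    "tail-collector-1:4317",
    "tail-collector-2:4317",
    "tail-collector-3:4317",
]


def pick_backend(trace_id_hex: str, backends: List[str]) -> str:
    """Map a trace ID to one backend; every span of the trace gets the same answer."""
    digest = hashlib.sha256(bytes.fromhex(trace_id_hex)).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
# Spans from different service instances still converge on one back-tier host.
targets = {pick_backend(trace_id, BACKENDS) for _ in range(5)}
print(len(targets))
```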
When tail-based sampling is overkill
Not every service needs this complexity.
Low-volume services. If a service generates fewer than 5,000 spans per minute, the storage cost of keeping 100% of traces is likely small enough to skip sampling entirely. A backend like urgentry at 52 MB resident under 400 events per second can accept full trace volume from several low-traffic services without sampling. The operational overhead of a two-tier Collector deployment outweighs the benefit when the raw data fits in the budget.
Debug and staging environments. Pre-production environments have intentionally low traffic. Keep everything. The traces that exist in staging are usually the ones you want to inspect in full, and the storage cost is trivial. Introducing a sampler in staging also risks filtering out exactly the test cases your team is actively investigating.
Error-only backends. If the backend only stores spans with status ERROR (discarding non-error spans entirely at ingest), the benefit of tail-based sampling shrinks. The problem tail-based sampling solves is the loss of error traces due to head-based sampling. If the backend already keeps all error spans and ignores clean ones, the sampling architecture is less critical.
Where this fits with urgentry
urgentry sits at the end of the pipeline: it receives the output of whatever sampling strategy the operator runs upstream. The sampling decision belongs to the Collector, not to urgentry.
This separation is intentional. urgentry’s job is to ingest spans, extract exception events, fingerprint them, and group them into issues. It accepts OTLP/HTTP JSON at /v1/traces. What arrives there is what was forwarded by the Collector. If the Collector uses head-based sampling at 10%, urgentry sees 10% of error traces. If the Collector uses an error-aware tail sampler, urgentry sees 100% of error traces and the configured fraction of clean ones.
The recommended setup for teams that care about complete error coverage:
- OTel SDK exports spans to a front-tier Collector via OTLP/gRPC or OTLP/HTTP.
- The front tier routes by trace ID using loadbalancingexporter to a back-tier Collector fleet.
- The back tier runs tail_sampling with a status_code: ERROR policy at 100% and a probabilistic fallback at 5–10%.
- The back tier exports kept traces to urgentry via OTLP/HTTP.
Configure the back-tier exporter to point at urgentry:
exporters:
  otlphttp:
    endpoint: https://urgentry.example.com
    headers:
      X-Sentry-Auth: "Sentry sentry_key=your-dsn-key"
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, tail_sampling, batch]
      exporters: [otlphttp]
urgentry groups span exception events into issues using the same fingerprinting engine as the Sentry SDK path. A span with status ERROR and an exception event arrives at /v1/traces; urgentry extracts exception.type and exception.message, fingerprints the stack, and either opens a new issue or increments an existing one. The trace ID from the parent span links the issue back to the full trace.
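To make the input concrete, here is a sketch of how a receiver could pull exception.type and exception.message out of an OTLP/HTTP JSON payload. This is not urgentry's code and does not attempt its fingerprinting; `extract_exceptions` and the sample payload are hypothetical. The parts taken from the OTLP JSON encoding are the nesting (resourceSpans, scopeSpans, spans, events), the attribute shape (key plus a typed value), and status code 2 meaning ERROR.

```python
from typing import Dict, List, Optional

STATUS_CODE_ERROR = 2  # OTLP status code for ERROR, encoded as an integer in JSON


def attr_map(attributes: List[dict]) -> Dict[str, Optional[str]]:
    """Flatten OTLP key/value attributes, keeping string values only."""
    return {a["key"]: a["value"].get("stringValue") for a in attributes}


def extract_exceptions(payload: dict) -> List[dict]:
    """Collect exception events from spans whose status is ERROR."""
    found = []
    for resource_spans in payload.get("resourceSpans", []):
        for scope_spans in resource_spans.get("scopeSpans", []):
            for span in scope_spans.get("spans", []):
                if span.get("status", {}).get("code") != STATUS_CODE_ERROR:
                    continue
                for event in span.get("events", []):
                    if event.get("name") != "exception":
                        continue
                    attrs = attr_map(event.get("attributes", []))
                    found.append({
                        "trace_id": span["traceId"],
                        "type": attrs.get("exception.type"),
                        "message": attrs.get("exception.message"),
                    })
    return found


# Hypothetical payload with one failing span.
payload = {"resourceSpans": [{"scopeSpans": [{"spans": [{
    "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
    "spanId": "00f067aa0ba902b7",
    "name": "card-validation",
    "status": {"code": 2},
    "events": [{"name": "exception", "attributes": [
        {"key": "exception.type", "value": {"stringValue": "GatewayTimeout"}},
        {"key": "exception.message", "value": {"stringValue": "upstream timed out"}},
    ]}],
}]}]}]}

print(extract_exceptions(payload))
```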
For teams already using the Sentry SDK on some services and OTLP on others, both signal paths land in the same urgentry issue list. The tail sampler handles the OTel traces path; the Sentry SDK path sends directly to urgentry without a Collector in the middle.
Frequently asked questions
Does tail-based sampling lose any errors?
Only if the sampler is error-unaware. A properly configured tail sampler keeps every trace that contains at least one span with status ERROR at 100%, regardless of overall volume. Rate reduction applies to clean traces only. The risk of losing errors with a tail sampler comes from misconfiguration (no error policy) or from a decision_wait that expires before the error span arrives.
How much memory does the tail_sampling processor use?
Memory scales with span volume and decision_wait duration. At 10 seconds and 1,000 spans per second, the buffer holds roughly 10,000 spans. At 200–400 bytes per span (without large exception stacktraces), that is 2–4 MB. Services with long stacktraces in exception events, large attribute payloads, or high fan-out traces will see higher per-span sizes. Set the memory_limiter processor ahead of tail_sampling to cap Collector memory use.
Can I run tail-based sampling in the OTel SDK instead of the Collector?
The OTel SDK sampler runs at span start, before any downstream spans exist. It cannot see whether a later span in the trace will carry status ERROR. Tail-based sampling requires a component that buffers the complete trace before deciding. That is the Collector’s role. An SDK-side error sampler can preserve traces where the error is known at the root span, but it cannot catch errors that arise deeper in the call chain.
Do I need a separate Collector tier for tail-based sampling?
For any service that runs on more than one instance, yes. Spans from the same trace arrive at different Collector instances under round-robin load balancing. The tail sampler needs the full trace on one instance to make a correct decision. The loadbalancingexporter in a front-tier Collector routes spans by trace ID to the back tier, ensuring all spans for a trace land on the same instance.
Where does urgentry fit in a tail-sampling pipeline?
urgentry is the backend that receives the sampled output. The OTel Collector decides which traces to keep; urgentry ingests what the Collector forwards at /v1/traces via OTLP/HTTP. urgentry extracts exception events, fingerprints them, and groups them into issues. The sampling decision stays in the Collector. urgentry runs the same process whether the Collector uses head-based or tail-based sampling — the difference is how many error traces arrive.
Sources
- OTel Collector tail_sampling processor: reference configuration, policy types, and decision_wait semantics.
- OpenTelemetry sampling specification: the canonical definitions of head-based and tail-based sampling, sampler interfaces, and W3C traceparent propagation.
- OTel Collector loadbalancingexporter: trace-ID-based routing for the front-tier Collector in a two-tier tail-sampling architecture.
- OTel Collector configuration reference: pipeline assembly, processor ordering, and memory_limiter configuration.
- Honeycomb “Tail Sampling”: a detailed engineering post on the operational costs and trade-offs of tail-based sampling in production, including memory budgeting and the two-tier architecture.
The tail sampler keeps the errors. urgentry groups them.
urgentry accepts the sampled output of your OTel Collector pipeline at /v1/traces via OTLP/HTTP. Exception events become issues. Traces link back from every issue. One Go binary at 52 MB resident.