OTLP for error tracking: the missing manual.
OpenTelemetry Protocol carries exceptions. Most teams know this in theory and miss it in practice. This is the end-to-end account: how the data model works, what the SDK actually puts on the wire, where the “issue” abstraction comes from, and what your tooling needs to do with it.
20 seconds. OTLP carries exceptions as span events with the reserved name exception. The attributes are exception.type, exception.message, and exception.stacktrace. An OTel SDK records them when you call span.RecordException(err) or its language equivalent. Most OTLP backends store the span. Errors-first tools group the exception events into issues.
60 seconds. The OpenTelemetry Protocol defines three primary signal types: traces (as resource spans), metrics, and logs. Exceptions sit inside the traces signal, attached to the span that was active when the exception was raised. The exception gets its own event within that span: a timestamp, the reserved name exception, and a set of semconv attributes that carry the type, message, and stringified stack. The span status is typically set to ERROR alongside the event.
Most distributed tracing backends (Jaeger, Tempo, SigNoz) accept this data and display the span event in the trace view. They store it; they do not group it across traces into something that looks like an error tracker “issue.” That grouping step is what separates a tracing backend from an errors-first tool.
urgentry accepts OTLP/HTTP JSON directly at /v1/traces and /v1/logs, performs the grouping in the same binary that handles Sentry-SDK errors, and links each issue back to its parent trace. One ingest path, two signal types, one binary at 52 MB resident.
Why OTLP is the question now
Error tracking used to be a separate channel. You shipped a Sentry SDK or a Rollbar agent. It intercepted uncaught exceptions, serialized them into a proprietary envelope, and sent them to a dedicated error backend. Traces went somewhere else. Logs went somewhere else. The signals were isolated by design.
OpenTelemetry changed the economics of that separation. When a team adopts the OTel SDK to instrument their services for distributed tracing, they gain a signal path that also carries exceptions — without a second SDK, without a second agent, without a second ingest endpoint. The question becomes: can the OTel SDK replace the dedicated error SDK? And if so, what does the tool on the receiving end need to do to turn span events into something that works for error monitoring?
That question is now showing up in engineering discussions regularly enough that it deserves a straight answer. This guide is that answer.
What OTLP actually is: the data model
OTLP is the wire protocol defined by the OpenTelemetry project for exporting telemetry data from instrumented applications to a collector or backend. It carries three signal types:
- Traces: represented as ResourceSpans. A resource (the service) has one or more ScopeSpans (instrumentation library groups), each containing individual Span records. Each span has a trace ID, span ID, parent span ID, name, start and end timestamps, status, attributes, and events.
- Metrics: time series data such as counters, gauges, histograms, exponential histograms, and summaries.
- Logs: ResourceLogs carrying LogRecord entries with timestamps, severity, body, and attributes.
The proto definitions live in opentelemetry-proto. The transport comes in three flavors (more on this below). For error tracking, the signal that matters is traces, because exceptions are modeled as span events within the traces signal.
The key structural point: a Span contains a repeated field called events. Each event has a name, a timestamp, and a set of attributes. The OpenTelemetry semantic conventions define a reserved event name — exception — that carries the exception payload.
How OTel models exceptions: span events and semconv
When an application calls span.RecordException(err) (Go), span.record_exception(exc) (Python), or the equivalent in any OTel SDK, the SDK appends an event to the current span. That event has:
- Name: exception (the reserved string defined by semconv)
- Timestamp: the moment of the call
- Attributes carrying the semconv keys below. The first three carry the payload; exception.escaped is an additional flag.

| Attribute | Type | Description |
|---|---|---|
| exception.type | string | The exception class or type name. E.g. *errors.errorString in Go, ValueError in Python. |
| exception.message | string | The exception message string. |
| exception.stacktrace | string | A stringified stack trace. Format varies by language; OTel SDKs use language-idiomatic formatting. |
| exception.escaped | boolean | Whether the exception escaped the span’s scope (i.e. the span ended with the exception uncaught). |
In addition to calling RecordException, the convention is to set the span status to ERROR with a description matching the exception message. The semconv exceptions spec makes this explicit.
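Here is the pattern in Go:

    // Assumes imports: go.opentelemetry.io/otel/codes and go.opentelemetry.io/otel/trace.
    ctx, span := tracer.Start(ctx, "process-payment")
    defer span.End()

    if err := chargeCard(ctx, order); err != nil {
        // The Go SDK attaches exception.stacktrace only when asked to.
        span.RecordException(err, trace.WithStackTrace(true))
        span.SetStatus(codes.Error, err.Error())
        return err
    }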
And in Python:
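    # Assumes: from opentelemetry.trace import StatusCode
    # Auto-recording is disabled so the explicit calls below emit the only exception event.
    with tracer.start_as_current_span(
        "process-payment",
        record_exception=False,
        set_status_on_exception=False,
    ) as span:
        try:
            charge_card(order)
        except Exception as exc:
            span.record_exception(exc)
            span.set_status(StatusCode.ERROR, str(exc))
            raise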
And in Node.js:
    const span = tracer.startSpan("process-payment");
    try {
      await chargeCard(order);
    } catch (err) {
      span.recordException(err);
      span.setStatus({ code: SpanStatusCode.ERROR, message: err.message });
      throw err;
    } finally {
      span.end();
    }
All three patterns produce the same wire payload: a span with status ERROR and an event named exception carrying the three semconv attributes.
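To make the event shape concrete, here is a minimal, stdlib-only Python sketch of roughly what a record-exception call assembles. The helper name exception_event and the dict layout are assumptions for illustration, not the SDK's API; real SDKs differ in details such as how the type name is qualified.

```python
import time
import traceback


def exception_event(exc: BaseException) -> dict:
    """Build a dict shaped like an OTel 'exception' span event (illustrative only)."""
    return {
        "name": "exception",  # reserved event name from semconv
        "time_unix_nano": time.time_ns(),  # the moment of the call
        "attributes": {
            "exception.type": type(exc).__qualname__,
            "exception.message": str(exc),
            "exception.stacktrace": "".join(
                traceback.format_exception(type(exc), exc, exc.__traceback__)
            ),
        },
    }


try:
    raise ValueError("card declined: insufficient funds")
except ValueError as err:
    event = exception_event(err)

print(event["attributes"]["exception.type"])  # ValueError
```

The event is only a name, a timestamp, and a handful of string attributes. Everything an error tracker does later has to be derived from those strings.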
What the SDK puts on the wire
The OTLP/HTTP JSON payload for a trace containing one exception event looks like this (abbreviated for clarity):
    {
      "resourceSpans": [{
        "resource": {
          "attributes": [
            { "key": "service.name", "value": { "stringValue": "payment-service" } },
            { "key": "service.version", "value": { "stringValue": "2.4.1" } }
          ]
        },
        "scopeSpans": [{
          "scope": { "name": "go.opentelemetry.io/otel/sdk/trace" },
          "spans": [{
            "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
            "spanId": "00f067aa0ba902b7",
            "name": "process-payment",
            "kind": 2,
            "startTimeUnixNano": "1716364800000000000",
            "endTimeUnixNano": "1716364800142000000",
            "status": { "code": 2, "message": "card declined: insufficient funds" },
            "events": [{
              "name": "exception",
              "timeUnixNano": "1716364800098000000",
              "attributes": [
                { "key": "exception.type",
                  "value": { "stringValue": "CardDeclinedError" } },
                { "key": "exception.message",
                  "value": { "stringValue": "card declined: insufficient funds" } },
                { "key": "exception.stacktrace",
                  "value": { "stringValue": "goroutine 1 [running]:\nmain.chargeCard(...)\n\t/app/payment.go:42 +0x1a4\n..." } },
                { "key": "exception.escaped",
                  "value": { "boolValue": true } }
              ]
            }]
          }]
        }]
      }]
    }
This is a complete, self-contained record. The backend receives the trace context (trace ID, span ID, and a parent span ID when the span has a parent), the service identity (via service.name on the resource), the span timing and status, and the full exception payload in the event attributes.
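A receiver's first job is to walk that nesting and pull out the exception events. The sketch below shows one way to do it in stdlib Python. The function names and the flat record layout are hypothetical, and the attribute flattening handles only scalar values.

```python
def _attrs(kv_list: list) -> dict:
    """Flatten an OTLP KeyValue list; only scalar AnyValue variants are handled."""
    out = {}
    for kv in kv_list:
        value = kv.get("value", {})
        # An AnyValue has one populated field, e.g. stringValue or boolValue.
        out[kv["key"]] = next(iter(value.values()), None)
    return out


def iter_exception_events(payload: dict):
    """Yield one flat record per 'exception' span event in an OTLP/JSON trace payload."""
    for resource_spans in payload.get("resourceSpans", []):
        resource = _attrs(resource_spans.get("resource", {}).get("attributes", []))
        for scope_spans in resource_spans.get("scopeSpans", []):
            for span in scope_spans.get("spans", []):
                for event in span.get("events", []):
                    if event.get("name") != "exception":
                        continue  # other span events are not errors
                    attrs = _attrs(event.get("attributes", []))
                    yield {
                        "service": resource.get("service.name"),
                        "trace_id": span.get("traceId"),
                        "span_id": span.get("spanId"),
                        "type": attrs.get("exception.type"),
                        "message": attrs.get("exception.message"),
                        "stacktrace": attrs.get("exception.stacktrace"),
                    }


# A trimmed-down version of the payload shown above.
sample = {
    "resourceSpans": [{
        "resource": {"attributes": [
            {"key": "service.name", "value": {"stringValue": "payment-service"}},
        ]},
        "scopeSpans": [{"spans": [{
            "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
            "spanId": "00f067aa0ba902b7",
            "events": [
                {"name": "cache.miss", "attributes": []},
                {"name": "exception", "attributes": [
                    {"key": "exception.type", "value": {"stringValue": "CardDeclinedError"}},
                    {"key": "exception.message",
                     "value": {"stringValue": "card declined: insufficient funds"}},
                ]},
            ],
        }]}],
    }]
}

records = list(iter_exception_events(sample))
print(records[0]["service"], records[0]["type"])  # payment-service CardDeclinedError
```

Each record keeps the trace ID and span ID next to the exception fields, which is what later makes it possible to link an issue to its trace.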
The transport variants
OTLP comes in three transports. The choice matters for backend configuration and firewall rules.
OTLP/gRPC
The original transport. Uses HTTP/2 with protobuf bodies. Default port 4317. The endpoint is the gRPC service path: opentelemetry.proto.collector.trace.v1.TraceService/Export. Export is a unary RPC, and the transport is efficient because concurrent requests are multiplexed over one HTTP/2 connection. It requires HTTP/2 end-to-end, which means most HTTP/1.1 proxies cannot terminate it without explicit gRPC support.
OTLP/HTTP with protobuf
HTTP/1.1 (or HTTP/2) with protobuf-encoded bodies. Default port 4318. Endpoints:
- /v1/traces for trace data
- /v1/metrics for metric data
- /v1/logs for log data
Content-Type is application/x-protobuf. This is the transport you see in production where gRPC is blocked by a load balancer or WAF.
OTLP/HTTP with JSON
The same /v1/traces, /v1/metrics, /v1/logs paths as the protobuf variant, but with application/json bodies. The JSON schema mirrors the protobuf message structure. This is the transport used in browser-side SDKs and anywhere you need human-readable payloads for debugging. The payload example above is OTLP/HTTP JSON.
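For debugging, an OTLP/HTTP JSON export is an ordinary POST and can be built by hand. The sketch below prepares such a request with the Python standard library without sending it. The helper name build_otlp_request and the localhost URL are assumptions for illustration.

```python
import json
import urllib.request


def build_otlp_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) an OTLP/HTTP JSON export request for traces."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/traces",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_otlp_request("http://localhost:4318", {"resourceSpans": []})
print(req.get_method(), req.full_url)  # POST http://localhost:4318/v1/traces

# To actually send it: urllib.request.urlopen(req, timeout=5)
```

Switching to the protobuf variant would change only the body encoding and the Content-Type header. The path stays the same.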
The endpoint convention is the same across both HTTP variants: a base URL (e.g. https://otel.example.com) plus the signal path (/v1/traces). The SDK is configured via the OTEL_EXPORTER_OTLP_ENDPOINT environment variable or programmatically. The SDK appends the signal path automatically unless OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is set to the full path.
Setting OTEL_EXPORTER_OTLP_ENDPOINT=https://host/v1/traces causes the SDK to send to https://host/v1/traces/v1/traces. Set the base URL only (https://host) and let the SDK append the path, or use the signal-specific variable OTEL_EXPORTER_OTLP_TRACES_ENDPOINT set to the full path.
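The resolution rule is small enough to write down. This is a simplified sketch of the behaviour described above, not any SDK's actual code. The function name resolve_traces_endpoint is hypothetical, and the environment is passed in as a plain dict so the example has no side effects.

```python
def resolve_traces_endpoint(env: dict) -> str:
    """Resolve the OTLP/HTTP traces URL from exporter environment variables."""
    signal_specific = env.get("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT")
    if signal_specific:
        # A signal-specific endpoint is used as-is; nothing is appended.
        return signal_specific
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4318")
    # The generic endpoint is a base URL; the signal path is appended to it.
    return base.rstrip("/") + "/v1/traces"


print(resolve_traces_endpoint({"OTEL_EXPORTER_OTLP_ENDPOINT": "https://host"}))
# https://host/v1/traces

print(resolve_traces_endpoint({"OTEL_EXPORTER_OTLP_ENDPOINT": "https://host/v1/traces"}))
# https://host/v1/traces/v1/traces  (the doubled path)

print(resolve_traces_endpoint({"OTEL_EXPORTER_OTLP_TRACES_ENDPOINT": "https://host/v1/traces"}))
# https://host/v1/traces
```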
Sentry SDK vs OTel SDK: when each wins
The question comes up in every team that already uses OTel for distributed tracing: do we keep the Sentry SDK for errors or drop it and use the OTel SDK alone?
There is no universal answer, but the decision tree is short.
Use the OTel SDK when
- You already run an OTel pipeline and want a single SDK across all signals. Adding a second SDK adds a second dependency, second configuration surface, and second failure mode.
- Your team prioritizes trace context on every error. OTel exceptions are always attached to the active span, so the trace ID is always present on the error record. The Sentry SDK adds trace context only when you have installed the OTel or tracing integration explicitly.
- You ship to an errors-first backend that accepts OTLP natively. The exception event arrives with full semconv attributes; the backend does the grouping.
- You want to avoid vendor lock-in at the instrumentation layer. The OTel SDK is vendor-neutral. Switching backends means changing an endpoint, not rewriting instrumentation.
Use the Sentry SDK when
- You need the richest error UX immediately. The Sentry SDK ships automatic breadcrumb capture (HTTP requests, console logs, user actions), preconfigured session replay integration, release tracking, and user context. The OTel SDK carries none of this by default.
- You use Sentry’s performance monitoring product natively and want the issue-to-trace link to work without configuration. The Sentry SDK wires this automatically.
- Your team is not running an OTel pipeline at all. The Sentry SDK is the lower-friction choice when there is no existing OTel investment.
- You ship mobile apps. The Sentry mobile SDKs (iOS, Android, React Native) carry crash reporting, ANR detection, and native crash symbolication that OTel mobile SDKs do not yet match.
The two are not mutually exclusive. Sentry’s own SDKs now use the OTel API internally for span creation. It is possible to run the Sentry SDK for error capture while using an OTel exporter for trace data, but that complexity is usually a signal to pick one path and commit.
The “issue” abstraction over span exception events
This is where errors-first tools diverge from general tracing backends.
Jaeger, Tempo, and SigNoz all accept OTLP traces with exception events. They store the span. They show the exception event in the trace view. What they do not do is group repeated occurrences of CardDeclinedError across a thousand traces into a single “issue” with an occurrence count, a first-seen timestamp, an assignment owner, and a resolved/regression lifecycle.
That grouping — the “issue” abstraction — is the core product of an error tracker. Building it from span events requires:
1. Fingerprinting. Given an exception event, derive a stable key that identifies “the same error.” The most common strategy seeds the fingerprint from exception.type plus a stack hash (not the full stacktrace string, which includes line numbers that move across deploys).
2. Grouping. Merge events that share a fingerprint into an issue record. Track occurrence count, affected users, and affected releases.
3. Lifecycle. Mark issues as open, resolved, ignored, or regressed (reopened after a deploy following resolution).
4. Trace linkage. For each exception event, preserve the traceId from the parent span so the issue links to the full distributed trace.
A general tracing backend skips steps 1 through 3. It stores the span exactly as received, indexed for trace lookup, but does not build the issue graph. This is a product decision, not a capability gap — Jaeger’s job is trace search, not error triage.
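The four steps fit in a short sketch. Everything here is an assumption made for illustration: the normalization regex, the 16-character key, the Issue fields, and the rule that any new event after resolution counts as a regression (a real tracker would also look at releases). It is not a description of how any particular backend, urgentry included, implements grouping.

```python
import hashlib
import re
from dataclasses import dataclass, field

# Parts of a stack trace that tend to move between deploys: line numbers, hex offsets.
_VOLATILE = re.compile(r"(line \d+|:\d+|\+?0x[0-9a-fA-F]+)")


def fingerprint(exc_type: str, stacktrace: str, max_frames: int = 8) -> str:
    """Step 1: derive a stable key from the type plus normalized top frames."""
    frames = []
    for line in stacktrace.splitlines():
        line = _VOLATILE.sub("", line).strip()
        if line:
            frames.append(line)
    material = "\n".join([exc_type, *frames[:max_frames]])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]


@dataclass
class Issue:
    key: str
    exc_type: str
    first_seen: int
    count: int = 0
    status: str = "open"  # open | resolved | regressed
    trace_ids: list = field(default_factory=list)


class IssueStore:
    def __init__(self):
        self.issues = {}

    def record(self, exc_type, stacktrace, trace_id, timestamp):
        """Steps 2 to 4: group by fingerprint, update lifecycle, keep the trace link."""
        key = fingerprint(exc_type, stacktrace)
        issue = self.issues.get(key)
        if issue is None:
            issue = self.issues[key] = Issue(key, exc_type, first_seen=timestamp)
        elif issue.status == "resolved":
            issue.status = "regressed"  # seen again after being marked resolved
        issue.count += 1
        issue.trace_ids.append(trace_id)
        return issue

    def resolve(self, key):
        self.issues[key].status = "resolved"


store = IssueStore()
first = store.record("CardDeclinedError",
                     "main.chargeCard(...)\n\t/app/payment.go:42 +0x1a4",
                     "4bf92f3577b34da6a3ce929d0e0e4736", 1716364800)
# Same error after a deploy moved the line number: still the same issue.
store.record("CardDeclinedError",
             "main.chargeCard(...)\n\t/app/payment.go:57 +0x1c0",
             "aaaabbbbccccddddeeeeffff00001111", 1716368400)
print(first.count, first.status)  # 2 open

store.resolve(first.key)
store.record("CardDeclinedError",
             "main.chargeCard(...)\n\t/app/payment.go:57 +0x1c0",
             "0000111122223333444455556666ffff", 1716372000)
print(first.count, first.status)  # 3 regressed
```

The regex is deliberately crude. It also strips things like port numbers that happen to look like line numbers, which is the kind of trade-off a production fingerprinting engine has to tune per language.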
Where OTel is incomplete for error UX out of the box: the semconv spec defines how to attach exceptions to spans, but it does not define how a backend should group them. Fingerprinting strategy is left to the backend. Two different OTLP backends receiving the same exception events may group them differently or not at all.
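A small experiment shows how much the fingerprint choice matters. The three events and both key functions below are invented for illustration. Each event is reduced to a type, a message, and a top frame.

```python
from collections import defaultdict

events = [
    {"type": "CardDeclinedError", "message": "card declined: insufficient funds",
     "top_frame": "main.chargeCard"},
    {"type": "CardDeclinedError", "message": "card declined: expired card",
     "top_frame": "main.chargeCard"},
    {"type": "CardDeclinedError", "message": "card declined: insufficient funds",
     "top_frame": "main.retryCharge"},
]


def group(events, key):
    """Group event indices into issues using a caller-supplied fingerprint function."""
    issues = defaultdict(list)
    for index, event in enumerate(events):
        issues[key(event)].append(index)
    return dict(issues)


# Hypothetical backend A: fingerprint on type + message.
by_message = group(events, lambda e: (e["type"], e["message"]))
# Hypothetical backend B: fingerprint on type + top stack frame.
by_frame = group(events, lambda e: (e["type"], e["top_frame"]))

print(sorted(by_message.values()))  # [[0, 2], [1]]
print(sorted(by_frame.values()))    # [[0, 1], [2]]
```

Both backends report two issues, but the membership differs: one merges events 0 and 2, the other merges events 0 and 1. Occurrence counts and alerts built on top of those groups would differ as well.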
Sampling for errors
Sampling is where OTLP error tracking gets complicated.
Head-based sampling
Head-based sampling makes the keep-or-drop decision at the root span, before any downstream spans are created. A 10% head-based sampler drops 90% of traces at the moment the root span starts, which means 90% of exceptions in those traces are never exported.
For errors, this is usually the wrong configuration. A 10% sampler on a service throwing 100 errors per minute means roughly 90 errors per minute are never recorded by any backend.
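The reason is structural: a head sampler decides from the trace ID alone, before any code in the span has had a chance to fail. The sketch below mimics a ratio-based decision on the low 64 bits of the trace ID. The function name head_sample and the synthetic trace IDs are assumptions, and real SDK samplers differ in detail.

```python
import hashlib


def head_sample(trace_id_hex: str, ratio: float) -> bool:
    """Keep-or-drop from the trace ID alone, decided before any span has run."""
    # Treat the low 64 bits of the trace ID as a uniformly distributed number.
    return int(trace_id_hex[-16:], 16) < ratio * 2**64


# 1,000 synthetic trace IDs, each standing in for a trace that will end in an error.
trace_ids = [hashlib.sha256(str(i).encode()).hexdigest()[:32] for i in range(1000)]

kept = sum(head_sample(t, 0.10) for t in trace_ids)
print(f"exported {kept} of {len(trace_ids)} failing traces")
```

Nothing in head_sample can tell a failing trace from a healthy one, so about one in ten of the failing traces is exported and the rest never reach a backend.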
Tail-based sampling
Tail-based sampling buffers spans until the trace is complete, then applies the sampling decision. This allows the sampler to inspect the full trace before deciding. An error-aware tail sampler keeps any trace containing a span with status ERROR or an exception event at 100% regardless of volume, and applies the rate reduction only to non-error traces.
The OTel Collector’s tailsamplingprocessor implements this. A policy configuration that preserves errors looks like this:
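    processors:
      tail_sampling:
        decision_wait: 10s
        policies:
          # Keep every trace in which at least one span has status ERROR.
          - name: keep-errors
            type: status_code
            status_code: { status_codes: [ERROR] }
          # Keep traces that carry an exception event even if the status was never set.
          - name: keep-exception-events
            type: ottl_condition
            ottl_condition:
              error_mode: ignore
              spanevent:
                - 'name == "exception"'
          # Sample the remaining traces at 10%.
          - name: probabilistic-rest
            type: probabilistic
            probabilistic: { sampling_percentage: 10 }

A trace is kept when any one policy decides to sample it. The keep-errors policy fires on any trace where at least one span has status ERROR. The keep-exception-events policy inspects span events directly through an OTTL condition; it needs a collector-contrib release that ships the ottl_condition policy type, and on older releases the status code policy alone is the pragmatic proxy. Avoid catch-all policies such as span_count with min_spans: 1 in this list: they match every trace and cancel the rate reduction.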
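The decision an error-aware tail sampler makes on a buffered trace can be sketched in a few lines of Python. This is a conceptual illustration, not the processor's implementation. The function name keep_trace, the hash-based fallback, and the 10% default are assumptions, and the function expects a non-empty list of spans in OTLP/JSON shape.

```python
import hashlib

STATUS_CODE_ERROR = 2  # OTLP Status.code value for ERROR


def keep_trace(spans: list, baseline_ratio: float = 0.10) -> bool:
    """Decide on a complete, buffered trace: always keep errors, sample the rest."""
    for span in spans:
        if span.get("status", {}).get("code") == STATUS_CODE_ERROR:
            return True
        if any(e.get("name") == "exception" for e in span.get("events", [])):
            return True
    # Clean trace: fall back to a deterministic decision keyed on the trace ID.
    trace_id = spans[0]["traceId"]
    bucket = int(hashlib.sha256(trace_id.encode()).hexdigest(), 16) % 10_000
    return bucket < baseline_ratio * 10_000


failing = [
    {"traceId": "4bf92f3577b34da6a3ce929d0e0e4736", "status": {"code": 0}},
    {"traceId": "4bf92f3577b34da6a3ce929d0e0e4736", "status": {"code": 2},
     "events": [{"name": "exception"}]},
]
clean = [{"traceId": "aaaabbbbccccddddeeeeffff00001111", "status": {"code": 1}}]

print(keep_trace(failing, baseline_ratio=0.0))  # True
print(keep_trace(clean, baseline_ratio=0.0))    # False
```

Unlike a head sampler, this function runs after the spans exist, so the error is visible at decision time. The cost is the buffering: spans are held in memory until the decision is made.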
Error-preserving sampling at the SDK level
Some teams handle error preservation at the SDK rather than the Collector. The OTel Go SDK accepts custom samplers. An error-preserving sampler that wraps a probabilistic base sampler:
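    import (
        sdktrace "go.opentelemetry.io/otel/sdk/trace"
        "go.opentelemetry.io/otel/trace"
    )

    // errorPreservingSampler always samples spans that start with error=true
    // and defers every other decision to the wrapped base sampler.
    type errorPreservingSampler struct {
        base sdktrace.Sampler
    }

    func (s errorPreservingSampler) ShouldSample(
        p sdktrace.SamplingParameters,
    ) sdktrace.SamplingResult {
        for _, attr := range p.Attributes {
            if attr.Key == "error" && attr.Value.AsBool() {
                return sdktrace.SamplingResult{
                    Decision: sdktrace.RecordAndSample,
                    // ParentContext is a context.Context; the trace state lives on its span context.
                    Tracestate: trace.SpanContextFromContext(p.ParentContext).TraceState(),
                }
            }
        }
        return s.base.ShouldSample(p)
    }

    // Description is required to satisfy the sdktrace.Sampler interface.
    func (s errorPreservingSampler) Description() string {
        return "errorPreserving{" + s.base.Description() + "}"
    }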
The limitation of SDK-side error sampling: it can only see attributes present at span start. An exception recorded mid-span is not visible to the sampler. Tail-based sampling at the Collector level is the more complete solution.
OTLP ingest in urgentry: one binary, two endpoints
urgentry accepts OTLP/HTTP JSON at two endpoints, handled in the same Go binary that processes Sentry SDK errors:
- POST /v1/traces: trace data in the ExportTraceServiceRequest JSON envelope
- POST /v1/logs: log data in the ExportLogsServiceRequest JSON envelope
The OTLP ingest path shares the same fingerprinting and grouping engine as the Sentry SDK envelope path. A span with an exception event arrives at /v1/traces, and urgentry extracts the semconv attributes, fingerprints the exception, and either opens a new issue or increments an existing one — the same result you get from the Sentry SDK path.
Configuring an OTel SDK to send to urgentry:
export OTEL_EXPORTER_OTLP_ENDPOINT=https://urgentry.example.com
export OTEL_EXPORTER_OTLP_PROTOCOL=http/json
export OTEL_SERVICE_NAME=payment-service
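The SDK appends /v1/traces automatically. No Collector proxy is needed between the SDK and urgentry. Exporter support for the http/json protocol value varies by SDK, so confirm that your language's OTLP exporter implements it. The OTLP endpoint and the Sentry SDK endpoint both write to the same issue store and the same trace index.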
This is the difference that matters for teams running mixed instrumentation: if some services use the Sentry SDK and others use the OTel SDK, both signal paths land in the same issue list. Errors from sentry-go and errors from go.opentelemetry.io/otel/sdk/trace are grouped and cross-linked in the same UI without separate pipelines.
urgentry runs at ~52 MB resident memory at 400 events per second. The OTLP ingest path adds no separate process. There is no Collector sidecar, no Relay process, no translator layer between the SDK and the backend.
What this means for tooling choice
The honest comparison of backends that accept OTLP traces:
| Backend | Accepts OTLP | Exception grouping (issues) | Trace linkage from issue | Error UX |
|---|---|---|---|---|
| Jaeger | yes (gRPC + HTTP) | no | n/a | trace-centric |
| Tempo (Grafana) | yes | no (Loki integration for logs) | via exemplars | trace-centric |
| SigNoz | yes | partial (Exceptions tab) | yes | observability-centric |
| urgentry | yes (HTTP/JSON) | yes, same engine as Sentry SDK | yes | errors-first |
SigNoz deserves a note here because it does surface an Exceptions tab that shows exception events grouped by type. It is not the same depth as the issue lifecycle you get from an errors-first tool — assignment, resolution, regression detection, and per-release occurrence curves are not there — but it is closer than Jaeger or Tempo. If your team primarily uses SigNoz for traces and exceptions are secondary, SigNoz is a defensible choice. If error triage is a first-class workflow, the issue lifecycle matters.
The urgentry angle is specific: OTLP ingest as a first-class path in an errors-first tool. You do not give up the issue UX to use the OTel SDK. You get both.
Frequently asked questions
Does OTLP natively carry exceptions?
Yes. The OTel data model defines exceptions as span events with the reserved name exception. The attributes exception.type, exception.message, and exception.stacktrace are part of the stable semantic conventions for exceptions and are supported by every major OTel SDK.
What endpoint does an OTel SDK send traces to?
By default, /v1/traces appended to the base URL configured in OTEL_EXPORTER_OTLP_ENDPOINT. The default port for OTLP/HTTP is 4318. Logs go to /v1/logs on the same base URL. gRPC uses port 4317 with a different path convention.
Should I use the OTel SDK or the Sentry SDK for error tracking?
Use the OTel SDK when you already run an OTel pipeline and want a unified instrumentation path. Use the Sentry SDK when you need automatic breadcrumbs, user context, session replay integration, and rich error UX out of the box. The two can coexist but usually one path is cleaner.
Does tail-based sampling lose errors?
Only if the sampler is error-unaware. A properly configured tail sampler preserves any trace where at least one span has status ERROR or carries an exception event, applying rate reduction only to clean traces. The OTel Collector’s tailsamplingprocessor with a status_code policy handles this.
How does urgentry turn OTLP span exception events into issues?
urgentry ingests OTLP/HTTP JSON at /v1/traces. For each span carrying an exception event, it extracts exception.type and exception.message as the fingerprint seed, groups matching events into an issue record, and links the issue to the parent trace by trace ID. The same grouping engine handles Sentry SDK errors, so issues from both SDK paths appear in the same list.
Sources and further reading
- OpenTelemetry Protocol (OTLP) specification: the canonical spec for transport formats, endpoint conventions, and content types.
- OTel semantic conventions: exceptions on spans. Defines exception.type, exception.message, exception.stacktrace, exception.escaped, and the reserved event name exception.
- opentelemetry-proto: protobuf definitions for ResourceSpans, Span, Event, and all OTLP message types.
- OTel Collector tailsamplingprocessor: policy configuration for tail-based sampling, including the status_code policy for error preservation.
- OTel Go instrumentation guide: span.RecordException and span.SetStatus usage.
- OTel Python instrumentation guide: span.record_exception and span.set_status usage.
- OTel JavaScript instrumentation guide: span.recordException and span.setStatus in Node.js and browser.
- urgentry vs Sentry self-hosted comparison: OTLP-native ingest in context of the broader capability comparison.