What is error tracking? A 2026 definition.
20 seconds. Error tracking is a category of software tooling that captures runtime exceptions, groups repeated occurrences of the same exception into a single issue, and alerts the responsible team. It is distinct from logging (which records arbitrary events), APM (which measures performance), and uptime monitoring (which checks reachability). The core artifacts are the issue, the stack trace, and the alert.
60 seconds. An error tracker works in two parts: an SDK in your application and a backend that receives events. When an exception goes unhandled, the SDK serializes the exception type, message, stack trace, breadcrumbs, and user context into an event and sends it to the backend. The backend derives a fingerprint from the event, finds or creates a matching issue, increments the occurrence count, and fires an alert if this is the first occurrence or a regression of a previously resolved issue. The team sees one grouped issue whether the exception fired once or fifty thousand times. Resolving closes the issue; an occurrence in a later release reopens it as a regression.
The definition
Error tracking is the practice of automatically capturing exceptions and unhandled errors from running software, grouping them by signature, and surfacing them to the team that owns the code.
The three components of that definition matter equally. Capturing without grouping produces an undifferentiated flood of duplicate events. Grouping without surfacing means the data exists but no one acts on it. All three together produce the workflow that makes error tracking useful: one issue per problem, visible to the right team, with enough context to fix it.
Error tracking became a product category in the early 2010s, when Sentry (founded 2012) and Rollbar (founded 2012) established the SDK-plus-backend model that every subsequent tool has followed. The core abstraction has not changed: SDK in the app, envelope on the wire, issue in the UI. What changed in the 2020s was where error tracking sits in the broader observability stack.
What error tracking is not
The category is clearest when compared to the adjacent tools it is often confused with.
Not logging
Logging records arbitrary events as text or structured records: a request came in, a cache was missed, a query took 40ms. Error tracking is logging's narrower, alert-driven cousin. A log line that says ERROR: connection timeout goes into a log store. An error tracking event for the same failure goes into an issue tracker, gets a fingerprint, increments a counter, and fires an alert. Logs answer "what happened?" Error tracking answers "what broke, and did it break before?"
The operational difference: you have to search logs to find a problem. Error tracking tells you about the problem without a search. That is the alerting contract that distinguishes the two.
Not APM
APM (application performance monitoring) measures how long things take and how many resources they consume: p99 latency, throughput, CPU, memory. It answers "is the service healthy?" Error tracking measures failure: which exceptions occur, how often, and in which release. APM tells you a service is slow. Error tracking tells you it is throwing a NullPointerException on line 42 of PaymentService.java.
In 2026 many products combine both signals. That does not make them the same thing. They answer different questions and produce different artifacts. APM produces dashboards and latency histograms. Error tracking produces issues with stack traces and assignment workflows.
Not uptime monitoring
Uptime monitoring checks that a service responds: it sends an HTTP request to a health endpoint every 30 seconds and pages if it does not get a 200. It does not know why the service failed or what exception it threw on the way down. Error tracking captures what fails inside the service, before (and sometimes without) the service becoming unreachable. A service can throw ten thousand CardDeclinedError exceptions per hour and still return 200 from its health endpoint.
The four things every error tracker does
Implementations differ in quality and depth, but every error tracker performs the same four operations.
- Capture. An SDK in your application intercepts uncaught exceptions and unhandled errors. It serializes the exception into an event: type, message, stack trace, the breadcrumbs that preceded it, user and session context, and the release identifier. It sends the event to the backend over HTTP.
- Group. The backend receives the event and derives a fingerprint: a stable key that identifies "the same error." Two events with the same fingerprint become the same issue. The backend increments the occurrence count on the existing issue rather than creating a new one. This is what separates an error tracker from a raw event store.
- Alert. When an issue is created for the first time, the backend fires a notification: email, Slack, PagerDuty, or whatever the team has configured. Alerts also fire on regressions: when an issue that was resolved reappears after a deploy. The alert is the mechanism that closes the loop between failure and awareness.
- Resolve. An issue has a lifecycle. Once a team fixes the underlying bug and deploys a release, they mark the issue resolved. The error tracker watches subsequent events. If the same fingerprint appears again in a later release, it reopens the issue and alerts again. This regression detection is the feature that makes error tracking durable rather than a one-time lookup.
What gets captured in an error event
A well-formed error event carries everything a developer needs to reproduce and fix the issue without a follow-up investigation.
- Exception type and message. The class or type name of the exception and its message string. These form the primary human-readable identifier of the issue, for example CardDeclinedError: insufficient funds or TypeError: Cannot read properties of undefined (reading 'id').
- Stack trace. The call frames at the moment the exception was thrown, from the origin frame down to the frame that caught or reported it. The stack is the primary diagnostic artifact: it tells you what code ran, in what order, and where it failed.
- Breadcrumbs. A ring buffer of structured log-like events that occurred before the exception: HTTP requests made, database queries executed, user actions taken. Breadcrumbs answer "what was the app doing right before this happened?"
- User and session context. The user ID, email, or anonymous session identifier that was active when the exception occurred. This lets you answer "how many users hit this?" and "can I reproduce it by logging in as this user?"
- Release and environment. The version string of the deployed release and the environment (production, staging, preview). Release context powers regression detection and lets you plot "this exception appeared in v2.4.1 and was gone by v2.4.2."
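The fields above can be sketched in a few lines of Python. This is a minimal illustration, not any SDK's actual schema: the Breadcrumbs class, the build_event function, and every dictionary key are hypothetical names chosen for this example.

```python
import traceback
from collections import deque
from datetime import datetime, timezone


class Breadcrumbs:
    """Fixed-size ring buffer: the oldest entry is dropped when it is full."""

    def __init__(self, capacity=100):
        self._items = deque(maxlen=capacity)

    def record(self, category, message):
        self._items.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "category": category,
            "message": message,
        })

    def snapshot(self):
        return list(self._items)


def build_event(exc, breadcrumbs, user_id, release, environment):
    """Serialize an exception into a plain dict (hypothetical schema)."""
    frames = [
        {"file": f.filename, "function": f.name, "line": f.lineno}
        for f in traceback.extract_tb(exc.__traceback__)
    ]
    return {
        "type": type(exc).__name__,
        "message": str(exc),
        "stack": frames,  # outermost call first, failing frame last
        "breadcrumbs": breadcrumbs.snapshot(),
        "user": {"id": user_id},
        "release": release,
        "environment": environment,
    }


class CardDeclinedError(Exception):
    pass


def charge(card):
    raise CardDeclinedError("insufficient funds")


crumbs = Breadcrumbs(capacity=2)
crumbs.record("http", "POST /checkout")
crumbs.record("db", "SELECT * FROM carts WHERE id = 7")
crumbs.record("ui", "clicked Pay")  # buffer is full, so the http entry is dropped

try:
    charge("4242")
except CardDeclinedError as exc:
    event = build_event(exc, crumbs, "user-123", "v2.4.1", "production")
```

The ring buffer is the design choice worth noting: breadcrumbs are recorded constantly but only read when something fails, so a bounded buffer keeps memory use flat.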
How grouping works
Grouping is the technically hardest part of building an error tracker, and it is what separates a useful tool from a raw event dump.
The core technique is fingerprinting: deriving a stable key from the event that identifies "the same error." Most implementations seed the fingerprint from two inputs: the exception type name and a hash of the stack trace shape. The stack trace shape is the ordered sequence of file paths and function names in the stack, without line numbers. Line numbers change as code is edited between releases; file paths and function names are more stable. A fingerprint built on the full raw stack string would create a new issue every time a line number shifted, which defeats the purpose of grouping.
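A minimal sketch of that idea, assuming frames arrive as (file path, function name, line number) tuples. The fingerprint function and the choice of SHA-256 are illustrative assumptions; production trackers apply many more normalization rules.

```python
import hashlib


def fingerprint(exception_type, frames):
    """Hash the exception type plus the stack shape, ignoring line numbers.

    `frames` is a list of (file_path, function_name, line_number) tuples.
    """
    hasher = hashlib.sha256()
    hasher.update(exception_type.encode("utf-8"))
    for file_path, function_name, _line_number in frames:
        # Line numbers are deliberately left out of the hash input.
        # The NUL byte separates fields so adjacent values cannot run together.
        hasher.update(b"\x00")
        hasher.update(file_path.encode("utf-8"))
        hasher.update(b"\x00")
        hasher.update(function_name.encode("utf-8"))
    return hasher.hexdigest()


before_edit = [("app/payments.py", "charge", 42), ("app/views.py", "checkout", 17)]
after_edit = [("app/payments.py", "charge", 45), ("app/views.py", "checkout", 17)]
other_path = [("app/refunds.py", "refund", 42), ("app/views.py", "checkout", 17)]

# Same shape, shifted line number: the fingerprints match.
same = fingerprint("CardDeclinedError", before_edit) == fingerprint("CardDeclinedError", after_edit)
# Different file and function: the fingerprints differ.
different = fingerprint("CardDeclinedError", before_edit) != fingerprint("CardDeclinedError", other_path)
```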
A new error and the 50,000th occurrence of a known one travel the same ingest path; they differ only in what the backend finds when it looks up the fingerprint. If the fingerprint matches an existing issue, the backend increments the occurrence counter and updates the last-seen timestamp. No new issue is created. If the fingerprint is new, the backend creates a new issue and fires the first-occurrence alert. The issue list shows one entry regardless of occurrence volume, which is what makes triage tractable.
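The find-or-create step might look like the following sketch. It uses an in-memory dictionary in place of a real database, and the Issue and IssueStore names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    fingerprint: str
    count: int
    first_seen: float
    last_seen: float


class IssueStore:
    """In-memory stand-in for the backend's issue table."""

    def __init__(self):
        self._issues = {}
        self.alerts = []

    def ingest(self, fingerprint, timestamp):
        issue = self._issues.get(fingerprint)
        if issue is None:
            # Unknown fingerprint: create the issue and alert once.
            issue = Issue(fingerprint, 1, timestamp, timestamp)
            self._issues[fingerprint] = issue
            self.alerts.append(("first_occurrence", fingerprint))
        else:
            # Known fingerprint: count it, but stay quiet.
            issue.count += 1
            issue.last_seen = timestamp
        return issue

    def issue_count(self):
        return len(self._issues)


store = IssueStore()
for second in range(50_000):
    last_issue = store.ingest("abc123", float(second))
```

After the loop the store holds a single issue with a count of 50,000 and exactly one alert on record.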
Regression detection works by attaching release context to every occurrence. When a team marks an issue resolved, the tracker records the current release version. If the same fingerprint appears in a subsequent release, the issue reopens and a regression alert fires. This is distinct from the issue never being resolved: a resolved issue that regresses is a signal that the fix was incomplete or reverted.
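As a rough sketch of that rule, assume releases are dotted numeric versions such as v2.4.1 that can be compared part by part. That ordering is an assumption made for this example, as are the names parse_release, resolve, and on_event; a real tracker may order releases by deploy time instead.

```python
from dataclasses import dataclass
from typing import Optional


def parse_release(release):
    """Turn 'v2.4.1' into (2, 4, 1) so releases compare in order."""
    return tuple(int(part) for part in release.lstrip("v").split("."))


@dataclass
class Issue:
    fingerprint: str
    status: str = "unresolved"
    resolved_in: Optional[str] = None


def resolve(issue, current_release):
    """Mark the issue resolved and remember which release carried the fix."""
    issue.status = "resolved"
    issue.resolved_in = current_release


def on_event(issue, event_release):
    """Return 'regression' if a resolved issue reappears in a later release."""
    if issue.status != "resolved":
        return None
    if parse_release(event_release) > parse_release(issue.resolved_in):
        issue.status = "unresolved"
        issue.resolved_in = None
        return "regression"
    # Events from the resolved release or an older one do not reopen the issue.
    return None


issue = Issue("abc123")
resolve(issue, "v2.4.1")
straggler = on_event(issue, "v2.4.1")  # same release as the fix: no reopen
regressed = on_event(issue, "v2.4.2")  # later release: reopen and alert
```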
The 2020s vs 2026 difference
The 2020s framing of error tracking treated it as a standalone vertical: a dedicated SDK, a dedicated backend, a dedicated UI. Error events traveled over a proprietary envelope (Sentry's envelope format, Rollbar's API). Traces lived elsewhere (Jaeger, Zipkin). Logs lived elsewhere (Elasticsearch, Loki). The three signals were isolated by product boundary.
The 2026 framing treats error tracking as one signal alongside OpenTelemetry traces and logs, often in the same product surface. OpenTelemetry's semantic conventions model exceptions as span events with a fixed set of attributes: exception.type, exception.message, exception.stacktrace. The OpenTelemetry Protocol (OTLP) carries those events on the wire. A team that already runs an OTel pipeline for distributed tracing gets an exception signal path without a second SDK. Error-first backends that accept OTLP (like urgentry) perform the same fingerprinting and issue grouping on OTLP exception events that they perform on Sentry-protocol events.
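To make the shape of such an event concrete, the sketch below builds one as a plain dictionary using only the standard library. The attribute keys come from the OTel semantic conventions; the exception_span_event function and the surrounding dictionary layout are hypothetical and are not the OTLP wire format.

```python
import traceback


def exception_span_event(exc):
    """Build a dict shaped like an OTel exception span event (illustrative only)."""
    exc_type = type(exc)
    module = exc_type.__module__
    # Built-in exceptions are reported by bare name, others with their module.
    if module == "builtins":
        qualified = exc_type.__qualname__
    else:
        qualified = f"{module}.{exc_type.__qualname__}"
    return {
        "name": "exception",
        "attributes": {
            "exception.type": qualified,
            "exception.message": str(exc),
            "exception.stacktrace": "".join(
                traceback.format_exception(exc_type, exc, exc.__traceback__)
            ),
        },
    }


try:
    {}["id"]
except KeyError as exc:
    event = exception_span_event(exc)
```

In an instrumented application you would not build this by hand; the OpenTelemetry SDK records it on the active span for you.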
The practical implication: the question in 2026 is not "should we have an error tracker?" It is "does our error tracker sit alongside or inside our observability pipeline?" The answer affects SDK choice, ingest architecture, and which backend you run.
How error tracking fits in observability
The three-signal model of observability (traces, metrics, logs) has a quiet fourth signal: exceptions. Error tracking is the product built around that fourth signal.
In the signal hierarchy, each serves a different role:
- Errors are the alerting signal. An exception fires an alert. That alert triggers the investigation. Without error tracking, the team finds out about failures from user reports, uptime monitors, or log searches.
- Traces are the investigation signal. Once alerted, a developer follows the distributed trace from the failing request through every service it touched. The trace shows where latency accumulated and where errors propagated.
- Logs are the context signal. Logs fill in what the trace does not capture: the exact database query that failed, the config value that was loaded, the third-party API response body.
The OTel exception event sits at the intersection of all three. It is an event on a span (trace context), carrying structured attributes (log-like context), describing a failure (error tracking subject). A backend that accepts OTLP and performs issue grouping on exception events gives you error tracking and trace context in one ingest path.
When you need it
Any production service that serves users needs error tracking. The cost is low: a small SDK dependency and the bandwidth to send events. The cost of not having it is high: failures discovered by users, not engineers.
The answer is "always" for anything customer-facing. Internal tools, admin panels, batch processors, and background workers are worth tracking too, but the priority is clear: if a user can trigger the code path, you need to know when it fails before they tell you.
Event volume is lower than log volume by orders of magnitude. A service handling a million requests per day might generate tens of thousands of errors per day in a degraded state, and far fewer in normal operation. The bandwidth and storage costs of error tracking are small relative to full request logging.
The build-vs-buy-vs-self-host question
Rolling your own error tracker is a common impulse and a consistently bad decision. The capture layer is easy: send an HTTP request with a serialized exception. The grouping layer is where homegrown tools fail. Stack trace fingerprinting has edge cases: minified JavaScript, dynamically generated code, recursive call stacks, language-specific frame formats. Teams that build their own grouping often spend months getting it wrong before giving up.
The Sentry SDK is the de facto standard instrumentation layer. It exists for every major language and runtime, handles the edge cases, and sends a well-specified envelope format. The question of what receives that envelope is separate from the instrumentation question.
The SaaS-vs-self-host decision comes down to data residency and cost, not capability. Sentry's SaaS product and a self-hosted alternative built on the same SDK receive the same events and provide the same grouping quality. The reasons to self-host are: keeping error data inside your infrastructure boundary, removing per-event pricing, and running on hardware you control. urgentry is a single Go binary that accepts the Sentry SDK envelope format and OTLP exception events, runs at 52 MB resident memory at 400 events per second, and requires no external services. It is the self-hosted path that does not require a Kubernetes cluster to operate.
Frequently asked questions
What's the difference between error tracking and APM?
APM measures latency, throughput, and resource utilization. Error tracking measures failure: which exceptions occur, how often, and in which release. APM tells you a service is slow. Error tracking tells you it is throwing a specific exception in a specific function. In 2026 many products surface both signals, but the underlying questions are different.
Do I need error tracking if I have logs?
Yes. Logs require you to know a problem happened before you can search for it. Error tracking alerts you on first occurrence without a search. It also groups repeated occurrences of the same exception into one issue, which no log query does automatically. Logs and error tracking answer different questions and complement each other.
What is a DSN?
A DSN (Data Source Name) is the URL the SDK uses to send error events to the backend. It encodes the ingest endpoint, the project identifier, and an authentication key in a single string, typically in the form https://<key>@<host>/<project_id>. You set it once in SDK initialization. The Sentry SDK format is the standard; urgentry uses the same format.
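Because a DSN is an ordinary URL, it can be taken apart with a standard URL parser. The sketch below assumes the simple https://<key>@<host>/<project_id> form described above; parse_dsn is a hypothetical helper, and the key and host in the example are placeholders.

```python
from urllib.parse import urlsplit


def parse_dsn(dsn):
    """Split a DSN into scheme, key, host, and project id."""
    parts = urlsplit(dsn)
    project_id = parts.path.strip("/")
    if not (parts.scheme and parts.username and parts.hostname and project_id):
        raise ValueError(f"malformed DSN: {dsn!r}")
    # Keep an explicit port if the DSN carries one.
    host = parts.hostname if parts.port is None else f"{parts.hostname}:{parts.port}"
    return {
        "scheme": parts.scheme,
        "key": parts.username,
        "host": host,
        "project_id": project_id,
    }


dsn = parse_dsn("https://examplekey@errors.example.com/42")
```

For the placeholder DSN this yields the key examplekey, the host errors.example.com, and the project id 42.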
What is fingerprinting?
Fingerprinting is the process of deriving a stable key from an error event so that repeated occurrences of the same exception collapse into one issue. Most implementations seed the fingerprint from the exception type and a hash of the stack trace shape (file paths and function names, without line numbers). Two events with the same fingerprint are the same issue.
Is error tracking expensive?
The SDK cost is a small runtime dependency. Event volume is far lower than log volume: a service handling a million requests per day might generate tens of thousands of error events per day when degraded, and far fewer in normal operation. SaaS pricing scales with volume; self-hosted tools like urgentry run as a single binary with no external dependencies. The infrastructure cost of self-hosting is low enough to run on a $5 VPS for modest event volumes.
Sources
- OTel semantic conventions: exceptions on spans. Defines exception.type, exception.message, exception.stacktrace, exception.escaped, and the reserved span event name exception.
- OpenTelemetry Protocol specification. The canonical reference for OTLP transport formats and endpoint conventions.
- Sentry: Issue grouping and fingerprints. Sentry’s documentation of their fingerprinting algorithm and grouping strategies.
- OWASP Top Ten. Error handling and logging failures appear as a recurring category; error tracking is the operational control that catches exploitation attempts before they become incidents.
- Functional Source License (FSL-1.1-Apache-2.0). The license under which urgentry is distributed.
- urgentry compatibility matrix. Supported SDK versions, OTLP protocol versions, and envelope format compatibility.