Logs vs traces vs errors: when to use which.
Three signals, three jobs. Errors tell you something broke and needs attention. Traces tell you where in the system a request went and how long each step took. Logs tell you what your code said it was doing at a specific moment. Confusing them leads to slow incident response, wrong tooling choices, and alert fatigue. This guide draws the lines clearly.
Errors are grouped, deduplicated records of unexpected failures with a lifecycle (open, resolved, regressed) and an occurrence count. Use them for alerting and triage.
Traces are end-to-end records of a single request flowing through your system, composed of spans that carry timing, attributes, and status across every service hop. Use them to answer “where did the time go?” and “which service caused this?”
Logs are timestamped records of what your code did at a specific moment, carrying a severity level, a message body, and optional structured attributes. Use them for ad-hoc context that spans did not capture.
The working model: errors are the alerting signal, traces are the investigation signal, logs are the long-tail context. No single signal replaces the other two.
The one-sentence definitions
Three signals, three sentences:
- An error is a record of an unexpected thing that happened and that you want to know about, grouped with every other occurrence of the same failure.
- A trace is a record of a request flowing through your system, composed of spans that show what each service did and how long it took.
- A log is a timestamped record of what your code did, emitted by the application at the moment the code ran.
Each definition points to a distinct workflow. Errors go to an error tracker. Traces go to a trace backend. Logs go to a log store or a structured query interface. The confusion arises because modern tooling blurs these lines: OTLP carries all three, structured logs can look like errors, and a span carrying an exception event is technically both a trace span and an error record. The signal definitions still hold. The tooling just puts them on the same wire.
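As a rough illustration of how the three definitions differ in shape, here is a minimal sketch of the three record types. The class and field names are assumptions chosen for this example; they are not taken from any particular SDK or backend.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ErrorIssue:
    """A grouped failure: one record stands for many occurrences."""
    fingerprint: str
    status: str = "open"          # open, resolved, or regressed
    occurrences: int = 0


@dataclass
class Span:
    """One step of one request, linked to its parent by ID."""
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]
    name: str
    duration_ms: float
    status: str = "OK"
    attributes: dict = field(default_factory=dict)


@dataclass
class LogRecord:
    """A timestamped statement from the code; trace linkage is optional."""
    timestamp: float
    severity: str
    body: str
    attributes: dict = field(default_factory=dict)
    trace_id: Optional[str] = None


# One failing request can produce all three records (illustrative values).
failing_span = Span("t1", "s2", "s1", "db.query", 500.0, status="ERROR")
log = LogRecord(1700000000.0, "ERROR", "query timed out", trace_id="t1")
issue = ErrorIssue("DatabaseConnectionError@pool", occurrences=47)
```

Only the issue carries state and a count, only the span carries parent linkage, and only the log record is free to exist without any request context at all.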
When you want errors
Reach for an error signal when any of the following are true:
- A user-facing operation failed. A 500 response, a checkout that did not complete, a file upload that timed out.
- An exception bubbled up to a handler. Uncaught exceptions, caught-and-logged errors that represent unexpected states, panics in Go, unhandled promise rejections in JavaScript.
- A background job failed. A queue consumer crashed mid-processing, a cron job exited with a non-zero status, a retry budget was exhausted.
- You need to alert on a failure condition. Errors drive alerting because they have an occurrence count and a rate. "This error fired 50 times in the last 5 minutes" is a threshold you can set. "This log line appeared 50 times" requires log aggregation and a query, which is a slower path to a page.
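The threshold in the last bullet ("50 times in the last 5 minutes") can be sketched as a sliding-window counter. This is an illustrative sketch, not how any specific error tracker implements alerting; the `RateAlert` class is hypothetical and assumes occurrences arrive in timestamp order.

```python
from collections import deque


class RateAlert:
    """Fires when `threshold` occurrences fall inside a sliding time window."""

    def __init__(self, threshold: int, window_seconds: float):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._times = deque()

    def record(self, ts: float) -> bool:
        """Record one occurrence at time `ts` (seconds); return True if the alert fires."""
        self._times.append(ts)
        cutoff = ts - self.window_seconds
        # Drop occurrences that have aged out of the window.
        while self._times and self._times[0] <= cutoff:
            self._times.popleft()
        return len(self._times) >= self.threshold


alert = RateAlert(threshold=50, window_seconds=300)
fired = [alert.record(float(t)) for t in range(50)]  # 50 occurrences in 50 seconds
```

Because the error tracker already counts occurrences per issue, this check needs no log query: the counter is updated as each occurrence arrives.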
The key property of an error: it gets grouped. Every occurrence of CardDeclinedError across a thousand requests becomes one issue with an occurrence count, a first-seen timestamp, and an owner. A log line or a trace span cannot give you that on its own. Errors also have a lifecycle: open, resolved, regressed. That lifecycle makes error tracking a triage workflow, not a search workflow.
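Grouping can be sketched in a few lines. The fingerprint rule below (exception type plus top stack frame) is an assumption made for illustration; real error trackers use richer heuristics, and the `IssueStore` name is hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Issue:
    fingerprint: str
    title: str
    first_seen: float
    last_seen: float
    count: int = 0


def fingerprint(exc_type: str, top_frame: str) -> str:
    """Derive a stable group key; the message is excluded because it often varies."""
    return hashlib.sha256(f"{exc_type}|{top_frame}".encode()).hexdigest()[:16]


class IssueStore:
    def __init__(self):
        self.issues = {}

    def record(self, exc_type: str, message: str, top_frame: str, ts: float) -> Issue:
        """Fold one occurrence into its issue, creating the issue on first sight."""
        fp = fingerprint(exc_type, top_frame)
        issue = self.issues.get(fp)
        if issue is None:
            issue = Issue(fp, f"{exc_type}: {message}", first_seen=ts, last_seen=ts)
            self.issues[fp] = issue
        issue.count += 1
        issue.last_seen = max(issue.last_seen, ts)
        return issue


store = IssueStore()
for i in range(1000):
    # Messages differ per request, but the failure class is the same.
    store.record("CardDeclinedError", f"card ending {i:04d} declined",
                 "payment/charge.py:88", ts=float(i))
```

A thousand occurrences collapse into one entry with a count and a first-seen timestamp, which is the unit an engineer can own and triage.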
When you want traces
Reach for a trace when any of the following are true:
- A request is slow and you do not know which service or span is responsible. The trace waterfall shows you exactly where time went: a 2-second database query in a 2.1-second request is visible in one view.
- You are investigating a failure in a distributed system and need to follow a request across service boundaries. A trace carries a single trace ID through every service in the call graph, so you can reconstruct the full path even when the request touched six services.
- You need to answer a systemic question: which endpoint has the highest p99 latency, which database queries run most often, which downstream service causes the most errors. Trace data aggregated across requests gives you these answers.
- You want to understand what happened before and after an error. The trace that contains an error span carries every other span in the same request, giving you the full context that the error record alone cannot provide.
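The systemic questions in the third bullet come down to aggregating span durations across many requests. A minimal sketch of a per-endpoint p99, using the nearest-rank method and assuming the input is a list of hypothetical `(endpoint, duration_ms)` pairs taken from root spans:

```python
from collections import defaultdict


def p99_by_endpoint(root_spans):
    """Return {endpoint: p99 duration} using the nearest-rank percentile."""
    durations = defaultdict(list)
    for endpoint, duration_ms in root_spans:
        durations[endpoint].append(duration_ms)

    result = {}
    for endpoint, values in durations.items():
        values.sort()
        # Nearest rank: ceil(0.99 * n), computed with integers to avoid float error.
        rank = (99 * len(values) + 99) // 100
        result[endpoint] = values[rank - 1]
    return result


# Illustrative data: 100 requests to one endpoint, 2 to another.
sample = [("GET /checkout", d) for d in range(1, 101)]
sample += [("GET /health", 3), ("GET /health", 4)]
p99 = p99_by_endpoint(sample)
```

Trace backends typically do this over sampled or pre-aggregated data; the sketch only shows why span timing makes the question answerable at all.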
The key property of traces: they are request-scoped and causally ordered. Every span shares a trace ID and carries a parent span ID. A log message that says "database query took 800 ms" is useful. A span that says the same thing, linked to the HTTP handler that called it, linked to the root span that started the request, is investigable.
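The parent links are what turn a flat list of spans into a waterfall. A small sketch, assuming spans are plain dicts with hypothetical keys `span_id`, `parent_span_id`, `name`, `start_ms`, and `duration_ms`:

```python
from collections import defaultdict


def render_waterfall(spans):
    """Return indented lines, one per span, following parent links from the root."""
    children = defaultdict(list)
    for span in spans:
        children[span["parent_span_id"]].append(span)

    lines = []

    def walk(parent_id, depth):
        # Siblings are shown in start order; children are indented under parents.
        for span in sorted(children[parent_id], key=lambda s: s["start_ms"]):
            lines.append(f"{'  ' * depth}{span['name']} ({span['duration_ms']} ms)")
            walk(span["span_id"], depth + 1)

    walk(None, 0)  # the root span has no parent
    return lines


spans = [
    {"span_id": "c", "parent_span_id": "b", "name": "db.query",
     "start_ms": 50, "duration_ms": 2000},
    {"span_id": "a", "parent_span_id": None, "name": "GET /orders",
     "start_ms": 0, "duration_ms": 2100},
    {"span_id": "b", "parent_span_id": "a", "name": "orders.handler",
     "start_ms": 5, "duration_ms": 2090},
]
waterfall = render_waterfall(spans)
```

The spans arrive in arbitrary order, yet the 2-second query still lands under the handler that issued it. A log line with the same duration has no such link to follow.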
When you want logs
Reach for logs when any of the following are true:
- You need context that was not captured as a span attribute. A span tells you the query took 800 ms; a log line tells you the query was a full-table scan because an index was dropped in the last migration.
- The application emitted an informational message you want to query later. A startup message, a configuration value that was loaded, a cache hit rate logged every 60 seconds. These are not failures and not requests; they are statements about what the application did.
- You want a detailed internal narrative of what a piece of code did, at a granularity that would be too noisy to model as spans. A function that loops 200 times may log each iteration for debugging purposes. Modeling each iteration as a span would produce 200 spans per request, which is expensive and hard to read in a trace viewer.
- A system component does not emit traces. Legacy services, third-party libraries, operating system components, and system services emit logs without spans. Logs are the only signal available.
The key property of logs: they are free-form and cheap to emit. No context propagation is required, no span lifecycle to manage. A log line is a write to a file or a structured record appended to a stream. That simplicity is the strength and the limitation.
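To show how little machinery a log record needs, here is a sketch that emits a structured record using only Python's standard `logging` module. The JSON field names (`timestamp`, `severity`, `body`, `attributes`) mirror the description in this guide and are an assumption, not a required wire format.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "timestamp": record.created,
            "severity": record.levelname,
            "body": record.getMessage(),
            # Structured attributes are passed through `extra` by the caller.
            "attributes": getattr(record, "attributes", {}),
        })


stream = io.StringIO()  # stands in for a file or a log shipper
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("payment-service")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.info("connection pool size reduced",
            extra={"attributes": {"old_size": 20, "new_size": 5}})
emitted = json.loads(stream.getvalue().strip().splitlines()[-1])
```

There is no trace ID, no parent, and no lifecycle here. That is why the record is cheap to emit, and also why it cannot be tied to a request unless a trace ID is added deliberately.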
Where they overlap
The three signals are distinct but structurally entangled in modern observability tooling.
An error captured in a trace is both an error and a span event. When instrumented code calls span.RecordException(err), the OTel SDK appends an event named exception to the current span. In most SDKs, marking the span status as ERROR is a separate call such as SetStatus; recording the exception does not change the status on its own. An errors-first tool receives the trace, extracts the exception event, and groups it into an issue. The same record is a span to a trace backend and an issue occurrence to an error tracker.
A structured log record with severity ERROR overlaps with errors semantically but not functionally. A severity: ERROR log record is queryable as an error-class event, but it lacks fingerprinting, grouping, and a lifecycle. It answers "how many ERROR-level lines did this service emit?" not "how many times did this failure class occur and is it getting worse?"
The OTel exception event is structurally a log record on a span. Span events and log records share the same structure: a timestamp, a name or body, and attributes. They differ in transport (events travel with the trace; log records travel the log signal path) and in how backends treat them. A backend that ingests both can correlate them via trace ID.
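Correlation by trace ID can be sketched as a merge of the two streams into one timeline. The dict shapes below are assumptions for illustration; a real backend would query indexed storage instead of scanning lists.

```python
def timeline(trace_id, span_events, log_records):
    """Merge span events and log records for one trace, ordered by timestamp."""
    merged = []
    for event in span_events:
        if event["trace_id"] == trace_id:
            merged.append((event["timestamp"], "span-event", event["name"]))
    for record in log_records:
        # Log records without a trace ID cannot be correlated and are skipped.
        if record.get("trace_id") == trace_id:
            merged.append((record["timestamp"], "log", record["body"]))
    return sorted(merged)


span_events = [
    {"trace_id": "abc", "timestamp": 2.0, "name": "exception"},
]
log_records = [
    {"trace_id": "abc", "timestamp": 1.0, "body": "retrying connection: attempt 3 of 5"},
    {"trace_id": "zzz", "timestamp": 1.5, "body": "unrelated request"},
    {"timestamp": 0.5, "body": "startup complete"},
]
merged = timeline("abc", span_events, log_records)
```

The startup message drops out because it carries no trace ID. It is still a valid log record, but only a time-range query will find it.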
These overlaps are why a single OTel SDK can feed all three signal paths, and why a backend that accepts OTLP can extract error semantics from trace data without a separate SDK.
The relationship between the three
In production incident response, the three signals play different roles in sequence:
- Errors are the alerting signal. An alert fires because an error rate crossed a threshold, or because a new error class appeared. The error gives you the exception type, message, stack trace, and occurrence count. You know what broke and how often.
- Traces are the investigation signal. From the error, you navigate to the trace that contains it. The trace shows you the full request: which services were involved, which span carried the error, what happened before and after. You identify the root cause by following the causal chain backward.
- Logs are the long-tail context. When the trace alone does not answer the question, you reach for logs. The span tells you a database call failed; the log line from the database client tells you the specific query, the connection pool state, and the retry count that preceded the failure.
This sequence is the working model. Errors get you to the problem fast. Traces get you to the cause. Logs fill the gaps that spans did not instrument. The signals are complementary, not competing.
A worked example: a 500 in production
A payment service returns a 500 to a checkout request at 14:32 UTC.
The error fires first. Your error tracker shows a new issue: DatabaseConnectionError: connection pool exhausted, 47 occurrences in 10 minutes, all from payment-service, first seen 11 minutes ago. Stack trace points to payment/db/pool.go:214. You know what broke and when.
The trace comes second. From the issue, you open the trace. The waterfall shows: HTTP handler (2 ms of its own work) called the payment processor span (2,140 ms, status ERROR). The payment processor attempted four database calls, each timing out at 500 ms. You know the failure was in the payment processor's database layer, not upstream.
The logs come last. The trace tells you the pool exhausted but not why. You query logs for the 10 minutes before the first occurrence and find: "connection pool size reduced from 20 to 5 due to config reload" at 14:21. A configuration change 11 minutes before the failing request caused the failure. The trace showed you where. The log showed you why.
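The final step of the example is a time-window query over log records. A minimal sketch, assuming the window is anchored at the issue's first-seen time; the date and the exact seconds below are illustrative values, not part of the incident as described.

```python
from datetime import datetime, timedelta


def logs_before(records, end, minutes):
    """Return records whose time falls in the `minutes` leading up to `end`."""
    start = end - timedelta(minutes=minutes)
    return [r for r in records if start <= r["time"] <= end]


# Assumed anchor: the moment the error tracker first saw the issue.
first_seen = datetime(2024, 1, 1, 14, 21, 30)

records = [
    {"time": datetime(2024, 1, 1, 14, 5, 0), "body": "cache warmed"},
    {"time": datetime(2024, 1, 1, 14, 21, 0),
     "body": "connection pool size reduced from 20 to 5 due to config reload"},
    {"time": datetime(2024, 1, 1, 14, 32, 0), "body": "checkout request failed"},
]
hits = logs_before(records, first_seen, minutes=10)
```

The query is scoped by time, not by trace ID, because the config reload happened outside any request. That is the kind of context only logs carry.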
Common misconceptions
1. “Logs are enough”
For a single-process application with low traffic, logs often are enough. You can grep for ERROR, read the stack trace, and understand the failure.
At scale and in distributed systems, logs fail in three ways. First, they do not carry causal context across service boundaries. Reconstructing a call chain across 10 services from correlated log lines requires significant tooling. Traces carry that structure natively. Second, log volume grows linearly with traffic. A busy distributed system logging at INFO can produce terabytes per day before any errors occur. Third, logs do not group failures. You cannot know "this class of failure occurred 1,200 times this week" without building that aggregation yourself. Error tracking builds it for you.
2. “Traces replace logs”
Traces record what happened on the instrumented request path. They do not capture what your instrumentation did not anticipate. Ad-hoc messages, internal state, and third-party library output all live outside the span model. When a library logs "retrying connection: attempt 3 of 5" and that retry saved the request, that message is not on any span. It is in the logs. Teams with thorough instrumentation find traces answer 80% of investigation questions. The remaining 20% requires logs.
3. “Errors are just severity-ERROR logs”
A log record with level: ERROR is a message. An error in an error tracker is a grouped, deduplicated, lifecycle-managed issue. You cannot assign a log line to an engineer or mark it resolved and expect it to reopen on regression. Log aggregation tools can approximate this with enough configuration, but fingerprinting, grouping heuristics, and regression detection are not what a log store provides by default. Error tracking treats each failure class as a first-class entity with state and ownership. Log storage treats each record as a row in a time-series store. The workflows they support are different.
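The lifecycle that separates an issue from a log line is a small state machine. The sketch below is illustrative only; the `TrackedIssue` class is hypothetical, and the state names follow the open, resolved, regressed lifecycle used in this guide.

```python
class TrackedIssue:
    """A failure class with state and ownership, unlike a log row."""

    def __init__(self, fingerprint: str):
        self.fingerprint = fingerprint
        self.status = "open"
        self.count = 0
        self.assignee = None

    def assign(self, engineer: str) -> None:
        self.assignee = engineer

    def resolve(self) -> None:
        self.status = "resolved"

    def record_occurrence(self) -> None:
        """Count an occurrence; a resolved issue that fires again has regressed."""
        self.count += 1
        if self.status == "resolved":
            self.status = "regressed"


issue = TrackedIssue("CardDeclinedError@charge")
issue.record_occurrence()
issue.assign("alex")
issue.resolve()
status_after_fix = issue.status
issue.record_occurrence()      # the same failure appears after the fix shipped
status_after_recurrence = issue.status
```

A log store has no place to keep `status` or `assignee`, which is why the same failure recurring after a fix looks like one more row there and like a regression here.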
Where urgentry sits
urgentry's primary product surface is errors. The UI is built around the issue list: grouped failures, occurrence counts, stack traces, and the trace link from each issue to the request that produced it. This is intentional. Errors are the signal that drives triage. They are the right place to start.
At the ingest layer, urgentry accepts all three signals in the same Go binary. The Sentry SDK envelope format delivers errors directly. OTLP traces arrive at /v1/traces: urgentry extracts exception events, fingerprints them, and groups them into issues alongside SDK-delivered errors. OTLP logs arrive at /v1/logs and are available for context queries linked to the issues and traces they accompany.
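To make the extraction step concrete, here is a sketch of pulling exception events out of an OTLP/JSON trace payload. It is not urgentry's implementation (the source does not show it); it only illustrates the general idea against the standard OTLP JSON layout, and the payload values are made up for the example.

```python
def _string_attrs(key_values):
    """Flatten OTLP key/value pairs, keeping string values only."""
    return {kv["key"]: kv["value"].get("stringValue") for kv in key_values}


def extract_exceptions(payload):
    """Return one dict per span event named "exception" in an OTLP/JSON payload."""
    found = []
    for resource_spans in payload.get("resourceSpans", []):
        for scope_spans in resource_spans.get("scopeSpans", []):
            for span in scope_spans.get("spans", []):
                for event in span.get("events", []):
                    if event.get("name") != "exception":
                        continue
                    attrs = _string_attrs(event.get("attributes", []))
                    found.append({
                        "trace_id": span.get("traceId"),
                        "span_id": span.get("spanId"),
                        "type": attrs.get("exception.type"),
                        "message": attrs.get("exception.message"),
                    })
    return found


payload = {"resourceSpans": [{"scopeSpans": [{"spans": [{
    "traceId": "5b8efff798038103d269b633813fc60c",
    "spanId": "eee19b7ec3c1b174",
    "name": "payment.process",
    "events": [
        {"name": "cache.miss", "attributes": []},
        {"name": "exception", "attributes": [
            {"key": "exception.type",
             "value": {"stringValue": "DatabaseConnectionError"}},
            {"key": "exception.message",
             "value": {"stringValue": "connection pool exhausted"}},
        ]},
    ],
}]}]}]}
exceptions = extract_exceptions(payload)
```

Each extracted record keeps its trace and span IDs, which is what allows an issue to link back to the request that produced it.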
The binary at 52 MB resident handles all three signal paths without a collector sidecar, relay process, or translator layer. One endpoint, one process, one issue list.
The recommendation for teams starting with urgentry: instrument errors first. Get alerts working. Then add OTLP traces to gain the investigation layer. Then enable OTLP log forwarding for the services where logs provide context that spans do not. The signals build on each other; you do not need all three on day one.
Frequently asked questions
Are errors just logs with severity ERROR?
No. A log record with severity ERROR is a line of text with a timestamp and a level field. An error in an error tracker has a fingerprint, a deduplicated occurrence count, a lifecycle (open, resolved, regressed), and an assignment owner. The two share a name but serve different workflows. You can build error-like aggregations on top of a log store, but the grouping heuristics, fingerprinting logic, and regression detection are not present by default.
Do traces replace logs?
No. Traces record what happened on the instrumented request path. Logs record ad-hoc messages that were not anticipated at instrumentation time. A trace tells you that a database call took 800 ms. A log line tells you the query planner chose a full-table scan because an index was missing. Teams with thorough instrumentation find traces answer 80% of investigation questions. The remaining 20% requires logs.
Can one signal be enough on its own?
For simple single-process services at low volume, logs can be enough. For distributed systems, no single signal covers all three use cases: alerting on failures (errors), investigating slowness or root cause (traces), and capturing ad-hoc context (logs). The signals are complementary and build on each other in the incident response sequence.
What is the relationship between an OTel exception event and an error?
An OTel exception event is a span event named exception carrying the attributes exception.type, exception.message, and exception.stacktrace. It travels over the traces signal path as part of the span that was active when the exception occurred. An errors-first tool extracts that event, fingerprints it, and groups repeated occurrences into an issue. The exception event is how the error travels on the wire. The issue is the product abstraction built on top of it.
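The shape of that event can be sketched without any SDK. The helper below builds a plain dict with the three attributes named above from a Python exception; the function name is hypothetical, and an OTel SDK would produce the equivalent event for you when you record an exception on a span.

```python
import traceback


def exception_event(exc: BaseException) -> dict:
    """Build a dict shaped like an OTel exception span event."""
    stack = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )
    return {
        "name": "exception",
        "attributes": {
            "exception.type": type(exc).__qualname__,
            "exception.message": str(exc),
            "exception.stacktrace": stack,
        },
    }


try:
    raise ValueError("connection pool exhausted")
except ValueError as err:
    event = exception_event(err)
```

An error tracker would derive its fingerprint from fields like `exception.type` and the frames in `exception.stacktrace`, then count this event as one occurrence of the resulting issue.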
How does urgentry handle all three signals?
urgentry accepts errors via the Sentry SDK envelope format and accepts OTLP traces and logs natively at /v1/traces and /v1/logs in the same Go binary. Errors are the primary product surface. Traces link back from every issue. Logs are available for ad-hoc context queries. No separate pipeline or sidecar process is required.
Sources
- OpenTelemetry signals overview. The canonical definitions of traces, metrics, logs, and the baggage signal type from the OTel project.
- OTel semantic conventions: exceptions on spans. Defines the exception event name, exception.type, exception.message, exception.stacktrace, and exception.escaped.
- OTel log data model. The specification for log records, severity levels, and the relationship between log records and span events.
- Charity Majors, “The Many Faces of Observability”. A field perspective on when each signal type is the right tool and why structured events matter more than log volume.
- urgentry compatibility matrix. Sentry SDK versions, OTLP endpoint behavior, and signal support by urgentry release.