Guide · AI agents & MCP observability · ~10 min read · Updated May 2, 2026

Self-host Claude Code observability with OTLP.

Claude Code ships a built-in OpenTelemetry exporter. It reads the standard OTEL_EXPORTER_OTLP_* environment variables and emits spans and metrics to any OTLP endpoint. This guide covers how to point that exporter at a self-hosted urgentry instance, what data arrives, and what you can do with it: token cost per session, tool failure patterns, and latency baselines that let you catch regressions before they become problems.

TL;DR

20 seconds. Claude Code has a built-in OTLP exporter controlled by CLAUDE_CODE_ENABLE_TELEMETRY=1 and the standard OTEL_EXPORTER_OTLP_* variables. Set those variables to point at urgentry’s OTLP endpoint on port 4318, start a Claude Code session, and token metrics and tool spans appear in urgentry within seconds. No code changes, no plugins.

60 seconds. Claude Code exports three categories of data: token counts (input and output, per model call), tool call spans (name, duration, success/failure), and session-level activity metrics. These arrive over standard OTLP/HTTP and land in urgentry’s /v1/traces and /v1/metrics endpoints in the same binary that handles error tracking. You get one place to query agent behavior alongside application errors.

The reason teams self-host this rather than send the data to a SaaS observability vendor is straightforward: Claude Code’s tool calls touch your codebase, your shell, and your filesystem. The telemetry that describes those calls is metadata about your work. Self-hosting keeps that metadata on infrastructure you control, under retention policies you set, with no third-party access to the shape of your agent’s activity.

Why you’d want this

Claude Code usage is metered by the token. A multi-hour agent session can consume a large volume of input and output tokens, especially when the agent iterates on complex tasks, retries tool calls, or pulls large files into context. Without telemetry, you cannot see where those tokens go: which tasks are expensive, which tool call patterns drive cost up, or when a session is running longer than the task warrants.

The second problem is tool failure. Claude Code calls tools (bash commands, file reads, editor operations, MCP server calls), and some of those calls fail or stall. A tool that times out or retries repeatedly adds to both latency and token cost but leaves no trace unless you capture the spans. The OTLP export gives you that trace.

The third problem is privacy. Sending agent telemetry to a SaaS observability vendor means sending metadata about your coding activity to that vendor’s data plane. For teams working on proprietary codebases, unreleased features, or compliance-sensitive projects, that is a data handling decision worth making deliberately rather than by default. Self-hosting the observability backend keeps the data on your infrastructure.

urgentry is a practical fit here because it accepts OTLP/HTTP in the same binary that handles error tracking, runs at around 52 MB resident memory, and can run on a $5 VPS or a laptop. There is no separate collector, no pipeline to maintain, and no vendor to grant access.

What Claude Code exports out of the box

Claude Code’s telemetry feature is off by default. You enable it and configure the endpoint with environment variables. All of them follow either the Anthropic CLAUDE_CODE_* namespace or the standard OpenTelemetry OTEL_* namespace.

The environment variables

| Variable | Value / example | Purpose |
|---|---|---|
| CLAUDE_CODE_ENABLE_TELEMETRY | 1 | Master switch. Telemetry is off unless this is set to 1. |
| OTEL_METRICS_EXPORTER | otlp | Tells the OTel SDK to export metrics via OTLP rather than stdout or none. |
| OTEL_EXPORTER_OTLP_PROTOCOL | http/protobuf | OTLP transport format. Use http/protobuf or http/json. urgentry accepts both. |
| OTEL_EXPORTER_OTLP_ENDPOINT | http://your-vps:4318 | Base URL of the OTLP receiver. The SDK appends /v1/traces and /v1/metrics automatically. |
| OTEL_EXPORTER_OTLP_HEADERS | x-api-key=yourkey | Optional auth headers sent with every OTLP request. Use this if your urgentry instance sits behind an auth proxy. |
| OTEL_RESOURCE_ATTRIBUTES | service.name=claude-code,team=infra | Resource attributes attached to every span and metric. Use service.name to distinguish sessions or team members. |

Endpoint vs. signal-specific endpoint

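Set OTEL_EXPORTER_OTLP_ENDPOINT to the base URL only: for example http://your-vps:4318, not http://your-vps:4318/v1/traces. The SDK appends the signal path automatically, so a value that already ends in /v1/traces produces requests to /v1/traces/v1/traces, which returns 404. The OpenTelemetry specification also defines per-signal variables (OTEL_EXPORTER_OTLP_TRACES_ENDPOINT and OTEL_EXPORTER_OTLP_METRICS_ENDPOINT) whose values are used as-is, full path included; this guide uses only the base variable.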
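If endpoint values get copied between tools, a small guard can catch the doubled-path mistake before it reaches the exporter. This is a minimal sketch, not part of Claude Code or urgentry: the function name otlp_base_url is hypothetical, and it only strips a trailing slash and the two signal paths named above.

```shell
# Reduce an OTLP URL to its base form by removing a trailing slash
# and a trailing /v1/traces or /v1/metrics, if present.
otlp_base_url() {
  url="${1%/}"
  case "$url" in
    */v1/traces)  url="${url%/v1/traces}" ;;
    */v1/metrics) url="${url%/v1/metrics}" ;;
  esac
  printf '%s\n' "$url"
}

# Example with the placeholder host used in this guide:
#   export OTEL_EXPORTER_OTLP_ENDPOINT="$(otlp_base_url "http://your-vps:4318/v1/traces")"
```

A value that is already a base URL passes through unchanged, so the guard is safe to apply unconditionally.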

What arrives in urgentry

Once the variables are set and Claude Code runs a session, three categories of data reach urgentry:

  • Token metrics. Input token count and output token count per model call, in the claude.* metric namespace. These are counters that accumulate over the session. You can query them by time window to see token usage, and from it cost, per hour or per task.
  • Tool call spans. Each tool invocation (bash command, file read, editor operation, MCP server call) appears as a span with a name, start time, duration, and status. Failed tool calls carry an error status.
  • Session metrics. Session-level activity metrics that track the number of active sessions and session duration. These give you a fleet view if you run Claude Code across multiple machines or team members.

The spans land at /v1/traces. The metrics land at /v1/metrics. Both endpoints are in the same urgentry binary; no separate pipeline or collector sits between Claude Code and the backend.

Stand up urgentry on a $5 VPS

The self-hosted error monitoring on a $5 VPS guide covers the full setup. The short version for this use case:

curl -fsSL https://urgentry.com/install.sh | sh
URGENTRY_BASE_URL=https://errors.example.com ./urgentry serve --role=all

urgentry writes a SQLite database on first boot and serves the UI and all ingest endpoints from a single binary. The OTLP endpoint is available at http://your-host:4318 by default; 4318 is the conventional port for OTLP/HTTP receivers.

Put Caddy or nginx in front for TLS:

errors.example.com {
  reverse_proxy localhost:8080
}

The OTLP port (4318) does not need to be public. Claude Code can reach it over an internal network, a VPN, or SSH tunnel. Only the UI port (8080) needs to face the proxy.
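For the SSH tunnel option, a standard local port forward is enough. The sketch below assumes urgentry listens on port 4318 on the VPS and that you have SSH access; the login deploy@your-vps and both function names are hypothetical.

```shell
# Build the -L forwarding spec: local_port:localhost:remote_port.
# "localhost" is resolved on the remote side, i.e. on the VPS.
otlp_forward_spec() {
  printf '%s:localhost:%s\n' "${1:-4318}" "${2:-4318}"
}

# Open the tunnel in the foreground. -N tells ssh not to run a remote command.
# $1 is the SSH login, for example deploy@your-vps (hypothetical).
open_otlp_tunnel() {
  ssh -N -L "$(otlp_forward_spec)" "$1"
}

# Usage, in a separate terminal:
#   open_otlp_tunnel deploy@your-vps
# Then, in the shell that runs Claude Code:
#   export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
```

With the tunnel up, Claude Code talks to localhost:4318 and the OTLP port never has to be exposed on the VPS.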

For local development, urgentry runs on a laptop without modification. The OTLP endpoint on localhost:4318 is available the moment the binary starts.

Point Claude Code at urgentry

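Export these variables before starting a Claude Code session. The exact values depend on whether your urgentry instance is local or remote. The remote example assumes that https://errors.example.com reaches urgentry's OTLP listener; if your proxy forwards only the UI port (8080), use the host's address on port 4318 over your internal network, VPN, or SSH tunnel instead.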

Remote urgentry on a VPS

export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
export OTEL_EXPORTER_OTLP_ENDPOINT=https://errors.example.com
export OTEL_RESOURCE_ATTRIBUTES=service.name=claude-code,env=production

Local urgentry for development

export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_RESOURCE_ATTRIBUTES=service.name=claude-code,env=local

Add these to your shell profile (~/.zshrc or ~/.bashrc) so they persist across sessions. Alternatively, place them in a project-level .env file and source it before running Claude Code in that project.
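The project-level approach can be wrapped in a small loader. This is a sketch under assumptions: the file name .env.telemetry and the function name load_telemetry_env are hypothetical, and the file is expected to contain plain KEY=value (or export KEY=value) lines like the ones above.

```shell
# Source a file of KEY=value lines and export every variable it sets.
load_telemetry_env() {
  env_file="${1:-.env.telemetry}"          # hypothetical default file name
  # The "." builtin searches PATH for names without a slash, so anchor
  # bare file names to the current directory.
  case "$env_file" in
    */*) ;;
    *)   env_file="./$env_file" ;;
  esac
  [ -f "$env_file" ] || { echo "missing $env_file" >&2; return 1; }
  set -a                                   # auto-export while sourcing
  . "$env_file"
  set +a
}

# Usage, assuming the Claude Code CLI is on PATH as "claude":
#   load_telemetry_env && claude
```

Because the loader only exports into the current shell, telemetry stays scoped to sessions started from that shell rather than every terminal you open.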

Verify in 30 seconds

Start Claude Code and trigger a single tool call. Ask it to list the files in the current directory or read a file. That generates at least one tool span and one token metric. Open urgentry and check the traces view. The span from the tool call appears within a few seconds of the tool completing.

If no data arrives, check two things: that the OTLP endpoint URL does not include the signal path, and that the port is reachable from the machine running Claude Code. A quick curl -v -H 'Content-Type: application/json' -d '{}' http://your-host:4318/v1/traces sends a POST with an empty JSON body and tells you whether the endpoint is up.
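The same check can be scripted so it prints a diagnosis instead of raw curl output. This is a minimal sketch: both function names are hypothetical, and the only status codes it interprets are ones this guide or curl itself defines (404 for a wrong path, and 000, which curl reports when it gets no HTTP response at all). What urgentry returns for an empty body is not assumed.

```shell
# Map the HTTP status from a probe to a short diagnosis.
diagnose_otlp_status() {
  case "$1" in
    2??) echo "endpoint is up" ;;
    404) echo "path not found: check for a duplicated /v1/traces in the URL" ;;
    000) echo "no response: host or port unreachable" ;;
    *)   echo "reachable, but returned HTTP $1" ;;
  esac
}

# POST an empty JSON body to the traces path and print only the status code.
probe_otlp() {
  base="${1:-http://localhost:4318}"
  curl -sS -o /dev/null -w '%{http_code}' \
    -H 'Content-Type: application/json' -d '{}' \
    "$base/v1/traces"
}

# Usage:
#   diagnose_otlp_status "$(probe_otlp http://your-host:4318)"
```

Any status other than 000 means something is listening on the port, which narrows the problem to the URL or the request rather than the network.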

What you see

Once data flows, urgentry gives you four views that matter for agent observability.

Token cost per session

The input and output token counters accumulate over a session. Query them by session ID or by time window to see which tasks burned the most tokens. A task that required many retries or pulled large files into context shows up as a spike in input tokens. A task that generated a lot of output (long code rewrites, large commits) shows up in output tokens. The ratio of input to output is also a signal: a high input-to-output ratio suggests the agent spent time reading context; a high output-to-input ratio suggests heavy generation.
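As a worked example of the ratio, the sketch below divides a session's input token total by its output token total. The function name token_ratio and the sample numbers are illustrative, not values from a real session.

```shell
# Print the input-to-output token ratio with one decimal place,
# or "n/a" when there were no output tokens.
token_ratio() {
  awk -v in_tok="$1" -v out_tok="$2" 'BEGIN {
    if (out_tok == 0) { print "n/a"; exit }
    printf "%.1f\n", in_tok / out_tok
  }'
}

# Example: 120000 input tokens and 4000 output tokens
#   token_ratio 120000 4000
```

A session with 120,000 input tokens and 4,000 output tokens has a ratio of 30.0, which points at context reading rather than generation.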

Tool call frequency and patterns

Each tool call appears as a named span. You can see which tools Claude Code called most often in a session, in what order, and how long each took. A session that called the bash tool 40 times to run tests shows up distinctly from a session that read 40 files. Those patterns tell you something about how the agent approached the task.
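To make the frequency view concrete, here is a sketch that counts calls per tool. It assumes you have the spans as plain text, one per line, in a hypothetical three-column format (tool name, duration in milliseconds, status); that format is an illustration, not an urgentry export format.

```shell
# Count spans per tool name (column 1) and list the most-called tools first.
# Input on stdin: "<tool> <duration_ms> <status>", one span per line.
tool_frequency() {
  awk '{ count[$1]++ } END { for (t in count) print count[t], t }' |
    sort -k1,1nr -k2,2
}

# Usage with a hypothetical spans.txt:
#   tool_frequency < spans.txt
```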

Which tools are slow

Tool call duration is in every span. Sort spans by duration to find the tools that take longest. Long bash tool durations often mean the agent is waiting on test runs or build steps. Long file read durations on large files suggest the context window is filling with content that may not be relevant. Both are candidates for optimization: faster tests, more focused file reads, or task decomposition.
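Sorting by duration is a one-liner once the spans are in text form. The sketch assumes a hypothetical line format of tool name, duration in milliseconds, and status; it is not an urgentry export format, and the function name is made up for illustration.

```shell
# Print the N slowest spans (default 5), longest first.
# Input on stdin: "<tool> <duration_ms> <status>", one span per line.
slowest_spans() {
  sort -k2,2nr | head -n "${1:-5}"
}

# Usage with a hypothetical spans.txt:
#   slowest_spans 10 < spans.txt
```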

Error and retry patterns

Failed tool calls carry an error status in the span. A sequence of failed bash spans followed by a successful one is a retry pattern. A sequence of failed bash spans with no success is a stuck agent. Both are worth alerting on. The spans give you the raw material to detect these patterns without parsing Claude Code’s terminal output.
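The retry and stuck patterns can be told apart mechanically. The sketch below reads the statuses of one tool's spans in time order, one per line, using the hypothetical labels ok and error. The function name and the minimum run length of 2 are assumptions for illustration.

```shell
# Classify a time-ordered status sequence read from stdin:
#   stuck: the sequence ends in a run of at least MIN errors (default 2)
#   retry: at least one error run was followed by a success
#   clean: neither of the above
classify_failures() {
  awk -v min="${1:-2}" '
    $1 == "error" { run++; next }
    $1 == "ok"    { if (run > 0) recovered = 1; run = 0 }
    END {
      if (run >= min)     print "stuck"
      else if (recovered) print "retry"
      else                print "clean"
    }'
}

# Usage:
#   printf "error\nerror\nok\n" | classify_failures
```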

Wire alerts on degradation

Metrics without alerts are dashboards you check after the fact. The value of having token and tool data in a real backend is that you can alert on thresholds before a session runs away.

Token cost per task exceeds threshold

Set an alert on cumulative input token count per session. A session that exceeds a threshold is either working on a legitimately large task or stuck in a loop. Either way, you want to know. The threshold depends on your typical task size; start at 2x your median session token count and tune from there.
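The starting threshold is simple arithmetic over your past sessions. This sketch assumes you have one input token total per session, one number per line; the function name and sample numbers are illustrative.

```shell
# Read per-session token totals on stdin and print FACTOR times the median
# (FACTOR defaults to 2).
token_threshold() {
  sort -n | awk -v factor="${1:-2}" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      if (NR % 2) median = v[(NR + 1) / 2]
      else        median = (v[NR / 2] + v[NR / 2 + 1]) / 2
      printf "%d\n", median * factor
    }'
}

# Usage: three past sessions with 100, 300, and 200 thousand tokens
#   printf "100000\n300000\n200000\n" | token_threshold
```

With sessions of 100,000, 300,000, and 200,000 tokens the median is 200,000, so the starting threshold is 400,000.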

In urgentry’s alert configuration, target the claude.input_tokens counter grouped by session ID. Alert when the count for a single session exceeds your threshold within a rolling window.

Consecutive tool failures

A single tool failure is normal. Three consecutive failures in the same session suggest the agent is stuck or the environment is broken. Alert on the count of error-status tool spans within a short session window as a proxy for consecutive failures. This fires before you would otherwise notice from the terminal output.
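If you want the strict "consecutive" condition rather than a count, the logic is a per-session run counter that resets on any success. The sketch assumes time-ordered lines of session ID and status (ok or error); that format and the function name are hypothetical.

```shell
# Print each session ID once, the first time it reaches LIMIT consecutive
# error spans (default 3). Any non-error span resets that session's run.
# Input on stdin: "<session_id> <status>", in time order.
sessions_with_failure_run() {
  awk -v limit="${1:-3}" '
    $2 == "error" {
      run[$1]++
      if (run[$1] == limit && !seen[$1]++) print $1
      next
    }
    { run[$1] = 0 }'
}

# Usage with a hypothetical status stream:
#   sessions_with_failure_run 3 < statuses.txt
```

Tracking runs per session keeps interleaved sessions from masking each other, which matters once several people send telemetry to the same backend.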

Session length spike

Alert on session duration when it exceeds your expected range. A task that should take five minutes of tool calls and stretches to thirty minutes is either stuck or has taken on more scope than intended. Session duration from the session metrics gives you that signal.
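Where the session metric is not convenient to query, duration can be approximated from span timestamps instead. That is a substitute technique, not the session metric itself. The sketch assumes time-ordered lines of session ID and epoch seconds; the format and function name are hypothetical.

```shell
# Print sessions whose first-to-last span gap exceeds LIMIT seconds
# (default 1800, i.e. thirty minutes), with the measured gap.
# Input on stdin: "<session_id> <epoch_seconds>", in time order.
long_sessions() {
  awk -v limit="${1:-1800}" '
    !($1 in first) { first[$1] = $2 }
    { last[$1] = $2 }
    END {
      for (s in first)
        if (last[s] - first[s] > limit) print s, last[s] - first[s]
    }' | sort
}

# Usage with a hypothetical timestamps.txt:
#   long_sessions 1800 < timestamps.txt
```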

The privacy angle

Teams self-host this for a reason that goes beyond the technical setup.

Claude Code’s tool calls operate on your codebase. It reads files, runs commands, writes code, and calls MCP servers. The telemetry that describes those tool calls is metadata about your work: which files were read, which commands ran, how long each step took, where errors occurred. That metadata has shape. Someone looking at the tool call pattern for a session can infer what the agent was working on.

Sending that telemetry to a SaaS observability vendor means sending it to their data plane, under their data retention policies, accessible to their staff and processes. For a personal project, that is probably fine. For a team working on unreleased features, proprietary algorithms, or data in a regulated domain, it is a compliance question worth answering deliberately.

Self-hosting urgentry means the telemetry stays on infrastructure you own. The data retention policy is the one you set. The access controls are the ones you configure. No third party has access to the shape of your agent’s activity.

urgentry is licensed under FSL-1.1-Apache-2.0. The source is available, so you can read what the binary does with the data it receives. That kind of inspection is not possible with a closed-source SaaS backend.

What this doesn’t cover yet

Claude Code’s OTLP export covers the metrics and spans described above. It does not yet export reasoning chain spans: the internal steps the model takes to decide which tool to call, how to decompose a task, or why it chose a particular approach. Those would require Anthropic to instrument and export the model’s internal decision process, which is a harder problem and a separate product question.

What this means in practice: you can observe the agent’s external behavior (tool calls, token counts, session timing) but not its internal reasoning. For cost tracking, failure detection, and performance alerting, the external behavior data is the right layer to instrument. For debugging why the agent made a particular sequence of decisions, you still need to read the terminal output or the conversation transcript.

The OTLP spec is the contract here. Anthropic ships what it ships; the data arrives in standard OTLP format; urgentry receives it. When Anthropic adds new spans or metrics to the Claude Code exporter, they will appear in urgentry automatically, because OTLP is a standard wire format with no vendor-specific receiver changes required.

Frequently asked questions

Does Claude Code’s OTLP export work without any code changes?

Yes. Claude Code reads the standard OTEL_EXPORTER_OTLP_* environment variables. Set them before launching Claude Code and it begins emitting spans and metrics to your endpoint. No plugins, no wrappers, no SDK changes are required.

Which OTLP transport does Claude Code use?

Claude Code supports OTLP/HTTP. Set OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf (the default for OTLP/HTTP) or http/json. urgentry accepts both. Point OTEL_EXPORTER_OTLP_ENDPOINT at your urgentry host’s base URL on port 4318.

What metrics does Claude Code export?

Input token counts, output token counts, tool call counts and durations, model latency, and session activity metrics. Spans cover individual tool calls with timing. The metric names follow the claude.* namespace defined by Anthropic.

Do my prompts appear in the telemetry data?

No. Claude Code’s telemetry exports token counts, latency, and structural metadata about tool calls. It does not export prompt text or model output content. Your actual task descriptions and code changes remain local.

Can I run urgentry locally for development and point Claude Code at it?

Yes. urgentry runs on a laptop without any external dependencies. Start it with a local data directory, set OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318, and you have a local observability loop for development sessions. The binary stays resident at around 52 MB.

Sources and further reading

  1. Anthropic Claude Code monitoring documentation — the canonical reference for CLAUDE_CODE_ENABLE_TELEMETRY, the OTEL environment variables Claude Code reads, and the metric and span definitions.
  2. OpenTelemetry metrics data model specification — defines the counter, gauge, and histogram types that Claude Code uses to export token and session metrics.
  3. OpenTelemetry Protocol (OTLP) specification — the wire format spec for /v1/traces and /v1/metrics endpoints, endpoint URL conventions, and content types.
  4. Functional Source License 1.1 (FSL-1.1-Apache-2.0) — the license under which urgentry is distributed. Grants use rights; converts to Apache 2.0 after two years.
  5. urgentry compatibility audit — the published SDK and protocol compatibility matrix, including OTLP/HTTP ingest coverage.
  6. OpenTelemetry Collector documentation — reference for teams that want a Collector in front of urgentry for batching, filtering, or fan-out to multiple backends.

One binary. Local or remote. Full agent telemetry.

urgentry accepts OTLP/HTTP at /v1/traces and /v1/metrics in the same binary that handles error tracking. Set the environment variables shown above before your next Claude Code session and token costs, tool spans, and session timing land in your own backend.