MCP server observability: latency, errors, tool usage.
An MCP server sits between an LLM agent and the tools that agent calls. When a tool runs slow, fails, or receives malformed arguments from the model, the agent stalls or degrades silently. Neither the Python nor TypeScript MCP SDK ships OpenTelemetry instrumentation. This guide shows how to add it yourself: how to wrap tool handlers with spans, capture tool call latency and errors, track argument validity without leaking PII, measure resource cache hit rates, and point the OTLP exporter at urgentry.
20 seconds. MCP servers expose tools and resources to agent hosts like Claude Desktop, Cursor, and Claude Code. The SDKs for Python and TypeScript do not emit OpenTelemetry spans automatically. You add spans by wrapping each tool handler with a tracer call: start a span when the handler begins, set attributes for the tool name, and record any exceptions before ending the span. The same OTLP exporter setup as any other Python or Node service ships the spans to urgentry on port 4318.
60 seconds. Four questions drive MCP server observability: which tools fire (usage distribution), how long they take (latency, because the LLM waits), how often they fail (error rate), and what argument shapes the agent sends (schema conformance). A fifth signal applies when your server exposes resources: the rate at which the agent re-reads the same resource, which indicates a caching or prompting problem upstream. All five land in OpenTelemetry spans and arrive in urgentry as queryable trace data.
The instrumentation gap nobody talks about: the MCP server sees the tool call but not the prompt that produced it. You can measure every tool invocation with microsecond precision, but the reasoning chain that led the LLM to call that tool with those arguments is invisible at the server layer. The last section of this guide covers that gap and why it has to be solved at the agent host, not at the server.
What an MCP server actually does
The Model Context Protocol defines a standard interface for LLMs to call external functions and read external data. An MCP server exposes two kinds of capabilities: tools and resources.
Tools are functions the agent can invoke. A tool has a name, a JSON Schema describing its arguments, and a handler that runs when the agent calls it. The handler can do anything: query a database, call an API, run a shell command, read a file. The agent sends a tool call request; the server runs the handler and returns the result.
Resources are data sources the agent can read. A resource has a URI and optional metadata. The agent reads a resource when it needs context it does not already hold: a file, a documentation page, a configuration object. Resources are read-only by definition in the protocol.
MCP servers connect to an agent host over one of two transports: stdio (the server runs as a child process of the agent host, communicating over standard input and output) or HTTP (the server runs as a standalone HTTP service, and the agent host sends JSON-RPC requests over HTTP). The transport affects how the server process runs, but it does not affect how you instrument it.
The agent host handles transport-level communication. Your instrumentation lives inside the tool handlers and resource handlers, where the actual work happens.
The four observability questions for an MCP server
Before writing any code, identify what you need to measure. For an MCP server, four questions cover the actionable surface.
1. Which tools fire and how often? A production MCP server often exposes ten or twenty tools, but agents tend to use a small subset repeatedly. Knowing the usage distribution tells you which tools deserve the most attention for performance and reliability work. A tool the agent calls once per session warrants less attention than one it calls thirty times.
2. How long do tool calls take? The LLM waits for tool results before generating its next response. A tool that adds 800 ms to every call adds 800 ms of perceived latency to every agent turn that uses it. Tool latency is not buffered or hidden from the user the way background service latency often is. Slow tools degrade the user experience directly.
3. How often do tool calls fail? A tool handler that raises an exception returns an error to the agent. The agent may retry, add the error to its context (which costs tokens), or try a different approach. If the agent retries each failure once, a tool with a 10% error rate adds roughly one extra tool call for every ten, plus the token cost of carrying the error in context. You want to know about this before users complain.
4. What argument shapes does the agent send? LLMs produce tool call arguments from natural language. The arguments are JSON, and they often deviate from the tool’s declared schema: missing required fields, wrong types, extra fields the schema does not define. These mismatches cause validation errors inside the handler. Tracking schema mismatch rate by tool tells you which tools the model struggles to call correctly, which is a signal to improve the tool’s description or simplify its schema.
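As a rough illustration of how three of these questions reduce to simple aggregations, the sketch below computes call counts, error rate, and schema mismatch rate per tool from a list of span-like records. The record layout (`tool`, `error`, `schema_valid`) is an assumption made for this example, not a format that urgentry or OpenTelemetry defines. Latency is left out to keep the sketch short.

```python
from collections import Counter

# Each record stands in for one exported tool-call span (sample data).
spans = [
    {"tool": "search_documents", "error": False, "schema_valid": True},
    {"tool": "search_documents", "error": False, "schema_valid": True},
    {"tool": "search_documents", "error": True, "schema_valid": False},
    {"tool": "run_query", "error": False, "schema_valid": True},
]


def summarize(records: list[dict]) -> dict[str, dict]:
    """Compute call count, error rate, and schema mismatch rate per tool."""
    calls = Counter(r["tool"] for r in records)
    errors = Counter(r["tool"] for r in records if r["error"])
    mismatches = Counter(r["tool"] for r in records if not r["schema_valid"])
    return {
        tool: {
            "calls": count,
            "error_rate": errors[tool] / count,
            "mismatch_rate": mismatches[tool] / count,
        }
        for tool, count in calls.items()
    }


summary = summarize(spans)
for tool, stats in summary.items():
    print(tool, stats)
```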
Instrument a Python MCP server with OpenTelemetry
Python MCP servers built with the mcp SDK’s FastMCP class register tools with the @mcp.tool() decorator. Wrap the body of each decorated function in a span to capture timing, tool name, and any exceptions.
Install the required packages:
pip install opentelemetry-api opentelemetry-sdk \
    opentelemetry-exporter-otlp-proto-http
Initialize the OTel SDK once at startup, then use the tracer inside your tool handlers:
import os

from mcp.server.fastmcp import FastMCP
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.trace import StatusCode

# Configure the resource so every span carries the server name.
resource = Resource.create({
    "service.name": "my-mcp-server",
    "mcp.server.name": "my-mcp-server",
})

# An endpoint passed explicitly is used as given, so append the traces path here.
exporter = OTLPSpanExporter(
    endpoint=os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] + "/v1/traces",
)
provider = TracerProvider(resource=resource)
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("mcp-server")

# Create the MCP server. FastMCP is the SDK class that provides the tool() decorator.
mcp_server = FastMCP("my-mcp-server")

# _do_search and _execute_query are placeholders for your own application code.


@mcp_server.tool()
async def search_documents(query: str, limit: int = 10) -> list[dict]:
    """Search documents by query string."""
    with tracer.start_as_current_span("mcp.tool.search_documents") as span:
        span.set_attribute("mcp.tool.name", "search_documents")
        # Capture argument shape without leaking the query value.
        span.set_attribute("mcp.tool.args.query.length", len(query))
        span.set_attribute("mcp.tool.args.limit", limit)
        try:
            results = await _do_search(query, limit)
            span.set_attribute("mcp.tool.result.count", len(results))
            return results
        except Exception as exc:
            span.record_exception(exc)
            span.set_status(StatusCode.ERROR, str(exc))
            raise


@mcp_server.tool()
async def run_query(sql: str) -> list[dict]:
    """Run a read-only SQL query against the data warehouse."""
    with tracer.start_as_current_span("mcp.tool.run_query") as span:
        span.set_attribute("mcp.tool.name", "run_query")
        span.set_attribute("mcp.tool.args.sql.length", len(sql))
        try:
            rows = await _execute_query(sql)
            span.set_attribute("mcp.tool.result.row_count", len(rows))
            return rows
        except Exception as exc:
            span.record_exception(exc)
            span.set_status(StatusCode.ERROR, str(exc))
            raise
Each tool call generates one span. The span name encodes the tool name so you can filter by span name in urgentry without parsing attributes. The mcp.tool.name attribute duplicates the name for programmatic filtering. The mcp.server.name resource attribute tags every span from this server, which matters when you run multiple MCP servers and need to separate their telemetry.
Instrument a TypeScript MCP server
TypeScript MCP servers built with the @modelcontextprotocol/sdk package register tools with server.tool(). Wrap the handler callback with a span using the OpenTelemetry JavaScript SDK.
Install the required packages:
npm install @opentelemetry/api @opentelemetry/sdk-node \
    @opentelemetry/exporter-trace-otlp-http \
    @opentelemetry/resources @opentelemetry/semantic-conventions
Initialize the SDK in a separate file that you import before anything else:
// instrumentation.ts: import this before any other module
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import { Resource } from "@opentelemetry/resources";
import { ATTR_SERVICE_NAME } from "@opentelemetry/semantic-conventions";

const sdk = new NodeSDK({
  // On the 2.x line of @opentelemetry/resources, build this with
  // resourceFromAttributes({...}) instead of new Resource({...}).
  resource: new Resource({
    [ATTR_SERVICE_NAME]: "my-mcp-server",
    "mcp.server.name": "my-mcp-server",
  }),
  // A url passed explicitly is used as given, so append the traces path here.
  traceExporter: new OTLPTraceExporter({
    url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT + "/v1/traces",
  }),
});

sdk.start();
Then wrap each tool handler in your server file:
import "./instrumentation";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { trace, SpanStatusCode } from "@opentelemetry/api";
// The tool() argument schemas are zod shapes; install zod if your project lacks it.
import { z } from "zod";

const tracer = trace.getTracer("mcp-server");

// server.tool() lives on the high-level McpServer class, not the low-level Server.
const server = new McpServer({ name: "my-mcp-server", version: "1.0.0" });

// doSearch and tracker are placeholders for your own application code.

server.tool(
  "search_documents",
  "Search documents by query string.",
  { query: z.string(), limit: z.number().default(10) },
  async ({ query, limit }) => {
    return tracer.startActiveSpan("mcp.tool.search_documents", async (span) => {
      span.setAttribute("mcp.tool.name", "search_documents");
      span.setAttribute("mcp.tool.args.query.length", query.length);
      span.setAttribute("mcp.tool.args.limit", limit);
      try {
        const results = await doSearch(query, limit);
        span.setAttribute("mcp.tool.result.count", results.length);
        span.end();
        // "as const" keeps the literal type the tool result expects.
        return { content: [{ type: "text" as const, text: JSON.stringify(results) }] };
      } catch (err) {
        span.recordException(err as Error);
        span.setStatus({ code: SpanStatusCode.ERROR, message: String(err) });
        span.end();
        throw err;
      }
    });
  }
);

server.tool(
  "create_issue",
  "Create a new issue in the project tracker.",
  {
    title: z.string(),
    body: z.string(),
    priority: z.number(),
  },
  async ({ title, body, priority }) => {
    return tracer.startActiveSpan("mcp.tool.create_issue", async (span) => {
      span.setAttribute("mcp.tool.name", "create_issue");
      span.setAttribute("mcp.tool.args.title.length", title.length);
      span.setAttribute("mcp.tool.args.body.length", body.length);
      span.setAttribute("mcp.tool.args.priority", priority);
      try {
        const issue = await tracker.create({ title, body, priority });
        span.setAttribute("mcp.tool.result.issue_id", issue.id);
        span.end();
        return { content: [{ type: "text" as const, text: issue.url }] };
      } catch (err) {
        span.recordException(err as Error);
        span.setStatus({ code: SpanStatusCode.ERROR, message: String(err) });
        span.end();
        throw err;
      }
    });
  }
);

const transport = new StdioServerTransport();
await server.connect(transport);
The pattern matches the Python version: one span per tool call, span name encodes the tool name, attributes capture shape rather than content, exceptions go to span.recordException. The important detail in the TypeScript version is calling span.end() in both the success and the error path. startActiveSpan never ends the span for you, whether the callback returns or throws; if you skip span.end(), the span stays open and is never exported.
Capturing tool argument shape without leaking PII
Tool call arguments often contain user data. A search tool receives the user’s search query. A create-issue tool receives the title and body the user typed. A send-email tool receives the recipient address and message content. Logging these values as span attributes sends user data to your observability backend, which creates a compliance problem and a privacy risk.
The useful signal is not the value but the shape: did the argument conform to the schema the tool declared? Was the required field present? Was the type correct?
Three approaches work in practice:
Capture length, not content. For string arguments, record the character count. For array arguments, record the element count. For numeric arguments, record the value directly (numbers rarely carry PII). Length tells you whether the agent is sending well-formed arguments without exposing the content.
span.set_attribute("mcp.tool.args.query.length", len(args["query"]))
span.set_attribute("mcp.tool.args.filters.count", len(args.get("filters", [])))
Capture schema validation result. Validate the incoming arguments against the tool’s JSON Schema and record whether validation passed. If it failed, record which field failed and why, but not the field value. This gives you the mismatch rate signal without exposing content.
import jsonschema

SEARCH_SCHEMA = {
    "type": "object",
    "required": ["query"],
    "properties": {
        "query": {"type": "string"},
        "limit": {"type": "integer", "minimum": 1, "maximum": 100},
    },
}


def validate_args(schema: dict, args: dict, span) -> bool:
    try:
        jsonschema.validate(args, schema)
        span.set_attribute("mcp.tool.args.schema_valid", True)
        return True
    except jsonschema.ValidationError as exc:
        span.set_attribute("mcp.tool.args.schema_valid", False)
        # Record where the failure was and which schema keyword rejected it.
        path = "/".join(str(part) for part in exc.absolute_path)
        span.set_attribute("mcp.tool.args.schema_error_path", path)
        span.set_attribute("mcp.tool.args.schema_error_keyword", str(exc.validator))
        # exc.message can quote the offending value (for example on a type
        # error), so keep it only for "required", where it names the field.
        if exc.validator == "required":
            span.set_attribute("mcp.tool.args.schema_error_message", exc.message)
        return False
Hash the argument payload. Compute a hash of the serialized arguments and record the hash. This lets you detect when the agent calls the same tool with identical arguments repeatedly (a sign of a retry loop or a caching problem) without storing the argument content.
import hashlib
import json

args_hash = hashlib.sha256(
    json.dumps(args, sort_keys=True).encode()
).hexdigest()[:16]
span.set_attribute("mcp.tool.args.hash", args_hash)
Use all three together: length for size signals, schema validation for correctness signals, and hash for repetition detection. None of them store values.
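One way to combine the length and hash approaches is a single helper that turns an argument object into shape-only attributes. This is a sketch under assumptions: `safe_arg_attributes` and its attribute naming are made up for illustration, and the sample arguments are invented.

```python
import hashlib
import json


def safe_arg_attributes(args: dict, prefix: str = "mcp.tool.args") -> dict:
    """Describe arguments by shape only: lengths, counts, numbers, and a hash."""
    attrs: dict = {}
    for key, value in args.items():
        if isinstance(value, (bool, int, float)):
            # Numbers and booleans are recorded directly.
            attrs[f"{prefix}.{key}"] = value
        elif isinstance(value, str):
            attrs[f"{prefix}.{key}.length"] = len(value)
        elif isinstance(value, (list, tuple, dict)):
            attrs[f"{prefix}.{key}.count"] = len(value)
        else:
            # Anything else (None, custom objects) is reduced to its type name.
            attrs[f"{prefix}.{key}.type"] = type(value).__name__
    # Sorting keys makes the hash independent of argument order.
    canonical = json.dumps(args, sort_keys=True, default=str).encode()
    attrs[f"{prefix}.hash"] = hashlib.sha256(canonical).hexdigest()[:16]
    return attrs


sample = {"query": "alice@example.com invoices", "limit": 5, "filters": ["open", "2024"]}
attrs = safe_arg_attributes(sample)
for name, value in attrs.items():
    print(name, value)
```

Each resulting pair can be passed to span.set_attribute. If arguments are low-entropy (short IDs, small enumerations), an unkeyed hash can be reversed by guessing; switching to hmac.new with a server-side secret is one way to harden it.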
The cache-hit rate signal for resources
Resources are reads. An agent reads a resource when it needs context it does not already hold. If the agent reads the same resource URI twice in a single session, one of two things is true: either the resource changed between reads, or the agent lost track of what it already read.
The second case is a problem. An agent that re-reads the same documentation page three times per session is either poorly prompted (it is not being told to remember what it reads) or the model is not retaining context across turns effectively. Either way, each duplicate read costs time (the resource handler runs again) and tokens (the content goes back into context).
Track duplicate reads by maintaining a per-session set of resource URIs already read:
from collections import defaultdict

# Map from session_id to set of URIs read in that session.
_session_resource_reads: dict[str, set[str]] = defaultdict(set)


@mcp_server.resource("docs://{page}")
async def read_docs_page(page: str) -> str:
    """Read a documentation page."""
    uri = f"docs://{page}"
    # The handler's parameters mirror the URI template, so the session ID comes
    # from a helper rather than an extra argument. _current_session_id() is a
    # placeholder: return whatever identifier your server associates with the
    # calling connection, or None if it has none.
    session_id = _current_session_id()
    with tracer.start_as_current_span("mcp.resource.read") as span:
        span.set_attribute("mcp.resource.uri", uri)
        span.set_attribute("mcp.resource.name", "docs")
        if session_id:
            already_read = uri in _session_resource_reads[session_id]
            span.set_attribute("mcp.resource.cache_hit", already_read)
            _session_resource_reads[session_id].add(uri)
        try:
            content = await _fetch_docs_page(page)
            span.set_attribute("mcp.resource.content.length", len(content))
            return content
        except Exception as exc:
            span.record_exception(exc)
            span.set_status(StatusCode.ERROR, str(exc))
            raise
The mcp.resource.cache_hit attribute on every resource span lets you compute the duplicate-read rate in urgentry: divide the count of spans where the attribute is true by the total resource read span count. A rate above 20% suggests a caching or prompting problem worth investigating.
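The arithmetic is small enough to sketch directly. The list of flags below stands in for the mcp.resource.cache_hit values on a window of resource spans; the sample data and the `duplicate_read_rate` helper are assumptions for illustration.

```python
def duplicate_read_rate(cache_hit_flags: list[bool]) -> float:
    """Fraction of resource-read spans flagged as repeat reads."""
    if not cache_hit_flags:
        return 0.0
    return sum(cache_hit_flags) / len(cache_hit_flags)


# Ten resource reads in the window, three of them repeats (sample data).
flags = [False, False, True, False, True, False, False, True, False, False]
rate = duplicate_read_rate(flags)
needs_attention = rate > 0.20  # the 20% guideline from the text above
print(f"duplicate-read rate: {rate:.0%}, investigate: {needs_attention}")
```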
Note the session ID dependency. MCP servers communicate over stdio or HTTP but the protocol does not mandate a session concept the server can observe directly. You may need to add a session identifier to your tool and resource calls as a convention, or derive it from connection identity. The specific approach depends on how your MCP server manages connections.
Point OTLP at urgentry
urgentry accepts OTLP/HTTP at port 4318, the same port as any standard OTLP receiver. Set these environment variables before starting your MCP server process:
export OTEL_EXPORTER_OTLP_ENDPOINT=https://your-urgentry-host
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
export OTEL_SERVICE_NAME=my-mcp-server
export OTEL_RESOURCE_ATTRIBUTES=mcp.server.name=my-mcp-server
The mcp.server.name resource attribute is the key tag. When you run multiple MCP servers (a common setup: one server for code search, one for issue tracking, one for documentation), this attribute separates their spans in urgentry’s traces view so you can filter to a single server.
For local development:
curl -fsSL https://urgentry.com/install.sh | sh
./urgentry serve --role=all
# OTLP now available at http://localhost:4318
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_SERVICE_NAME=my-mcp-server-dev
If your MCP server runs as a stdio child process (the agent host forks it), set the environment variables in the shell profile or the agent host’s environment configuration. For Claude Desktop, this means setting them in the MCP server entry in claude_desktop_config.json under the env key.
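As a sketch of what that entry can look like, the snippet below builds the structure as a Python dict and prints it as JSON to paste into claude_desktop_config.json. The `command` and `args` values are placeholders that depend on how you launch your server; only the `env` block is the point here.

```python
import json

# Hypothetical server entry; adjust "command" and "args" to your own launch command.
config = {
    "mcpServers": {
        "my-mcp-server": {
            "command": "python",
            "args": ["/path/to/server.py"],
            "env": {
                "OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4318",
                "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
                "OTEL_SERVICE_NAME": "my-mcp-server",
                "OTEL_RESOURCE_ATTRIBUTES": "mcp.server.name=my-mcp-server",
            },
        }
    }
}

rendered = json.dumps(config, indent=2)
print(rendered)
```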
Set OTEL_EXPORTER_OTLP_ENDPOINT to the base URL without a signal path. Both the example code in this guide and the SDK’s own handling of this variable append /v1/traces to it, so a value that already ends in /v1/traces produces a doubled path and a 404.
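If you build the exporter URL yourself, a small guard makes the doubled path impossible. This is a minimal sketch; `traces_url` is a hypothetical helper, not part of the OpenTelemetry SDK.

```python
def traces_url(base_endpoint: str) -> str:
    """Build the OTLP/HTTP traces URL, tolerating a value that already has the path."""
    trimmed = base_endpoint.rstrip("/")
    if trimmed.endswith("/v1/traces"):
        return trimmed
    return trimmed + "/v1/traces"


for candidate in (
    "http://localhost:4318",
    "http://localhost:4318/",
    "http://localhost:4318/v1/traces",
):
    print(candidate, "->", traces_url(candidate))
```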
What to alert on
Four alert conditions cover the majority of MCP server failures worth knowing about before users notice.
1. Tool error rate above 5%. Compute the ratio of tool call spans with status ERROR to total tool call spans over a rolling five-minute window. Alert when any single tool exceeds 5%. Break down by mcp.tool.name so the alert fires on the specific tool that is failing, not on the aggregate rate. The breakdown matters because a low-volume tool failing 100% of the time can leave the aggregate at 3%, below the threshold, while the per-tool alert fires immediately.
2. p95 tool latency over threshold. Set the threshold at 2x your baseline p50 for each tool, measured over the first week of production data. Each tool has a different latency profile: a filesystem read tool might baseline at 5 ms, while a web search tool might baseline at 600 ms. A single threshold across all tools produces false positives for slow tools and misses regressions in fast tools. Per-tool thresholds are worth the configuration effort.
3. Tool call volume drop. Alert when the call volume for a tool drops to zero for more than fifteen minutes during an active session window. A tool that the agent used to call regularly but has stopped calling may have broken silently: the handler raised an exception and the agent gave up retrying, or the tool’s description changed and the model no longer selects it. A sudden volume drop is harder to notice than an error spike, and more likely to go unreported by users.
4. Argument schema mismatch rate. Alert when the rate of spans where mcp.tool.args.schema_valid is false exceeds 10% for any tool. A high mismatch rate means the model is struggling to generate correct arguments for that tool. The fix is usually to simplify the schema, improve the tool description, or add examples. An alert on this catches the regression before it compounds into error rates.
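The per-tool versus aggregate point in alert 1 is easy to see with numbers. The sketch below evaluates the error-rate and schema-mismatch thresholds per tool over one window of span-like records. The record layout and the `breaching_tools` helper are assumptions for illustration; in practice the grouping would be a query in your backend.

```python
from collections import defaultdict

ERROR_RATE_LIMIT = 0.05     # threshold from alert 1
MISMATCH_RATE_LIMIT = 0.10  # threshold from alert 4


def breaching_tools(spans: list[dict], flag: str, limit: float) -> dict[str, float]:
    """Return {tool: rate} for tools whose share of `flag` spans exceeds `limit`."""
    totals: dict[str, int] = defaultdict(int)
    flagged: dict[str, int] = defaultdict(int)
    for span in spans:
        totals[span["tool"]] += 1
        if span[flag]:
            flagged[span["tool"]] += 1
    return {
        tool: flagged[tool] / count
        for tool, count in totals.items()
        if flagged[tool] / count > limit
    }


# Sample window: one healthy high-volume tool, one broken low-volume tool.
window = (
    [{"tool": "search_documents", "error": False, "schema_invalid": False}] * 97
    + [{"tool": "create_issue", "error": True, "schema_invalid": False}] * 3
)

aggregate_error_rate = sum(s["error"] for s in window) / len(window)
error_alerts = breaching_tools(window, "error", ERROR_RATE_LIMIT)
mismatch_alerts = breaching_tools(window, "schema_invalid", MISMATCH_RATE_LIMIT)
print(aggregate_error_rate, error_alerts, mismatch_alerts)
```

The aggregate stays under the 5% limit while create_issue fails every call, which is the case a per-tool breakdown is meant to catch.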
A worked dashboard
Four panels give you a useful operational view of an MCP server. Build them in urgentry’s dashboard interface by querying span attributes.
Tools called per session. Group spans by a session identifier (if you attach one to your spans) or by a time window, count spans by mcp.tool.name, and display as a bar chart. This shows which tools the agent uses most. A tool that appears rarely may be unnecessary; a tool that appears constantly is high-leverage for optimization.
Tool latency p95 by tool. Group spans by mcp.tool.name and compute the 95th percentile of span duration for each group. Display as a horizontal bar chart sorted by p95 descending. The tools at the top are the ones adding the most tail latency to agent sessions. A 2x increase in p95 for any tool since the prior day warrants investigation.
Tool error rate by tool. Group spans by mcp.tool.name, compute the ratio of ERROR-status spans to total spans for each group, display as a table sorted by error rate descending. A tool at 0% error rate needs no attention. A tool at 8% error rate needs attention today.
Resource cache hit rate. Filter to spans with the mcp.resource.cache_hit attribute, compute the percentage where the value is true. Plot over time. A stable low rate (under 15%) means the agent reads each resource once and retains the content. A rising rate means the agent is re-reading resources more frequently, which is a signal to investigate prompting or context window management at the agent host layer.
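To make the latency panel concrete, here is a small sketch of the grouping and percentile step over span-like records, plus the 2x day-over-day check. The record layout, the sample durations, and the nearest-rank percentile method are assumptions; a backend may compute percentiles differently.

```python
from collections import defaultdict


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def p95_by_tool(spans: list[dict]) -> list[tuple[str, float]]:
    """Group durations by tool and return (tool, p95) pairs, slowest first."""
    durations: dict[str, list[float]] = defaultdict(list)
    for span in spans:
        durations[span["tool"]].append(span["duration_ms"])
    ranked = {tool: p95(values) for tool, values in durations.items()}
    return sorted(ranked.items(), key=lambda item: item[1], reverse=True)


# Sample durations in milliseconds for two tools.
today = (
    [{"tool": "read_file", "duration_ms": ms} for ms in (4, 5, 5, 6, 7)]
    + [{"tool": "search_documents", "duration_ms": ms} for ms in (100, 110, 120, 130, 900)]
)
yesterday_p95 = {"read_file": 6, "search_documents": 400}

panel = p95_by_tool(today)
regressed = [tool for tool, value in panel if value >= 2 * yesterday_p95[tool]]
print(panel, regressed)
```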
The thing nobody instruments
The MCP server sees the tool call. It does not see the prompt that produced it.
When the agent calls search_documents with a particular query, that query came from the LLM’s internal reasoning about a user message. The user message, the conversation history, the system prompt, and the model’s chain-of-thought all contributed to the decision to call that tool with those arguments. None of that is visible to the MCP server. The server receives the final output of the reasoning process: a function name and a JSON argument object.
This is a fundamental visibility gap, and it cannot be closed at the MCP server layer. The server is a called service, not an observer of the agent. To see the prompt that led to a tool call, you need instrumentation at the agent host: the process running Claude, Cursor, or your custom agent framework. Some agent frameworks emit spans that carry a conversation ID or a turn ID that you can correlate with your MCP server spans if both use the same trace context propagation. The Anthropic MCP spec does not yet mandate trace context propagation through the tool call protocol.
What this means in practice: MCP server observability tells you what happened (which tool, how long, success or failure, what argument shape). Agent host observability tells you why it happened (what prompt, what conversation state, what model decision). You need both layers to have a complete picture, and they require separate instrumentation at separate points in the stack.
Frequently asked questions
Do the Python and TypeScript MCP SDKs include built-in OTel instrumentation?
No. Neither the Python MCP SDK nor the TypeScript MCP SDK ships OpenTelemetry instrumentation out of the box. You add it yourself by wrapping tool handlers with span creation code, as shown in the examples in this guide. There is ongoing community discussion about adding instrumentation to the official SDKs, but as of May 2026 no such support has shipped.
What transport should I use for an MCP server that needs observability?
Both stdio and HTTP transports work with the instrumentation patterns in this guide. The transport choice does not affect how you create or export spans. Stdio servers run as child processes; HTTP servers run as standalone services. Both export OTLP to urgentry the same way, using the standard environment variables.
Is it safe to capture tool call arguments in spans?
Capture argument structure, not values. Record the JSON Schema validation result, the count of top-level keys, string lengths, or a hash of the argument payload. Do not log raw argument values because they often contain user data, API credentials, or personally identifiable information that should not land in an observability backend.
How do I distinguish an MCP tool call span from other spans in urgentry?
Set mcp.server.name as a resource attribute and mcp.tool.name as a span attribute on every tool call span. Use span names in the mcp.tool.* namespace. These conventions let you filter urgentry’s traces view to MCP spans only, and break down by server or by tool without custom parsing.
Can I run urgentry on my development machine and point my local MCP server at it?
Yes. urgentry runs on a laptop without external dependencies. Start it with a local data directory and it listens on port 4318 for OTLP/HTTP. Set OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 in your MCP server process and tool spans appear in urgentry within seconds of each tool call.
Sources and further reading
- Anthropic MCP specification: the canonical protocol reference for tools, resources, transports, and the JSON-RPC message format MCP servers implement.
- modelcontextprotocol.io documentation: the official MCP developer documentation, including SDK guides for Python and TypeScript and the tool schema conventions.
- MCP Python SDK: the official Python SDK. The @mcp.tool() decorator and @mcp.resource() decorator are the instrumentation attachment points used in this guide.
- MCP TypeScript SDK: the official TypeScript SDK. The server.tool() handler registration is the attachment point for spans in the TypeScript examples.
- opentelemetry-python: the Python OpenTelemetry SDK documentation covering TracerProvider setup, BatchSpanProcessor, and the OTLP exporter configuration used in this guide.
- opentelemetry-js: the JavaScript/TypeScript OpenTelemetry SDK documentation covering NodeSDK setup, startActiveSpan, and the OTLP HTTP exporter.
- Functional Source License 1.1 (FSL-1.1-Apache-2.0): the license under which urgentry is distributed. Grants use rights; converts to Apache 2.0 after two years.
- urgentry compatibility matrix: the published protocol compatibility audit, including OTLP/HTTP ingest coverage at /v1/traces.
One binary. MCP tool spans, errors, and resource reads together.
urgentry accepts OTLP/HTTP at /v1/traces in the same binary that handles error tracking. Tool call exceptions become issues. Latency and usage data land as span attributes. Point your MCP server’s OTel exporter at port 4318 and the data appears within seconds.