Guide: AI agents & MCP observability. Updated June 8, 2026.

From error to PR: a self-hosted auto-fix loop with urgentry, MCP, and an AI agent

The loop teams are wiring this month: an error event lands, an AI agent reads it through MCP, the agent edits a sandboxed checkout, the tests run, and a draft pull request shows up in Slack. A human still merges. The loop replaces triage typing, not engineering judgement.

TL;DR

20 seconds. Wire your error tracker to an AI agent through MCP, give the agent read-only authority on the tracker and write authority only on a checkout, and you get the four-stage loop people are posting screenshots of in June 2026: detect, triage, draft, gate. urgentry plus a small MCP server plus Claude Code or Cursor is enough to start.

60 seconds. The earlier read-side MCP guide covered letting an agent investigate. This guide is the next step: letting the same agent draft the fix. The architecture has four moving parts. urgentry fires the event. A small MCP server gives the agent read tools over urgentry (issue, events, breadcrumbs, similar issues) and write tools over a sandbox (clone, edit, test, push, open PR). The agent runs the loop and stops at the PR. A human reviews the diff with the issue link attached and merges. Nothing the agent does touches production until a human pushes a button. The hard parts are not the agent or the MCP server; they are choosing what the agent is allowed to touch, scrubbing prompts before they leave your network, and not letting noisy issues poison the queue.

This guide covers the four stages of the loop, the MCP tool surface that makes it safe, the four failure modes that kill the experiment in week three, the trust gradient that tells you where to put the human gate, and the cost math that decides whether you self-host the model too.

The loop teams are actually building

In the first week of June 2026, public demos of "error in, PR out" loops started landing on X with screen recordings attached. The pattern is consistent across the demos: a small MCP server sits between an error tracker and an AI agent like Claude Code or Cursor, the agent uses the MCP tools to investigate and edit, and the human in the loop reviews a draft PR. One operator described going from a bug report to a merged fix in minutes; another product pitched the approach as "observability that’s meant never to be opened" because the agent does the opening for you.

The loop is not novel in concept. Engineers have been wiring webhooks to scripts for years. What is new is the agent vocabulary. A webhook delivers a payload. MCP gives the agent a set of tools it can call iteratively: pull the stack trace, list other issues sharing this fingerprint, fetch the breadcrumbs from the user session, check the release tag, look at the file the top frame points at. The agent decides what to ask next based on what the last answer told it. That iterative quality is what turns "automatic" into "useful."

Three things make this practical in mid-2026 that were not practical a year ago. MCP itself stabilized. Frontier models got good enough at file-aware patching that the test suite passing is a useful signal rather than a lottery. And self-hosted error trackers like urgentry made it cheap to expose the tracker as an MCP target without sending production stack traces through a SaaS vendor first.

The four stages

Every working version of this loop has the same four stages. The differences between teams are mostly about where they put the gate.

1. Detect

urgentry receives the event from the Sentry SDK or the OTLP collector, dedupes it against an existing issue by fingerprint, and either opens a new issue or increments the count on an existing one. Issues only enter the loop on the first occurrence of a fingerprint, or on a regression after a release tag bump. Every later occurrence of the same fingerprint just adds an event to the issue count; the agent never sees those individually. That single rule cuts the agent’s workload by one or two orders of magnitude on any real codebase.
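The entry rule above can be sketched as a small predicate. This is a minimal illustration, not urgentry's actual logic: `IssueState`, `should_enter_loop`, and `record_event` are hypothetical names, and "regression" is assumed here to mean a resolved issue reappearing under a different release tag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IssueState:
    """What the trigger remembers about one fingerprint (hypothetical shape)."""
    event_count: int = 0
    last_release: Optional[str] = None
    resolved: bool = False


def should_enter_loop(state: IssueState, event_release: str) -> bool:
    """True only for a first occurrence, or a regression after a release bump."""
    first_occurrence = state.event_count == 0
    # Assumption: a regression is a resolved issue seen again on a new release.
    regression = state.resolved and event_release != state.last_release
    return first_occurrence or regression


def record_event(state: IssueState, event_release: str) -> bool:
    """Decide first, then update counters. Duplicates only bump the count."""
    enter = should_enter_loop(state, event_release)
    state.event_count += 1
    state.last_release = event_release
    if enter:
        state.resolved = False  # a regression reopens the issue
    return enter
```

Only the calls that return `True` should hand an issue ID to the agent; everything else stays a counter increment.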

2. Triage

A trigger (a webhook, a poll on the urgentry issues feed, an explicit Slack mention) hands the new issue ID to the agent. The agent calls MCP read tools to assemble context. The good ones call four to ten tools before they write anything: issue.get, issue.events.list, issue.breadcrumbs, issue.similar, release.get, and repo.read at the path the top frame points at. The output of this stage is a structured triage note. The agent stops here if the issue is ambiguous, if the stack trace lacks a clear in-app frame, or if the issue carries a "do not auto-fix" tag.
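The three stop conditions at the end of triage can be checked mechanically before any edit starts. The sketch below assumes a hypothetical `TriageInput` shape and an `ambiguous` flag set by the agent itself; the tag name `auto-fix:no` follows the convention used later in this guide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Frame:
    path: str
    in_app: bool  # True when the frame belongs to your own code


@dataclass
class TriageInput:
    issue_id: str
    tags: List[str] = field(default_factory=list)
    frames: List[Frame] = field(default_factory=list)
    ambiguous: bool = False  # assumption: the agent reports this itself


def stop_reason(t: TriageInput) -> Optional[str]:
    """Return why the loop should stop after triage, or None to continue."""
    if "auto-fix:no" in t.tags:
        return "tagged do-not-auto-fix"
    if not any(f.in_app for f in t.frames):
        return "no in-app frame in stack trace"
    if t.ambiguous:
        return "ambiguous root cause"
    return None
```

A non-`None` result would go into the triage note, so the human sees why the agent declined to draft.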

3. Draft

The agent clones the repo at the release commit tagged on the event (not at HEAD), checks out a branch named after the issue ID, edits the file or files the triage step identified, and runs the test suite in a sandbox. If the tests pass, the agent commits, pushes, and opens a draft PR with the urgentry issue link in the description. If the tests fail, the agent gets exactly one retry; if that retry also fails, the agent comments on the issue with the failing test output and stops. An honest failure report here is far more useful than a forced "fix" that breaks something else.
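The "one attempt plus one retry, then stop" rule is easy to get wrong when it lives only in a prompt, so it may be worth enforcing in the orchestration code. A minimal sketch, assuming hypothetical callables `attempt_fix` (asks the agent for a patch, given the last failing output) and `run_tests` (returns pass/fail plus captured output):

```python
from typing import Callable, Tuple

SuiteResult = Tuple[bool, str]  # (passed, captured test output)


def draft_with_retry(
    attempt_fix: Callable[[str], None],
    run_tests: Callable[[], SuiteResult],
    max_attempts: int = 2,  # first attempt plus exactly one retry
) -> Tuple[str, str]:
    """Return ("open_draft_pr", output) or ("comment_and_stop", failing_output)."""
    feedback = ""
    for _ in range(max_attempts):
        attempt_fix(feedback)          # the retry sees the failing output
        passed, output = run_tests()
        if passed:
            return ("open_draft_pr", output)
        feedback = output
    # Budget exhausted: report the failure instead of forcing a fix.
    return ("comment_and_stop", feedback)
```

Keeping the budget in code rather than in the prompt means the agent cannot talk itself into a third attempt.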

4. Gate

The PR opens as a draft. The human is paged through the same Slack channel that normally surfaces incidents. They read the diff, click into the linked urgentry issue if they need more context, and either approve or close. Some teams add a CI step that re-runs the full test suite in their normal CI environment as a second check before the merge button enables. The agent never merges. The agent never pushes to a protected branch. The agent never closes the urgentry issue; the merge does, through the existing issue-close-on-merge hook.
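On GitHub, opening the PR as a draft comes down to one field in the create-pull-request call (`POST /repos/{owner}/{repo}/pulls` with `"draft": true`). The sketch below only builds the request; the title format, the body text, and the function name `draft_pr_request` are illustrative assumptions, and sending it with an authenticated client is left out.

```python
from typing import Dict, Tuple

GITHUB_API = "https://api.github.com"


def draft_pr_request(
    owner: str,
    repo: str,
    head_branch: str,
    base_branch: str,
    issue_id: str,
    issue_url: str,
) -> Tuple[str, Dict[str, object]]:
    """Build the URL and JSON body for a draft pull request."""
    url = f"{GITHUB_API}/repos/{owner}/{repo}/pulls"
    payload: Dict[str, object] = {
        "title": f"[auto-fix] {issue_id}",  # hypothetical title convention
        "head": head_branch,
        "base": base_branch,
        "body": f"Drafted by the auto-fix agent.\n\nurgentry issue: {issue_url}",
        "draft": True,  # the draft state is what keeps this un-mergeable
    }
    return url, payload
```

The token used to send this request needs permission to open pull requests and nothing more; merge rights stay with humans.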

Why this needs to be self-hosted

The prompts in this loop contain a lot of things you would normally pay good money to redact. Stack traces interpolate user input into exception messages. Breadcrumbs carry URLs with query strings and form bodies. The triage note the agent assembles is a faithful summary of all of that. The patch the agent produces is, by definition, a diff against your private codebase. Every byte of this passes through whatever inference endpoint your agent uses.

If that endpoint is a SaaS frontier model, your provider sees it. Some teams are comfortable with that calculus, especially under enterprise agreements with strict no-training clauses. Many are not, especially in regulated industries, on customer-facing services that handle PII, or on internal tools where the codebase itself is the moat. For those teams the only honest version of this loop has urgentry self-hosted (so the source events stay in-network), an MCP server self-hosted (so the tool calls stay in-network), and a model running locally or in a controlled VPC (so the prompts stay in-network end-to-end). urgentry on a small VPS plus a local Llama-class model on a single A100 covers the path; the engineering work is small.

Even with a SaaS model, self-hosting urgentry meaningfully reduces the blast radius. The agent only sees what your MCP server hands it, and your MCP server can scrub aggressively before the JSON leaves the box. PII scrubbing at the tracker, redaction at the MCP boundary, and a deny-list of files the agent is never shown all become straightforward when you own both ends.
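The file deny-list mentioned above can be a handful of glob patterns checked before any file content is returned to the agent. The patterns here are illustrative assumptions, not a recommended complete set; `is_denied` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical patterns; extend with whatever your repo treats as sensitive.
DENY_PATTERNS = (
    "*.pem",
    "*.key",
    ".env",
    "*/.env",
    "secrets/*",
    "*/secrets/*",
)


def is_denied(path: str) -> bool:
    """True when a repo-relative path must never be shown to the agent."""
    normalized = PurePosixPath(path).as_posix()  # drops a leading "./"
    return any(fnmatchcase(normalized, pattern) for pattern in DENY_PATTERNS)
```

Note that `*` in `fnmatch` patterns also matches `/`, so `*.pem` covers nested directories as well as the repo root.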

The MCP tool surface

The tool surface is the security model. The right surface gives the agent enough to triage and patch, and exactly zero authority beyond that. The wrong surface either fails (the agent cannot do useful work) or terrifies (the agent could break production with one tool call). A working starting point splits cleanly into read and write halves.

Read half (against urgentry, project-scoped token):

issue.get(id)               → issue metadata, fingerprint, first/last seen
issue.events.list(id)       → recent event payloads, capped
issue.breadcrumbs(event_id) → ordered breadcrumb trail for one event
issue.similar(id)           → other issues sharing a fingerprint prefix
release.get(version)        → commit SHA, deploy timestamp, environment

Write half (against a sandbox, not production):

repo.checkout(url, commit)  → clone at a specific commit, isolated workdir
repo.read(path)             → read a file inside the workdir
repo.edit(path, diff)       → apply a unified diff to the workdir
repo.test(command)          → run the test suite, capped wall time
repo.commit(message)        → commit the diff to a new branch
repo.open_pr(title, body)   → open as draft, never auto-merge
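Because repo.read and repo.edit take a path from the agent, the server has to make sure that path cannot leave the isolated workdir. A minimal sketch of that check, with `resolve_in_workdir` as a hypothetical helper name:

```python
from pathlib import Path


def resolve_in_workdir(workdir: str, relative_path: str) -> Path:
    """Resolve a tool-supplied path and refuse anything outside the workdir."""
    root = Path(workdir).resolve()
    candidate = (root / relative_path).resolve()
    # Reject "../" escapes and absolute paths that land outside the sandbox.
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"path escapes sandbox: {relative_path}")
    return candidate
```

Resolving before comparing also follows symlinks that already exist on disk, so a link inside the checkout that points outside it is rejected too.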

Two tools are deliberately absent. There is no issue.close; closing is done by the merge hook on the PR, which means the issue only closes when a human merges. There is no repo.merge; the merge button stays on the GitHub UI where it has been for fifteen years. Removing these two tools is what makes the rest of the surface safe to expose.
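One way to make "deliberately absent" hold over time is an explicit allowlist, so a later contributor cannot register issue.close or repo.merge by accident. The sketch below is a plain-Python registry rather than any particular MCP SDK; `ToolRegistry` is a hypothetical name, and the tool names are the eleven listed above.

```python
from typing import Any, Callable, Dict

READ_TOOLS = {
    "issue.get", "issue.events.list", "issue.breadcrumbs",
    "issue.similar", "release.get",
}
WRITE_TOOLS = {
    "repo.checkout", "repo.read", "repo.edit",
    "repo.test", "repo.commit", "repo.open_pr",
}
ALLOWED_TOOLS = READ_TOOLS | WRITE_TOOLS  # no issue.close, no repo.merge


class ToolRegistry:
    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, handler: Callable[..., Any]) -> None:
        """Refuse to register anything outside the allowlist."""
        if name not in ALLOWED_TOOLS:
            raise ValueError(f"tool not on the allowlist: {name}")
        self._handlers[name] = handler

    def call(self, name: str, **kwargs: Any) -> Any:
        """Dispatch a tool call; unknown names are denied, not ignored."""
        if name not in self._handlers:
            raise PermissionError(f"unknown or forbidden tool: {name}")
        return self._handlers[name](**kwargs)
```

The same allowlist can be reused when wiring handlers into a real MCP server library, so the security model lives in one reviewable constant.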

Four failure modes that kill the experiment

The teams who quietly stop running the loop after three weeks usually hit one of these four. None are unfixable; all are predictable.

Noisy issues poisoning the queue

Without filtering, the agent will burn tokens triaging the same five flaky-network errors every morning. Fix this at urgentry, not at the agent: tag known-flaky fingerprints with an auto-fix:no tag, and have your trigger skip any issue carrying that tag. The agent should never see them. Build a small habit of moving a fingerprint into the deny-list when the agent has bounced off it three times.
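A sketch of the trigger-side filter described above, assuming the trigger can see an issue's tags and fingerprint. `TriggerFilter` is a hypothetical class; the three-bounce threshold comes from the habit described in the text, and here it is automated locally rather than written back to urgentry.

```python
from collections import Counter
from typing import Iterable, Set

DENY_TAG = "auto-fix:no"
BOUNCE_LIMIT = 3  # bounces before a fingerprint is parked


class TriggerFilter:
    def __init__(self) -> None:
        self.bounces: Counter = Counter()
        self.denied: Set[str] = set()

    def should_dispatch(self, fingerprint: str, tags: Iterable[str]) -> bool:
        """Skip tagged issues and fingerprints already on the local deny-list."""
        return DENY_TAG not in tags and fingerprint not in self.denied

    def record_bounce(self, fingerprint: str) -> bool:
        """Count a failed attempt; return True once the fingerprint is denied."""
        self.bounces[fingerprint] += 1
        if self.bounces[fingerprint] >= BOUNCE_LIMIT:
            self.denied.add(fingerprint)
        return fingerprint in self.denied
```

When `record_bounce` returns `True`, that is the moment to add the `auto-fix:no` tag in urgentry itself so the decision survives a restart of the trigger.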

Multi-file fixes the agent cannot reason about

The single-file fix rate is high; the cross-file fix rate is low and drops further as the blast radius grows. Be explicit. Have the agent label PRs as scope:single-file or scope:multi-file. Auto-close multi-file PRs after twenty-four hours if no human has engaged, and attach the triage note to the issue instead. The human gets the work pre-summarized; the queue does not fill with stale draft PRs no one trusts.
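Both halves of that rule, the scope label and the twenty-four-hour auto-close, are small enough to compute rather than leave to the agent's judgement. A minimal sketch with hypothetical helper names; how "a human has engaged" is detected (a review, a comment, an assignment) is an assumption left to the caller.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable

STALE_AFTER = timedelta(hours=24)


def scope_label(changed_files: Iterable[str]) -> str:
    """Label a PR by how many distinct files its diff touches."""
    count = len(set(changed_files))
    if count == 0:
        raise ValueError("a PR must change at least one file")
    return "scope:single-file" if count == 1 else "scope:multi-file"


def should_auto_close(
    label: str, opened_at: datetime, human_engaged: bool, now: datetime
) -> bool:
    """Close only stale multi-file drafts that nobody has picked up."""
    return (
        label == "scope:multi-file"
        and not human_engaged
        and now - opened_at >= STALE_AFTER
    )
```

Deriving the label from the diff, not from the agent's own description, keeps the label trustworthy.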

Secret leakage in prompts

A stack trace with a connection string in it goes straight into the agent prompt. Scrub at the MCP boundary, not after the fact. Run every event payload through a redaction pass that strips known secret patterns (URLs with passwords, common token prefixes, anything matching your secret-scanner rules) before the agent ever sees the JSON. The same scrub belongs on breadcrumb URLs and query strings.
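A redaction pass at the MCP boundary can start as a few regular expressions applied recursively to every string in the payload. The patterns below are a deliberately small, illustrative set (URL credentials, a few well-known token prefixes, sensitive-looking query parameters); a real deployment would extend them from its own secret-scanner rules.

```python
import re
from typing import Any

PATTERNS = [
    # Credentials embedded in URLs: scheme://user:password@host
    (re.compile(r"(\w+://[^\s:/@]+):[^\s@/]+@"), r"\1:[REDACTED]@"),
    # A few well-known token shapes (GitHub PAT, AWS access key ID, Slack)
    (re.compile(r"\b(?:ghp_[A-Za-z0-9]{36}|AKIA[0-9A-Z]{16}|xox[baprs]-[A-Za-z0-9-]{10,})"),
     "[REDACTED_TOKEN]"),
    # Query-string style values for sensitive-looking keys
    (re.compile(r"([\w.-]*(?:token|password|secret|api_key)[\w.-]*)=[^&\s]+",
                re.IGNORECASE),
     r"\1=[REDACTED]"),
]


def redact(value: Any) -> Any:
    """Return a copy of a JSON-like payload with secret patterns stripped."""
    if isinstance(value, str):
        for pattern, replacement in PATTERNS:
            value = pattern.sub(replacement, value)
        return value
    if isinstance(value, dict):
        return {key: redact(item) for key, item in value.items()}
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value  # numbers, booleans, None pass through untouched
```

Running `redact` on the tool result just before it is serialized means event payloads, breadcrumb URLs, and query strings all go through the same pass.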

Regression risk from a passing test suite

"Tests passed" is not "fix is correct." If your test suite has weak coverage on the area the agent edited, a green run only tells you the agent did not break anything visible. The countermeasure is mechanical: require the agent’s PR to include either a new test that fails on the old code and passes on the new code, or a labeled justification for why a new test is not possible. The justification rule alone catches most cases where the agent is patching a symptom rather than a cause.

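The test-or-justify rule from the regression-risk section can be enforced as a small CI check on the PR's changed files and description. This sketch assumes a hypothetical `no-test-justification:` line convention in the PR body and common test file naming; it does not verify that the new test actually fails on the old code, which needs a separate CI job that runs the test against the base commit.

```python
import re
from typing import Iterable

# Assumption: tests live under test/ or tests/, or follow test_*.py / *_test.py|go.
TEST_PATH = re.compile(r"(^|/)(tests?/|test_[^/]+\.py$|[^/]+_test\.(py|go)$)")
# Assumption: a justification is a labeled line with non-empty text after it.
JUSTIFICATION = re.compile(
    r"^no-test-justification:[ \t]*\S", re.MULTILINE | re.IGNORECASE
)


def passes_test_or_justify(changed_files: Iterable[str], pr_body: str) -> bool:
    """True when the PR adds or edits a test, or carries a labeled justification."""
    has_test = any(TEST_PATH.search(path) for path in changed_files)
    has_justification = bool(JUSTIFICATION.search(pr_body))
    return has_test or has_justification
```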
The trust gradient

The agent is doing things on a spectrum from "could not possibly cause harm" to "could ruin your week." Map your gates to that spectrum and the loop stabilizes; collapse it into one gate and you either over-gate trivial work or under-gate dangerous work.

  • Read the tracker. No gate. This is what humans do every morning anyway.
  • Comment a triage note on an issue. No gate. A bad triage note is recoverable; a good one saves five minutes.
  • Edit a sandboxed checkout. No gate. The sandbox is the gate.
  • Open a draft PR. No gate. The draft state is the gate.
  • Mark the PR ready for review. Light gate (CI re-run on a clean environment).
  • Merge the PR. Human gate. Always.
  • Deploy. Whatever gate you have today. The agent does not touch this.
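The gradient above can be written down as data, which makes it reviewable and lets the orchestrator deny anything not explicitly listed. The action names below are hypothetical labels for the seven steps; "deploy" is mapped to a human gate here as a stand-in for "whatever gate you have today".

```python
from enum import Enum


class Gate(Enum):
    NONE = "none"    # the sandbox or draft state is the protection
    CI = "ci"        # light gate: clean-environment CI re-run
    HUMAN = "human"  # a person decides; the agent never does


GATES = {
    "tracker.read": Gate.NONE,
    "issue.comment": Gate.NONE,
    "sandbox.edit": Gate.NONE,
    "pr.open_draft": Gate.NONE,
    "pr.mark_ready": Gate.CI,
    "pr.merge": Gate.HUMAN,
    "deploy": Gate.HUMAN,
}


def agent_may_perform(action: str, ci_green: bool = False) -> bool:
    """Default-deny: unknown actions are treated as human-gated."""
    gate = GATES.get(action, Gate.HUMAN)
    if gate is Gate.NONE:
        return True
    if gate is Gate.CI:
        return ci_green
    return False
```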

The mistake is treating "merge" and "deploy" as the only meaningful gates and leaving everything before them ungated and unobservable. Each step in the agent’s work should be inspectable after the fact, ideally as a span in the same tracing pipeline you already feed to urgentry. The agent error tracking guide covers the OTel instrumentation; you want it on for this loop on day one.

Cost math, briefly

The dollar cost of an auto-fix attempt is usually somewhere between a few cents and a couple of dollars per issue, depending on model choice and how chatty your MCP server is. Multiply by issues per day and the bill is real but not unmanageable for most teams. The cost that actually matters is engineer attention.
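To turn "a few cents to a couple of dollars per issue" into a budget line, multiply by volume. The figures in the usage note are placeholders, not measurements; substitute your own issue volume and per-attempt cost. Costs are kept in integer cents to avoid floating-point drift.

```python
def monthly_cost_dollars(
    issues_per_day: int, cents_per_issue: int, days: int = 30
) -> float:
    """Monthly spend for auto-fix attempts, given per-issue cost in cents."""
    total_cents = issues_per_day * cents_per_issue * days
    return total_cents / 100
```

For a hypothetical team with 40 loop-eligible issues a day, `monthly_cost_dollars(40, 5)` gives 60.0 at the low end and `monthly_cost_dollars(40, 200)` gives 2400.0 at the high end.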

Teams who measure this carefully report something like: the agent fixes 20 to 40 percent of single-file issues end-to-end, drafts useful starting points for another 30 to 40 percent, and is dead weight on the rest. The win is rarely the fix rate. The win is that every issue arrives at the human pre-investigated. The triage note alone, even on the issues the agent cannot fix, is worth more than the agent costs. Self-host the model and the dollar cost vanishes into the GPU bill you were going to pay anyway.

What ships first

The smallest useful version of this loop, from scratch, is an afternoon of work for one engineer. Stand up urgentry on a small VPS (see the $5 VPS guide). Write a 200-line MCP server that exposes the five read tools listed above against the urgentry HTTP API. Point Claude Code at it. Have it triage a real issue. Read the output, sharpen the prompts, then add the six write tools. By the end of the day you have a draft PR opening against a sandbox repo, and you have a list of the six things you want to tighten before pointing it at production.

The full version is mostly hardening: the redaction layer at the MCP boundary, the deny-list tagging in urgentry, the scope labelling on PRs, the test-or-justify rule in CI, the OTel spans on every agent tool call so you can see what it asked for and what it got back. Each of these is a couple of hours and each one matters more than the agent itself.

The loop is small. The discipline around it is what makes it last past week three.

Frequently asked questions

What is an auto-fix loop for error tracking?

It is a pipeline that turns an error event into a draft pull request without a human typing a prompt. The error tracker fires the trigger, an AI agent reads the issue through MCP, the agent edits a checkout, runs the test suite, and opens a PR. The human still merges. The loop replaces triage typing, not engineering judgement.

Why route this through MCP instead of a webhook?

A webhook gives the agent a payload. MCP gives the agent a vocabulary. With an MCP server in front of urgentry the agent can ask follow-up questions: pull the breadcrumb trail, list other issues sharing this fingerprint, check whether the release tag matches the current branch. A webhook can deliver one event; MCP lets the agent investigate one.

Is it safe to let an AI agent open PRs against my codebase?

Safe enough if the agent only ever produces PRs that a human merges. Treat the agent like an intern who can read your error tracker, edit a sandboxed checkout, and run tests. Do not give it merge rights, do not give it write tokens to production, and do not let it call urgentry write APIs. The gate is the merge button.

Does this need a self-hosted error tracker?

Strictly no, but the prompts the agent sees contain everything you would normally redact: stack traces with user input, internal endpoints, credentials interpolated into exception messages, your repository structure. If those leave your boundary, the agent provider sees them. Self-hosted urgentry plus a self-hosted model keeps that material in-network end-to-end.

What is the realistic auto-fix rate in 2026?

Public reports from teams running these loops land between 20 and 40 percent for single-file fixes with existing tests. Multi-file fixes drop into the single digits. The win is not the fix rate; it is that the agent does the triage typing on every issue, so the 60 to 80 percent it cannot fix arrive at the human pre-summarized.

