Catching agent-introduced bugs: swallowed exceptions in AI-coded backends
AI agents write try/except blocks the way they learned to: cleanly, defensively, and without re-raising. The exception is gone. The endpoint returns 200 OK. Your error tracker is silent. The bug is in production for three weeks before a customer notices the wrong total.
20 seconds. AI coding agents have a measurable bias toward writing exception handlers that catch and discard. The endpoint returns a success code, the test passes, and the error tracker never sees the failure because nothing called capture_exception(). Four antipatterns account for most of it: empty except, bare except Exception with a default return, silent retries that exhaust without alerting, and the 200 OK with error body shape.
60 seconds. The fix is not bigger sampling at the tracker. The fix is at the handler. Every except block in agent-authored code should either log-and-re-raise or call capture_exception() with a context tag identifying that the error was caught. Tag agent-authored deploys with a release marker so you can compare swallowed-exception counts against a human-authored baseline. The June 2026 launch of @ZypherHQ's /backend-doctor is the first open-source linter specifically targeting this shape; the issue is real enough to have a tool category now.
This guide covers the four antipatterns, the linter rules that catch them at PR time, the tracker-side instrumentation that surfaces them in production, and the deploy-tagging pattern that lets you measure the actual blast radius of agent code on your error rate.
Why agent-authored code swallows exceptions
The training corpus is the explanation. Public code on GitHub contains an enormous volume of try/except blocks that exist to make a test green, smooth over a flaky integration, or quiet a noisy log. The shape is overrepresented relative to what a careful engineer would write today. An agent trained on that corpus learns the shape without inheriting the cost - because the cost lives in production incidents that never got written back to the public source.
Three forces compound the bias. First, agents optimize for tests that pass. A bare except Exception with a default return value will make a flaky integration test go green; the agent will write it because the reward signal points there. Second, agent code is rarely accompanied by a postmortem on what happens when the caught exception fires in production. The agent sees the catch, not the silence that follows. Third, the agents are often graded on PR-merge velocity, which rewards code that looks defensive over code that surfaces the actual failure mode.
The result, observed across several teams running agent-assisted backend work in 2026, is a measurable increase in the rate of caught-and-discarded exceptions in services where agents are doing significant authorship. The 200-OK-with-error-body shape, which urgentry's own catalog of silent ingest failures documents at the tracker layer, shows up at the application layer in the same way. Same wire shape, different code path.
The four antipatterns
Most agent-introduced silent failures fall into one of four shapes. Each is detectable at PR time with a linter, and each leaves a measurable signature in production once you know what to look for.
1. The empty except
The most basic shape, and the one even early-2024-era agents got better at avoiding - but it still slips through, especially in language-model-generated test fixtures and retry wrappers.
try:
    result = upstream_api.fetch(user_id)
except:
    pass
return result  # UnboundLocalError (a NameError subclass) if fetch() raised
The exception is gone. The local variable is undefined. Depending on where this lives, the caller sees a NameError a few frames up, an unset attribute, or a default that quietly enters the data flow. None of those are tracked back to the original failure.
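One way to repair this shape is to log and re-raise, so the original failure stays attached to the stack. The following is a minimal sketch, assuming a hypothetical client whose fetch() raises ConnectionError on failure; StubUpstreamAPI and fetch_profile are illustrative names, not part of any real library.

```python
import logging

logger = logging.getLogger(__name__)


class StubUpstreamAPI:
    """Hypothetical stand-in for upstream_api; always fails to show the error path."""

    def fetch(self, user_id: str) -> dict:
        raise ConnectionError(f"upstream unreachable for {user_id}")


def fetch_profile(api, user_id: str) -> dict:
    try:
        result = api.fetch(user_id)
    except ConnectionError:
        # Record the failure with its traceback, then let it propagate.
        logger.exception("upstream fetch failed for user_id=%s", user_id)
        raise
    return result
```

Catching the narrow ConnectionError rather than everything is a design choice here: unexpected exception types still propagate untouched, and the one expected failure leaves a log line before it does.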
2. The bare except with a default return
The shape that ships most often. The handler catches broadly, returns a sensible-looking default, and the calling code proceeds as if the operation succeeded.
def get_user_balance(user_id: str) -> int:
    try:
        return wallet_client.balance(user_id)
    except Exception:
        return 0
A failed RPC now returns zero. The downstream code computes a discount based on zero balance, applies a free tier, or skips a credit check. The customer sees the wrong number for three weeks before billing reconciles. The error tracker has nothing because capture_exception() never ran.
3. The silent retry that exhausts
The agent reaches for a retry decorator because the integration test was flaky. The decorator catches and retries N times. After N failures the decorator returns a default and the calling code never knows the retry budget was burned.
@retry(attempts=5, backoff=exponential)
def queue_publish(event):
    return broker.publish(event)

def handler(req):
    queue_publish(req.event)  # returns None on exhaustion
    return Response(status=200)
The endpoint returns 200. The event is on no queue. The downstream consumer never fires. The user got their confirmation page; the system did not actually do the thing. This is the same wire shape as the silent 200-OK at the tracker layer, just one level up the stack.
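A sketch of the alternative: a retry helper that raises once the budget is spent, so the handler has to decide what status to return. The names retry, RetryExhausted, and the (status, body) tuple returned by handler are hypothetical and chosen for illustration; they do not correspond to any particular retry library or web framework.

```python
import functools
import time


class RetryExhausted(Exception):
    """Raised when every attempt failed; chains the last underlying error."""


def retry(attempts: int = 5, base_delay: float = 0.0):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_error = None
            for attempt in range(attempts):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:  # broad on purpose: always re-raised below
                    last_error = exc
                    if attempt < attempts - 1:
                        # Exponential backoff between attempts.
                        time.sleep(base_delay * (2 ** attempt))
            raise RetryExhausted(
                f"{func.__name__} failed after {attempts} attempts"
            ) from last_error
        return wrapper
    return decorator


def handler(publish, event) -> tuple[int, dict]:
    try:
        publish(event)
    except RetryExhausted as exc:
        # Surface the failure instead of returning 200 for work that never happened.
        return 503, {"ok": False, "error": str(exc)}
    return 200, {"ok": True}
```

In a real service the except branch would also report the exhaustion to the tracker; the point of the sketch is only that exhaustion becomes an exception the caller cannot ignore by accident.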
4. The 200 OK with an error body
The shape that survives even careful code review because the function signature returns a response object. The handler catches the exception, packs an error message into a successful response, and the calling code reads status_code == 200 as success.
def webhook(req):
    try:
        process_payment(req)
        return JsonResponse({"ok": True}, status=200)
    except PaymentError as e:
        return JsonResponse({"ok": False, "error": str(e)}, status=200)
The caller, often another agent-authored service, checks the HTTP status and proceeds. The "ok": false branch is invisible to anything that looks only at status codes. The tracker never sees the PaymentError because the exception was caught locally.
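A minimal sketch of a repaired handler, written without a web framework so it stays self-contained. PaymentError, process_payment, and the injected report callback are stand-ins; in a Django view the returned tuple would become a JsonResponse with the same status. The choice of 402 is an assumption, and any non-2xx status that fits your API would serve the same purpose.

```python
from typing import Callable


class PaymentError(Exception):
    """Stand-in for the application's payment failure type."""


def webhook(
    req: dict,
    process_payment: Callable[[dict], None],
    report: Callable[[BaseException], None],
) -> tuple[int, dict]:
    try:
        process_payment(req)
    except PaymentError as exc:
        # In a real service, report could be sentry_sdk.capture_exception.
        report(exc)
        # Non-2xx status, so callers that only check the status code see the failure.
        return 402, {"ok": False, "error": str(exc)}
    return 200, {"ok": True}
```

The error body is unchanged from the original; only the status code and the report call differ, which is what makes the failure visible to status-only callers and to the tracker.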
What the linters catch (and what they miss)
PR-time tooling has caught up with the first two antipatterns. /backend-doctor from @ZypherHQ, launched June 5, 2026 as an open-source tool, specifically flags swallowed-exception shapes in agent-authored diffs. Python's ruff rule BLE001 ("do not catch blind exception") and S110 ("try-except-pass detected") cover the same ground for any Python codebase. JavaScript's eslint-plugin-no-swallow and TypeScript's @typescript-eslint/no-misused-promises hit the equivalent shapes in Node.
What linters do not catch is the third and fourth antipattern. A retry decorator that returns a default on exhaustion is, by static analysis, indistinguishable from a retry decorator that always returns a value. A handler that packs an error into a 200 response is, syntactically, returning a valid HTTP response object. Both shapes need a runtime signal to surface.
That runtime signal is what the error tracker provides - but only if the SDK is told. The tracker cannot see exceptions that were caught and never reported. It can see response-shape anomalies if you instrument for them. It can see retry budgets if you emit a span per attempt. The two-line fix in most cases is to add capture_exception() inside the except block, with a context tag noting the error was caught:
import sentry_sdk

def get_user_balance(user_id: str) -> int:
    try:
        return wallet_client.balance(user_id)
    except Exception:
        sentry_sdk.capture_exception(
            level="warning",
            tags={"caught": "true", "fallback": "zero_balance"},
        )
        return 0
Now the tracker has the event. The caught=true tag lets you filter handled-but-still-broken failures away from the noisy crash list. The fallback tag tells you which default value the user ended up with. The exception is no longer silent; it is a warning-level event in a filterable bucket.
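The response-shape signal needs its own instrumentation, because no exception is in flight by the time a 200 with an error body leaves the service. A minimal sketch follows, assuming the body convention from the webhook example (an "ok" flag and an "error" key); looks_like_hidden_error and audit_response are hypothetical names, and the report callback is injected so the sketch does not depend on any SDK.

```python
from typing import Callable, Mapping


def looks_like_hidden_error(status: int, body: Mapping) -> bool:
    """True when a 2xx response carries an error-shaped body."""
    is_success_status = 200 <= status < 300
    error_shaped = body.get("ok") is False or "error" in body
    return is_success_status and error_shaped


def audit_response(status: int, body: Mapping, report: Callable[[str], None]) -> None:
    # `report` is injected; in a real service it might wrap sentry_sdk.capture_message.
    if looks_like_hidden_error(status, body):
        report(f"2xx response with error body: status={status}")
```

A check like this could run in response middleware or in the calling service; either way it turns the status-code-only blind spot into an event the tracker can count.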
The deploy tag that makes the blast radius visible
Telling whether agent-authored code is contributing disproportionately to silent failures requires a comparison point. The pattern that works: tag each deploy with a release marker that identifies the authorship mode, then look at the issue list filtered by release.
A minimal release-tagging pattern at deploy time:
RELEASE_TAG="2026.06.19-agent" # or "2026.06.19-human"
export SENTRY_RELEASE="$RELEASE_TAG"
# In code: every SDK init reads SENTRY_RELEASE automatically.
# Events are now bucketed by release.
With two weeks of data, the comparison is mechanical: count caught=true events per release, normalize by traffic volume, and compare. If the agent-tagged release is materially noisier on the caught-error bucket than the human baseline, that is the signal. The number you are looking for is not the absolute count - it is the ratio of caught-and-reported errors to total request volume, deploy-tagged so it is comparable.
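The arithmetic can be sketched in a few lines. The function names and the (caught_events, total_requests) tuple shape are assumptions made for illustration, and the counts in the usage note are invented, not measurements.

```python
def caught_rate(caught_events: int, total_requests: int) -> float:
    """Caught-and-reported errors per request for one release."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return caught_events / total_requests


def compare_releases(agent: tuple[int, int], human: tuple[int, int]) -> float:
    """Ratio of agent caught-rate to human caught-rate; above 1 means noisier."""
    human_rate = caught_rate(*human)
    if human_rate == 0:
        raise ValueError("human baseline has no caught events; ratio is undefined")
    return caught_rate(*agent) / human_rate
```

With made-up inputs of 120 caught events over 400,000 requests for the agent-tagged release and 45 over 450,000 for the human baseline, compare_releases((120, 400_000), (45, 450_000)) comes out at roughly 3: the agent release reports caught errors at about three times the baseline rate per request.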
The tracker is doing nothing special here. It is the same Sentry-SDK-compatible ingest path documented in the release-health guide. The technique is the deploy tag plus the caught custom tag, which together turn a measurement that did not exist into one you can act on.
The MCP-shaped variant
The same antipatterns show up in MCP servers, with one extra wrinkle: the tool-call response itself becomes the place where the swallowed error lands. An MCP server that wraps an upstream API in a tool call, catches the upstream failure, and returns "Sorry, I couldn't fetch that." as a tool result has done the application-layer equivalent of the 200-OK-with-error-body shape. The calling agent reads the tool result as successful (the tool call returned), proceeds, and the upstream failure is invisible.
The fix for MCP server code is the same as the fix for HTTP handlers: capture_exception() inside the catch, with a tool_name and caught=true tag. The MCP server observability guide covers the OTel-shaped instrumentation; the swallowed-exception fix is a one-line addition to that path.
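A rough sketch of that fix, written without depending on a specific MCP SDK. The result dict mirrors the MCP tool-result shape with its isError flag; call_tool, the injected report callback, and the exact tag dictionary are illustrative assumptions. Setting isError goes one step beyond the reporting fix described above, and is included because it lets the calling agent see the failure as well.

```python
from typing import Callable


def call_tool(
    tool_name: str,
    fetch: Callable[[], str],
    report: Callable[[BaseException, dict], None],
) -> dict:
    try:
        text = fetch()
    except Exception as exc:  # broad on purpose: reported, then flagged below
        # In a real server, `report` might call
        # sentry_sdk.capture_exception(exc, tags=...) with these tags.
        report(exc, {"tool_name": tool_name, "caught": "true"})
        return {
            "isError": True,  # tells the calling agent the tool call failed
            "content": [{"type": "text", "text": f"{tool_name} failed: {exc}"}],
        }
    return {"isError": False, "content": [{"type": "text", "text": text}]}
```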
What this is not
Two things to head off. First, this is not an argument against using AI agents for backend code. It is an argument for an extra-careful read of the exception-handling surface in agent-authored diffs, plus a measurement loop that catches what gets through. Teams shipping agent code who do both are not seeing higher overall incident rates; teams doing neither are.
Second, this is not specific to any one model or agent harness. The antipatterns predate AI agents - human engineers have been writing bare except Exception blocks since Python had exceptions. The shift in 2026 is the volume. An agent that drafts a hundred PRs a week multiplies whatever the corpus bias is by a hundred. The linter is what catches the bias before it ships; the error tracker is what catches the part that ships anyway.
Frequently asked questions
What is a swallowed exception, and why do AI agents write them?
A swallowed exception is one that the code catches but never reports - no log, no re-raise, no tracker event. Agents write them because the training corpus is full of try/except blocks that exist to make tests pass or smooth over flaky integrations. The agent learned the shape without learning the cost. The block compiles, the test goes green, the bug ships.
Why didn't my error tracker catch it?
Trackers capture what the SDK is told about. If the exception is caught and the handler returns a default value, the SDK never sees it. The fix is not on the tracker side - it is on the handler side. You either log-and-re-raise, or call capture_exception() inside the except block, or both. The tracker is the destination; the SDK call is the trigger.
Should I just disallow try/except in agent code reviews?
No. Some exception handling is legitimate. The patterns to flag are the empty except, the bare except Exception with a default return, and the pass statement inside a catch block. Those three shapes account for the vast majority of swallowed-exception incidents. Other handlers - retry-with-backoff, fall-back-to-cache, log-and-re-raise - are fine when they are deliberate.
How does this interact with the 200-OK silent-failure pattern at the tracker layer?
They are the same shape at two different layers. The application swallows the exception and returns 200 OK to its caller. The error tracker, on the ingest side, returns 200 OK without storing the event. Both look healthy on the wire and fail invisibly. The patterns that detect one also detect the other - response-shape monitoring and end-to-end probes.
Does urgentry have anything specific for agent-authored code?
Not as a named feature. The SDK call surface is identical to Sentry. What helps is the release-tagged event view: tag agent-authored deploys with a release marker, watch the issue list for that release for two weeks, compare the swallowed-exception count against a human-authored baseline. The tracker does not know who wrote the code; you make the comparison visible by tagging the deploy.
Sources
- @ZypherHQ's launch of /backend-doctor: the June 5, 2026 announcement of an open-source linter targeting swallowed-exception shapes in AI-agent-authored backend code. Confirms the antipattern is named and tooled at the ecosystem level.
- Ruff rule BLE001 (blind-except): Python lint rule that flags bare except and except Exception. The closest pre-existing tool to what /backend-doctor generalizes.
- Ruff rule S110 (try-except-pass): the rule that catches the empty-pass shape specifically; bandit-origin, ported into Ruff.
- Sentry Python SDK capture_exception() reference: the canonical SDK call for reporting a caught exception with tags and level. The fix at the bottom of every antipattern in this guide.
- Sentry release tagging documentation: the release-marker mechanism this guide uses to make the blast-radius comparison work. The same shape is supported by every Sentry-compatible backend, including urgentry.