Updated June 6, 2026

Validating event content after the DSN swap: what to diff between Sentry and your self-hosted backend

A 200 OK from your new backend means the event was accepted. It does not mean the event matches what Sentry would have stored. During a side-by-side evaluation, event counts can line up while event content quietly drifts. This guide covers the six fields to diff, the mechanics, and the four checks that gate a cutover.

TL;DR

20 seconds. Event count parity proves delivery, not equivalence. Diff six fields on the top fifty issues — fingerprint hash, in-app frame set, exception type and value, breadcrumbs, release and environment tags, user context — and gate cutover on the four checks at the bottom of this guide. Pull events from both backends through the same REST endpoint and run a structural diff with a whitelist of expected drift.

60 seconds. During a parallel run, every SDK instance fans out to two backends — Sentry SaaS and your self-hosted candidate — and you assume the resulting issue lists should be near-identical. They often are not. Stack-frame classification diverges when one backend has a stale in-app pattern, fingerprints diverge when the grouping config drifts a minor version, breadcrumbs drop when the SDK transport buffers differently against a slower endpoint, and release tags vanish when the post-deploy CI step targets only one backend. None of this surfaces as an error in the SDK debug log; the events still ship, and both projects report healthy ingest. The drift shows up later, in runbooks that worked on Sentry and miss on the replacement.

This guide covers the two failure modes a count-only check misses, the six fields worth diffing, a worked mechanics section using curl and jq, what counts as acceptable drift, what drift actually reveals about your setup, and the four checks that decide whether your evaluation is ready to become a cutover.

Why content drift matters

The standard side-by-side checklist asks two questions: are the events arriving, and are the counts within tolerance. Both can pass while the events you stored differ in ways that change how your team responds.

One operator put it bluntly on X earlier this month: "self-hosted sentry nails the exceptions. the ones that bite are the requests that 200 and quietly return the wrong payload." The framing is about HTTP responses, but the same shape applies to error tracking. The envelope is accepted. The fingerprint quietly differs. The stack trace classifies the same frame as in_app: false on one side and in_app: true on the other. Your runbook says "page on the third occurrence of this issue" and the issue does not exist under that ID on the new backend, because the grouping put it elsewhere.

Count parity is a delivery proof. Content parity is an operational proof. A side-by-side evaluation needs both before you can flip the DSN for good.

The two failure modes a count-only check misses

A count-only check misses everything that happens after the backend acknowledges the envelope. Two failure modes dominate.

Classification drift. The same captured exception lands under a different issue group, a different in-app frame set, or with a different severity tag. Your dashboards and alert rules key off issue identity. When the same exception arrives but is grouped under a new issue, your alert rule that fires on "this issue passed 100 events in 5 minutes" never trips, because the event counter is split between two issues that should have been one.

Context loss. The event arrives with fewer breadcrumbs, a missing user tag, or no release annotation. This usually happens when the SDK transport applies different retry or queueing behavior to a slower or less-tested second backend, or when a post-deploy step (release creation, source-map upload, environment tagging) is wired into only one of the two pipelines. The event count is identical; the event detail is a worse version of itself on one side. You only see this when an on-call engineer opens the same issue on both backends and notices the breadcrumb list on one of them stops three steps short of the exception.

The six fields worth diffing

Diff these on the top fifty issues by event volume. They cover most of the drift we have seen in real evaluations, in roughly the order they break.

1. Fingerprint hash

Fingerprinting decides which events collapse into one issue. The default Sentry fingerprint hashes the exception type and the in-app stack frames. If either backend has a different grouping config version, or a different list of in-app modules, the same captured event hashes differently and lands under separate issues. The deeper mechanics are in the event grouping guide.

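What to diff: the fingerprint array on the stored event, and the groupingConfig.id in the event metadata. If fingerprint is the literal ["{{ default }}"] on both sides but the same events end up grouped differently, the grouping config drifted. Group IDs are backend-assigned, so compare which events share a group on each side, not the raw ID values.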
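Because group IDs live in each backend's own namespace, one way to compare grouping is to compare the partitions they induce. The sketch below assumes you have already built an event_id to group-ID mapping for each side; the function name and input shape are illustrative, not part of either API.

```python
from collections import defaultdict


def split_groups(groups_a: dict, groups_b: dict) -> dict:
    """Return groups on side A whose events land in more than one group on side B.

    groups_a and groups_b map event_id -> backend-assigned group id.
    Raw group ids are never compared across backends, only the grouping they imply.
    """
    targets = defaultdict(set)
    for event_id, group_a in groups_a.items():
        if event_id in groups_b:  # skip events that only one side stored
            targets[group_a].add(groups_b[event_id])
    return {group: seen for group, seen in targets.items() if len(seen) > 1}
```

Call it in both directions: split_groups(sentry, candidate) finds issues the candidate splits apart, and split_groups(candidate, sentry) finds issues it merges.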

2. Exception type and value

The exception type and value strings come straight from the runtime and should never differ for the same underlying error. When they do, the SDK on the slow side is hitting a serialization fallback — usually a payload-size limit that truncates the value, or a stack-walker that fails to resolve the type and writes Error as a placeholder.

What to diff: exception.values[0].type and exception.values[0].value. A truncated value (ending in …) on one side is your tell.
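A minimal sketch of that check, assuming each stored event has been loaded into a dict that follows the SDK payload paths named above. The truncation heuristic (one value is a prefix of the other once a trailing ellipsis is stripped) is an assumption for illustration.

```python
def exception_drift(a: dict, b: dict) -> list:
    """Compare exception type and value on two copies of the same event."""

    def first(event):
        values = (event.get("exception") or {}).get("values") or [{}]
        return values[0].get("type"), values[0].get("value") or ""

    (type_a, value_a), (type_b, value_b) = first(a), first(b)
    findings = []
    if type_a != type_b:
        findings.append(f"type: {type_a!r} != {type_b!r}")
    if value_a != value_b:
        short, long_ = sorted((value_a, value_b), key=len)
        stem = short.rstrip(".…")  # drop a trailing "..." or "…" marker
        truncated = bool(stem) and len(short) < len(long_) and long_.startswith(stem)
        findings.append("value: truncated" if truncated else "value: different")
    return findings
```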

3. In-app stack frames

The in_app flag on each frame is what your UI uses to fold vendor noise. Drift here is the most common single source of pain in evaluations because the in-app rules are project configuration, not code. A team that maintains the rules on Sentry and forgets to mirror them on the replacement gets one well-folded stack trace and one wall of node_modules.

What to diff: the count of frames where in_app: true, the deepest in-app frame's module, and the in-app pattern list on the project settings page. The pattern list is what generates the flag.
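The first two of those can be read straight off the event. This sketch assumes the SDK payload layout (exception.values[0].stacktrace.frames, ordered oldest call first, so the last in-app frame is the deepest one); adjust the paths if your stored events are shaped differently.

```python
from typing import Optional, Tuple


def in_app_summary(event: dict) -> Tuple[int, Optional[str]]:
    """Return (number of in-app frames, module of the deepest in-app frame)."""
    values = (event.get("exception") or {}).get("values") or [{}]
    frames = (values[0].get("stacktrace") or {}).get("frames") or []
    in_app = [frame for frame in frames if frame.get("in_app")]
    deepest = in_app[-1].get("module") if in_app else None
    return len(in_app), deepest
```

Comparing in_app_summary(sentry_event) with in_app_summary(candidate_event) gives you both the frame-count delta and a quick hint about which module the pattern list is missing.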

4. Breadcrumbs

Breadcrumbs are the last N steps before the exception. The SDK captures them in a ring buffer and ships them with the event. The buffer is capped (the default is usually 100 entries), but the SDK transport can also drop breadcrumbs when the event payload exceeds the per-backend size limit. urgentry, GlitchTip, Bugsink, and Sentry each advertise an event size ceiling, typically 200 KB or 1 MB; check the exact number for each backend you are comparing.

What to diff: the length of the breadcrumbs.values array, the category field of the final breadcrumb (it should be the one immediately preceding the exception on both sides), and the total payload size on the wire. If the breadcrumb count is identical but the payload size differs by 30%, something in the breadcrumb data is being scrubbed differently between the two pipelines.
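As a rough sketch of those three checks: the code below uses the serialized size of the stored event as a stand-in for the payload size on the wire, which is an approximation, and takes the 30% figure from the paragraph above as its default tolerance. Function names are illustrative.

```python
import json


def breadcrumb_summary(event: dict) -> dict:
    crumbs = (event.get("breadcrumbs") or {}).get("values") or []
    return {
        "count": len(crumbs),
        "last_category": crumbs[-1].get("category") if crumbs else None,
        # Compact JSON size of the stored event; a proxy for wire size.
        "bytes": len(json.dumps(event, separators=(",", ":")).encode("utf-8")),
    }


def breadcrumb_drift(a: dict, b: dict, size_tolerance: float = 0.30) -> list:
    """List which breadcrumb checks disagree between two copies of an event."""
    sa, sb = breadcrumb_summary(a), breadcrumb_summary(b)
    findings = []
    if sa["count"] != sb["count"]:
        findings.append("count")
    if sa["last_category"] != sb["last_category"]:
        findings.append("last_category")
    bigger = max(sa["bytes"], sb["bytes"])
    if bigger and abs(sa["bytes"] - sb["bytes"]) / bigger >= size_tolerance:
        findings.append("payload_size")
    return findings
```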

5. Release and environment tags

The release tag is what powers crash-free-session math and regression detection. It is set by the SDK at init time from SENTRY_RELEASE or an equivalent. If your CI creates a release on Sentry via sentry-cli releases new after deploy but does not run the same call against urgentry, every event on the urgentry side will land under a release that has no associated commit metadata.

What to diff: the tags.release value and the existence of a release record by that name on each backend. A missing release record is a worse problem than a missing tag; it means the regression detector never engages.
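A small sketch of the distinction between a missing tag and a missing record. It assumes tags arrive as a list of key/value objects, as in the event summaries, and that you have separately collected the set of release names each backend knows about; how you fetch that set is left out here.

```python
def release_status(event: dict, known_releases: set) -> str:
    """Classify one event as 'ok', 'missing-tag', or 'missing-record'."""
    release = next(
        (tag.get("value") for tag in event.get("tags") or [] if tag.get("key") == "release"),
        None,
    )
    if release is None:
        return "missing-tag"
    if release not in known_releases:
        # Tag is present, but the backend has no release record by that name.
        return "missing-record"
    return "ok"
```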

6. User and request context

The user object on the event carries whatever identifier you set with Sentry.setUser(). PII scrubbing rules on each backend can strip parts of it. Sentry's data-scrubbing UI and urgentry's relay.toml equivalent are configured separately; a team that wrote a strict rule on Sentry and a permissive one on urgentry will accidentally retain emails on the new backend.

What to diff: the keys present under user, the keys present under request, and the redacted values. [Filtered] on one side and a real email on the other is a compliance issue, not a drift item.
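One way to keep the compliance case from getting buried among ordinary drift is to label it separately. This sketch assumes user and request are top-level dicts on the event and that scrubbed values appear as the literal string [Filtered]; the output format is made up for illustration.

```python
def context_drift(a: dict, b: dict, sections=("user", "request")) -> list:
    """Compare user/request context key by key on two copies of an event."""
    findings = []
    for section in sections:
        ctx_a, ctx_b = a.get(section) or {}, b.get(section) or {}
        for key in sorted(set(ctx_a) | set(ctx_b)):
            va, vb = ctx_a.get(key), ctx_b.get(key)
            if va == vb:
                continue
            if "[Filtered]" in (va, vb):
                # Redacted on one side, real value on the other.
                findings.append(f"COMPLIANCE {section}.{key}: scrubbed on one side only")
            elif key not in ctx_a or key not in ctx_b:
                findings.append(f"{section}.{key}: missing on one side")
            else:
                findings.append(f"{section}.{key}: values differ")
    return findings
```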

The mechanics: dual-DSN, REST pull, structural diff

The setup is a parallel run where the SDK fans out to both backends, then a diff job pulls the stored events from each side and compares them.

Fan-out has two shapes. The first is a dual-DSN init in the SDK, where you call Sentry.init with the second DSN under a separate Hub. The second is an OTel Collector between the SDK and the backends, with a pipeline that lists two exporters so the same OTLP stream goes to two Sentry-compatible OTLP ingest endpoints. The Collector fan-out is the cleaner of the two because the SDK does not know there is a second backend; both sides receive the exact same event from the same transport.

The REST pull uses an endpoint both sides expose:

curl -H "Authorization: Bearer $SENTRY_TOKEN" \
  "https://sentry.io/api/0/projects/$ORG/$PROJECT/events/?statsPeriod=24h" \
  > sentry-events.json

curl -H "Authorization: Bearer $URGENTRY_TOKEN" \
  "https://errors.yourdomain.com/api/0/projects/$ORG/$PROJECT/events/?statsPeriod=24h" \
  > urgentry-events.json

Both responses are arrays of event summaries. Pair them by the SDK-generated event_id: the SDK sets the same UUID on both fan-out copies, so the same event resolves to the same ID on both backends. From there, a small jq script extracts the summary-level fields (event ID, fingerprint, title, release tag) and a diff tool surfaces mismatches:

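# One TSV row per event: id, fingerprint, title, release.
# join() keeps @tsv from failing on the fingerprint array; the [...][0] // ""
# form keeps events with no release tag in the output instead of dropping them.
FILTER='.[] | [.eventID, ((.fingerprint // []) | join(",")), .title,
  ([.tags[]? | select(.key=="release") | .value][0] // "")] | @tsv'

jq -r "$FILTER" sentry-events.json | sort > sentry.tsv
jq -r "$FILTER" urgentry-events.json | sort > urgentry.tsv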

diff sentry.tsv urgentry.tsv | head -50

For the per-event detail (stack frames, breadcrumbs, full user context) you hit the singular event endpoint instead: GET /api/0/projects/{org}/{project}/events/{event_id}/. Pull the worst offenders the summary diff surfaces and inspect them by hand. Fifty events is the right scope for the first pass.
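If you would rather pair the summaries in a script than in sorted TSV files, a sketch like the following works on the two JSON files saved by the curl commands. It assumes each summary carries an eventID field, as the jq filter does; the function names are illustrative.

```python
import json


def load_summaries(path: str) -> dict:
    """Index a saved list-events response by SDK-generated event ID."""
    with open(path, encoding="utf-8") as handle:
        return {event["eventID"]: event for event in json.load(handle)}


def pair_events(sentry: dict, candidate: dict):
    """Return (ids on both sides, ids only on Sentry, ids only on the candidate)."""
    shared = sorted(sentry.keys() & candidate.keys())
    only_sentry = sorted(sentry.keys() - candidate.keys())
    only_candidate = sorted(candidate.keys() - sentry.keys())
    return shared, only_sentry, only_candidate
```

The two "only" lists are a delivery problem rather than a content problem, so it helps to report them separately before diffing the shared events.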

What "match" should mean

Three categories of field are expected to differ. Whitelist them in your diff script before treating anything else as drift.

Backend-assigned identifiers. The event row's primary key, the issue ID, the project's internal ID, the ingest timestamp. Each backend writes these in its own namespace.
Server-ingest tags. server_name, ingest_node, regional tags added by the ingest layer. These describe the ingest server, not the event.
Transport metadata. The User-Agent of the SDK's HTTP client, the X-Sentry-Auth header nonce, anything in the envelope header that is request-level rather than event-level.

Everything else should match for events captured by the same SDK instance and shipped to both backends. If it does not, the diff is signal.
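A structural comparator with a whitelist can be quite small. The sketch below walks two event dicts and returns the paths that differ, skipping whitelisted keys. The key names in IGNORED are assumptions; replace them with whatever your two backends actually emit for the three categories above.

```python
# Assumed key names for backend-assigned ids, ingest timestamps, and ingest tags.
IGNORED = frozenset({"id", "groupID", "projectID", "dateReceived", "server_name", "ingest_node"})


def structural_diff(a, b, path="", ignored=IGNORED) -> list:
    """Return the dotted paths at which two JSON-like values differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = []
        for key in sorted(set(a) | set(b)):
            if key in ignored:
                continue
            child = f"{path}.{key}" if path else key
            if key not in a or key not in b:
                out.append(child)  # present on one side only
            else:
                out.extend(structural_diff(a[key], b[key], child, ignored))
        return out
    if isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        out = []
        for index, (x, y) in enumerate(zip(a, b)):
            out.extend(structural_diff(x, y, f"{path}[{index}]", ignored))
        return out
    return [] if a == b else [path]
```

Note that this ignores dict keys only. Tags stored as a list of key/value objects would need to be flattened into a dict first for the server-ingest whitelist to apply to them.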

What drift reveals

The pattern of drift maps fairly cleanly to its cause.

Fingerprint hash diverges, frame count matches. Grouping config drift. Check the grouping config version on both project settings pages and align them.
In-app frame count differs. In-app pattern list drift. Mirror the project-level pattern list between backends; this is usually a one-time copy.
Breadcrumb count differs. Payload-size limit drift, or the SDK is hitting different transport buffers on the two endpoints. Estimate the per-event payload size by serializing the event in the SDK's beforeSend hook.
Release tag missing on one side. Post-deploy release creation is wired into only one pipeline. Add the equivalent releases new call to your CI step against the second backend.
User context redacted differently. PII scrubbing rules are out of sync. Export the rule list from one side and apply it to the other before cutover.
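The mapping above is mechanical enough to put in the diff job's report. A minimal sketch, assuming your diff step emits a set of short finding labels (the labels here are made up for illustration):

```python
def diagnose(findings: set) -> list:
    """Translate diff findings into the likely causes listed above."""
    causes = []
    # A fingerprint mismatch only points at the grouping config when frames agree.
    if "fingerprint" in findings and "in_app_frames" not in findings:
        causes.append("grouping config drift")
    if "in_app_frames" in findings:
        causes.append("in-app pattern list drift")
    if "breadcrumbs" in findings:
        causes.append("payload-size limit or transport buffer drift")
    if "release" in findings:
        causes.append("release creation wired into one pipeline only")
    if "user" in findings:
        causes.append("PII scrubbing rules out of sync")
    return causes
```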

The point of the diff is not to declare a winner. It is to surface every place where the two backends behave differently and force a decision: align them, or accept the drift and update the runbook.

The four cutover gates

When the diff has been clean for seven consecutive days, four explicit checks decide whether the parallel run becomes a cutover.

Fingerprint hash stability on the top ten issues. The same ten issues by event volume should exist on both backends, each with the same fingerprint hash. If any of them differ, your grouping config is still off.
In-app frame count within one frame on the top fifty issues. Allow a tolerance of one frame to absorb minor SDK-version differences; anything beyond that is a configuration miss.
Release tag presence above 99% of events for the past seven days. The release tag is what your regression detector keys off. Below 99% means a deploy path is not creating the release on the new backend.
Alert-rule parity on a synthetic incident. Trigger a test exception that should match one of your existing PagerDuty or Slack rules and confirm both backends fire the alert with the same payload. This catches alert-rule drift that none of the field-level diffs would find.

Pass all four and the cutover is safe. Fail any of them and the parallel run needs another week.
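The four gates reduce to a handful of comparisons once the inputs are measured. This sketch assumes you have computed those inputs elsewhere; the dataclass and field names are hypothetical, and the thresholds are the ones stated above.

```python
from dataclasses import dataclass


@dataclass
class GateInputs:
    top10_fingerprint_mismatches: int  # issues in the top ten whose hash differs
    max_in_app_frame_delta: int        # worst frame-count gap across the top fifty
    release_tag_ratio: float           # share of events with a release tag, 0..1
    alert_parity: bool                 # synthetic incident fired identically on both


def failed_gates(inputs: GateInputs) -> list:
    """Return the names of the gates that block the cutover."""
    failures = []
    if inputs.top10_fingerprint_mismatches > 0:
        failures.append("fingerprint")
    if inputs.max_in_app_frame_delta > 1:
        failures.append("in_app_frames")
    if inputs.release_tag_ratio < 0.99:
        failures.append("release_tag")
    if not inputs.alert_parity:
        failures.append("alert_parity")
    return failures
```

An empty list means all four gates passed; anything else names what to fix before the next week of the parallel run.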

Frequently asked questions

Why is matching event count not enough during a side-by-side evaluation?

Event counts confirm delivery. They say nothing about whether the same exception ended up under the same issue group, with the same stack frames flagged in-app, the same breadcrumb trail, and the same release tag. A backend can accept ten thousand envelopes a day and still produce a noticeably different issue list because grouping, frame classification, and tag propagation diverge silently. The 200 OK is acceptance, not equivalence.

Which fields should I diff first?

Fingerprint hash, in-app stack frame count, exception type and value, breadcrumb count and final breadcrumb category, release and environment tags, and user and request context. Those six fields cover roughly 95% of the divergence we have seen in real evaluations. Diff them on the top fifty issues by event volume, because that is where drift compounds into business impact.

How do I actually pull events from both backends to compare?

Both Sentry and urgentry expose the same REST endpoints: GET /api/0/projects/{org}/{project}/events/ to list a window of events, and GET /api/0/projects/{org}/{project}/events/{event_id}/ for per-event detail. Authenticate with an organization auth token per side, request matching event windows, and pipe the JSON through a small diff script that whitelists known drift (backend-assigned IDs, ingestion timestamps, server-ingest tags). The schema is stable enough that jq plus a structural comparator gets you 80% of the answer.

What counts as acceptable drift?

Backend-assigned identifiers (the event row's primary key, the issue ID, the project's internal ID) and ingestion timestamps will always differ, because each backend writes them in its own namespace. The SDK-generated event_id is the exception: it is the same on both sides and is what you pair events on. Server-ingest tags (server_name, ingest_node) will differ. SDK transport metadata (User-Agent, X-Sentry-Auth nonce) is request-level and not stored identically. Everything else (fingerprint hash, in-app frame set, breadcrumb shape, user context, release tag) should match for events captured by the same SDK instance fanning out to both backends.

What drift should block a cutover?

Four things should block: fingerprint hash divergence on the top ten issues, in-app frame count off by more than one on identical exceptions, missing release tags on more than 1% of events, and alert-rule mismatches that would page on one backend but not the other. Anything below those thresholds is acceptable drift you can document; anything above them means your urgentry deployment is grouping or routing differently than Sentry, and your runbooks will not transfer cleanly.

Sources

@m13v_ on X (June 1, 2026) — the framing on quiet 200s that motivated this guide: acceptance is not equivalence, and content-level validation is a different problem than delivery validation.
Sentry REST API: list a project's events — the project events endpoint that both Sentry and urgentry expose; the schema reference for the fields the diff script extracts.
Sentry SDK event payload reference — canonical definition of exception, breadcrumbs, user, and tags structures used in the field-by-field diff.
Sentry event grouping documentation — the grouping config versioning and in-app pattern mechanics that explain why fingerprint hash diverges between backends with different config.
OpenTelemetry Collector exporter configuration — the fanout pattern that lets you ship the same event stream to two backends without per-SDK dual-init.
urgentry compatibility matrix — source-scanned audit confirming the events API endpoint shape that makes a single diff script work against both backends.

Diff the events. Then flip the DSN.

urgentry exposes the same REST endpoint shape Sentry does, so the same diff script works against both. 218 API operations covered, single Go binary, SQLite by default. Run a seven-day parallel run, clear the four cutover gates, and the switch is one environment variable.
