PII scrubbing for error events: edge, ingest, and storage
Error events are one of the easiest places to leak personal data. There are three places to catch it before it becomes a compliance problem: at the edge with a proxy such as Sentry Relay or the OpenTelemetry Collector, at ingest inside the backend, or at storage with the backend’s own rules. Each one has a different blind spot.
20 seconds. An error event quietly captures everything the SDK saw at the moment of failure: the request URL, the headers, the breadcrumb trail, the local variables. A surprising amount of that is PII. You can strip it in three places, and only one of them keeps the unredacted value from ever reaching the backend.
60 seconds. Edge scrubbing means the SDK ships the event to a proxy you control — Sentry Relay for native envelopes, the OpenTelemetry Collector for OTLP — and the proxy rewrites the payload before forwarding to the backend. The unredacted value never reaches the error tracker. Ingest-time scrubbing happens inside the backend itself, after the event lands but before it goes to disk; faster to deploy, but the in-memory payload still contained the raw value for a few milliseconds and any debug log of that ingest path is a problem. Storage-level rules redact what the UI shows but do not change what is on disk. For GDPR teams, only edge scrubbing produces a defensible answer.
This guide covers where PII enters an error event in the first place, the three scrubbing layers and where each one fits, a worked Sentry Relay and OTel Collector config, the four PII patterns that survive the first pass, and where urgentry’s built-in scrub rules pick up the rest.
Where PII sneaks into an error event
Most teams expect to see PII in custom user context they set explicitly. That part is easy to govern, because someone wrote the code that sets it. The volume problem is everything the SDK captures automatically without anyone asking.
The high-volume sources, in rough order of how often they surprise people:
- Request URLs with email or token query parameters. An unsubscribe link with ?email=alice@example.com&token=... in the query string is now in every breadcrumb that touched the request.
- Form POST bodies. If the SDK captures request bodies on 4xx and 5xx responses, every failed login attempt now has a password in the event.
- Exception messages that interpolate user input. ValidationError: email "alice@example.com" already in use is a verbatim PII string in the issue title.
- Authorization headers. A misconfigured integration sometimes captures the full Authorization: Bearer ... header. The SDK defaults try to catch this, and they miss when the header has a non-standard name.
- Local variables in stack frames. If you enable send_default_pii or its equivalent and capture locals, every variable in scope when the exception fired ships with the event.
The pattern is the same in every case: the SDK is doing what it was told to do, and the developer who set up the SDK three releases ago did not predict the variable name cardholder_email would end up in scope at the moment a payment processor times out. Default-safe is a property of the ingest path, not the SDK config.
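One way to act on the URL and header cases above is a hook that runs before the event leaves the process. The sketch below is shaped like the before_send(event, hint) callback that the Sentry SDKs accept, but it is a plain function over a dict so it runs without the SDK installed. The names in SENSITIVE_PARAMS and SENSITIVE_HEADERS and the [Filtered] placeholder are assumptions for illustration, not SDK defaults.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical deny-lists; extend them for your own parameter and header names.
SENSITIVE_PARAMS = {"email", "token", "password"}
SENSITIVE_HEADERS = {"authorization", "cookie", "x-api-key"}
PLACEHOLDER = "[Filtered]"


def redact_url(url: str) -> str:
    """Replace the values of sensitive query parameters, keeping the keys."""
    parts = urlsplit(url)
    pairs = [
        (key, PLACEHOLDER if key.lower() in SENSITIVE_PARAMS else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    # safe="[]" keeps the placeholder readable instead of percent-encoding it.
    return urlunsplit(parts._replace(query=urlencode(pairs, safe="[]")))


def before_send(event: dict, hint: dict) -> dict:
    """Scrub the request URL and headers of an event dict in place."""
    request = event.get("request", {})
    if "url" in request:
        request["url"] = redact_url(request["url"])
    headers = request.get("headers", {})
    for name in headers:
        if name.lower() in SENSITIVE_HEADERS:
            headers[name] = PLACEHOLDER
    return event
```

A client-side hook like this only covers the fields it knows about, so it complements a scrubbing layer on the ingest path and does not replace one.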
The three scrubbing layers
Where you do the scrubbing changes what you are protecting against. The trade-off is between deployment cost and what the answer looks like when someone files a data-subject-access request.
- Edge. A proxy in your network sees the event, rewrites it, and forwards a redacted copy. The error backend never sees the raw value. Costs you one extra hop and an extra service to operate. Sentry Relay and the OpenTelemetry Collector both live here.
- Ingest. The backend itself runs scrub rules on the way in, before the event lands in the events table. Cheaper to operate, because there is no second service. The raw value still existed in the backend process memory and any debug log.
- Storage. The backend stores the raw event and the UI applies redaction rules at read time, or a background job rewrites stored rows. Easiest to bolt on. Pretty much useless for compliance, because the row on disk is unchanged and so are the backups.
A team with one product and one region can get away with ingest scrubbing for a while. A team with regulated customers, an EU footprint, or an active legal team will end up at the edge eventually. Picking edge from the start saves the migration.
Edge scrubbing with Sentry Relay
Sentry Relay is an open-source Rust proxy that speaks the Sentry envelope format. It was built so enterprise customers could run a scrubbing layer inside their VPC before events leave for Sentry SaaS. The same binary works in front of any Sentry-compatible backend, including a self-hosted urgentry.
A minimal Relay config that strips email addresses and the password field anywhere they appear:
relay:
  upstream: "https://errors.yourdomain.com/"
  host: 0.0.0.0
  port: 3000
processing:
  enabled: true
pii_config:
  rules:
    strip-email:
      type: pattern
      pattern: '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
      redaction: { method: replace, text: '[email]' }
    strip-password:
      type: redact_pair
      keyPattern: '(?i)password|token|secret'
  applications:
    '$string': [strip-email]
    '$object.**': [strip-password]
Point your SDK DSN at the Relay host instead of the urgentry host, and Relay forwards the scrubbed envelope onward. The SDK does not need to know Relay is there. The application code does not change.
Two practical gotchas. First, Relay needs the same TLS cert story as your backend: if Relay is internal-only and forwards over the public internet to urgentry, the trust chain is whatever you set up. Second, Relay’s pii_config runs on the same rule engine Sentry uses in its cloud product, so the rule syntax is portable: a config that works against Sentry SaaS works against urgentry without changes.
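To sanity-check rules like these before deploying them, it can help to emulate them locally. The sketch below is not Relay's engine. It is a small Python approximation of the two rules above (a pattern rule applied to every string, and a key-based rule applied to every object key) using the same regexes. The [Filtered] placeholder for the key-based rule and the function name scrub are assumptions made for illustration.

```python
import re

EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
SENSITIVE_KEY = re.compile(r"(?i)password|token|secret")


def scrub(value, key=""):
    """Recursively scrub a JSON-like value.

    Values under keys matching SENSITIVE_KEY are replaced outright;
    email-shaped substrings in any other string become '[email]'.
    """
    if key and SENSITIVE_KEY.search(key):
        return "[Filtered]"
    if isinstance(value, dict):
        return {k: scrub(v, k) for k, v in value.items()}
    if isinstance(value, list):
        return [scrub(item) for item in value]
    if isinstance(value, str):
        return EMAIL.sub("[email]", value)
    return value


event = {
    "message": 'ValidationError: email "alice@example.com" already in use',
    "request": {"data": {"username": "alice", "password": "hunter2"}},
    "breadcrumbs": [{"message": "POST /login for bob@example.org"}],
}
print(scrub(event))
```

Running sample events through an emulation like this gives you a list of expected outputs to compare against what actually arrives in the backend once Relay is in the path.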
Ingest-time scrubbing in the OpenTelemetry Collector
For teams sending OTLP instead of Sentry envelopes, the OTel Collector is the equivalent layer. Two processors do the work: attributes handles known keys, redaction handles unknown ones by pattern.
A Collector pipeline that strips auth headers and any attribute matching an email pattern:
processors:
  attributes/strip-auth:
    actions:
      - key: http.request.header.authorization
        action: delete
      - key: http.request.header.cookie
        action: delete
  redaction/pii:
    allow_all_keys: true
    blocked_values:
      - '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
      - '\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b'
    summary: debug
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [attributes/strip-auth, redaction/pii, batch]
      exporters: [otlphttp/urgentry]
The redaction processor is the one teams underuse. The default config blocks nothing. You have to enumerate the patterns you care about, and the regex engine is RE2, which means no lookahead. For most PII shapes that is fine. For the awkward shapes — say, an internal customer-ID format that resembles a normal alphanumeric token — you end up with a custom processor or a transform script.
One Collector instance can sit in front of multiple urgentry deployments. The same processors can also be attached to an OTLP logs pipeline, which matters because logs are where unstructured user input usually shows up.
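The two blocked-value patterns are worth exercising against sample attribute values before they go into a pipeline. The sketch below mimics the shape of the pipeline above on a flat attribute dict: delete known keys first, then mask anything that matches a blocked pattern. It is a Python approximation, not the Collector's Go implementation, and the **** mask and the scrub_attributes name are assumptions. Neither pattern uses lookahead, so both are expressible in RE2; re.ASCII is set to keep \d, \s, and \b ASCII-only, which is closer to RE2's defaults than Python's Unicode-aware behavior.

```python
import re

DELETE_KEYS = {"http.request.header.authorization", "http.request.header.cookie"}
BLOCKED_VALUES = [
    re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}", re.ASCII),
    re.compile(r"\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b", re.ASCII),
]


def scrub_attributes(attributes: dict) -> dict:
    """Drop known-sensitive keys, then mask pattern matches in string values."""
    cleaned = {}
    for key, value in attributes.items():
        if key in DELETE_KEYS:
            continue
        if isinstance(value, str):
            for pattern in BLOCKED_VALUES:
                value = pattern.sub("****", value)
        cleaned[key] = value
    return cleaned


span_attributes = {
    "http.request.header.authorization": "Bearer abc.def.ghi",
    "user.contact": "alice@example.com",
    "payment.note": "card 4111 1111 1111 1111 declined",
    "http.response.status_code": 500,
}
print(scrub_attributes(span_attributes))
```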
Storage-level rules and their limits
Every Sentry-compatible backend, urgentry included, ships some flavor of data-scrubbing rules that run at ingest. urgentry’s default rule set handles the common cases: credit card numbers, US Social Security numbers, common auth header names, anything that looks like a JWT. You can extend the rules per-project.
The honest framing of what storage-level rules buy you. They keep raw values out of the UI, out of search indexes, and out of the alert messages that go to Slack. That covers the operational risk where an on-call engineer scrolling through an issue sees something they should not see. It does not cover the audit risk where a regulator asks where the raw value was stored.
If your only requirement is reducing accidental exposure to your own team, storage scrubbing is enough and it is cheap. If a contract or a regulator is involved, you need the value to be gone before it arrives. That is what the edge layer is for.
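The gap between what the UI shows and what the disk holds is easy to demonstrate. The sketch below uses an in-memory SQLite table as a stand-in for an events store. The table name events and the mask_for_display helper are hypothetical, not urgentry's schema. Redaction is applied only when a row is read for display, and the stored row still carries the raw value.

```python
import re
import sqlite3

EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


def mask_for_display(message: str) -> str:
    """Read-time redaction: changes what is shown, not what is stored."""
    return EMAIL.sub("[email]", message)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, message TEXT)")
conn.execute(
    "INSERT INTO events (message) VALUES (?)",
    ('email "alice@example.com" already in use',),
)

(stored,) = conn.execute("SELECT message FROM events WHERE id = 1").fetchone()
shown = mask_for_display(stored)

print(shown)   # what an on-call engineer sees in the UI
print(stored)  # what a backup or replica of the table contains
```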
Data residency: keeping events in-region
Scrubbing solves the question of what is in the event. Data residency solves the question of where it sits. They are different problems with overlapping tools.
The cleanest pattern is one urgentry deployment per region, fronted by a routing layer that picks the destination by the originating account or service. The DSN per service points at the regional ingest host. Events from EU customers go to the EU deployment and never traverse a US network path. The proof is the network diagram: the data plane is regional, full stop.
The harder pattern is one deployment with attribute-based routing inside the Collector. The Collector’s routing processor can send events to different exporters based on a resource attribute, which means one cluster can handle multiple regions if you trust the attribute. Operationally simpler, audit-wise more work, because every change to the schema is a change to your residency claims.
Teams whose customer base spans the EU and the US usually start with the second pattern and migrate to the first when the first EU customer asks for an architecture diagram.
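The per-region pattern reduces to a lookup that fails closed. A minimal sketch, assuming hypothetical ingest hostnames and a region value supplied by your own account metadata:

```python
# Hypothetical regional ingest hosts; substitute your own deployments.
REGIONAL_INGEST = {
    "eu": "https://errors.eu.example.com",
    "us": "https://errors.us.example.com",
}


def ingest_host_for(region: str) -> str:
    """Return the ingest host for a region, refusing unknown regions.

    Raising instead of falling back to a default keeps an unlabeled
    event from silently crossing a regional boundary.
    """
    try:
        return REGIONAL_INGEST[region.lower()]
    except KeyError:
        raise ValueError(f"no ingest host configured for region {region!r}") from None
```

The design choice worth noting is the missing default: a fallback host would be convenient, but it would also turn a mislabeled EU event into a residency violation without anyone noticing.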
Four PII patterns that survive the first pass
The default scrub rules across every tool — Sentry Relay, the Collector, urgentry — catch the obvious shapes. The interesting failures are the shapes that look almost like the obvious ones but slip the regex.
- Email addresses with plus tags. alice+marketing@example.com matches the default email regex. alice+test@example.com. with a trailing period can fail to match when the rule is applied to the whole value rather than searched within it, because the final period is not part of the pattern. Real email addresses in logs sometimes have trailing punctuation.
- Phone numbers in unfamiliar formats. US-format scrubbers match (555) 123-4567 and miss +44 20 7946 0958. The fix is per-region patterns, not one global regex.
- Internal IDs that double as customer identifiers. A customer_id field looks like a harmless integer until someone realizes it is the foreign key your support team uses to find people. The default rules will never catch this; you have to add it.
- Free-text in exception messages. raise ValueError(f"Invalid input: {form_data}") ships the entire form payload as a string. No regex catches everything because the shape is whatever the user typed. The fix is upstream: stop interpolating user input into exception messages.
The first three are config problems. The fourth is a code review problem, and it is the one that produces the worst incidents.
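The first two patterns translate directly into test cases. The sketch below checks the email regex used earlier in search mode and in whole-value mode, and compares a US-format phone pattern with a deliberately loose international one. Both phone patterns are illustrative assumptions, not any tool's defaults.

```python
import re

EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
# Illustrative patterns only; real deployments need per-region rules.
US_PHONE = re.compile(r"\(\d{3}\)\s?\d{3}-\d{4}")
INTL_PHONE = re.compile(r"\+\d{1,3}(?:[\s-]?\d{2,4}){2,4}")

# Plus tags are covered by the character class.
assert EMAIL.search("alice+marketing@example.com")

# Trailing punctuation: a substring search still finds the address,
# but a whole-value match fails because of the final period.
assert EMAIL.search("alice+test@example.com.").group() == "alice+test@example.com"
assert EMAIL.fullmatch("alice+test@example.com.") is None

# A US-only pattern misses a UK number; the international pattern catches it.
assert US_PHONE.search("(555) 123-4567")
assert US_PHONE.search("+44 20 7946 0958") is None
assert INTL_PHONE.search("+44 20 7946 0958")
```

Whether a given tool searches within a value or matches the whole value is worth confirming against its documentation, since that one difference decides whether trailing punctuation defeats the rule.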
Where urgentry sits
urgentry handles the storage layer with the same scrub rule engine Sentry uses, which means existing data-scrubbing rules port across without rewriting. The default project ships with the common PII patterns enabled.
For edge scrubbing, urgentry is a drop-in for Sentry SaaS as the upstream of a Sentry Relay deployment. Set Relay’s upstream to your urgentry host and the Relay config you already have keeps working. For OTLP traffic, the standard OTel Collector pipeline points at urgentry’s OTLP receiver and the redaction processor runs in front the same way it would in front of any OTLP backend.
The recommendation that holds across every tool we have looked at: put the scrubbing as close to the source as you can afford to operate, and write down the four PII patterns above as the test cases for whatever config you end up with.
Frequently asked questions
What counts as PII in an error event?
Anything that can identify a person on its own or in combination with other data: email addresses, phone numbers, IP addresses, full names, government IDs, session tokens, auth headers, and request bodies that echo form input. In practice the high-volume sources are request URLs with email query parameters, breadcrumb logs from form submissions, and exception messages that interpolate user input.
Why scrub at the edge instead of in the database?
Because once an event lands in storage, you have already created the compliance problem. Edge scrubbing means the unredacted value never leaves your VPC. Storage-level rules can mask the value in the UI, but the row on disk still contains it, your backups still contain it, and your replicas still contain it. Edge scrubbing is the only place where the answer to a data-subject-access request is a clean no.
Does Sentry Relay work with urgentry?
Yes. Sentry Relay is a generic envelope proxy. Point its upstream URL at your urgentry instance and it will forward scrubbed envelopes the same way it forwards to Sentry SaaS. The data-scrubbing config is independent of the destination.
Can the OpenTelemetry Collector replace Sentry Relay?
For OTLP traffic, yes — the attributes processor and redaction processor handle the same job. For native Sentry SDK envelopes you still need Relay, because the Collector does not parse the Sentry envelope format. Teams running both wire formats in production usually run both proxies side by side.
Do I need a region-specific deployment for GDPR?
Only if you want a clean answer about where the events sit. GDPR does not require EU-only storage, but it does require that you can answer the question. The cleanest pattern is one urgentry deployment per region with a routing layer in front. The harder pattern is one global deployment with attribute-based filtering, which works but creates audit work every time the schema changes.
Sources
- Sentry Relay documentation — configuration reference for the Relay PII processor, including the pii_config rule grammar used in the worked example.
- OpenTelemetry Collector redaction processor — source and config docs for the contrib processor that does pattern-based redaction inside the Collector pipeline.
- OpenTelemetry attributes processor — canonical reference for delete, hash, and update actions on known attribute keys.
- OneUptime: scrubbing PII from OpenTelemetry logs, traces, metrics — February 2026 walk-through of Collector-side scrubbing patterns referenced in the ingest section.
- Sentry GDPR best practices — the vendor’s guidance on data minimization, scrubbing, and the data-subject-access workflow, useful as a reference for what a defensible setup looks like.
- SigNoz PII scrubbing guide — an alternate take on Collector-level redaction for teams already running SigNoz alongside an error tracker.