Updated May 30, 2026

Tag cardinality in self-hosted error tracking: what blows up your index

Every event a Sentry-compatible SDK ships carries tags. A handful of them are safe forever. A handful more quietly grow with your user base until the search index outweighs the events themselves and queries start timing out at 3 a.m.

TL;DR

20 seconds. Tags are the indexed fields on an error event. Their cost scales with how many distinct values you give them. Low-cardinality tags (environment, release, transaction) are cheap forever. High-cardinality fields (user.id, request_id, session_id) belong on the user object or in context, not in the tag index.

60 seconds. A tag index works by mapping each unique value to the set of events that have it. The map grows with the number of distinct values. environment has three values, ever. user.id has one new value per signup. After eight months of production, the second case can produce a tag table that is bigger than the events table — and a search UI that times out before it returns the first page of results. The fix is not exotic: use setUser instead of setTag('user.id', ...), attach request and trace IDs to context, and treat tag space as a small fixed budget rather than a place to dump useful identifiers. The cost difference shows up in index size, write lock contention on SQLite, autovacuum lag on Postgres, and alert evaluation latency in the rule engine. None of those failure modes prints a clean error message that says "this tag has too many values." You see slow queries first, then late alerts, then a dropped index at 3 a.m.

This guide covers what a tag actually is in Sentry's data model, the rule of thumb that keeps the index small, the concrete cost of getting it wrong on SQLite and Postgres, how to move high-cardinality data off the index without losing searchability, and the three signals that show up before search dies.

What a tag is, and what it is not

A tag in a Sentry-compatible event is a key/value pair of strings that the SDK attaches at capture time, intended for filtering and grouping in the UI. The backend stores tags in a searchable index. That index is the surface the issue list, the search bar, and the alert rule engine all read from.

Tags share an event with three other places you can put data, and the distinction is the whole point of this guide:

  • Tags. Indexed. Filterable from the search bar with tag:value syntax. Cost scales with cardinality.
  • User. The dedicated user field, set with Sentry.setUser({id, email, ...}). Stored on the event, displayed in the event detail header, searchable through the events table but not indexed by default.
  • Contexts. A JSONB blob for structured request, response, OS, runtime, and custom data. Searchable, not indexed.
  • Breadcrumbs. The event's last-N log entries before the exception. Stored as a JSON array on the event. Never indexed.

Code that calls Sentry.setTag('user_id', userId) is fundamentally different from code that calls Sentry.setUser({id: userId}). The first writes to the tag index. The second writes to the user field. They look interchangeable in development; they diverge by gigabytes in production.

Why cardinality is the cost

A tag index is a map from each unique tag value to the set of event IDs that carry that value. The size of the map grows with the number of distinct values, not the number of events. Ten million events with three environments is a tiny index. Ten million events with ten million user IDs is a ten-million-entry map plus the event-ID lists that hang off each entry.
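The shape of that map can be sketched in a few lines. This is a toy in-memory model, not how any particular backend lays out its index; buildTagIndex and the synthetic events are illustrative names and data.

```javascript
// Toy tag index: Map<tag value, Set<event id>> for a single tag key.
function buildTagIndex(events, tagKey) {
  const index = new Map();
  for (const event of events) {
    const value = event.tags[tagKey];
    if (value === undefined) continue;
    if (!index.has(value)) index.set(value, new Set());
    index.get(value).add(event.id);
  }
  return index;
}

// Synthetic data: three environments in rotation, one user ID per event.
const environments = ["prod", "staging", "dev"];
const sampleEvents = Array.from({ length: 10000 }, (_, i) => ({
  id: i,
  tags: { environment: environments[i % 3], user_id: `u${i}` },
}));

const envIndex = buildTagIndex(sampleEvents, "environment");
const userIndex = buildTagIndex(sampleEvents, "user_id");

// Same events, very different number of index entries.
console.log(envIndex.size, userIndex.size); // prints: 3 10000
```

The event count is identical in both cases. Only the number of distinct values, and therefore the number of map entries, changes.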

The cost surfaces in three places:

  • Disk. Index pages on disk for every distinct value, plus the posting list that links values to events.
  • Memory. Hot portions of the index get cached in page cache. A wider index pushes other useful pages out.
  • Query planner. Selectivity statistics drift as cardinality grows. The planner starts choosing sequential scans when it should choose index scans, or vice versa.

The thing that is easy to miss: storage is the smallest of the three. Disk is cheap. What is not cheap is the moment the planner gives up on the index and the issue search page starts taking 30 seconds.

The five-tag rule of thumb

A short list of tags that scale forever, and a short list that do not:

Safe to tag:

  • environment — bounded by your environment count, typically three to five.
  • release — bounded by deploy frequency; you archive old releases.
  • transaction — bounded by route count.
  • server_name — bounded by host count.
  • runtime, browser.name, os.name — bounded by what your users actually run.

Keep out of the tag index:

  • user.id, user.email — grows with user count.
  • request_id, trace_id, correlation_id — one new value per request.
  • session_id — one new value per session.
  • Raw URLs with query strings — query parameters explode the value space.
  • Timestamps as tag values — every event becomes its own bucket.

The dividing line is mechanical. Ask whether the value space grows with users, requests, or events. If yes, it does not belong in tags. If no, tag away.
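One way to make that line mechanical in code is an allowlist applied to the event before it leaves the process. The sketch below is a pure function over an event-shaped object. The allowlist contents, the function name demoteUnboundedTags, and the context name demoted_tags are all assumptions for illustration.

```javascript
// Hypothetical allowlist: tag keys whose value space is bounded.
const BOUNDED_TAG_KEYS = new Set([
  "environment", "release", "transaction", "server_name",
  "runtime", "browser.name", "os.name",
]);

// Returns the event unchanged if every tag is allowlisted. Otherwise returns
// a copy with the other tags moved into a custom context, so the data is
// kept on the event but stays out of the tag index.
function demoteUnboundedTags(event, allowed = BOUNDED_TAG_KEYS) {
  const keptTags = {};
  const demoted = {};
  for (const [key, value] of Object.entries(event.tags ?? {})) {
    if (allowed.has(key)) keptTags[key] = value;
    else demoted[key] = value;
  }
  if (Object.keys(demoted).length === 0) return event;
  return {
    ...event,
    tags: keptTags,
    contexts: {
      ...event.contexts,
      demoted_tags: { ...(event.contexts?.demoted_tags ?? {}), ...demoted },
    },
  };
}

const cleaned = demoteUnboundedTags({
  tags: { environment: "prod", user_id: "u-123" },
  contexts: { os: { name: "linux" } },
});
console.log(cleaned.tags, cleaned.contexts.demoted_tags);
```

In the Sentry JavaScript SDKs, a function like this could be called from the beforeSend hook passed to Sentry.init. Treat that wiring as a suggestion to verify against your SDK version.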

What unbounded tags actually cost

A few measured numbers from production self-hosted deployments:

On a Sentry self-hosted instance running for eight months, an operator on the public issue tracker reported a 240 GB ClickHouse tag table — larger than the events table itself — caused by a service that called setTag('user_id', ...) on every event. Snuba queries that previously returned in roughly 200 ms began timing out at the 30 s SLA boundary. Issue search became unusable before alerting was visibly broken, because alert rules query the same index path.

On urgentry running SQLite on a single $5 VPS, the issue table uses a covering index over (project_id, environment, release, fingerprint). Adding an unbounded tag like user_id to the index roughly doubles the index bytes needed per million events stored. With 10 GB of available disk and a steady load of 5 events/second (about 432,000 events per day), the math runs out in weeks, not months.
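The runway arithmetic can be sketched as follows. The per-event byte figures are assumptions chosen for illustration, not measurements from urgentry; substitute numbers from your own database.

```javascript
const SECONDS_PER_DAY = 86400;

// Days until the disk fills at a steady ingest rate.
// rowBytesPerEvent and indexBytesPerEvent are hypothetical inputs.
function daysUntilDiskFull({ diskBytes, eventsPerSecond, rowBytesPerEvent, indexBytesPerEvent }) {
  const bytesPerDay =
    eventsPerSecond * SECONDS_PER_DAY * (rowBytesPerEvent + indexBytesPerEvent);
  return diskBytes / bytesPerDay;
}

const disk = 10 * 1024 ** 3; // 10 GB of available disk, as in the text

// Assumed footprint: 512 bytes of row data and 512 bytes of index per event.
const boundedTags = daysUntilDiskFull({
  diskBytes: disk, eventsPerSecond: 5, rowBytesPerEvent: 512, indexBytesPerEvent: 512,
});

// Same load with the index footprint doubled by an unbounded tag.
const unboundedTag = daysUntilDiskFull({
  diskBytes: disk, eventsPerSecond: 5, rowBytesPerEvent: 512, indexBytesPerEvent: 1024,
});

console.log(boundedTags.toFixed(1), unboundedTag.toFixed(1));
```

Under these assumed sizes, both results land in the range of weeks, and doubling the index share removes about a third of the runway.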

The failure mode in both cases is not "the database died." It is a sequence: issue search slows down first; alert rule evaluation lags next; the on-call engineer searches for "why are alerts late" and finds the rule engine waiting on tag-index queries that used to take 50 ms and now take 5 seconds. The remediation, in the moment, is almost always to drop the offending index — which removes the slow query and removes search on the field at the same time. Then the tag stops being collected and the lesson sticks.

SQLite versus Postgres: same B-tree, different failure shape

Both backends use B-tree indexes. The interesting differences are operational.

SQLite serializes writes through a single writer lock (or a WAL write lock in WAL mode). Each new tag value extends the index, and the time spent extending the index is time the writer holds the lock. At low cardinality, the extension is rare and the lock window is small. At high cardinality, every other event triggers an index extension, and the lock window stretches enough that incoming SDK requests start queuing in the ingest worker. The symptom on the SDK side is a slow climb in client-side queue depth and eventually backpressure: the SDK drops events because its in-process buffer is full.

Postgres uses MVCC, so a slow write does not block reads. The cost shows up in autovacuum. High-cardinality columns generate the same dead-tuple rate as low-cardinality ones, but the index entries for those dead tuples are scattered across far more index pages, so autovacuum has to read and rewrite more pages to keep up. Once the dead-tuple ratio crosses roughly 20 percent on the tag table, the planner's selectivity statistics start lying, and the planner picks the wrong access path. The symptom is one specific query (usually the issue search filtered by environment) getting an order of magnitude slower while the rest of the system looks healthy.

Neither backend prints "this tag has too many values." Both surface the problem as slow queries on the index path. The diagnostic shape is the same on both: look at index size growth relative to event count, look at the distinct-value counts per tag, look at planner choices on the issue search query.
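The distinct-value check does not need database access to get started. As a rough sketch, the function below counts distinct values per tag key over a sample of event objects, for example a day of exported events. The function name and the sample shape are assumptions.

```javascript
// For each tag key in a sample of events, count distinct values and report
// how close the key is to "one new value per event" (ratio near 1).
function distinctValuesPerTag(events) {
  const seen = new Map(); // tag key -> Set of stringified values
  for (const event of events) {
    for (const [key, value] of Object.entries(event.tags ?? {})) {
      if (!seen.has(key)) seen.set(key, new Set());
      seen.get(key).add(String(value));
    }
  }
  return [...seen.entries()]
    .map(([key, values]) => ({
      key,
      distinct: values.size,
      ratio: values.size / events.length,
    }))
    .sort((a, b) => b.distinct - a.distinct);
}

const report = distinctValuesPerTag([
  { tags: { environment: "prod", user_id: "u1" } },
  { tags: { environment: "prod", user_id: "u2" } },
  { tags: { environment: "dev", user_id: "u3" } },
  {}, // events without tags are counted in the denominator
]);
console.log(report);
```

A key whose ratio stays near 1 as the sample grows is tracking users, requests, or events, and is a candidate to move off the index.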

Moving data off the tag index without losing it

The mistake worth avoiding is treating tag-cardinality discipline as a reason to drop useful identifiers. You can keep user.id on every event. You just do not put it in tags.

The Sentry event model gives you three other places to attach data (user, contexts, and breadcrumbs). The two that fit identifiers, user and contexts, remain queryable:

// Don't: writes to the indexed tag map
Sentry.setTag("user.id", userId);
Sentry.setTag("request_id", requestId);
Sentry.setTag("trace_id", traceId);

// Do: writes to user + context, both stored, neither indexed
Sentry.setUser({ id: userId, email: userEmail });
Sentry.setContext("request", { id: requestId, trace_id: traceId });

The event detail view still shows the user ID, the email, and the request and trace IDs. The search bar still finds them — the query runs against the events table directly instead of through the tag index, which is slower per query but bounded by how often you actually search. For most teams, that is a handful of searches per week per engineer. The tag index, by contrast, is hit on every issue list refresh and every alert rule evaluation.
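A small wrapper can make the "do" path the only path a service uses. This is a sketch, not an SDK feature: attachIdentifiers and makeRecorder are hypothetical names, and the sentry argument is any object exposing the SDK's setUser and setContext methods, passed in so the helper can be exercised without the SDK installed.

```javascript
// Identifiers go to the user field and a "request" context, never to setTag.
function attachIdentifiers(sentry, { userId, userEmail, requestId, traceId }) {
  sentry.setUser({ id: userId, email: userEmail });
  sentry.setContext("request", { id: requestId, trace_id: traceId });
}

// Recording stand-in for the SDK, for illustration only.
function makeRecorder() {
  const calls = [];
  return {
    calls,
    setUser: (user) => calls.push(["setUser", user]),
    setContext: (name, data) => calls.push(["setContext", name, data]),
    setTag: (key, value) => calls.push(["setTag", key, value]),
  };
}

const recorder = makeRecorder();
attachIdentifiers(recorder, {
  userId: "u-123",
  userEmail: "dev@example.com",
  requestId: "req-9",
  traceId: "trace-7",
});
console.log(recorder.calls.map((call) => call[0])); // prints: [ 'setUser', 'setContext' ]
```

In a real service you would pass the imported Sentry namespace instead of the recorder.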

A second pattern that works: pick exactly one high-cardinality identifier you genuinely search by, and accept its index cost as a known budget. Some teams need fast customer_id lookup because their support workflow demands it. Tag that one. Then aggressively keep the rest out.

Detecting cardinality creep before it hurts

Three signals show up before search dies, in roughly this order:

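  1. Index size growing faster than event count. Summing pg_relation_size() over the indexes listed in pg_indexes (or, on SQLite, summing dbstat page sizes for the index entries in sqlite_master, which requires a build with the dbstat virtual table enabled) and dividing by the row count of the events table gives you the ratio of index bytes to events stored. Healthy ratios stay flat over weeks. Climbing ratios mean a tag has gone unbounded.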
  2. Tag-value autocomplete in the UI starts timing out. The autocomplete query is the cheapest read against the tag index — it asks "what values exist for this key, ordered by frequency, limit 20." When that starts taking more than a second, everything else against the same index is already slower.
  3. Alert rule evaluation lag in the daemon logs. Every alert rule queries the tag index on every event. When the rule engine starts emitting "rule eval > 500 ms" lines, alerts will start firing late.
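For the first signal, the check on the collected numbers is simple enough to sketch. The function below assumes you already record index bytes and event count on a schedule, for example weekly. The sample shape, the function name, and the 10 percent tolerance are assumptions, not urgentry or Sentry defaults.

```javascript
// samples: [{ indexBytes, eventCount }, ...] ordered oldest to newest.
// Flags the series as climbing when index bytes per event grew by more than
// `tolerance` (as a fraction) between the first and last sample.
function indexRatioTrend(samples, tolerance = 0.1) {
  if (samples.length < 2) {
    throw new RangeError("need at least two samples to compute a trend");
  }
  const ratios = samples.map((s) => s.indexBytes / s.eventCount);
  const first = ratios[0];
  const last = ratios[ratios.length - 1];
  const growth = (last - first) / first;
  return { ratios, growth, climbing: growth > tolerance };
}

// Flat: the index grows in step with events.
const healthy = indexRatioTrend([
  { indexBytes: 100e6, eventCount: 1e6 },
  { indexBytes: 210e6, eventCount: 2e6 },
]);

// Climbing: index bytes per event doubled over the same period.
const creeping = indexRatioTrend([
  { indexBytes: 100e6, eventCount: 1e6 },
  { indexBytes: 400e6, eventCount: 2e6 },
]);

console.log(healthy.climbing, creeping.climbing); // prints: false true
```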

urgentry surfaces all three on /metrics as Prometheus-format counters and histograms. For Sentry self-hosted, Snuba exports the equivalent through its own metrics endpoint.

What urgentry caps by default

The defaults are deliberately small enough that misuse is loud:

  • Tag values cap at 200 characters. Above that, the value moves to context automatically and emits a tag_truncated counter.
  • Tag keys cap at 32 per event. Above that, the SDK gets a 400 response with a structured rejection reason; the event itself ingests, but the extra keys are stripped.
  • The /metrics endpoint exposes urgentry_tag_distinct_values as a gauge per tag key. If user_id shows up there at 50,000 distinct values, the gauge tells you before issue search tells you.

What urgentry does not do is automatically detect that a tag is unbounded and drop it. The reason is that the tracker cannot tell whether a key with 50,000 distinct values represents a real product axis (a per-customer SaaS that genuinely wants to filter by customer) or a leak (user_id misnamed and dumped into tags). The decision belongs to the operator. The gauge tells you which keys are growing; you choose which ones move to context.
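Reading the gauge can be scripted. The sketch below parses Prometheus text-format output and lists tag keys above a budget. The metric name comes from the text above, but the label name key, the sample payload, and the budget of 5,000 are assumptions; check the real /metrics output for the actual label.

```javascript
// Returns [{ key, distinct }] for every urgentry_tag_distinct_values sample
// whose value exceeds `budget`, largest first.
function tagsOverBudget(metricsText, budget) {
  const sample = /^urgentry_tag_distinct_values\{([^}]*)\}\s+([0-9.eE+-]+)/;
  const keyLabel = /(?:^|,)\s*key="([^"]*)"/; // assumed label name: key
  const over = [];
  for (const rawLine of metricsText.split("\n")) {
    const match = sample.exec(rawLine.trim());
    if (!match) continue; // skips comments, blank lines, and other metrics
    const label = keyLabel.exec(match[1]);
    const value = Number(match[2]);
    if (label && value > budget) over.push({ key: label[1], distinct: value });
  }
  return over.sort((a, b) => b.distinct - a.distinct);
}

// Hypothetical scrape output for illustration.
const scrape = [
  "# TYPE urgentry_tag_distinct_values gauge",
  'urgentry_tag_distinct_values{key="environment"} 3',
  'urgentry_tag_distinct_values{key="release"} 412',
  'urgentry_tag_distinct_values{key="user_id"} 50000',
].join("\n");

console.log(tagsOverBudget(scrape, 5000));
```

The output is a list of candidates, not a verdict. As the paragraph above says, whether a flagged key is a real product axis or a leak is the operator's call.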

The cost discipline that matters

The hidden line items in self-hosted observability — storage retention policy, label cardinality discipline, upgrade time, on-call coverage — are operator-side problems that no vendor docs page enforces for you. Cardinality discipline is the cheapest of the four to get right. It costs one PR to change setTag('user_id', ...) to setUser({id: ...}) across a service, and the index never has to recover from the choice you did not make.

For an error tracker that is going to sit on a single VPS for years, the difference between disciplined tags and undisciplined tags is the difference between a tracker that gets faster as you tune it and one that gets slower as you use it.

Frequently asked questions

What counts as a tag in a Sentry-compatible error tracker?

A tag is an indexed key/value pair the SDK attaches to every event for filtering, grouping, and alerting. It is not user data (that goes on the user field), not request payload (that goes in context), and not breadcrumb log. Tags are the part of an event the backend stores in a searchable index, which is why their cardinality matters.

Why is user.id a bad tag?

Cardinality of user.id grows with your user base, one new value per signup. After a few months of production traffic, the tag index for user.id can outweigh the events themselves. Use Sentry.setUser({id}) instead. The id is still stored, still visible on the event, and still searchable, but it does not pay the index cost.

Does this matter on SQLite or only on Postgres?

Both. SQLite serializes writes; a tag with high cardinality lengthens the write lock window. Postgres uses MVCC, so writes do not block reads, but a high-cardinality index scatters dead tuples across more pages that autovacuum has to chase. The failure shapes differ, the underlying cost does not.

Can I still search by user.id if I move it off the tag index?

Yes. Setting it on the user object or in context keeps it queryable through the events table directly. The query is slower than a tag-index lookup, but it is bounded by how often you actually need that search, which for most teams is a few times a week, not a few times a second.

What is a safe number of distinct values per tag?

A useful rule: a single tag should hold at most a few thousand distinct values across the time window you retain events. Three values (prod, staging, dev) is forever-cheap. A few hundred releases is fine. A few thousand routes is fine. Anything growing with user count, session count, or request count belongs in context, not in a tag.

Sources

  1. Sentry searchable properties — the canonical reference for which event fields are searchable, how tag values are indexed, and the syntax the search bar uses.
  2. Sentry SDK tags documentation — SDK-level guidance on what to attach as a tag, the distinction between tags and contexts, and the per-key value-length limits.
  3. Prometheus label cardinality guidance — the most-cited reference for label cardinality discipline in the wider observability ecosystem; the principles map directly to tag-index cost in an error tracker.
  4. SQLite query planner — how SQLite decides between index and sequential scans, including the selectivity estimates that drift as cardinality grows.
  5. Postgres index maintenance — the Postgres wiki page on index bloat, autovacuum behavior, and planner statistics that go stale on high-cardinality columns.
  6. @cloud_autopsies on hidden self-hosted observability costs — the X post that surfaced "label cardinality discipline" as one of four hidden line items operators end up paying for self-hosted observability stacks.

A tracker that gets faster as you tune it.

urgentry runs Sentry-SDK ingest on a single Go binary, SQLite by default, Postgres optional. Tag-cardinality gauges on the /metrics endpoint. 218 Sentry API operations covered. Change one environment variable and events start arriving.