Guide Self-hosting ~7 min read Updated April 11, 2026

Retention strategy: how long to keep error events.

Error event retention is a storage problem, a query speed problem, and occasionally a compliance problem, all at once. Set it too high and your database grows without bound. Set it too low and you lose the evidence you need to diagnose a regression that surfaced last month. This guide puts concrete numbers on the tradeoffs and describes the two-tier pattern that keeps costs flat without losing signal.

TL;DR

20 seconds. Keep 30 days of full event detail for triage and regression work. Keep aggregate counts (events per release, daily error rates per issue) forever, because they cost almost nothing. Delete attachments first, breadcrumbs second, raw events third. Never delete issue groups. On urgentry with SQLite, set URGENTRY_EVENT_RETENTION_DAYS=30 and let the nightly cleanup job handle the rest.

60 seconds. A minimal error event with a stack trace runs 2–8 KB in SQLite. A full payload with 50 breadcrumbs and request headers runs 8–20 KB. Attachments are the outlier: one 2 MB screenshot costs more than a hundred minimal events. At 400 events per second under load, urgentry generates roughly 34 GB per day of raw payload data before any compression. At a 30-day retention window, the steady-state database size for most small teams is 1–5 GB. At 90 days it climbs to 3–15 GB. Source maps are additive, not rotating, and need their own retention setting.

Sentry’s 90-day default is a product compromise between “long enough to investigate a slow regression” and “short enough to keep SaaS storage costs manageable.” Self-hosting removes the SaaS cost pressure but adds the disk cost pressure to your VPS bill. The right number for your deployment depends on your event volume, your investigation workflow, and whether any compliance regime applies to the data you collect.

What retention actually decides

Retention is a number, but it governs three things at once, and they pull in different directions.

Storage cost. Every event you keep costs disk space. SQLite rows are small, but they accumulate. A service emitting 50 errors per second generates 3,000 events per minute, or about 4.3 million per day, each 5–10 KB in the database. That is roughly 22–43 GB per day of raw growth before compression and before the cleanup job runs. At 30-day retention, the database reaches a steady state. Without retention, it grows until the disk fills and the process dies with an I/O error at 2am.

Query speed. On both SQLite and Postgres, the event tables are indexed by timestamp and issue group. A table with 90 days of events at moderate volume holds tens of millions of rows. Range queries across that table for issue trend charts or search results get slower as the row count grows. Retention is a performance control, not just a storage control. Deleting old events keeps query latency predictable.

What you can investigate next quarter. When a customer files a report about a bug that started last month, you want the events from last month. When a postmortem asks what error rate looked like in the week before a deploy, you want the data. Retention sets a hard limit on how far back you can investigate. Once events age past the retention window and are deleted, they are gone. There is no replay.

These three forces explain why retention is not an obvious setting. Storage cost favors a shorter window. Investigative value favors a longer one. Query speed roughly tracks storage: shorter is faster. The right answer balances all three.

The three windows you actually need

In practice, how teams use error data falls into three time windows, each with a distinct purpose.

Last 7 days: triage. This is where active error work happens. An engineer opens an issue, looks at the most recent events, reads the stack traces, checks the breadcrumbs, and either fixes the bug or marks it a known issue. Nearly all of this happens within the last 24–48 hours of events. The 7-day window is a buffer for events that arrive late, for timezone-shifted teams, and for issues that spike on weekends.

Last 30 days: regression detection. A release goes out on the first of the month. On the fifteenth, the error rate for a specific exception climbs. An engineer compares error counts across releases, spots the regression, and bisects the deploy history. This pattern requires two to four weeks of event data: recent enough to show the trend, old enough to show the baseline before the regression started.

Last quarter: trend reading. Planning meetings, postmortems, and reliability reviews ask questions about error rates over 8–12 weeks. Which services degraded? Where did error volume grow? These questions do not require individual event payloads. They require aggregate counts per issue, per release, per day. The full event payload from 70 days ago adds almost nothing to a trend chart. The aggregate count from that day does.

This breakdown suggests a practical answer: keep full event payloads for 30 days, keep aggregate data forever. The storage cost of aggregate data is trivial compared to raw events.

How much storage one error costs

The numbers here are for SQLite, urgentry’s default. Postgres numbers are similar at the row level; the difference shows up in page overhead and index size at high row counts.

A minimal event contains a stack trace, an exception type, and a message. No breadcrumbs, no request context, no user data. In SQLite, that row runs 2–8 KB including index overhead.

A typical event from a web service contains 20–50 breadcrumbs (console logs, network requests, UI interactions captured by the SDK), request headers, query parameters, and a user context object. That payload runs 8–20 KB per event in the database.

A rich event from a mobile or desktop app adds OS context, device fingerprint, GPU info, and custom tags. Exclude attachments and you are still at 15–30 KB per event.

Attachments are the outlier. The Sentry SDK supports attaching arbitrary files to events: screenshots, log files, heap dumps, minidumps. A single 2 MB screenshot costs more storage than 100 to 1,000 minimal events. A 10 MB heap dump costs more than a full day of typical web service events at low volume. Attachments need separate retention rules. Treating them like regular events is the most common cause of runaway disk usage.

Source maps are additive. Source maps are uploaded per release and stored separately from the event database. They do not rotate with events. A team shipping five JS bundles per week at 8 MB each accumulates 2 GB of source maps per year without a pruning policy. The event database can be perfectly sized while the source map directory grows without bound.

Put numbers together for a medium-traffic service: 200 events per minute, typical payloads at 12 KB each. That is 2.4 MB per minute, about 3.5 GB per day, 24 GB per week. At 30-day retention with the nightly cleanup job running, the steady-state database sits near 100 GB for this service. At 90 days, it reaches roughly 300 GB. For a $5 VPS with 25 GB of SSD, this volume fills the disk in about a week, well before a 30-day window deletes anything. Retention is not a background concern; it determines whether the deployment fits the hardware.
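The arithmetic above is easy to rerun for your own volume. The following is a minimal sketch, assuming decimal units (1 GB = 1,000,000 KB) and a flat average payload size; the function names are illustrative and are not part of urgentry.

```python
# Back-of-envelope sizing with decimal units (1 GB = 1,000,000 KB).
# Function and parameter names are illustrative, not urgentry settings.

MINUTES_PER_DAY = 24 * 60
KB_PER_GB = 1_000_000


def daily_growth_gb(events_per_minute, kb_per_event):
    """Raw database growth per day, before compression or cleanup."""
    return events_per_minute * kb_per_event * MINUTES_PER_DAY / KB_PER_GB


def steady_state_gb(events_per_minute, kb_per_event, retention_days):
    """Approximate size once nightly deletions balance ingest."""
    return daily_growth_gb(events_per_minute, kb_per_event) * retention_days


def days_until_full(disk_gb, events_per_minute, kb_per_event):
    """Days of ingest an empty disk can absorb if nothing is deleted."""
    return disk_gb / daily_growth_gb(events_per_minute, kb_per_event)


if __name__ == "__main__":
    # The medium-traffic example: 200 events per minute at 12 KB each.
    print(f"growth per day:      {daily_growth_gb(200, 12):.2f} GB")
    print(f"30-day steady state: {steady_state_gb(200, 12, 30):.0f} GB")
    print(f"90-day steady state: {steady_state_gb(200, 12, 90):.0f} GB")
    print(f"25 GB disk lasts:    {days_until_full(25, 200, 12):.1f} days")
```

The estimate ignores index growth, free pages awaiting a vacuum, attachments, and source maps, so treat the result as a lower bound when sizing a disk.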

Sentry’s default 90 days and why it’s a compromise

Sentry SaaS ships with a 90-day retention default on developer and team plans. The number is not derived from a study of how teams use error data. It is a product decision that balances two pressures.

On one side: longer retention improves perceived value. A user who opens Sentry and sees three months of error history feels they have data. A user who sees seven days of error history feels the tool is limited. SaaS retention length is partly a product marketing variable.

On the other side: Sentry’s multi-tenant ClickHouse and Snuba infrastructure has real storage costs per gigabyte per tenant. The 90-day default is the point where the perceived value gain of longer retention stops justifying the storage cost increase. Higher-tier plans extend the window; enterprise plans can push it further.

For Sentry self-hosted, the SaaS cost pressure disappears but the disk cost pressure arrives on your VPS bill. Teams who clone the Sentry self-hosted default of 90 days without calculating their event volume often hit disk space issues within months. The ClickHouse data directory grows at 20–50 GB per month under moderate ingest. By month six, a team that planned for 200 GB of disk is looking at a full volume, a paused Kafka ingest queue, and a crash.

The 90-day default is fine if your event volume is low enough that the storage fits your hardware. It is a silent mistake if your volume is higher. Calculate your steady-state size before picking a number.

urgentry’s retention model today

urgentry ships with a configurable retention policy controlled by a single environment variable. The default is 90 days, matching Sentry’s developer plan default. Set it lower with:

URGENTRY_EVENT_RETENTION_DAYS=30

The cleanup job runs nightly at 02:00 UTC by default (configurable). It hard-deletes event rows older than the retention window, then runs a VACUUM on the SQLite database to return freed pages to the OS. Without the vacuum step, SQLite holds the freed pages in its internal free list; the file size does not shrink until a vacuum runs.
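The delete-then-vacuum sequence can be reproduced with Python’s standard sqlite3 module. This is a sketch of the technique, not urgentry’s cleanup code: the events table, its columns, and the run_cleanup and demo helpers are hypothetical.

```python
import os
import sqlite3
import tempfile
import time

# Hypothetical schema for illustration; urgentry's real tables may differ.
SCHEMA = """
CREATE TABLE events (
    id       INTEGER PRIMARY KEY,
    issue_id INTEGER NOT NULL,
    ts       INTEGER NOT NULL,  -- Unix seconds
    payload  BLOB    NOT NULL
);
CREATE INDEX idx_events_ts ON events (ts);
"""


def run_cleanup(db_path, retention_days, now=None):
    """Hard-delete events older than the window, then VACUUM.

    Returns the number of rows deleted.
    """
    now = time.time() if now is None else now
    cutoff = int(now) - retention_days * 86_400
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.execute("DELETE FROM events WHERE ts < ?", (cutoff,))
        deleted = cur.rowcount
        conn.commit()           # VACUUM cannot run inside an open transaction
        conn.execute("VACUUM")  # rebuild the file so freed pages leave it
        return deleted
    finally:
        conn.close()


def demo():
    """Fill a throwaway database, clean it up, and report what changed."""
    now = int(time.time())
    old, fresh = now - 45 * 86_400, now - 86_400
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "events.db")
        conn = sqlite3.connect(path)
        conn.executescript(SCHEMA)
        # Half the rows are 45 days old, half are one day old.
        rows = [(1, old if i % 2 else fresh, os.urandom(4096)) for i in range(1000)]
        conn.executemany(
            "INSERT INTO events (issue_id, ts, payload) VALUES (?, ?, ?)", rows
        )
        conn.commit()
        conn.close()

        size_before = os.path.getsize(path)
        deleted = run_cleanup(path, retention_days=30, now=now)
        size_after = os.path.getsize(path)
    return deleted, size_before, size_after


if __name__ == "__main__":
    deleted, before, after = demo()
    print(f"deleted {deleted} rows; file went from {before} to {after} bytes")
```

VACUUM rewrites the whole database into a temporary copy, so it needs free disk space roughly equal to the size of the database after cleanup. That is worth remembering when the disk is already nearly full.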

Issue groups are not deleted by the retention cleanup. The group record (title, first-seen, last-seen, event count, status, assignments) survives event deletion. Event counts in the group record are recalculated after each cleanup run. You retain the shape of your error history even after the individual events are gone.

Source maps have a separate retention setting. Use URGENTRY_SOURCEMAP_RETENTION_DAYS to set a different window. A 180-day or 365-day source map window is often the right choice: source maps are smaller than events in aggregate and support symbolication of events at the outer edge of the event retention window.

Attachments have a separate setting. URGENTRY_ATTACHMENT_RETENTION_DAYS defaults to 30 days regardless of the event retention window. Set it lower (14 days or 7 days) if attachment volume is a concern.

For Postgres deployments, the cleanup job issues DELETE FROM events WHERE timestamp < $cutoff followed by a targeted VACUUM events. On large tables, this can take several minutes. Schedule the cleanup job during off-peak hours and monitor the pg_stat_user_tables view for n_dead_tup accumulation between runs.

The pattern that works

The two-tier approach separates event payload retention from aggregate retention. It keeps investigative capability intact at lower storage cost than a single long retention window.

Tier 1: high-resolution events, 30 days. Full event payloads, stack traces, breadcrumbs, request context, user data. This covers all active triage and the regression detection window. After 30 days, these rows are deleted.

Tier 2: aggregate counts, indefinite. Issue groups persist forever with their event count history. urgentry maintains daily rollup counts per issue and per release. These rollups cost almost nothing: a row per issue per day, roughly 100 bytes. Even at 10,000 active issues tracked daily for five years, the rollup table is under 2 GB.

What this means in practice: you can chart error rates for any issue across any time range, forever. You can see which releases introduced which error spikes. You cannot open the individual event payload from 90 days ago. For trend reading and planning, that is the right tradeoff. The individual payload from three months ago is almost never what a postmortem or planning meeting actually needs.

Implement this in urgentry with:

URGENTRY_EVENT_RETENTION_DAYS=30
URGENTRY_SOURCEMAP_RETENTION_DAYS=180
URGENTRY_ATTACHMENT_RETENTION_DAYS=14

The aggregate rollup data is always kept. There is no setting to delete it. This is the right default: rollup data is the institutional memory of your error history.
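To make the rollup tier concrete, here is a sketch of how daily counts can be folded into a small table before raw events are deleted. It assumes SQLite 3.24 or newer (for the upsert syntax). The schema and the function names are hypothetical and do not describe urgentry’s internal tables.

```python
import sqlite3

# Hypothetical schema for illustration; not urgentry's actual tables.
SCHEMA = """
CREATE TABLE events (
    id       INTEGER PRIMARY KEY,
    issue_id INTEGER NOT NULL,
    release  TEXT    NOT NULL,
    ts       INTEGER NOT NULL,      -- Unix seconds, UTC
    payload  TEXT    NOT NULL
);
CREATE TABLE issue_daily_counts (
    issue_id    INTEGER NOT NULL,
    release     TEXT    NOT NULL,
    day         TEXT    NOT NULL,   -- 'YYYY-MM-DD', UTC
    event_count INTEGER NOT NULL,
    PRIMARY KEY (issue_id, release, day)
);
"""


def roll_up_and_delete(conn, cutoff_ts):
    """Fold events older than cutoff_ts into daily counts, then delete them.

    Returns the number of raw event rows deleted.
    """
    with conn:  # one transaction: both steps commit together or not at all
        conn.execute(
            """
            INSERT INTO issue_daily_counts (issue_id, release, day, event_count)
            SELECT issue_id, release, date(ts, 'unixepoch'), COUNT(*)
            FROM events
            WHERE ts < ?
            GROUP BY issue_id, release, date(ts, 'unixepoch')
            ON CONFLICT (issue_id, release, day)
            DO UPDATE SET event_count = event_count + excluded.event_count
            """,
            (cutoff_ts,),
        )
        cur = conn.execute("DELETE FROM events WHERE ts < ?", (cutoff_ts,))
    return cur.rowcount


def rollup_table_gb(issues, years, bytes_per_row=100):
    """Rollup size at one row per issue per day, in decimal GB."""
    return issues * 365 * years * bytes_per_row / 1e9


if __name__ == "__main__":
    # 10,000 issues tracked daily for five years at about 100 bytes per row.
    print(f"{rollup_table_gb(10_000, 5):.3f} GB")
```

Running the count and the delete in a single transaction means an interrupted cleanup cannot drop events that were never counted.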

What to delete first when you run out of space

Disk pressure arrives faster than expected. A high-cardinality error storm, a misconfigured SDK sending duplicate events, or a feature launch with unexpectedly high error rates can fill a disk before the nightly cleanup job runs. When that happens, the deletion priority matters.

First: attachments. Attachments are the largest rows by far and the lowest-value data to retain. Screenshots from 14 days ago rarely help a debug session. Delete attachments aggressively. In urgentry, run the attachment cleanup directly:

urgentry admin purge-attachments --older-than 7d

Follow with a vacuum to reclaim the pages immediately.

Second: breadcrumbs. Breadcrumbs are stored as a JSON blob inside the event row. For older events where you still want the basic signal (exception type, message, release), you can strip breadcrumbs from the payload without deleting the event. This is a partial purge rather than a full delete.
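A partial purge of this kind can be sketched as follows. The sketch assumes a hypothetical events(id, ts, payload) table in which the payload is JSON text with a top-level "breadcrumbs" key; check your actual schema and take a backup before running anything like it against real data.

```python
import json
import sqlite3


def strip_breadcrumbs(conn, cutoff_ts):
    """Drop the breadcrumb trail from events older than cutoff_ts.

    Assumes a hypothetical events(id, ts, payload) table whose payload is
    JSON text with a top-level "breadcrumbs" key. Returns rows rewritten.
    """
    rows = conn.execute(
        "SELECT id, payload FROM events WHERE ts < ?", (cutoff_ts,)
    ).fetchall()
    rewritten = 0
    with conn:  # commit all rewrites together, or roll back on error
        for event_id, raw in rows:
            payload = json.loads(raw)
            if "breadcrumbs" not in payload:
                continue
            del payload["breadcrumbs"]
            conn.execute(
                "UPDATE events SET payload = ? WHERE id = ?",
                (json.dumps(payload, separators=(",", ":")), event_id),
            )
            rewritten += 1
    return rewritten


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, ts INTEGER, payload TEXT)"
    )
    event = {
        "exception": {"type": "KeyError", "value": "'user'"},
        "release": "1.4.2",
        "breadcrumbs": [{"category": "http", "message": "GET /cart"}] * 50,
    }
    conn.execute(
        "INSERT INTO events (ts, payload) VALUES (?, ?)", (1_000, json.dumps(event))
    )
    conn.commit()
    print(strip_breadcrumbs(conn, cutoff_ts=2_000), "event(s) rewritten")
    print(conn.execute("SELECT payload FROM events").fetchone()[0])
```

The sketch loads every matching row into memory, so process a large table in batches. As with deletes, the database file only shrinks after a vacuum.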

Third: raw events. Once attachments and breadcrumbs are trimmed, delete old full event rows. Lower the retention window temporarily if the disk pressure is acute:

URGENTRY_EVENT_RETENTION_DAYS=14 urgentry admin run-cleanup --now

Restore the window to 30 days after the immediate pressure is resolved.

Never delete issue groups. Issue groups are the unit of work for your engineering team: triage decisions, assignments, status changes, alerts. Deleting a group record drops the history of who saw the issue, when it was first detected, and how it was resolved. There is no justification for deleting issue groups during a disk space emergency. Delete anything else first.

The compliance angle

Error event retention intersects with at least two compliance regimes: GDPR Article 17 right to erasure and SOC 2 Type II data retention controls.

GDPR right to deletion

GDPR Article 17 gives EU data subjects the right to request deletion of their personal data. In an error tracker context, personal data in an event payload can include: user ID (if the ID maps to an identifiable person), email addresses logged in breadcrumbs, IP addresses in request context, and any custom tags or extra fields your code attaches that contain identifying information. Stack traces and exception messages are not usually personal data, but they can be if your application logs user input into error messages.

The practical implication: when you receive a right-to-deletion request for a user, you need to find and delete every event payload that contains that user’s data. urgentry supports user-level event purge via the admin API. Running it on a large event table with a user ID filter can be slow if the user_id column is not indexed. Verify the index exists before a deletion request arrives, not during one.
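The index check is quick to rehearse on a copy of the database. Below is a sketch against a hypothetical SQLite events table with a user_id column; the index name and helper functions are assumptions, and urgentry’s admin API remains the supported route for the purge itself.

```python
import sqlite3

INDEX_NAME = "idx_events_user_id"  # hypothetical index name


def ensure_user_index(conn):
    """Create the user_id index if it does not exist yet."""
    conn.execute(f"CREATE INDEX IF NOT EXISTS {INDEX_NAME} ON events (user_id)")
    conn.commit()


def lookup_uses_index(conn, user_id):
    """Ask the query planner whether a user_id lookup would use the index."""
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT id FROM events WHERE user_id = ?", (user_id,)
    ).fetchall()
    # The last column of each plan row is a human-readable description.
    return any(INDEX_NAME in row[-1] for row in plan)


def purge_user(conn, user_id):
    """Delete every event row recorded for one user. Returns rows deleted."""
    with conn:
        cur = conn.execute("DELETE FROM events WHERE user_id = ?", (user_id,))
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id TEXT, payload TEXT)"
    )
    conn.executemany(
        "INSERT INTO events (user_id, payload) VALUES (?, '{}')",
        [("u-1",), ("u-1",), ("u-2",)],
    )
    conn.commit()
    print("index used before:", lookup_uses_index(conn, "u-1"))
    ensure_user_index(conn)
    print("index used after: ", lookup_uses_index(conn, "u-1"))
    print("rows purged:      ", purge_user(conn, "u-1"))
```

A purge keyed on a user_id column only removes rows that carry that ID. Identifiers embedded in breadcrumbs or free-text messages of other events need a separate search.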

Shorter event retention reduces GDPR exposure. A 30-day window means no event payload survives longer than 30 days. A right-to-deletion request for a user who last appeared in your system 45 days ago requires no event deletion because those events are already gone.

SOC 2 Type II

SOC 2 Type II auditors look at data retention controls from two directions. The first is whether you retain audit-relevant data long enough: system logs, access logs, and incident records for the audit period. The second is whether you retain personal data longer than stated in your data retention policy.

Error events are usually not the primary artifact auditors examine; system access logs and change records are. But if your privacy policy states that you retain error data for 30 days and your tracker is configured for 90 days, that gap is a finding. Write your retention policy to match your configuration, or configure your system to match your policy. Either direction is fine. Auditors care about consistency.

What to document

For either compliance regime, maintain a one-paragraph data retention policy that states: what categories of data the error tracker collects (event payloads, user identifiers, IP addresses), how long each category is retained, and the deletion mechanism. Point the policy at the specific configuration variables that enforce it. Update the policy when you change the configuration.
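One way to keep the policy and the configuration from drifting apart is a small check that runs in CI or before an audit. The sketch below compares an example policy against the three retention variables named in this guide; the policy values and the find_policy_gaps helper are illustrative assumptions.

```python
import os

# Documented policy: data category -> (environment variable, days promised).
# The variable names come from this guide; the day counts are an example.
POLICY = {
    "event payloads": ("URGENTRY_EVENT_RETENTION_DAYS", 30),
    "source maps": ("URGENTRY_SOURCEMAP_RETENTION_DAYS", 180),
    "attachments": ("URGENTRY_ATTACHMENT_RETENTION_DAYS", 14),
}


def find_policy_gaps(policy, env):
    """Return one message per category whose configuration contradicts the policy."""
    gaps = []
    for category, (var, promised_days) in policy.items():
        raw = env.get(var)
        if raw is None:
            gaps.append(f"{category}: {var} is unset, so the built-in default applies")
        elif not raw.strip().isdigit():
            gaps.append(f"{category}: {var}={raw!r} is not a whole number of days")
        elif int(raw) != promised_days:
            gaps.append(f"{category}: policy says {promised_days} days, {var}={raw}")
    return gaps


if __name__ == "__main__":
    problems = find_policy_gaps(POLICY, os.environ)
    for line in problems:
        print("GAP:", line)
    raise SystemExit(1 if problems else 0)
```

An unset variable is reported as a gap on purpose: the effective window is then whatever the software defaults to, which may not be what the written policy promises.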

Frequently asked questions

How long should I keep error events?

Thirty days of full event detail handles 95% of triage and regression work. Keep aggregate counts (issue totals per release, daily error rates) forever; they cost almost nothing to store. If a compliance requirement demands 12 months of retention, store the aggregate data, not the raw payloads, and document the distinction in your retention policy.

Does deleting events also delete the issue group?

No, and this distinction matters. Issue groups are metadata records: title, first-seen, last-seen, event count, status, assignments. They survive event deletion. When you delete events older than 30 days, the group record remains intact with accurate counts. You lose the ability to open individual events from that period, not the ability to see that the error existed and how often it fired.

How much disk does one error event use?

A minimal event with a stack trace and no attachments runs 2–8 KB in SQLite. Add a full breadcrumb trail (50 entries) and the row grows to 8–20 KB. Attachments are the outlier: a single 2 MB screenshot costs more than 100 minimal events. Control attachment retention separately and set a shorter window than for raw events.

What counts as personal data in an error payload?

Under GDPR Art. 4, personal data is any information that identifies or can identify a natural person. In a typical error payload that means: user ID if it maps to a person, email addresses in breadcrumbs, IP addresses in request context, and any custom tags your code attaches. Stack traces and exception messages do not usually contain personal data, but they can if your code logs user input into exception messages. Scrub identifying data at the SDK level with beforeSend if you want to reduce exposure without shortening retention.
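In the Sentry Python SDK the equivalent hook is the before_send option. A minimal scrubber might look like the sketch below. The field layout (user, request.headers, breadcrumbs) follows the usual Sentry event shape but is an assumption here, and the list of headers to mask is only an example.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SENSITIVE_HEADERS = {"authorization", "cookie", "x-forwarded-for"}  # example list


def scrub_event(event, hint=None):
    """Remove common identifying fields from an event dict before it is sent."""
    user = event.get("user")
    if isinstance(user, dict):
        # Keep the opaque user ID for grouping; drop directly identifying fields.
        user.pop("email", None)
        user.pop("ip_address", None)

    request = event.get("request")
    if isinstance(request, dict):
        request.pop("cookies", None)
        headers = request.get("headers")
        if isinstance(headers, dict):
            for name in list(headers):
                if name.lower() in SENSITIVE_HEADERS:
                    headers[name] = "[Filtered]"

    # Breadcrumbs may arrive as a plain list or wrapped as {"values": [...]}.
    crumbs = event.get("breadcrumbs")
    if isinstance(crumbs, dict):
        crumbs = crumbs.get("values")
    for crumb in crumbs or []:
        if isinstance(crumb, dict) and isinstance(crumb.get("message"), str):
            crumb["message"] = EMAIL_RE.sub("[email]", crumb["message"])

    return event  # returning None instead would drop the event entirely
```

To use it, pass the function when initializing the SDK, for example sentry_sdk.init(dsn=..., before_send=scrub_event). Scrubbing at the SDK means the data never reaches the server, so it cannot outlive any retention window.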

What is urgentry’s default retention window?

urgentry ships with a 90-day default, matching Sentry’s developer plan. Lower it to 30 days with URGENTRY_EVENT_RETENTION_DAYS=30. The cleanup job runs nightly and hard-deletes rows older than the window, followed by a SQLite VACUUM to return freed pages to the filesystem. Source maps use a separate variable and do not rotate with events.

Set your retention window and let urgentry handle the rest.

urgentry ships with a configurable nightly cleanup job, separate retention knobs for events, source maps, and attachments, and a SQLite default that fits on a $5 VPS. One environment variable sets the window. The binary handles the rest.