Offloading the filestore: source maps, dSYMs, and minidumps on object storage
The volume that fills first on a self-hosted error tracker is rarely the events database. It is the filestore, where release artifacts and minidumps stack up at hundreds of megabytes per build. The local filesystem is the wrong place for them. S3-compatible object storage with a bucket lifecycle policy is the right one.
20 seconds. Debug artifacts (source maps, dSYM bundles, ProGuard mappings, minidumps) live in the filestore. By default the filestore is a local directory, which fills the disk and migrates poorly. Switch the backend to s3 with an endpoint_url and the files live in MinIO, Cloudflare R2, Backblaze B2, or Wasabi instead. The bucket's lifecycle policy handles retention.
60 seconds. On a busy iOS, Android, or game build pipeline, the filestore directory adds 100–500 MB per release. Sentry's own cleanup command does not delete files on external storage, and the local files it does delete leave fragmentation behind. The cleaner path is filestore.backend = s3 with endpoint_url pointed at a co-located MinIO or an external object store. The bucket's lifecycle policy is then the retention mechanism. Three gotchas: cleanup does not touch external storage (GitHub issue self-hosted/#2701 has been open for over a year), egress can dwarf storage if the symbolicator pulls files on every event, and cross-region S3 fetches show up as symbolication latency on cold caches.
This guide covers what the filestore actually holds, the three backend options, the worked S3-compatible config, the lifecycle policy that replaces sentry cleanup, three operational gotchas, and where urgentry's single-binary model differs.
Where the disk actually goes on a self-hosted tracker
The volume labelled "events database" is not the one that fills first. On a Sentry, GlitchTip, or Bugsink self-host under real production load, the order is:
- /data/files (the filestore): source maps, dSYMs, ProGuard mappings, minidumps, attachments
- The events database (Postgres or SQLite)
- The Snuba/ClickHouse store, on Sentry only: events, sessions, transactions
- Kafka log segments, on Sentry only: queues
The filestore wins by an order of magnitude on any team that ships mobile or game builds, because each release uploads symbol bundles measured in tens to hundreds of megabytes. A small iOS app's unstripped dSYMs run 30–150 MB. A Unity build with IL2CPP symbols sits at 200 MB to 1 GB per platform per release. An Unreal cooked build with debug info attached can clear 2 GB per platform. Source maps for a modern Next.js or Vite frontend average 5–20 MB per build, which sounds small until you remember a team shipping ten times a day produces 50–200 MB of source-map output daily, retained until you delete it. After eight weeks, the filestore is the largest thing on the disk by a wide margin.
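As a rough check on the source-map arithmetic above, the sketch below projects retained volume from a per-build size and a deploy rate. The function name and the 56-day window (eight weeks) are illustrative assumptions; the per-build sizes are the ranges quoted in this section.

```python
def projected_growth_mb(mb_per_build: float, builds_per_day: int, days: int) -> float:
    """Total megabytes retained after `days` if nothing is ever deleted."""
    return mb_per_build * builds_per_day * days


# Source maps at 5-20 MB per build, ten deploys a day, eight weeks of retention.
low = projected_growth_mb(5, 10, 56)
high = projected_growth_mb(20, 10, 56)
print(f"{low / 1000:.1f}-{high / 1000:.1f} GB of source maps after eight weeks")
```

This counts source maps only. Native symbol bundles at the sizes quoted above add to the total on every mobile or game release.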
The Sentry support team's own answer to "how do I free up space on my self-host" points at /data/files/ first. The discussion thread calls it "typically the largest amount of storage data." Anyone who runs a self-hosted tracker for more than a few months ends up here.
What the filestore actually holds
Four classes of artifact, each with a different growth shape.
Source maps. Minified JavaScript bundles ship to production; the map is the index back to original source. The tracker needs the map at processing time to symbolicate stack traces, which is the difference between a readable line of code and a wall of vendor.abc123.js:1:48172. Source maps are bound to releases. They run 1–30 MB each for typical Vite, Webpack, or Rollup builds, and a frontend team on continuous deployment produces hundreds per week.
dSYM bundles (Apple). Debug symbols for iOS, macOS, watchOS, and tvOS. Apple emits one dSYM per binary — the app, every framework, every dependency. For an app of reasonable size, the bundle is dozens of files at 30–150 MB. The native symbolicator needs all of them to resolve crashes back to function names and line numbers. Skip the upload and every Apple crash is a wall of hexadecimal addresses.
ProGuard mappings (Android) and PDB files (Windows/Unity). Android R8 shrinking and obfuscation produces a mapping file per build. PDB files do the same for Windows-native binaries and Unity's IL2CPP output. Mapping files are smaller than dSYMs (often 5–20 MB) but they pile up one per build, and you keep many builds active in parallel across release branches.
Minidumps and attachments. When a native crash happens, the SDK captures a minidump — a partial process snapshot — and posts it as an event attachment. Minidumps run 200 KB to 2 MB each. They are write-once and read rarely, only at processing time. Other attachments — screenshots, view hierarchies, user-uploaded log files — have the same shape and the same access pattern.
The common shape across all four: bursty writes at release time, rare reads only during symbolication, and an indefinite retention policy by default. That is exactly the access pattern object storage is built for.
Three backend options
The filestore is pluggable. The three live options are:
1. filesystem (the default). Writes go to a local directory at /data/files or /tmp/sentry-files, depending on the install path. Simple and cheap. The failure mode is that the directory fills and ingest blocks until you mount a larger volume. The directory does not migrate easily because the application maintains internal references to the file paths.
2. s3 against AWS. Writes go to an S3 bucket. Storage is effectively unlimited. The failure mode is AWS egress costs and IAM scope creep. If the symbolicator runs in a different region, every dSYM lookup pulls bytes across a region boundary at AWS rates.
3. s3 against an S3-compatible endpoint. Same code path, different host. MinIO on a separate volume, Cloudflare R2, Backblaze B2, Wasabi, or your own Ceph cluster all work because all of them speak the S3 wire protocol. This is the option most self-hosters land on after the local filesystem fills and the AWS bill arrives.
The configuration shape between options 2 and 3 is identical. The only difference is one URL.
The S3-compatible config
The filestore block in sentry.conf.py for an S3-compatible backend:
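import os  # harmless if sentry.conf.py already imports it

# Route filestore writes to an S3-compatible bucket instead of local disk.
SENTRY_OPTIONS["filestore.backend"] = "s3"
SENTRY_OPTIONS["filestore.options"] = {
    # Credentials come from the environment, not from the config file.
    "access_key": os.environ["FILESTORE_S3_ACCESS_KEY"],
    "secret_key": os.environ["FILESTORE_S3_SECRET_KEY"],
    "bucket_name": "sentry-filestore-prod",
    "endpoint_url": "https://minio.example.com",  # the one line that differs from AWS
    "region_name": "us-east-1",
    "addressing_style": "path",  # <endpoint>/<bucket>, no per-bucket DNS
}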
The endpoint_url is the parameter that turns "AWS S3" into "any S3-compatible store." For Cloudflare R2, it points at https://<accountid>.r2.cloudflarestorage.com. For Backblaze B2, https://s3.us-west-002.backblazeb2.com. For Wasabi, https://s3.us-east-1.wasabisys.com. For MinIO, whatever address you bound the MinIO server to.
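To make the "one URL" point concrete, here is a minimal sketch that builds the options dict for each provider from a lookup table. The helper name filestore_options and the KEY/SECRET values are hypothetical; the endpoints are the ones quoted above, with the account ID and region slugs left as placeholders. It also assumes region_name can stay at the value from the config above, which may not hold for every provider.

```python
# Endpoints as quoted in the text; replace the placeholders with your own values.
ENDPOINTS = {
    "r2": "https://<accountid>.r2.cloudflarestorage.com",
    "b2": "https://s3.us-west-002.backblazeb2.com",
    "wasabi": "https://s3.us-east-1.wasabisys.com",
    "minio": "https://minio.example.com",
}


def filestore_options(provider, access_key, secret_key, bucket_name,
                      region_name="us-east-1"):
    """Return a filestore.options dict for one S3-compatible provider."""
    return {
        "access_key": access_key,
        "secret_key": secret_key,
        "bucket_name": bucket_name,
        "endpoint_url": ENDPOINTS[provider],
        "region_name": region_name,
        "addressing_style": "path",
    }


r2 = filestore_options("r2", "KEY", "SECRET", "sentry-filestore-prod")
minio = filestore_options("minio", "KEY", "SECRET", "sentry-filestore-prod")

# Collect the option names whose values differ between the two providers.
changed = {name for name in r2 if r2[name] != minio[name]}
print(changed)
```

Moving between providers then comes down to changing one dictionary key, plus new credentials.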
The addressing_style: "path" setting matters for non-AWS endpoints. AWS S3 defaults to virtual-hosted style (<bucket>.s3.amazonaws.com), which requires DNS to resolve every bucket name. Self-hosted MinIO and several other compatibles do not support virtual-hosted style by default. Path-style (<endpoint>/<bucket>) avoids the DNS dependency and is the safer choice. Setting it wrong is the most common cause of a "bucket not found" error against a bucket you can see in the MinIO console.
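The difference between the two styles is easiest to see in the request URL. The sketch below is a simplification that ignores URL encoding and request signing, and the object key is made up for illustration; it only shows where the bucket name ends up.

```python
from urllib.parse import urlsplit


def object_url(endpoint_url: str, bucket: str, key: str, addressing_style: str) -> str:
    """Build the URL an S3 client would request for one object."""
    parts = urlsplit(endpoint_url)
    if addressing_style == "path":
        # Bucket is the first path segment; only the endpoint host must resolve.
        return f"{parts.scheme}://{parts.netloc}/{bucket}/{key}"
    if addressing_style == "virtual":
        # Bucket becomes a subdomain, so DNS must resolve <bucket>.<endpoint host>.
        return f"{parts.scheme}://{bucket}.{parts.netloc}/{key}"
    raise ValueError(f"unknown addressing style: {addressing_style}")


endpoint = "https://minio.example.com"
path_url = object_url(endpoint, "sentry-filestore-prod", "a/b.map", "path")
virtual_url = object_url(endpoint, "sentry-filestore-prod", "a/b.map", "virtual")
print(path_url)     # https://minio.example.com/sentry-filestore-prod/a/b.map
print(virtual_url)  # https://sentry-filestore-prod.minio.example.com/a/b.map
```

With virtual-hosted style, the second hostname has to exist in DNS and on the server's TLS certificate, which is the extra setup a default MinIO install does not have.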
After a restart, fresh writes go to the bucket. Files already on the local disk stay where they are. Sentry does not migrate them. You either copy them up manually with aws s3 cp --endpoint-url=... against your endpoint, or you accept that the old artifacts age out as their releases age out. For most teams, the second option is enough.
The cleanup gotcha that motivates a lifecycle policy
Sentry's self-hosted documentation tells you to run sentry cleanup --days N periodically. The cleanup command iterates the events database, the issue store, and a few internal queues. It does not iterate the filestore. The developer documentation for external storage says it plainly: when files are stored on external storage, "the cleanup command won't delete files." The path forward is the storage provider's own retention mechanism.
For an S3-compatible bucket, that mechanism is a lifecycle policy. A reasonable starting policy for a filestore bucket:
{
  "Rules": [
    {
      "ID": "expire-release-artifacts-after-90-days",
      "Status": "Enabled",
      "Filter": {"Prefix": "release-artifacts/"},
      "Expiration": {"Days": 90}
    },
    {
      "ID": "expire-attachments-after-30-days",
      "Status": "Enabled",
      "Filter": {"Prefix": "attachments/"},
      "Expiration": {"Days": 30}
    }
  ]
}
Three things to note. The prefix structure matches how Sentry organizes files in the bucket: release artifacts (source maps, dSYMs) under release-artifacts/, attachments and minidumps under attachments/. Check the prefixes against a listing of your own bucket before enabling the rules, because a rule whose prefix matches no keys expires nothing. The retention windows differ because the access patterns differ: source maps and dSYMs are needed any time you process a stack trace against a release, which can lag the release by weeks. Minidumps are needed only during the triage window. And the bucket does the deletion; the tracker does not have to know.
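To preview what a policy like this would remove, the sketch below evaluates the two rules against a handful of object keys. It is a simplified model, not the provider's implementation: real stores apply expiration asynchronously and count age from object creation. The object keys and ages are invented for illustration.

```python
import json

POLICY = json.loads("""
{
  "Rules": [
    {"ID": "expire-release-artifacts-after-90-days", "Status": "Enabled",
     "Filter": {"Prefix": "release-artifacts/"}, "Expiration": {"Days": 90}},
    {"ID": "expire-attachments-after-30-days", "Status": "Enabled",
     "Filter": {"Prefix": "attachments/"}, "Expiration": {"Days": 30}}
  ]
}
""")


def expired_keys(policy, objects):
    """Return the keys that some enabled rule would expire.

    `objects` maps each key to its age in days since creation.
    """
    expired = []
    for key, age_days in objects.items():
        for rule in policy["Rules"]:
            if rule["Status"] != "Enabled":
                continue
            prefix = rule["Filter"].get("Prefix", "")
            if key.startswith(prefix) and age_days >= rule["Expiration"]["Days"]:
                expired.append(key)
                break  # one matching rule is enough
    return expired


# Hypothetical bucket contents: key -> age in days.
objects = {
    "release-artifacts/app.js.map": 120,
    "release-artifacts/new.js.map": 10,
    "attachments/crash.dmp": 45,
    "other/untracked.bin": 400,
}
result = expired_keys(POLICY, objects)
print(result)
```

The 400-day-old object under other/ survives because no rule's prefix covers it, which is why it is worth confirming the prefixes against a real bucket listing.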
A 30-day events database retention paired with a 90-day filestore lifecycle policy produces a coherent story. Events that reference a deleted source map will fail to symbolicate, but those events were already going to age out at the events layer first. The mismatched windows are a feature, not a bug. One caveat: expiration is counted from object creation, not last access, so a release that stays in production past the window needs its artifacts re-uploaded or a longer rule. GitHub issue self-hosted/#2701 has been open for over a year asking for a built-in debug-files retention policy; the bucket lifecycle approach is the working alternative until that ships.
Three operational gotchas worth knowing
Egress costs are usually higher than storage costs. A symbolicator that pulls a dSYM bundle from R2 or B2 for every native crash, with a cold cache, can move tens of gigabytes a day. R2 charges nothing for egress. B2 charges $0.01/GB after a generous free tier. AWS S3 charges $0.09/GB for public-internet egress and $0.01–$0.02/GB for cross-region transfer. Put the bucket on AWS and the symbolicator on Hetzner, and you pay public-internet rates per fetch. Co-locate them, or pick a zero-egress backend up front.
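A back-of-the-envelope comparison using the per-GB rates quoted above. The 20 GB/day figure is an assumed stand-in for "tens of gigabytes a day", and B2's free tier is ignored, so its number is an upper bound.

```python
# Egress rates in USD per GB, taken from the paragraph above.
EGRESS_USD_PER_GB = {
    "r2": 0.00,
    "b2": 0.01,
    "s3_public_internet": 0.09,
}


def monthly_egress_usd(gb_per_day: float, rate_per_gb: float, days: int = 30) -> float:
    """Egress cost for one month at a constant daily transfer volume."""
    return gb_per_day * days * rate_per_gb


# Assumed workload: a cold-cache symbolicator pulling 20 GB per day.
costs = {name: monthly_egress_usd(20, rate) for name, rate in EGRESS_USD_PER_GB.items()}
for name, cost in costs.items():
    print(f"{name}: ${cost:.2f}/month")
```

At that volume the public-internet rate comes to about $54 a month against $6 on B2 and nothing on R2, before any storage charge.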
Cross-region S3 fetches show up as symbolication latency. Round-trip to a same-region bucket is 20–50 ms cold and single-digit ms warm. Cross-region is 80–300 ms cold. Symbolicator caches mitigate this in the steady state, but the cold-cache case (first event on a new release, or a fresh worker pod that hasn't seen the bundle yet) is real and observable. If the dashboard says symbolication slowed down after the bucket move, this is the first place to look.
The upload path is not the read path. sentry-cli debug-files upload writes to the filestore through Sentry's web API. The symbolicator reads directly from the configured backend. A common misconfiguration is upload success against the bucket but reads against an old local path, because the symbolicator container was not restarted with the new filestore.options. Symptom: uploads succeed, symbolication still fails with "debug file not found." Fix: restart the symbolicator with the same filestore.backend config the web container has, and verify with sentry config get filestore.backend inside both containers.
Where urgentry sits today
urgentry takes the single-binary path that makes the filestore choice simpler. There is one volume, one Go binary, and SQLite at the database layer by default. Debug artifacts go to a directory next to the database file. The trade is fewer moving parts at the cost of less granular tuning.
The filestore lives on the same disk as the events database and the same retention logic applies, but at the OS layer rather than at the bucket layer. A periodic prune by modification time is the equivalent of a bucket lifecycle policy. urgentry ships this as a built-in command rather than relying on the operator to set it up. For most self-hosters whose monthly artifact growth is below 10 GB, that is enough; sizing the disk is the cheaper operational move and the failure modes from the Sentry world — cleanup not touching external storage, symbolicator pointed at the wrong path, cross-region egress — do not apply.
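For readers who want to see what "prune by modification time" means in practice, here is a minimal sketch. It is not urgentry's built-in command, whose name and flags are not covered here; the function name, file names, and the 90-day window are illustrative assumptions.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List, Optional


def prune_older_than(root: Path, max_age_days: int,
                     now: Optional[float] = None) -> List[Path]:
    """Delete regular files under `root` whose mtime is older than the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    # Materialize the walk first so deletions do not disturb the iteration.
    for path in list(root.rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed


# Demo in a throwaway directory: one stale file, one fresh file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    old, new = root / "old.map", root / "new.map"
    old.write_text("stale")
    new.write_text("fresh")
    stale = time.time() - 100 * 86400
    os.utime(old, (stale, stale))  # backdate the mtime by 100 days

    removed = prune_older_than(root, max_age_days=90)
    survivors = sorted(p.name for p in root.iterdir())

print([p.name for p in removed], survivors)
```

Like a bucket lifecycle rule, this keys off file age alone, so an artifact for a long-lived release is pruned on the same schedule as any other.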
For teams whose artifact growth exceeds that, an S3-compatible backend is on the urgentry roadmap as a P2 item. Today, the bridge pattern is a reverse proxy that mirrors uploads to a bucket while the tracker reads from local disk. That preserves the single-binary model for reads, which is the latency-sensitive path, while letting the bucket absorb the volume.
Five mistakes to avoid
1. Mounting the filestore on the same volume as the events database. When the filestore fills, the database loses its disk too, and ingest blocks. Use a separate volume so a runaway source-map upload does not take ingest down.
2. Backing the filestore with tmpfs. Several blog posts recommend tmpfs for "fast" filestore access. tmpfs is volatile. A reboot is a full debug-file wipe. Symbolication breaks for every active release until you re-upload.
3. Skipping the bucket lifecycle policy. Without one, the bucket grows monotonically. AWS bills you for the storage. R2 has zero egress but still charges for storage. Backblaze charges for both. You discover this when the bill arrives, not before.
4. Setting addressing_style to virtual against MinIO. MinIO supports virtual-hosted style only with additional DNS config. Path-style works out of the box. "Bucket not found" against a bucket you can see in the console is almost always this.
5. Forgetting the symbolicator restart after the backend change. The web container picks up the new config because that is the one you deployed. The symbolicator container often has its own deployment lifecycle. Update only one and uploads go to the bucket while reads still hit the old path.
Frequently asked questions
What is the filestore on a self-hosted error tracker, and what does it hold?
The filestore is the pluggable storage layer that holds release artifacts and event attachments: JavaScript source maps, Apple dSYM bundles, Android ProGuard mappings, Windows PDB files, native minidumps, and user-uploaded attachments. By default it lives on the local filesystem under /data/files. It is the directory that fills the disk first on any team shipping mobile or game builds.
Why does sentry cleanup not delete debug files when they are stored in S3?
The cleanup command iterates the events database and internal queues. It does not iterate external storage. Sentry's self-hosted documentation explicitly says cleanup won't delete files when an external backend is configured. The path forward is the bucket's own lifecycle policy: expiration rules on the prefix where the filestore writes.
Should I use MinIO, R2, Backblaze B2, or AWS S3 for the filestore?
All four accept the same S3 API. MinIO co-located with the tracker is the cheapest and lowest-latency option for teams already running their own infra. Cloudflare R2 wins on egress (zero) when the symbolicator and bucket are in different networks. Backblaze B2 is cheapest at rest. AWS S3 makes sense only when the rest of the stack is on AWS, because cross-region and public-internet egress will otherwise dominate the bill.
What's a reasonable retention window for source maps and minidumps?
Source maps and dSYMs need to outlive the issues that reference them. A 90-day expiration on release-artifacts/ matches Sentry's default event retention. Minidumps and attachments are read only during triage; a 30-day expiration on attachments/ is enough and substantially cheaper. Mismatched windows are fine because the artifacts have different access patterns.
Does urgentry support an S3-compatible filestore backend?
urgentry stores debug artifacts next to the events database on the same volume today, which is simpler to operate and fits the single-binary model. An S3-compatible backend is on the roadmap as a P2 item. For teams whose monthly artifact growth is below 10 GB, sizing the disk is the cheaper move. For teams beyond that, a reverse proxy that mirrors uploads to a bucket is the working bridge until the native backend ships.
Sources
- Sentry self-hosted: external storage. The developer-docs section that describes the S3 backend, names the cleanup exclusion, and points operators at bucket-level retention.
- Sentry server filestore configuration. The canonical reference for filestore.backend and filestore.options, including the endpoint_url parameter that enables S3-compatible backends.
- Sentry developer docs: file storage service. Describes the abstraction layer, what the filestore stores, and the relationship between the web container and the symbolicator on reads.
- getsentry/self-hosted #2701. The open feature request for a built-in debug-files retention policy on self-hosted Sentry; the bucket lifecycle policy is the working alternative.
- getsentry/self-hosted #3621. The long-running issue tracking S3-compatible custom repositories for debug information files.
- getsentry/support discussion #109. The support-team thread that names /data/files/ as the typical largest disk consumer on a self-hosted install.
- Sentry forum: S3-compatible object storage. Community thread on pointing the S3 backend at MinIO and similar endpoints, with the addressing-style gotcha discussed by operators.
- Sentry iOS debug-files documentation. Reference for dSYM bundle sizes and the upload path that lands in the filestore.
One binary. One disk. One config file.
urgentry handles debug artifacts in the same single-binary process that handles ingest. No separate filestore service, no symbolicator container to forget to restart, no bucket policy to write. SQLite by default, Postgres optional, and a built-in prune for old artifacts when the disk gets tight.