Guide Self-hosting ~9 min read Updated April 20, 2026

Backup, restore, and disaster recovery for error data.

Error data has a short useful life when you have it and a long painful absence when you lose it. This guide covers what to back up in an urgentry deployment, how to back it up for both SQLite and Postgres, how to restore it under pressure, and the three failure modes that bite teams who thought their backup was good.

TL;DR

20 seconds. urgentry stores three things you need to back up independently: the event database (SQLite or Postgres), source map files (filesystem or S3), and attachment blobs (filesystem or S3). Losing any one of them produces a different failure mode. For SQLite, run Litestream for continuous replication and a daily .backup snapshot. For Postgres, configure WAL archiving and daily base backups.

60 seconds. The worked setup for a single-VPS urgentry deployment: Litestream replicates the SQLite WAL to an S3-compatible bucket every few seconds, restic snapshots the source map directory daily, and a cron job verifies that both succeeded. Retention at 30 days keeps storage costs under $5 per month for most teams. The total setup time is about an hour. The pain of not doing it arrives all at once when the disk fails.

Disaster recovery is not the same as backup. A backup answers the question: can I get my data back? Disaster recovery answers: can I get my service back, and how fast? The two require different preparation. This guide covers both, and explains why the restore drill is more important than the backup config itself.

What you are actually backing up

urgentry stores data in three places. Each one has a different backup strategy and a different failure mode if it goes missing.

The event database

The event database is the core store: every event, every issue group, every release, every project configuration, every user account. For the default deployment, this is a SQLite file at /var/lib/urgentry/urgentry.db. For multi-instance deployments or teams that have switched to Postgres, it is a Postgres database.

Losing the event database loses everything. Issue history, error counts, release tracking, alert rules, team memberships, DSNs, integration credentials. Recovering from a lost event database without a backup means starting over.

Source map storage

Source maps are stored separately from the event database. By default, urgentry writes them to a filesystem directory (configurable as URGENTRY_SOURCEMAPS_DIR). In S3-backed deployments, they go to a configurable bucket prefix.

The event database records a reference to each source map: a filesystem path or an S3 object key. If you restore the event database but not the source maps, urgentry has the reference but not the file. Stack traces appear raw rather than symbolicated. This is not catastrophic, but it makes the event history significantly less useful.

Attachment blobs

Attachments sent with events (screenshots, log files, heap dumps) live on the filesystem or in S3, depending on your URGENTRY_ATTACHMENTS_STORAGE configuration. The same divergence problem applies: a restored event database that references missing blobs shows broken attachment links.

Configuration files

Your urgentry configuration (environment variables, systemd unit file, reverse proxy config, Litestream config) is not stored in the event database. Include it in a separate backup or in version control. After a host failure, having the data is not enough if you cannot reconstruct the service configuration.

What is recoverable without a backup

To be direct: if you lose the event database with no backup, you lose all event history permanently. urgentry has no way to reconstruct events from SDKs. The SDKs send and forget. There is no replay mechanism for events that were already ingested and then lost. The only recovery path is from a backup.

How much error data is worth keeping

Most teams want 30 to 90 days of high-resolution event data: full stack traces, breadcrumbs, request context, user data. Beyond 90 days, the value of individual events drops sharply. What retains value longer is aggregate data: error counts per release, issue trend lines, regression markers against deploy timestamps.

urgentry’s built-in retention policy (configurable via URGENTRY_EVENT_RETENTION_DAYS) handles the event-level purge automatically. Setting it to 90 days gives you full event detail for three months. Setting it lower, to 30 days, shrinks the steady-state database size roughly in proportion.

The storage budget for the event database on a typical small team (50 engineers, a few production services, moderate error rate) runs 1 to 5 GB per month of raw SQLite growth before retention kicks in. At a 30-day retention window, the database stabilizes at 1 to 5 GB. At 90 days, it stabilizes at 3 to 15 GB. These are rough numbers; a single high-cardinality error storm can add several gigabytes in a day.

Source maps are additive, not rotating. Every deploy adds new source maps. Without a pruning policy, source map storage grows indefinitely. A team that ships five deploys per week with 10 MB of source maps per deploy accumulates roughly 2.6 GB per year (5 × 52 × 10 MB). Configure a source map retention policy keyed to release age, not event retention.

The practical answer for most teams: 30-day event retention in the database, 90-day source map retention, and backup storage at 30-day retention. That combination keeps storage costs under $10 per month for the backup side on an S3-compatible provider.
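The sizing arithmetic in this section can be sketched in a few lines of Python. The function names are illustrative, and the inputs are the rough figures quoted above, not measurements from a real deployment.

```python
def steady_state_db_gb(monthly_growth_gb: float, retention_days: int) -> float:
    """Approximate database size once the retention purge balances new writes."""
    return monthly_growth_gb * retention_days / 30


def sourcemap_growth_gb_per_year(deploys_per_week: float, mb_per_deploy: float) -> float:
    """Source maps are additive, so yearly growth is deploy count times size per deploy."""
    return deploys_per_week * 52 * mb_per_deploy / 1000


if __name__ == "__main__":
    for days in (30, 90):
        low = steady_state_db_gb(1, days)
        high = steady_state_db_gb(5, days)
        print(f"{days}-day retention: {low:.0f} to {high:.0f} GB")
    print(f"source maps: {sourcemap_growth_gb_per_year(5, 10):.1f} GB per year")
```

Plugging in your own monthly growth and deploy cadence gives a first estimate of how much storage the retention settings imply. An error storm can still push the database well past that estimate for a while.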

SQLite backup procedure

SQLite on a single-writer host is the default urgentry shape. The backup strategy has two layers: continuous replication for minimal data loss, and periodic snapshots for point-in-time recovery.

Litestream for continuous replication

Litestream replicates SQLite WAL frames to an S3-compatible destination in near-real time. It runs as a separate process alongside urgentry and adds little overhead to ingest. If the host disk fails, you restore from the Litestream replica and lose at most the writes made since the last sync, which is a matter of seconds.

Install Litestream and write the config file:

# Install Litestream (Debian/Ubuntu)
curl -fsSL https://github.com/benbjohnson/litestream/releases/download/v0.3.13/litestream-v0.3.13-linux-amd64.deb \
  -o litestream.deb
dpkg -i litestream.deb

# /etc/litestream.yml
dbs:
  - path: /var/lib/urgentry/urgentry.db
    replicas:
      - type: s3
        bucket: urgentry-backups
        path: db/urgentry.db
        region: auto
        endpoint: https://<accountid>.r2.cloudflarestorage.com
        access-key-id: ${LITESTREAM_ACCESS_KEY_ID}
        secret-access-key: ${LITESTREAM_SECRET_ACCESS_KEY}
        # Retain WAL snapshots for 30 days
        retention: 720h
        # Sync interval: how often WAL frames are pushed to the replica
        sync-interval: 10s

Run Litestream as a systemd service so it survives reboots and restarts alongside urgentry:

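systemctl enable --now litestream
# Verify replication is active. Passing the database path makes Litestream read
# the replica settings (including the R2 endpoint) from /etc/litestream.yml:
litestream snapshots /var/lib/urgentry/urgentry.db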

Point-in-time snapshots with .backup

For periodic snapshots independent of Litestream, use the SQLite .backup command rather than a file copy. This matters because of WAL mode.

In WAL mode (which urgentry enables by default), the SQLite database on disk consists of the main database file (urgentry.db) and a write-ahead log file (urgentry.db-wal). A naive file copy with cp or rsync captures these two files at different instants. The main file and the WAL can be at different points in a transaction boundary, producing a corrupt or inconsistent backup. The .backup command performs an online hot backup that handles WAL correctly:

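# Daily snapshot via cron: safe backup with .backup command
# Add to /etc/cron.d/urgentry-backup
# cron does not support backslash line continuation, so the job is one long line.
# restic reads RESTIC_REPOSITORY and RESTIC_PASSWORD_FILE from the environment;
# set them at the top of the cron file.

0 3 * * * root sqlite3 /var/lib/urgentry/urgentry.db ".backup '/var/backups/urgentry/urgentry-$(date +\%Y\%m\%d).db'" && restic backup /var/backups/urgentry/ && find /var/backups/urgentry/ -name "urgentry-*.db" -mtime +1 -delete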
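The same online backup can be driven from a script that also verifies the snapshot before it is shipped anywhere. The sketch below uses Python's standard sqlite3 module, whose Connection.backup() method wraps SQLite's online backup API. The snapshot function name and the paths in the usage note are illustrative, not part of urgentry.

```python
import sqlite3


def snapshot(src_path: str, dest_path: str) -> str:
    """Copy a live SQLite database with the online backup API, then verify the copy.

    Returns the integrity_check result for the copy ("ok" when it is healthy).
    """
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # The backup API reads through SQLite itself, so committed transactions
        # that still sit in the WAL file are included in the copy.
        src.backup(dest)
        # integrity_check returns a single row containing "ok" for a healthy file.
        result = dest.execute("PRAGMA integrity_check;").fetchone()[0]
    finally:
        dest.close()
        src.close()
    return result
```

A call such as snapshot("/var/lib/urgentry/urgentry.db", "/var/backups/urgentry/urgentry-check.db") returning anything other than "ok" is a reason to alert rather than upload the file.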

The WAL mode file copy gotcha

Many teams discover the WAL mode problem the hard way: they restore from a file copy, start urgentry, and see a corrupt or partially empty database. The reason is the gap between the main file state and the WAL state at copy time.

The rule: never use cp, rsync, or tar directly on a live SQLite database in WAL mode. Take a .backup snapshot and copy that file instead, or use Litestream. Litestream is safe because it operates at the WAL frame level and handles the consistency boundary internally. A plain file copy is not safe.

Postgres backup procedure

Teams running urgentry against Postgres (required for multi-instance HA deployments) have more backup options and more decisions to make.

Logical backup with pg_dump

pg_dump produces a logical export of the urgentry database: SQL statements that recreate the schema and insert all data. It is the simplest backup form and the easiest to restore from.

# Daily pg_dump to compressed file
pg_dump \
  --host=localhost \
  --username=urgentry \
  --dbname=urgentry \
  --format=custom \
  --compress=9 \
  --file=/var/backups/urgentry/urgentry-$(date +%Y%m%d).pgdump

# Restore from a pg_dump backup (create the target first: createdb urgentry_restore):
pg_restore \
  --host=localhost \
  --username=urgentry \
  --dbname=urgentry_restore \
  --clean \
  --if-exists \
  /var/backups/urgentry/urgentry-20260522.pgdump

The downside of pg_dump alone is recovery granularity: you can only restore to the point of the last dump. If the dump ran at 03:00 and the disk failed at 14:30, you lose 11.5 hours of events.

Physical backup with pg_basebackup

pg_basebackup takes a physical copy of the entire Postgres data directory. Combined with WAL archiving, it enables point-in-time recovery to any moment after the base backup completed, as long as the WAL archive is unbroken.

# Physical base backup (run as the postgres user or with equivalent permissions)
pg_basebackup \
  --host=localhost \
  --username=replicator \
  --pgdata=/var/backups/postgres/base \
  --format=tar \
  --gzip \
  --progress \
  --checkpoint=fast

WAL archiving for point-in-time recovery

Enable WAL archiving in postgresql.conf to capture every WAL segment as it completes. Combined with a base backup, this lets you restore to any point between the base backup and the present:

# In postgresql.conf:
wal_level = replica
archive_mode = on
archive_command = 'aws s3 cp %p s3://urgentry-wal/%f'
# Or with restic:
# archive_command = 'restic backup %p --tag wal --repo s3:s3.amazonaws.com/urgentry-wal'

Managed Postgres vs self-managed

Managed Postgres providers (RDS, Cloud SQL, Neon, Supabase) handle base backups and WAL archiving automatically. On RDS, continuous backups and point-in-time recovery within the retention window come enabled by default. The decision is not whether to use them, but whether to verify that they are configured correctly and that you have tested a restore.

Self-managed Postgres requires you to configure, monitor, and test every layer of the backup stack yourself. The overhead is real. For teams already on Kubernetes with managed Postgres, the managed backup path is the lower-risk choice and the one this guide recommends.

Source map and attachment storage backup

Source maps and attachments live outside the event database. Their backup strategy depends on where urgentry is configured to store them.

Filesystem storage: rsync and restic

If urgentry writes source maps and attachments to the local filesystem, include those directories in your backup scheme. rsync handles incremental copies well; restic handles deduplication and encryption for remote destinations.

# restic backup of source maps and attachments to S3-compatible storage
export RESTIC_REPOSITORY="s3:s3.amazonaws.com/urgentry-blobs-backup"
export RESTIC_PASSWORD="your-restic-password"
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"

restic backup \
  /var/lib/urgentry/sourcemaps \
  /var/lib/urgentry/attachments \
  --tag urgentry-blobs

# Prune snapshots older than 30 days:
restic forget --keep-within 30d --prune

S3 storage: lifecycle policies

When urgentry stores blobs in S3, the bucket itself holds the source of truth. The backup strategy shifts to S3 cross-region replication and lifecycle policies.

Enable S3 versioning and configure cross-region replication on the bucket urgentry writes to. This protects against accidental deletion and single-region outages. Lifecycle policies control cost by expiring old versions:

# Example S3 lifecycle rule (JSON, apply via AWS console or CLI)
# aws s3api put-bucket-lifecycle-configuration \
#   --bucket urgentry-blobs \
#   --lifecycle-configuration file://lifecycle.json

{
  "Rules": [
    {
      "ID": "expire-old-sourcemaps",
      "Status": "Enabled",
      "Filter": { "Prefix": "sourcemaps/" },
      "Expiration": { "Days": 90 },
      "NoncurrentVersionExpiration": { "NoncurrentDays": 7 }
    },
    {
      "ID": "expire-old-attachments",
      "Status": "Enabled",
      "Filter": { "Prefix": "attachments/" },
      "Expiration": { "Days": 30 },
      "NoncurrentVersionExpiration": { "NoncurrentDays": 3 }
    }
  ]
}

When blob storage diverges from the event database

The dangerous case is a restore where the event database and the blob storage are from different points in time. If you restore a SQLite backup from 24 hours ago but leave the current source maps in place, the two stores disagree in both directions: the blob store holds source maps uploaded in the last 24 hours that the restored database has no record of, and the database may reference source maps that a pruning job deleted during that window. The result is partial symbolication: some stack traces resolve correctly, others do not.

The safer approach: back up blob storage with the same retention and the same restore target as the event database. When you restore the event database to a given point in time, restore the blob storage to the same point. Restic snapshots tagged with a timestamp make this correlation possible.
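One way to do that correlation is to pick the restic snapshot closest in time to the database restore point. The sketch below assumes a list shaped like the output of restic snapshots --json (objects with time and short_id fields) and UTC timestamps ending in Z. The function names are illustrative.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_utc(value: str) -> datetime:
    """Parse a UTC timestamp like 2026-05-20T03:00:04.123456789Z, dropping the fraction."""
    seconds = value.rstrip("Z").split(".")[0]
    return datetime.strptime(seconds, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)


def closest_snapshot(snapshots: list, restore_point: str) -> Optional[dict]:
    """Return the snapshot whose time is nearest to the database restore point."""
    target = parse_utc(restore_point)
    return min(
        snapshots,
        key=lambda snap: abs(parse_utc(snap["time"]) - target),
        default=None,
    )
```

The short_id of the chosen snapshot can then be passed to restic restore in place of latest. Nearest-in-time is a design choice: a snapshot taken just after the restore point leaves orphaned blobs, which are harmless, while one taken just before it can leave references without files, so prefer the later snapshot when both are similarly close.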

A worked backup setup

This is the setup for a single-VPS urgentry deployment with SQLite, filesystem blob storage, and a 30-day retention window. Storage cost stays well under $5 per month on Cloudflare R2 or Backblaze B2 at typical event volumes.

Components

  • Litestream for continuous SQLite WAL replication to R2 (sync every 10 seconds, 30-day WAL retention)
  • restic for daily snapshots of the source map and attachment directories to R2
  • cron to orchestrate both and alert on failure

The cron schedule

# /etc/cron.d/urgentry-backup
# Runs as root. Adjust paths to match your installation.
# cron does not support backslash line continuation, so each job is one long line.
# cron uses the host time zone; the schedule below assumes the host runs on UTC.

# Settings shared by every job in this file
RESTIC_REPOSITORY="s3:https://<accountid>.r2.cloudflarestorage.com/urgentry-backups"
RESTIC_PASSWORD_FILE="/etc/urgentry/restic-password"

# Daily restic backup of source maps and attachments at 03:00 UTC.
# The credentials are exported so that both restic commands see them.
0 3 * * * root export AWS_ACCESS_KEY_ID="$(cat /etc/urgentry/r2-key-id)" AWS_SECRET_ACCESS_KEY="$(cat /etc/urgentry/r2-secret)"; { restic backup /var/lib/urgentry/sourcemaps /var/lib/urgentry/attachments --tag urgentry-blobs-$(date +\%Y\%m\%d) && restic forget --keep-within 30d --prune; } || echo "urgentry restic backup FAILED $(date)" | mail -s "urgentry backup failure" ops@example.com

# Weekly restore drill: verify Litestream can restore at 04:00 UTC Sunday.
# Passing the database path makes Litestream read the replica settings from /etc/litestream.yml;
# the credentials that file references must be available in the cron environment.
0 4 * * 0 root rm -f /tmp/urgentry-restore-check.db; { litestream restore -o /tmp/urgentry-restore-check.db /var/lib/urgentry/urgentry.db && sqlite3 /tmp/urgentry-restore-check.db "SELECT count(*) FROM events;" && rm /tmp/urgentry-restore-check.db; } || echo "urgentry Litestream restore check FAILED $(date)" | mail -s "urgentry restore check failure" ops@example.com

Storage cost estimate

At a 30-day retention window, a typical small team running urgentry accumulates roughly 2 to 8 GB of backed-up data across the SQLite WAL replica, source map snapshots, and attachment snapshots. On Cloudflare R2 (free egress, $0.015 per GB stored per month), that is $0.03 to $0.12 per month for storage plus negligible API call costs. Even at 50 GB of total backup data, the monthly cost is under $1 on R2.

Restore drill

A backup you have never restored is a hypothesis. The restore drill converts it into evidence. Run this drill at least monthly.

SQLite restore from Litestream

# Step 1: Provision a staging host (can be a $5 VPS or a local VM)
# Step 2: Install urgentry and Litestream on the staging host

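# Step 3: Restore the database from the Litestream replica.
# Copy /etc/litestream.yml to the staging host and export the credentials it references.
# Passing the database path makes Litestream read the replica settings
# (including the R2 endpoint) from that file.
litestream restore \
  -o /var/lib/urgentry/urgentry.db \
  /var/lib/urgentry/urgentry.db

# Alternatively, restore to a specific point in time (if you know when the data diverged).
# Litestream refuses to overwrite an existing output file, so run one command or the other:
litestream restore \
  -o /var/lib/urgentry/urgentry.db \
  -timestamp "2026-05-20T14:30:00Z" \
  /var/lib/urgentry/urgentry.db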

# Step 4: Restore source maps from the most recent restic snapshot
restic restore latest \
  --repo "s3:https://<accountid>.r2.cloudflarestorage.com/urgentry-backups" \
  --target / \
  --include /var/lib/urgentry/sourcemaps

# Step 5: Start urgentry against the restored database
URGENTRY_BASE_URL=https://staging.errors.example.com \
  urgentry serve --role=all

# Step 6: Send a test event and confirm it arrives
curl -X POST https://staging.errors.example.com/api/<project-id>/store/ \
  -H "X-Sentry-Auth: Sentry sentry_version=7,sentry_key=<your-dsn-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "event_id": "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4",
    "platform": "python",
    "level": "error",
    "message": "restore drill test event"
  }'

What a successful restore looks like

After the restore and the test event, open the urgentry UI at the staging URL. You should see:

  • All existing issues from the production database, with correct event counts and timestamps.
  • The test event visible in the issues list as a new or grouped event.
  • Stack traces for historical events that had source maps uploaded: those traces should symbolicate correctly if the source map restore succeeded.
  • No errors in the urgentry log about missing database tables or migration state.

If symbolication fails for a release that should have source maps, the source map restore either missed that release or the timing divergence described earlier applies. Investigate before marking the drill as passed.
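Part of this checklist can be automated before anyone opens the UI. The sketch below checks that the restored file passes integrity_check and that the tables you expect exist and contain rows. The events table name is taken from the cron example earlier in this guide; the function name and any other table names you pass in are assumptions about your schema.

```python
import sqlite3


def check_restored_db(path: str, required_tables=("events",)) -> list:
    """Return a list of problems found in a restored database (empty means it passed)."""
    problems = []
    con = sqlite3.connect(path)
    try:
        status = con.execute("PRAGMA integrity_check;").fetchone()[0]
        if status != "ok":
            problems.append(f"integrity_check reported: {status}")
        tables = {
            row[0]
            for row in con.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
        }
        for name in required_tables:
            if name not in tables:
                problems.append(f"missing table: {name}")
            elif con.execute(f'SELECT count(*) FROM "{name}"').fetchone()[0] == 0:
                problems.append(f"table is empty: {name}")
    finally:
        con.close()
    return problems
```

An empty list only means the database is structurally sound and populated. It does not replace the symbolication check, which still needs a look at real events.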

Disaster recovery vs backup

Backup answers: can I get the data back? Disaster recovery answers: can I get the service back, and in how long?

The two numbers that matter in DR planning:

  • RPO (Recovery Point Objective). How much data loss is acceptable? With Litestream syncing every 10 seconds, RPO is under one minute in practice. With daily pg_dump alone, RPO is up to 24 hours. Write down the RPO your team is willing to accept and verify your backup strategy achieves it.
  • RTO (Recovery Time Objective). How long can the error tracker be down? For a side project, RTO might be 24 hours. For a production system where error tracking is part of an on-call workflow, RTO might be 30 minutes. The RTO number shapes how much infrastructure you need: a pre-configured standby host reduces RTO significantly compared to provisioning a new VPS from scratch.

The staging instance approach is the practical middle ground between a full hot standby and nothing. Keep a staging host that mirrors production configuration, restore to it occasionally for drills, and know that it can become the production host in an emergency within the time it takes to point DNS at it. For most small urgentry deployments, this is the right shape: simple, cheap (one extra $5 VPS), and rehearsed.

For teams with formal DR requirements (SOC 2 availability controls, internal SLAs), write down the RTO and RPO, validate the backup strategy achieves the RPO, and time the restore drill to validate the RTO. That documentation is what auditors ask for.
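Writing the targets down can be as simple as a small record that each restore drill is compared against. The following sketch is illustrative: the Targets class and evaluate function are hypothetical names, and the worst-case RPO is approximated as the interval between backups or syncs.

```python
from dataclasses import dataclass


@dataclass
class Targets:
    rpo_seconds: int  # maximum acceptable data loss
    rto_seconds: int  # maximum acceptable downtime


def evaluate(targets: Targets, backup_interval_seconds: int,
             measured_restore_seconds: int) -> dict:
    """Compare a backup schedule and a timed restore drill against the targets."""
    return {
        # Worst case, the failure lands just before the next backup would have run.
        "rpo_met": backup_interval_seconds <= targets.rpo_seconds,
        "rto_met": measured_restore_seconds <= targets.rto_seconds,
    }
```

With a one-minute RPO and a 30-minute RTO, a 10-second Litestream sync and a 10-minute restore pass both checks, while a daily pg_dump and a one-hour restore fail both.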

The three failure modes nobody documents

These failures appear in postmortems but not in backup guides. All three are common. All three are preventable.

1. Silent corruption of the SQLite file from an OS crash

An OS crash or power loss while SQLite is mid-write can leave the database in a state where it opens without error but returns subtly wrong data. The WAL file may be partially written. The checksum Litestream uses for replication may have accepted a partially-written frame.

The symptom is not a crash on startup. urgentry starts, serves requests, and appears healthy. The symptom is wrong event counts, missing issues, or a query that returns zero results for a date range that should have data. By the time you notice, the corrupt frames may have replicated to your backup.

The mitigation: run PRAGMA integrity_check; against the SQLite database weekly, either as a cron job or as part of the restore drill. Litestream’s WAL-level replication copies whatever is in the WAL, so it does not protect against corruption carried in otherwise valid WAL frames. The integrity_check pragma detects that corruption, so you can restore from a snapshot taken before it appeared.

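# Run weekly, for example from a script that cron calls.
# integrity_check prints a single line "ok" for a healthy database; anything else is a failure.
sqlite3 /var/lib/urgentry/urgentry.db "PRAGMA integrity_check;" | grep -qx "ok" \
  && echo "SQLite integrity check PASSED" \
  || echo "SQLite integrity check FAILED" | mail -s "urgentry integrity failure" ops@example.com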

2. S3 lifecycle policies deleting source maps you still need

S3 lifecycle policies are set-and-forget configurations. The team configures a 90-day expiration on source maps, ships without incident for a year, and then investigates a production regression that surfaced in an error from 95 days ago. The source maps for that release are gone. The stack trace is unreadable. The regression is harder to diagnose.

The fix is not to keep source maps forever. The fix is to set lifecycle retention to match the actual retention window you plan to use for post-incident investigations, not the retention window for recent development work. For most teams that means separate lifecycle rules: 30 days for attachments, 90 days for source maps, and a longer window for source maps associated with releases that are still in production.

urgentry does not currently tag source maps by release in S3 (as of v0.2.11). The practical approach is a longer flat retention on source maps (180 days) and a cron job that deletes source maps for releases that urgentry itself has marked as expired or purged.
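The selection logic for such a cleanup job might look like the sketch below. Everything about the inputs is an assumption: because urgentry does not tag source maps by release, the mapping from object key to release has to come from your own records, and the set of releases still in production is something you supply.

```python
from datetime import datetime, timedelta, timezone


def expired_sourcemap_keys(objects, live_releases, now, max_age_days=180):
    """Return keys that are older than the retention window and not tied to a live release.

    objects: iterable of (key, release, uploaded_at) tuples with timezone-aware datetimes.
    live_releases: set of release identifiers that are still deployed.
    """
    cutoff = now - timedelta(days=max_age_days)
    return [
        key
        for key, release, uploaded_at in objects
        if uploaded_at < cutoff and release not in live_releases
    ]
```

Keeping the decision in a pure function like this makes it easy to dry-run the job and review the list of keys before anything is deleted.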

3. The restored database references source map blobs that are not in the backup

This is the timing divergence problem from the source map section, stated as a failure mode. A team loses their production VPS, restores the SQLite database from a Litestream replica, and starts urgentry. Issues appear. Event history looks intact. Then they open an event and see a raw stack trace with no file names, no line numbers.

The SQLite backup captured the state of the event database as of the restore point. The source map backup ran at a different time, either before or after. The event database contains references to source maps uploaded between the two backup points, and those source maps do not exist in the source map backup.

There is no automated recovery from this. The symbolication gap is permanent for those events. The prevention is synchronizing backup timing and including a post-restore symbolication check in your drill: pick three recent events with stack traces, confirm they symbolicate, and fail the drill if they do not.
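A cheap complement to the manual check is to compare the source map references in the restored database with the files that actually came back. The sketch below takes the references as a plain list because the urgentry schema is not described here; how you query them is an assumption left to you, and missing_blobs is an illustrative name.

```python
from pathlib import Path


def missing_blobs(referenced_paths, root) -> list:
    """Return the referenced paths that do not exist as files under root."""
    base = Path(root)
    return sorted(p for p in referenced_paths if not (base / p).is_file())
```

A non-empty result lists exactly which source maps fell into the gap between the two backups. It checks presence only, so a drill should still confirm that a few real stack traces symbolicate.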

Frequently asked questions

Do I need to back up urgentry if I only use it for development?

For a purely local or throwaway development instance, no. For any instance where the event history has value, including staging environments used for debugging production issues, run at minimum a daily backup. The cost is low and the recovery path is straightforward.

What are the RTO and RPO I should plan for with Litestream?

With Litestream syncing every 10 seconds, RPO in practice is under one minute. RTO depends on provisioning time: under ten minutes on a pre-configured standby host, 30 to 60 minutes from a fresh VPS. Write down both numbers and test them in your restore drill.

Can I use the SQLite .backup command alongside Litestream?

Yes, and you should. Litestream gives you continuous low-RPO replication. The .backup snapshot gives you point-in-time archives you can keep offline or in cold storage. Run both. Use .backup for the snapshot, not a file copy, because a file copy of a WAL-mode database taken while urgentry is running can produce an inconsistent backup.

Do I need to back up source maps separately from the event database?

Yes. Source maps live on the filesystem or in S3, not inside the event database. The event database stores only a reference (path or object key) to each source map. Restoring the event database without restoring the corresponding source maps produces correct event listings but unreadable stack traces. Back both up and restore both together.

How do I verify that a backup actually works?

Restore to a staging host. Download the Litestream snapshot or pg_dump export, start urgentry against it, and send one test event. Confirm the event appears with correct stack trace symbolication. Run this monthly. A backup you have never restored is a hypothesis, not a guarantee.

Sources and further reading

  1. Litestream configuration reference: sync-interval, retention, replica types, and restore flags.
  2. Litestream S3 replica guide: setup for AWS S3, Cloudflare R2, Backblaze B2, and DigitalOcean Spaces.
  3. PostgreSQL pg_dump documentation: format options, --clean, --if-exists, and custom format semantics.
  4. PostgreSQL continuous archiving and PITR: archive_command, base backups, and point-in-time recovery procedure.
  5. restic documentation: backup, restore, forget, and prune commands; S3-compatible backend configuration.
  6. SQLite online backup API: the semantics of .backup vs file copy in WAL mode.
  7. SQLite PRAGMA integrity_check: what the pragma checks and what results indicate corruption.
  8. FSL-1.1-Apache-2.0 license text: the source-available license under which urgentry is distributed.
  9. urgentry compatibility matrix: the 218/218 Sentry API operation coverage referenced in this guide.

Ready to set up backup for urgentry?

urgentry runs as a single binary with SQLite by default, which makes Litestream the lowest-friction continuous backup path available. The full setup described in this guide takes about an hour and costs under $5 per month in backup storage.