Updated June 10, 2026

Rate limits and quotas on a self-hosted error tracker: surviving the buggy deploy

A bad release can fire eighteen thousand events into your tracker in ninety seconds. Four layers can absorb that burst, and only one of them talks back to the SDK in a way that lets the application keep running.

TL;DR

20 seconds. The Sentry SDK is built to be told no. When the backend returns 429 with an X-Sentry-Rate-Limits header, the SDK stops sending for the duration the header specifies. A self-hosted tracker that emits this header correctly survives flood incidents without taking the application down with it.

60 seconds. Four layers can drop events before they reach storage: SDK-side sampling and dedupe, the reverse proxy in front of ingest, the application-layer quota inside the tracker, and the storage-tier queue. Each layer fails differently. SDK drops are intentional. Reverse-proxy drops are invisible to the SDK and trigger blind retries. Application-layer drops with X-Sentry-Rate-Limits are the only ones the SDK can react to cleanly. Queue drops happen after a 200 OK and are the silent killer. Pick the layers deliberately, in that order, and your tracker stays useful during the exact incident when you need it most.

This guide covers the rate-limit envelope contract, the four-layer model, how Sentry self-hosted and urgentry expose each layer, a worked 18k-event flood example, and three antipatterns that turn a flood into an outage.

Why an error tracker needs rate limits at all

The classic failure shape: a 6:47 PM Sunday deploy ships a NullPointerException in a hot code path that runs on every request. Your service handles three hundred requests per second. Averaged over the ninety seconds it takes to roll out and roll back, about two hundred requests per second land on instances running the broken build. That is eighteen thousand events queued for your error tracker.

Without rate limits anywhere in the chain, the tracker sees eighteen thousand POSTs in a minute and a half. The ingest process climbs to a hundred percent CPU. The events queue grows faster than the writer drains it. The Postgres index on fingerprint_hash stalls. The tracker either crashes, falls behind, or starts dropping events silently. The SDK keeps retrying because nothing told it to stop.

The application is now in worse shape than it was before. The deploy is broken and the tracker is also broken, so you cannot see how broken the deploy is. The first thing you do during the incident is restart the tracker, which means the events from the first ninety seconds are gone forever.

Rate limits at every layer exist to prevent this exact failure mode. The point is not to throw events away; the point is to throw them away on purpose, in a way the SDK can hear, so that the application keeps running and the next ninety seconds of events still land.

The X-Sentry-Rate-Limits envelope contract

When a Sentry-compatible backend wants the SDK to stop sending events, it returns a 429 status code with the X-Sentry-Rate-Limits header. The SDK reads the header, marks itself as paused for the indicated duration, and stops attempting deliveries.

The header format is documented in the Sentry SDK spec:

X-Sentry-Rate-Limits: <duration_seconds>:<categories>:<scope>:<reason_code>

A real example, returned when a project hits its per-minute event budget:

HTTP/1.1 429 Too Many Requests
X-Sentry-Rate-Limits: 60:error:project:quota_exceeded
Retry-After: 60

The fields decode as follows:

  • duration_seconds. How long the SDK should hold off. The example above tells the SDK to wait sixty seconds before retrying.
  • categories. Which event categories the limit applies to: error for exceptions, transaction for performance spans, session for release-health events. Multiple categories are separated by semicolons. An empty value means all categories.
  • scope. The level at which the limit is enforced: project, organization, or key (per-DSN). The SDK spec allows SDKs to ignore this field; it mainly tells whoever reads the response where the limit was applied.
  • reason_code. Optional. quota_exceeded, rate_limited, or a custom string the backend chose. SDKs may log it in debug output but do not act on it.

Multiple limits can be returned, comma-separated. 60:error:project:quota_exceeded,300:transaction:organization:rate_limited tells the SDK to pause error events at the project scope for one minute and transaction events at the organization scope for five.
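The format above can be decoded with a few lines of string handling. The following is a minimal sketch rather than any SDK's actual parser: the RateLimit class and parse_rate_limits function are hypothetical names, and splitting categories on semicolons follows the SDK spec's convention for entries that name several categories.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RateLimit:
    duration_seconds: int
    categories: List[str]  # an empty list means "all categories"
    scope: str
    reason_code: Optional[str]


def parse_rate_limits(header: str) -> List[RateLimit]:
    """Parse a comma-separated X-Sentry-Rate-Limits header value."""
    limits = []
    for entry in header.split(","):
        entry = entry.strip()
        if not entry:
            continue
        fields = entry.split(":")
        # Pad so that trailing fields (scope, reason_code) may be absent.
        fields += [""] * (4 - len(fields))
        duration, categories, scope, reason = fields[:4]
        try:
            seconds = int(duration)
        except ValueError:
            continue  # skip a malformed entry instead of failing the whole header
        limits.append(
            RateLimit(
                duration_seconds=seconds,
                categories=[c for c in categories.split(";") if c],
                scope=scope,
                reason_code=reason or None,
            )
        )
    return limits
```

Skipping malformed entries instead of raising is a deliberate choice in this sketch: a client that cannot read one limit should still honor the others.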

The SDK side of this contract is consistent across sentry-python, @sentry/node, sentry-go, and the Java starter. They all parse the header the same way and they all back off without you doing anything in application code.
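The back-off bookkeeping on the client side can be pictured as a map from category to a "paused until" deadline. This is an illustrative sketch, not the code of any of the SDKs named above; the BackoffState class, its method names, and the ALL sentinel are assumptions made for the example.

```python
import time
from typing import Callable, Dict, Iterable, Tuple

ALL = "*"  # hypothetical sentinel meaning "every category"


class BackoffState:
    """Tracks, per category, the time until which sending is paused."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._paused_until: Dict[str, float] = {}

    def apply(self, limits: Iterable[Tuple[int, Iterable[str]]]) -> None:
        """Record (duration_seconds, categories) pairs from a 429 response."""
        now = self._clock()
        for seconds, categories in limits:
            deadline = now + seconds
            # An empty category list pauses everything.
            for category in list(categories) or [ALL]:
                # Keep the later deadline if a longer pause is already active.
                if deadline > self._paused_until.get(category, 0.0):
                    self._paused_until[category] = deadline

    def is_paused(self, category: str) -> bool:
        now = self._clock()
        return any(
            self._paused_until.get(key, 0.0) > now for key in (category, ALL)
        )
```

A transport built this way would call is_paused("error") before each send and skip the request while the pause is active, which is why the application never notices the back-off.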

The four-layer model

Drops can happen in four places between the application and durable storage. Each one has a different cost and a different signal back to the SDK.

Layer 1: SDK-side sampling and dedupe

The cheapest place to drop an event is before it leaves the application. The Sentry SDK has two relevant levers:

  • sample_rate in the SDK init options (sampleRate in the JavaScript SDKs). A float between 0 and 1 that sets the fraction of error events the SDK sends; the rest are dropped uniformly at random. Setting sample_rate=0.1 means the SDK sends roughly one in ten.
  • before_send (beforeSend in the JavaScript SDKs) with a dedupe key. A function that runs on every event and can return None (null in JavaScript) to drop it. The classic dedupe pattern is to compute a hash of (transaction, exception_type, hostname), keep a small in-memory cache of recent hashes, and drop events whose hash you have seen in the last sixty seconds.

SDK-side drops are invisible to the backend. Nothing arrives, nothing is logged, nothing pages. The trade is that you lose visibility before the tracker has a chance to apply policy. Reserve this layer for incidents you have already chosen to silence, not as a default flood defense.
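The dedupe pattern described above might look like the following in Python. It is a sketch under stated assumptions: the function and cache names are hypothetical, the event fields used (transaction, server_name, exception.values[].type) are the usual Sentry event keys, and the extra now parameter exists only so the behavior can be exercised without waiting.

```python
import hashlib
import threading
import time
from typing import Any, Dict, Optional

DEDUPE_WINDOW_SECONDS = 60.0
_last_sent: Dict[str, float] = {}  # dedupe key -> time the event was last let through
_lock = threading.Lock()


def _dedupe_key(event: Dict[str, Any]) -> str:
    values = (event.get("exception") or {}).get("values") or [{}]
    parts = (
        str(event.get("transaction", "")),
        str(values[-1].get("type", "")),
        str(event.get("server_name", "")),
    )
    return hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()


def dedupe_before_send(
    event: Dict[str, Any], hint: Dict[str, Any], now: Optional[float] = None
) -> Optional[Dict[str, Any]]:
    """Return the event to send it, or None to drop a recent duplicate."""
    now = time.monotonic() if now is None else now
    key = _dedupe_key(event)
    with _lock:
        # Evict expired keys so the cache stays small.
        expired = [k for k, t in _last_sent.items() if now - t >= DEDUPE_WINDOW_SECONDS]
        for k in expired:
            del _last_sent[k]
        if key in _last_sent:
            return None  # same (transaction, type, host) sent within the window
        _last_sent[key] = now
        return event


# Wiring it up with sentry-python would look roughly like:
#   sentry_sdk.init(dsn="<your DSN>", sample_rate=1.0, before_send=dedupe_before_send)
```

Because the timestamp is only recorded when an event is let through, a sustained flood still produces one event per key per window instead of going completely silent.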

Layer 2: Reverse proxy

The reverse proxy in front of your tracker (nginx, Caddy, Traefik) can drop requests by IP, by URL pattern, or by raw rate. A typical nginx block for the ingest path:

limit_req_zone $binary_remote_addr zone=ingest:10m rate=500r/s;

location /api/ {
    limit_req zone=ingest burst=1000 nodelay;
    limit_req_status 429;
    proxy_pass http://urgentry_backend;
}

This caps a single source IP at five hundred requests per second with a burst of a thousand. Anything over returns 429 without ever touching the tracker process. The tracker stays healthy.

The cost: nginx does not know how to emit X-Sentry-Rate-Limits with the right scope and category fields. The SDK sees a bare 429 and retries on the next event without the backoff the contract was designed to give it. You stopped the immediate flood, but the SDK is going to throw another flood at you the moment the burst window expires.

Reverse-proxy limits are a sledgehammer. Use them as a backstop against CPU exhaustion under raw connection floods, not as the primary quota enforcement layer.

Layer 3: Application-layer quota

This is the only layer that talks the protocol the SDK was built for. The tracker inspects the incoming envelope, identifies the project and DSN, checks the in-memory rate counter, and either accepts the event or returns 429 with a correctly formed X-Sentry-Rate-Limits header.

Sentry self-hosted exposes this through its SENTRY_QUOTAS and SENTRY_RATELIMITER Python settings, backed by Redis. The defaults are off; you set per-project budgets in the project settings UI or in sentry.conf.py. urgentry exposes the same shape directly in project settings as an events-per-minute and events-per-day budget per project, no Redis dependency required.
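The accept-or-429 decision can be sketched as a fixed-window counter per project. This is not how Sentry self-hosted or urgentry implements it; the ProjectQuota class and its check method are hypothetical, and a production limiter would also handle per-day budgets, per-DSN scopes, and concurrency.

```python
import math
import time
from typing import Callable, Dict, Tuple

WINDOW_SECONDS = 60


class ProjectQuota:
    """Fixed-window events-per-minute budget, counted in memory."""

    def __init__(self, per_minute: int, clock: Callable[[], float] = time.time) -> None:
        self.per_minute = per_minute
        self._clock = clock
        self._counts: Dict[str, Tuple[int, int]] = {}  # project -> (window, count)

    def check(self, project_id: str, category: str = "error") -> Tuple[int, Dict[str, str]]:
        """Return (status_code, response_headers) for one incoming event."""
        now = self._clock()
        window = int(now // WINDOW_SECONDS)
        seen_window, count = self._counts.get(project_id, (window, 0))
        if seen_window != window:
            count = 0  # a new minute started, so the budget resets
        if count < self.per_minute:
            self._counts[project_id] = (window, count + 1)
            return 200, {}
        self._counts[project_id] = (window, count)
        # Tell the SDK to wait until the current window ends.
        remaining = max(1, math.ceil((window + 1) * WINDOW_SECONDS - now))
        return 429, {
            "X-Sentry-Rate-Limits": f"{remaining}:{category}:project:quota_exceeded",
            "Retry-After": str(remaining),
        }
```

Setting the duration to the time left in the window means the SDK resumes exactly when the budget does, so the first events of the next minute are not wasted on a pause that outlives the limit.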

A reasonable starting policy for a single-team self-host:

  • Per project, per minute. Twice your historical peak. A service that normally fires fifty errors a minute in a bad hour gets a hundred. The buggy-deploy flood hits this limit immediately and the SDK backs off.
  • Per project, per day. Three to five times your daily P95. Catches sustained leaks like a logging recursion that fires a steady stream for hours.
  • Per organization, per minute. A safety net at roughly ten times the per-project limit. Catches the case where every project is firing at once.
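The three rules of thumb above reduce to simple arithmetic. As a sketch (the QuotaPolicy type and starting_policy function are hypothetical helpers, not part of any tracker's API):

```python
from dataclasses import dataclass


@dataclass
class QuotaPolicy:
    project_per_minute: int
    project_per_day: int
    org_per_minute: int


def starting_policy(
    peak_per_minute: int, daily_p95: int, daily_multiplier: float = 4.0
) -> QuotaPolicy:
    """Derive starting budgets from historical traffic figures."""
    if not 3.0 <= daily_multiplier <= 5.0:
        raise ValueError("daily_multiplier should be between 3 and 5")
    project_per_minute = 2 * peak_per_minute  # twice the historical peak
    return QuotaPolicy(
        project_per_minute=project_per_minute,
        project_per_day=round(daily_p95 * daily_multiplier),  # 3-5x the daily P95
        org_per_minute=10 * project_per_minute,  # safety net across projects
    )
```

For the service in the example, starting_policy(peak_per_minute=50, daily_p95=5000) gives 100 events per minute and 20,000 per day for the project, with an organization-wide ceiling of 1,000 per minute. The daily P95 of 5,000 is an illustrative figure, not one taken from the guide.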

When the limit is hit, the SDK gets a clean 429, the back-off window starts, the application is not impacted, and the events from the next minute (after the back-off expires) still land. This is the layer that does the work.

Layer 4: Storage-tier queue

The last layer is inside the tracker after it has already returned 200 to the SDK. Events sit in a bounded in-memory or on-disk queue waiting to be written. If the queue saturates, the tracker has to choose between dropping new events, dropping old events, or blocking the writer.

urgentry uses a bounded SQLite WAL queue with a configurable depth (INGEST_QUEUE_DEPTH, default 10,000). When it saturates, the default policy is to drop the oldest waiting event so that the most recent signal survives. Sentry self-hosted uses Kafka for the same role with a much larger buffer but the same eventual failure mode.
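The drop-oldest policy is easy to picture with an in-memory stand-in. The sketch below uses a bounded deque rather than a SQLite WAL queue, and the DropOldestQueue class is a hypothetical illustration; the point is the eviction behavior and the drop counter.

```python
from collections import deque
from typing import Any, Deque


class DropOldestQueue:
    """Bounded FIFO that evicts the oldest waiting event when full."""

    def __init__(self, depth: int) -> None:
        self._items: Deque[Any] = deque(maxlen=depth)
        self.dropped = 0  # expose this as a metric so the drop is not silent

    def put(self, event: Any) -> None:
        if len(self._items) == self._items.maxlen:
            # The deque discards its oldest item on append; count that here.
            self.dropped += 1
        self._items.append(event)

    def get(self) -> Any:
        return self._items.popleft()

    def __len__(self) -> int:
        return len(self._items)
```

Counting evictions matters because the SDK has already received its 200 at this point. A counter or log line on the tracker side is the only place the loss can be observed.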

The danger here is that the SDK already saw 200 OK. As far as it is concerned, the event landed. The drop is silent from the SDK side and only visible in the tracker's own logs. This is the silent-200 failure mode covered in detail in The 200 OK that silently ate your events. Quota at layer 3 exists precisely so that layer 4 never has to drop.

A worked example: 18,000 events in 90 seconds

The Sunday-night flood from the opening, walked through the layers.

At 6:47:00, the deploy goes out. At 6:47:05, the first instance starts firing. By 6:47:30, all instances are at peak, generating roughly two hundred events per second. At 6:48:30 the team notices and rolls back. Total event count: just over eighteen thousand.

  • No rate limits. All eighteen thousand events arrive. The Postgres index on fingerprint_hash stalls at 6:47:45. Ingest CPU hits 100% at 6:47:50. The tracker becomes unreachable at 6:48:10. On restart, twelve thousand events are gone. The team has thirty seconds of visibility into a ninety-second incident.
  • Reverse proxy only. The proxy limit, set below the flood rate, kicks in at 6:47:20. nginx returns bare 429s for the remaining seventy seconds. The SDK retries every two to three seconds in a tight loop, generating another eight thousand wasted requests. The tracker survives but the application's outbound HTTP pool saturates from the retry storm, contributing to its own degradation.
  • Application quota at 200 events per project per minute. The first two hundred events land. At 6:47:30, the project hits its per-minute budget. The tracker returns 429 with X-Sentry-Rate-Limits: 30:error:project:quota_exceeded. All SDKs back off for thirty seconds. At 6:48:00 the window resets and another two hundred events land; the team has already noticed, and the rollback at 6:48:30 lands cleanly. Total events stored: about four hundred, half of them from the start of the incident where the signal matters most. Tracker never goes above 30% CPU.

Layer 3 turns the same incident into a survivable one. Layers 1 and 2 are useful complements but cannot do this job alone.
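A back-of-the-envelope model makes the effect of a per-minute budget concrete. The function below is a hypothetical sketch that assumes a flat event rate and fixed one-minute windows. It ignores the ramp-up, SDK back-off timing, and in-flight overshoot, so treat its output as an order-of-magnitude check rather than a reproduction of the timeline above.

```python
def stored_events(
    flood_start: int, flood_end: int, events_per_second: int, per_minute_budget: int
) -> int:
    """Count events accepted by a fixed-window per-minute budget.

    Times are whole seconds measured from the top of a minute.
    """
    accepted = 0
    used_per_window = {}
    for second in range(flood_start, flood_end):
        window = second // 60
        used = used_per_window.get(window, 0)
        # Accept as many events as the window's remaining budget allows.
        take = min(events_per_second, per_minute_budget - used)
        used_per_window[window] = used + take
        accepted += take
    return accepted
```

Under this model the stored count is bounded by the budget multiplied by the number of minute windows the flood touches, regardless of how large the flood is. Calling stored_events(5, 90, 200, 200) covers two windows; passing a very large budget shows the unthrottled volume for comparison.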

Three antipatterns that turn a flood into an outage

The patterns to avoid, each observed in the wild:

Setting the per-minute quota equal to peak normal traffic. If your peak normal traffic is fifty events per minute and you set the quota at fifty, every legitimate peak hits the limit. The SDK starts backing off during your busiest hour, which is exactly when you cannot afford to lose visibility. Set the quota at twice the peak. Use the per-day budget to catch sustained leaks, not the per-minute one.

Relying on the reverse proxy for the quota. nginx limits stop the bytes but never tell the SDK to slow down. The SDK keeps retrying every couple of seconds for the entire incident. You see the flood end in your tracker logs and assume the SDK has gone quiet, but the application is still spending CPU on retries until the next deploy. Always pair a reverse-proxy limit with an application-layer limit so the SDK gets a clean back-off signal.

Setting the storage queue depth too high. A queue depth of a hundred thousand sounds generous until you notice that the queue is now holding seven minutes of flood data. By the time the writer drains it, the incident is long over and you are wasting CPU writing stale events. Keep the queue at one to two minutes of writer throughput. If you need a larger buffer, that is what the application quota in layer 3 is for, and it costs zero memory and zero disk.
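The sizing rule can be turned into a quick calculation. Both helpers below are hypothetical and assume you have measured your writer's sustained throughput in events per second.

```python
def queue_depth(writer_events_per_second: float, buffer_seconds: float = 90.0) -> int:
    """Depth that holds buffer_seconds of work at the writer's drain rate."""
    return int(writer_events_per_second * buffer_seconds)


def seconds_buffered(depth: int, writer_events_per_second: float) -> float:
    """How long the writer needs to drain a full queue of the given depth."""
    return depth / writer_events_per_second
```

A writer that sustains 100 events per second would get a depth of 9,000 for a ninety-second buffer. Running seconds_buffered against a proposed depth is a fast way to see how stale the oldest queued event could be by the time it is written.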

What urgentry exposes today

urgentry's quota knobs, in order of how often you will touch them:

  • Per-project event budget, per minute and per day. Project settings UI, or urgentry quota set --project=<id> --per-minute=200 --per-day=20000. Defaults are off; set them at install time.
  • Per-DSN budget. Same shape, scoped to a single DSN inside a project. Useful when one service in a multi-service project is the suspected source.
  • Global ingest queue depth. INGEST_QUEUE_DEPTH environment variable, default 10,000. Set it to one to two minutes of your normal write rate.
  • Reverse-proxy templates. The Caddy and nginx templates in Reverse proxy configs ship with conservative limit_req blocks that you can tune.

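The per-project and per-DSN budgets return correctly formed X-Sentry-Rate-Limits headers at the application layer, so the SDK does the right thing without configuration in application code. The queue depth and the reverse-proxy templates are backstops and do not emit the header.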

Frequently asked questions

What is the X-Sentry-Rate-Limits header?

It is a structured response header the Sentry SDK respects when a backend returns 429. The format is duration:categories:scope:reason_code, so 60::organization tells the SDK to back off all event categories at the organization scope for sixty seconds. Every Sentry-compatible backend can emit it, and the SDK will queue or drop events client-side rather than retrying in a tight loop.

Should I rate-limit at the reverse proxy or at the application?

Both, for different reasons. The reverse proxy protects the ingest process from CPU exhaustion under raw connection floods. The application layer enforces business-level quotas like per-project event budgets. Reverse-proxy limits drop bytes without context; application limits emit X-Sentry-Rate-Limits so SDKs back off cleanly.

What happens to events that exceed my quota?

It depends on which layer drops them. Reverse-proxy drops return 429 or 503 with no SDK guidance and the SDK retries on the next event. Application-layer drops return 429 with X-Sentry-Rate-Limits and the SDK holds off for the indicated duration. Queue-saturation drops happen silently inside the backend after a 200 OK has already been returned.

Does urgentry let me set per-project quotas?

Yes. urgentry exposes per-project event-per-minute and event-per-day budgets in project settings, mirroring the Sentry self-hosted shape. Hitting the budget returns 429 with X-Sentry-Rate-Limits set to the remaining window, so the SDK pauses for that window instead of retrying blindly.

Will SDK-side sampling fix a buggy deploy that floods the tracker?

Partially. tracesSampleRate cuts performance spans but does not touch error events. To shed errors at the SDK you need beforeSend with a dedupe key or a sample_rate (sampleRate in the JavaScript SDKs) set in the SDK init options. Both are blunt tools and should be reserved for incidents you have already chosen to silence, not as a default flood defense.

Sources

  1. Sentry SDK rate-limiting spec — the canonical reference for X-Sentry-Rate-Limits header parsing and SDK back-off behavior.
  2. Sentry envelope format — the wire shape the SDK uses and the categories the rate-limit scope refers to.
  3. Sentry self-hosted — reference implementation for SENTRY_QUOTAS and SENTRY_RATELIMITER backed by Redis.
  4. nginx limit_req module — documentation for the limit_req_zone and burst directives used in the reverse-proxy section.
  5. The 200 OK that silently ate your events — companion guide on the layer-4 failure mode that quota at layer 3 exists to prevent.
