Updated April 24, 2026

Continuous profiling in 2026.

Continuous profiling moved from “interesting experiment” to production standard over the past two years. pprof converged as the universal format. eBPF profilers reached stability. The OTel profiling SIG advanced toward a first stable signal. This guide covers what continuous profiling is, how the formats and tools evolved, which open-source backends lead the space, and where urgentry sits in a stack that delegates profiling to Pyroscope or Parca.

TL;DR

20 seconds. Continuous profiling captures stack traces from a running application at a fixed sample rate, stores the profiles over time, and lets you compare CPU, memory, or goroutine state between any two time windows. The dominant open-source backends are Pyroscope (now under Grafana stewardship) and Parca (from Polar Signals). Both write and read the pprof format. Both ship eBPF agents for zero-code-change profiling.

60 seconds. Ad-hoc profiling captures one profile on demand, usually during an active incident. Continuous profiling captures profiles on a schedule, every 10 to 60 seconds, and stores them alongside your metrics and traces. The storage cost is low because profiles compress well and sampling rates are modest. The payoff is that you can answer “what was the CPU distribution two minutes before that latency spike?” without having instrumented for it in advance. The pprof format, originally a Go toolchain artifact, became the lingua franca for profiling data. eBPF profilers sample call stacks from inside the Linux kernel, with no application changes. In-process SDKs instrument the runtime from inside the process. Both produce pprof output.

Where urgentry sits. urgentry handles errors and traces. It does not do profiling, and profiling is not a near-term roadmap item. If you run urgentry for error tracking and need continuous profiling, deploy Pyroscope or Parca alongside it. Connect the two tools via a shared deploy ID or release tag so you can navigate from an urgentry error to the Pyroscope flame graph covering the same incident window.

What continuous profiling is

A profiler collects stack traces from a running process by interrupting it at regular intervals and recording the call stack at that moment. Each sample captures which functions are on the stack and how deep the call chain is. Aggregate thousands of samples and you see which functions consume the most CPU time, which callers allocate the most memory, and which goroutines or threads pile up.
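The aggregation step can be sketched in a few lines of Go. This is a toy model, not how any particular profiler stores data: the sample type, the function names, and the four hard-coded stacks are all made up for illustration. It counts how often each function was the running leaf (self) and how often it appeared anywhere on the stack (total).

```go
package main

import (
	"fmt"
	"sort"
)

// sample is one captured call stack, listed from the outermost caller
// (root) to the function that was running (leaf).
type sample []string

// counts holds how often a function was the leaf (Self) and how often
// it appeared anywhere on the stack (Total).
type counts struct {
	Self, Total int
}

// aggregate folds stack samples into per-function counts.
func aggregate(samples []sample) map[string]counts {
	out := make(map[string]counts)
	for _, s := range samples {
		seen := make(map[string]bool) // count recursive frames once per sample
		for i, fn := range s {
			c := out[fn]
			if !seen[fn] {
				c.Total++
				seen[fn] = true
			}
			if i == len(s)-1 {
				c.Self++
			}
			out[fn] = c
		}
	}
	return out
}

func main() {
	// Hypothetical samples; a real profiler would collect thousands.
	agg := aggregate([]sample{
		{"main", "handle", "parse"},
		{"main", "handle", "parse"},
		{"main", "handle", "render"},
		{"main", "gc"},
	})

	names := make([]string, 0, len(agg))
	for fn := range agg {
		names = append(names, fn)
	}
	sort.Slice(names, func(i, j int) bool { return agg[names[i]].Total > agg[names[j]].Total })

	for _, fn := range names {
		fmt.Printf("%-8s self=%d total=%d\n", fn, agg[fn].Self, agg[fn].Total)
	}
}
```

A flame graph is a visual form of the same totals: the width of each box is the function's total count, and the part not covered by its children is its self count.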

The distinction between ad-hoc and continuous profiling is in how and when the collection happens. Ad-hoc profiling is deliberate: an engineer runs go tool pprof or py-spy against a live process for 30 seconds when an incident is already underway. That captures the state during the window you chose, which may or may not be when the problem is worst. Continuous profiling runs a collector permanently, sampling every process on a schedule and shipping the results to a storage backend. The profile for any past time window is available without any prior decision to collect it.

Three profile types cover the vast majority of production use cases:

  • CPU profiles. Measure on-CPU time consumed by each function in the call graph; some profilers also offer a wall-clock mode that includes time spent waiting. A CPU profile answers: where is the program spending its time?
  • Memory (heap) profiles. Measure allocations in bytes or object count, weighted by the code path that allocated them. A heap profile answers: who is allocating memory, and how much?
  • Goroutine or thread profiles. Capture the state of every goroutine or thread at a point in time. A goroutine profile answers: which call paths are blocked, and on what?
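In Go, the heap and goroutine profiles are point-in-time snapshots that the runtime exposes through pprof.Lookup. A minimal sketch, with snapshot as a hypothetical helper name:

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
)

// snapshot writes the named runtime profile ("heap", "goroutine", ...)
// in pprof's compressed protobuf form (debug=0).
func snapshot(name string) ([]byte, error) {
	p := pprof.Lookup(name)
	if p == nil {
		return nil, fmt.Errorf("unknown profile %q", name)
	}
	var buf bytes.Buffer
	if err := p.WriteTo(&buf, 0); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	for _, name := range []string{"heap", "goroutine"} {
		b, err := snapshot(name)
		if err != nil {
			fmt.Println(name, "failed:", err)
			continue
		}
		fmt.Printf("%s profile: %d bytes\n", name, len(b))
	}
}
```

Unlike the CPU profile, these need no start and stop: each call reports the state at the moment it runs.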

Sampling-based profilers (the dominant type) interrupt the process at a fixed frequency, typically 10 to 100 samples per second. The overhead is proportional to sample rate and stack depth, usually under 1% CPU for rates at or below 100 Hz. Instrumented (or tracing-based) profilers record every function entry and exit, which produces exact counts but imposes 5 to 20% overhead and is only practical for short captures. Continuous profiling infrastructure uses sampling.

The pprof format and why everything converged on it

Go shipped with a profiler from its first public release. The Go toolchain included go tool pprof, a command-line tool for capturing and visualizing CPU and memory profiles. That tool consumed a binary format called pprof, defined as a protobuf schema in github.com/google/pprof.

The pprof format encodes a profile as a set of sample records, each containing a stack of location IDs, a set of value pairs (CPU nanoseconds, allocation bytes, allocation count), and a timestamp range. Locations map to functions, which map to binary symbols. A profile file is self-contained: it carries the symbol table needed to display the flame graph without access to the original binary.
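The relationships between samples, locations, and functions are easier to see as types. The structs below are a deliberately simplified model written for this guide. They are not the real protobuf schema, which has more fields (mappings, labels, a string table) and stores names as string-table indices.

```go
package main

import "fmt"

// Simplified, illustrative mirror of the pprof message shapes.
type Function struct {
	ID   uint64
	Name string
}

type Location struct {
	ID         uint64
	FunctionID uint64
}

type Sample struct {
	LocationIDs []uint64 // leaf first, as in pprof
	Values      []int64  // one value per sample type, e.g. CPU nanoseconds
}

type Profile struct {
	SampleTypes []string
	Samples     []Sample
	Locations   map[uint64]Location
	Functions   map[uint64]Function
}

// stack resolves a sample's location IDs to function names.
func (p Profile) stack(s Sample) []string {
	names := make([]string, 0, len(s.LocationIDs))
	for _, id := range s.LocationIDs {
		loc := p.Locations[id]
		names = append(names, p.Functions[loc.FunctionID].Name)
	}
	return names
}

func main() {
	p := Profile{
		SampleTypes: []string{"cpu/nanoseconds"},
		Functions: map[uint64]Function{
			1: {ID: 1, Name: "main.main"},
			2: {ID: 2, Name: "main.handle"},
		},
		Locations: map[uint64]Location{
			10: {ID: 10, FunctionID: 1},
			20: {ID: 20, FunctionID: 2},
		},
		Samples: []Sample{
			{LocationIDs: []uint64{20, 10}, Values: []int64{30_000_000}},
		},
	}
	for _, s := range p.Samples {
		fmt.Println(p.stack(s), s.Values)
	}
}
```

Because samples refer to locations by ID, a stack that occurs a thousand times costs a thousand short ID lists, not a thousand copies of the function names. That indirection is a large part of why profiles stay small.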

Several forces converged to make pprof the default format for the broader profiling ecosystem. First, Go’s adoption in backend infrastructure meant that many teams already understood the format and the tooling. Second, the format is well-specified and has a stable protobuf definition, which made it easy for non-Go profilers to write output that Go tooling could visualize. Third, Pyroscope and Parca both chose pprof as their storage format, which created a network effect: any profiler that writes pprof works with both backends.

eBPF profilers sealed the convergence. Tools like Parca Agent, the Pyroscope eBPF agent, and Polar Signals’ predecessor all adopted pprof output for their cross-language profiles. A single eBPF profiler running on a Linux node captures stack traces from Go, Java, Python, and Rust processes on that node and writes a pprof file per process. Backends that accept pprof files accept profiles from all those runtimes without format negotiation.

The OTel profiling SIG took the same position. The OTLP profiles signal, still in development as of mid-2026, embeds the pprof format as the profile payload within the OTel envelope. Rather than inventing a new profile representation, the SIG wrapped pprof in the OTel data model. The implication for operators: if you adopt OTel profiles, the underlying data is still pprof, and existing tooling that understands pprof can process it.

eBPF profilers vs in-process profilers

The profiler itself can live in two places: outside the application process (eBPF, in the Linux kernel) or inside the application process (a language SDK or agent embedded in the runtime).

eBPF profilers

eBPF (extended Berkeley Packet Filter) is a Linux kernel subsystem that allows sandboxed programs to run in kernel space in response to events. Profiling agents use eBPF to attach to perf events: the kernel delivers a callback at every timer interrupt, and the eBPF program captures the user-space call stack of whichever process was running at that moment.

The properties that make eBPF profilers attractive for continuous profiling:

  • Zero application changes. An eBPF profiler runs as a DaemonSet on each Kubernetes node and captures profiles from every process on that node. No SDK, no sidecar inside the pod, no recompilation.
  • Cross-language visibility. One eBPF agent profiles Go, Java, Python, and Rust processes simultaneously. In-process profilers are language-specific.
  • Low overhead. eBPF programs execute in kernel space at sample points. The agent does not run in your application’s address space and does not compete for the same CPU cores in the same way.
  • Caveat: frame unwinding complexity. Native frame unwinding from the kernel requires DWARF debug info or frame pointers. Go binaries include frame pointers by default (since Go 1.7 on amd64 and Go 1.12 on arm64). JVM and Python runtimes require interpreter-specific unwinding logic that eBPF agents implement with varying success.

Parca Agent and the Pyroscope eBPF agent both operate this way. They run outside the application, require no changes to the application container, and produce pprof output per-process.

In-process profilers

In-process profilers embed a sampling agent inside the application runtime. For Go, this means starting a goroutine that calls the runtime profiling API (runtime/pprof or the pprof HTTP handler) on a schedule and shipping the resulting pprof bytes to a backend. For Python, it means a thread that reads every thread's current frame (for example via sys._current_frames()) at regular intervals. For JVM languages, it means a JVMTI agent that samples the JVM thread stacks.

The trade-offs compared to eBPF:

  • Simpler stack unwinding. The runtime already knows its own stack layout, so in-process profilers get accurate frames with no DWARF dependency.
  • Language-specific. Each runtime needs its own SDK. A Go service and a Python service require separate agents.
  • SDK dependency. Adding an in-process profiler adds a library dependency and a configuration step per service. This is not always feasible across a heterogeneous fleet.
  • Richer runtime metadata. The in-process profiler can capture Go-specific data: goroutine IDs, block profiles, mutex contention profiles. An eBPF profiler cannot read these from the kernel.
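The Go-specific profiles in the last point are off by default and are switched on from inside the process. A minimal sketch using the standard runtime and runtime/pprof packages follows. The contentionProfiles helper is a made-up name, and a rate of 1 records every event, which is fine for a demo but usually too costly for production.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
	"sync"
)

// contentionProfiles turns on block and mutex sampling, runs work, and
// returns both profiles as pprof bytes keyed by profile name.
func contentionProfiles(work func()) (map[string][]byte, error) {
	runtime.SetBlockProfileRate(1)     // record every blocking event (demo setting)
	runtime.SetMutexProfileFraction(1) // record every mutex contention event
	defer runtime.SetBlockProfileRate(0)
	defer runtime.SetMutexProfileFraction(0)

	work()

	out := make(map[string][]byte)
	for _, name := range []string{"block", "mutex"} {
		var buf bytes.Buffer
		if err := pprof.Lookup(name).WriteTo(&buf, 0); err != nil {
			return nil, err
		}
		out[name] = buf.Bytes()
	}
	return out, nil
}

func main() {
	var mu sync.Mutex
	profiles, err := contentionProfiles(func() {
		var wg sync.WaitGroup
		for i := 0; i < 8; i++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				mu.Lock()
				defer mu.Unlock()
			}()
		}
		wg.Wait()
	})
	if err != nil {
		fmt.Println("profile failed:", err)
		return
	}
	for name, b := range profiles {
		fmt.Printf("%s profile: %d bytes\n", name, len(b))
	}
}
```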

Pyroscope ships both. Its eBPF agent handles the no-code-change case; its Go, Python, Java, and Ruby SDKs handle the in-process case for teams that want richer data or work on runtimes where eBPF frame unwinding is incomplete.

Where the tools sit in 2026

The continuous profiling landscape has consolidated since 2023. Two open-source backends, Pyroscope and Parca, now dominate; Grafana Phlare and Sentry profiling complete the picture.

Pyroscope

Pyroscope started as an independent open-source project and was acquired by Grafana Labs in 2023. It remains open source under the AGPL license (server) and Apache license (SDKs). Grafana merged Grafana Phlare, its own short-lived profiling backend, into Pyroscope in late 2023. The result is a single Pyroscope codebase that Grafana ships as a standalone tool and as part of Grafana Cloud.

Pyroscope’s architecture is a pull-and-push hybrid. In push mode, language SDKs or the eBPF agent send pprof payloads to the Pyroscope server on a schedule. In pull mode, the server scrapes the /debug/pprof HTTP handler that Go’s net/http/pprof exposes. Both paths store profiles in Pyroscope’s columnar store with label-based indexing similar to Prometheus. The query interface uses PromQL-like label selectors against a flame graph response format.

Parca

Parca was incubated by Polar Signals, a company focused entirely on continuous profiling. Polar Signals runs a commercial cloud service on top of Parca. The Parca server and Parca Agent are fully open source under the Apache license. Parca Agent is an eBPF profiler that runs as a DaemonSet; Parca Server accepts pprof profiles via a gRPC API and stores them in a column-oriented format using Apache Parquet.

Parca has invested more deeply than Pyroscope in the open-source standardization path. Polar Signals engineers drove much of the work on the OTel profiling SIG and on the pprof format improvements in github.com/google/pprof. If you care about the future OTel profiles signal, Parca is the implementation closest to that standard.

Grafana Phlare (now merged)

Grafana Phlare launched in late 2022 as Grafana’s independent profiling backend. After the Pyroscope acquisition, Grafana merged Phlare into Pyroscope. There is no longer a separate Phlare product. Teams that ran Phlare in 2023 migrated to Pyroscope. If you see Phlare references in older documentation, read them as pointing to what is now Pyroscope.

Sentry profiling

Sentry added profiling as a feature of its SaaS product in 2023. As of mid-2026, the feature is not generally available in the Sentry self-hosted distribution; it requires a Sentry SaaS plan. For teams that self-host error tracking, Sentry profiling is not an option unless they also adopt Sentry SaaS for the profiling signal, which splits their tooling across two products. The per-profile pricing on Sentry SaaS adds meaningful cost at scale.

The OTel profiling SIG status

The OpenTelemetry profiling SIG has been active since 2022. The goal is to define a stable OTel signal for profiling data, the same way the project defined stable signals for traces, metrics, and logs. As of mid-2026, the state is:

  • OTLP profiles spec: published as a working draft. The spec defines the ExportProfilesServiceRequest protobuf message, which embeds pprof profile bytes alongside OTel resource attributes. The signal is in the proto repository under an experimental path.
  • Go SDK: the OTel Go SDK has a profiling exporter in beta. It wraps the runtime/pprof API and ships profiles to an OTLP endpoint on a schedule. Not yet part of the stable SDK.
  • Collector support: the OTel Collector Contrib repository has a profilesreceiver and profilesexporter in alpha. Configuration exists but the API is not stable.
  • What is stable: nothing in the OTel profiling signal is marked stable as of mid-2026. The spec is experimental, the SDKs are beta, and the Collector components are alpha.

The practical implication: do not build production pipelines that depend on OTel profiling signal stability yet. Use Pyroscope or Parca natively. Both backends already accept pprof directly and have stable APIs. When the OTel profiles signal reaches stability, both backends will add OTLP ingest, and the migration path from direct pprof to OTLP will be an exporter configuration change, not an instrumentation rewrite.

The SIG’s work is worth watching because it sets the eventual standard. Teams that already label their profiles with OTel-compatible resource attributes (service.name, service.version, deployment.environment) will have a smoother path to the unified pipeline when the signal stabilizes.

When you need continuous profiling

Four situations make continuous profiling the right tool and not a nice-to-have.

Latency spikes you cannot reproduce

A service shows p99 latency spikes every 90 minutes. The spikes last 30 seconds. By the time an engineer connects a profiler, the spike is over. Continuous profiling captures the CPU profile for the 30-second window automatically. You open the flame graph for that time range and see that 70% of CPU went to a regex compilation call inside a hot path that was not cached. The spike was deterministic; you just needed the profile from the right 30 seconds.
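In Go, this class of bug and its fix look roughly like the sketch below. The pattern and function names are invented for illustration. The point is the difference between compiling on every call and compiling once.

```go
package main

import (
	"fmt"
	"regexp"
)

// Compiled once at package initialization and reused by every request.
var orderID = regexp.MustCompile(`^ord-[0-9]{6}$`)

// validSlow recompiles the pattern on every call: the uncached hot path
// that shows up as regexp compilation frames in the flame graph.
func validSlow(id string) bool {
	return regexp.MustCompile(`^ord-[0-9]{6}$`).MatchString(id)
}

// validFast reuses the package-level compiled pattern.
func validFast(id string) bool {
	return orderID.MatchString(id)
}

func main() {
	for _, id := range []string{"ord-123456", "ord-12"} {
		fmt.Println(id, validSlow(id), validFast(id))
	}
}
```

Both functions return the same answers, so no test or error report distinguishes them. Only the CPU profile does.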

Memory growth in steady state

A service’s RSS grows by 50 MB per hour with no change in traffic. Heap profiles captured continuously show the growth path: a cache struct that accumulates entries with no eviction policy, allocated by a function called from three different call sites. The continuous heap profile shows you when the growth started (correlated with a deploy) and which call site contributes most.

Goroutine leaks

The goroutine count on a Go service climbs from 200 to 12,000 over 6 hours before the service OOMs. A continuous goroutine profile shows you the stack trace common to 11,800 of those goroutines: they are all blocked on a channel send with no receiver. The continuous timeline shows you that the goroutine count started climbing exactly 2 minutes after a specific deploy.

CPU regression after a deploy

CPU usage on a service increases 40% after a deploy. The flame graph comparison between the pre-deploy and post-deploy time windows shows a new function appearing in the hot path: a JSON marshaling call that was not there before. A code change introduced unnecessary serialization on every request. The profile makes the regression visible in 5 minutes; a code review alone might take hours.

When you don’t need continuous profiling

Continuous profiling solves specific problems. For other situations, the operational overhead is not worth it.

Small services with predictable load

A service handles 5 requests per second with stable CPU usage and no growth in memory. The chance of a latency spike or resource regression that requires a profile to diagnose is low. The cost of deploying and operating a profiling backend for that service, plus the storage for profiles, exceeds the expected diagnostic value. Deploy profiling when you have a signal that something is wrong.

Errors-first investigations

An exception is thrown on a specific code path. The error tracker shows the exception type, the message, the stacktrace, and the trace context. The cause is visible in the error itself: a nil pointer dereference, a failed database query, a validation error. Adding profiling data to this investigation adds noise. The profiler shows you time distribution; it does not show you why a specific call returned an error. Use error tracking for error investigations; reach for profiling when the problem is latency or resource consumption with no clear error signal.

Where urgentry sits in this stack

urgentry is an errors and traces backend. It accepts Sentry SDK events and OTLP/HTTP traces, groups exceptions into issues, and links issues to distributed traces. Profiling is not in scope for urgentry in 2026, and it is not on the near-term roadmap.

The intended stack for a team that uses urgentry for error tracking and needs profiling:

  • urgentry for errors and traces. One Go binary at 52 MB resident. Accepts Sentry SDK and OTLP/HTTP.
  • Pyroscope or Parca for continuous CPU, memory, and goroutine profiles. Deploy the eBPF agent as a DaemonSet for zero-code-change coverage, or add the language SDK for richer runtime data.

Connect the two tools via the deploy or release ID. urgentry links errors to the deploy version via service.version on the OTLP resource or the release tag on the Sentry SDK event. Configure your Pyroscope or Parca agent to attach the same version label. During an incident, you can navigate from an urgentry error (which shows the error, stack, and trace context) to Pyroscope (which shows the CPU flame graph for that service version during the same time window) without rebuilding context.
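One way to keep the two labels from drifting is to derive both from a single release string at startup. The sketch below assumes the Sentry-style name@version form for the release. The helper names and the version and env tag keys are choices made for this example, not requirements of either tool.

```go
package main

import (
	"fmt"
	"strings"
)

// versionFromRelease extracts the version from a release string of the
// form "name@version". Strings without "@" are returned unchanged.
func versionFromRelease(release string) string {
	if i := strings.LastIndex(release, "@"); i >= 0 {
		return release[i+1:]
	}
	return release
}

// profilerTags builds the label set to hand to the profiling agent so
// that it matches the release reported to the error tracker.
func profilerTags(release, env string) map[string]string {
	return map[string]string{
		"version": versionFromRelease(release),
		"env":     env,
	}
}

func main() {
	release := "image-service@1.4.2" // the same string passed to the Sentry SDK
	fmt.Println(profilerTags(release, "production"))
}
```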

This is the correct separation of concerns. urgentry does error triage well. Pyroscope and Parca do profiling well. The tools talk via shared metadata, not via direct integration. Neither side needs to understand the other’s data model.

A worked example: Go service with urgentry and Pyroscope

The scenario: a Go HTTP service that processes image uploads. You instrument it with urgentry for error tracking and Pyroscope for CPU profiling. An incident occurs: p99 latency on the upload endpoint rises from 200 ms to 4 seconds. No errors appear in urgentry. You need the profiling signal.

Step 1: instrument the service

Configure the Sentry SDK for urgentry error tracking:

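import (
    "log"

    "github.com/getsentry/sentry-go"
)

// Send errors to urgentry; the release string carries the deploy version.
if err := sentry.Init(sentry.ClientOptions{
    Dsn:         "https://key@urgentry.example.com/1",
    Release:     "image-service@1.4.2",
    Environment: "production",
}); err != nil {
    log.Fatalf("sentry.Init: %v", err)
}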

Add the Pyroscope Go SDK for CPU and memory profiles:

import (
    "log"

    "github.com/grafana/pyroscope-go"
)

// Start the in-process profiler; the version tag matches the Sentry release.
if _, err := pyroscope.Start(pyroscope.Config{
    ApplicationName: "image-service",
    ServerAddress:   "http://pyroscope.example.com:4040",
    Tags: map[string]string{
        "version": "1.4.2",
        "env":     "production",
    },
    ProfileTypes: []pyroscope.ProfileType{
        pyroscope.ProfileCPU,
        pyroscope.ProfileAllocObjects,
        pyroscope.ProfileAllocSpace,
        pyroscope.ProfileInuseObjects,
        pyroscope.ProfileInuseSpace,
        pyroscope.ProfileGoroutines,
    },
}); err != nil {
    log.Fatalf("pyroscope.Start: %v", err)
}

Both tools now receive data from the same service with the same version tag. urgentry groups any exceptions by type and release. Pyroscope stores CPU and memory profiles every 10 seconds, labeled with version=1.4.2.

Step 2: the incident

Latency rises. urgentry shows no new errors: no panics, no nil pointer dereferences, no failed downstream calls. The service is not throwing exceptions; it is just slow.

In urgentry, you check the traces view. Spans for the upload endpoint show the overall duration but no error status. The spans show the handler spent 3.8 seconds in a child span named process-image. The span carries no exception event. The function ran but was slow.

Step 3: the profile answers it

Open Pyroscope. Select the image-service application, the version=1.4.2 tag, and the CPU profile for the incident time window. The flame graph shows 92% of CPU time in one function: image/jpeg.Decode. Drill into callers: the upload handler calls image/jpeg.Decode on the request body with no size limit, so a 400 MB JPEG upload triggers a full in-memory decode.

The fix is a size guard before decoding. urgentry confirms no errors existed (the bug was not a crash, it was a performance cliff). Pyroscope shows exactly where the CPU went. Neither tool alone is sufficient; both together resolve the incident in under 10 minutes.

After the fix, compare the Pyroscope flame graph between version=1.4.2 and version=1.4.3. The image/jpeg.Decode call drops from 92% to 18% of CPU. The fix worked and the data confirms it.

Frequently asked questions

Does urgentry do continuous profiling?

No. urgentry handles errors and traces. For continuous profiling, deploy Pyroscope or Parca. Connect the two tools by tagging both with the same release or version identifier, so you can correlate an urgentry error with the Pyroscope flame graph for the same service version and time window.

What is the difference between pprof the tool and pprof the format?

go tool pprof is the profiler and visualizer bundled with the Go toolchain. The pprof format is the protobuf-based profile serialization defined in github.com/google/pprof. eBPF profilers like Parca Agent write the pprof format without using the Go tool. The format is language-agnostic; the tool is Go-specific. When you see “pprof” in the context of Pyroscope or Parca, it almost always refers to the format, not the Go tool.

Can I run an eBPF profiler without changing my application code?

Yes. The Parca Agent and the Pyroscope eBPF agent both run as Kubernetes DaemonSets and sample every process on the node from kernel space. No SDK, no sidecar inside your pod, no recompilation. The requirement is Linux kernel 4.18 or later and, for Go services, frame pointers enabled (the default since Go 1.7 on amd64 and Go 1.12 on arm64).

Is the OTel profiling signal stable in 2026?

Not yet. The OTLP profiles spec is experimental, the Go SDK implementation is in beta, and the OTel Collector profiling components are in alpha as of mid-2026. Do not build production pipelines that depend on this signal’s stability. Use Pyroscope or Parca natively. Both will add stable OTLP ingest when the spec reaches stability, and the migration path will be an exporter configuration change, not an instrumentation rewrite.

When should I reach for profiling instead of error tracking?

Reach for profiling when you have a latency or resource regression that does not produce error events. Error tracking tells you that something threw an exception and where in the code it happened. Profiling tells you where CPU time or memory went. Start with errors and traces; add profiling when the root cause is not visible in them. The worked example above is the canonical case: a performance cliff with no exception, where only the flame graph reveals the cause.

Sources

  1. Pyroscope documentation: Grafana’s continuous profiling backend; covers eBPF agent setup, language SDK configuration, and the Pyroscope query API.
  2. Parca documentation: Polar Signals’ open-source continuous profiling project; covers Parca Agent eBPF profiling and the Parca Server storage format.
  3. OTel profiling SIG, OTLP profiles spec: the experimental protobuf definition for the OTel profiles signal; tracks status of the ExportProfilesServiceRequest message.
  4. github.com/google/pprof: the canonical pprof format definition and the pprof CLI tool; the protobuf schema that Pyroscope, Parca, and eBPF profilers all write.
  5. Rice, Liz. Learning eBPF. O’Reilly Media, 2023: a technical reference for eBPF, including the perf event sampling mechanism that eBPF profilers build on.
  6. FSL-1.1-Apache-2.0 license text: the license under which urgentry is source-available.
  7. urgentry compatibility audit: the full account of urgentry’s Sentry SDK and OTLP compatibility coverage.

Handle errors with urgentry. Profile with Pyroscope or Parca.

urgentry is the errors and traces backend: one Go binary at 52 MB resident, accepting Sentry SDK events and OTLP/HTTP traces. Add Pyroscope or Parca for profiling and link both tools via a shared release tag. Each tool does one job well.