Distributed tracing

BugSnag’s end-to-end performance monitoring solution.

Distributed Tracing allows you to track performance across your entire system. By tracing requests from user interactions in your client applications right through your distributed system, you can:

  • Establish the root cause of performance bottlenecks, such as slow running database queries or inefficient query patterns
  • Analyze how backend errors might be impacting your end-user experience
  • Visualize the paths that requests take through your distributed services

Distributed Tracing in BugSnag is built on top of OpenTelemetry, which is an open-source observability framework. OpenTelemetry SDKs can be used to instrument your applications and send telemetry data – traces in the case of BugSnag – to a backend system for processing and storage.

Getting started

A trace shows you the path of a request through your application(s) and often starts with a user interaction – for example in a website or mobile app. If you have already instrumented your client-side applications with a BugSnag Performance SDK then you are already some of the way there!

OpenTelemetry is a widely adopted observability solution, so you may already have instrumented your server applications with its SDKs. If so, it is straightforward to modify your existing SDK configuration to export traces to BugSnag. Otherwise, see our quick start integration guides to find out how to add OpenTelemetry SDKs to your server applications.

A span is the building block for traces, representing a unit of work with a start and end time from which a duration can be calculated. A trace is hierarchical: all spans with the same trace_id are part of the same trace, but spans can also be parents of other spans. For example, a database query might be made as part of an incoming HTTP request.

In order to maintain a trace across services you must ensure that its context gets propagated. OpenTelemetry SDKs default to doing this automatically, but BugSnag Performance SDKs require some simple configuration to enable it. Read our guide on enabling trace propagation for more information.

Once you have configured your applications to send data to BugSnag you will likely want to refine your configuration to ensure that you make the most of BugSnag’s functionality. We have plenty of guidance in our docs to help you with this, but we recommend reading the following as a starting point:

Exporting traces to BugSnag

Trace exporters are used in OpenTelemetry SDKs to send traces from your application to a trace consumer, such as BugSnag. Several exporters can be configured if required, to send traces to multiple backends.
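
As a minimal sketch of configuring more than one exporter, some OpenTelemetry SDKs accept a comma-separated list in the OTEL_TRACES_EXPORTER environment variable. Support for multiple values varies by SDK, so treat this as an assumption to check against your SDK's documentation. The console exporter is used here only as an illustrative second destination.

```shell
# Assumption: your SDK accepts a comma-separated exporter list (not all do).
# "otlp" sends traces to the configured OTLP endpoint (for example BugSnag);
# "console" also prints spans to standard output for local debugging.
export OTEL_TRACES_EXPORTER="otlp,console"
```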

For more advanced use cases or larger systems with higher volumes of traffic, we recommend setting up an OpenTelemetry collector. Collectors are powerful agents for receiving all of your trace data and making decisions – such as which traces to sample and where to send the data. Check out our docs on using collectors with BugSnag for setup guidance.

For smaller apps, or for users that are just getting started with OpenTelemetry, we recommend sending trace data directly to BugSnag.

Most OpenTelemetry SDKs can be configured to export traces to BugSnag either using environment variables or in your app code. We provide platform-specific guidance for this in our integration guides.

Traces can be received by BugSnag via either gRPC or HTTP (protobuf or JSON). In most cases the simplest way to send traces to BugSnag is to set the protocol and your BugSnag project’s dedicated OpenTelemetry endpoint as environment variables before running your OpenTelemetry-instrumented app. Use one of the following pairs, not both:

# Option 1: gRPC
export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="https://<PROJECT_API_KEY>.otlp.bugsnag.com:4317"

# Option 2: HTTP (protobuf)
export OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="https://<PROJECT_API_KEY>.otlp.bugsnag.com:4318/v1/traces"

For more configuration options see the OpenTelemetry OTLP Exporter Configuration documentation.

Spans sent from your server will consume your unmanaged span quota in BugSnag. Once your unmanaged quota is exhausted, you will not have any further span coverage for that day, so it is important that you set an appropriate sampling rate.

You can configure the proportion of your organization’s spans that are available for unmanaged usage under Organization settings > Performance > Unmanaged span quota. You can find more info in our unmanaged quota docs.
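
A sampling rate can usually be set with the standard OpenTelemetry sampler environment variables. The sketch below assumes your SDK honors these variables and uses an illustrative ratio of 0.1 (roughly 10% of traces); choose a value that fits your own traffic and quota.

```shell
# Sample a fixed fraction of new traces; spans that arrive with a parent
# follow the parent's sampling decision so traces are not split.
export OTEL_TRACES_SAMPLER="parentbased_traceidratio"

# Illustrative value only: keep about 1 in 10 traces.
export OTEL_TRACES_SAMPLER_ARG="0.1"
```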

Span batch size

The maximum payload size for BugSnag’s OpenTelemetry trace endpoints is 1MB. OpenTelemetry payload sizes can be controlled via the batch size in the SDK or collector configuration.

The appropriate batch size depends largely on the number of attributes added to your spans, so you may need to experiment to find the right setting. The batch size can be controlled by setting an environment variable; we generally recommend 200 as a good starting point, which is lower than the default of 512 spans per batch.

export OTEL_BSP_MAX_EXPORT_BATCH_SIZE=200

The number of payloads that are rejected for being oversize can be found under Settings > Span usage > Unmanaged.
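
If you would rather estimate a starting batch size than guess, one rough approach is to divide the payload limit by your average serialized span size and leave some headroom. All of the figures below are hypothetical: the average span size and safety margin are placeholders to replace with your own measurements, and the sketch assumes that 1MB means 1,000,000 bytes.

```shell
# Hypothetical inputs: measure your own spans before relying on these.
MAX_PAYLOAD_BYTES=1000000   # assumes "1MB" means 10^6 bytes
AVG_SPAN_BYTES=2000         # illustrative guess; grows with attribute count
SAFETY_PERCENT=50           # use only half the limit to allow for large spans

# Integer arithmetic: spans that fit in the limit, scaled by the safety margin.
BATCH_SIZE=$(( MAX_PAYLOAD_BYTES / AVG_SPAN_BYTES * SAFETY_PERCENT / 100 ))

export OTEL_BSP_MAX_EXPORT_BATCH_SIZE="$BATCH_SIZE"
echo "$OTEL_BSP_MAX_EXPORT_BATCH_SIZE"   # prints 250 for the inputs above
```

If oversize payloads are still being rejected after this, lower the value and check again.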

Enabling trace propagation

OpenTelemetry SDKs automatically propagate the trace context between instrumented services to allow spans to be connected into traces across distributed systems.
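
To make the propagated context more concrete: OpenTelemetry SDKs typically use the W3C Trace Context format by default, which carries the trace ID and parent span ID in a traceparent HTTP header. The sketch below splits an example header value (the sample from the W3C specification, not a real BugSnag trace) into its four fields. The variable names are illustrative.

```shell
# Example header value from the W3C Trace Context specification.
traceparent="00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"

# Split the four dash-separated fields:
# version, trace ID, parent span ID, and trace flags (01 means sampled).
IFS='-' read -r version trace_id parent_id flags <<EOF
$traceparent
EOF

echo "trace_id=$trace_id parent_span_id=$parent_id flags=$flags"
```

A downstream service that receives this header creates its spans with the same trace ID, which is what allows spans from different services to be joined into one trace.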

For our client-side SDKs, some simple configuration is required to ensure that this information is only sent to servers that are under your control and instrumented with OpenTelemetry. Full instructions can be found in the platform-specific documentation for the Trace Propagation configuration option.

In order to see distributed traces throughout your system, remember that all your server-side apps will need to be instrumented with either BugSnag Performance or OpenTelemetry SDKs!