Performance Monitoring: Sentry SDK API Evolution

The objective of this document is to contextualize the evolution of the Performance Monitoring features in Sentry SDKs. We start with a summary of how Performance Monitoring was added to Sentry and to SDKs, and, later, we discuss lessons learned in the form of identified issues and the initiatives to address those issues.

Introduction

Back in early 2019, Sentry started experimenting with adding tracing to SDKs. The Python and JavaScript SDKs were the test bed where the first concepts were designed and developed. A proof-of-concept was released on April 29, 2019 and shipped to Sentry on May 7, 2019. Python and JavaScript were obvious choices, because they allowed us to experiment with instrumenting Sentry’s own backend and frontend.

Note that this work was contemporaneous with the merger of OpenCensus and OpenTracing to form OpenTelemetry. Sentry’s API and SDK implementations borrowed inspiration from pre-1.0 versions of OpenTelemetry, combined with our own ideas. For example, our list of span statuses openly matches the one that could be found in the OpenTelemetry specification around the end of 2019.

After we settled on an API, performance monitoring support was expanded to other SDKs. Sentry's Performance Monitoring solution became Generally Available in July 2020. OpenTelemetry's Tracing Specification version 1.0 was released in February 2021.

Our initial implementation reused the mechanisms we had in place for error reporting:

  • The Event type was extended with new fields. Instead of designing and implementing a whole new ingestion pipeline, we could save time and quickly start sending "events" to Sentry. This time, instead of errors, the events were of a new "transaction" type.
  • Since we were just sending a new type of event, the SDK transport layer was also reused.
  • And since we were sharing the ingestion pipeline, we were also sharing storage and many parts of the processing that happens to all events.
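
To make the reuse concrete, here is a minimal sketch of the idea in Python. It is not Sentry's actual schema or SDK code: the `Event` and `Transport` classes and all of their field names are hypothetical, chosen only to illustrate how one event type and one transport can carry both errors and transactions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Event:
    """Hypothetical, simplified payload shared by errors and transactions."""

    event_id: str
    timestamp: float
    type: str = "error"  # "error" or "transaction" (illustrative values)
    # Field used by error events.
    exception: Optional[Dict[str, Any]] = None
    # Fields added to the same type for transaction events (assumed names).
    start_timestamp: Optional[float] = None
    spans: List[Dict[str, Any]] = field(default_factory=list)


class Transport:
    """Hypothetical transport: a single send path serves every event type."""

    def __init__(self) -> None:
        self.sent: List[Event] = []

    def send(self, event: Event) -> None:
        # A real transport would serialize and upload; here we only record.
        self.sent.append(event)


transport = Transport()

# An error event, as before.
transport.send(
    Event(event_id="a1", timestamp=10.0, exception={"type": "ValueError"})
)

# A transaction event travels through the very same transport.
transport.send(
    Event(
        event_id="b2",
        timestamp=12.5,
        type="transaction",
        start_timestamp=12.0,
        spans=[{"op": "db", "start_timestamp": 12.1, "timestamp": 12.3}],
    )
)
```

In this sketch the only thing that distinguishes the two payloads is the `type` field and which optional fields are populated, which is why no separate pipeline or transport was needed.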

Our implementation evolved such that there was a clear emphasis on the distinction between Transactions and Spans. Part of that was a side effect of reusing the Event interface.

Transactions resonated well with customers. They allowed important chunks of work in customers' code to be highlighted, like a browser page load or an HTTP server request. Customers can see and navigate through a list of transactions, while within a transaction the spans give detailed timing for more granular units of work.
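
The relationship between the two concepts can be sketched as follows. This is a simplified, hypothetical model and not the SDK API: the `Span` and `Transaction` classes, the `slowest_span` helper, and the millisecond fields are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """Hypothetical span: a granular unit of work with its own timing."""

    op: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


@dataclass
class Transaction:
    """Hypothetical transaction: a named chunk of work that owns spans."""

    name: str
    op: str
    start_ms: int
    end_ms: int
    spans: List[Span] = field(default_factory=list)

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def slowest_span(transaction: Transaction) -> Span:
    """Return the child span with the longest duration."""
    return max(transaction.spans, key=lambda span: span.duration_ms)


# One HTTP server request, broken down into two more granular spans.
txn = Transaction(name="GET /checkout", op="http.server", start_ms=0, end_ms=500)
txn.spans.append(Span(op="db.query", start_ms=50, end_ms=200))
txn.spans.append(Span(op="template.render", start_ms=250, end_ms=450))
```

A list view would show one row per `Transaction`, while drilling into a row would reveal the timing of each `Span`, for instance by calling `slowest_span(txn)` to find where most of the time went.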

In the next section, we’ll discuss some of the shortcomings of the current model.

Identified Issues

Coming soon.
