SE Radio 556: Alex Boten on Open Telemetry
Software engineer Alex Boten, author of Cloud Native Observability with OpenTelemetry, joins SE Radio host Robert Blumen for a conversation about software telemetry and the OpenTelemetry project. After a brief review of the topic and the OpenTelemetry project's origins rooted in the need for interoperability between telemetry sources and back ends, they discuss the OpenTelemetry server and its features, including transforms, filtering, sampling, and rate limiting. They consider a range of topics, starting with alternative topologies with and without the telemetry server, server pipelines, and scaling out the server, as well as a detailed look at extension points and extensions; authentication; adoption; and migration.
Transcript
Transcript brought to you by IEEE Software magazine. This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number and URL.
Robert Blumen 00:00:16 For Software Engineering Radio, this is Robert Blumen. Today I have with me Alex Boten. Alex is a senior staff software engineer at LightStep. Prior to that, he was at Cisco. He’s contributed to open-source projects in the telemetry area, including the OpenTelemetry project. He’s the author of the book Cloud Native Observability with OpenTelemetry, and that will be the subject of our conversation today. Alex, welcome to Software Engineering Radio.
Alex Boten 00:00:50 Hello. Thank you for having me. It’s great to be here.
Robert Blumen 00:00:52 Would you like to add anything about your background that I didn’t mention?
Alex Boten 00:00:57 I think you captured most of it. I’ve been contributing to OpenTelemetry for a little bit over three years. I’ve worked on various components of the project as well as the specification, and I’m currently a maintainer on the OpenTelemetry Collector.
Robert Blumen 00:01:11 Great. Now on Software Engineering Radio, we have covered quite a lot of telemetry-related issues, including Logging in episode 220; High Cardinality Monitoring, which was 429; Prometheus; Distributed Tracing; and episode 455, which was called Software Telemetry. So, listeners can definitely listen to some of those in our back catalog to get more general information. In this conversation, we’ll be focusing more on what OpenTelemetry brings to the table that we have not already covered. Let’s start out with this: in the telemetry space, where would you situate OpenTelemetry? What is it similar to? What is it different from? What problem does it solve?
Alex Boten 00:02:02 That’s a great question. So, I think the problem that OpenTelemetry aims to solve, and we’ve already seen it happen in the industry today, is that it changes how application developers instrument their applications, how telemetry is generated, and how it’s collected and then transmitted across systems. And if I were to think of what it’s similar to, the first things that come to mind are the projects that really caused it to emerge, which are OpenCensus and OpenTracing, two other open-source projects that were formed a little bit earlier. I think they started in maybe 2016 or 2017 to provide a standard around producing distributed tracing. And OpenCensus also addressed a little bit around metrics and log collection.
Robert Blumen 00:02:50 What was going on in the telemetry area prior to those projects that created the need for them, and what did they do?
Alex Boten 00:02:57 Yeah, so I think, if you think of telemetry as a domain in software, it’s been around for a really long time, right? Like, even the earliest computer scientists wanted to know what their computers were doing. And in the earlier days of having a single machine, it was fairly easy to print some log statements and look at what your machine was doing. But as the industry grew, as the Internet of Things picked up, as systems became larger and larger to address the increasing demand, I think systems became inherently more complex. And we’ve seen an evolution of what software telemetry really became. So, if you think of earlier, we were able to log data on a single system. As people had to deploy multiple systems, a need for centralized logging came along so that you could aggregate logs and search across them.
Alex Boten 00:03:54 And that became really costly. And then we saw an increase in folks wanting to capture more meaningful metrics from their systems, where they could create dashboards and do queries, which was cheaper than going through and analyzing log data. And I think the thing that I’ve seen happen in the last 20 years is that every time there was maybe a new paradigm around the type of telemetry that systems should emit, there has been a chance for innovation to take place, which is great to see. But if you’re an end user who’s just trying to get telemetry out of a system, out of an application, it’s a really frustrating process to have to go and reinstrument your code every few months or every few years, depending on what the flavor of the day is. And I think what OpenCensus and OpenTracing and OpenTelemetry tried to do is address the pain that users have when it comes to instrumenting their code.
Robert Blumen 00:04:49 What is the relationship of OpenTelemetry to other systems out there, such as Zipkin, Jaeger, Graylog, Prometheus?
Alex Boten 00:05:00 So the relationship that OpenTelemetry has with the Zipkins, the Jaegers, and the Prometheuses of the world is really around providing interoperability between those systems. So, an application developer would instrument their code using OpenTelemetry, and then they can emit that telemetry data to whatever backend systems they want. So, if you wanted to continue using Jaeger, you could definitely do that with an application that’s instrumented with OpenTelemetry. The other thing that OpenTelemetry tries to do is provide a translation layer. Folks that are maybe today emitting data to Zipkin or to Jaeger or to Prometheus can deploy a Collector within their environments and then translate the data from the specific format of those other systems into the OpenTelemetry format. They can then emit the data to whatever backend they choose by simply updating the configuration on their Collector, without having to go back to their applications, which may be legacy systems that nobody wants to modify anymore, and still be able to send their data to different destinations.
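The translation-layer idea described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the OpenTelemetry Collector's actual API or configuration format: the `Span` class, `run_pipeline`, and the `console` and `json` exporters are hypothetical names invented for this example. Only the input field names (`traceId`, `id`, `name`, `duration`, `tags`) follow the Zipkin v2 JSON span format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Callable, Dict, List


@dataclass
class Span:
    """Simplified common span format (a stand-in for the OpenTelemetry data model)."""
    trace_id: str
    span_id: str
    name: str
    duration_us: int
    attributes: Dict[str, str] = field(default_factory=dict)


def from_zipkin(raw: dict) -> Span:
    """Translate a Zipkin-v2-style JSON span into the common format."""
    return Span(
        trace_id=raw["traceId"],
        span_id=raw["id"],
        name=raw["name"],
        duration_us=raw["duration"],  # Zipkin reports duration in microseconds
        attributes=dict(raw.get("tags", {})),
    )


def export_console(span: Span) -> str:
    """Hypothetical backend #1: a human-readable line."""
    return f"{span.name} trace={span.trace_id} {span.duration_us}us"


def export_json(span: Span) -> str:
    """Hypothetical backend #2: a JSON document."""
    return json.dumps(asdict(span), sort_keys=True)


# Registries of available translators; both are assumptions for this sketch.
RECEIVERS: Dict[str, Callable[[dict], Span]] = {"zipkin": from_zipkin}
EXPORTERS: Dict[str, Callable[[Span], str]] = {
    "console": export_console,
    "json": export_json,
}


def run_pipeline(config: Dict[str, str], raw_spans: List[dict]) -> List[str]:
    """Receive spans in a legacy format, translate them, and export them
    to whichever backend the configuration names."""
    receive = RECEIVERS[config["receiver"]]
    export = EXPORTERS[config["exporter"]]
    return [export(receive(raw)) for raw in raw_spans]


if __name__ == "__main__":
    # The legacy application keeps emitting Zipkin-style spans unchanged.
    legacy_spans = [
        {"traceId": "abc", "id": "1", "name": "GET /users",
         "duration": 1500, "tags": {"http.method": "GET"}},
    ]
    # Switching backends is a configuration change only.
    for exporter in ("console", "json"):
        config = {"receiver": "zipkin", "exporter": exporter}
        print(run_pipeline(config, legacy_spans))
```

The point of the design is that the application's output never changes; only the `exporter` entry in the configuration does, which mirrors the claim that a destination can be swapped without touching legacy code.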
Robert Blumen 00:06:06 Is OpenTelemetry then an interoperability standard, a system, or both?
[...]