rss-bridge 2026-02-04T19:36:00+00:00

SE Radio 706: Yechezkel "Chez" Rabinovich on Observability Tool Migration Techniques

Yechezkel "Chez" Rabinovich, CTO and co-founder at Groundcover, joins SE Radio host Brijesh Ammanath to discuss the key challenges in migrating observability toolsets. The episode starts with a look at why customers might seek to migrate their existing Observability stack, and then Chez explains some approaches and techniques for doing so. The discussion turns to OpenTelemetry, including what it is and how Groundcover helps with the migration of dashboards, monitors, pipelines, and integrations that are proprietary to vendor products. Chez describes methods for validating a successful migration, as well as metrics and signals that engineering teams can use to assess the migration health.

Brought to you by IEEE Computer Society and IEEE Software magazine.

Show Notes

Related References

groundcover Blog | Check Our Latest Articles

OpenTelemetry

Related Episodes

SE Radio 556: Alex Boten on Open Telemetry

SE Radio 507: Kevin Hu on Data Observability

SE Radio 675: Brian Demers on Observability into the Toolchain

SE Radio 591: Yechezkel Rabinovich on Kubernetes Observability

Transcript

Transcript brought to you by IEEE Software magazine.

This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number and URL.

Brijesh Ammanath 00:00:18 Welcome to Software Engineering Radio. I’m your host, Brijesh Ammanath. Today I will be discussing observability tool migration challenges with Yechezkel Rabinovich, also known as “Chez.” Chez is the CTO and co-founder at Groundcover, which provides full-stack observability for Kubernetes. Chez was previously the chief architect at the healthcare security company CyberMDX and spent eight years in the cybersecurity division of the Israeli Prime Minister’s office. Chez is a repeat guest at Software Engineering Radio, having previously spoken about Kubernetes Observability in Episode 591. Chez, welcome to the show.

Yechezkel Rabinovich 00:00:56 Thank you. Thank you for having me.

Brijesh Ammanath 00:00:57 Today we’ll be talking about the challenges of migrating observability tools and the techniques one uses to do so. Before we jump into the topics, I just wanted to touch on a few episodes where we have previously covered observability on Software Engineering Radio: Episode 556, where Alex Boten spoke on Open Telemetry; Episode 507, where Kevin Hu spoke about Data Observability; and Episode 675, where Brian Demers spoke on Observability into the Toolchain. These are good episodes to refer back to for more details about observability. Let’s start with what drives the need for firms to migrate from their existing observability tool set. Do you have a story or an example where a firm needed to migrate from their existing observability tool set?

Yechezkel Rabinovich 00:01:47 Yeah, actually a lot. Most customers of Groundcover, though not all of them, are coming from some legacy vendor. Most R&D organizations already use an observability platform. So, when they want to move to a new, more modern observability platform that relies on bring-your-own-cloud and eBPF, the main challenge is: how do we migrate all the hard work we did as an R&D organization? You can think about it as dashboards. Most R&D organizations have some kind of top 10 or top 20, and we often see hundreds of dashboards across the entire R&D. It could be monitors. Imagine that you rely on specific alerts to wake you up at night to make sure your software is actually behaving as you expect and that your customers get their SLAs. It could also be integrations that you built, or even observability pipeline configurations. All those configurations and integrations have been in the works for maybe five or 10 years, built by different people in the organization, some of whom have already left. So even the knowledge of what we have is sometimes missing. All of that makes the decision to move to a new observability platform a lot more challenging.

Brijesh Ammanath 00:03:17 Right. So, it’s a challenging problem, but what is the primary driver? Do you have an example where a customer came to you at Groundcover and said, because of these reasons, we want to move from our legacy observability tool set?

Yechezkel Rabinovich 00:03:32 Yeah, we recently had a customer with a thousand monitors, and they didn’t know how those monitors had been built. The people who created them are no longer in the organization. So, imagine how frightening it is to try to migrate those monitors without knowing whether what you’re doing is actually the right thing. There is no way. Let’s take a simple example: think about monitors that alert when a specific log appears. It could be a free-text search, or it could be a specific attribute with a specific value, let’s say status error. Now imagine that you make a mistake when you migrate those monitors, and you don’t know that you are no longer covered by them. That’s scary. This specific customer’s requirement was a fully automated process that would make sure their monitors were migrated successfully. And to do that, it goes really deep. You need to understand the log structure and the transformations that each log goes through to ensure that those monitors still cover you, and basically that you can sleep well at night.
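
Editor's sketch (not from the episode): the kind of check Chez describes can be illustrated by replaying sample logs through both the old monitor and its migrated version and flagging any log on which the two disagree. Everything below is an assumption made for illustration, including the `Monitor` class, the `rename_status_to_level` pipeline step, and the sample logs; it does not show Groundcover's actual tooling or any vendor's API.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Log = dict  # a log record: a "message" string plus arbitrary attributes


@dataclass(frozen=True)
class Monitor:
    """Hypothetical log monitor: fires when every configured condition matches."""
    name: str
    text: Optional[str] = None       # free-text substring searched in the message
    attribute: Optional[str] = None  # attribute key to compare
    value: Optional[str] = None      # expected value of that attribute

    def fires(self, log: Log) -> bool:
        if self.text is not None and self.text not in log.get("message", ""):
            return False
        if self.attribute is not None and log.get(self.attribute) != self.value:
            return False
        return True


def rename_status_to_level(log: Log) -> Log:
    """Hypothetical transformation in the new pipeline: 'status' becomes 'level'."""
    out = dict(log)
    if "status" in out:
        out["level"] = out.pop("status")
    return out


def coverage_gaps(old: Monitor, new: Monitor, logs: list[Log],
                  transform: Callable[[Log], Log]) -> list[Log]:
    """Return the sample logs on which the old and the migrated monitor disagree.

    The old monitor sees the raw log; the new monitor sees the log as the
    new platform would store it, after the pipeline transformation.
    """
    return [log for log in logs if old.fires(log) != new.fires(transform(log))]


sample_logs = [
    {"message": "payment failed", "status": "error"},
    {"message": "payment ok", "status": "info"},
]
old = Monitor("payment-errors", attribute="status", value="error")
naive = Monitor("payment-errors", attribute="status", value="error")  # copied verbatim
fixed = Monitor("payment-errors", attribute="level", value="error")   # follows the rename

print(len(coverage_gaps(old, naive, sample_logs, rename_status_to_level)))  # 1
print(len(coverage_gaps(old, fixed, sample_logs, rename_status_to_level)))  # 0
```

The verbatim copy silently stops firing because the attribute it checks no longer exists after the transformation, which is the "not covered without knowing it" failure described above. A real check would need representative production logs rather than two hand-written records.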

Brijesh Ammanath 00:04:55 A thousand monitors. That’s quite a lot. Is that the norm? Do you have quite a few customers with that many dashboards and monitors?

Yechezkel Rabinovich 00:05:02 I think the average customer has a few hundred monitors and at least dozens of dashboards. I think it really depends on whether the customers are using infrastructure as code to generate those monitors, because it’s usually very hard to create that many manually. But yeah, we’re seeing numbers in the hundreds of monitors. We have a customer with 5,000 monitors. That’s the high bar; I think the biggest we have is 10,000 monitors.

Brijesh Ammanath 00:05:29 Wow. If you’ve taken all that effort to build a thousand monitors and you know that migrating is going to be quite challenging, why go to that effort? What was the reason for saying that this just no longer works for me?

Yechezkel Rabinovich 00:05:43 Yeah, that’s a good question, and it comes back to the reason why we started Groundcover. We were on the other side, using some legacy vendor, and basically you have two main challenges. One is that you need to generate the data: to have all the data you need, you have to instrument your application. I’m talking about APM, OpenTelemetry, or whatever SDK you’re using. That’s really cool, but you don’t know what you don’t know. So, if you didn’t instrument your application, you basically don’t have the information about what’s actually going on. That’s one reason: you want more data. The other reason is that it’s very, very expensive. The old way of doing observability is basically to send all the data to some SaaS provider. But the reality is that it’s very expensive, and not just from the licensing fee. Think about the egress fees that you need to pay to send all that data, and that leads to some kind of vicious cycle.

[...]
