Episode 544: Ganesh Datta on DevOps vs Site Reliability Engineering
Ganesh Datta, CTO and cofounder of Cortex, joins SE Radio’s Priyanka Raghavan to discuss site reliability engineering (SRE) vs DevOps. They examine the similarities and differences and how to use the two approaches together to build better software platforms. The show starts with a review of basic terms; definitions of roles, similarities and differences; and skillsets for each role, including which is technically more demanding. They discuss tooling and metrics that SRE and DevOps teams focus on, including whether custom automation scripts are more a DevOps or an SRE stronghold. The episode concludes with a look at typical good and bad days for DevOps and SRE and touches on career progression for each role.
Show Notes
Related Links
Related SE Radio Episodes
- SE Radio 276: Björn Rabenstein on Site Reliability Engineering
- SE Radio 513: Gil Hoffer on Applying DevOps Practices to Managing Business Applications
- SE Radio 457: Jeffery D Smith on DevOps Anti Patterns
- SE Radio 313: Conor Delanbanque on Hiring and Retaining DevOps
- SE Radio 268: Kief Morris on Infrastructure as Code
- SE Radio 288: Francois Raynaud on DevSecOps
SE Radio theme: “Broken Reality” by Kevin MacLeod (incompetech.com — Licensed under Creative Commons: By Attribution 3.0)
Transcript
Transcript brought to you by IEEE Software magazine.
This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number and URL.
Priyanka Raghavan 00:00:16 Welcome to Software Engineering Radio, and this is Priyanka Raghavan. In this episode, we’re going to be discussing the topic DevOps versus SRE: the differences, the similarities, and how they can work together for building successful platforms. Our guest today is Ganesh Datta, who is the CTO and co-founder of Cortex. Ganesh has an active interest in the areas of SRE and DevOps, primarily from spending many years working with both SRE and DevOps teams, and is now a co-founder of a company that develops a platform for the latter. I also saw that Ganesh contributes a lot to this magazine called DevOps.com, where he’s written on topics such as metrics, reviews of open-source libraries, and testing strategies. So, welcome to the show, Ganesh.
Ganesh Datta 00:01:03 Thanks so much for having me.
Priyanka Raghavan 00:01:05 At SE Radio, we’ve actually done quite a lot of shows on DevOps and SRE. We’ve done, for example, episode 276 on Site Reliability Engineering and episode 513 on DevOps Practices to Manage Business Applications. We also did episode 457 on DevOps Anti-Patterns, and then there was also episode 482 on Infrastructure as Code. So, a ton of stuff, but we never looked at, say, the differences between DevOps and SRE, and I thought this would be a perfect show to do. So, that’s why we’re having you here. But before we jump into that, I’m going to actually dial it back and ask you if you could just explain in your own words what you think DevOps is for our listeners.
Ganesh Datta 00:01:47 When I think about DevOps, there’s obviously a lot of confusion between DevOps and SRE, and there’s people that kind of do a little bit of both. And so it’s definitely a very open term, and I think the one thing that we always like to say is, you don’t necessarily have to shoehorn yourself into one or the other. There’s a lot of people that overlap, but when I think about DevOps, it’s literally in the name, right? It’s developer operations. It’s everything around how do we increase engineering efficiency, engineering productivity, how do we enable developers to operate and work their best? And that comes down to everything from tooling to pipelines to build systems to deployment systems; all that kind of stuff, I think, is really owned by the DevOps team. And so, when you think about a development team operating their services, that is exactly what DevOps falls under, right?
Priyanka Raghavan 00:02:32 And so how about SRE then? What could you say about site reliability engineering?
Ganesh Datta 00:02:37 Yeah, I think it’s interesting because when you think about SRE, they sometimes do a lot of things that, well, you would think DevOps does, around pipelines and things like that. But when I think about SRE, it’s more from the lens of reliability. They’re thinking about: are the processes that we have in place leading to better outcomes when it comes to reliability and uptime and those kinds of business metrics? And so SRE is mostly focused on defining and enforcing standards of reliability, and building the tooling to make it easier for engineers to adopt those practices. And I think that’s where some of the overlap comes in. We’ll talk about that later, obviously. But anything that comes from a reliability or post-production lens, I think, falls under the SRE umbrella.
Priyanka Raghavan 00:03:15 So, there’s also this, I think a couple of videos and maybe articles I’ve read, where they typically define it as “class SRE implements DevOps.” That’s one thing that I’ve seen. What’s your take on that?
Ganesh Datta 00:03:28 That’s a really interesting way of putting it. I think it’s true to some extent. When I think about Ops, you can break it down into pre-production, production, and post-production. Those three are all totally fair parts of the system, and I think SRE generally lives in that kind of post-prod environment where they’re defining those standards. Obviously, those are the things you have to build into your systems beforehand. But mostly they’re thinking about, hey, once things are live, when things are out, do we have visibility? Are we doing the right things? And so, I like to think most SRE teams live in that world, and so it’s kind of SRE implements post-prod ops implements DevOps. So, maybe another level down the tree, when in reality it should be SRE implements DevOps, because you should be a) working together and b) kind of working across the stack. So, yeah, I really like that way of putting it.
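The “class SRE implements DevOps” phrasing, and the layered variant described in this answer, can be read fairly literally as code. What follows is only a minimal Python sketch and is not from the episode: the method names and the intermediate `PostProdOps` class are hypothetical, chosen to illustrate the idea. Python has no `implements` keyword, so subclassing an abstract base class stands in for it here.

```python
from abc import ABC, abstractmethod


class DevOps(ABC):
    """Broad contract: building, shipping, and operating a service.

    The method names are hypothetical, for illustration only.
    """

    @abstractmethod
    def build_and_deploy(self, service: str) -> str:
        ...

    @abstractmethod
    def operate(self, service: str) -> str:
        ...


class PostProdOps(DevOps):
    """Hypothetical middle layer: the post-production slice of DevOps."""

    @abstractmethod
    def check_visibility(self, service: str) -> str:
        ...


class SRE(PostProdOps):
    """Concrete class: fulfils both contracts through a reliability lens."""

    def build_and_deploy(self, service: str) -> str:
        return f"{service}: deployed with reliability standards built in"

    def operate(self, service: str) -> str:
        return f"{service}: operated against uptime targets"

    def check_visibility(self, service: str) -> str:
        return f"{service}: visibility checked once live"


if __name__ == "__main__":
    sre = SRE()
    # An SRE instance satisfies both the narrow and the broad contract.
    print(isinstance(sre, PostProdOps), isinstance(sre, DevOps))
    print(sre.check_visibility("checkout"))
```

In this sketch, collapsing the hierarchy so that `SRE` subclasses `DevOps` directly would correspond to the flatter “SRE implements DevOps” reading, where the team works across the whole stack rather than only post-production.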
Priyanka Raghavan 00:04:16 So, the other question I’ve been meaning to ask is that there’s a lot of confusion in the roles, and you’ve kind of broken it down for us here, but there are also these other new roles that I keep seeing in many companies. For example, infrastructure engineer or cloud engineer: are these also different names for the same thing?
[...]