
Even your voice is a data problem

Recorded last December at AWS re:Invent, Ryan welcomes CEO and co-founder of Deepgram, Scott Stephenson, for a conversation on advancing voice AI technology.

February 13, 2026


Credit: Alexandra Francis

They cover how Deepgram is improving speech-to-text and text-to-speech capabilities, using deep learning to take on challenges posed by dialects and noisy environments, and the moral and ethical considerations voice AI companies have to weigh when it comes to voice cloning and synthetic data training.

Deepgram builds accurate, scalable, and affordable large-scale voice AI for speech recognition, generation, and AI agents.

Connect with Scott on LinkedIn, Twitter, or email him at Scott@Deepgram.com

TRANSCRIPT

[Intro Music]

Ryan Donovan: Hello, and welcome to the Stack Overflow Podcast, a place to talk all things software and technology. I'm your host, Ryan Donovan, and today we're talking about voice AI. And my guest [is] Scott Stephenson, founder and CEO of Deepgram. So, welcome to the show, Scott.

Scott Stephenson: Thanks for having me.

Ryan Donovan: Top of the show, we like to get to know our guests. How did you get involved in software and technology?

Scott Stephenson: So, I was a particle physicist. I built deep underground dark matter detectors, and in my physics training is when I first came across serious coding and doing development work. And from a physicist's perspective, though, everything is just a tool, so it's, 'okay, I'm learning this thing in order to accomplish tasks.' So, that's when I first got the deep experience. But it was TI-83 calculators and programming in BASIC, while you're bored in the back of your math class, and that's what got me going in that mindset. But then the real exposure to how the sausage is made was in physics training, and physicists, because of this tool mentality, they think, 'yeah, just hurry through, and whatever.' And so, then they get this reputation as writing bad code, which is mostly true, but this is something that me and my co-founder, who is also a physicist, rebel against a little. We're like, I think it's better to understand the error handling and all this stuff, because you do all this work going down this one path, and then, now you can't reuse that somewhere else. And so, that was the first deep experience that I had with it when I was in my PhD program working with my co-founder now, who's CTO of Deepgram. And now, of course, we build AI models for speech-to-text, text-to-speech, voice agents, and all of that development mindset, [and] engineer mindset really comes in handy for that.

Ryan Donovan: So, you founded Deepgram 10 years ago or so. Where was the line between that and then speech-to-text/text-to-speech? Why that problem?

Scott Stephenson: So, I was, not kidding, in a James Bond lair deep underground in a government-controlled region of China, this is just what it was. And I was the sole American graduate student there because the US was about to partner; this is like 2010-2011. Relations were good. Capitalism was flourishing in China, et cetera. It was on the upswing, but then Xi Jinping took over as ruler of China, and then things soured. But in that brief window where things were looking good, I was a graduate student, and we were cooking up this idea of a particle physics experiment in China, because we had heard about the world's tallest dam being built. It's called the Jinping Dam, still is the largest. They have the Three Gorges Dam, but then they have the Jinping Dam. It's a lesser-known one in Western China, but it's the tallest dam in the world. It's also a unique dam where it's a standard dam, but then it has a secondary dam where it diverts a river through a mountain, and then the mountain is the dam. Okay?

Ryan Donovan: Oh, okay.

Scott Stephenson: This is important because when you're diverting a river through a mountain, that means that there's a tunnel going through, and now you have all this rock above you. And so, in particle physics, you're always trying to run away from cosmic radiation. We're constantly bombarded by radiation. If you were to build a detector on the surface of the Earth, it would light up like a Christmas tree. And so, you try to find a shield, and for most cosmic radiation, you have to go deep underground to have a sufficient shield. But we hear about this marble mountain in China, where it's two miles underground, and we're like, 'oh my God, that's such a great place to do it;' and we somehow convinced the Chinese government and US government that this is a good idea. And it was, but the relationships soured, but I still stayed as a graduate student there. So, I was the sole American graduate student working with this extremely fast-moving, basically, startup—I know now, [but] I didn't know then, but a startup detector. We were starting from nothing, and then it was like, '25 million, 25 people, four years, go.' That's a startup, basically.

Ryan Donovan: Yeah.

Scott Stephenson: We are building this detector that has waveform digitization in it, and they're extremely sensitive detectors called photomultiplier tubes. These are in PET scans and that type of thing, but they can sense individual photons, but they're an analog device. And what you do is you digitize the waveform, and that waveform is like at one nanosecond, or 10 nanosecond time steps. So, it's like extremely fast. But, so there's a ton of data coming through, and it's super noisy. But if you do it the right way, with the right models, with everything the right way, then you can determine in the detector, where did a particle come in, bash off something, and scatter, deposit energy, deposit light, charge, that type of thing. And then, you could figure out what type of interaction it was. Was it a background radiation thing? Was it dark matter? This is what we were looking for. And it turns out that setup works extremely– that way of thinking, having real-time models that are looking at waveforms at massive scale with extremely low latency, works really well for audio. But this isn't what we were thinking at the time. It's like [this] pessimistic scientist– physicist mindset. We were like, 'hey, let's just keep going down the path that we're going down.' But I just thought, man, it was so cool what we got to do. We were deep underground, we're in this James Bond lair, we're whatever– why isn't there a documentary crew here? Why isn't somebody recording this? We're gonna look back years from now and say, we wish we had some recollection of this. So, we built these devices to make backup copies of our lives, just record audio all day, every day. And so, we ended up with over a thousand hours doing that, just recording all the time. And after we did the data-taking run for the experiment, we were also taking data for that, basically. And we came back to it. So, we were uploading all that data to an S3 bucket. It was just accruing. But if you've ever tried to listen to a long-form recording of your life, it's extremely boring. Not a lot going on. And so, you actually wanna find the hits, you want the highlights, you want the highlight reel.

Ryan Donovan: Yeah.

[...]
