SE Radio 681: Qian Li on DBOS Durable Execution/Serverless Computing Platform
Qian Li of DBOS, a durable execution platform born from research by the creators of Postgres and Spark, speaks with host Kanchan Shringi about building durable, observable, and scalable software systems, and why that matters for modern applications. They discuss database-backed program state, workflow orchestration, real-world AI use cases, and comparisons with other workflow technologies.
Li explains how DBOS persists not just application data but also program execution state in Postgres to enable automatic recovery and exactly-once execution. She outlines how DBOS uses workflow and step annotations to build deterministic, fault-tolerant flows for everything from e-commerce checkouts to LLM-powered agents. Observability features, including SQL-accessible state tables and a time-travel debugger, allow developers and business users to understand and troubleshoot system behavior. Finally, she compares DBOS with tools like Temporal and AWS Step Functions.
Brought to you by IEEE Computer Society and IEEE Software magazine.
Show Notes
Related Episodes
- SE Radio 596: Maxim Fateev on Durable Execution with Temporal
- SE Radio 351: Bernd Rücker on Orchestrating Microservices with Workflow Management
- SE Radio 223: Joram Barrez on the Activiti Business-Process-Management Platform
- SE Radio 198: Wil van der Aalst
Other References
- R3: Record-Replay-Retroaction for Database-Backed Applications (co-authored research, presented at VLDB 2023)
Transcript
Transcript brought to you by IEEE Software magazine.
This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number and URL.
Kanchan Shringi 00:00:19 Hello everyone. Welcome to this episode of Software Engineering Radio. Our guest today is Qian Li. Qian Li is a co-founder and Chief Architect at DBOS. Before founding DBOS, Qian completed her PhD in Computer Science at Stanford in 2023. Her PhD research focused on abstractions for efficient and reliable cloud computing. Qian is also the co-organizer of the South Bay Systems club, which is an independent talk series focusing on systems programming. Before we actually talk a little bit more about DBOS and its research origins, I’d like to point listeners to Episode 596, which is Maxim Fateev on Durable Execution with Temporal. There are also a few other episodes we have done on related topics and I’ll put those in the show notes. These are Episodes 351, 223, and 198. So happy to have you here, Qian, to talk about DBOS and durable execution. Welcome to the show. Would you like to add something to your bio before we jump right in?
Qian Li 00:01:22 Thanks for inviting me to the show. Yeah, so originally DBOS started as a joint research project. It’s a collaboration between Stanford and MIT. We started the project in 2020. It was led by the Postgres creator, Prof. Mike Stonebraker, and the Spark creator and Databricks co-founder Matei Zaharia. So during the research project we really tested the capability of databases and saw how databases can help you create reliable programs and how databases can help you make your programs more observable and debuggable. During the research project we built several prototypes and wrote several papers, and when we presented them, people were really excited about the capability that DBOS can bring. So when we graduated in 2023, we decided to co-found DBOS based on the research project. And DBOS as a company, right now we focus on building durable software and we believe that all software should be reliable and observable and scalable by default. So now DBOS stands for Durable Backend Observable and Scalable.
Kanchan Shringi 00:02:31 So you mentioned the use of databases for basically running your applications, but why is that a new concept? Everybody uses databases for running apps or most people do. So what is the difference here? What is the secret sauce that you are talking about?
Qian Li 00:02:49 Yeah, so it’s true that people have been storing their business-critical data in databases for like 30, 40 years. But the new concept is that we also want to persist the program’s execution state in a database. Like, here’s your program and it has multiple steps, and we want to persist each step’s output and input in the database so that if your program crashes or the machine fails, we’ll be able to resume from exactly where it left off. So the idea is that, in addition to application data, we also store your program execution state in the database. Especially if you have long-running and dynamic workflows in your programs, you really don’t want to restart from scratch every time you hit a system error or have a machine failure.
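The idea Li describes here, checkpointing each step's output so a restarted program can skip work it already finished, can be illustrated with a small sketch. This is not the DBOS API: the `step_outputs` table, the `run_step` helper, and the use of SQLite in place of Postgres are all assumptions made to keep the example self-contained.

```python
import json
import sqlite3

# Hypothetical checkpoint table; SQLite stands in for Postgres in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE step_outputs ("
    "workflow_id TEXT, step_no INTEGER, output TEXT, "
    "PRIMARY KEY (workflow_id, step_no))"
)

def run_step(conn, workflow_id, step_no, fn, *args):
    """Run fn at most once per (workflow_id, step_no); reuse the saved output on re-execution."""
    row = conn.execute(
        "SELECT output FROM step_outputs WHERE workflow_id = ? AND step_no = ?",
        (workflow_id, step_no),
    ).fetchone()
    if row is not None:
        return json.loads(row[0])  # step already completed: skip it
    result = fn(*args)
    conn.execute(
        "INSERT INTO step_outputs VALUES (?, ?, ?)",
        (workflow_id, step_no, json.dumps(result)),
    )
    conn.commit()
    return result

calls = []  # records which step bodies actually executed

def fetch_order(order_id):
    calls.append("fetch_order")
    return {"order_id": order_id, "total": 42}

def send_receipt(order):
    calls.append("send_receipt")
    return f"receipt for {order['order_id']}"

def workflow(conn, workflow_id, crash_after_first=False):
    order = run_step(conn, workflow_id, 0, fetch_order, "o-1")
    if crash_after_first:
        raise RuntimeError("simulated crash")
    return run_step(conn, workflow_id, 1, send_receipt, order)

# First attempt crashes after step 0; the retry resumes at step 1.
try:
    workflow(conn, "wf-1", crash_after_first=True)
except RuntimeError:
    pass
result = workflow(conn, "wf-1")
```

On the retry, `fetch_order` is not called again because its output is read back from the table. A step that crashes after its side effect but before its output is saved would still run twice in this sketch, which is one reason step bodies are usually written to be idempotent.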
Kanchan Shringi 00:03:40 So Qian, maybe you can explain to listeners how exactly you define a workflow, as opposed to a service?
Qian Li 00:03:48 Yeah, so to get started I think we can talk about what a workflow is in this context. Traditionally people think of workflows as state machines, where you have to define a DAG, stuff like that. But actually in DBOS, anything can be a workflow. So to give a concrete example, a workflow is a sequence of operations or function calls. A very typical example in like job execution is, let’s say, a checkout service. If you’re implementing a checkout service, usually you have to call, say, reserve inventory, where you update a database to make sure that you have enough inventory. And then after that, if you successfully reserve the inventory, you will call out to the payment processor. For example, this could be an external service like Stripe or PayPal. You will say I want to charge a user this much. And after that you need to wait for the response from those services.
Qian Li 00:04:45 And then based on the result, you will decide whether to fulfill the order and send a confirmation email to the user, or to undo your reservation for the inventory, cancel the order, and send a cancellation email. So this process can be abstracted as a workflow, and then what guarantees do we want for this workflow? First, we want to make sure that once I click that checkout button, all steps will eventually succeed or complete, right? I don’t want to, say, pay for my order but never receive my item, or reserve the inventory but never charge the user. So that’s the first guarantee. And the second guarantee is that I want execution to be effectively exactly once. So if I charge a user, I want to charge them once, and if I reserve the inventory, I also want to reserve it only once.
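The checkout flow described above can be sketched as ordinary code with a compensation branch. This is only an in-memory illustration of the control flow, not DBOS code: the `Shop` class, its method names, and the use of the order ID as an idempotency key are hypothetical stand-ins for a real inventory table and a payment service such as Stripe or PayPal.

```python
from dataclasses import dataclass, field

@dataclass
class Shop:
    """Hypothetical in-memory stand-in for the inventory database, payment service, and mailer."""
    inventory: dict
    charges: dict = field(default_factory=dict)  # idempotency key -> amount charged
    emails: list = field(default_factory=list)   # (order_id, kind) pairs

    def reserve_inventory(self, item, qty):
        if self.inventory.get(item, 0) < qty:
            return False
        self.inventory[item] -= qty
        return True

    def undo_reservation(self, item, qty):
        self.inventory[item] += qty

    def charge(self, key, amount, card_ok):
        # Keyed by an idempotency key so a retried call does not charge twice.
        if key in self.charges:
            return True
        if not card_ok:
            return False
        self.charges[key] = amount
        return True

def checkout(shop, order_id, item, qty, amount, card_ok=True):
    """Reserve, charge, then either fulfill or compensate by undoing the reservation."""
    if not shop.reserve_inventory(item, qty):
        shop.emails.append((order_id, "cancelled"))
        return "cancelled"
    if shop.charge(order_id, amount, card_ok):
        shop.emails.append((order_id, "confirmed"))
        return "fulfilled"
    shop.undo_reservation(item, qty)  # payment failed: release the stock
    shop.emails.append((order_id, "cancelled"))
    return "cancelled"

shop = Shop(inventory={"widget": 5})
ok = checkout(shop, "order-1", "widget", 2, 30)
failed = checkout(shop, "order-2", "widget", 1, 15, card_ok=False)
```

After both calls the stock is 3 widgets: the first order keeps its two, and the failed payment on the second order releases its one. The sketch does not survive a crash between steps; that durability is the part a system like the one Li describes is meant to add.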
Kanchan Shringi 00:05:38 So you talked about workflow and you mentioned Transaction. What’s the relationship?
[...]