
SE Radio 696: Flavia Saldanha on Data Engineering for AI

Flavia Saldanha, a consulting data engineer, joins host Kanchan Shringi to discuss the evolution of data engineering from ETL (extract, transform, load) and data lakes to modern lakehouse architectures enriched with vector databases and embeddings. Flavia explains the industry's shift from treating data as a service to treating it as a product, emphasizing ownership, trust, and business context as critical for AI-readiness. She describes how unified pipelines now serve both business intelligence and AI use cases, combining structured and unstructured data while ensuring semantic enrichment and a single source of truth. She outlines key components of a modern data stack, including data marketplaces, observability tools, data quality checks, orchestration, and embedded governance with lineage tracking. This episode highlights strategies for abstracting tooling, future-proofing architectures, enforcing data privacy, and controlling AI-serving layers to prevent hallucinations. Saldanha concludes that data engineers must move beyond pure ETL thinking, embrace product and NLP skills, and work closely with MLOps, using AI as a co-pilot rather than a replacement.

Brought to you by IEEE Computer Society and IEEE Software magazine.





Show Notes

Related Episodes

  • SE Radio 561: Dan Demers on Dataware
  • SE Radio 507: Kevin Hu on Data Observability
  • SE Radio 456: Tomer Shiran on Data Lakes

Transcript

Transcript brought to you by IEEE Software magazine.

This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number and URL.

Kanchan Shringi 00:00:18 Welcome everyone to this episode of Software Engineering Radio. Today we welcome Flavia Saldanha. She’s a consulting data engineer who has designed and led enterprise-level automated data warehousing solutions. Her work spans banking and enterprise cloud platforms, as well as data modernization and AI-driven innovation. Flavia also takes part in interest groups focused on data products and AI readiness. So happy to have you here, Flavia. Is there anything you’d like to add to your bio before we start talking about data engineering for AI?

Flavia Saldanha 00:00:55 Thank you Kanchan, that was a very nice introduction of myself and the role that I play in my organization around data engineering. I don’t think there is much to add, other than that my role is centered around data architecture, governance, and engineering enablement, now mostly moving towards AI readiness. So I’m very grateful to be here on this platform to be able to share my own real experiences of working in a financial organization. Happy to be here.

Kanchan Shringi 00:01:28 Before we start drilling down further into data engineering, especially in the context of AI, I’d like to point the listeners to a few related episodes that we’ve done in the past. These are Episode 561, Dan Demers on Dataware; 507, Kevin Hu on Data Observability; 523, Jessi Ashdown and Uri Gilad on Data Governance; and lastly 424, Sean Knapp on Data Flow Pipeline Automation. Flavia, let’s start off with you explaining to us, in a simple way: what exactly do data engineers do today?

Flavia Saldanha 00:02:07 In a very colloquial definition of data engineering, typically when we hear the words data engineering, the first thing that comes to anybody’s mind is that data engineers are professionals who work with data. That may not articulate well, you know, what aspect or facet of data engineering a data engineer typically performs. And even if we look at the traditional history and the way this discipline has evolved, there is so much that has changed in terms of what a data engineer can actually do and how a data engineer is positioned in an organization. So, yes, while a data engineer works with data, I think it is important that we look a little bit into the history of how this discipline itself has evolved. Data engineering became very popular a few years ago when big data was trending and organizations and companies were talking about how to transform and process volumes of big data, right?

Flavia Saldanha 00:03:13 And the velocity and the speed with which data needs to be processed faster and be made available for our consumers. Let’s go back a little bit further, and this actually takes me back to the time when I was fresh out of college and into my first ITLP assignment. At that time, we didn’t really have these job titles of data engineers, but though these titles were not present, the aspect of data engineering, or this job function itself, has always existed. A lot of folks in the organization that have been here like me for a long time, more than a decade now, would know and relate to some of the job titles like SQL Developer, Oracle Developer, or an RDBMS (Relational Database Management System) Developer or architect, right? So we were very much confined to working with data in a container, that particular relational database itself, and we would write long pages of code, maybe in the form of stored procedures, user-defined functions, and so on.

Flavia Saldanha 00:04:19 So that was also a form of data engineering, where we were trying to write code that solved a particular business problem, right? We provided actions around what needs to be done with the data, how we structure it, how we define it, and put it into a monolith like a data warehouse in those times. And then, after writing long pages of code, there came a phase or a period, I would say, when a lot of ETL technologies came in. Now by ETL, I mean Extract, Transform, and Load technologies, which is essentially the core and the heart of data engineering. This title of ETL Developer became quite prominent in the industry, and a lot of job families, which today we call data engineering, revolved and centered mainly around ETL roles. Now with ETL job families coming in, a lot of innovative ETL tools came in, which is essentially what defined this role, where we didn’t really have to write a lot of, you know, SQL code, Transact-SQL, and so on.

Flavia Saldanha 00:05:32 A lot of code started to shrink, and these technologies took over, where this engineering professional then had to worry about where and how the data needs to be moved, right? We were essentially data movers under this job title of ETL developers, and then came the big data phase, where we were now talking about bigger volumes of data, and that’s when this term of data engineering became quite popular. But if you think about it, even though this term was coined, the aspect of data engineering still centered and revolved around the movement itself, where you’re moving data from point A to point B, and every time a data engineer gets assigned a task or a goal, the very first question the data engineer is going to ask is: what’s my source for the data? What’s my destination for the data, right?
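The source-to-destination pattern described above can be sketched in a few lines. This is a minimal, illustrative example of the classic extract-transform-load flow, not anything specific to the episode: the CSV source, `customers` table, and field names are all assumptions chosen for the sketch.

```python
import csv
import sqlite3

def extract(path):
    """Extract: read raw rows from a CSV source file (point A)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """Transform: drop incomplete records and normalize values."""
    clean = []
    for row in rows:
        if not row.get("customer_id"):
            continue  # skip rows missing the business key
        clean.append((row["customer_id"], row["country"].strip().upper()))
    return clean

def load(rows, conn):
    """Load: write transformed rows into the destination table (point B)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (customer_id TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    conn.commit()
```

The three stages stay separate so each can be swapped out independently, which is roughly what the ETL tools Saldanha mentions productized.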

[...]

