
GraphQL Data Mocking at Scale with LLMs and @generateMock

Michael Rebello

11 min read · Oct 30, 2025

How Airbnb combines GraphQL infra, product context, and LLMs to generate and maintain convincing, type-safe mock data using a new directive.


Introduction

Producing valid and realistic mock data for testing and prototyping with GraphQL has been a persistent challenge across the industry for years. Mock data is tedious to write and maintain, and attempts to improve the process, such as random value generation and field-level stubbing, fall short because they lack essential domain context to make test data realistic and meaningful. The time spent on this manual work ultimately takes away from what most engineers would like to focus on: building features.

In this post, we’ll explore how we’ve reimagined mocking GraphQL data at Airbnb by combining GraphQL validation, rich product and schema context, and LLMs to generate and maintain convincing, type-safe mock data. Our solution centers on a simple new GraphQL client directive, @generateMock, that engineers can add to any operation, fragment, or field. This approach eliminates the need for engineers to manually write and maintain mocks as queries evolve, freeing up time to focus on building the product.

Key challenges

After meeting with Airbnb product engineers and analyzing results from internal surveys, we distilled the most common pain points around GraphQL mocking down into three key challenges:

Manually creating mocks is time-consuming. GraphQL queries can grow to hundreds of lines, and hand-crafting mock response data is extremely tedious. Most engineers write mocks by hand, either as raw JSON files or by instantiating types generated from the GraphQL schema, while others modify JSON responses copied and pasted from the server. Although these methods can yield realistic-looking data for demos and snapshot tests, they require significant time investment and are prone to subtle mistakes.
Prototyping & demoing features without the server is hard. Typically, server and client engineers agree on a GraphQL schema early on in the feature development process. Once the schema has been established, however, the two groups split off and start working in parallel: Server engineers implement the logic to back the new schema and client engineers build the frontend UI, logic, and the queries that power them. This parallelization is particularly challenging for client engineers, since they can’t actually test the UI they’re building until the server has fully implemented the schema. To unblock themselves, client engineers often hardcode data into views, leverage proxies to manipulate responses, or hack custom logic into the networking layer locally, resulting in wasted time and effort.
Mocks get out of sync with GraphQL queries over time. Since most mocks are hand-written, they are not tightly coupled to the underlying queries and schema they are supposed to represent. If a team builds a new feature, then comes back a few months later to add new functionality backed by additional GraphQL fields, engineers must remember to manually update their mock data. As there is no forcing function to guarantee mocks stay in sync with queries, mock data tends to shift further away from the production reality as time passes — degrading the quality of tests.
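To make the drift problem concrete, here is a minimal TypeScript sketch of the kind of check that hand-written mocks never get. It assumes a hypothetical, heavily simplified `Selection` type (a nested object of selected field names) instead of a real GraphQL AST, and it only verifies that each selected field is present in the mock, not that its value has the right type.

```typescript
// Hypothetical, simplified selection set: `true` marks a selected leaf field,
// and a nested object marks a field with its own sub-selection.
type Selection = { [field: string]: true | Selection };

// Returns the dotted paths of selected fields that are absent from `value`.
// Only field presence is checked; scalar types are out of scope here.
function findMissingFields(
  selection: Selection,
  value: unknown,
  path: string = "",
): string[] {
  const missing: string[] = [];

  // For lists, every element must satisfy the same sub-selection.
  if (Array.isArray(value)) {
    value.forEach((item, index) => {
      missing.push(...findMissingFields(selection, item, `${path}[${index}]`));
    });
    return missing;
  }

  // Nulls (nullable fields) and non-objects have nothing further to check.
  if (value === null || typeof value !== "object") {
    return missing;
  }

  const record = value as Record<string, unknown>;
  for (const field of Object.keys(selection)) {
    const sub = selection[field];
    const fieldPath = path === "" ? field : `${path}.${field}`;
    if (!(field in record)) {
      missing.push(fieldPath);
    } else if (typeof sub === "object") {
      missing.push(...findMissingFields(sub, record[field], fieldPath));
    }
  }
  return missing;
}

// A mock written back when the query selected only `id` and `title`...
const staleMock = { listing: { id: "1", title: "Cozy loft" } };

// ...checked against the query after a `host { name }` selection was added.
const currentSelection: Selection = {
  listing: { id: true, title: true, host: { name: true } },
};

const drift = findMissingFields(currentSelection, staleMock); // ["listing.host"]
```

Without something like this running on every change, nothing flags `listing.host` as missing, which is exactly how hand-written mocks quietly fall behind their queries.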

These challenges are not unique to Airbnb and are common across the industry. Although tooling like random value generators and local field resolvers can provide some assistance, they lack the domain knowledge and context needed to produce realistic, meaningful data for high-quality demos, quick product iteration, and reliable testing.

Goals

When setting out to solve these challenges at Airbnb, we established three north-star goals:

Eliminate the need to hand-write mock data. Mock data should be generated automatically to free up engineers from needing to hand-craft and maintain mock GraphQL data.
Create highly realistic mock data. Mock data should match the user interface designs and look like real production data in order to support high-quality demos, which are highly valued at Airbnb for early feedback.
Keep engineers in their local focus loops. Our solution should seamlessly integrate into engineers’ current development processes so they can generate mocks without context-switching to a website, separate repository, or unfamiliar tool.

@generateMock: Schema + context + LLMs = magic

To generate mock data while keeping engineers in their local focus loops, we introduced a new client GraphQL directive called @generateMock, which engineers can use to automatically generate mock data for a given GraphQL operation, fragment, or field:


Example of @generateMock being specified on a GraphQL query.

This directive accepts a few optional arguments that engineers can use to customize the generated mock data, and the directive itself can be repeated with different input arguments to generate different mock variations:

id: The identifier to use for the mock, as well as for naming generated helper functions. Useful when repeating the @generateMock directive to produce multiple mocks.
hints: Additional context or instructions on how the mock should look. For example, a hint might be “Include travel entries for Barcelona, Paris, and Kyoto.” Under the hood, this information is fed to an LLM and heavily influences what the generated mock data looks like and how densely populated its fields are.
designURL: The URL of a design mockup of the screen that will render the mock data. Specifying this argument helps the LLM produce mock data that matches the design by generating matching names, addresses, and other similar content.
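The arguments above, and the way repeated directives yield separate mocks, can be modeled roughly as follows. This is a hypothetical TypeScript sketch, not Niobe's implementation: `GenerateMockArgs`, `planMocks`, and the `variant_<index>` fallback id are illustrative assumptions. The naming rule in `helperNameFor` is inferred from the one example in this post, where mock data for `mixed_status_indicators` is exposed through a helper called `mockMixedStatusIndicators()`.

```typescript
// Hypothetical shape of the arguments on one @generateMock usage;
// all three are optional, mirroring the list above.
interface GenerateMockArgs {
  id?: string;
  hints?: string;
  designURL?: string;
}

// Turns a snake_case mock id into a helper name, following the single
// example in this post: "mixed_status_indicators" -> "mockMixedStatusIndicators".
function helperNameFor(id: string): string {
  const pascal = id
    .split("_")
    .filter((part) => part.length > 0)
    .map((part) => part.charAt(0).toUpperCase() + part.slice(1))
    .join("");
  return `mock${pascal}`;
}

// Collects repeated @generateMock usages into one entry per mock id.
// Unnamed usages get a "variant_<index>" id (an assumption, not documented
// behavior), and duplicate ids are rejected because they would produce
// colliding helper names.
function planMocks(usages: GenerateMockArgs[]): Map<string, GenerateMockArgs> {
  const plan = new Map<string, GenerateMockArgs>();
  usages.forEach((args, index) => {
    const id = args.id ?? `variant_${index}`;
    if (plan.has(id)) {
      throw new Error(`Duplicate @generateMock id: ${id}`);
    }
    plan.set(id, args);
  });
  return plan;
}

// Two variations of the same query, as if the directive were repeated
// (the hint strings are made-up examples).
const plan = planMocks([
  { id: "mixed_status_indicators", hints: "Include read, unread, and archived threads." },
  { id: "empty_inbox", hints: "The user has no messages." },
]);
const helperNames = Array.from(plan.keys()).map(helperNameFor);
// ["mockMixedStatusIndicators", "mockEmptyInbox"]
```

Rejecting duplicate ids early is a design choice of this sketch: since the id doubles as the helper name, letting two variations share one would leave the generator with no good way to name both.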

At Airbnb, engineers use a command line tool we call Niobe to generate code for their GraphQL queries and fragments. After modifying a .graphql file locally, engineers run this code generator, then use the generated TypeScript/Kotlin/Swift files to send GraphQL requests. To generate mock data using @generateMock, engineers simply need to run Niobe code generation after adding the directive — just as they would after making any other GraphQL change.

During code generation, Niobe produces both a JSON file containing the actual mock data for each @generateMock directive and a source file that provides functions for loading and consuming mock data from demo apps, snapshot tests, and unit tests. As shown in the Swift code below, the mockMixedStatusIndicators() function is generated on the InboxSyncQuery’s root Data type. It provides access to an instantiated type that’s populated with the generated mock data for mixed_status_indicators, allowing engineers to use the mock without having to load the JSON data manually:


Using a generated mock in a Swift unit test.
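Because Niobe also emits TypeScript, the generated Swift helper described above has a natural TypeScript counterpart. The sketch below is hypothetical: the `InboxThread` fields, the inlined JSON string (standing in for the generated JSON mock file), and the `InboxSyncQueryMocks` object are assumptions made for illustration, not Niobe's actual output.

```typescript
// Hypothetical generated types for InboxSyncQuery (illustrative fields only).
interface InboxThread {
  id: string;
  status: "READ" | "UNREAD" | "ARCHIVED";
}

interface InboxSyncQueryData {
  threads: InboxThread[];
}

// Stand-in for the generated JSON mock file; a real generator would write
// this to its own .json file and load it from disk or an app bundle.
const mixedStatusIndicatorsJson = `{
  "threads": [
    { "id": "t1", "status": "UNREAD" },
    { "id": "t2", "status": "READ" },
    { "id": "t3", "status": "ARCHIVED" }
  ]
}`;

// Hypothetical generated helper: returns typed data so tests never handle
// raw JSON themselves.
const InboxSyncQueryMocks = {
  mockMixedStatusIndicators(): InboxSyncQueryData {
    // JSON.parse is untyped, so this cast is unchecked at runtime; the sketch
    // assumes the mock was validated against the schema when it was generated.
    // Parsing on every call gives each caller an independent copy.
    return JSON.parse(mixedStatusIndicatorsJson) as InboxSyncQueryData;
  },
};

// Usage in a unit test:
const data = InboxSyncQueryMocks.mockMixedStatusIndicators();
const unreadCount = data.threads.filter((t) => t.status === "UNREAD").length; // 1
```

Re-parsing on each call trades a little speed for isolation: one test mutating its mock cannot leak state into another.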

Engineers are free to modify the generated mock JSON data as well — as we’ll see below, Niobe will avoid overwriting their modifications on subsequent generation invocations.
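The post describes Niobe's actual mechanism for preserving edits separately, so the sketch below is not Niobe's approach. It illustrates one common technique a code generator could use, assumed here purely for illustration: record a hash of each file the generator writes, and on the next run overwrite a file only if its contents still match that hash. `regenerateMock`, the in-memory `files` and `manifest` maps, and the file name are all hypothetical; a real tool would persist the manifest alongside the generated files.

```typescript
// 32-bit FNV-1a over UTF-16 code units: a fast, non-cryptographic hash that
// is sufficient for noticing that a file's contents changed.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

type RegenerateResult = "written" | "kept-user-edits";

// Writes freshly generated mock JSON unless the file has been edited since
// the generator last wrote it. `files` stands in for the file system and
// `manifest` maps each path to the hash of what the generator last wrote.
function regenerateMock(
  files: Map<string, string>,
  manifest: Map<string, string>,
  path: string,
  generated: string,
): RegenerateResult {
  const current = files.get(path);
  const recordedHash = manifest.get(path);
  // A missing file is safe to create; an existing file is safe to overwrite
  // only if it still matches the recorded hash. A file the generator has no
  // record of is treated as hand-written.
  const safeToOverwrite =
    current === undefined ||
    (recordedHash !== undefined && fnv1a(current) === recordedHash);
  if (!safeToOverwrite) {
    return "kept-user-edits";
  }
  files.set(path, generated);
  manifest.set(path, fnv1a(generated));
  return "written";
}

const files = new Map<string, string>();
const manifest = new Map<string, string>();
const mockPath = "InboxSyncQuery.mixed_status_indicators.json"; // hypothetical file name

regenerateMock(files, manifest, mockPath, '{"threads":[]}'); // "written"
files.set(mockPath, '{"threads":[{"id":"hand-edited"}]}'); // engineer tweaks the mock
regenerateMock(files, manifest, mockPath, '{"threads":[{"id":"t1"}]}'); // "kept-user-edits"
```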

What does mock data look like?

The context that we provide to the LLM is vital to generating data that is realistic enough to use in demos. To this end, Niobe collects the following information and includes it in the context passed to the LLM:

[...]
