
Unlocking Faster Insights with Experimenter-Defined Segmentations

Imagine you have a fabulous idea to drive more sales on Etsy by giving out free ice cream with every purchase. How would you know if it will actually work?
One way to test it is to run an experiment! An experiment is a learning tool that helps us see whether a change really affects user behavior and the key metrics we care about. At Etsy, we run nearly 2,000 experiments a year to test new ideas that improve the experience for our buyers and sellers.
For our ice cream idea, we could set up our experiment as an A/B test on the listing page, where one group of users (the “control” group) sees the usual “Buy it now” button, while the other (the "treatment" group) sees a shiny new version that says, “Buy it now and get a FREE ICE CREAM!”.

Comparison between the “Buy it now” button in the control and treatment groups in our example experiment. Disclaimer: Example for illustrative purposes only.

We would then track how many people click the “Buy” button in each group to see if the new button actually increases sales.
But what if you want to dig deeper? What if you want to know if certain groups of users react differently to the free ice cream offer?
For example:

Do ice cream lovers click the button more often than those who don’t like ice cream? 🍦
What about people who’ve had ice cream in the past week versus those who haven’t? 😋
Or people browsing high-priced items versus low-priced ones? Does the price of the listing affect buyer behavior? For example, would a free ice cream entice someone to buy a $5,000 chair?

This is where segmentations come into play! Segmentation allows you to slice and dice your experiment data to analyze the impact on different groups of users. It’s a powerful tool to uncover meaningful patterns that might remain hidden in the overall results. For instance, our ice cream experiment might show neutral engagement overall, but analyzing by segmentation could reveal that ice cream lovers are much more likely to buy with the new button — while non-lovers engage less, canceling each other out in the overall results.
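To make this canceling-out effect concrete, here's a tiny back-of-the-envelope calculation (all numbers are invented for illustration): two segments reacting in opposite directions can leave the pooled result looking perfectly flat.

```python
# Hypothetical conversion counts, invented purely for illustration:
# (conversions, visitors) per experiment arm, per segment.
segments = {
    "ice_cream_lovers": {"control": (50, 1000), "treatment": (70, 1000)},  # big lift
    "non_lovers":       {"control": (50, 1000), "treatment": (30, 1000)},  # big drop
}

def rate(conversions, visitors):
    return conversions / visitors

for name, arms in segments.items():
    c, t = rate(*arms["control"]), rate(*arms["treatment"])
    print(f"{name}: control {c:.1%} -> treatment {t:.1%}")

# Pool both segments in each arm to get the overall result.
overall_c = rate(sum(a["control"][0] for a in segments.values()),
                 sum(a["control"][1] for a in segments.values()))
overall_t = rate(sum(a["treatment"][0] for a in segments.values()),
                 sum(a["treatment"][1] for a in segments.values()))
print(f"overall: control {overall_c:.1%} -> treatment {overall_t:.1%}")  # 5.0% -> 5.0%: flat
```

Each segment moves by two percentage points, yet the overall comparison shows no change at all: exactly the pattern that only a segmented analysis can surface.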
Since experimentation is all about learning, using segmentations helps us maximize new insights from our experiments. However, for a very long time at Etsy, experimenters didn’t have the best tools to make the most of segmentations. In fact, there was no way for experimenters to define their own segmentations within our centralized experimentation platform.
In this post, we’ll take you through how we transformed segmentations by empowering experimenters to easily define their own segmentations within our experimentation platform — resulting in 3x the number of segmentations and unlocking faster, deeper insights for teams across Etsy.
The challenge with using segmentations at Etsy
Etsy’s in-house experimentation platform, Catapult, supported a fixed set of segmentations out of the box: experimenters could view their results automatically broken down by 13 supported segmentations on the experiment results page, with results refreshed daily.
However, these 13 segmentations were hard-coded in Catapult’s back-end, and there was no way for experimenters to analyze their results by any other segmentation within Catapult.
Okay, but can’t we just add new segmentations to Catapult?
Not so easy. Adding new segmentations required an engineer to make a code change to the data pipeline each time, which was a slow, painful process. This was simply not scalable when different teams wanted different segmentations. To make matters worse, all segmentations were processed together in a single batch job, which could only start once the upstream data for all segmentations was ready. This meant that if the upstream data for even just one segmentation was delayed, the entire batch job was held up, causing delays across the board. Adding more segmentations only made this problem worse, leading to longer processing times and higher risk of delays as each new segmentation was onboarded.

The Catapult data pipeline processed 13 hard-coded segmentations all within one batch job.

As a result, teams ended up relying on analysts to run ad-hoc SQL queries outside Catapult to explore unsupported segmentations. This process was slow, hard to reproduce, and difficult to share with product managers and stakeholders. Experimenters often felt frustrated – there was so much to learn, but no easy way to get the insights they needed to make critical business decisions quickly.
As the team at Etsy responsible for Catapult and internal experimentation support, we wanted to change this by empowering experimenters to define their own segmentations — and to reuse those segmentations across experiments. Our vision was to create a Segmentation Library: a collection of experimenter-defined segmentations that could be easily shared and automatically applied in the data pipeline, no code changes required!
Empowering experimenters to define their own segmentations
To make it possible for experimenters to define their own segmentations, we needed a way to represent segmentations as configuration objects that are both human-readable and machine-processable.
We drew inspiration from the ad-hoc SQL scripts our analysts had been writing to explore custom segmentations, since SQL is widely understood and expressive. Those scripts typically follow the same steps:

Query various source tables for segment data
Join the segment data with experimental unit data (i.e. users that were part of an experiment)
Define how to assign each experimental unit to a segment value

We decided to model each segmentation as:

A SQL query defining the segment data source, and how to assign segment values to each experimental unit
A set of join columns to connect segment data to experimental units using some shared IDs (e.g. user IDs)

Here is an example of a segmentation definition from our ice cream experiment:
Segmentation name: "ice_cream_lovers"
SQL:

    SELECT
      user_id,
      CASE
        WHEN rating >= 9 THEN 'Ice cream lover'
        ELSE 'Not an ice cream lover'
      END AS segment_value
    FROM ice_cream_ratings

Join columns: ["user_id"]
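In code, a definition like this can be captured as a small configuration object. The sketch below is our illustration of the idea; the field names and the example table are assumptions, not Catapult's actual schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segmentation:
    """A minimal, hypothetical model of an experimenter-defined segmentation."""
    name: str            # unique segmentation name
    sql: str             # query producing join columns plus a segment value per unit
    join_columns: list   # shared IDs used to join segment data to experimental units

# Example instance, mirroring the ice cream segmentation above.
ice_cream_lovers = Segmentation(
    name="ice_cream_lovers",
    sql=(
        "SELECT user_id, "
        "CASE WHEN rating >= 9 THEN 'Ice cream lover' "
        "ELSE 'Not an ice cream lover' END AS segment_value "
        "FROM ice_cream_ratings"
    ),
    join_columns=["user_id"],
)
```

Because the object is just data (a name, a query, and join keys), it can be stored in a library, reviewed by humans, and consumed by pipeline code alike.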
The nice thing about defining segmentations this way is that, unlike the old hard-coded segmentations buried deep in the data pipeline, these SQL definitions are easy for analysts, product managers, and engineers alike to write and understand. That makes it easier for teams to adopt, for their own experiments, segmentations previously defined by other teams. Existing segmentations also serve as inspiration for new ones: anyone looking to create a segmentation can browse the Segmentation Library before defining their own in the Segmentation Creation tool.
The SQL queries and join keys can also be easily plugged into templated scripts in Catapult’s data pipeline to compute segmented experiment results. It’s a win-win for both experimenters and the data pipeline!
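As an illustration of that templating idea, a pipeline script could splice the experimenter's SQL in as a subquery and build the join condition from the declared join columns. This is a simplified sketch with made-up table and column names, not Catapult's actual template:

```python
# Hypothetical template: wrap the experimenter's SQL as a subquery and join it
# to the experimental-unit table on the declared join columns.
JOIN_TEMPLATE = """
SELECT
  units.experiment_id,
  units.variant,
  seg.segment_value,
  COUNT(*) AS unit_count
FROM experimental_units AS units
JOIN ({segment_sql}) AS seg
  ON {join_condition}
GROUP BY 1, 2, 3
"""

def render_segment_query(segment_sql: str, join_columns: list) -> str:
    """Render the per-segmentation aggregation query from its definition."""
    join_condition = " AND ".join(
        f"units.{col} = seg.{col}" for col in join_columns
    )
    return JOIN_TEMPLATE.format(segment_sql=segment_sql.strip(),
                                join_condition=join_condition)

query = render_segment_query(
    "SELECT user_id, 'Ice cream lover' AS segment_value FROM lovers",
    ["user_id"],
)
```

The same template serves every segmentation: only the subquery and the join condition change, which is what lets the pipeline stay generic.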
Validating segmentation definitions
With experimenter-defined SQL queries comes a big question: How do we make sure they actually work?
As the owners of Catapult’s data pipeline, we want to make sure these experimenter-defined queries won’t cause any damage — like accidentally deleting important tables or causing failures in our pipeline. On the flip side, experimenters creating these segmentations want to know that their SQL will run smoothly in our pipeline, and that the results match what they expected.
To tackle both concerns, every new segmentation goes through a rigorous validation process before it can be deployed to production:

The segmentation validation and deployment workflows are triggered to push new segmentations to production.

First, when an experimenter defines a new segmentation in the Segmentation Creation tool and clicks “Validate”, it triggers a validation workflow through the workflow orchestration API. This workflow receives the experimenter's segmentation SQL query and metadata such as join keys. Then, each task in the validation process checks different parts of the segmentation query to make sure the SQL follows our rules — like having the correct output schema and using only approved join keys. If anything fails, an error message is sent back to the experimenter through the Segmentation Creation tool. This orchestrated validation ensures that segmentations are accurate, safe, and ready to run before deployment. Once the SQL validation finishes successfully, we generate a sample result in BigQuery for the experimenter to review.
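To give a flavor of what such checks might look like, here is a simplified sketch. The allow-list, the required columns, and the naive keyword scan are our inventions for illustration, not Etsy's actual validation rules:

```python
# Hypothetical allow-list of join keys and required output columns.
APPROVED_JOIN_KEYS = {"user_id", "browser_id", "shop_id"}
REQUIRED_OUTPUT_COLUMNS = {"segment_value"}  # plus the declared join columns

def validate_segmentation(sql: str, join_columns: list, output_columns: set) -> list:
    """Return a list of human-readable validation errors (empty means valid).

    `output_columns` would come from a dry run of the query in practice.
    """
    errors = []
    # Only approved join keys may be used to connect to experimental units.
    for col in join_columns:
        if col not in APPROVED_JOIN_KEYS:
            errors.append(f"join column '{col}' is not an approved join key")
    # The output schema must contain the join columns and a segment value.
    missing = (set(join_columns) | REQUIRED_OUTPUT_COLUMNS) - output_columns
    if missing:
        errors.append(f"query output is missing required columns: {sorted(missing)}")
    # Reject statements that could modify data (naive keyword check, for
    # illustration only; a real validator would parse the SQL).
    if any(word in sql.upper() for word in ("DELETE", "DROP", "UPDATE", "INSERT")):
        errors.append("only read-only SELECT statements are allowed")
    return errors
```

Each rule maps to one validation task; collecting errors rather than failing fast lets the tool surface everything that needs fixing in a single round trip.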
Once the experimenter has reviewed the sample results and is happy with them, they can click “Deploy”, which triggers an automated deployment workflow to push the new segmentation to production. This workflow adds the validated SQL to the segmentation definition repository and automatically creates a pull request. After a final validation of the segmentation SQL query, the pull request is automatically merged, and the new segmentation is immediately deployed to production. The system also manages all dependencies, ensuring that all upstream data is ready before running the segmentation in production.
Once a segmentation is in production, it is ready to be used in experiments!
Building a more flexible pipeline with dynamic task generation
After new segmentations were defined, the next challenge was to ensure that our data pipeline could automatically pick them up and process them — without needing any code changes.
The problem was that Catapult’s data pipeline was set up as a workflow with a fixed set of tasks for a predefined set of segmentations. This meant adding a new segmentation required updating the pipeline code to add new tasks.
To solve this, we leveraged dynamic task generation in our workflow to create tasks at runtime based on dynamic inputs — like the list of active segmentations — without knowing in advance how many tasks are needed.
This was a perfect fit for us, since it enabled us to process a changing set of segmentations each day as independent, parallel work streams, all running the same underlying logic. This approach also helped us scale up the number of supported segmentations without duplicating code.
Here is how we utilized dynamic task generation in our data pipeline:

First, the workflow makes an API call to get the current list of active segmentations from ongoing experiments.
Then, using dynamic task generation, it spins up a set of identical task groups, one for each segmentation, all running independently and in parallel.
Within each task group:

It first uses sensors to wait for the segmentation’s upstream data to be ready.
Once the upstream data is ready, it kicks off subsequent SQL scripts that use the segmentation’s SQL query and join columns to connect segment data to experimental units, and aggregate them for each experiment.
Finally, it outputs the segmented experiment results to the Experiment Results table that is then used to display the results in the Catapult UI.

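The fan-out described above can be sketched in plain Python. Real orchestrators handle this natively (Apache Airflow, for example, supports dynamic task mapping); below we simply simulate the idea with a thread pool, and all of the function names are our own stand-ins:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_active_segmentations():
    # Stand-in for the API call that lists segmentations from ongoing experiments.
    return ["ice_cream_lovers", "recent_ice_cream_eaters", "high_price_browsers"]

def process_segmentation(name: str) -> str:
    # Stand-in for one task group: wait for upstream data, run the join SQL,
    # and write segmented results. Each group runs independently, so one
    # failure does not block the others.
    return f"{name}:done"

def run_pipeline():
    names = fetch_active_segmentations()   # discovered at runtime, not hard-coded
    with ThreadPoolExecutor() as pool:     # one parallel stream per segmentation
        return list(pool.map(process_segmentation, names))
```

The key property is that the set of tasks is derived from data at runtime: adding a segmentation changes the API response, not the pipeline code.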
Here is a side-by-side comparison of our data pipeline before and after using dynamic task generation:

Overall, this new design offers several benefits:

Automated processing: New segmentations are automatically detected and processed in the data pipeline. No code changes required!
Improved failure isolation: Each segmentation runs independently, so if one fails, it doesn’t affect the others, making the pipeline more reliable.
Simplified downstream dependencies: Each segmentation has a unique name, allowing downstream jobs to wait for the specific tasks they need. This means they can start as soon as their required segmentations are ready, without having to wait around for the entire batch to finish.

The Impact: 3X more segmentations, faster experiment insights
Within the first three months of launching the new Segmentation Creation tool, experimenters created 34 new segmentations. The number of segmentations our experimentation system supports more than tripled, from 13 hard-coded segmentations to 47 in total. We knew our experimenters would love the ability to define their own segmentations — but we didn’t expect them to love it this much!
Since the launch, teams across Etsy have been gaining insights faster than before. In fact, 78% of experimenters say they save 2-5 hours a month by analyzing their experiments with custom segmentations in Catapult. Having the Segmentation Library as a single source of truth means that teams can easily reuse existing segmentations instead of reinventing the wheel every time, making it far easier to share insights across different business areas. Plus, with faster access to experiment insights, teams are able to make decisions and iterate on their products more quickly than ever!
Conclusion: Building a better Etsy through flexible experimentation
In summary, by enabling experimenters to define their own segmentations, and by building data pipelines smart enough to adapt to them, we’ve given teams across Etsy the power to shape their own experiment analysis and get actionable insights quickly.
Why does this matter? Because every bit of flexibility we add to experimentation leads to a better experience for Etsy buyers and sellers. When teams can iterate quickly and confidently, they build products that genuinely serve our community — helping Etsy thrive as a platform where creativity and commerce meet.
Some key lessons we learned along the way:

Moving from hard-coded logic to a configuration-driven pipeline has opened up a wealth of new possibilities and expanded what our experimentation platform can do. We’re already applying the same approach, with Segmentations as the template, to make other parts of experimentation more flexible.
Having a group of experimenters as early “pilot testers” was a key part of our success. They worked closely with us from the start, testing out early versions of the tool and giving us honest feedback, which helped us build the Segmentation Creation Tool to be as user-friendly as possible. Our pilot testers have also become champions who helped spread the word on this feature to other teams. The segmentations they have created served as examples for others to follow. Thank you, our pilot testers!
Although Segmentations has been adopted by many teams across Etsy since its release, we know there are still teams out there who haven’t discovered this tool yet! We are committed to finding new ways to share the knowledge and help teams discover how this tool can unlock the insights they need.

We hope that our learnings will inspire you to turn experiment analysis from a fixed process into a flexible, self-serviceable platform that can keep up with whatever questions your team asks next!

Source: https://www.etsy.com/codeascraft/unlocking-faster-insights-with-experimenter-defined-segmentations
