Why Stack Overflow and Cloudflare launched a pay-per-crawl model
Inside the pay-per-crawl model colaunched by Stack Overflow and Cloudflare.
February 19, 2026
In this episode of Leaders of Code, Stack Overflow’s Janice Manningham and Josh Zhang sit down with Cloudflare VP Will Allen to discuss the innovative pay-per-crawl model co-launched by their organizations. They explore how the rise of AI has disrupted the traditional “open versus block” internet model, creating a need for platforms to protect their content and data from commercial exploitation while maintaining community access. The discussion also:
- Explores the future of the bot ecosystem, emphasizing the importance of putting publishers back in the driver’s seat to decide how their content is accessed and monetized.
- Explains the technical implementation of the pay-per-crawl system, which uses Cloudflare’s bot categorization and WAF rules to serve a 402 “Payment Required” message to specific crawlers.
- Highlights the strategic value of data licensing, comparing comprehensive enterprise contracts with the more flexible, programmatic pay-per-use access enabled by the new model.
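For readers curious about the mechanics mentioned above, the pay-per-crawl system serves an HTTP 402 “Payment Required” response to specific crawlers. The sketch below is an illustrative approximation only, not Cloudflare’s actual WAF implementation; the crawler names and decision logic are hypothetical.

```python
# Illustrative sketch of the pay-per-crawl idea: requests from identified
# AI crawlers get HTTP 402 "Payment Required" instead of content.
# The crawler list and matching logic here are hypothetical examples.

PAYWALLED_CRAWLERS = {"ExampleAIBot", "DemoCrawler"}  # hypothetical names

def response_status(user_agent: str) -> int:
    """Return 402 for paywalled crawlers, 200 for everyone else."""
    if any(bot in user_agent for bot in PAYWALLED_CRAWLERS):
        # 402 signals the crawler that access requires payment
        return 402
    return 200

print(response_status("Mozilla/5.0 (Windows NT 10.0)"))  # → 200
print(response_status("ExampleAIBot/1.0"))               # → 402
```

In a production deployment this decision would rely on bot categorization and request fingerprinting rather than a simple user-agent substring match, since user agents are trivially spoofed.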
Notes
- Connect with Will Allen, Janice Manningham and Josh Zhang on LinkedIn.
- Learn more about Stack Overflow Data Licensing here.
TRANSCRIPT
Eira May:
Hello and welcome to another episode of Leaders of Code on the Stack Overflow Podcast. If this is your first time listening, this is a segment where we get senior engineering leaders in the same room, same virtual room anyway, and we let them ask each other questions about the work they're doing, how they build teams, and the biggest challenges they're facing right now. My name is Eira May. I am the B2B editor at Stack Overflow, and today I'm joined by two of my colleagues at Stack, strategic product leader, Janice Manningham and site reliability engineer, Josh Zhang. How are you guys doing today?
Janice Manningham:
Hey, Eira.
Josh Zhang:
Good.
Janice Manningham:
Nice to see you.
Eira May:
Thanks for joining us. We also, in addition to Janice and Josh, we have a guest today, that's Will Allen. He's a VP at Cloudflare. Will, welcome to the show.
Will Allen:
Thanks so much for having me. Great to be here.
Eira May:
Great to have you. So we wanted to get you in the virtual room today because we wanted to talk about a new Pay Per Crawl model that Cloudflare and Stack Overflow are co-launching. So this is a model that is an effort to monetize public data and get bots to pay for crawls. So I'll turn it over from there to Janice and let her tell you all a little bit about this project.
Janice Manningham:
Thanks, Eira. So let's talk about Pay Per Crawl. First, I want to emphasize that it's not to be confused with "paper crawl," as in a piece of paper. More accurately, it means paying for each crawl, or for each bot attempting to access a site. We're talking about Pay Per Crawl because the rise of AI crawlers has fundamentally broken what I believe is the old internet model, which is open versus block.
Traditionally, content platforms like Stack Overflow have operated with generally open access, letting bots take advantage of our public content and blocking only malicious activity. But with the rise of AI products looking to take data for model training, we've found ourselves in a position over the last year or so where we needed to revisit that approach and look for ways to protect our data against commercial usage for model training while still allowing access to our community. That's why companies like Stack and Cloudflare have started to talk openly about paid access to our content, and other content providers have as well.
I want to actually bring in Josh to explain a little bit about historically what Stack has done with our bot traffic and the work that he has done to manage this set of AI crawlers up to date and before we really worked closely with Will and Cloudflare.
Josh Zhang:
Sure. So historically, before the advent of AI, a lot of bots would actually just try to bring the website down, right? So one of the reasons we switched to Cloudflare was for their DDoS mitigation. And I basically specialized in protecting the site from bots that brought the site down. But with the advent of AI, bots evolved because now there's money in scraping and sending as much traffic as you can to a website, but masking it as normal traffic. So on one hand, they're not taking your site down, but they are just sending you a ton of extra traffic that you ultimately have to pay for that isn't bringing any value. In fact, they could take away your value because if they're taking the data but not kind of sending traffic back to your site, it's not a good virtuous cycle, right? So with bots getting more and more sophisticated, it's getting really complicated to try to basically weed out the good traffic from the bad.
In fact, very recently, the bots have gotten so good they're fooling advertisers. They're basically eating up ad impressions. It used to be that they would scrape using a headless ... just curl or very simple programmatic ways to access a site, and that wouldn't trigger any kind of ad impression, which is at least something. But now they're using headless browsers and other methods to basically completely fool you. We might be able to catch them, but ultimately they're eating up ad impressions, which is really terrible for the advertisers themselves. So this is an ever-growing arms race against bots that are trying to extract as much information from you as possible while pretending to be legitimate traffic.
Janice Manningham:
Yeah. Just to feed off of that as well. I think we were kind of playing a cat-and-mouse game for a while, don't you think, Josh? We were doing more of an active block approach, where, ad hoc, we'd identify some malicious activity from specific user agents or bots and then just add them to a block list. We kept up that approach, and the block list started to get a little bit unwieldy.
Josh Zhang:
Definitely hard to scale that approach. We were basically just playing whack-a-mole, right? There are other tools you can use, like fingerprinting and Cloudflare's bot score and things like that. But of course it's an adversarial relationship, so the people writing the bots know what they have to defeat. It's an ongoing cycle of trying to catch the next technology wave. And unfortunately, in a lot of cases, there are more of them than there are of us.
Janice Manningham:
Totally. And so then we looked into using Cloudflare and their existing tools and infrastructure to support us in this new situation and give us the robust tools needed to manage it. So I wanted to talk to you, Will, about the product you represent at Cloudflare and how you've helped Stack Overflow manage bots and control traffic.
Will Allen:
[...]