Cloudflare Launches a Feature to Stop AI Bots from Scraping Your Website
Cloudflare is offering its customers a new feature that blocks AI bots, scrapers, and crawlers with a single click. It is meant to keep AI companies from accessing content without authorization to train models or run inference.
As Cloudflare put it in its announcement: "To help preserve a safe Internet for content creators, we’ve just launched a brand new 'easy button' to block all AI bots. It’s available for all customers, including those on our free tier."
This broad availability ensures that content creators of all sizes can protect their work from unauthorized AI scraping.

Why Cloudflare is Offering This
- The demand for content to train AI models has skyrocketed
- Not all AI companies are transparent about their web scraping activities
- There have been instances of AI companies using content without proper licensing or consent
- Cloudflare customers have expressed a strong desire to block AI bots from visiting their websites
How to Block AI Scraping Bots in Cloudflare
- Log in to your Cloudflare dashboard
- Navigate to the Security > Bots section
- Find the toggle labeled "AI Scrapers and Crawlers"
- Click the toggle to enable the feature
Once enabled, this feature will automatically block all identified AI bots from accessing your website. Cloudflare will continue to update this feature over time as new AI bot fingerprints are identified, ensuring ongoing protection against web scraping for AI training purposes.
AI Bot Activity
The most active AI bots on Cloudflare's network include:
- Bytespider (ByteDance/TikTok): Highest in request volume and frequently blocked
- GPTBot (OpenAI): Second both in how widely it crawls and in how often it is blocked
- Amazonbot: High in request volume
- ClaudeBot (Anthropic): Recently increased in request volume
AI bots accessed about 39% of the top one million Internet properties using Cloudflare, but only 2.98% of these properties took measures to block or challenge those requests.
What is there to say about some shady Chinese scraping bots when even Microsoft's AI chief openly suggests that your web content is 'freeware' and OK to use for any purpose? So just because content is made available online, it's automatically in the public domain? As Forbes has pointed out, that logic would kind of make Windows free as well, because it's available for download online.
Mustafa Suleyman made this statement during an interview with CNBC’s Andrew Ross Sorkin, where he was asked about the alleged theft of intellectual property by AI companies.
No wonder content owners are worried about shameless scraping of their stuff.
Should You Block AI Bots from Your Website?
There are differing opinions on whether websites should block AI bots. Here are some key arguments for and against.
Arguments Against Blocking AI Bots
- Blocking AI bots can limit the visibility and reach of your content, as some AI tools such as Perplexity and Microsoft Copilot do provide attribution and links (how often people actually open them is another question)
- When people do click on those links, this can help improve SEO and drive traffic to your site
- Blocking AI may thus cause you to miss out on potential opportunities, like bookings or sales
Arguments For Blocking AI Bots
- Many AI models don't provide attribution or links, potentially reducing traffic and brand visibility
- AI bots can strain server resources and impact website performance
- There are privacy concerns related to AI bot data collection
- For content creators, AI replication of their work could lead to a net loss
Balanced Approaches
Some experts suggest more nuanced strategies:
- Monitor the situation and wait before making a decision
- Consider blocking only on specific pages with unique or sensitive content
- Evaluate based on your business model and type of content
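The page-specific option above can be sketched as a small decision function: block only when a request both comes from a listed AI crawler and targets a protected section. This is a minimal Python illustration of the logic, not Cloudflare's implementation. The path prefixes, the sample User-Agent strings, and the function name are hypothetical; the bot names are the ones mentioned in this article.

```python
# Bot name fragments taken from the crawlers named in this article.
AI_BOT_TOKENS = ("Bytespider", "GPTBot", "Amazonbot", "ClaudeBot")

# Hypothetical site sections holding unique or sensitive content.
PROTECTED_PREFIXES = ("/premium/", "/research/")


def should_block(user_agent: str, path: str) -> bool:
    """Block only AI crawlers, and only on protected paths."""
    is_ai_bot = any(token in user_agent for token in AI_BOT_TOKENS)
    # str.startswith accepts a tuple and returns True if any prefix matches.
    is_protected = path.startswith(PROTECTED_PREFIXES)
    return is_ai_bot and is_protected


# Illustrative User-Agent strings, not copied from real traffic.
print(should_block("Mozilla/5.0 (compatible; GPTBot/1.0)", "/premium/report"))  # AI bot, protected page
print(should_block("Mozilla/5.0 (compatible; GPTBot/1.0)", "/blog/post"))       # AI bot, public page
print(should_block("Mozilla/5.0 (Windows NT 10.0)", "/premium/report"))         # regular visitor
```

In a Cloudflare custom rule, the same idea would presumably combine a user-agent condition with a URI path condition, so that the rest of the site stays open to AI tools that link back to it.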
Blocking Bytespider Alone in Cloudflare
If you're unsure about cutting off the whole bunch of AI crawlers, you can block just the Bytespider bot in Cloudflare.
Simply create a custom WAF rule that blocks requests whose User-Agent contains "Bytespider".
Multiple sources recommend blocking Bytespider due to its aggressive behavior and disregard for standard crawling etiquette. Using Cloudflare's WAF or firewall rules to block based on the User Agent appears to be the most effective and commonly suggested method.
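For illustration, such a rule can be written in Cloudflare's rule expression language using the `http.user_agent` field and the `contains` operator. The Python sketch below only builds that expression string and mimics the match locally. The helper names and the sample User-Agent string are hypothetical, and it assumes `contains` behaves as a case-sensitive substring match.

```python
def build_block_expression(token: str) -> str:
    """Return a rule expression matching requests whose User-Agent contains `token`."""
    # Escape backslashes and double quotes so the token stays a valid quoted string.
    escaped = token.replace("\\", "\\\\").replace('"', '\\"')
    return f'(http.user_agent contains "{escaped}")'


def would_match(user_agent: str, token: str) -> bool:
    """Local stand-in for the rule: case-sensitive substring check (an assumption)."""
    return token in user_agent


# Hypothetical User-Agent string, for illustration only.
sample_ua = "Mozilla/5.0 (compatible; Bytespider; spider-feedback@bytedance.com)"

expression = build_block_expression("Bytespider")
print(expression)
print(would_match(sample_ua, "Bytespider"))
print(would_match("Mozilla/5.0 (Windows NT 10.0; Win64; x64)", "Bytespider"))
```

The printed expression is what you would paste into the custom rule's expression editor, with the rule action set to Block.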
So ultimately, the decision to block AI bots depends on individual circumstances and should be carefully considered in light of evolving AI technologies and their impact on web traffic and content usage.
I myself am on the fence about this issue, but I'm really glad that Cloudflare, which I've been a happy user of for years, is once again introducing a feature its customers will appreciate having available.
Until we know the click-through rate on attribution links within AI-powered tools, it's hard to decide whether they're a net benefit or dead weight for content creators.
Published: Jul 4, 2024 at 7:58 PM