Cloudflare's New Policy Upends the Internet, and It's Bad News for the AI Giants


2025-07-05 Technology
Ji Fei
Hello, listeners, and welcome to a new episode of <Goose Pod>. I'm Ji Fei.
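Guo Rong
Hi everyone, I'm Guo Rong. Ji Fei, have you felt like browsing the web has gotten slower lately? It may not be your imagination. Today we're talking about a big story that could shake the whole internet.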
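Ji Fei
That's right. The star of this story is Cloudflare, a giant most of us never deal with directly, yet it handles roughly a fifth of the world's web traffic. Their latest policy came like a bolt from the blue.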
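Guo Rong
Right! Put simply, they've shown every AI bot a red card by default, including the crawlers that gather training data for large models, and hung a "No Visitors" sign on the door. Unless the site owner personally sends an invitation, nobody gets in.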
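Ji Fei
Mm-hm. This flips the old model of "welcome by default, shown the door only if you misbehave." What's more interesting is that they've also launched a program called "Pay Per Crawl." The subtext is clear: AI companies, the good times are over, and if you want data from now on, you may have to pay for it.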
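Guo Rong
Which raises the obvious question: why has Cloudflare changed course so suddenly? Well... it's mainly because these AI crawlers are just too "hardworking"!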
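Ji Fei
(chuckles) "Hardworking" to a fault. Plenty of site owners complain that crawlers like OpenAI's GPTBot act as if they're running on pure adrenaline, hammering sites with hundreds of requests per second. Servers are close to buckling, and sites slow to a crawl.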
Guo Rong
That's textbook resource abuse. Ji Fei, your little-shop analogy captures it well. Want to walk everyone through it again?
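Ji Fei
Ha, sure. Imagine your little shop is suddenly flooded with thousands of attention-seekers who browse but never buy. They jam the aisles so tightly that the customers who actually want to spend money can't get in at all. That's what web servers are going through right now. It's rough.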
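Guo Rong
Exactly. And beyond the server load, the deeper conflict is about content and fairness. Think about it: news organizations and creators work hard to write their articles and shoot their photos, and AI companies turn around and feed them to their models without so much as a heads-up, let alone payment.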
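Ji Fei
Right, and that goes to the heart of intellectual property. Some earlier court rulings have treated this as "fair use," but to creators it looks a lot like daylight robbery. So they've been looking for an ally, and this time Cloudflare has stepped up on their behalf.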
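Ji Fei
With that, the conflict is fully out in the open. On one side is the coalition of content creators, who feel their rescuer has finally arrived and are demanding control over their content and the compensation they're owed. Cloudflare's CEO has also said he wants to build a fairer economic model for the internet.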
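Guo Rong
But on the other side, the AI giants can't sit still. I recall a Meta executive putting it bluntly: if companies had to get permission before using data, the AI industry might as well shut down. In their view, massive data is the fuel of technological progress, and without fuel, how does the rocket get off the ground?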
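Ji Fei
This is a high-stakes game. AI companies have been working behind the scenes, lobbying the government to classify data scraping as "fair use." There's even talk that right after the U.S. Copyright Office released a report unfavorable to them, its head was replaced. These are deep waters.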
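Guo Rong
So you've got one side shouting "Pay us for our work!" and the other arguing "We need data to drive the future!" Cloudflare's move is like tossing a scalding-hot stone into a pot that was already about to boil. Quite a spectacle.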
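Ji Fei
So what's the most immediate impact? I'd say it's a shift in the balance of power. AI companies used to hold the initiative; now a fifth of the world's web pages may have shut the free door on them overnight. Taking data whenever they please, the way they used to, probably won't be so easy.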
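Guo Rong
Mm-hm. For news sites whose traffic has plunged because of AI, this could be wonderful news. Users have gotten used to reading the answer straight from AI, so who clicks through to the original article? This policy might steer some of that traffic and revenue back into creators' pockets.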
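Ji Fei
Looking ahead, the "Pay Per Crawl" system is the key. Interestingly, it relies on a very old piece of the web standard, HTTP 402, which means "Payment Required." That effectively puts a price tag on data and creates a brand-new potential market.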
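Guo Rong
Yes, so the biggest open question now is whether the other big players, cloud providers such as Akamai, will follow suit. If everyone does, the AI industry's "free lunch" era may truly be over. What do you think?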
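Ji Fei
All in all, Cloudflare's move has undoubtedly drawn a new battle line through the already tense relationship between content creators and the AI giants. How the internet's data ecosystem evolves from here deserves our continued attention. That's all for today's discussion.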
Guo Rong
Thanks for listening to <Goose Pod>. See you tomorrow!

### **Summary of ZDNET Report: Cloudflare's Policy Shift Against AI Crawlers**

**News Metadata**

* **Title:** Cloudflare just changed the internet, and it's bad news for the AI giants
* **Provider:** ZDNET
* **Author:** Steven Vaughan-Nichols
* **Publication Date:** July 2, 2025

---

### **Executive Summary**

Effective July 1, the major Content Delivery Network (CDN) Cloudflare has implemented a new default policy to block AI web crawlers from accessing content on its customers' websites. This significant move, which reverses the previous opt-out standard, now requires website owners to explicitly grant permission (opt-in) for AI bots to scrape their data. The policy is a direct response to the aggressive behavior of AI crawlers that overload websites and the widespread, uncompensated use of web content for training AI models. Affecting approximately **20% of the entire web**, this change could fundamentally alter the relationship between content creators and AI companies, potentially forcing the latter to negotiate and pay for data access.

---

### **Key Findings and Policy Changes**

#### **1. New Default Policy: Block by Default**

* **Effective Date:** Starting July 1.
* **Core Change:** For all new websites on its platform, Cloudflare now **blocks AI crawlers by default**. The previous standard required website owners to manually opt out of being crawled.
* **Scope:** The policy impacts Cloudflare's **two million-plus customers**, which collectively represent **20% of the web**.
* **Enhanced Detection:** Cloudflare will also use behavioral analysis and machine learning to identify and block "shadow" scrapers that try to hide their identity.

#### **2. Rationale for the Change**

* **Technical Overload:** Website owners have reported that AI crawlers (e.g., OpenAI's GPTBot, Anthropic's ClaudeBot) are far more aggressive than traditional search bots.
    * They generate massive request volumes, sometimes hitting sites with **hundreds of requests per second**, causing significant slowdowns.
    * As an example of high traffic, GoogleBot alone sends over **4.5 billion requests a month** to sites hosted on Vercel.
* **Copyright and Compensation:** Publishers and creators are frustrated that AI companies are "strip mining" the web for content to train models without consent or compensation, often ignoring protocols like `robots.txt`.
* **Legal Context:** This move comes amid legal battles where courts have sometimes ruled in favor of AI firms (Meta, Anthropic) under the "fair use" doctrine. ZDNET's parent company, Ziff Davis, filed a lawsuit against OpenAI in **April 2025** over alleged copyright infringement.
* **Decline in Publisher Traffic:** The rise of AI-powered search and content generation has led to a sharp decline in traffic to original news sources.
    * **Statistic:** Business Insider's traffic dropped by **55%** between April 2022 and April 2025.
    * **Prediction:** Nicholas Thompson, CEO of The Atlantic, predicted that his staff should "expect traffic from Google to drop to zero" due to AI.

#### **3. Proposed Economic Model: "Pay Per Crawl"**

* Cloudflare has launched a program in private beta called **"Pay Per Crawl."**
* This system allows publishers to set their own prices for AI companies that wish to scrape their content.
* Technically, it will use the **HTTP 402 "Payment Required"** server response, an older but simple-to-implement standard, to manage these paid access requests.

---

### **Industry Reactions and Notable Statements**

* **Matthew Prince, Cloudflare CEO:** The policy aims to *"give publishers the control they deserve and build a new economic model that works for everyone: creators, consumers, tomorrow's AI founders, and the future of the web itself."*
* **Nicholas Thompson, The Atlantic CEO:** *"Until now, AI companies have not needed to pay for content licenses because they could simply take it without repercussions. Now they will need to negotiate."*
* **Sir Nick Clegg, Meta Executive:** In contrast, the Meta executive and former UK Deputy Prime Minister stated that asking for permission before scraping copyrighted content *"will 'basically kill the AI industry.'"*

---

### **Risks, Concerns, and Future Outlook**

* **Shift in Power Dynamics:** The primary impact is a shift of power from AI companies to content publishers. AI firms may no longer be able to freely take data and will be forced to negotiate licenses or pay for access to a significant portion of the internet.
* **Regulatory Uncertainty:** The move occurs amidst a contentious debate over AI and copyright. The U.S. Copyright Office's recent report suggested mass scraping does not qualify as fair use, but its head was subsequently fired by the Trump administration and replaced with an attorney with no copyright experience.
* **Industry-Wide Implications:** The key question is whether other major CDNs, such as Akamai, will adopt similar policies. For now, the era of unrestricted, free data scraping for AI training has ended for the one-fifth of the internet managed by Cloudflare.

Cloudflare just changed the internet, and it’s bad news for the AI giants

Read original at ZDNET

The major internet Content Delivery Network (CDN), Cloudflare, has declared war on AI companies. Starting July 1, Cloudflare now blocks by default AI web crawlers accessing content from your websites without permission or compensation.

The change addresses a real problem. My own small site, where I track all my stories, Practical Technology, has been slowed dramatically at times by AI crawlers.

It's not just me. Numerous website owners have reported that AI crawlers, such as OpenAI's GPTBot and Anthropic's ClaudeBot, generate massive volumes of automated requests that clog up websites so they're as slow as sludge. The cloud-hosting service Vercel reports that GoogleBot alone bombards the sites it hosts with over 4.5 billion requests a month. These AI bots often crawl sites far more aggressively than traditional search engine crawlers. They sometimes revisit the same pages every few hours or even hit sites with hundreds of requests per second. While the AI companies deny that their bots are to blame, the evidence tells a different story.

Thus, on behalf of its two million-plus customers, 20% of the web, Cloudflare now blocks AI crawlers. For any new website signing up for its services, AI crawlers will be automatically blocked from accessing its content unless the site owner grants explicit permission.

Additionally, Cloudflare promises to detect "shadow" scrapers — bots that attempt to evade detection — by using behavioral analysis and machine learning. What's good for the AI goose is good for the gander. This move reverses the previous status quo, where website owners had to opt out of AI crawling.

Now, blocking is the default, and AI vendors must request access and clarify their intentions, whether for model training, search, or other uses, before they're allowed in. This change arises not only because of frustrated website owners. Numerous publishing companies, such as The Associated Press, Condé Nast, and ZDNET's own parent company, Ziff Davis, are frustrated that AI companies have been "strip mining" the web for content.

All too often, this has been done without compensation or consent, and sometimes by ignoring standard protocols like robots.txt that are meant to block crawlers. (Disclosure: Ziff Davis, ZDNET's parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)

Moreover, recent court cases have ruled in favor of Meta and Anthropic, finding that their use of copyrighted works was legal under the doctrine of fair use. Needless to say, writers, artists, and publishers don't like this one bit. Publishers are still worried that the federal government will give AI free rein to do as it wants with their content.
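For readers who haven't worked with robots.txt, the protocol mentioned above, here is a minimal sketch of how a well-behaved crawler is supposed to consult it, using Python's standard `urllib.robotparser`. The rules and the `ExampleSearchBot` name are illustrative assumptions, not any real site's configuration. The check is purely voluntary: a crawler that never runs it is not stopped by the file.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt (an assumption, not a real site's file):
# two named AI crawlers are disallowed everywhere, everyone else is allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules above permit this agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"
    for agent in ("GPTBot", "ClaudeBot", "ExampleSearchBot"):
        print(agent, may_fetch(agent, url))
```

Because compliance is left to the crawler, a file like this only helps against bots that choose to honor it.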

AI powerhouses such as OpenAI and Google are continuing to lobby the government to classify AI training on copyrighted data as fair use. It's also worth noting what happened after the Copyright Office released a pre-publication version of its 108-page copyright and AI report. The report struck a middle ground by supporting both of these world-class industries that contribute so much to our economic and cultural advancement.

However, it added that while some generative AI probably constitutes a "transformative" use, the mass scraping of all data did not qualify as fair use. The next day, the Trump administration fired the head of the Copyright Office and replaced her with an attorney with no prior experience in copyright law.

Given all this, it's no wonder that publishers sought an ally in technology. As Cloudflare CEO Matthew Prince said in a statement, its new policy is meant to "give publishers the control they deserve and build a new economic model that works for everyone: creators, consumers, tomorrow's AI founders, and the future of the web itself."

To complement the move to block AI crawlers, Cloudflare has also launched its "Pay Per Crawl" program. This enables publishers to set their own rates for AI companies that want to scrape their content. This system is currently in private beta and aims to create a framework where AI firms can pay for access, or be denied if they refuse.

Technically, this will be done by dusting off an old, mostly unused web server response, HTTP 402, which responds with a "Payment Required" error message. This means it should be simple to implement and compatible with existing websites and their infrastructure. Overall, this is a big deal. Thanks to Cloudflare powering such a large portion of the internet, a significant amount of web content could become inaccessible to AI companies unless they negotiate access or pay licensing fees.
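To make the HTTP 402 idea concrete, here is a rough sketch of the decision a server could make for each request. It is not Cloudflare's implementation, whose details aren't public in this article: the header names `X-Crawl-Payment-Token` and `X-Crawl-Price-USD`, the price, and the token check are all hypothetical placeholders.

```python
from http import HTTPStatus

# Everything below is a hypothetical illustration, not Cloudflare's actual scheme.
AI_CRAWLER_NAMES = ("GPTBot", "ClaudeBot")  # crawler names taken from the article
PAYMENT_HEADER = "X-Crawl-Payment-Token"    # hypothetical request header
PRICE_HEADER = "X-Crawl-Price-USD"          # hypothetical response header
PRICE_PER_CRAWL_USD = "0.01"                # hypothetical publisher-set price
VALID_TOKENS = {"demo-token"}               # stand-in for a real payment check


def is_ai_crawler(user_agent: str) -> bool:
    """Naive User-Agent match; evasive 'shadow' scrapers would slip past this."""
    ua = user_agent.lower()
    return any(name.lower() in ua for name in AI_CRAWLER_NAMES)


def respond(user_agent: str, headers: dict[str, str]) -> tuple[int, dict[str, str]]:
    """Return (status code, extra response headers) for one request."""
    if not is_ai_crawler(user_agent):
        return HTTPStatus.OK.value, {}
    if headers.get(PAYMENT_HEADER) in VALID_TOKENS:
        return HTTPStatus.OK.value, {}
    # Unpaid AI crawler: answer 402 Payment Required and quote the price.
    return HTTPStatus.PAYMENT_REQUIRED.value, {PRICE_HEADER: PRICE_PER_CRAWL_USD}


if __name__ == "__main__":
    print(respond("Mozilla/5.0", {}))
    print(respond("GPTBot/1.0", {}))
    print(respond("GPTBot/1.0", {PAYMENT_HEADER: "demo-token"}))
```

The point of the sketch is only that 402 is an ordinary status code: a server can return it like any other response, which is why the approach should fit existing infrastructure.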

As Nicholas Thompson, CEO of The Atlantic, noted, "Until now, AI companies have not needed to pay for content licenses because they could simply take it without repercussions. Now they will need to negotiate." To this point, most AI companies have been actively against paying for content. As Sir Nick Clegg, former deputy UK Prime Minister and Meta executive, said recently, merely asking artists' permission before they scrape copyrighted content will "basically kill the AI industry."

Cloudflare's new policy is a direct response to this approach and the increasing volume and intrusiveness of AI crawlers that have come with it. It's also an attempt to stop the siphoning of traffic that would otherwise go to publishers.

Since the rise of AI, traffic to news sites has plunged. For example, Business Insider's traffic dropped by over half, 55%, from April 2022 to April 2025. Thompson recently predicted that, if this is left unchecked, the Atlantic staff should expect traffic from Google to drop to zero thanks to AI.

What will happen next?

Will other CDNs, such as Akamai, follow suit? Stay tuned. For now, the era of unrestricted AI crawling appears to be ending, well, at least for the fifth of the internet that flows through Cloudflare's pipes.
