Wikipedia Is Getting Pretty Worried About AI

2025-10-22 · Technology
Mr. Lei
Hello, Xiao Wang! I'm Mr. Lei. Today is Wednesday, October 22, and it's 11:56 p.m.
Ms. Dong
And I'm Ms. Dong. Welcome to your personal Goose Pod. Today we're talking about Wikipedia's deep worries over AI.
Mr. Lei
Ms. Dong, Wikipedia really is "getting pretty worried": they found that human pageviews have dropped by 8%!
Ms. Dong
That's right, Mr. Lei. It's mainly due to AI search summaries and social media. Worse still, they discovered huge numbers of bots disguised as humans, quietly scraping their content.
Mr. Lei
Exactly. AI companies use these bots to scrape data for training their models, yet they send no users back. Wikipedia's Marshall Miller is very concerned about this.
Ms. Dong
This "take without giving back" behavior will shrink Wikipedia's pool of volunteers and donors. It reminds me of DeepMind's robotics AI: technically powerful, but it brings new ethical challenges.
Mr. Lei
Indeed. AI's progress is astonishing, but open-knowledge platforms like Wikipedia are facing the predicament of being bled dry.
Ms. Dong
If the problem of sending traffic back isn't solved, Wikipedia's ecosystem could genuinely suffer.
Mr. Lei
Ms. Dong, Wikipedia's anxiety actually stems from its complicated history with the search giants.
Ms. Dong
Right, Mr. Lei. From the early days of search in the 1990s to Google's algorithm-driven rise starting in 1998, when it came to hold nearly 90% of the global market, Wikipedia's content has been a vital "source of knowledge."
Mr. Lei
Exactly. Google's knowledge panels, Siri, Alexa, and the rest all rely heavily on Wikipedia's data. It has practically become Big Tech's knowledge "infrastructure."
Ms. Dong
That's why the Wikimedia Foundation launched the "Wikimedia Enterprise" program: to charge these big companies for content use and build more formal partnerships.
Mr. Lei
But it sparked controversy within the community. Volunteers worry it betrays the open ethos, and the Foundation's financial transparency has repeatedly been questioned.
Ms. Dong
Indeed. Wikipedia offers a vast body of knowledge; the hard problem now is balancing public value against commercial use while preserving its community spirit.
Mr. Lei
Ms. Dong, Wikipedia's worries are really a microcosm of the broader conflict between content owners and AI companies. The New York Times has sued OpenAI and Microsoft for copyright infringement.
Ms. Dong
Yes, Mr. Lei. They allege that AI companies used vast amounts of news content without authorization to train their models, a direct challenge to the bottom line of copyright.
Mr. Lei
And it doesn't stop there. The tabletop-gaming industry is also feeling "AI despair," fearing original works will be knocked off, and film studios are demanding that AI companies pay for licenses.
Ms. Dong
But tech companies argue that copyright exceptions for AI would foster innovation. Governments around the world are now trying to strike a legislative balance.
Mr. Lei
Exactly. This contest over the right to use content for AI will reshape how we understand copyright.
Mr. Lei
Ms. Dong, how big is AI's impact on Wikipedia, really?
Ms. Dong
Mr. Lei, Wikipedia's finances are sound, sustained entirely by users who love it. But LLMs train on its data without giving anything back, which threatens its sustainability.
Mr. Lei
Exactly, and that raises ethical questions. Volunteers ask why their unpaid contributions are being "harvested by tech companies worth billions," and their motivation suffers.
Ms. Dong
This "take without giving back" could also damage Wikipedia's standing.
Mr. Lei
Which is why Big Tech needs greater transparency and accountability.
Mr. Lei
Ms. Dong, how will Wikipedia respond to AI going forward?
Ms. Dong
Mr. Lei, they have a three-year AI strategy centered on AI-assisted editing to improve efficiency.
Mr. Lei
For instance, helping with review and translation so volunteers can focus on content, with an emphasis on open source.
Ms. Dong
Right. And by working with AI search, Wikipedia can still play its part.
Mr. Lei
That wraps up today's discussion.
Ms. Dong
Wikipedia's challenge is a microcosm of intellectual property in the AI era.
Mr. Lei
Thanks, Xiao Wang, for listening to Goose Pod.
Ms. Dong
See you next time!

### **News Summary: Wikipedia's Concerns Over AI Impact**

**Metadata:**

* **News Title**: Wikipedia Is Getting Pretty Worried About AI
* **Report Provider/Author**: John Herrman, New York Magazine (nymag.com)
* **Date/Time Period Covered**: The article discusses observations and data from **May 2025** through the "past few months" leading up to its publication on **October 18, 2025**, with comparisons to **2024**.
* **News Identifiers**: Topic: Artificial Intelligence, Technology.

**Main Findings and Conclusions:**

Wikipedia has identified that a recent surge in website traffic, initially appearing to be human, was largely composed of sophisticated bots. These bots, often working for AI firms, are scraping Wikipedia's content for training and summarization. This bot activity has masked a concurrent decline in actual human engagement with the platform, raising concerns about its sustainability and the future of online information access.

**Key Statistics and Metrics:**

* **Observation Start**: Around **May 2025**, unusually high amounts of *apparently human* traffic were first observed on Wikipedia.
* **Data Reclassification Period**: Following an investigation and updates to bot detection systems, Wikipedia reclassified its traffic data for **March–August 2025**.
* **Bot-Driven Traffic**: The reclassification revealed that much of the high traffic during **May and June 2025** was generated by bots designed to evade detection.
* **Human Pageview Decline**: After accounting for bot traffic, Wikipedia is now seeing declines in human pageviews, a decrease of roughly **8%** compared to the same months in **2024**.

**Analysis of the Problem and Significant Trends:**

* **AI Scraping for Training**: Bots are actively scraping Wikipedia's extensive and well-curated content to train Large Language Models (LLMs) and other AI systems.
* **User Diversion by AI Summaries**: The rise of AI-powered search engines (like Google's AI Overviews) and chatbots provides direct summaries of information, often eliminating the need for users to click through to the original source. This shifts Wikipedia's role from a primary destination to a background data source.
* **Competitive Content Generation**: AI platforms are consuming Wikipedia's data and repackaging it into new products that compete directly with it, potentially making the original source obsolete or burying it under AI-generated output.
* **Evolving Web Ecosystem**: Wikipedia, founded as a stand-alone reference, has become a critical dataset for the AI era. However, AI platforms are now effectively keeping users away from Wikipedia even as they explicitly use and reference its materials.

**Notable Risks and Concerns:**

* **"Death Spiral" Threat**: A primary concern is that a sustained decrease in real human visits could lead to fewer contributors and donors, potentially sending Wikipedia, described as "one of the great experiments of the web," into a "death spiral."
* **Impact on Contributors and Donors**: Reduced human traffic directly threatens the volunteer base and financial support essential for Wikipedia's operation and maintenance.
* **Source Reliability Questions**: The article raises a philosophical point about AI chatbots' reliability if Wikipedia itself is considered a tertiary source that synthesizes information.

**Important Recommendations:**

Marshall Miller, speaking for the Wikipedia community, stated: "We welcome new ways for people to gain knowledge. However, LLMs, AI chatbots, search engines, and social platforms that use Wikipedia content must encourage more visitors to Wikipedia." This is a call for AI developers and platforms to direct traffic back to the original sources they utilize.

**Interpretation of Numerical Data and Context:**

The numerical data points to a critical shift in how Wikipedia's content is accessed and utilized. The high traffic observed in **May 2025** was the initial indicator of an anomaly. The subsequent reclassification of data for **March–August 2025** provided concrete evidence that bots, not humans, were responsible for the surge, particularly in **May and June 2025**. The **8% decrease** in human pageviews, measured against **2024** figures, quantifies the real-world impact: fewer people are visiting Wikipedia directly, a trend exacerbated by AI's ability to summarize and present information without sending users to the source. This trend poses a significant risk to Wikipedia's operational model, which relies on human engagement and support.

Wikipedia Is Getting Pretty Worried About AI

Read original at New York Magazine

The free encyclopedia took a look at the numbers and they aren't adding up. By John Herrman, a tech columnist at Intelligencer; formerly, he was a reporter and critic at the New York Times and co-editor of The Awl. Over at the official blog of the Wikipedia community, Marshall Miller untangled a recent mystery.

“Around May 2025, we began observing unusually high amounts of apparently human traffic,” he wrote. Higher traffic would generally be good news for a volunteer-sourced platform that aspires to reach as many people as possible, but it would also be surprising: The rise of chatbots and the AI-ification of Google Search have left many big websites with fewer visitors.

Maybe Wikipedia, like Reddit, is an exception? Nope! It was just bots: This [rise] led us to investigate and update our bot detection systems. We then used the new logic to reclassify our traffic data for March–August 2025, and found that much of the unusually high traffic for the period of May and June was coming from bots that were built to evade detection … after making this revision, we are seeing declines in human pageviews on Wikipedia over the past few months, amounting to a decrease of roughly 8% as compared to the same months in 2024.

To be clearer about what this means, these bots aren’t just vaguely inauthentic users or some incidental side effect of the general spamminess of the internet. In many cases, they’re bots working on behalf of AI firms, going undercover as humans to scrape Wikipedia for training or summarization. Miller got right to the point.

“We welcome new ways for people to gain knowledge,” he wrote. “However, LLMs, AI chatbots, search engines, and social platforms that use Wikipedia content must encourage more visitors to Wikipedia.” Fewer real visits means fewer contributors and donors, and it’s easy to see how such a situation could send one of the great experiments of the web into a death spiral.

Arguments like this are intuitive and easy to make, and you’ll hear them beyond the ecosystem of the web: AI models ingest a lot of material, often without clear permission, and then offer it back to consumers in a form that’s often directly competitive with the people or companies that provided it in the first place.

Wikipedia’s authority here is bolstered by how it isn’t trying to make money — it’s run by a foundation, not an established commercial entity that feels threatened by a new one — but also by its unique position. It was founded as a stand-alone reference resource before settling ambivalently into a new role: A site that people mostly just found through Google but in greater numbers than ever.

With the rise of LLMs, Wikipedia became important in a new way as a uniquely large, diverse, well-curated data set about the world; in return, AI platforms are now effectively keeping users away from Wikipedia even as they explicitly use and reference its materials. Here’s an example: Let’s say you’re reading this article and become curious about Wikipedia itself — its early history, the wildly divergent opinions of its original founders, its funding, etc.

Unless you’ve been paying attention to this stuff for decades, it may feel as if it’s always been there. Surely, there’s more to it than that, right? So you ask Google, perhaps as a shortcut for getting to a Wikipedia page, and Google uses AI to generate a blurb that looks like this: This is an AI Overview that summarizes, among other things, Wikipedia.

Formally, it’s pretty close to an encyclopedia article. With a few formatting differences — notice the bullet-point AI-ese — it hits a lot of the same points as Wikipedia’s article about itself. It’s a bit shorter than the top section of the official article and contains far fewer details. It’s fine!

But it's a summary of a summary. The next option you encounter still isn't Wikipedia's article — that shows up further down. It's a prompt to "Dive deeper in AI Mode." If you do that, you see this: It's another summary, this time with a bit of commentary. (Also: If Wikipedia is "generally not considered a reliable source itself because it is a tertiary source that synthesizes information from other places," then what does that make a chatbot?) There are links in the form of footnotes, but as Miller's post suggests, people aren't really clicking them. Google's treatment of Wikipedia's autobiography is about as pure an example as you'll see of AI companies' effective relationship to the web (and maybe much of the world) around them as they build strange, complicated, but often compelling products and deploy them to hundreds of millions of people.

To these companies, it's a resource to be consumed, processed, and then turned into a product that attempts to render everything before it obsolete — or at least to bury it under a heaping pile of its own output.
