Wikipedia Is Getting Pretty Worried About AI

2025-10-22 Technology
Mask
Good morning 王康, I'm Mask, and this is Goose Pod for you. Today is Wednesday, October 22nd. I'm joined by Taylor Weaver, and we are here to discuss a truly intriguing topic: Wikipedia Is Getting Pretty Worried About AI.
Taylor Weaver
It's a fascinating subject, Mask! Wikipedia, the bastion of open knowledge, finding itself on the defensive. It really makes you wonder about the shifting landscape of information in the age of artificial intelligence, doesn't it?
Mask
Absolutely, Taylor, and the numbers are stark. Wikipedia is seeing a significant decline in human pageviews, an 8% year-over-year decrease. This isn't just a minor fluctuation; it's a direct hit to one of the web's most critical resources, all thanks to AI search summaries and social video. It’s almost like a digital parasite, isn’t it?
Taylor Weaver
That's such a vivid way to put it, Mask! And it's true, Marshall Miller from the Wikimedia Foundation pointed out that after updating their bot detection, they found a huge chunk of their 'unusually high traffic' in May and June was from bots built specifically to evade detection. So, it wasn't just organic growth; it was machines pretending to be people!
Mask
Bots built to evade detection, Taylor, that's the key. It's not just incidental spam; these are bots working for AI firms, going undercover to scrape Wikipedia for training data. This effectively keeps real users away, even as they explicitly use Wikipedia's content. It's a classic case of taking without giving back, threatening Wikipedia's very existence.
Taylor Weaver
And the implications are huge. As Miller put it, with fewer visits, there's less incentive for volunteers to enrich content, and fewer individual donors to support the work. This could send one of the internet's greatest experiments into a death spiral. It’s a real concern for the future of free, human-curated knowledge.
Mask
It's a classic disruptive cycle, really. Remember the early internet? We went from manually indexing the web—imagine that, Taylor, literally sifting through pages by hand—to the emergence of search engines. Archie, W3Catalog, WebCrawler, then boom, Google changes everything. It's a testament to necessity breeding invention, but also to how quickly things can become obsolete.
Taylor Weaver
Oh, absolutely! It’s like watching a time-lapse of innovation. From Vannevar Bush's 'memex' concept in 1945 to Archie searching FTP files, it was all about organizing an explosion of information. And then Yahoo! came along, and Google, with its PageRank algorithm, just solidified its dominance. Wikipedia, in a way, became the perfect partner, a vast, well-curated dataset.
Mask
And Big Tech knows it. Google, Apple, Amazon—they all rely on Wikipedia content for everything from knowledge panels in search results to powering Siri and Alexa. It’s the foundational layer for so much of our digital information consumption. They feast on it, yet the source is struggling.
Taylor Weaver
That's where the Wikimedia Enterprise initiative comes in, right? It's Wikipedia's attempt to establish a more formal, almost commercial, relationship with these tech giants, trying to charge for easier electronic access. It’s a pragmatic move, but it has definitely stirred up some controversy within the volunteer community, who see it as straying from their core mission.
Mask
Controversy is an understatement. Many Wikipedians are unhappy, feeling their free contributions are being leveraged by billion-dollar companies without proper reciprocation. It highlights a fundamental tension: the open-source ethos versus the commercial realities of the digital age, especially when donations from Google are flowing in through less-than-transparent channels.
Mask
And this isn't just a Wikipedia problem; it's a wider battle brewing. Look at The New York Times suing OpenAI and Microsoft for copyright infringement. They're alleging that 'massive amounts' of their news stories were used to train ChatGPT. It's the ultimate example of AI companies taking content without explicit permission, then offering it back in a competitive form.
Taylor Weaver
It's a huge legal showdown, Mask. And it echoes concerns across industries. The tabletop game design world is talking about 'AI despair' over intellectual property theft and 'copycat products' showing up online. It really chills creativity when you know your work can just be ingested and regurgitated without credit or compensation.
Mask
Indeed. The audacity of some AI firms is breathtaking. They argue that exceptions for AI will benefit the industry, but content owners, like film studios, are demanding licensing. It's a clash of titans, with governments scrambling to catch up. The EU is moving towards stricter rules, allowing content owners to opt-out, while Japan offers broad exemptions. It's a fragmented global response.
Taylor Weaver
It truly is, and the movie industry is particularly worried about AI scraping copyrighted videos and even pirated content for training. There's a growing consensus that we need guardrails for generative AI, as Himanshu Yadav said, to find a way for everyone to coexist in this new era. The current situation is simply unsustainable for creators.
Mask
The core impact, Taylor, is that Wikipedia's financial sustainability is now directly threatened by these Large Language Models. It's this incredible, vast dataset that AI applications feed on, yet the very act of feeding is undermining the source. It's an ethical tightrope walk, and Wikipedia's unique model is particularly vulnerable.
Taylor Weaver
It's a paradox, isn't it? Wikipedia is financially successful, they're conservative with money, and they famously reject advertising. Their model works because people love it. But now, as one editor put it, 'Our contributions are being harvested by tech companies worth billions, yet we continue working for free.' It makes it hard to justify the immense time investment.
Mask
Precisely. The volunteers, the lifeblood of Wikipedia, are questioning their roles. This erosion of motivation, combined with the potential for AI-generated summaries to manipulate or degrade Wikipedia's reputation, could lead to a loss of nuanced human judgment in content creation. It's a dangerous path for the integrity of information.
Taylor Weaver
And it highlights the need for greater transparency from big tech. If Wikipedia is to sustain itself, especially as a trusted source, it needs to balance embracing technological advancement with safeguarding its editorial standards. Matthew Vetter's work on 'Sustaining Wikimedia' really emphasizes these challenges.
Mask
So, what's the play here? Wikipedia's new three-year strategy aims to integrate AI to 'assist human editors, not replace them.' But frankly, Taylor, 'assist' can quickly morph into 'automate' in the pursuit of efficiency. How do they truly ensure content integrity remains paramount when AI is streamlining everything from moderation to translation?
Taylor Weaver
That's the critical question, Mask. The strategy focuses on removing technical barriers for editors and moderators, using open-source models, and prioritizing content integrity over pure generation. They aim to enhance tools and support underrepresented languages, guiding new editors with AI. It's about empowering humans, not replacing them, by making tasks easier.
Mask
But with AI, 'easier' can sometimes mean 'less human oversight.' The real test will be if they can adapt, forming partnerships with AI search engines rather than competing. If Wikipedia can continue to be the core source for AI-generated answers, while still maintaining its crowdsourced accuracy, then perhaps it can navigate this crossroads. It's a big 'if'.
Mask
That's the end of today's discussion. Thank you, 王康, for listening to Goose Pod. We appreciate your time and engagement.
Taylor Weaver
Indeed, and it's a conversation that will undoubtedly continue to evolve. Until next time, stay curious and keep questioning the sources of your knowledge. See you tomorrow on Goose Pod!

### **News Summary: Wikipedia's Concerns Over AI Impact**

**Metadata:**

* **News Title**: Wikipedia Is Getting Pretty Worried About AI
* **Report Provider/Author**: John Herrman, New York Magazine (nymag.com)
* **Date/Time Period Covered**: The article discusses observations and data from **May 2025** through the "past few months" leading up to its publication on **October 18, 2025**, with comparisons to **2024**.
* **News Identifiers**: Topic: Artificial Intelligence, Technology.

**Main Findings and Conclusions:**

Wikipedia has identified that a recent surge in website traffic, initially appearing to be human, was largely composed of sophisticated bots. These bots, often working for AI firms, are scraping Wikipedia's content for training and summarization. This bot activity has masked a concurrent decline in actual human engagement with the platform, raising concerns about its sustainability and the future of online information access.

**Key Statistics and Metrics:**

* **Observation Start**: Around **May 2025**, unusually high amounts of *apparently human* traffic were first observed on Wikipedia.
* **Data Reclassification Period**: Following an investigation and updates to bot detection systems, Wikipedia reclassified its traffic data for the period of **March–August 2025**.
* **Bot-Driven Traffic**: The reclassification revealed that much of the high traffic during **May and June 2025** was generated by bots designed to evade detection.
* **Human Pageview Decline**: After accounting for bot traffic, Wikipedia is now seeing declines in human pageviews, a decrease of roughly **8%** compared to the same months in **2024**.

**Analysis of the Problem and Significant Trends:**

* **AI Scraping for Training**: Bots are actively scraping Wikipedia's extensive and well-curated content to train Large Language Models (LLMs) and other AI systems.
* **User Diversion by AI Summaries**: The rise of AI-powered search engines (like Google's AI Overviews) and chatbots provides direct summaries of information, often eliminating the need for users to click through to the original source like Wikipedia. This shifts Wikipedia's role from a primary destination to a background data source.
* **Competitive Content Generation**: AI platforms are consuming Wikipedia's data and repackaging it into new products that can be directly competitive, potentially making the original source obsolete or burying it under AI-generated output.
* **Evolving Web Ecosystem**: Wikipedia, founded as a stand-alone reference, has become a critical dataset for the AI era. However, AI platforms are now effectively keeping users away from Wikipedia even as they explicitly use and reference its materials.

**Notable Risks and Concerns:**

* **"Death Spiral" Threat**: A primary concern is that a sustained decrease in real human visits could lead to fewer contributors and donors, potentially sending Wikipedia, described as "one of the great experiments of the web," into a "death spiral."
* **Impact on Contributors and Donors**: Reduced human traffic directly threatens the volunteer base and financial support essential for Wikipedia's operation and maintenance.
* **Source Reliability Questions**: The article raises a philosophical point about AI chatbots' reliability if Wikipedia itself is considered a tertiary source that synthesizes information.

**Important Recommendations:**

* Marshall Miller, speaking for the Wikipedia community, stated: "We welcome new ways for people to gain knowledge. However, LLMs, AI chatbots, search engines, and social platforms that use Wikipedia content must encourage more visitors to Wikipedia." This highlights a call for AI developers and platforms to direct traffic back to the original sources they utilize.

**Interpretation of Numerical Data and Context:**

The numerical data points to a critical shift in how Wikipedia's content is accessed and utilized. The observation of high traffic in **May 2025** was an initial indicator of an anomaly. The subsequent reclassification of data for **March–August 2025** provided concrete evidence that bots, not humans, were responsible for the surge, particularly in **May and June 2025**. The **8% decrease** in human pageviews, measured against **2024** figures, quantifies the real-world impact: fewer people are visiting Wikipedia directly, a trend exacerbated by AI's ability to summarize and present information without sending users to the source. This trend poses a significant risk to Wikipedia's operational model, which relies on human engagement and support.

Wikipedia Is Getting Pretty Worried About AI

Read original at New York Magazine

The free encyclopedia took a look at the numbers and they aren't adding up. By John Herrman, a tech columnist at Intelligencer; formerly, he was a reporter and critic at the New York Times and co-editor of The Awl. Photo: Wikimedia

Over at the official blog of the Wikipedia community, Marshall Miller untangled a recent mystery.

“Around May 2025, we began observing unusually high amounts of apparently human traffic,” he wrote. Higher traffic would generally be good news for a volunteer-sourced platform that aspires to reach as many people as possible, but it would also be surprising: The rise of chatbots and the AI-ification of Google Search have left many big websites with fewer visitors.

Maybe Wikipedia, like Reddit, is an exception? Nope! It was just bots: "This [rise] led us to investigate and update our bot detection systems. We then used the new logic to reclassify our traffic data for March–August 2025, and found that much of the unusually high traffic for the period of May and June was coming from bots that were built to evade detection … after making this revision, we are seeing declines in human pageviews on Wikipedia over the past few months, amounting to a decrease of roughly 8% as compared to the same months in 2024."

To be clearer about what this means, these bots aren’t just vaguely inauthentic users or some incidental side effect of the general spamminess of the internet. In many cases, they’re bots working on behalf of AI firms, going undercover as humans to scrape Wikipedia for training or summarization. Miller got right to the point.

“We welcome new ways for people to gain knowledge,” he wrote. “However, LLMs, AI chatbots, search engines, and social platforms that use Wikipedia content must encourage more visitors to Wikipedia.” Fewer real visits means fewer contributors and donors, and it’s easy to see how such a situation could send one of the great experiments of the web into a death spiral.

Arguments like this are intuitive and easy to make, and you’ll hear them beyond the ecosystem of the web: AI models ingest a lot of material, often without clear permission, and then offer it back to consumers in a form that’s often directly competitive with the people or companies that provided it in the first place.

Wikipedia’s authority here is bolstered by how it isn’t trying to make money — it’s run by a foundation, not an established commercial entity that feels threatened by a new one — but also by its unique position. It was founded as a stand-alone reference resource before settling ambivalently into a new role: A site that people mostly just found through Google but in greater numbers than ever.

With the rise of LLMs, Wikipedia became important in a new way as a uniquely large, diverse, well-curated data set about the world; in return, AI platforms are now effectively keeping users away from Wikipedia even as they explicitly use and reference its materials. Here’s an example: Let’s say you’re reading this article and become curious about Wikipedia itself — its early history, the wildly divergent opinions of its original founders, its funding, etc.

Unless you’ve been paying attention to this stuff for decades, it may feel as if it’s always been there. Surely, there’s more to it than that, right? So you ask Google, perhaps as a shortcut for getting to a Wikipedia page, and Google uses AI to generate a blurb: an AI Overview that summarizes, among other things, Wikipedia.

Formally, it’s pretty close to an encyclopedia article. With a few formatting differences — notice the bullet-point AI-ese — it hits a lot of the same points as Wikipedia’s article about itself. It’s a bit shorter than the top section of the official article and contains far fewer details. It’s fine!

But it’s a summary of a summary. The next option you encounter still isn’t Wikipedia’s article — that shows up further down. It’s a prompt to “Dive deeper in AI Mode.” If you do that, you see another summary, this time with a bit of commentary. (Also: If Wikipedia is “generally not considered a reliable source itself because it is a tertiary source that synthesizes information from other places,” then what does that make a chatbot?)

There are links in the form of footnotes, but as Miller’s post suggests, people aren’t really clicking them. Google’s treatment of Wikipedia’s autobiography is about as pure an example as you’ll see of AI companies’ effective relationship to the web (and maybe much of the world) around them as they build strange, complicated, but often compelling products and deploy them to hundreds of millions of people.

To these companies, it’s a resource to be consumed, processed, and then turned into a product that attempts to render everything before it obsolete — or at least to bury it under a heaping pile of its own output.
