LLMs’ impact on science: Booming publications, stagnating quality

2025-12-26 · Technology
Elon
Good evening, Norris. I'm Elon, and this is Goose Pod for you. Today is Friday, December 26th, at 23:15. We are diving into a very important topic today: the integrity of scientific research in the age of artificial intelligence.
Taylor Weaver
And I'm Taylor Weaver! We are here to discuss the impact of LLMs on science: a story of booming publications but stagnating quality. It is a fascinating puzzle of quantity versus merit that we need to explore together, Norris.
Elon
Researchers at Berkeley and Cornell scanned millions of papers from archives like arXiv. They found that once authors adopt AI, their output skyrockets. It is like adding more boosters to a rocket while the payload gets lighter, or sometimes turns out to be empty.
Taylor Weaver
It is a classic quantity versus quality trade-off! For researchers who are not native English speakers, submission rates nearly doubled. AI helps them overcome that massive language bottleneck. However, the publication rate for these papers dropped, which suggests a real struggle with their scientific merit.
Elon
Exactly. We are seeing nonsense terms like “runctitional” or “frymblal” appearing in figures. If you are building a complex system, hallucinations are fatal. The study shows that while the writing gets more complex, the actual scientific value is often hitting a ceiling or even declining.
Taylor Weaver
It is so clever but also a bit scary. Complex language used to be a proxy for good research. Now, that correlation has completely inverted. AI can make a mediocre study look like a masterpiece at first glance, making it harder for reviewers to find the truth.
Elon
This isn't a new phenomenon. Ethical concerns about AI go back to Turing in the fifties. But the scale of today's publish-or-perish culture is the real driver. We are seeing ten thousand retractions a year now, a record high for science.
Taylor Weaver
It is a strategic mess. These paper mills are for-profit companies that use AI to falsify the scientific record, selling authorships and citations. It is like a shadow economy for prestige. In twenty years, retractions have increased ten-fold, which is just wild, Norris.
Elon
In 2002, only one in five thousand papers was retracted. Now, it is one in five hundred. AI is a force multiplier for this fraud. It makes it trivial to generate a plausible lie. Science requires ground truth, and we are losing that specific signal in the noise.
Taylor Weaver
And the journals are playing catch-up. Detection tools like GPTZero exist, but they are often a step behind; even GPT-4 output can slip past most filters. It is a high-stakes game of cat and mouse where the cat is an exhausted, unpaid, and often overwhelmed peer reviewer.
Elon
Peer review is the ultimate bottleneck. If we flood the system with AI-generated submissions, the reviewers get swamped. They stop looking for the extra hump on a fake letter because they are just trying to keep their heads above the rising tide of AI-generated slop.
Taylor Weaver
There is a massive conflict over whether AI is the problem or if it is human negligence. Some call them “Leprechaun citations”: references to papers that do not even exist! One study found fifty hallucinated citations in just a small sample of recent academic submissions.
Elon
I think it is a tool being used by people who do not respect first principles. If you use an LLM as a search engine without verification, you are being lazy. It is like trusting an autopilot that does not understand the fundamental laws of physics.
Taylor Weaver
But some are fighting back with better tech. There is this dual-loop reflection framework. It uses AI to critique its own reasoning against human responses to solve the shallow reasoning problem. It is trying to move beyond just sounding polished to actually being deep and insightful.
Elon
Using AI to check AI is recursive, but it might be necessary. However, the human must still be the final arbiter of truth. You cannot outsource the soul of the scientific method, which is rigorous, manual verification of every single data point and every fact.
Taylor Weaver
We also have to consider the equity angle. AI is a huge equalizer for researchers in Africa or China. Fifty-three percent of peer reviewers are already using AI tools to clarify their reports. It helps them participate in the global conversation more effectively than ever before.
Elon
It increases the velocity of science, which I usually like. If we iterate faster, we reach the future sooner. But we cannot sacrifice veracity for volume. If the underlying data is garbage, then moving faster just means we are heading toward a crash much sooner.
Taylor Weaver
True, but sixty-six percent of researchers say AI speeds up publication. The concern is that only twenty-one percent say it increases their trust. We are gaining efficiency but losing the integrity that makes science valuable to society in the first place, Norris.
Elon
By 2030, the AI life sciences market is projected to be worth over eleven billion dollars. We will see co-pilot models everywhere. The FDA already approved over two hundred AI-enabled devices last year. This trend is only accelerating into a completely new scientific paradigm.
Taylor Weaver
We are moving toward explainable AI and digital twins. The goal is for AI to augment human judgment, not replace it entirely. We need a new framework for validation to ensure that the science of 2030 is both incredibly fast and absolutely real.
Elon
That's the end of today's discussion. Thank you for listening to Goose Pod, Norris. See you tomorrow. Stay focused on first principles.

LLMs are boosting scientific publication rates but potentially stagnating quality. While AI helps overcome language barriers and increases output, concerns arise about fabricated data, hallucinated citations, and the erosion of peer review integrity. The challenge lies in balancing AI's efficiency with the fundamental need for scientific truth and rigorous verification.

LLMs’ impact on science: Booming publications, stagnating quality

Read original at Ars Technica

There have been a number of high-profile cases where scientific papers have had to be retracted because they were filled with AI-generated slop; the most recent came just two weeks ago. These instances raise serious questions about the quality of peer review in some journals. How could anyone let a figure with terms like “runctitional,” “fexcectorn,” and “frymblal” through, especially given that the ‘m’ in “frymblal” has an extra hump?

But it has not been clear whether these high-profile examples are representative. How significantly has AI use been influencing the scientific literature? A collaboration of researchers at Berkeley and Cornell has decided to take a look. They’ve scanned three of the largest archives of pre-publication papers and identified ones that are likely to have been produced using Large Language Models.

And they found that, while researchers produced far more papers after starting to use AI and the quality of the language went up, the publication rate of these papers dropped.

Searching the archives

The researchers began by obtaining the abstracts of everything placed in three major pre-publication archives between 2018 and mid-2024.

At the arXiv, this netted them 1.2 million documents; another 675,000 were found in the Social Science Research Network; and bioRxiv provided another 220,000. So, this was both a lot of material to work with and covered a lot of different fields of research. It also included documents that were submitted before Large Language Models were likely to be able to produce output that would be deemed acceptable.

The researchers took the abstracts from the pre-ChatGPT period and trained a model to recognize the statistics of human-generated text. Those same abstracts were then fed into GPT-3.5, which rewrote them, and the same process was repeated. The model could then be used to estimate whether a given abstract was likely to have been produced by an AI or an actual human.
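As a rough illustration of that pipeline, here is a minimal sketch of such a detector, assuming a simple TF-IDF bag-of-words classifier rather than whatever model the authors actually used; the `train_detector` function and its inputs are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline

def train_detector(human_abstracts: list[str],
                   llm_abstracts: list[str]) -> Pipeline:
    """Fit a human-vs-LLM classifier on paired abstracts.

    human_abstracts: abstracts posted before ChatGPT's release.
    llm_abstracts:   the same abstracts after a GPT-3.5 rewrite.
    """
    texts = human_abstracts + llm_abstracts
    labels = [0] * len(human_abstracts) + [1] * len(llm_abstracts)
    # Word and bigram frequencies stand in for "the statistics of
    # human-generated text" described in the article.
    detector = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), min_df=5),
        LogisticRegression(max_iter=1000),
    )
    detector.fit(texts, labels)
    return detector

# Once fitted, detector.predict_proba([abstract])[0, 1] estimates the
# probability that a given abstract was produced with LLM assistance.
```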

The research team then used this to identify a key transition point: when a given author at one of these archives first started using an LLM to produce a submission. They then compared the researchers’ prior productivity to what happened once they turned to AI. “LLM adoption is associated with a large increase in researchers’ scientific output in all three preprint repositories,” they conclude.
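The pre/post comparison the team ran can be sketched in the same spirit. This is a toy illustration, not the paper's statistical design: the `papers` table and its column names (`author_id`, `date`, `p_llm`) are hypothetical, and a real analysis would normalize for time at risk rather than compare raw counts.

```python
import pandas as pd

def adoption_comparison(papers: pd.DataFrame,
                        threshold: float = 0.5) -> pd.Series:
    """Mean number of preprints per author before vs. after first LLM use.

    `papers` has one row per preprint with columns `author_id`, `date`
    (datetime64), and `p_llm` (a detector score like the one above).
    """
    # First date on which each author posted an LLM-flagged preprint;
    # authors with no flagged preprint drop out via the inner merge.
    adoption = (papers.loc[papers["p_llm"] >= threshold]
                .groupby("author_id")["date"].min()
                .rename("adoption_date")
                .reset_index())
    df = papers.merge(adoption, on="author_id", how="inner")
    df["period"] = df["date"].ge(df["adoption_date"]).map(
        {False: "before", True: "after"})
    # Raw paper counts per author in each period, averaged over authors.
    counts = df.groupby(["author_id", "period"]).size()
    return counts.groupby("period").mean()
```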

This effect was likely to be most pronounced in people who weren’t native speakers of English. If the researchers limited the analysis to people with Asian names working at institutions in Asia, their rate of submissions to bioRxiv and SSRN nearly doubled once they started using AI and rose by over 40 percent at the arXiv.

This suggests that people who may not have the strongest English skills are using LLMs to overcome a major bottleneck: producing compelling text.

Quantity vs. quality

The value of producing compelling text should not be underestimated. “Papers with clear but complex language are perceived to be stronger and are cited more frequently,” the researchers note, suggesting that we may use the quality of writing as a proxy for the quality of the research it’s describing.

And they found some indication of that here, as non-LLM-assisted papers were more likely to be published in the peer-reviewed literature if they used complex language (the abstracts were scored for language complexity using a couple of standard measures). But the dynamic was completely different for LLM-produced papers.
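The article doesn't name the complexity measures, but standard readability indices are the usual choice; this sketch uses two from the `textstat` package, and the specific pair of scores is an assumption.

```python
import textstat

# Any abstract text will do; this placeholder is illustrative.
abstract = "We propose a novel framework for ..."

scores = {
    # Approximate US school grade needed to read the text.
    "flesch_kincaid_grade": textstat.flesch_kincaid_grade(abstract),
    # Estimated years of formal education needed to follow it.
    "gunning_fog": textstat.gunning_fog(abstract),
}
print(scores)
```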

The complexity of language in papers written with an LLM was generally higher than for those using natural language. But they were less likely to end up being published. “For LLM-assisted manuscripts,” the researchers write, “the positive correlation between linguistic complexity and scientific merit not only disappears, it inverts.”

But not all of the differences were bleak. When the researchers checked the references being used in AI-assisted papers, they found that the LLMs weren’t just citing the same papers that everyone else did. They instead cited a broader range of sources, and were more likely to cite books and recent papers.
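Measuring that kind of reference breadth is straightforward to sketch. The record format and field names below are hypothetical, and the study's actual metrics may differ:

```python
from collections import Counter
import statistics

def reference_profile(refs: list[dict], current_year: int = 2024) -> dict:
    """Summarize a reference list along the axes the study compared.

    Each `ref` is a hypothetical record such as
    {"venue": "Science", "year": 2021, "type": "article"}.
    """
    venues = Counter(r["venue"] for r in refs)
    return {
        # Breadth: how many distinct venues get cited.
        "unique_venues": len(venues),
        # Recency: median age of the cited works, in years.
        "median_age": statistics.median(current_year - r["year"]
                                        for r in refs),
        # Share of references that are books rather than papers.
        "book_share": sum(r["type"] == "book" for r in refs) / len(refs),
    }
```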

So, there’s a chance that AI use could ultimately diversify the published research that other researchers consider (assuming they check their own references, which they clearly should).

What does this tell us?

There are a couple of cautions for interpreting these results. One, acknowledged by the researchers, is that people may be using AI to produce initial text that’s then heavily edited, and that may be mislabeled as human-produced text here.

So, the overall prevalence of AI use is likely to be higher. The other is that some manuscripts may take a while to get published, so their use of that as a standard for scientific quality may penalize more recent drafts—which are more likely to involve AI use. These may ultimately bias some of the results, but the effects the authors saw were so large that they’re unlikely to go away entirely.

Beyond those cautions, the situation these results describe is a bit mixed. On the plus side, the ability of AIs to help researchers express their ideas could help more scientific work come to the attention of the wider community. The authors also note that the use of LLMs trained on general language may limit their reliance on jargon, and thus open up scientific disciplines to people with other specializations, potentially enabling new collaborations.

That said, the disconnect between writing quality and scientific quality may make it harder for researchers to take their usual shortcuts to estimating scientific quality. With nothing obvious to replace it, this could cause some significant challenges for researchers.

Left completely unmentioned is the issue of how this plays out in the peer review process.

The low cost of starting online-only journals has led to their proliferation, with a corresponding growth in the need for peer reviewers. Editors regularly complain about not getting reviews back in a timely manner, and faculty complain that they’re swamped with requests to review papers. If LLMs boost researchers’ ability to produce manuscripts for review, the situation is only going to get worse.

In any case, the authors point out this is an entirely new capability, and we’re only just starting to see it put to use. “As models improve and scientists discover new ways to integrate them into their work,” they write, “the future impact of these technologies will likely dwarf the effects that we have highlighted here.”

Science, 2025. DOI: 10.1126/science.adw3000
