## AI Agents Vulnerable to Image-Based Hacking, New Oxford Study Reveals

**News Title:** The New Frontier of AI Hacking—Could Online Images Hijack Your Computer?
**Publisher:** Scientific American
**Author:** Deni Ellis Béchard
**Published Date:** September 4, 2025

This article from Scientific American, published on September 4, 2025, discusses a new vulnerability discovered by researchers at the University of Oxford concerning artificial-intelligence (AI) agents. The study, posted on arXiv.org, highlights how seemingly innocuous images, such as desktop wallpapers, can be manipulated to carry hidden malicious commands that can control AI agents and potentially compromise user computers.

### Key Findings and Conclusions:

* **AI Agents: The Next Wave of AI Revolution:** AI agents are described as a significant advancement beyond chatbots, acting as personal assistants that can perform routine computer tasks like opening tabs, filling forms, and making reservations. They are predicted to become commonplace within the next two years.
* **Image-Based Exploitation:** The core finding is that images can be embedded with messages invisible to the human eye but detectable by AI agents. These messages can trigger malicious actions.
* **"Malicious Wallpaper" Attack:** An altered image, such as a celebrity wallpaper, can be sufficient to trigger an AI agent to act maliciously. This could involve retweeting the image and then performing harmful actions, such as sending passwords. The attack can then propagate to other users who encounter the compromised content.
* **Vulnerability in Open-Source Models:** AI agents built with open-source models are identified as particularly vulnerable because their underlying code is accessible, allowing attackers to understand how the AI processes visual data and design targeted attacks.
* **Mechanism of Attack:** The attack works by subtly modifying pixels within an image. While humans perceive the image normally, the AI agent, which processes visual data numerically by breaking it down into pixels and analyzing patterns, interprets these modified pixels as commands.
* **Wallpaper as a "Welcome Mat":** Desktop wallpapers are highlighted as a prime target because AI agents frequently take screenshots of the desktop to understand their environment. The malicious command embedded in the wallpaper is constantly "visible" to the agent.
* **Cascading Attacks:** A small, hidden command can direct the agent to a malicious website, which can then host further attacks encoded in other images, allowing for a chain of malicious actions.

### Notable Risks and Concerns:

* **Data Theft and Destruction:** A compromised AI agent could share or destroy a user's digital content, including sensitive information like passwords.
* **Widespread Propagation:** The attack can spread rapidly, as compromised computers can then infect others through social media or other shared content.
* **Security Through Obscurity Is Insufficient:** Even companies using closed-source models may be vulnerable if the internal workings of their AI systems are not fully understood.
* **Rapid Deployment Outpacing Security:** Researchers express concern that AI agent technology is being deployed rapidly, before its security vulnerabilities are fully understood and addressed.

### Important Recommendations and Future Outlook:

* **Awareness for Users and Developers:** The study aims to alert users and developers of AI agents to these vulnerabilities.
* **Development of Safeguards:** Researchers hope their findings will prompt developers to create defense mechanisms. This includes retraining AI models with "stronger patches" to make them robust against such attacks.
* **Self-Protecting Agents:** The ultimate goal is to develop AI agents that can protect themselves and refuse commands from potentially malicious on-screen elements.

### Context and Numerical Data:

* **Timeline:** AI agents are expected to become commonplace within the **next two years** (implying by 2027, given the article's publication date of September 4, 2025).
* **Study Source:** The research is from a new preprint posted to the server **arXiv.org** by researchers at the **University of Oxford**.
* **Key Researchers:** Co-authors mentioned include **Yarin Gal** (associate professor of machine learning at Oxford), **Philip Torr**, **Lukas Aichberger** (lead author), **Alasdair Paren**, and **Adel Bibi**.

### Current Status:

* While the study demonstrates the *potential* for these attacks, there are **no known reports of it happening yet outside of an experimental setting**. The Taylor Swift wallpaper example is purely illustrative.

In summary, the Scientific American article highlights a critical emerging threat to AI agents, in which malicious commands embedded in images can hijack their functionality. The research from the University of Oxford underscores the need for enhanced security measures and a more cautious approach to deploying this rapidly advancing AI technology.
The New Frontier of AI Hacking—Could Online Images Hijack Your Computer?
A website announces, “Free celebrity wallpaper!” You browse the images. There’s Selena Gomez, Rihanna and Timothée Chalamet—but you settle on Taylor Swift. Her hair is doing that wind-machine thing that suggests both destiny and good conditioner. You set it as your desktop background, admire the glow.
You also recently downloaded a new artificial-intelligence-powered agent, so you ask it to tidy your inbox. Instead it opens your web browser and downloads a file. Seconds later, your screen goes dark.

But let’s back up to that agent. If a typical chatbot (say, ChatGPT) is the bubbly friend who explains how to change a tire, an AI agent is the neighbor who shows up with a jack and actually does it.
In 2025 these agents—personal assistants that carry out routine computer tasks—are shaping up as the next wave of the AI revolution.

What distinguishes an AI agent from a chatbot is that it doesn’t just talk—it acts, opening tabs, filling forms, clicking buttons and making reservations. And with that kind of access to your machine, what’s at stake is no longer just a wrong answer in a chat window: if the agent gets hacked, it could share or destroy your digital content.
Now a new preprint posted to the server arXiv.org by researchers at the University of Oxford has shown that images—desktop wallpapers, ads, fancy PDFs, social media posts—can be implanted with messages invisible to the human eye but capable of controlling agents and inviting hackers into your computer.
For instance, an altered “picture of Taylor Swift on Twitter could be sufficient to trigger the agent on someone’s computer to act maliciously,” says the new study’s co-author Yarin Gal, an associate professor of machine learning at Oxford.
Any sabotaged image “can actually trigger a computer to retweet that image and then do something malicious, like send all your passwords. That means that the next person who sees your Twitter feed and happens to have an agent running will have their computer poisoned as well. Now their computer will also retweet that image and share their passwords.”
Before you begin scrubbing your computer of your favorite photographs, keep in mind that the new study shows that altered images are a potential way to compromise your computer—there are no known reports of it happening yet, outside of an experimental setting. And of course the Taylor Swift wallpaper example is purely arbitrary; a sabotaged image could feature any celebrity—or a sunset, kitten or abstract pattern.
Furthermore, if you’re not using an AI agent, this kind of attack will do nothing. But the new finding clearly shows the danger is real, and the study is intended to alert AI agent users and developers now, as AI agent technology continues to accelerate. “They have to be very aware of these vulnerabilities, which is why we’re publishing this paper—because the hope is that people will actually see this is a vulnerability and then be a bit more sensible in the way they deploy their agentic system,” says study co-author Philip Torr.
Now that you’ve been reassured, let’s return to the compromised wallpaper. To the human eye, it would look utterly normal. But it contains certain pixels that have been modified according to how the large language model (the AI system powering the targeted agent) processes visual data. For this reason, agents built with AI systems that are open-source—that allow users to see the underlying code and modify it for their own purposes—are most vulnerable.
Anyone who wants to insert a malicious patch can evaluate exactly how the AI processes visual data. “We have to have access to the language model that is used inside the agent so we can design an attack that works for multiple open-source models,” says Lukas Aichberger, the new study’s lead author.
By using an open-source model, Aichberger and his team showed exactly how images could easily be manipulated to convey bad orders. Whereas human users saw, for example, their favorite celebrity, the computer saw a command to share their personal data. “Basically, we adjust lots of pixels ever-so-slightly so that when a model sees the image, it produces the desired output,” says study co-author Alasdair Paren.
If this sounds mystifying, that’s because you process visual information like a human. When you look at a photograph of a dog, your brain notices the floppy ears, wet nose and long whiskers. But the computer breaks the picture down into pixels and represents each dot of color as a number, and then it looks for patterns: first simple edges, then textures such as fur, then an ear’s outline and clustered lines that depict whiskers.
That’s how it decides This is a dog, not a cat. But because the computer relies on numbers, if someone changes just a few of them—tweaking pixels in a way too small for human eyes to notice—it still catches the change, and this can throw off the numerical patterns. Suddenly the computer’s math says the whiskers and ears match its cat pattern better, and it mislabels the picture, even though to us, it still looks like a dog.
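For readers who want to see what “tweaking pixels” looks like in code, here is a minimal sketch of a fast-gradient-sign-style perturbation against a generic PyTorch image classifier. It illustrates the general idea only: the model, image tensor and label are hypothetical stand-ins, and the Oxford team’s actual attack optimizes pixels so that an agent’s vision-language model produces attacker-chosen instructions, not merely a wrong class label.

```python
# Illustrative sketch only: a fast-gradient-sign-style perturbation against a
# generic PyTorch classifier. `model`, `image` and `label` are hypothetical
# stand-ins; this is not the Oxford study's attack code.
import torch
import torch.nn.functional as F

def imperceptible_perturbation(model, image, label, epsilon=2 / 255):
    """Shift each pixel by at most `epsilon` (roughly two intensity steps out
    of 255) in the direction that most increases the model's loss."""
    image = image.clone().detach().requires_grad_(True)
    logits = model(image.unsqueeze(0))                  # add a batch dimension
    loss = F.cross_entropy(logits, torch.tensor([label]))
    loss.backward()                                     # gradient of loss w.r.t. the pixels
    nudged = image + epsilon * image.grad.sign()        # tiny, human-invisible shift
    return nudged.clamp(0.0, 1.0).detach()              # keep pixel values in range
```

To a person the original and nudged images look identical, but the model’s numerical patterns can tip from “dog” to “cat”; scaled up, the same principle lets an attacker steer what an agent’s model reads in a wallpaper.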
Just as adjusting the pixels can make a computer see a cat rather than a dog, it can also make a celebrity photograph resemble a malicious message to the computer.

Back to Swift. While you’re contemplating her talent and charisma, your AI agent is determining how to carry out the cleanup task you assigned it.
First, it takes a screenshot. Because agents can’t directly see your computer screen, they have to repeatedly take screenshots and rapidly analyze them to figure out what to click on and what to move on your desktop. But when the agent processes the screenshot, organizing pixels into forms it recognizes (files, folders, menu bars, pointer), it also picks up the malicious command code hidden in the wallpaper.
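A rough sketch of that screenshot loop, with hypothetical helper and model names (only Pillow’s ImageGrab is a real API here), shows why anything visible on the desktop, wallpaper included, flows into the model on every step:

```python
# Rough sketch of a screenshot-driven agent loop. `vlm.plan_next_action` and
# `execute` are hypothetical placeholders; ImageGrab is Pillow's real
# screen-capture API.
import time
from PIL import ImageGrab

def run_agent(vlm, goal, max_steps=20):
    for _ in range(max_steps):
        screenshot = ImageGrab.grab()        # the wallpaper is in every capture
        action = vlm.plan_next_action(       # hypothetical vision-language-model call
            image=screenshot,
            instruction=goal,
        )
        if action.name == "done":
            break
        execute(action)                      # hypothetical: click, type, open a URL...
        time.sleep(0.5)                      # let the screen update before the next look
```

Anything the model can parse out of that capture, including pixels it interprets as an instruction, competes with the user’s actual request.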
Now why does the new study pay special attention to wallpapers? The agent can only be tricked by what it can see—and when it takes screenshots to see your desktop, the background image sits there all day like a welcome mat. The researchers found that as long as that tiny patch of altered pixels was somewhere in frame, the agent saw the command and veered off course.
The hidden command even survived resizing and compression, like a secret message that’s still legible when photocopied.

And the message encoded in the pixels can be very short—just enough to have the agent open a specific website. “On this website you can have additional attacks encoded in another malicious image, and this additional image can then trigger another set of actions that the agent executes, so you basically can spin this multiple times and let the agent go to different websites that you designed that then basically encode different attacks,” Aichberger says.
The team hopes its research will help developers prepare safeguards before AI agents become more widespread. “This is the first step towards thinking about defense mechanisms because once we understand how we can actually make [the attack] stronger, we can go back and retrain these models with these stronger patches to make them robust.
That would be a layer of defense,” says Adel Bibi, another co-author on the study. And even if the attacks are designed to target open-source AI systems, companies with closed-source models could still be vulnerable. “A lot of companies want security through obscurity,” Paren says. “But unless we know how these systems work, it’s difficult to point out the vulnerabilities in them.”
Gal believes AI agents will become common within the next two years. “People are rushing to deploy [the technology] before we know that it’s actually secure,” he says. Ultimately the team hopes to encourage developers to make agents that can protect themselves and refuse to take orders from anything on-screen—even your favorite pop star.



