The New Frontier of AI Hacking: Could Online Images Hijack Your Computer?

2025-09-09 · Technology
Lei Zong
Good morning, Han Jifei, this is Lei Zong. Welcome to your very own Goose Pod. Today is Wednesday, September 10, six o'clock in the morning. Today we're going to talk about something very cool, and also a little spine-chilling.
Li Bai
Well met. I am Li Bai. This morning there is wine, and there is discourse as well. Together we shall survey a curious scene: the new frontier of AI attacks. Can a painted scroll from the web truly conceal hidden devices and hijack your computer? Whether this is real or illusory, hear us take it apart.
Lei Zong
Let's begin properly. Picture this scene, Han Jifei. You download a wallpaper of a celebrity you really like, say Taylor Swift, and set it as your desktop. At the same time, you switch on a newly downloaded AI agent and tell it to tidy up your inbox.
Li Bai
Oh? How does this "agent" differ from the "AI" of common parlance? Is it some fairy within the painting, able to understand human speech, even to move the stars and shift the Dipper? It sounds rather like myth, and stirs one's curiosity.
Lei Zong
Well put! An ordinary chatbot is the friend who tells you how to change a tire; an AI agent is the neighbor who shows up with a jack and actually changes it for you. It doesn't just talk, it acts: it operates your computer, clicking, filling in forms, making reservations.
Li Bai
I see: a mechanical master who unites knowing and doing. Yet one who can labor in your stead can also do evil in your stead. Were this master beguiled by the wicked, would that not be inviting a wolf into the house, leaving no peace beneath your roof?
Lei Zong
That is exactly the crux! The latest research from the University of Oxford found that malicious instructions invisible to the human eye can be planted inside an image like a talisman of invisibility. When your AI agent looks at that wallpaper, what it reads may not be a beautiful face but a command: "Hey, send me all your passwords!"
Li Bai
A blade hidden in a painted skin; the dagger revealed as the scroll unrolls! How does such a trick differ from the assassins of old? Murder concealed within beauty is all but impossible to guard against. A pleasing picture becomes, in a blink, a fatal charm. Dreadful indeed.
Lei Zong
And it spreads like a plague. An infected computer automatically reposts the image; the next person who sees it with an agent running gets infected too, and passes it on in turn. It's a vicious cycle, and it's frightening.
Li Bai
This is the momentum of "one tree of pear blossoms, beacon fires over all the land." A single spark can set the prairie ablaze. That one thin sheet can raise such towering waves shows the sea of information truly seethes with undercurrents and lurking peril.
Lei Zong
Right. Although image-based attacks are still at the experimental stage, AI being used for harm is already reality. In one reported case, a criminal who barely knew how to program used AI tools to develop and sell ransomware on his own, turning sophisticated malware into a "no-code" business.
Li Bai
Alas, what grief! The divine brush of creation now splashes ink for demons; a remedy meant to heal the world becomes a poison that harms it. Once ten years went into grinding a single sword; today vermin borrow the east wind and cast a thousand weapons in a day. How the times have changed; one can only sigh.
Lei Zong
Exactly. AI now works like a hyper-efficient spy, gathering intelligence and probing vulnerabilities for attackers at astonishing speed, making cyberattacks more precise and more automated. This is no longer the future; it is a fact unfolding right now.
Lei Zong
So how did AI agents that an image can deceive get to where they are today? We have to turn the clock back to the 1960s. The earliest AI of that era, a program called ELIZA for example, could only converse according to preset rules, as if acting out a script.
Li Bai
Ha, ELIZA! A name of some elegance. Yet what it did was mere parroting: it could speak, but it was a thing without a mind, copying the skin without grasping the spirit. Set beside today's "agents," they are as far apart as clouds and mud.
Lei Zong
Precisely. In the 1970s and 80s came "expert systems," which could solve problems in specific domains but were still rigid. One important advance was the arrival of "reinforcement learning": simply put, letting AI learn through constant trial and error, like a child learning to walk, who falls and knows to adjust next time.
Li Bai
Wisdom grows through experience; immortality is won through tribulation. Is that not the way of people? That a machine too can glean true insight from its failures shows all things under heaven share one principle. Every fall stores up strength to stand steadier the next time.
Lei Zong
Exactly right. Entering the 21st century, as machine learning and natural language processing matured, we got virtual assistants like Siri and Alexa. They understood our speech better and could go online to carry out simple tasks, like checking the weather or playing music.
Li Bai
Mm, by then the intelligence had taken on the first likeness of a "flower that understands speech." It could hear the wind and tell the rain. Though still a servant awaiting its master's command, it was no longer as wooden as before, with a touch more spirit and convenience about it.
Lei Zong
But they were still passive: if you didn't speak, they didn't move. The real revolution came after 2010, when deep learning exploded and computing power caught up, and AI's capabilities surged. IBM's Watson defeating human champions in a quiz competition was one milestone.
Li Bai
Man against machine is a theme a thousand years old. On the chessboard of former days there was still an inch of ground to contest; in today's boundless sea of knowledge, even humankind bows in defeat. Clearly this "deep learning" truly has the power to swallow sun and moon and embrace all things.
Lei Zong
Yes. And so we arrive at the 2020s, the present day, an era being called "agentic AI." Today's AI agents are autonomous. They can perceive their environment, decide independently, carry out tasks of their own initiative, even set goals for themselves, as if a real "person" were at the controls.
Li Bai
Oh? This is like dotting the eyes of the painted dragon, like the clay figure gaining a soul! Yesterday it moved only on command; today it governs itself. That single step apart is the distance between heaven and earth. It is no longer a mere implement but something near to a living creature. The opportunity and the peril hidden within are truly beyond reckoning.
Lei Zong
Exactly. From an actor who could only recite from the script to a partner that can think and act on its own: that is the evolutionary history of AI agents. And precisely because it is so capable and so autonomous, we must re-examine its security. This sword is simply too sharp.
Lei Zong
That brings us to a core point of tension, one this Oxford study calls out specifically: the fight between open source and closed source. The reason they could attack that AI model so easily is that it was an open-source model, whose source code anyone can read.
Li Bai
I see. This is the logic of "display a treasured heirloom to the crowd, and will you not tempt the gentlemen on the beams?" Publish your secret arts to the world, and though you may gather disciples far and wide and make the martial way flourish, you also let your enemies spy out your weak points and find their opening. A dilemma indeed.
Lei Zong
Right. Open-source advocates will say that openness drives innovation: developers around the world can contribute code and uncover flaws, which in the end makes the system more secure and more robust. It's like laying the code out in the sunlight, where everyone can keep watch over it.
Li Bai
There is reason in that. As the saying goes, "listen to all and be enlightened; heed one side and be left in the dark." A lone timber cannot bear the roof, but the wisdom of the many can prop up the sky. Only by gathering the keen eyes of heroes everywhere to temper it together can a flawless divine instrument be forged. To shut oneself away may be to spin one's own cocoon.
Lei Zong
But the closed-source side will counter that core technology is built with enormous corporate investment and must be kept secret. Besides, doesn't publishing every design hand the bad actors a detailed "attack manual"? In their view, security should be achieved through strict internal controls.
Li Bai
Mm... this is "high walls to lock away the secret, against the unforeseen." Hide the treasure deep within the palace under close guard, and outsiders can scarcely touch it. Yet who can see into the hearts of the guards? Should a thief arise within, the calamity burns fiercer still. Hearts are hidden behind ribs, and code is no easier to fathom.
Lei Zong
And that is exactly the problem: today there is no unified global standard. Countries regulate AI differently, some strictly, some loosely. That leaves room for malicious actors to slip through the cracks, say by training and deploying a malicious AI somewhere the oversight is lax.
Li Bai
Alas, this is "many states, many codes of law, and in the end the wicked find their gap." While the realm is not under one rule, the rules themselves have walls between them. It is like taming a flood: if every household sweeps only the snow before its own door and ignores the frost on its neighbor's tiles, the waters will in the end pour over the four seas, and none will be spared.
Lei Zong
These conflicts and gaps lead directly to very tangible impacts. Start with the most immediate cybersecurity risk: once an AI agent is maliciously controlled, it is no longer an assistant but a spy lurking inside your system, able at any moment to steal data or launch attacks.
Li Bai
Well said. Yesterday a trusted right arm; today an affliction at the very heart. Once the steward of the household turns to wicked intent, the keys to the vault and the levers of the hidden chambers are all his to use. Such betrayal is the deadliest of all; house and state can be overturned in the flip of a palm.
Lei Zong
What's more, AI agents are being integrated deep into all kinds of business processes, so their attack surface is very broad. Take "data poisoning": feed the model tainted material during the training stage so it "learns bad" at the root, and the decisions it makes will naturally be wrong, or even malicious.
Li Bai
A most venomous method! Like casting one drop of foul ink into the source of a sweet spring, so that the whole river runs polluted. When the root is crooked, the branches must grow awry. Led down the wrong path at the start of its schooling, it is sure to grow into a scourge. Truly a stroke that pulls the firewood out from under the cauldron.
Lei Zong
Going a step further, there are legal and data-privacy problems. These agents handle vast amounts of sensitive information; if one is hacked, the result could be a large-scale data breach that violates strict regulations like the EU's GDPR, exposing a company to enormous fines and a devastating blow to its reputation.
Li Bai
The sovereign's law is a furnace: touch it and be melted. A merchant's foundation is trust and good faith. If the lapse of one "agent" destroys a century of unblemished name, or even lands one in prison, is the loss not far greater than the gain? So this seemingly convenient thing in truth hangs upon a blade's edge: one false step, and the chasm below is ten thousand fathoms.
Lei Zong
So facing such a future, what can we do? First, the industry is calling for a "secure by design" philosophy. In other words, don't wait until the product is finished and then patch it; from the very first day of designing an AI agent, treat security as a core requirement.
Li Bai
Excellent! This is the wisdom of "mending the roof before the rain, bending the chimney and moving the firewood." Strengthen armor and arms before war breaks out; tend the organs before illness takes hold. Once the city has fallen or the disease has reached the marrow, even Bian Que or Hua Tuo reborn could not turn back heaven.
Lei Zong
At the technical level there are many approaches: strictly validating every input, for instance, and building strong systems for detecting anomalous behavior. Against image attacks specifically, an agent can be barred from carrying out critical operations on the basis of visual input alone, without additional verification.
Li Bai
Mm, a workable method. It may observe the form, but without hearing the "secret order" it must not act rashly. Just as a general must hold the tiger tally before he can move a thousand troops, set a "token" for this agent as well: one more gate along the pass is one more measure of safety.
Lei Zong
The ultimate goal is for AI agents themselves to carry strong defenses, able to recognize and refuse malicious instructions from anything on the screen, even when the instruction is disguised as your favorite celebrity. This is a continuous, ever-escalating contest of attack and defense.
Lei Zong
All right, let's sum up today's discussion. Hijacking an AI agent through an image may sound like science fiction, but it reveals a very real and serious security challenge ahead, and it is an important warning to every AI developer and user. That's all for today's Goose Pod. Thank you for listening, Han Jifei.
Li Bai
Indeed. The new road holds many perils; only with wisdom as one's mirror can it be walked in safety. The vessel of technology can carry us and can capsize us; only with prudence at the rudder can we sail to the bright farther shore. Tomorrow at this hour, we meet again.

## AI Agents Vulnerable to Image-Based Hacking, New Oxford Study Reveals

**News Title:** The New Frontier of AI Hacking—Could Online Images Hijack Your Computer?
**Publisher:** Scientific American
**Author:** Deni Ellis Béchard
**Published Date:** September 4, 2025

This article from Scientific American, published on September 4, 2025, discusses a new vulnerability discovered by researchers at the University of Oxford concerning artificial-intelligence (AI) agents. The study, posted on arXiv.org, highlights how seemingly innocuous images, such as desktop wallpapers, can be manipulated to carry hidden malicious commands that control AI agents and potentially compromise user computers.

### Key Findings and Conclusions

* **AI Agents: The Next Wave of the AI Revolution:** AI agents are a significant advancement beyond chatbots, acting as personal assistants that perform routine computer tasks such as opening tabs, filling forms, and making reservations. They are predicted to become commonplace within the next two years.
* **Image-Based Exploitation:** The core finding is that images can be embedded with messages invisible to the human eye but detectable by AI agents. These messages can trigger malicious actions.
* **"Malicious Wallpaper" Attack:** An altered image, such as a celebrity wallpaper, can be sufficient to trigger an AI agent to act maliciously. This could involve retweeting the image and then performing harmful actions, such as sending passwords. The attack can then propagate to other users who encounter the compromised content.
* **Vulnerability in Open-Source Models:** AI agents built with open-source models are particularly vulnerable because their underlying code is accessible, allowing attackers to understand how the AI processes visual data and to design targeted attacks.
* **Mechanism of Attack:** The attack works by subtly modifying pixels within an image. While humans perceive the image normally, the AI agent, which processes visual data numerically by breaking it into pixels and analyzing patterns, interprets the modified pixels as commands.
* **Wallpaper as a "Welcome Mat":** Desktop wallpapers are a prime target because AI agents repeatedly take screenshots of the desktop to understand their environment; a malicious command embedded in the wallpaper is constantly "visible" to the agent.
* **Cascading Attacks:** A small hidden command can direct the agent to a malicious website, which can host further attacks encoded in other images, allowing a chain of malicious actions.

### Notable Risks and Concerns

* **Data Theft and Destruction:** A compromised AI agent could share or destroy a user's digital content, including sensitive information like passwords.
* **Widespread Propagation:** The attack can spread rapidly, as compromised computers can infect others through social media or other shared content.
* **Security Through Obscurity Is Insufficient:** Even companies using closed-source models may be vulnerable if the internal workings of their AI systems are not fully understood.
* **Rapid Deployment Outpacing Security:** Researchers are concerned that AI agent technology is being deployed before its security vulnerabilities are fully understood and addressed.

### Important Recommendations and Future Outlook

* **Awareness for Users and Developers:** The study aims to alert users and developers of AI agents to these vulnerabilities.
* **Development of Safeguards:** The researchers hope their findings will prompt developers to create defense mechanisms, including retraining AI models with "stronger patches" to make them robust against such attacks.
* **Self-Protecting Agents:** The ultimate goal is AI agents that can protect themselves and refuse commands from potentially malicious on-screen elements.

### Context and Numerical Data

* **Timeline:** AI agents are expected to become commonplace within the **next two years** (implying by 2027, given the article's publication date of September 4, 2025).
* **Study Source:** The research is a new preprint posted to the server **arXiv.org** by researchers at the **University of Oxford**.
* **Key Researchers:** Co-authors include **Yarin Gal** (associate professor of machine learning at Oxford), **Philip Torr**, **Lukas Aichberger** (lead author), and **Adel Bibi**.

### Current Status

* While the study demonstrates the *potential* for these attacks, there are **no known reports of them happening outside an experimental setting**. The Taylor Swift wallpaper example is purely illustrative.

In summary, the Scientific American article highlights a critical emerging threat to AI agents: malicious commands embedded in images can hijack their functionality. The Oxford research underscores the need for stronger security measures and a more cautious approach to deploying this rapidly advancing technology.

The New Frontier of AI Hacking—Could Online Images Hijack Your Computer?

Read original at Scientific American

A website announces, “Free celebrity wallpaper!” You browse the images. There’s Selena Gomez, Rihanna and Timothée Chalamet—but you settle on Taylor Swift. Her hair is doing that wind-machine thing that suggests both destiny and good conditioner. You set it as your desktop background, admire the glow.

You also recently downloaded a new artificial-intelligence-powered agent, so you ask it to tidy your inbox. Instead it opens your web browser and downloads a file. Seconds later, your screen goes dark.

But let’s back up to that agent. If a typical chatbot (say, ChatGPT) is the bubbly friend who explains how to change a tire, an AI agent is the neighbor who shows up with a jack and actually does it.

In 2025 these agents—personal assistants that carry out routine computer tasks—are shaping up as the next wave of the AI revolution.

What distinguishes an AI agent from a chatbot is that it doesn’t just talk—it acts, opening tabs, filling forms, clicking buttons and making reservations. And with that kind of access to your machine, what’s at stake is no longer just a wrong answer in a chat window: if the agent gets hacked, it could share or destroy your digital content.

Now a new preprint posted to the server arXiv.org by researchers at the University of Oxford has shown that images—desktop wallpapers, ads, fancy PDFs, social media posts—can be implanted with messages invisible to the human eye but capable of controlling agents and inviting hackers into your computer.

For instance, an altered “picture of Taylor Swift on Twitter could be sufficient to trigger the agent on someone’s computer to act maliciously,” says the new study’s co-author Yarin Gal, an associate professor of machine learning at Oxford.

Any sabotaged image “can actually trigger a computer to retweet that image and then do something malicious, like send all your passwords. That means that the next person who sees your Twitter feed and happens to have an agent running will have their computer poisoned as well. Now their computer will also retweet that image and share their passwords.”

Before you begin scrubbing your computer of your favorite photographs, keep in mind that the new study shows that altered images are a potential way to compromise your computer—there are no known reports of it happening yet, outside of an experimental setting. And of course the Taylor Swift wallpaper example is purely arbitrary; a sabotaged image could feature any celebrity—or a sunset, kitten or abstract pattern.

Furthermore, if you’re not using an AI agent, this kind of attack will do nothing. But the new finding clearly shows the danger is real, and the study is intended to alert AI agent users and developers now, as AI agent technology continues to accelerate. “They have to be very aware of these vulnerabilities, which is why we’re publishing this paper—because the hope is that people will actually see this is a vulnerability and then be a bit more sensible in the way they deploy their agentic system,” says study co-author Philip Torr.

Now that you’ve been reassured, let’s return to the compromised wallpaper. To the human eye, it would look utterly normal. But it contains certain pixels that have been modified according to how the large language model (the AI system powering the targeted agent) processes visual data. For this reason, agents built with AI systems that are open-source—that allow users to see the underlying code and modify it for their own purposes—are most vulnerable.

Anyone who wants to insert a malicious patch can evaluate exactly how the AI processes visual data. “We have to have access to the language model that is used inside the agent so we can design an attack that works for multiple open-source models,” says Lukas Aichberger, the new study’s lead author.

By using an open-source model, Aichberger and his team showed exactly how images could easily be manipulated to convey bad orders. Whereas human users saw, for example, their favorite celebrity, the computer saw a command to share their personal data. “Basically, we adjust lots of pixels ever-so-slightly so that when a model sees the image, it produces the desired output,” says study co-author Alasdair Paren.

If this sounds mystifying, that’s because you process visual information like a human. When you look at a photograph of a dog, your brain notices the floppy ears, wet nose and long whiskers. But the computer breaks the picture down into pixels and represents each dot of color as a number, and then it looks for patterns: first simple edges, then textures such as fur, then an ear’s outline and clustered lines that depict whiskers.

That’s how it decides “this is a dog, not a cat.” But because the computer relies on numbers, if someone changes just a few of them—tweaking pixels in a way too small for human eyes to notice—it still catches the change, and this can throw off the numerical patterns. Suddenly the computer’s math says the whiskers and ears match its cat pattern better, and it mislabels the picture, even though to us, it still looks like a dog.
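To make that pixel arithmetic concrete, here is a minimal sketch of a targeted adversarial perturbation in PyTorch, with a stock ResNet-18 standing in for the vision model inside an agent. The model choice, the `wallpaper.jpg` file name and the target label are illustrative assumptions, not details from the Oxford study:

```python
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import models, transforms

# A stock classifier stands in for the vision model inside an agent.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

to_tensor = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

# Hypothetical input file; any image works.
image = to_tensor(Image.open("wallpaper.jpg").convert("RGB")).unsqueeze(0)
image.requires_grad_(True)

target = torch.tensor([285])  # arbitrary attacker-chosen class, for illustration

# One FGSM-style step: nudge every pixel slightly in the direction that
# makes the model favor the attacker's chosen output.
loss = F.cross_entropy(model(image), target)
loss.backward()

eps = 2.0 / 255.0  # per-pixel change far below what a human eye notices
adversarial = (image - eps * image.grad.sign()).clamp(0.0, 1.0).detach()

print("clean prediction:    ", model(image).argmax(dim=1).item())
print("perturbed prediction:", model(adversarial).argmax(dim=1).item())
```

A single step like this rarely fools a large model reliably; practical attacks iterate the same gradient idea (PGD, for example), and the Oxford team optimizes pixels to elicit an attacker-chosen instruction from a language model rather than a class label. The mechanics, though, are the same: tiny, targeted nudges to the numbers the model sees.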

Just as adjusting the pixels can make a computer see a cat rather than a dog, it can also make a celebrity photograph resemble a malicious message to the computer.

Back to Swift. While you’re contemplating her talent and charisma, your AI agent is determining how to carry out the cleanup task you assigned it.

First, it takes a screenshot. Because agents can’t directly see your computer screen, they have to repeatedly take screenshots and rapidly analyze them to figure out what to click on and what to move on your desktop. But when the agent processes the screenshot, organizing pixels into forms it recognizes (files, folders, menu bars, pointer), it also picks up the malicious command code hidden in the wallpaper.
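That perceive-and-act loop can be sketched in a few lines. Everything here (in particular the `vision_model` callable and its action schema) is a hypothetical stand-in, since real agent frameworks differ, but it shows why hostile wallpaper pixels ride along with every single step:

```python
import base64
import io

from PIL import ImageGrab  # screenshots on Windows/macOS; Linux may need another backend


def screenshot_b64() -> str:
    """Capture the full desktop, wallpaper included, exactly as the agent sees it."""
    shot = ImageGrab.grab()
    buf = io.BytesIO()
    shot.save(buf, format="PNG")
    return base64.b64encode(buf.getvalue()).decode()


def agent_step(vision_model, goal: str) -> dict:
    """One perceive-decide iteration of an agent loop.

    `vision_model` is a hypothetical callable mapping (goal, screenshot) to a
    proposed UI action such as {"action": "click", "x": 120, "y": 340}.
    Because the whole screen goes into the model on every iteration, anything
    drawn on the desktop becomes part of the agent's input.
    """
    return vision_model(goal=goal, screenshot=screenshot_b64())
```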

Now why does the new study pay special attention to wallpapers? The agent can only be tricked by what it can see—and when it takes screenshots to see your desktop, the background image sits there all day like a welcome mat. The researchers found that as long as that tiny patch of altered pixels was somewhere in frame, the agent saw the command and veered off course.

The hidden command even survived resizing and compression, like a secret message that’s still legible when photocopied.

And the message encoded in the pixels can be very short—just enough to have the agent open a specific website. “On this website you can have additional attacks encoded in another malicious image, and this additional image can then trigger another set of actions that the agent executes, so you basically can spin this multiple times and let the agent go to different websites that you designed that then basically encode different attacks,” Aichberger says.

The team hopes its research will help developers prepare safeguards before AI agents become more widespread. “This is the first step towards thinking about defense mechanisms because once we understand how we can actually make [the attack] stronger, we can go back and retrain these models with these stronger patches to make them robust. That would be a layer of defense,” says Adel Bibi, another co-author on the study. And even if the attacks are designed to target open-source AI systems, companies with closed-source models could still be vulnerable. “A lot of companies want security through obscurity,” Paren says. “But unless we know how these systems work, it’s difficult to point out the vulnerabilities in them.”
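Bibi’s “retrain with stronger patches” idea is, in classifier terms, adversarial training. A minimal sketch, assuming a PyTorch image classifier; the single-step perturbation and equal loss weighting are simplifications (production defenses typically iterate the attack):

```python
import torch
import torch.nn.functional as F


def adversarial_training_step(model, optimizer, images, labels, eps=2.0 / 255.0):
    """One step of training on clean plus adversarially perturbed inputs."""
    # Craft perturbed copies of the batch on the fly.
    images = images.clone().requires_grad_(True)
    F.cross_entropy(model(images), labels).backward()
    perturbed = (images + eps * images.grad.sign()).clamp(0.0, 1.0).detach()

    # Teach the model to answer identically on both versions,
    # so the invisible perturbation loses its leverage.
    optimizer.zero_grad()
    loss = (F.cross_entropy(model(images.detach()), labels)
            + F.cross_entropy(model(perturbed), labels))
    loss.backward()
    optimizer.step()
    return loss.item()
```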

Gal believes AI agents will become common within the next two years. “People are rushing to deploy [the technology] before we know that it’s actually secure,” he says. Ultimately the team hopes to encourage developers to make agents that can protect themselves and refuse to take orders from anything on-screen—even your favorite pop star.
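Until models can do that natively, the “refuse orders from the screen” goal can be approximated at the application layer with a policy gate that distrusts anything whose provenance is the screen. A sketch under the assumption that the agent labels each proposed action with its origin; the action names and schema are invented for illustration:

```python
from dataclasses import dataclass

SAFE = {"scroll", "read", "summarize"}
SENSITIVE = {"send_email", "delete_file", "open_url", "enter_credentials"}


@dataclass
class ProposedAction:
    name: str      # e.g. "open_url"
    argument: str  # e.g. "https://example.com"
    source: str    # "user_instruction" or "screen_content"


def allowed(action: ProposedAction) -> bool:
    """Permit an action only if it is harmless or explicitly user-initiated.

    The key rule: anything sensitive that was derived from pixels on screen,
    rather than from the user's typed request, is held for human sign-off.
    """
    if action.name in SAFE:
        return True
    if action.name in SENSITIVE and action.source == "screen_content":
        print(f"held for confirmation: {action.name}({action.argument})")
        return False
    return True
```

The hard part, of course, is attributing provenance reliably in the first place, which is why the authors push for robustness inside the model itself.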
