Why Humanoid Robots and Embodied AI Still Struggle in the Real World

2025-12-15 · Technology
雷总
Hello Norris1, good morning! I'm 雷总. Today is Tuesday, December 16, and it's 1:08 a.m. Beijing time. I'm delighted to have this one-on-one chat with you on Goose Pod. It's the middle of the night, but I'm still buzzing, because today's topic concerns the ultimate form of future technology.
小撒1
Hello Norris1, I'm 小撒1! I have to hand it to 雷总: one in the morning and he sounds like he just downed three shots of espresso. Welcome to a Goose Pod made just for you. Today's topic sounds like pure science fiction but turns out to be rather sobering: why do humanoid robots and embodied AI still struggle in the real world?
雷总
Exactly. Scroll through short videos these days and you'll see robots dancing, doing parkour, even backflipping, as if Westworld were just around the corner. But Norris1, speaking as an industry insider, I have to be blunt: the gap between that and reality is enormous. Most of those flashy clips are shot in tightly controlled environments. Put those robots on a real street and they might not even make it over a curb.
小撒1
It's like a stage rehearsal: with the lighting and set dialed in, everything looks effortless. Real life is all surprises. From what I've read, the current predicament is that large models like ChatGPT have sharp minds but lack "embodied knowledge." It's like someone who has read every swimming manual ever written: unbeatable in theory, but the moment they hit the water, straight to the bottom.
雷总
A very precise analogy! This is the so-called "reality gap." But don't lose heart, Norris1, the industry is making progress. For instance, 1X has just opened pre-orders for its NEO robot, built specifically for the home. At first it has to be taught through remote human teleoperation, but it can improve by learning from user data. That's like finally putting the AI in the water to build up muscle memory.
小撒1
Now that's interesting: the robot trains like an apprentice shadowing a master. But 雷总, back to that core difficulty you mentioned. Why is making robots "move like humans" so hard? Those Boston Dynamics machines jump better than I do, yet fetching a bowl from the kitchen or folding a shirt will apparently take them another decade of study?
雷总
At bottom it's a problem of data dimensionality. Professor Ken Goldberg has a point I strongly agree with: you cannot learn three-dimensional motion from two-dimensional video. Watch Jordan's jump shot ten thousand times and you still won't become Jordan, because you never feel the muscle tension, the friction of the ball, the tiny shifts of balance. Norris1, picture yourself carrying a bowl of hot soup across a messy bedroom with the lights off: every single step demands careful probing and rebalancing.
小撒1
Ouch, vivid picture, and I'd probably scald a foot. So human "intuition" is really the residue of countless trials and errors? Meta's AI guru Yann LeCun made exactly this point: by age four, a child has taken in, through the eyes alone, 50 times more data than today's biggest language models are trained on, and that's before counting all the tactile experience of crawling and tumbling about!
雷总
That's right. There are two main lines of attack right now. One is demonstration: a person wears a VR rig to drive a robotic arm, teaching the robot hand over hand what a "good action" looks like, but that is painfully inefficient. The other is simulation: let the AI practice relentlessly in a virtual world. That, however, circles back to the same problem. The virtual world has no real friction, no squishy textures. However well the robot trains in the simulator, hand it a slippery bar of soap in the real world and it is finished.
小撒1
This is headache-inducing. A god in the virtual world, a dud in the real one. It's like the robot vacuum: perfectly smart most days, until it meets a cable on the floor or a "surprise" left by the pet, and instantly turns into an idiot. So many companies are effectively "cheating": confining robots to factory lines, where the environment is fixed and the motions repeat. That is far easier.
雷总
Right, and that brings up a core strategic split: do we grind away at general-purpose humanoids, or build specialized robots first? Take 施航智能's deep-sea robots. Working at 10,000 meters sounds hard, but because the environment is uniform, it is actually easier to commercialize than folding laundry in a home. Some robots now clean ship hulls underwater at five times human efficiency; that is embodied intelligence put to real use in a specific setting.
小撒1
A very sensible entry point. But ordinary people, Norris1 included, still want a robot housekeeper that does the chores. On when that vision might arrive, the big names are at each other's throats. Nvidia's CEO Jensen Huang says it's a problem solvable "within a few years," brimming with confidence. But robotics pioneer Rodney Brooks pours cold water straight onto that.
雷总
Brooks's take is razor-sharp. In September 2025 he said flatly that even clumsy humanoid robots are more than ten years away from their first profitable commercial deployment. He even offered a wonderfully practical tip: if you see a full-size humanoid robot, keep at least 3 meters away. Why? Their coordination is still so poor that one could topple onto you at any moment, and these are hunks of metal weighing hundreds of pounds.
小撒1
Ha, keep your distance, don't let it "stage an accident" on you! The contrast could not be starker: on one side, capital's euphoria insisting it's nearly here; on the other, scientific caution saying not even close. The tension comes from how badly we underestimate everyday chores. Look at the ATEC2025 challenge: hundreds of teams entered, and out in the field the robots struggled just to cross a bridge or water a plant, nothing like their glory in the lab.
雷总
Yes, and this is the famous Moravec's paradox: for computers, "high-intelligence" tasks like chess are easy, while "low-intelligence" tasks like perception and movement are brutally hard. Norris1, think about fishing a T-shirt out of your gym bag. Your hand goes in, and even with no line of sight, the instant your fingertips touch fabric you know whether it is cotton or quick-dry, a collar or a cuff. That kind of tactile feedback and blind manipulation is simply beyond today's robots.
小撒1
Exactly, and the bag probably also holds smelly socks and a half-full water bottle that need sorting out. A robot that cannot tell what it is touching might simply crush the bottle. Handling soft objects and cluttered environments is the real threshold. That's why Benjie Holson proposed a "household Olympics": events like folding an inside-out T-shirt or picking up dog poop, the "black tech" ordinary people actually need.
雷总
A brilliant proposal! If a robot could wipe peanut butter cleanly off its own hand, I would be first in line to buy one, whatever the price. Once that kind of capability is cracked, its impact will rival the steam engine's: labor in the physical world gets redefined. For now we mostly see Agility Robotics moving boxes and Figure AI driving screws, but those are necessary steps on the road to that ultimate goal.
小撒1
So the robots of the future may not look like movie stars capable of falling in love. They may be more like a washing machine with legs, or a dish cabinet with arms: not sexy, but genuinely useful. Quiet, unobtrusive help that tidies the house day after day, never loses its temper and never asks for overtime pay.
雷总
Exactly. The best technology is the kind you don't notice. Future embodied intelligence will shed today's influencer gloss and turn plain and practical, grinding through tedious work in factories, in the deep sea and in the corners of our homes, day after day. It won't be as flashy as the TikTok clips, but that stability and reliability is the ultimate romance engineers chase. Norris1, we need patience; give these "slow learners" some room to grow.
小撒1
Well said. The "slow learners" will grow up eventually. Today's discussion has made me a bit more forgiving toward my robot vacuum; it too is working hard to understand this complicated world. Thanks for listening to today's Goose Pod, Norris1. We hope this conversation gives you something new to think about.
雷总
Thank you for your company, Norris1. In an era of breakneck technological change, patience matters as much as curiosity. I'm 雷总, and I'll see you on the next episode of Goose Pod!

Humanoid robots and embodied AI still face serious real-world challenges. Behind the flashy demos, the bottleneck is missing "embodied knowledge": machines cannot yet cope with the complexity and unpredictability of the physical world. The industry is probing two paths, demonstration and simulation, but general-purpose humanoids remain years from commercial use; deployment in specific, constrained settings is the practical direction for now.

Why Humanoid Robots and Embodied AI Still Struggle in the Real World

Read original at Scientific American

December 13, 2025 · 4 min read

General-purpose robots remain rare not for a lack of hardware but because we still can’t give machines the physical intuition humans learn through experience.

Image: The NEURA Robotics humanoid robot 4NE-1 Gen 3 on display during IFA 2025 in Berlin, Germany, on September 6, 2025. Credit: Artur Widak/NurPhoto via Getty Images

In Westworld, humanoid robots pour drinks and ride horses. In Star Wars, “droids” are as ordinary as appliances. That’s the future I keep expecting when I watch the Internet’s new favorite genre: robots dancing, kickboxing or doing parkour. But then I look up from my phone, and there are no androids on the sidewalk.

By robots, I don’t mean the millions that are already deployed on factory floors or the tens of millions that consumers buy annually to vacuum rugs and mow lawns. I mean humanoid robots like C-3PO, Data and Dolores Abernathy: general-purpose humanoids.

What’s keeping them off the street is a challenge robotics researchers have circled for decades.

Building robots is easier than making them function in the real world. A robot can repeat a TikTok routine on a flat surface, but the world has uneven sidewalks, slippery stairs and people who rush by. To understand the difficulty, imagine crossing a messy bedroom in the dark while carrying a bowl of soup; every movement requires constant reevaluation and recalibration.

Artificial intelligence language models such as those that power ChatGPT don’t offer an easy solution.

They don’t have embodied knowledge. They’re like people who have read every book on sailing while always remaining on dry land: they can describe the wind and waves and quote famous mariners, but they don’t have a physical sense of how to steer the boat or handle the sail.

“Some people think we can get the data from videos of humans—for instance, from YouTube—but looking at pictures of humans doing things doesn’t tell you the actual detailed motions that the humans are performing, and going from 2D to 3D is generally very hard,” said roboticist Ken Goldberg in an August interview with the University of California, Berkeley’s news site.

To explain the gap, Meta’s chief AI scientist Yann LeCun has noted that, by age four, a child has taken in vastly more visual information through their eyes alone than the amount of data that the largest large language models (LLMs) are trained on. “In 4 years, a child has seen 50 times more data than the biggest LLMs,” he wrote on LinkedIn and X last year.
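For readers who want to see where that factor of 50 comes from, here is a back-of-the-envelope version of LeCun’s estimate. All three inputs (an optic-nerve bandwidth of roughly 20 MB per second, about 16,000 waking hours by age four, and a training corpus of roughly 2 × 10^13 bytes) are figures from his public posts, used here as rough assumptions rather than measurements:

```python
# Back-of-the-envelope version of LeCun's estimate; all inputs are rough assumptions.
optic_nerve_bytes_per_s = 2e6 * 10       # ~2 million optic nerve fibers x ~10 bytes/s = ~20 MB/s
waking_seconds_by_age_4 = 16_000 * 3600  # ~16,000 waking hours in a child's first four years
child_visual_bytes = optic_nerve_bytes_per_s * waking_seconds_by_age_4  # ~1.15e15 bytes

llm_training_bytes = 2e13                # order of magnitude of a large LLM's text corpus

print(f"child vision: {child_visual_bytes:.2e} bytes")
print(f"LLM corpus:   {llm_training_bytes:.2e} bytes")
print(f"ratio:        {child_visual_bytes / llm_training_bytes:.0f}x")  # ~58x, i.e. 'about 50 times'
```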

Children are learning from an ocean of embodied experience, and the massive datasets used to train AI systems are puddles by comparison. They’re also the wrong puddle: training an AI on millions of poems and blogs won’t make it any more capable of making your bed.

Roboticists have primarily focused on two approaches to closing this gap.

The first is demonstration. Humans teleoperate robotic arms, often through virtual reality, so systems can record what “good behavior” looks like. This has allowed a number of companies to begin building datasets for training future AIs.

The second approach is simulation. In virtual environments, AI systems can practice tasks thousands of times faster than humans can in the physical world.
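To make the first approach concrete, here is a minimal sketch of the kind of logger such a pipeline might use: it records (observation, action) pairs while a human drives the arm, and those pairs later become training data for a policy. Every name in it (FakeArm, FakeVRRig and so on) is hypothetical, invented for illustration rather than taken from any real teleoperation stack:

```python
# Sketch of demonstration logging for robot learning. All classes are
# hypothetical stand-ins; a real teleoperation stack differs in detail.
import json
import random
from dataclasses import dataclass, asdict

@dataclass
class Step:
    joint_angles: list       # observation: the arm's state at this instant
    operator_command: list   # action: what the human did, i.e. the training label

class FakeArm:
    """Stand-in for a robot arm; returns made-up joint readings."""
    def read_joint_angles(self):
        return [round(random.uniform(-3.14, 3.14), 3) for _ in range(6)]
    def apply(self, command):
        pass  # a real arm would move here, mirroring the operator

class FakeVRRig:
    """Stand-in for a VR teleoperation rig; returns made-up commands."""
    def read_operator_command(self):
        return [round(random.uniform(-0.1, 0.1), 3) for _ in range(6)]

def record_episode(rig, arm, n_steps=5):
    """Log one demonstration: at each step, pair what the robot
    observed with what the human told it to do."""
    episode = []
    for _ in range(n_steps):
        obs = arm.read_joint_angles()       # what the robot observes
        cmd = rig.read_operator_command()   # what the human demonstrates
        arm.apply(cmd)
        episode.append(Step(obs, cmd))
    return episode

demo = record_episode(FakeVRRig(), FakeArm())
print(json.dumps([asdict(s) for s in demo], indent=2))
```

Training a policy to imitate these pairs is usually called behavior cloning; the catch is that every data point costs a human operator real time, which is one reason simulation is so attractive.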

But simulation runs into the reality gap. An easy task in a simulator can fail in reality because the real world contains countless tiny details—friction, squishy materials, lighting quirks.

That reality gap explains why a robot parkour star can’t wash your dishes. After the first World Humanoid Robot Games this year in Beijing, where robots competed in soccer and boxing, roboticist Benjie Holson wrote about his disappointment.
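One standard countermeasure to that reality gap, not named in the article but common in sim-to-real work, is domain randomization: the simulator re-rolls its physics every episode so a policy cannot overfit to one particular friction or lighting setup, and with luck treats the real world as just one more variation. A minimal sketch, with every parameter range invented for illustration:

```python
# Sketch of domain randomization; all ranges are illustrative, not from any real system.
import random

def sample_world():
    """Re-roll the simulated physics for a new training episode."""
    return {
        "floor_friction": random.uniform(0.2, 1.2),    # polished tile vs. rough carpet
        "object_mass_kg": random.uniform(0.05, 2.0),   # a sock vs. a full water bottle
        "light_level": random.uniform(0.1, 1.0),       # the "lighting quirks" above
        "sensor_noise": abs(random.gauss(0.0, 0.02)),  # imperfect perception
    }

for episode in range(3):
    world = sample_world()
    # env = build_simulator(**world)   # hypothetical simulator factory
    # policy.train_on(env)             # hypothetical training step
    print(f"episode {episode}: {world}")
```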

What people really want, Holson argued, is a robot that can do chores. He proposed a new Humanoid Olympics in which robots would face challenges such as folding an inside-out T-shirt, using a dog-poop bag and cleaning peanut butter off their own hand.

It’s easy to underestimate the complexity of those tasks.

Consider something as ordinary as reaching into a gym bag crammed with clothes to find one shirt. Every part of your hand and wrist detects textures, shapes and resistance. You can recognize objects by touch and proprioception without having to remove and inspect everything.

A useful parallel is a type of robot we’ve been teaching for years, usually without calling it a robot: the self-driving car.

For instance, Tesla collects data from its cars to train the next generation of its self-driving AI. Across the industry, companies have had to collect massive amounts of driving data to reach today’s levels of automation. But humanoids have a harder job than cars. Homes, outdoor spaces and construction sites are far more variable than highways.

This is why engineers design many current robots to function in clearly defined spaces—factories, warehouses, hospitals and sidewalks—and give them one job to do very well. Agility Robotics’ humanoid Digit carries warehouse totes. Figure AI’s robots work on assembly lines. UBTECH’s Walker S2 can lift and carry loads on production lines and autonomously swap out its battery.

And Unitree Robotics’ humanoid robots can walk and squat to pick up and move objects, but they’re still mostly used for research or demonstrations. Though these robots are useful, they’re still far from being a general-purpose household helper.

Among those working on robotics, there is broad disagreement about how quickly that gap will close.

In March 2025 Nvidia CEO Jensen Huang told journalists, “This is not five-years-away problem, this is a few-years-away problem.” In September 2025 roboticist Rodney Brooks wrote, “We are more than ten years away from the first profitable deployment of humanoid robots even with minimal dexterity.” He also warned of the dangers that robots pose because of a lack of coordination and a risk of falling.

“My advice to people is to not come closer than 3 meters to a full size walking robot,” Brooks wrote.

For now, what’s keeping Main Street from looking like a sci-fi set is that most humanoids are still in the kindergartens we’ve built for them: learning with teleoperators or in simulators. What we don’t know is how long their education will last.

When humanoid robots become commonplace, they’ll be more dynamic than today’s systems but far less flashy than the clips that go viral on TikTok. The future will still be machines doing the jobs for which they’ve been trained, day after day, without drama.
