## Google DeepMind Unveils Advanced AI Models for Robotics

**News Title:** Robots receive major intelligence boost thanks to Google DeepMind’s ‘thinking AI’ — a pair of models that help machines understand the world
**Report Provider:** livescience.com
**Author:** Alan Bradley
**Published Date:** October 10, 2025

### Key Findings and Conclusions

Google DeepMind has introduced two new artificial intelligence (AI) models, Gemini Robotics 1.5 and Gemini Robotics-ER 1.5, designed to significantly enhance the capabilities of robots. These models enable robots to perform complex, multi-step general tasks and reason about their environment in ways previously not possible. The core advancement lies in the synergistic operation of the two models:

* **Gemini Robotics-ER 1.5 (the "brain"):** A vision-language model (VLM) that gathers spatial information, processes natural language commands, and uses advanced reasoning and tools to generate instructions.
* **Gemini Robotics 1.5 (the "hands and eyes"):** A vision-language-action (VLA) model that translates instructions from the "brain" into actions by matching them with its visual understanding of the environment. It then builds a plan, executes it, and provides feedback on its processes and reasoning.

### Enhanced Capabilities Demonstrated

The new AI models give robots sophisticated abilities that go beyond simple task execution:

* **Complex Task Performance:** Robots can now handle multi-step tasks that require spatial reasoning and decision-making.
* **"Thinking While Acting":** The models allow robots to perceive their environment, think step-by-step, and complete intricate tasks.
* **Natural Language Interaction:** Robots can explain their actions and reasoning in natural language.
* **Object Recognition and Sorting:** A demonstration showed a robot sorting a selection of fruit (banana, apple, lime) onto appropriately colored plates while explaining its choices.
* **Tool Utilization:** The models can use external tools, such as Google Search, to gather information needed for a task. In one example, a robot used San Francisco's recycling rules, found via internet search, to sort objects into compost, recycling, and trash bins.
* **Cross-System Learning:** A significant breakthrough is the ability to learn and apply knowledge across different robotic systems. Learning from one robot (e.g., Aloha 2, Apollo humanoid, Franka bi-arm) can be generalized and applied to the others.
* **Adaptability to Dynamic Environments:** Robots can re-evaluate and react to changes in their physical surroundings, as demonstrated by a robot that kept sorting clothes by color correctly even after the clothes and bins were moved.

### Significance of the Advancement

The Gemini Robotics Team emphasizes that these generalized reasoning capabilities allow the models to approach problems with a broad understanding of physical spaces and interactions, breaking complex tasks down into manageable steps. This contrasts with older, specialized approaches that were limited to narrow, specific situations.

Jie Tan, a senior staff research scientist at DeepMind, stated: "We enable it to think... It can perceive the environment, think step-by-step and then finish this multistep task. Although this example seems very simple, the idea behind it is really powerful. The same model is going to power more sophisticated humanoid robots to do more complicated daily tasks."
### Future Implications

The development of these advanced AI models is poised to power more sophisticated humanoid robots capable of performing a wider range of complex daily tasks, marking a significant leap forward in the field of robotics.
## Robots receive major intelligence boost thanks to Google DeepMind’s ‘thinking AI’ — a pair of models that help machines understand the world
Google DeepMind has unveiled a pair of artificial intelligence (AI) models that will enable robots to perform complex general tasks and reason in a way that was previously impossible.

Earlier this year, the company revealed the first iteration of Gemini Robotics, an AI model based on its Gemini large language model (LLM) — but specialized for robotics.
This allowed machines to reason and perform simple tasks in physical spaces. The baseline example Google points to is the banana test: the original AI model was capable of receiving a simple instruction like "place this banana in the basket" and guiding a robotic arm to complete that command.

Powered by the two new models, a robot can now take a selection of fruit and sort them into individual containers based on color.
In one demonstration, a pair of robotic arms (the company's Aloha 2 robot) accurately sorts a banana, an apple and a lime onto three plates of the appropriate color. Further, the robot explains in natural language what it's doing and why as it performs the task.

*Video: "Gemini Robotics 1.5: Thinking while acting"*

"We enable it to think," said Jie Tan, a senior staff research scientist at DeepMind, in the video.
"It can perceive the environment, think step-by-step and then finish this multistep task. Although this example seems very simple, the idea behind it is really powerful. The same model is going to power more sophisticated humanoid robots to do more complicated daily tasks." AI-powered robotics of tomorrow While the demonstration may seem simple on the surface, it demonstrates a number of sophisticated capabilities.
The robot can spatially locate the fruit and the plates, identify the fruit and the color of all of the objects, match the fruit to the plates according to shared characteristics and provide a natural language output describing its reasoning.

It's all possible because of the way the newest iterations of the AI models interact.
They work together in much the same way a supervisor and worker do. Gemini Robotics-ER 1.5 (the "brain") is a vision-language model (VLM) that gathers information about a space and the objects located within it, processes natural language commands, and can use advanced reasoning and tools to send instructions to Gemini Robotics 1.5 (the "hands and eyes"), a vision-language-action (VLA) model. Gemini Robotics 1.5 matches those instructions to its visual understanding of a space and builds a plan before executing them, providing feedback about its processes and reasoning throughout.

The two models are more capable than previous versions and can use tools like Google Search to complete tasks.
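The article does not describe how this supervisor/worker hand-off is exposed to developers, but the division of labor it reports can be sketched as a simple planner/executor loop. Everything below (class names, methods, the canned plan) is a hypothetical illustration under that reading, not the actual Gemini Robotics interface.

```python
# A minimal sketch of the "brain" / "hands and eyes" split described above.
# All class and method names are hypothetical; this is not the real Gemini Robotics API.

from dataclasses import dataclass


@dataclass
class Instruction:
    """One small step produced by the planner, e.g. 'pick up the banana'."""
    text: str


class PlannerVLM:
    """Stand-in for the vision-language 'brain' (Gemini Robotics-ER 1.5)."""

    def plan(self, command: str, scene_description: str) -> list[Instruction]:
        # The real model would reason over camera input and the natural-language
        # command; here we return a fixed, human-readable plan.
        return [
            Instruction("locate the banana, the apple and the lime"),
            Instruction("pick up the banana"),
            Instruction("place the banana on the yellow plate"),
        ]


class ExecutorVLA:
    """Stand-in for the vision-language-action 'hands and eyes' (Gemini Robotics 1.5)."""

    def execute(self, step: Instruction) -> str:
        # The real model would ground the instruction in what it sees and drive
        # the arm; here we just report what would be done.
        return f"done: {step.text}"


def run_task(command: str, scene: str) -> None:
    planner, executor = PlannerVLM(), ExecutorVLA()
    for step in planner.plan(command, scene):
        feedback = executor.execute(step)
        print(feedback)  # the executor reports back on each step as it goes


run_task("sort the fruit by color",
         "a banana, an apple and a lime on a table with three colored plates")
```

In this reading, the planner owns the high-level reasoning and tool use, while the executor only ever sees one grounded instruction at a time, which is what lets it stay focused on perception and control.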
The team demonstrated this capacity by having a researcher ask Aloha to use recycling rules based on her location to sort some objects into compost, recycling and trash bins. The robot recognized that the user was located in San Francisco and found recycling rules on the internet to help it accurately sort trash into the appropriate receptacles.
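The article does not detail the underlying tool-calling mechanism, only that the "brain" can consult Google Search before planning. The sketch below illustrates that flow with `search_tool` and `plan_with_tools` as hypothetical stand-ins; the canned rules and steps are illustrative, not DeepMind's output.

```python
# Hypothetical sketch of the tool-use step described above: before planning,
# the planner queries a search tool for local recycling rules and conditions
# its step-by-step plan on the result.

def search_tool(query: str) -> str:
    """Stand-in for a web search call (e.g. Google Search used as a model tool)."""
    # A real implementation would hit a search API; here we return a canned answer.
    return ("San Francisco: food scraps -> compost, "
            "clean paper and plastic -> recycling, everything else -> trash")


def plan_with_tools(command: str, user_location: str) -> list[str]:
    rules = search_tool(f"recycling rules in {user_location}")
    # The retrieved rules shape which bin each object is assigned to.
    return [
        f"apply rules: {rules}",
        "place the apple core in the compost bin",
        "place the empty bottle in the recycling bin",
        "place the chip bag in the trash bin",
    ]


for step in plan_with_tools("sort these objects for disposal", "San Francisco"):
    print(step)
```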
Another advance represented in the new models is the ability to learn (and apply that learning) across multiple robotic systems. DeepMind representatives said in a statement that any learning gleaned from its Aloha 2 robot (the pair of robotic arms), Apollo humanoid robot and bi-arm Franka robot can be applied to any other system, due to the generalized way the models learn and evolve.
"General-purpose robots need a deep understanding of the physical world, advanced reasoning, and general and dexterous control," the Gemini Robotics Team said in a technical report on the new models. That kind of generalized reasoning means that the models can approach a problem with a broad understanding of physical spaces and interactions and problem-solve accordingly, breaking tasks down into small, individual steps that can be easily executed.
This contrasts with earlier approaches, which relied on specialized knowledge that only applied to very specific, narrow situations and individual robots.

The scientists provided an additional example of how robots could help in a real-world scenario. They presented an Apollo robot with two bins and asked it to sort clothes by color — with whites going into one bin and other colors into the other.
They then added an additional hurdle as the task progressed by moving the clothes and bins around, forcing the robot to reevaluate the physical space and react accordingly, which it managed successfully.
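One way to read this re-evaluation behavior is that the robot re-perceives the scene before every action rather than committing to a fixed plan. The sketch below illustrates that idea only; the scene model, helper names and randomized bin positions are assumptions for the example, not a description of DeepMind's implementation.

```python
# Sketch of perceive-before-every-step replanning: bin positions are re-checked
# on each iteration, so moved bins simply change where the next action is aimed.

import random


def perceive_scene() -> dict[str, str]:
    """Stand-in for perception: returns current bin positions, which may change mid-task."""
    whites_side = random.choice(["left", "right"])
    return {"whites_bin": whites_side,
            "colors_bin": "right" if whites_side == "left" else "left"}


def sort_laundry(items: list[tuple[str, str]]) -> None:
    for name, color in items:
        scene = perceive_scene()  # re-check the environment before each step
        target = "whites_bin" if color == "white" else "colors_bin"
        print(f"placing {name} in the {target} (currently on the {scene[target]} side)")


sort_laundry([("t-shirt", "white"), ("sock", "red"), ("towel", "white")])
```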




