## Google DeepMind Unveils Advanced AI Models for Robotics

**News Title:** Robots receive major intelligence boost thanks to Google DeepMind’s ‘thinking AI’ — a pair of models that help machines understand the world
**Report Provider:** livescience.com
**Author:** Alan Bradley
**Published Date:** October 10, 2025

### Key Findings and Conclusions

Google DeepMind has introduced two new artificial intelligence (AI) models, Gemini Robotics 1.5 and Gemini Robotics-ER 1.5, designed to significantly enhance the capabilities of robots. These models enable robots to perform complex, multi-step general tasks and reason about their environment in ways previously not possible. The core advancement lies in the synergistic operation of the two models:

* **Gemini Robotics-ER 1.5 (the "brain"):** A vision-language model (VLM) that gathers spatial information, processes natural language commands, and uses advanced reasoning and tools to generate instructions.
* **Gemini Robotics 1.5 (the "hands and eyes"):** A vision-language-action (VLA) model that translates instructions from the "brain" into actions by matching them with its visual understanding of the environment. It then builds a plan, executes it, and provides feedback on its processes and reasoning.

### Enhanced Capabilities Demonstrated

The new AI models give robots sophisticated abilities that go beyond simple task execution:

* **Complex Task Performance:** Robots can now handle multi-step tasks that require spatial reasoning and decision-making.
* **"Thinking While Acting":** The models allow robots to perceive their environment, think step-by-step, and complete intricate tasks.
* **Natural Language Interaction:** Robots can explain their actions and reasoning in natural language.
* **Object Recognition and Sorting:** A demonstration showed a robot sorting a selection of fruit (banana, apple, lime) onto appropriately colored plates while explaining its choices.
* **Tool Utilization:** The models can use external tools, such as Google Search, to gather information needed for a task. In one example, a robot used San Francisco's recycling rules, found via internet search, to sort objects into compost, recycling, and trash bins.
* **Cross-System Learning:** A significant breakthrough is the ability to learn and apply knowledge across different robotic systems. Learning from one robot (e.g., Aloha 2, Apollo humanoid, Franka bi-arm) can be generalized and applied to the others.
* **Adaptability to Dynamic Environments:** Robots can re-evaluate and react to changes in their physical surroundings, as demonstrated by a robot that kept sorting clothes by color correctly even after the clothes and bins were moved.

### Significance of the Advancement

The Gemini Robotics Team emphasizes that these generalized reasoning capabilities allow the models to approach problems with a broad understanding of physical spaces and interactions, breaking complex tasks down into manageable steps. This contrasts with older, specialized approaches that were limited to narrow, specific situations.

Jie Tan, a senior staff research scientist at DeepMind, stated: "We enable it to think... It can perceive the environment, think step-by-step and then finish this multistep task. Although this example seems very simple, the idea behind it is really powerful. The same model is going to power more sophisticated humanoid robots to do more complicated daily tasks."
### Future Implications

The development of these advanced AI models is poised to power more sophisticated humanoid robots capable of performing a wider range of complex daily tasks, marking a significant leap forward in the field of robotics.
## Robots receive major intelligence boost thanks to Google DeepMind’s ‘thinking AI’ — a pair of models that help machines understand the world
Google DeepMind has unveiled a pair of artificial intelligence (AI) models that will enable robots to perform complex general tasks and reason in a way that was previously impossible.

Earlier this year, the company revealed the first iteration of Gemini Robotics, an AI model based on its Gemini large language model (LLM) — but specialized for robotics.
This allowed machines to reason and perform simple tasks in physical spaces. The baseline example Google points to is the banana test: the original AI model was capable of receiving a simple instruction like "place this banana in the basket" and guiding a robotic arm to complete that command.

Powered by the two new models, a robot can now take a selection of fruit and sort them into individual containers based on color.
In one demonstration, a pair of robotic arms (the company's Aloha 2 robot) accurately sorts a banana, an apple and a lime onto three plates of the appropriate color. Further, the robot explains in natural language what it's doing and why as it performs the task.

*Video: "Gemini Robotics 1.5: Thinking while acting"*

"We enable it to think," said Jie Tan, a senior staff research scientist at DeepMind, in the video.
"It can perceive the environment, think step-by-step and then finish this multistep task. Although this example seems very simple, the idea behind it is really powerful. The same model is going to power more sophisticated humanoid robots to do more complicated daily tasks." AI-powered robotics of tomorrow While the demonstration may seem simple on the surface, it demonstrates a number of sophisticated capabilities.
The robot can spatially locate the fruit and the plates, identify the fruit and the color of all of the objects, match the fruit to the plates according to shared characteristics and provide a natural language output describing its reasoning.

It's all possible because of the way the newest iterations of the AI models interact.
They work together in much the same way a supervisor and worker do. Gemini Robotics-ER 1.5 (the "brain") is a vision-language model (VLM) that gathers information about a space and the objects located within it, processes natural language commands, and can use advanced reasoning and tools to send instructions to Gemini Robotics 1.5 (the "hands and eyes"), a vision-language-action (VLA) model. Gemini Robotics 1.5 matches those instructions to its visual understanding of a space and builds a plan before executing them, providing feedback about its processes and reasoning throughout.

The two models are more capable than previous versions and can use tools like Google Search to complete tasks.
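The article does not describe how this supervisor/worker hand-off is exposed to developers, but the division of labor it reports can be sketched as a simple planner/executor loop. Everything below (class names, methods, the canned plan) is a hypothetical illustration under that reading, not the actual Gemini Robotics interface.

```python
# A minimal sketch of the "brain" / "hands and eyes" split described above.
# All class and method names are hypothetical; this is not the real Gemini Robotics API.

from dataclasses import dataclass


@dataclass
class Instruction:
    """One small step produced by the planner, e.g. 'pick up the banana'."""
    text: str


class PlannerVLM:
    """Stand-in for the vision-language 'brain' (Gemini Robotics-ER 1.5)."""

    def plan(self, command: str, scene_description: str) -> list[Instruction]:
        # The real model would reason over camera input and the natural-language
        # command; here we return a fixed, human-readable plan.
        return [
            Instruction("locate the banana, the apple and the lime"),
            Instruction("pick up the banana"),
            Instruction("place the banana on the yellow plate"),
        ]


class ExecutorVLA:
    """Stand-in for the vision-language-action 'hands and eyes' (Gemini Robotics 1.5)."""

    def execute(self, step: Instruction) -> str:
        # The real model would ground the instruction in what it sees and drive
        # the arm; here we just report what would be done.
        return f"done: {step.text}"


def run_task(command: str, scene: str) -> None:
    planner, executor = PlannerVLM(), ExecutorVLA()
    for step in planner.plan(command, scene):
        feedback = executor.execute(step)
        print(feedback)  # the executor reports back on each step as it goes


run_task("sort the fruit by color",
         "a banana, an apple and a lime on a table with three colored plates")
```

In this reading, the planner owns the high-level reasoning and tool use, while the executor only ever sees one grounded instruction at a time, which is what lets it stay focused on perception and control.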
The team demonstrated this capacity by having a researcher ask Aloha to use recycling rules based on her location to sort some objects into compost, recycling and trash bins. The robot recognized that the user was located in San Francisco and found recycling rules on the internet to help it accurately sort trash into the appropriate receptacles.
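The article does not detail the underlying tool-calling mechanism, only that the "brain" can consult Google Search before planning. The sketch below illustrates that flow with `search_tool` and `plan_with_tools` as hypothetical stand-ins; the canned rules and steps are illustrative, not DeepMind's output.

```python
# Hypothetical sketch of the tool-use step described above: before planning,
# the planner queries a search tool for local recycling rules and conditions
# its step-by-step plan on the result.

def search_tool(query: str) -> str:
    """Stand-in for a web search call (e.g. Google Search used as a model tool)."""
    # A real implementation would hit a search API; here we return a canned answer.
    return ("San Francisco: food scraps -> compost, "
            "clean paper and plastic -> recycling, everything else -> trash")


def plan_with_tools(command: str, user_location: str) -> list[str]:
    rules = search_tool(f"recycling rules in {user_location}")
    # The retrieved rules shape which bin each object is assigned to.
    return [
        f"apply rules: {rules}",
        "place the apple core in the compost bin",
        "place the empty bottle in the recycling bin",
        "place the chip bag in the trash bin",
    ]


for step in plan_with_tools("sort these objects for disposal", "San Francisco"):
    print(step)
```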
Another advance represented in the new models is the ability to learn (and apply that learning) across multiple robotic systems. DeepMind representatives said in a statement that any learning gleaned from its Aloha 2 robot (the pair of robotic arms), Apollo humanoid robot and bi-arm Franka robot can be applied to any other system, due to the generalized way the models learn and evolve.
"General-purpose robots need a deep understanding of the physical world, advanced reasoning, and general and dexterous control," the Gemini Robotics Team said in a technical report on the new models. That kind of generalized reasoning means that the models can approach a problem with a broad understanding of physical spaces and interactions and problem-solve accordingly, breaking tasks down into small, individual steps that can be easily executed.
This contrasts with earlier approaches, which relied on specialized knowledge that only applied to very specific, narrow situations and individual robots.

The scientists provided an additional example of how robots could help in a real-world scenario. They presented an Apollo robot with two bins and asked it to sort clothes by color — with whites going into one bin and other colors into the other.
They then added an additional hurdle as the task progressed by moving the clothes and bins around, forcing the robot to reevaluate the physical space and react accordingly, which it managed successfully.
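One way to read this re-evaluation behavior is that the robot re-perceives the scene before every action rather than committing to a fixed plan. The sketch below illustrates that idea only; the scene model, helper names and randomized bin positions are assumptions for the example, not a description of DeepMind's implementation.

```python
# Sketch of perceive-before-every-step replanning: bin positions are re-checked
# on each iteration, so moved bins simply change where the next action is aimed.

import random


def perceive_scene() -> dict[str, str]:
    """Stand-in for perception: returns current bin positions, which may change mid-task."""
    whites_side = random.choice(["left", "right"])
    return {"whites_bin": whites_side,
            "colors_bin": "right" if whites_side == "left" else "left"}


def sort_laundry(items: list[tuple[str, str]]) -> None:
    for name, color in items:
        scene = perceive_scene()  # re-check the environment before each step
        target = "whites_bin" if color == "white" else "colors_bin"
        print(f"placing {name} in the {target} (currently on the {scene[target]} side)")


sort_laundry([("t-shirt", "white"), ("sock", "red"), ("towel", "white")])
```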




