What happened
---
Main Findings and Conclusions
Google DeepMind has unveiled a new vision language action (VLA) model named Gemini Robotics On-Device, designed to operate directly on robotic devices without requiring an internet connection. This advancement signifies a step towards more robust and responsive robotic systems, particularly for applications where...
Google DeepMind introduced a vision language action (VLA) model that runs locally on robotic devices, without accessing a data network. The new Gemini Robotics On-Device robotics foundation model features general-purpose dexterity and fast task adaptation, the company said in a Tuesday (June 24) blog post.
“Since the model operates independent of a data network, it’s helpful for latency sensitive applications and ensures robustness in environments with intermittent or zero connectivity,” Google DeepMind Senior Director and Head of Robotics Carolina Parada said in the post. Building on the task generalization and dexterity capabilities of Gemini Robotics, which was introduced in March, Gemini Robotics On-Device is meant for bi-arm robots and is designed to enable rapid experimentation with dexterous manipulation and adaptability to new tasks through fine-tuning, according to the post.
The model follows natural language instructions and is dexterous enough to perform tasks like unzipping bags, folding clothes, zipping a lunchbox, drawing a card, pouring salad dressing and assembling products, per the post. It is also Google DeepMind’s first VLA model that is available for fine-tuning, per the post.
“While many tasks will work out of the box, developers can also choose to adapt the model to achieve better performance for their applications,” Parada said in the post. “Our model quickly adapts to new tasks, with as few as 50 to 100 demonstrations, indicating how well this on-device model can generalize its foundational knowledge to new tasks.”
Google DeepMind’s Gemini Robotics is one of several efforts by companies to develop humanoid robots that can perform general tasks, PYMNTS reported in March. Robotics is in fashion in Silicon Valley as large language models give robots the capability to understand natural language commands and perform complex tasks.
The company’s advancements in Gemini Robotics show that the decision to make Gemini multimodal (taking and generating text, images and audio) is the path toward better reasoning. Gemini’s multimodality can spawn a whole new genre of consumer products for Google, PYMNTS reported in April. Several other companies are also developing AI-powered robots demonstrating advancements in general tasks, making for a crowded market, PYMNTS reported in February.
Source coverage
Summary of Report on Google DeepMind's On-Device Robotics AI Model
Deeper analysis



