## AI Breakthrough: Hierarchical Reasoning Model (HRM) Outperforms LLMs

**News Title:** Scientists just developed a new AI modeled on the human brain — it’s outperforming LLMs like ChatGPT at reasoning tasks
**Report Provider/Author:** livescience.com, by Keumars Afifi-Sabet
**Date/Time Period Covered:** The study was uploaded to arXiv on June 26; the article was published on August 27, 2025.

### Main Findings and Conclusions

Scientists at Sapient, an AI company in Singapore, have developed a novel artificial intelligence (AI) model called a **Hierarchical Reasoning Model (HRM)**. The model is designed to **reason differently** from most large language models (LLMs) like ChatGPT, taking inspiration from the **hierarchical, multi-timescale processing of the human brain**.

The HRM has demonstrated **significantly better performance** on key reasoning benchmarks. It is also **more efficient**, requiring far fewer parameters and training examples than advanced LLMs. This approach challenges the "chain-of-thought" (CoT) reasoning used by most LLMs, which the Sapient scientists argue suffers from "brittle task decomposition, extensive data requirements, and high latency."

### Key Statistics and Metrics

* **HRM parameters:** 27 million
* **HRM training samples:** 1,000
* **Comparison to LLMs:** Most advanced LLMs have billions or even trillions of parameters; estimates suggest GPT-5 has between 3 trillion and 5 trillion.

**Performance on the ARC-AGI benchmark:**

| Model         | ARC-AGI-1 Score | ARC-AGI-2 Score |
| :------------ | :-------------- | :-------------- |
| **HRM**       | **40.3%**       | **5%**          |
| o3-mini-high  | 34.5%           | 3%              |
| Claude 3.7    | 21.2%           | 0.9%            |
| Deepseek R1   | 15.8%           | 1.3%            |

The HRM also achieved **near-perfect performance** on challenging tasks such as complex Sudoku puzzles, and excelled at optimal path-finding in mazes, tasks that conventional LLMs struggled with.
### Important Recommendations

While no explicit recommendations are stated, the research suggests a shift in AI development toward models that mimic the human brain's processing for improved reasoning and efficiency.

### Significant Trends or Changes

This development signals a potential **paradigm shift in AI reasoning**: away from enormous parameter counts and explicit step-by-step natural-language decomposition, toward a more integrated, hierarchical approach that mirrors biological intelligence.

### Notable Risks or Concerns

The article does not explicitly mention risks or concerns associated with the HRM itself. It does, however, reference a related piece asking whether AI entering an "unprecedented regime" should be stopped before it causes destruction, pointing to broader societal debate around advanced AI development.

### Material Financial Data

No financial data is presented in this news report.

### Model Operation and Architecture

The HRM operates through two modules:

1. **High-level module:** responsible for slow, abstract planning.
2. **Low-level module:** handles rapid, detailed computations.

It executes sequential reasoning tasks in a **single forward pass**, without explicit supervision of intermediate steps. The model employs **iterative refinement**, a technique that improves accuracy by repeatedly refining an initial approximation over short bursts of "thinking"; each burst decides whether to continue the process or submit a "final" answer.

### Peer Review and Reproducibility

The study was uploaded to the preprint arXiv database and **has yet to be peer-reviewed**. However, organizers of the ARC-AGI benchmark attempted to **recreate the results** after the model was open-sourced on GitHub.
They reproduced the numbers but found that the **hierarchical architecture had minimal performance impact**, attributing substantial performance gains to an **under-documented refinement process during training**.
Scientists just developed a new AI modeled on the human brain — it’s outperforming LLMs like ChatGPT at reasoning tasks
Scientists have developed a new type of artificial intelligence (AI) model that can reason differently from most large language models (LLMs) like ChatGPT, resulting in much better performance in key benchmarks.

The new reasoning AI, called a hierarchical reasoning model (HRM), is inspired by the hierarchical and multi-timescale processing in the human brain — the way different brain regions integrate information over varying durations (from milliseconds to minutes).
Scientists at Sapient, an AI company in Singapore, say this reasoning model can achieve better performance and can work more efficiently. This is thanks to the model requiring fewer parameters and training examples.

The HRM model has 27 million parameters while using 1,000 training samples, the scientists said in a study uploaded June 26 to the preprint arXiv database (which has yet to be peer-reviewed).
In comparison, most advanced LLMs have billions or even trillions of parameters. Although an exact figure has not been made public, some estimates suggest that the newly released GPT-5 has between 3 trillion and 5 trillion parameters.

### A new way of thinking for AI

When the researchers tested HRM in the ARC-AGI benchmark — a notoriously tough examination that aims to test how close models are to achieving artificial general intelligence (AGI) — the system achieved impressive results, according to the study.
HRM scored 40.3% in ARC-AGI-1, compared with 34.5% for OpenAI's o3-mini-high, 21.2% for Anthropic's Claude 3.7 and 15.8% for Deepseek R1. In the tougher ARC-AGI-2 test, HRM scored 5% versus o3-mini-high's 3%, Deepseek R1's 1.3% and Claude 3.7's 0.9%.

Most advanced LLMs use chain-of-thought (CoT) reasoning, in which a complex problem is broken down into multiple, much simpler intermediate steps that are expressed in natural language.
It emulates the human thought process by breaking down elaborate problems into digestible chunks.

Related: AI is entering an 'unprecedented regime.' Should we stop it — and can we — before it destroys us?

But the Sapient scientists argue in the study that CoT has key shortcomings — namely "brittle task decomposition, extensive data requirements, and high latency."
Instead, HRM executes sequential reasoning tasks in a single forward pass, without any explicit supervision of the intermediate steps, through two modules. One high-level module is responsible for slow, abstract planning, while a low-level module handles rapid and detailed computations. This is similar to the way in which the human brain processes information in different regions.
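A minimal sketch of this two-timescale control flow: a low-level loop runs several fast, detailed updates per cycle, while the high-level state updates only once per cycle. The update rules here are toy arithmetic stand-ins for the learned modules — this is an illustration of the nesting, not Sapient's implementation.

```python
# Hypothetical sketch of a two-module, two-timescale forward pass.
# `z_low` plays the role of the fast, detailed computation state and
# `z_high` the slow, abstract planning state; both update rules are
# invented toy arithmetic, not the model's learned networks.

def hrm_forward(x, n_cycles=3, steps_per_cycle=4):
    """One forward pass: nested fast/slow updates, no supervision of
    intermediate steps. Returns the final high-level state and a trace
    of the high-level state after each slow update."""
    z_high = 0.0  # slow, abstract planning state
    z_low = 0.0   # fast, detailed computation state
    trace = []
    for cycle in range(n_cycles):
        for _ in range(steps_per_cycle):
            # low-level module: rapid detailed updates, conditioned
            # on the input and the current high-level plan
            z_low = 0.5 * z_low + 0.5 * (x + z_high)
        # high-level module: one slow update per cycle, integrating
        # the result of the low-level computation
        z_high = z_high + z_low
        trace.append((cycle, z_high))
    return z_high, trace

result, trace = hrm_forward(1.0)
```

The point of the sketch is the schedule: the inner loop runs `steps_per_cycle` times for every single high-level update, mirroring the article's description of brain regions integrating information over different durations.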
It operates by applying iterative refinement — a computing technique that improves the accuracy of a solution by repeatedly refining an initial approximation — over several short bursts of "thinking." Each burst considers whether the process of thinking should continue or be submitted as a "final" answer to the initial prompt.
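The burst-and-halt loop described above can be made concrete with a runnable toy. Here the target value, tolerance, and refinement rule are all invented for illustration — the real model learns both the refinement step and the decision of when to stop.

```python
# Toy illustration of iterative refinement with a halt decision.
# refine() stands in for one short burst of "thinking"; TARGET and
# `tol` are hypothetical, chosen so the loop is runnable.

TARGET = 10.0

def refine(answer):
    # one burst: move the current approximation halfway to the target
    return answer + 0.5 * (TARGET - answer)

def solve(initial=0.0, max_bursts=8, tol=0.1):
    answer = initial
    for burst in range(1, max_bursts + 1):
        answer = refine(answer)
        # halt decision: submit a "final" answer once it is close
        # enough, otherwise keep thinking for another burst
        if abs(TARGET - answer) < tol:
            return answer, burst
    return answer, max_bursts

final_answer, bursts_used = solve()
```

Each pass through the loop corresponds to one burst of "thinking" that either continues refining or commits to a final answer, which is the shape of the mechanism the article describes.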
HRM achieved near-perfect performance on challenging tasks like complex Sudoku puzzles — which conventional LLMs could not accomplish — as well as excelling at optimal path-finding in mazes.

The paper has not been peer-reviewed, but the organizers of the ARC-AGI benchmark attempted to recreate the results for themselves after the study scientists open-sourced their model on GitHub.
Although they reproduced the numbers, representatives said in a blog post, they made some surprising findings, including that the hierarchical architecture had minimal performance impact — instead, there was an under-documented refinement process during training that drove substantial performance gains.
Keumars is the technology editor at Live Science. He has written for a variety of publications including ITPro, The Week Digital, ComputerActive, The Independent, The Observer, Metro and TechRadar Pro. He has worked as a technology journalist for more than five years, having previously held the role of features editor with ITPro.
He is an NCTJ-qualified journalist and has a degree in biomedical sciences from Queen Mary, University of London. He's also registered as a foundational chartered manager with the Chartered Management Institute (CMI), having qualified as a Level 3 Team leader with distinction in 2023.




