PRESS-NEWS.org - Press Release Distribution

Don’t Panic: ‘Humanity’s Last Exam’ has begun

2026-02-25
(Press-News.org) When artificial intelligence systems began acing long‑standing academic assessments, researchers realized they had a problem: the tests were too easy. Popular evaluations such as the Massive Multitask Language Understanding (MMLU) benchmark, once considered formidable, are no longer challenging enough to meaningfully test advanced AI systems.

To address this gap, a global consortium of nearly 1,000 researchers, including a Texas A&M University professor, created something different: an exam so broad, so challenging and so deeply rooted in expert human knowledge that current AI systems consistently fail it.

“Humanity’s Last Exam” (HLE) introduces a 2,500‑question assessment spanning mathematics, humanities, natural sciences, ancient languages and highly specialized subfields. The team’s work is outlined in a paper published in Nature, with documentation from the project available at lastexam.ai.

Among the long list of contributors is Dr. Tung Nguyen, instructional associate professor in the Department of Computer Science and Engineering at Texas A&M, who participated in authoring and refining questions.

“When AI systems start performing extremely well on human benchmarks, it’s tempting to think they’re approaching human‑level understanding,” Nguyen said. “But HLE reminds us that intelligence isn’t just about pattern recognition — it’s about depth, context and specialized expertise.”

The point wasn’t to stump humans. It was to reveal, precisely and systematically, what AI cannot do, at least not yet.

A global effort to measure AI’s limits

Questions for HLE were written and reviewed by experts from around the world, who ensured that each question had a single, unambiguous, verifiable answer and could not be solved instantly through internet retrieval. The prompts draw on expert-level academic problems, ranging from translating ancient Palmyrene inscriptions to identifying microanatomical structures in birds and analyzing the intricate features of Biblical Hebrew pronunciation.

Each question was tested against leading AI models. If any system could answer it correctly, the question was removed. The result is an exam deliberately engineered to sit just beyond current AI capability.
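The filtering rule described above can be illustrated with a short Python sketch. It is a simplified illustration, not the HLE team’s actual pipeline: it assumes each candidate question carries one reference answer, that each AI model can be treated as a function returning an answer string, and that “correct” means an exact match after ignoring case and extra whitespace. The names filter_questions, normalize and the toy models are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical: a "model" is any function that maps a question prompt to an answer string.
Model = Callable[[str], str]


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings compare equal."""
    return " ".join(answer.strip().lower().split())


def filter_questions(questions: List[Dict[str, str]], models: List[Model]) -> List[Dict[str, str]]:
    """Keep only the questions that no model answers correctly (assumed exact-match grading)."""
    kept = []
    for q in questions:
        # A question is removed if ANY model's answer matches the reference answer.
        solved = any(normalize(model(q["prompt"])) == normalize(q["answer"]) for model in models)
        if not solved:
            kept.append(q)
    return kept


if __name__ == "__main__":
    # Toy data and toy "models", for illustration only.
    questions = [
        {"prompt": "What is 2 + 2?", "answer": "4"},
        {"prompt": "A hypothetical expert-level question", "answer": "expert answer"},
    ]
    models = [lambda p: "4", lambda p: "I don't know"]
    survivors = filter_questions(questions, models)
    print([q["prompt"] for q in survivors])  # ['A hypothetical expert-level question']
```

In this toy run, the arithmetic question is discarded because one model answers it correctly; only the question that every model misses survives. Real grading of free-form expert answers is harder than string matching, which is one reason this remains only a sketch.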

And it worked. Early results showed that even the strongest models of the time struggled: GPT‑4o scored 2.7%, Claude 3.5 Sonnet reached 4.1% and OpenAI’s flagship o1 model achieved only 8%. More recent models, including Gemini 3.1 Pro and Claude Opus 4.6, have since reached roughly 40% to 50% accuracy.

Why a new benchmark matters

The problem with AI outgrowing traditional benchmarks isn’t simply academic, said Nguyen, who contributed 73 of the 2,500 public questions (the second-highest total of any author) and authored the most questions in math and computer science.

“Without accurate assessment tools, policymakers, developers and users risk misinterpreting what AI systems can actually do,” he said. “Benchmarks provide the foundation for measuring progress and identifying risks.”

As the team’s paper notes, while AI may excel on exams designed for humans, those tests aren’t necessarily measuring “intelligence.” They measure performance on a set of tasks crafted for a very different kind of learner.

Not a threat, a tool

Despite its apocalyptic name, Humanity’s Last Exam isn’t meant to suggest the end of human relevance. Instead, it highlights how much knowledge remains uniquely human and how far AI systems still have to go.

“This isn’t a race against AI,” Nguyen said. “It’s a method for understanding where these systems are strong and where they struggle. That understanding helps us build safer, more reliable technologies. And, importantly, it reminds us why human expertise still matters.”

A future-proof exam

HLE is intended to serve as a long‑term, transparent benchmark for evaluating advanced AI systems. As part of that mission, the team has made some of the exam publicly available, while keeping most of the test questions hidden so AI models can’t memorize the answers.

“For now, Humanity’s Last Exam stands as one of the clearest assessments of the gap between AI and human intelligence,” Nguyen said, “and despite rapid technological advances, it remains wide.”

Research on a grand scale

Nguyen noted the massive project reflects the importance of interdisciplinary, international research efforts.

“What made this project extraordinary was the scale,” he said. “Experts from nearly every discipline contributed. It wasn’t just computer scientists; it was historians, physicists, linguists, medical researchers. That diversity is exactly what exposes the gaps in today’s AI systems: perhaps ironically, it’s humans working together.”

###
