PRESS-NEWS.org - Press Release Distribution

Expert consensus outlines a standardized framework to evaluate clinical large language models

Consensus proposes retrospective evaluation workflows, metrics, multidisciplinary teams, dataset design principles, feedback mechanisms, and reporting standards for the safe deployment of LLMs in healthcare

2026-01-23
(Press-News.org)

A new expert consensus, made available online on 10 October 2025 and published in Volume 5, Issue 4 of the journal Intelligent Medicine on 1 November 2025, sets out a structured framework to assess large language models (LLMs) before they are introduced into clinical workflows. The guidance responds to the rapid uptake of artificial intelligence (AI) tools for diagnostic support, medical documentation, and patient communication, and to the corresponding need for consistent evaluation of safety, effectiveness, and fairness.

The consensus formalizes retrospective evaluation—testing fully trained models on real or simulated clinical data in specific care contexts, without further modifying the models—to verify performance, ethical compliance, and operational readiness prior to deployment.

Developed in line with World Health Organization guideline methods and registered on the Practice Guideline Registration for Transparency (PREPARE) platform (ID: PREPARE-2025CN503), the consensus draws on literature review, Delphi procedures, and multidisciplinary expert deliberation. In the final round, 35 experts achieved agreement on six recommendations.

What does the framework include?

Evaluation workflows prioritizing scientific rigor, objectivity, comprehensiveness, and ethics (e.g., double-blind procedures, conflict-of-interest transparency).

Integrated metrics combining quantitative measures (accuracy, recall, F1-score; BLEU/ROUGE for generation) with structured qualitative ratings (e.g., mean opinion scores for accuracy, completeness, safety, practicality, professionalism).

Multidisciplinary teams spanning clinicians, data and computer engineers, ethicists, legal experts, and statisticians, with standardized training and role definitions.

Dataset design principles centered on clinical authenticity, broad representativeness across diseases, populations, and institutions, and fairness for vulnerable groups, with modular versioning and privacy/compliance safeguards.

Feedback and versioning mechanisms to update standards as technology, regulations, or application scope evolve, including transparent dispute-resolution processes.

Standardized reporting templates to improve transparency, reproducibility, and comparability across evaluations.
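As an illustration of how quantitative measures and structured qualitative ratings might be reported together, the following is a minimal sketch in Python. It is not taken from the consensus: the function names, the binary-label setup, and the 1-to-5 rating scale are assumptions made for this example only.

```python
from statistics import mean


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class.

    Hypothetical helper: assumes reference labels (y_true) and model
    outputs (y_pred) are equal-length sequences of comparable labels.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def mean_opinion_scores(ratings):
    """Average each rating dimension across raters.

    Assumes every rater supplies a dict with the same dimensions,
    scored on an illustrative 1-to-5 scale.
    """
    dimensions = ratings[0].keys()
    return {d: mean(r[d] for r in ratings) for d in dimensions}


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    quantitative = classification_metrics(
        y_true=[1, 0, 1, 1, 0, 1],
        y_pred=[1, 0, 0, 1, 1, 1],
    )
    qualitative = mean_opinion_scores([
        {"accuracy": 4, "completeness": 5, "safety": 5,
         "practicality": 3, "professionalism": 4},
        {"accuracy": 5, "completeness": 4, "safety": 5,
         "practicality": 4, "professionalism": 4},
    ])
    print(quantitative)
    print(qualitative)
```

Keeping the two kinds of scores in separate dictionaries reflects one possible design choice: automated metrics and expert ratings are reported side by side without being collapsed into a single number.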

The consensus also defines six key LLM capability domains for assessment: medical knowledge question and answer; complex medical language understanding; diagnosis and treatment recommendation; medical documentation generation; multi-turn dialogue; and multimodal dialogue.

By emphasizing essential safeguards for patient data protection, bias mitigation, and the need for AI outputs to remain clinically explainable, the consensus is positioned to support the advancement of safer, more reliable, and ethically governed LLM applications within healthcare systems globally.

 

***

 

Reference
DOI: 10.1016/j.imed.2025.09.001

 

About the journal
Intelligent Medicine is a peer-reviewed, open-access journal focusing on the integration of artificial intelligence, data science, and digital technology in clinical medicine and public health. It is published by the Chinese Medical Association in partnership with Elsevier. To learn more about Intelligent Medicine, please visit https://www.sciencedirect.com/journal/intelligent-medicine


Funding information
The authors received no financial support for this research.

END



