PRESS-NEWS.org - Press Release Distribution

Expert consensus outlines a standardized framework to evaluate clinical large language models

Consensus proposes retrospective evaluation workflows, metrics, multidisciplinary teams, dataset design principles, feedback mechanisms, and reporting standards for safe deployment of models in healthcare

2026-01-23
(Press-News.org)

A new expert consensus, made available online on 10 October 2025 and published in Volume 5, Issue 4 of the journal Intelligent Medicine on 1 November 2025, sets out a structured framework for assessing large language models (LLMs) before they are introduced into clinical workflows. The guidance responds to the rapid uptake of artificial intelligence (AI) tools for diagnostic support, medical documentation, and patient communication, and to the corresponding need for consistent evaluation of safety, effectiveness, and fairness.

The consensus formalizes retrospective evaluation—testing fully trained models on real or simulated clinical data in specific care contexts, without further modifying the models—to verify performance, ethical compliance, and operational readiness prior to deployment.

Developed in line with World Health Organization guideline methods and registered on the Practice Guideline Registration for Transparency (PREPARE) platform (ID: PREPARE-2025CN503), the consensus draws on literature review, Delphi procedures, and multidisciplinary expert deliberation. In the final round, 35 experts achieved agreement on six recommendations.

What does the framework include?

Evaluation workflows prioritizing scientific rigor, objectivity, comprehensiveness, and ethics (e.g., double-blind procedures, conflict-of-interest transparency).

Integrated metrics combining quantitative measures (accuracy, recall, F1-score; BLEU/ROUGE for generation) with structured qualitative ratings (e.g., mean opinion scores for accuracy, completeness, safety, practicality, professionalism).

Multidisciplinary teams spanning clinicians, data and computer engineers, ethicists, legal experts, and statisticians, with standardized training and role definitions.

Dataset design principles centered on clinical authenticity, broad representativeness across diseases, populations, and institutions, and fairness for vulnerable groups, with modular versioning and privacy/compliance safeguards.

Feedback and versioning mechanisms to update standards as technology, regulations, or application scope evolve, including transparent dispute-resolution processes.

Standardized reporting templates to improve transparency, reproducibility, and comparability across evaluations.
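As an illustration of the integrated metrics recommendation, the following is a minimal sketch of how quantitative scores (accuracy, recall, F1-score) and mean opinion scores might be computed together in a retrospective evaluation. It is not taken from the consensus: the function names, the binary labeling, and the 1-to-5 rating scale are assumptions made for this example.

```python
from statistics import mean


def binary_metrics(y_true, y_pred):
    """Accuracy, recall, and F1 for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "recall": recall, "f1": f1}


def mean_opinion_scores(ratings):
    """Average each rated dimension across raters.

    `ratings` is a list with one dict per rater, mapping a dimension name
    (e.g. "accuracy", "safety") to a score. The 1-5 scale is an assumption.
    """
    dimensions = ratings[0].keys()
    return {d: mean(r[d] for r in ratings) for d in dimensions}


# Hypothetical data: reference labels vs. model outputs, plus two raters.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
ratings = [
    {"accuracy": 4, "completeness": 3, "safety": 5},
    {"accuracy": 5, "completeness": 4, "safety": 3},
]

quantitative = binary_metrics(y_true, y_pred)
qualitative = mean_opinion_scores(ratings)
print(quantitative)
print(qualitative)
```

Reporting the two groups of scores side by side, rather than merging them into a single number, keeps the qualitative dimensions such as safety visible alongside the quantitative ones.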

The consensus also defines six key LLM capability domains for assessment: medical knowledge question and answer; complex medical language understanding; diagnosis and treatment recommendation; medical documentation generation; multi-turn dialogue; and multimodal dialogue.

By emphasizing safeguards for patient data protection, bias mitigation, and clinically explainable AI outputs, the consensus is positioned to support safer, more reliable, and ethically governed LLM applications in healthcare systems worldwide.

Reference
DOI: 10.1016/j.imed.2025.09.001

About the journal
Intelligent Medicine is a peer-reviewed, open-access journal focusing on the integration of artificial intelligence, data science, and digital technology in clinical medicine and public health. It is published by the Chinese Medical Association in partnership with Elsevier. To learn more about Intelligent Medicine, please visit https://www.sciencedirect.com/journal/intelligent-medicine


Funding information
The authors received no financial support for this research.
