PRESS-NEWS.org - Press Release Distribution

Expert consensus outlines a standardized framework to evaluate clinical large language models

Consensus proposes retrospective workflows, metrics, multidisciplinary teams, design principles, feedback, and reporting standards for safe deployment of models in healthcare

2026-01-23
(Press-News.org)

A new expert consensus, made available online on 10 October 2025 and published in Volume 5, Issue 4 of the journal Intelligent Medicine on 1 November 2025, sets out a structured framework for assessing large language models (LLMs) before they are introduced into clinical workflows. The guidance responds to the rapid uptake of artificial intelligence (AI) tools for diagnostic support, medical documentation, and patient communication, and to the corresponding need for consistent evaluation of safety, effectiveness, and fairness.

The consensus formalizes retrospective evaluation—testing fully trained models on real or simulated clinical data in specific care contexts, without further modifying the models—to verify performance, ethical compliance, and operational readiness prior to deployment.

Developed in line with World Health Organization guideline methods and registered on the Practice Guideline Registration for Transparency (PREPARE) platform (ID: PREPARE-2025CN503), the consensus draws on literature review, Delphi procedures, and multidisciplinary expert deliberation. In the final round, 35 experts achieved agreement on six recommendations.

What does the framework include?

Evaluation workflows prioritizing scientific rigor, objectivity, comprehensiveness, and ethics (e.g., double-blind procedures, conflict-of-interest transparency).

Integrated metrics combining quantitative measures (accuracy, recall, F1-score; BLEU/ROUGE for generation) with structured qualitative ratings (e.g., mean opinion scores for accuracy, completeness, safety, practicality, professionalism).

Multidisciplinary teams spanning clinicians, data and computer engineers, ethicists, legal experts, and statisticians, with standardized training and role definitions.

Dataset design principles centered on clinical authenticity, broad representativeness across diseases, populations, and institutions, and fairness for vulnerable groups, with modular versioning and privacy/compliance safeguards.

Feedback and versioning mechanisms to update standards as technology, regulations, or application scope evolve, including transparent dispute-resolution processes.

Standardized reporting templates to improve transparency, reproducibility, and comparability across evaluations.
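As an illustration of how the quantitative and qualitative metrics named above might sit side by side in one evaluation report, here is a minimal Python sketch. It is not taken from the consensus: the function names, the binary-label setup, and the 1-to-5 rating scale are assumptions made for the example, and BLEU/ROUGE are left out because they usually rely on dedicated libraries.

```python
from statistics import mean


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, recall and F1 for binary labels.

    Hypothetical helper: the consensus names these metrics but does not
    prescribe an implementation.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard every ratio against an empty denominator.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "recall": recall, "f1": f1}


def mean_opinion_scores(ratings):
    """Average reviewer ratings per dimension.

    `ratings` is a list with one dict per reviewer, mapping a dimension
    name to a score. The 1-to-5 scale used below is an assumption.
    """
    dimensions = ratings[0].keys()
    return {d: mean(r[d] for r in ratings) for d in dimensions}


# Made-up example data: model outputs checked against reference labels.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
quantitative = classification_metrics(y_true, y_pred)

# Made-up example data: three reviewers rating one model response.
reviewer_ratings = [
    {"accuracy": 4, "completeness": 5, "safety": 5,
     "practicality": 4, "professionalism": 4},
    {"accuracy": 5, "completeness": 4, "safety": 5,
     "practicality": 3, "professionalism": 4},
    {"accuracy": 3, "completeness": 3, "safety": 5,
     "practicality": 5, "professionalism": 4},
]
qualitative = mean_opinion_scores(reviewer_ratings)

report = {"quantitative": quantitative, "qualitative": qualitative}
print(report)
```

Keeping the two kinds of result in separate sections of one report is only one possible design. It keeps the automated scores and the expert ratings traceable to their own sources instead of folding them into a single number.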

The consensus also defines six key LLM capability domains for assessment: medical knowledge question and answer; complex medical language understanding; diagnosis and treatment recommendation; medical documentation generation; multi-turn dialogue; and multimodal dialogue.

By emphasizing safeguards for patient data protection, bias mitigation, and clinically explainable AI outputs, the consensus is positioned to support safer, more reliable, and ethically governed LLM applications in healthcare systems globally.

 

***

 

Reference
DOI: 10.1016/j.imed.2025.09.001

 

About the journal
Intelligent Medicine is a peer-reviewed, open-access journal focusing on the integration of artificial intelligence, data science, and digital technology in clinical medicine and public health. It is published by the Chinese Medical Association in partnership with Elsevier. To learn more about Intelligent Medicine, please visit https://www.sciencedirect.com/journal/intelligent-medicine


Funding information
The authors received no financial support for this research.

END




