(Press-News.org)
A new expert consensus, made available online on 10 October 2025 and published in Volume 5, Issue 4 of the journal Intelligent Medicine on 1 November 2025, sets out a structured framework for assessing large language models (LLMs) before they are introduced into clinical workflows. The guidance responds to the rapid uptake of artificial intelligence (AI) tools for diagnostic support, medical documentation, and patient communication, and to the corresponding need for consistent evaluation of safety, effectiveness, and fairness.
The consensus formalizes retrospective evaluation—testing fully trained models on real or simulated clinical data in specific care contexts, without further modifying the models—to verify performance, ethical compliance, and operational readiness prior to deployment.
Developed in line with World Health Organization guideline methods and registered on the Practice Guideline Registration for Transparency (PREPARE) platform (ID: PREPARE-2025CN503), the consensus draws on literature review, Delphi procedures, and multidisciplinary expert deliberation. In the final round, 35 experts achieved agreement on six recommendations.
What does the framework include?
Evaluation workflows prioritizing scientific rigor, objectivity, comprehensiveness, and ethics (e.g., double-blind procedures, conflict-of-interest transparency).
Integrated metrics combining quantitative measures (accuracy, recall, F1-score; BLEU/ROUGE for generation) with structured qualitative ratings (e.g., mean opinion scores for accuracy, completeness, safety, practicality, professionalism).
Multidisciplinary teams spanning clinicians, data and computer engineers, ethicists, legal experts, and statisticians, with standardized training and role definitions.
Dataset design principles centered on clinical authenticity, broad representativeness across diseases, populations, and institutions, and fairness for vulnerable groups, with modular versioning and privacy/compliance safeguards.
Feedback and versioning mechanisms to update standards as technology, regulations, or application scope evolve, including transparent dispute-resolution processes.
Standardized reporting templates to improve transparency, reproducibility, and comparability across evaluations.
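As a rough illustration of how the quantitative and qualitative measures named above might be computed side by side, the following minimal Python sketch derives accuracy, recall, and F1-score from binary labels and averages rater scores per quality dimension. It is not taken from the consensus: the function names, the binary-label setup, and the 1-to-5 rating scale are assumptions made for illustration only.

```python
from statistics import mean


def binary_metrics(y_true, y_pred):
    """Return accuracy, recall, and F1 for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "recall": recall, "f1": f1}


def mean_opinion_scores(ratings):
    """Average rater scores per dimension.

    `ratings` maps a dimension name (e.g. "safety") to a list of scores;
    a 1-to-5 scale is assumed here, not specified by the source.
    """
    return {dimension: mean(scores) for dimension, scores in ratings.items()}


if __name__ == "__main__":
    # Hypothetical gold labels and model outputs for five cases.
    print(binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
    # Hypothetical ratings from three reviewers.
    print(mean_opinion_scores({"safety": [4, 5, 3], "completeness": [3, 4, 4]}))
```

In practice, an evaluation team would likely report such numbers per capability domain and per dataset version, so that results remain comparable across evaluations.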
The consensus also defines six key LLM capability domains for assessment: medical knowledge question and answer; complex medical language understanding; diagnosis and treatment recommendation; medical documentation generation; multi-turn dialogue; and multimodal dialogue.
By emphasizing essential safeguards for patient data protection, bias mitigation, and clinically explainable AI outputs, the consensus is positioned to support safer, more reliable, and ethically governed LLM applications within healthcare systems globally.
***
Reference
DOI: 10.1016/j.imed.2025.09.001
About the journal
Intelligent Medicine is a peer-reviewed, open-access journal focusing on the integration of artificial intelligence, data science, and digital technology in clinical medicine and public health. It is published by the Chinese Medical Association in partnership with Elsevier. To learn more about Intelligent Medicine, please visit https://www.sciencedirect.com/journal/intelligent-medicine
Funding information
The authors received no financial support for this research.
END