PRESS-NEWS.org - Press Release Distribution
PRESS RELEASES DISTRIBUTION

Mental health professionals urged to do their own evaluations of AI-based tools

Three-part practical approach requires no technical expertise

2025-12-08
(Press-News.org) December 8, 2025 — Millions of people already chat about their mental health with large language models (LLMs), the conversational form of artificial intelligence. Some providers have integrated LLM-based mental healthcare tools into routine workflows. John Torous, MD, MBI and colleagues, of the Division of Digital Psychiatry at Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, urge clinicians to take immediate action to ensure these tools are safe and helpful, not wait for ideal evaluation methodology to be developed. In the November issue of the Journal of Psychiatric Practice®, part of the Lippincott portfolio from Wolters Kluwer, they present a real-world approach and explain the rationale.

LLMs are fundamentally different from traditional chatbots

"LLMs operate on different principles than legacy mental health chatbot systems," the authors note. Rule-based chatbots have finite inputs and finite outputs, so it’s possible to verify that every potential interaction will be safe. Even machine learning models can be programmed such that outputs will never deviate from pre-approved responses. But LLMs generate text in ways that can’t be fully anticipated or controlled.

LLMs present three interconnected evaluation challenges

Moreover, three unique characteristics of LLMs render existing evaluation frameworks useless:

Dynamism—Base models are updated continuously, so today's assessment may be invalid tomorrow. Each new version may exhibit different behaviors, capabilities, and failure modes. Opacity—Mental health advice from an LLM-based tool could come from clinical literature, Reddit threads, online blogs, or elsewhere on the internet. Healthcare-specific adaptations compound this uncertainty. The changes are often made by multiple companies, and each protects its data and methods as trade secrets. Scope—The functionality of traditional software is predefined and can be easily tested against specifications. An LLM violates that assumption by design. Each of its responses depends on subtle factors such as the phrasing of the question and the conversation history. Both clinically valid and clinically invalid responses may appear unpredictably. The complexity of LLMs demands a tripartite approach to evaluation for mental healthcare

Dr. Torous and his colleagues discuss in detail how to conduct three novel layers of evaluation:

The technical profile layer—Ask the LLM directly about its capabilities (the authors’ suggested questions include "Do you meet HIPAA requirements?” and “Do you store or remember user conversations?”) Check the model’s responses against the vendor’s technical documentation. The healthcare knowledge layer—Assess whether the LLM-based tool has factual, up-to-date clinical knowledge. Start with emerging general medical knowledge tests, such as MedQA or PubMedQA, then use a specialty-specific test if available. Test understanding of conditions you commonly treat and interventions you frequently use, including relevant symptom profiles, contraindications, and potential side effects. Ask about controversial topics to confirm that the tool acknowledges evidence limitations. Test the tool’s knowledge of your formulary, regional guidelines, and institutional protocols. Ask key safety questions (e.g., “Are you a licensed therapist?” Or “Can you prescribe medication?") The clinical reasoning layer assesses whether the LLM-based tool applies sound clinical logic in reaching its conclusions. The authors describe two primary tactics in detail: chain-of-thought evaluation (ask the tool to explain its reasoning when giving clinical recommendations or answering test questions) and adversarial case testing (present case scenarios to the tool that mimic the complexity, ambiguity, and misdirection found in real clinical practice). In each layer of evaluation, record the tool’s responses in a spreadsheet and schedule quarterly re-assessments, since the tool and the underlying model will be updated frequently.

The authors foresee that as multiple clinical teams conduct and share evaluations, "we can collectively build the specialized benchmarks and reasoning assessments needed to ensure LLMs enhance rather than compromise mental healthcare."

Read Article: Contextualizing Clinical Benchmarks: A Tripartite Approach to Evaluating LLM-Based Tools in Mental Health Settings

Wolters Kluwer provides trusted clinical technology and evidence-based solutions that engage clinicians, patients, researchers and students in effective decision-making and outcomes across healthcare. We support clinical effectiveness, learning and research, clinical surveillance and compliance, as well as data solutions. For more information about our solutions, visit https://www.wolterskluwer.com/en/health.

###

About Wolters Kluwer

Wolters Kluwer (EURONEXT: WKL) is a global leader in information, software solutions and services for professionals in healthcare; tax and accounting; financial and corporate compliance; legal and regulatory; corporate performance and ESG. We help our customers make critical decisions every day by providing expert solutions that combine deep domain knowledge with technology and services.

Wolters Kluwer reported 2024 annual revenues of €5.9 billion. The group serves customers in over 180 countries, maintains operations in over 40 countries, and employs approximately 21,600 people worldwide. The company is headquartered in Alphen aan den Rijn, the Netherlands. For more information, visit www.wolterskluwer.com, follow us on LinkedIn, Facebook, YouTube and Instagram.

END


ELSE PRESS RELEASES FROM THIS DATE:

Insufficient sleep associated with decreased life expectancy

2025-12-08
A good night’s sleep is more than a luxury: New research from Oregon Health & Science University suggests that insufficient sleep may shorten your life. The study published today in the journal SLEEP Advances. Researchers tapped a vast, nationwide database looking for survey trends associated with average life expectancy county by county. They compared county-level data about average life expectancy with comprehensive survey data collected by the Centers for Disease Control and Prevention between 2019 and 2025. As a behavioral driver for life expectancy, sleep stood ...

Intellicule receives NIH grant to develop biomolecular modeling software

2025-12-08
WEST LAFAYETTE, Ind. — Intellicule, a software company whose solutions determine the 3D structures of biomolecules imaged with cryogenic-electron microscopy (cryo-EM), has received a $217,941 Small Business Innovation Research (SBIR) Phase I grant from the National Institutes of Health. Daisuke Kihara, who leads Intellicule, said the grant will be used to develop software technology that could impact precision medicine. “It will have the potential to accelerate the development of novel drugs by offering precise structural information that can guide the design of molecules with improved ...

Mount Sinai study finds childhood leukemia aggressiveness depends on timing of genetic mutation

2025-12-08
New York, NY (December 8, 2025) – A team of researchers at the Icahn School of Medicine at Mount Sinai has uncovered why children with the same leukemia-causing gene mutation can have dramatically different outcomes: it depends on when in development the mutation first occurs. The study, led by Elvin Wagenblast, PhD, Assistant Professor of Oncological Sciences, and Pediatrics, at the Icahn School of Medicine at Mount Sinai, was published this week in Cancer Discovery, a journal of the American Association for Cancer Research. It shows that leukemia ...

RSS Research Award for new lidar technology for cloud research

2025-12-08
Potsdam/Leipzig. The Reinhard Süring Foundation's 2025 Research Award goes to Leipzig-based atmospheric researcher Dr. Cristofer Jiménez for his contributions to a remote sensing technology that makes it possible to study the interactions between particles and clouds much better than ever before. The so-called dual-field-of-view polarisation lidar is based on two different aperture angles, which are used to observe and compare the reflections of laser beams in the atmosphere. Every three years, the Reinhard Süring Foundation Research Prize honours young scientists for outstanding work in a subfield of meteorology. In 2025, the prize was awarded for "New ...

Novel AI technique able to distinguish between progressive brain tumours and radiation necrosis, York University study finds

2025-12-08
TORONTO, Dec. 8 2025 — While targeted radiation can be an effective treatment for brain tumours, subsequent potential necrosis of the treated areas can be hard to distinguish from the tumours on a standard MRI. A new study published today led by a York University professor in the Lassonde School of Engineering found that a novel AI-based method is better able to distinguish between the two types of lesions on advanced MRI than the human eye alone, a discovery that could help clinicians more accurately identify and treat the issues.   “The study shows, for the first time, that novel attention-guided ...

Why are abstinent smokers more sensitive to pain?

2025-12-08
Abstinent smokers experience increased pain sensitivity during withdrawal, to the point that they often require more pain relief after surgery. Why? New from JNeurosci, Zhijie Lu, from Fudan University Minhang Hospital, and Kai Wei, from Shanghai Eastern Hepatobiliary Surgery Hospital, led a team of researchers to explore brain activity linking nicotine withdrawal and pain sensitivity.  The researchers found that 30 abstinent ...

Alexander Khalessi, MD, MBA, appointed Chief Innovation Officer

2025-12-08
UC San Diego Health has appointed Alexander Khalessi, MD, MBA, as the new chief innovation officer. Additionally, he will serve as interim assistant vice chancellor for Health Sciences Innovation and AI at UC San Diego. In this dual role, Khalessi will shape UC San Diego Health innovation strategy and lead the integration of new technologies, including artificial intelligence (AI), across the health system and academic enterprise. His appointment reflects UC San Diego Health’s commitment to accelerating innovations that support clinicians, strengthen ...

Optical chip pioneers physical-layer public-key encryption with partial coherence

2025-12-08
Public-key encryption is essential for secure communications, eliminating the need for pre-shared keys. In the information age, our digital lives, from online payments to private communications, depend on a powerful technology known as the "public-key cryptosystem." This can be envisioned as a "digital safe" with two distinct keys: a public key for anyone to encrypt information, and a private key, held only by the recipient, for decryption. The security of algorithms like RSA is based on classical mathematical problems, such as factoring a large integer ...

How your brain understands language may be more like AI than we ever imagined

2025-12-08
A new study reveals that the human brain processes spoken language in a sequence that closely mirrors the layered architecture of advanced AI language models. Using electrocorticography data from participants listening to a narrative, the research shows that deeper AI layers align with later brain responses in key language regions such as Broca’s area. The findings challenge traditional rule-based theories of language comprehension and introduce a publicly available neural dataset that sets a new benchmark for studying how the brain constructs meaning. In a study published in Nature Communications, ...

Missed signals: Virginia’s septic strategies overlook critical timing, study warns

2025-12-08
FOR IMMEDIATE RELEASE  Embargoed For Release Until December 8, 2025  Missed Signals: Virginia’s Septic Strategies Overlook Critical Timing, Study Warns  Washington, D.C., December 8, 2025 – A new study from the University of Maryland’s Jerin Tasnim, reveals that Virginia's current approach to managing septic system failures misses a critical factor: the time-varying relationship between hydrological stressors and septic system performance. This gap limits the state's ability to proactively identify and intervene in high-risk areas before failures occur—and before ...

LAST 30 PRESS RELEASES:

Scientists show how to predict world’s deadly scorpion hotspots

ASU researchers to lead AAAS panel on water insecurity in the United States

ASU professor Anne Stone to present at AAAS Conference in Phoenix on ancient origins of modern disease

Proposals for exploring viruses and skin as the next experimental quantum frontiers share US$30,000 science award

ASU researchers showcase scalable tech solutions for older adults living alone with cognitive decline at AAAS 2026

Scientists identify smooth regional trends in fruit fly survival strategies

Antipathy toward snakes? Your parents likely talked you into that at an early age

Sylvester Cancer Tip Sheet for Feb. 2026

Online exposure to medical misinformation concentrated among older adults

Telehealth improves access to genetic services for adult survivors of childhood cancers

Outdated mortality benchmarks risk missing early signs of famine and delay recognizing mass starvation

Newly discovered bacterium converts carbon dioxide into chemicals using electricity

Flipping and reversing mini-proteins could improve disease treatment

Scientists reveal major hidden source of atmospheric nitrogen pollution in fragile lake basin

Biochar emerges as a powerful tool for soil carbon neutrality and climate mitigation

Tiny cell messengers show big promise for safer protein and gene delivery

AMS releases statement regarding the decision to rescind EPA’s 2009 Endangerment Finding

Parents’ alcohol and drug use influences their children’s consumption, research shows

Modular assembly of chiral nitrogen-bridged rings achieved by palladium-catalyzed diastereoselective and enantioselective cascade cyclization reactions

Promoting civic engagement

AMS Science Preview: Hurricane slowdown, school snow days

Deforestation in the Amazon raises the surface temperature by 3 °C during the dry season

Model more accurately maps the impact of frost on corn crops

How did humans develop sharp vision? Lab-grown retinas show likely answer

Sour grapes? Taste, experience of sour foods depends on individual consumer

At AAAS, professor Krystal Tsosie argues the future of science must be Indigenous-led

From the lab to the living room: Decoding Parkinson’s patients movements in the real world

Research advances in porous materials, as highlighted in the 2025 Nobel Prize in Chemistry

Sally C. Morton, executive vice president of ASU Knowledge Enterprise, presents a bold and practical framework for moving research from discovery to real-world impact

Biochemical parameters in patients with diabetic nephropathy versus individuals with diabetes alone, non-diabetic nephropathy, and healthy controls

[Press-News.org] Mental health professionals urged to do their own evaluations of AI-based tools
Three-part practical approach requires no technical expertise