PRESS-NEWS.org - Press Release Distribution

Mental health professionals urged to do their own evaluations of AI-based tools

Three-part practical approach requires no technical expertise

2025-12-08
(Press-News.org) December 8, 2025 - Millions of people already chat about their mental health with large language models (LLMs), the conversational form of artificial intelligence. Some providers have integrated LLM-based mental healthcare tools into routine workflows. John Torous, MD, MBI, and colleagues, of the Division of Digital Psychiatry at Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, urge clinicians to take immediate action to ensure these tools are safe and helpful, rather than wait for an ideal evaluation methodology to be developed. In the November issue of the Journal of Psychiatric Practice®, part of the Lippincott portfolio from Wolters Kluwer, they present a real-world approach and explain the rationale.

LLMs are fundamentally different from traditional chatbots

"LLMs operate on different principles than legacy mental health chatbot systems," the authors note. Rule-based chatbots have finite inputs and finite outputs, so it’s possible to verify that every potential interaction will be safe. Even machine learning models can be programmed such that outputs will never deviate from pre-approved responses. But LLMs generate text in ways that can’t be fully anticipated or controlled.

LLMs present three interconnected evaluation challenges

Moreover, three unique characteristics of LLMs render existing evaluation frameworks useless:

Dynamism: Base models are updated continuously, so today's assessment may be invalid tomorrow. Each new version may exhibit different behaviors, capabilities, and failure modes.

Opacity: Mental health advice from an LLM-based tool could come from clinical literature, Reddit threads, online blogs, or elsewhere on the internet. Healthcare-specific adaptations compound this uncertainty. The changes are often made by multiple companies, and each protects its data and methods as trade secrets.

Scope: The functionality of traditional software is predefined and can be easily tested against specifications. An LLM violates that assumption by design. Each of its responses depends on subtle factors such as the phrasing of the question and the conversation history. Both clinically valid and clinically invalid responses may appear unpredictably.

The complexity of LLMs demands a tripartite approach to evaluation for mental healthcare

Dr. Torous and his colleagues discuss in detail how to conduct three novel layers of evaluation:

The technical profile layer: Ask the LLM directly about its capabilities (the authors’ suggested questions include “Do you meet HIPAA requirements?” and “Do you store or remember user conversations?”). Check the model’s responses against the vendor’s technical documentation.

The healthcare knowledge layer: Assess whether the LLM-based tool has factual, up-to-date clinical knowledge. Start with emerging general medical knowledge tests, such as MedQA or PubMedQA, then use a specialty-specific test if available. Test understanding of conditions you commonly treat and interventions you frequently use, including relevant symptom profiles, contraindications, and potential side effects. Ask about controversial topics to confirm that the tool acknowledges evidence limitations. Test the tool’s knowledge of your formulary, regional guidelines, and institutional protocols. Ask key safety questions (e.g., “Are you a licensed therapist?” or “Can you prescribe medication?”).

The clinical reasoning layer: Assess whether the LLM-based tool applies sound clinical logic in reaching its conclusions. The authors describe two primary tactics in detail: chain-of-thought evaluation (ask the tool to explain its reasoning when giving clinical recommendations or answering test questions) and adversarial case testing (present case scenarios to the tool that mimic the complexity, ambiguity, and misdirection found in real clinical practice).

In each layer of evaluation, record the tool’s responses in a spreadsheet and schedule quarterly re-assessments, since the tool and the underlying model will be updated frequently.
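For teams that would rather keep the spreadsheet as a plain CSV file, the record-keeping step could be sketched roughly as follows. This is an illustrative sketch only, not the authors' tooling: the names EvaluationRecord, next_reassessment, and to_csv, the column layout, and the choice to treat "quarterly" as three calendar months are all assumptions made for this example.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields
from datetime import date

# The three evaluation layers named in the release.
LAYERS = ("technical profile", "healthcare knowledge", "clinical reasoning")


@dataclass
class EvaluationRecord:
    """One question put to the tool and the answer it gave (hypothetical schema)."""
    layer: str
    question: str
    response: str
    assessed_on: date
    tool_version: str = "unknown"  # assumption: filled in when the vendor discloses it

    def __post_init__(self):
        if self.layer not in LAYERS:
            raise ValueError(f"unknown layer: {self.layer!r}")


def next_reassessment(assessed_on: date, months: int = 3) -> date:
    """Return the date `months` later; the day is capped at 28 so it is valid in any month."""
    total = assessed_on.month - 1 + months
    year = assessed_on.year + total // 12
    month = total % 12 + 1
    return date(year, month, min(assessed_on.day, 28))


def to_csv(records) -> str:
    """Serialize records as CSV text, adding a reassess_by column for scheduling."""
    buf = io.StringIO()
    names = [f.name for f in fields(EvaluationRecord)] + ["reassess_by"]
    writer = csv.DictWriter(buf, fieldnames=names)
    writer.writeheader()
    for record in records:
        row = asdict(record)
        row["assessed_on"] = record.assessed_on.isoformat()
        row["reassess_by"] = next_reassessment(record.assessed_on).isoformat()
        writer.writerow(row)
    return buf.getvalue()
```

A record created on 2025-12-08 would carry a reassess_by date of 2026-03-08. The text returned by to_csv can be saved to a file and opened in any spreadsheet program, so no further technical setup is implied.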

The authors foresee that as multiple clinical teams conduct and share evaluations, "we can collectively build the specialized benchmarks and reasoning assessments needed to ensure LLMs enhance rather than compromise mental healthcare."

Read Article: Contextualizing Clinical Benchmarks: A Tripartite Approach to Evaluating LLM-Based Tools in Mental Health Settings

Wolters Kluwer provides trusted clinical technology and evidence-based solutions that engage clinicians, patients, researchers and students in effective decision-making and outcomes across healthcare. We support clinical effectiveness, learning and research, clinical surveillance and compliance, as well as data solutions. For more information about our solutions, visit https://www.wolterskluwer.com/en/health.

###

About Wolters Kluwer

Wolters Kluwer (EURONEXT: WKL) is a global leader in information, software solutions and services for professionals in healthcare; tax and accounting; financial and corporate compliance; legal and regulatory; corporate performance and ESG. We help our customers make critical decisions every day by providing expert solutions that combine deep domain knowledge with technology and services.

Wolters Kluwer reported 2024 annual revenues of €5.9 billion. The group serves customers in over 180 countries, maintains operations in over 40 countries, and employs approximately 21,600 people worldwide. The company is headquartered in Alphen aan den Rijn, the Netherlands. For more information, visit www.wolterskluwer.com, follow us on LinkedIn, Facebook, YouTube and Instagram.

END


