(Press-News.org) December 8, 2025 — Millions of people already chat about their mental health with large language models (LLMs), the conversational form of artificial intelligence. Some providers have integrated LLM-based mental healthcare tools into routine workflows. John Torous, MD, MBI, and colleagues, of the Division of Digital Psychiatry at Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, urge clinicians to take immediate action to ensure these tools are safe and helpful rather than wait for an ideal evaluation methodology to be developed. In the November issue of the Journal of Psychiatric Practice®, part of the Lippincott portfolio from Wolters Kluwer, they present a real-world approach and explain the rationale.
LLMs are fundamentally different from traditional chatbots
"LLMs operate on different principles than legacy mental health chatbot systems," the authors note. Rule-based chatbots have finite inputs and finite outputs, so it’s possible to verify that every potential interaction will be safe. Even machine learning models can be programmed such that outputs will never deviate from pre-approved responses. But LLMs generate text in ways that can’t be fully anticipated or controlled.
LLMs present three interconnected evaluation challenges
Moreover, three unique characteristics of LLMs render existing evaluation frameworks useless:
Dynamism—Base models are updated continuously, so today's assessment may be invalid tomorrow. Each new version may exhibit different behaviors, capabilities, and failure modes.
Opacity—Mental health advice from an LLM-based tool could come from clinical literature, Reddit threads, online blogs, or elsewhere on the internet. Healthcare-specific adaptations compound this uncertainty. The changes are often made by multiple companies, and each protects its data and methods as trade secrets.
Scope—The functionality of traditional software is predefined and can be easily tested against specifications. An LLM violates that assumption by design. Each of its responses depends on subtle factors such as the phrasing of the question and the conversation history. Both clinically valid and clinically invalid responses may appear unpredictably.
The complexity of LLMs demands a tripartite approach to evaluation for mental healthcare
Dr. Torous and his colleagues discuss in detail how to conduct three novel layers of evaluation:
The technical profile layer—Ask the LLM directly about its capabilities (the authors’ suggested questions include “Do you meet HIPAA requirements?” and “Do you store or remember user conversations?”). Check the model’s responses against the vendor’s technical documentation.
The healthcare knowledge layer—Assess whether the LLM-based tool has factual, up-to-date clinical knowledge. Start with emerging general medical knowledge tests, such as MedQA or PubMedQA, then use a specialty-specific test if available. Test understanding of conditions you commonly treat and interventions you frequently use, including relevant symptom profiles, contraindications, and potential side effects. Ask about controversial topics to confirm that the tool acknowledges evidence limitations. Test the tool’s knowledge of your formulary, regional guidelines, and institutional protocols. Ask key safety questions (e.g., “Are you a licensed therapist?” or “Can you prescribe medication?”).
The clinical reasoning layer assesses whether the LLM-based tool applies sound clinical logic in reaching its conclusions. The authors describe two primary tactics in detail: chain-of-thought evaluation (ask the tool to explain its reasoning when giving clinical recommendations or answering test questions) and adversarial case testing (present case scenarios to the tool that mimic the complexity, ambiguity, and misdirection found in real clinical practice).
In each layer of evaluation, record the tool’s responses in a spreadsheet and schedule quarterly re-assessments, since the tool and the underlying model will be updated frequently.
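For teams that prefer a script to a hand-maintained spreadsheet, the record-keeping step might be sketched as follows. This is an illustrative Python sketch and not part of the authors' paper: the column names, the `EvaluationRecord` class, the helper functions, and the three-month interval used to approximate "quarterly" are all assumptions made for the example.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields
from datetime import date

# The three evaluation layers named in the article.
LAYERS = ("technical profile", "healthcare knowledge", "clinical reasoning")


@dataclass
class EvaluationRecord:
    """One spreadsheet row (hypothetical column layout)."""
    assessed_on: str    # ISO date of the assessment
    tool_version: str   # version label reported by the tool or vendor
    layer: str          # one of LAYERS
    prompt: str         # question or case scenario given to the tool
    response: str       # the tool's answer, recorded verbatim
    reviewer_note: str  # clinician's comment on the answer
    reassess_on: str    # ISO date of the next scheduled re-assessment


def next_quarter(d: date) -> date:
    """Return the date three calendar months after d.

    The day is clamped to 28 so the result is valid in every month
    (an assumption chosen for simplicity).
    """
    month_index = d.month - 1 + 3
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    return date(year, month, min(d.day, 28))


def make_record(assessed_on, tool_version, layer, prompt, response, reviewer_note=""):
    """Build a record and schedule its re-assessment one quarter later."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    return EvaluationRecord(
        assessed_on=assessed_on.isoformat(),
        tool_version=tool_version,
        layer=layer,
        prompt=prompt,
        response=response,
        reviewer_note=reviewer_note,
        reassess_on=next_quarter(assessed_on).isoformat(),
    )


def write_csv(records, out):
    """Write records as CSV, which any spreadsheet program can open."""
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(EvaluationRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
```

A record is created with, for example, `make_record(date(2025, 12, 8), "example-v1", "technical profile", "Do you store or remember user conversations?", "No.")` and saved by passing a list of records and an open file to `write_csv`. Logging the tool version with each response makes it easier to see whether an answer changed after an update to the underlying model.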
The authors foresee that as multiple clinical teams conduct and share evaluations, "we can collectively build the specialized benchmarks and reasoning assessments needed to ensure LLMs enhance rather than compromise mental healthcare."
Read Article: Contextualizing Clinical Benchmarks: A Tripartite Approach to Evaluating LLM-Based Tools in Mental Health Settings
Wolters Kluwer provides trusted clinical technology and evidence-based solutions that engage clinicians, patients, researchers and students in effective decision-making and outcomes across healthcare. We support clinical effectiveness, learning and research, clinical surveillance and compliance, as well as data solutions. For more information about our solutions, visit https://www.wolterskluwer.com/en/health.
###
About Wolters Kluwer
Wolters Kluwer (EURONEXT: WKL) is a global leader in information, software solutions and services for professionals in healthcare; tax and accounting; financial and corporate compliance; legal and regulatory; corporate performance and ESG. We help our customers make critical decisions every day by providing expert solutions that combine deep domain knowledge with technology and services.
Wolters Kluwer reported 2024 annual revenues of €5.9 billion. The group serves customers in over 180 countries, maintains operations in over 40 countries, and employs approximately 21,600 people worldwide. The company is headquartered in Alphen aan den Rijn, the Netherlands. For more information, visit www.wolterskluwer.com, follow us on LinkedIn, Facebook, YouTube and Instagram.
END
Mental health professionals urged to do their own evaluations of AI-based tools
Three-part practical approach requires no technical expertise
2025-12-08