New study reveals high rates of fabricated and inaccurate citations in LLM-generated mental health research

2025-11-17

(Toronto, November 17, 2025) A new study published in the peer-reviewed journal JMIR Mental Health by JMIR Publications highlights a critical risk in researchers' growing use of large language models (LLMs) such as GPT-4o: bibliographic citations that are frequently fabricated or inaccurate. The findings underscore an urgent need for rigorous human verification and institutional safeguards to protect research integrity, particularly in specialized and less publicly known fields within mental health.

Nearly 1 in 5 Citations Fabricated by GPT-4o in Literature Reviews

The article, titled "Influence of Topic Familiarity and Prompt Specificity on Citation Fabrication in Mental Health Research Using Large Language Models: Experimental Study," found that 19.9% of all citations generated by GPT-4o across six simulated literature reviews were entirely fabricated, meaning they could not be traced to any real publication. Furthermore, among the seemingly real citations, 45.4% contained bibliographic errors, most commonly incorrect or invalid Digital Object Identifiers (DOIs).

The research is timely: academic journals have encountered seemingly AI-hallucinated references in recent submissions. These bibliographic hallucinations and errors are not just formatting issues; they break the chain of verifiability, mislead readers, and compromise the integrity and trustworthiness of scientific results and the cumulative knowledge base. Careful scrutiny and verification of references are therefore essential to safeguarding academic rigor.

Reliability Varies by Topic Familiarity and Specificity

The research, conducted by Jake Linardon, PhD, of Deakin University and his colleagues, systematically tested the reliability of GPT-4o's output across mental health topics with varying levels of public awareness and scientific maturity: major depressive disorder (high familiarity), binge eating disorder (moderate), and body dysmorphic disorder (low). They also tested general versus specialized review prompts (e.g., focusing on digital interventions).

Fabrication Risk is Highest for Less Familiar Topics: Fabrication rates were significantly higher for topics with lower public familiarity and research coverage, such as binge eating disorder (28%) and body dysmorphic disorder (29%), than for major depressive disorder (6%).

Specialized Topics Pose a Higher Risk: Although the pattern did not hold for every disorder, stratified analysis showed that fabrication rates were significantly higher for specialized reviews (e.g., evidence for digital interventions) than for general overviews for certain disorders, such as binge eating disorder.

Overall Inaccuracy is Pervasive: In total, nearly two-thirds of all citations generated by GPT-4o were either fabricated or contained errors, indicating a major reliability issue.
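
As a rough check on how the figures above combine, the sketch below adds the fabricated share to the share of real citations that contained errors. It uses only the percentages quoted in this release and assumes, as the text states, that the 45.4% error rate applies to the non-fabricated citations. The variable names are illustrative, and the exact counts are in the original article.

```python
# Percentages quoted in this release (not raw counts from the study).
fabricated_share = 0.199          # citations that matched no real publication
error_share_among_real = 0.454    # real citations with bibliographic errors

# Real-but-erroneous citations as a share of ALL citations.
real_share = 1 - fabricated_share
erroneous_share = real_share * error_share_among_real

# Citations that were either fabricated or erroneous.
problem_share = fabricated_share + erroneous_share

print(f"real but erroneous: {erroneous_share:.1%}")
print(f"fabricated or erroneous: {problem_share:.1%}")
```

On these inputs the combined share comes to roughly 56% of all citations, leaving about 44% that were both real and error-free.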

Urgent Call for Human Oversight and New Safeguards

The study’s conclusions issue a strong warning to the academic community: Citation fabrication and errors remain common in GPT-4o outputs. The authors stress that the reliability of LLM-generated citations is not fixed but is contingent on the topic and the way the prompt is designed.

Key Implications Highlighted in the Study:

Rigorous Verification is Mandatory: Researchers and students must subject all LLM-generated references to careful human verification to validate their accuracy and authenticity.

Journal and Institutional Role: Journal editors and publishers must implement stronger safeguards, potentially using detection software that flags citations that do not match existing sources, signaling a potential hallucination.

Policy and Training: Academic institutions must develop clear policies and training to equip users with the skills to critically assess LLM outputs and to design strategic prompts, especially when exploring less visible or highly specialized research topics.
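
One way to picture the kind of automated check described above is a script that flags references whose DOI is malformed or absent from a trusted index. The following is a minimal sketch under stated assumptions, not the study's method or any publisher's tool: the `Citation` type, the `flag_suspect_citations` function, and the in-memory `known_dois` set are hypothetical stand-ins for a lookup against a real bibliographic database, and the regular expression covers only the common modern `10.<registrant>/<suffix>` DOI shape.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Common modern DOI shape: "10.", a 4-9 digit registrant code, "/", a suffix.
# This is a heuristic, not a full DOI grammar.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


@dataclass
class Citation:
    """Hypothetical record for one LLM-generated reference."""
    title: str
    doi: Optional[str] = None


def flag_suspect_citations(citations, known_dois):
    """Return (citation, reason) pairs that need human verification.

    `known_dois` stands in for a trusted index; a real tool would query
    a bibliographic database instead of an in-memory set.
    """
    # DOIs are case-insensitive, so compare in lowercase.
    index = {d.lower() for d in known_dois}
    flagged = []
    for c in citations:
        doi = (c.doi or "").strip().lower()
        if not doi:
            flagged.append((c, "missing DOI"))
        elif not DOI_PATTERN.match(doi):
            flagged.append((c, "malformed DOI"))
        elif doi not in index:
            flagged.append((c, "DOI not found in index"))
    return flagged


if __name__ == "__main__":
    known = {"10.2196/80371"}  # DOI of the article cited in this release
    refs = [
        Citation("Real reference", "10.2196/80371"),
        Citation("Made-up reference", "10.9999/not-a-real-record"),
        Citation("Broken identifier", "doi:80371"),
        Citation("No identifier"),
    ]
    for citation, reason in flag_suspect_citations(refs, known):
        print(f"{citation.title}: {reason}")
```

A flag from a check like this is only a prompt for a person to look up the source. A clean result does not show that the title, authors, or year attached to a DOI are correct, so it complements human verification and does not replace it.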

Original article:

Linardon J, Jarman H, McClure Z, Anderson C, Liu C, Messer M. Influence of Topic Familiarity and Prompt Specificity on Citation Fabrication in Mental Health Research Using Large Language Models: Experimental Study. JMIR Ment Health 2025;12:e80371

URL: https://mental.jmir.org/2025/1/e80371

DOI: 10.2196/80371

About JMIR Publications

JMIR Publications is a leading open access publisher of digital health research and a champion of open science. With a focus on author advocacy and research amplification, JMIR Publications partners with researchers to advance their careers and maximize the impact of their work. As a technology organization with publishing at its core, we provide innovative tools and resources that go beyond traditional publishing, supporting researchers at every step of the dissemination process. Our portfolio features a range of peer-reviewed journals, including the renowned Journal of Medical Internet Research.

To learn more about JMIR Publications, please visit jmirpublications.com or connect with us via X, LinkedIn, YouTube, Facebook, and Instagram.

Head office: 130 Queens Quay East, Unit 1100, Toronto, ON, M5A 0P6 Canada

Media Contact:

Dennis O’Brien, Vice President, Communications & Partnerships

JMIR Publications

communications@jmir.org

The content of this communication is licensed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, published by JMIR Publications, is properly cited.

END
