(Press-News.org) When summarizing scientific studies, large language models (LLMs) like ChatGPT and DeepSeek produce inaccurate conclusions in up to 73% of cases, according to a new study by Uwe Peters (Utrecht University) and Benjamin Chin-Yee (Western University, Canada/University of Cambridge, UK). The researchers tested the most prominent LLMs and analyzed thousands of chatbot-generated science summaries, revealing that most models consistently produced broader conclusions than those in the summarized texts. Surprisingly, prompting for accuracy made the problem worse, and newer LLMs performed worse than older ones.
Almost 5,000 LLM-generated summaries analyzed
The study evaluated how accurately ten leading LLMs, including ChatGPT, DeepSeek, Claude, and LLaMA, summarize abstracts and full-length articles from top science and medical journals (e.g., Nature, Science, and Lancet). Testing LLMs over one year, the researchers collected 4,900 LLM-generated summaries. Six of the ten models systematically exaggerated claims found in the original texts, often in subtle but impactful ways: for instance, by changing cautious, past-tense claims like “The treatment was effective in this study” to a more sweeping, present-tense version like “The treatment is effective.” These changes can mislead readers into believing that findings apply much more broadly than they actually do.
Accuracy prompts backfired
Strikingly, when the models were explicitly prompted to avoid inaccuracies, they were nearly twice as likely to produce overgeneralized conclusions as when given a simple summary request. “This effect is concerning,” Peters said. “Students, researchers, and policymakers may assume that if they ask ChatGPT to avoid inaccuracies, they’ll get a more reliable summary. Our findings prove the opposite.”
Do humans do better?
Peters and Chin-Yee also directly compared chatbot-generated summaries with human-written summaries of the same articles. Unexpectedly, chatbots were nearly five times more likely to produce broad generalizations than their human counterparts. “Worryingly,” said Peters, “newer AI models, like ChatGPT-4o and DeepSeek, performed worse than older ones.”
Why are these exaggerations happening? “Previous studies found that overgeneralizations are common in science writing, so it’s not surprising that models trained on these texts reproduce that pattern,” Chin-Yee noted. Additionally, since human users likely often prefer LLM responses that sound helpful and widely applicable, the models may learn through interactions to favor fluency and generality over precision, Peters suggested.
Reducing the risks
The researchers recommend using LLMs such as Claude, which had the highest generalization accuracy, setting chatbots to a lower ‘temperature’ (the parameter fixing a chatbot’s ‘creativity’), and using prompts that enforce indirect, past-tense reporting in science summaries. Finally, “If we want AI to support science literacy rather than undermine it,” Peters said, “we need more vigilance and testing of these systems in science communication contexts.”
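For readers who want to try these recommendations, the following is a minimal Python sketch, not the researchers' own method. It assumes a generic chatbot request with a `temperature` field and a `prompt` field; the field names, the prompt wording, and the helper names `build_request` and `flag_generalizations` are all hypothetical. The second helper is a deliberately crude check that flags summary sentences containing common present-tense verbs, which may signal a generalized claim.

```python
import re

# Hypothetical prompt wording; the exact prompts used in the study are not given here.
PAST_TENSE_INSTRUCTION = (
    "Summarize the text below. Report findings indirectly and in the past tense "
    "(for example: 'The authors found that the treatment was effective in this study'). "
    "Do not restate findings as general, present-tense claims."
)


def build_request(text: str, temperature: float = 0.0) -> dict:
    """Assemble a provider-agnostic request payload.

    The field names are illustrative assumptions, not any vendor's actual API.
    """
    return {
        "temperature": temperature,  # lower values make output less variable
        "prompt": f"{PAST_TENSE_INSTRUCTION}\n\n{text}",
    }


# Crude heuristic: a small, incomplete list of present-tense verbs that often
# appear in generalized claims such as "The treatment is effective."
GENERIC_PRESENT = re.compile(
    r"\b(is|are|improves|reduces|causes|prevents|works)\b", re.IGNORECASE
)


def flag_generalizations(summary: str) -> list[str]:
    """Return the sentences of a summary that contain a flagged present-tense verb."""
    sentences = re.split(r"(?<=[.!?])\s+", summary.strip())
    return [s for s in sentences if GENERIC_PRESENT.search(s)]
```

A flagged sentence is only a prompt for a human to compare the summary with the original text; the verb list will miss many generalizations and will also flag harmless sentences.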
END
Prominent chatbots routinely exaggerate science findings, study shows
2025-05-13