(Press-News.org) Keeping up with the latest research is vital for scientists, but given that millions of scientific papers are published every year, that can prove difficult. Artificial intelligence systems show promise for quickly synthesizing seas of information, but they still tend to make things up, or “hallucinate.”
For instance, when a team led by researchers at the University of Washington and The Allen Institute for AI, or Ai2, studied a recent OpenAI model, GPT-4o, they found it fabricated 78-90% of its research citations. And general-purpose AI models like ChatGPT often can’t access papers that were published after their training data was collected.
So the UW and Ai2 research team built OpenScholar, an open-source AI model designed specifically to synthesize current scientific research. The team also created the first large, multi-domain benchmark for evaluating how well models can synthesize and cite scientific research. In tests, OpenScholar cited sources as accurately as human experts, and 16 scientists preferred its responses to those written by subject experts 51% of the time.
The team published its findings Feb. 4 in Nature. The project’s code, data and a demo are publicly available and free to use.
“After we started this work, we put the demo online and quickly, we got a lot of queries, far more than we’d expected,” said senior author Hannaneh Hajishirzi, a UW associate professor in the Paul G. Allen School of Computer Science & Engineering and senior director at Ai2. “When we started looking through the responses we realized our colleagues and other scientists were actively using OpenScholar. It really speaks to the need for this sort of open-source, transparent system that can synthesize research.”
Researchers trained the model and then created a set of 45 million scientific papers for OpenScholar to pull from to ground its answers in established research. They coupled this with a technique called “retrieval-augmented generation,” which lets the model search for new sources, incorporate them and cite them after it’s been trained.
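To make the idea concrete, here is a minimal sketch of retrieval-augmented generation in Python. It is an illustration only, not OpenScholar's actual pipeline: the three-paper corpus, the word-overlap scoring and every function name (`retrieve`, `build_prompt`, `cited_ids_are_grounded`) are assumptions made for this example, and the call to a language model is left out. The sketch ranks papers against a question, packs the top results into a prompt, and checks that an answer cites only papers that were actually retrieved.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus standing in for a large datastore of papers.
PAPERS = [
    {"id": "P1", "title": "Dense retrieval for open-domain question answering",
     "abstract": "We train dense passage encoders to retrieve evidence for question answering."},
    {"id": "P2", "title": "Protein folding with deep learning",
     "abstract": "A neural network predicts protein structure from amino acid sequence."},
    {"id": "P3", "title": "Reducing hallucination with retrieved evidence",
     "abstract": "Grounding language model answers in retrieved passages reduces fabricated citations."},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, papers, k=2):
    """Return up to k papers with nonzero similarity to the query, best first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(p["title"] + " " + p["abstract"]))), p) for p in papers]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:k] if score > 0]


def build_prompt(query, retrieved):
    """Assemble the text a language model would receive: sources first, then the question."""
    sources = "\n".join(f"[{p['id']}] {p['title']}: {p['abstract']}" for p in retrieved)
    return ("Answer using only the sources below and cite them by id.\n\n"
            f"Sources:\n{sources}\n\nQuestion: {query}")


def cited_ids_are_grounded(answer, retrieved):
    """True if every bracketed citation in the answer refers to a retrieved paper."""
    cited = set(re.findall(r"\[(\w+)\]", answer))
    return cited <= {p["id"] for p in retrieved}


if __name__ == "__main__":
    question = "How does retrieved evidence reduce hallucination in language model answers?"
    hits = retrieve(question, PAPERS)
    print(build_prompt(question, hits))
```

Because the sources are looked up at question time, papers added to the corpus after training can still be retrieved and cited. A real system would replace the word-overlap scoring with a trained retriever and pass the prompt to a language model.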
“Early on we experimented with using an AI model with Google’s search data, but we found it wasn’t very good on its own,” said lead author Akari Asai, a research scientist at Ai2 who completed this research as a UW doctoral student in the Allen School. “It might cite some research papers that weren’t the most relevant, or cite just one paper, or pull from a blog post randomly. We realized we needed to ground this in scientific papers. We then made the system flexible so that it could incorporate emerging research through results.”
To test their system, the team created ScholarQABench, a benchmark against which to test systems on scientific search. They gathered 3,000 queries and 250 longform answers written by experts in computer science, physics, biomedicine and neuroscience.
“AI is getting better and better at real world tasks,” Hajishirzi said. “But the big question ultimately is whether we can trust that its answers are correct.”
The team compared OpenScholar against other state-of-the-art AI models, such as OpenAI’s GPT-4o and two models from Meta. ScholarQABench automatically evaluated AI models’ answers on metrics such as their accuracy, writing quality and relevance.
OpenScholar outperformed all the systems it was tested against. The team had 16 scientists review answers from the models and compare them with human-written responses. The scientists preferred OpenScholar answers to human answers 51% of the time. When the team combined OpenScholar’s citation methods and pipelines with GPT-4o (a much bigger model), the scientists preferred the AI-written answers to human answers 70% of the time. They picked answers from GPT-4o on its own only 32% of the time.
“Scientists see so many papers coming out every day that it’s impossible to keep up,” Asai said. “But the existing AI systems weren’t designed for scientists’ specific needs. We’ve already seen a lot of scientists using OpenScholar and because it’s open-source, others are building on this research and already improving on our results. We’re working on a followup model, DR Tulu, which builds on OpenScholar’s findings and performs multi-step search and information gathering to produce more comprehensive responses.”
Other co-authors include Jacqueline He, Rulin Shao and Weijia Shi, all UW doctoral students in the Allen School; Dan Weld, a UW professor emeritus in the Allen School and general manager and chief scientist at Ai2; Varsha Kishore, a UW postdoc in the Allen School and postdoc at Ai2; Luke Zettlemoyer, a UW professor in the Allen School; Pang Wei Koh, a UW assistant professor in the Allen School; Amanpreet Singh, Joseph Chee Chang, Kyle Lo, Luca Soldaini, Sergey Feldman, Mike D’Arcy, David Wadden, Matt Latzke, Jenna Sparks and Jena D. Hwang of Ai2; Wen-tau Yih of Meta; Minyang Tian, Shengyan Liu, Hao Tong and Bohao Wu of the University of Illinois Urbana-Champaign; Pan Ji of the University of North Carolina; Yanyu Xiong of Stanford University; and Graham Neubig of Carnegie Mellon University.
For more information, contact Asai at akaria@allenai.org and Hajishirzi at hannaneh@cs.washington.edu.
END
In a study, AI model OpenScholar synthesizes scientific research and cites sources as accurately as human experts
2026-02-04