(Press-News.org) PULLMAN, Wash. — Again and again, Washington State University professor Mesut Cicek and his colleagues fed hypotheses from scientific papers into ChatGPT and asked it to determine whether the statements had been upheld by research — whether they were true or false.
They did this with more than 700 hypotheses, repeating each query 10 times.
AI answered correctly 76.5% of the time when the experiment was run in 2024. When it was repeated in 2025, the accuracy improved to 80%. When accounting for random guessing, however, AI was only about 60% better than chance — closer to a low D than to high reliability.
It struggled most to identify hypotheses as false, getting those answers correct just 16.4% of the time. Furthermore, ChatGPT was inconsistent: across 10 identical prompts, it gave consistently accurate answers for only 73% of the statements.
“We're not just talking about accuracy, we're talking about inconsistency, because if you ask the same question again and again, you come up with different answers,” said Cicek, an associate professor in the Department of Marketing and International Business in WSU’s Carson College of Business and lead author of the new publication.
“We used 10 prompts with the same exact question. Everything was identical. It would answer true. Next, it says it’s false. It’s true, it’s false, false, true. There were several cases where there were five true, five false.”
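The kind of consistency check Cicek describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `consistency_report`, the hypothesis labels, and the sample answers are all made up, and "consistent" is taken here to mean that every repeated reply matched the correct label, which may not be the exact definition used in the study.

```python
from collections import Counter

def consistency_report(answers, truth):
    """Summarize repeated true/false replies per hypothesis.

    answers: hypothesis id -> list of True/False replies to identical prompts
    truth:   hypothesis id -> correct label
    Returns (share of hypotheses answered correctly every time,
             list of (id, n_true, n_false) for hypotheses with mixed replies).
    """
    always_correct = 0
    mixed = []
    for hid, replies in answers.items():
        counts = Counter(replies)
        # Assumed definition: consistent means all replies equal the correct label.
        if all(reply == truth[hid] for reply in replies):
            always_correct += 1
        # Record hypotheses where the model contradicted itself.
        if len(counts) > 1:
            mixed.append((hid, counts[True], counts[False]))
    return always_correct / len(answers), mixed

# Hypothetical data, not taken from the study.
answers = {
    "H1": [True] * 10,         # same reply every time, and correct
    "H2": [True, False] * 5,   # five true, five false
    "H3": [False] * 10,        # same reply every time, but wrong
}
truth = {"H1": True, "H2": False, "H3": True}

rate, mixed = consistency_report(answers, truth)
print(f"consistently correct: {rate:.0%}")
print("mixed replies:", mixed)
```

Note that a model can be perfectly repeatable and still wrong, as with the hypothetical H3, which is why the sketch tracks correctness and self-contradiction separately.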
The findings, published in the Rutgers Business Review, reinforce the need to apply skepticism and caution when using AI for critical tasks, especially those that involve nuance or complicated reasoning. They show that generative AI’s linguistic fluency is not yet matched by conceptual intelligence, and suggest the much-touted arrival of artificial general intelligence that can truly “think” is farther off than some are predicting, Cicek said.
“Current AI tools don't understand the world the way we do — they don't have a ‘brain,’” Cicek said. “They just memorize, and they can give you some insight, but they don't understand what they’re talking about.”
Cicek’s co-authors were Sevincgul Ulu of Southern Illinois University, Can Uslay of Rutgers University, and Kate Karniouchina of Northeastern University.
The researchers used 719 hypotheses from scientific papers published in business journals since 2021 to challenge the ability of the free, commonly available generative AI tools to answer questions that involve nuance and complexity. Whether research supports a given hypothesis is often a complicated question, with different factors that may qualify or balance the findings. Boiling it down to a simple true-or-false answer requires reasoning.
Cicek and his colleagues ran the experiment with the free version of ChatGPT-3.5 in 2024, and the free, updated ChatGPT-5 mini in 2025. Overall, the accuracy remained similar between the versions. When these responses were adjusted for random chance—the fact that a wild guess has a 50% likelihood of being correct—the accuracy was just 60% better than random chance in both years.
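As a rough illustration of the chance adjustment described above, one common approach rescales raw accuracy so that a coin flip scores 0 and a perfect record scores 1. The formula and the function name `chance_adjusted` are assumptions made for this sketch; the release does not spell out the exact calculation the researchers used.

```python
def chance_adjusted(accuracy: float, chance: float = 0.5) -> float:
    """Share of the gap between chance and a perfect score that was closed.

    Assumed formula: (accuracy - chance) / (1 - chance).
    With two equally likely answers, chance is 0.5.
    """
    return (accuracy - chance) / (1.0 - chance)

# Raw accuracies reported in the release.
reported = {"2024": 0.765, "2025": 0.80}

for year, accuracy in reported.items():
    print(year, round(chance_adjusted(accuracy), 2))
```

Under this assumed formula, the 80% raw accuracy from 2025 works out to roughly 0.60, in line with the figure quoted above, while the 76.5% from 2024 comes out somewhat lower, so the study may have rounded or used a different adjustment.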
The results highlight a key gap in large language model AI tools: while they can produce fluent, convincing language, their ability to reason through complex questions often falls short, sometimes leading them to deliver persuasive explanations for incorrect answers, Cicek said.
The researchers concluded that business managers should emphasize the need to verify AI results, treat them with skepticism and provide training in what AI can, and can’t, do well.
In the current paper, Cicek focused only on results with ChatGPT, but he has run similar tests with other AI tools and found comparable results. The study also builds upon past work of Cicek’s that raises reasons to be cautious about AI hype. A paper published in 2024 reported results of a national survey that found consumers were less likely to want to buy products when they were marketed with an emphasis on AI.
“Always be skeptical,” he said. “I'm not against AI. I’m using it. But you need to be very careful.”
END
AI gets a D: Study shows inaccuracies, inconsistency in ChatGPT answers
2026-03-16