(Press-News.org) Researchers at the National Institutes of Health (NIH) found that an artificial intelligence (AI) model solved medical quiz questions—designed to test health professionals’ ability to diagnose patients based on clinical images and a brief text summary—with high accuracy. However, physician-graders found the AI model made mistakes when describing images and explaining how its decision-making led to the correct answer. The findings, which shed light on AI’s potential in the clinical setting, were published in npj Digital Medicine. The study was led by researchers from NIH’s National Library of Medicine (NLM) and Weill Cornell Medicine, New York City.
“Integration of AI into health care holds great promise as a tool to help medical professionals diagnose patients faster, allowing them to start treatment sooner,” said NLM Acting Director, Stephen Sherry, Ph.D. “However, as this study shows, AI is not advanced enough yet to replace human experience, which is crucial for accurate diagnosis.”
The AI model and human physicians answered questions from the New England Journal of Medicine (NEJM)’s Image Challenge. The challenge is an online quiz that provides real clinical images and a short text description that includes details about the patient’s symptoms and presentation, then asks users to choose the correct diagnosis from multiple-choice answers.
The researchers tasked the AI model with answering 207 Image Challenge questions and providing a written rationale to justify each answer. The prompt specified that the rationale should include a description of the image, a summary of relevant medical knowledge, and step-by-step reasoning for how the model chose the answer.
Nine physicians from various institutions, each with a different medical specialty, were recruited. They answered their assigned questions first in a “closed-book” setting (without referring to any external materials such as online resources) and then in an “open-book” setting (using external resources). The researchers then provided the physicians with the correct answer, along with the AI model’s answer and corresponding rationale. Finally, the physicians were asked to score the AI model’s ability to describe the image, summarize relevant medical knowledge, and provide its step-by-step reasoning.
The researchers found that the AI model and physicians scored highly in selecting the correct diagnosis. Interestingly, the AI model selected the correct diagnosis more often than physicians in closed-book settings, while physicians with open-book tools performed better than the AI model, especially when answering the questions ranked most difficult.
Importantly, based on physician evaluations, the AI model often made mistakes when describing the medical image and explaining its reasoning behind the diagnosis—even in cases where it made the correct final choice. In one example, the AI model was provided with a photo of a patient’s arm with two lesions. A physician would easily recognize that both lesions were caused by the same condition. However, because the lesions were presented at different angles—causing the illusion of different colors and shapes—the AI model failed to recognize that both lesions could be related to the same diagnosis.
The researchers argue that these findings underscore the importance of evaluating multi-modal AI technology further before introducing it into the clinical setting.
“This technology has the potential to help clinicians augment their capabilities with data-driven insights that may lead to improved clinical decision-making,” said NLM Senior Investigator and corresponding author of the study, Zhiyong Lu, Ph.D. “Understanding the risks and limitations of this technology is essential to harnessing its potential in medicine.”
The study used an AI model known as GPT-4V (Generative Pre-trained Transformer 4 with Vision), which is a ‘multimodal AI model’ that can process combinations of multiple types of data, including text and images. The researchers note that while this is a small study, it sheds light on multi-modal AI’s potential to aid physicians’ medical decision-making. More research is needed to understand how such models compare to physicians’ ability to diagnose patients.
The study was co-authored by collaborators from NIH’s National Eye Institute and the NIH Clinical Center; the University of Pittsburgh; UT Southwestern Medical Center, Dallas; New York University Grossman School of Medicine, New York City; Harvard Medical School and Massachusetts General Hospital, Boston; Case Western Reserve University School of Medicine, Cleveland; University of California San Diego, La Jolla; and the University of Arkansas, Little Rock.
The National Library of Medicine (NLM) is a leader in research in biomedical informatics and data science and the world’s largest biomedical library. NLM conducts and supports research in methods for recording, storing, retrieving, preserving, and communicating health information. NLM creates resources and tools that are used billions of times each year by millions of people to access and analyze molecular biology, biotechnology, toxicology, environmental health, and health services information. Additional information is available at www.nlm.nih.gov.
NIH findings shed light on risks and benefits of integrating AI into medical decision-making
AI model scored well on medical diagnostic quiz, but made mistakes explaining answers
2024-07-23