(Press-News.org) A team of Brigham researchers analyzed GPT-4’s performance in four clinical decision support scenarios: generating clinical vignettes, diagnostic reasoning, clinical plan generation and subjective patient assessments.
When prompted to generate clinical vignettes for medical education, GPT-4 failed to model the demographic diversity of medical conditions, exaggerating known demographic prevalence differences in 89% of diseases.
When evaluating patient perception, GPT-4 produced significantly different responses by gender or race/ethnicity for 23% of cases.
Large language models (LLMs) like ChatGPT and GPT-4 have the potential to assist in clinical practice to automate administrative tasks, draft clinical notes, communicate with patients, and even support clinical decision making. However, preliminary studies suggest the models can encode and perpetuate social biases that could adversely affect historically marginalized groups. A new study by investigators from Brigham and Women’s Hospital, a founding member of the Mass General Brigham healthcare system, evaluated the tendency of GPT-4 to encode and exhibit racial and gender biases in four clinical decision support roles. Their results are published in The Lancet Digital Health.
“While most of the focus is on using LLMs for documentation or administrative tasks, there is also excitement about the potential to use LLMs to support clinical decision making,” said corresponding author Emily Alsentzer, PhD, a postdoctoral researcher in the Division of General Internal Medicine at Brigham and Women's Hospital. “We wanted to systematically assess whether GPT-4 encodes racial and gender biases that impact its ability to support clinical decision making.”
Alsentzer and colleagues tested four applications of GPT-4 using the Azure OpenAI platform. First, they prompted GPT-4 to generate patient vignettes that can be used in medical education. Next, they tested GPT-4's ability to correctly develop a differential diagnosis and, separately, a treatment plan for 19 different patient cases from NEJM Healer, a medical education tool that presents challenging clinical cases to medical trainees. Finally, they assessed how GPT-4 makes inferences about a patient’s clinical presentation using eight case vignettes that were originally generated to measure implicit bias. For each application, the authors assessed whether GPT-4’s outputs were biased by race or gender.
For the medical education task, the researchers constructed ten prompts that required GPT-4 to generate a patient presentation for a supplied diagnosis. They ran each prompt 100 times and found that GPT-4 exaggerated known differences in disease prevalence by demographic group.
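The comparison described above can be illustrated with a minimal sketch. It is not the study's code: the function names, the label strings, and the reference share are assumptions made for illustration, and the reference value is a placeholder rather than a real prevalence estimate. The sketch tallies the demographic label attached to each generated vignette and reports how far each observed share sits from a supplied reference share.

```python
from collections import Counter


def demographic_shares(labels):
    """Return the fraction of vignettes assigned to each demographic label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def share_gaps(observed, reference):
    """Observed share minus reference share, for each label in the reference."""
    return {
        label: observed.get(label, 0.0) - share
        for label, share in reference.items()
    }


# Hypothetical labels extracted from 100 generated vignettes for one diagnosis.
labels = ["Black woman"] * 81 + ["other"] * 19

# Placeholder reference shares, NOT real prevalence estimates.
reference = {"Black woman": 0.30, "other": 0.70}

observed = demographic_shares(labels)
gaps = share_gaps(observed, reference)

for label, gap in gaps.items():
    print(f"{label}: observed {observed.get(label, 0.0):.2f}, gap {gap:+.2f}")
```

A positive gap means the model described that group more often than the reference share would suggest. In practice the reference values would come from published prevalence data for each condition.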
"One striking example is when GPT-4 is prompted to generate a vignette for a patient with sarcoidosis: GPT-4 describes a Black woman 81% of the time," Alsentzer explains. "While sarcoidosis is more prevalent in Black patients and in women, it’s not 81% of all patients."
Next, when GPT-4 was prompted to develop a list of 10 possible diagnoses for the NEJM Healer cases, changing the gender or race/ethnicity of the patient significantly affected its ability to prioritize the correct top diagnosis in 37% of cases.
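One way to picture this kind of check is sketched below, under stated assumptions. The helper `rank_of`, the variant names, and the ranked lists are hypothetical placeholders, not data or code from the study. The sketch records where the correct diagnosis lands in the ranked differential for each version of a case that differs only in the stated demographics.

```python
def rank_of(diagnosis, differential):
    """Return the 1-based position of a diagnosis in a ranked list, or None."""
    for position, candidate in enumerate(differential, start=1):
        if candidate.lower() == diagnosis.lower():
            return position
    return None


# Hypothetical ranked differentials for one case; only the stated
# demographics differ between variants. These lists are made up.
differentials = {
    "variant A": ["pulmonary embolism", "pneumonia", "panic attack"],
    "variant B": ["panic attack", "pulmonary embolism", "pneumonia"],
}

correct = "pulmonary embolism"
ranks = {name: rank_of(correct, ranked) for name, ranked in differentials.items()}

for name, rank in ranks.items():
    print(f"{name}: correct diagnosis ranked {rank}")
```

A single pair of lists shows nothing on its own. Detecting a significant effect would require many repeated generations per variant and an appropriate statistical test, which this sketch leaves out.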
"In some cases, GPT-4’s decision making reflects known gender and racial biases in the literature," Alsentzer said. "In the case of pulmonary embolism, the model ranked panic attack/anxiety as a more likely diagnosis for women than men. It also ranked sexually transmitted diseases, such as acute HIV and syphilis, as more likely for patients from racial minority backgrounds compared to white patients."
When asked to evaluate subjective patient traits such as honesty, understanding, and pain tolerance, GPT-4 produced significantly different responses by race, ethnicity, and gender for 23% of the questions. For example, GPT-4 was significantly more likely to rate Black male patients as abusing the opioid Percocet than Asian, Black, Hispanic, and white female patients, even though the answers should have been identical for all the simulated patient cases.
Limitations of the current study include testing GPT-4's responses using a limited number of simulated prompts and analyzing model performance using only a few traditional categories of demographic identities. Future work should investigate biases using clinical notes from the electronic health record.
"While LLM-based tools are currently being deployed with a clinician in the loop to verify the model’s outputs, it is very challenging for clinicians to detect systemic biases when viewing individual patient cases," Alsentzer said. “It is critical that we perform bias evaluations for each intended use of LLMs, just as we do for other machine learning models in the medical domain. Our work can help start a conversation about GPT-4’s potential to propagate bias in clinical decision support applications.”
Authorship: Additional BWH authors include Jorge A Rodriguez, David W Bates, and Raja-Elie E Abdulnour. Additional authors include Travis Zack, Eric Lehman, Mirac Suzgun, Leo Anthony Celi, Judy Gichoya, Dan Jurafsky, Peter Szolovits, and Atul J Butte.
Disclosures: Alsentzer reports personal fees from Canopy Innovations, Fourier Health, and Xyla; and grants from Microsoft Research. Abdulnour is an employee of Massachusetts Medical Society, which owns NEJM Healer (NEJM Healer cases were used in the study). Additional author disclosures can be found in the paper.
Funding: T32 NCI Hematology/Oncology Training Fellowship; Open Philanthropy and the National Science Foundation (IIS-2128145); and a philanthropic gift from Priscilla Chan and Mark Zuckerberg.
Paper cited: Zack, T; Lehman, E et al. “Assessing the potential of GPT-4 to perpetuate racial and gender biases in healthcare: a model evaluation study” The Lancet Digital Health DOI: 10.1016/S2589-7500(23)00225-X
END
Study assesses GPT-4’s potential to perpetuate racial, gender biases in clinical decision making
2023-12-18