PRESS-NEWS.org - Press Release Distribution

Study assesses GPT-4’s potential to perpetuate racial, gender biases in clinical decision making

2023-12-18
(Press-News.org) A team of Brigham researchers analyzed GPT-4’s performance in four clinical decision support scenarios: generating clinical vignettes, diagnostic reasoning, clinical plan generation and subjective patient assessments. 

When prompted to generate clinical vignettes for medical education, GPT-4 failed to model the demographic diversity of medical conditions, exaggerating known demographic prevalence differences in 89% of diseases. 

When evaluating patient perception, GPT-4 produced significantly different responses by gender or race/ethnicity for 23% of cases. 

Large language models (LLMs) like ChatGPT and GPT-4 have the potential to assist in clinical practice by automating administrative tasks, drafting clinical notes, communicating with patients, and even supporting clinical decision making. However, preliminary studies suggest the models can encode and perpetuate social biases that could adversely affect historically marginalized groups. A new study by investigators from Brigham and Women’s Hospital, a founding member of the Mass General Brigham healthcare system, evaluated the tendency of GPT-4 to encode and exhibit racial and gender biases in four clinical decision support roles. Their results are published in The Lancet Digital Health.

“While most of the focus is on using LLMs for documentation or administrative tasks, there is also excitement about the potential to use LLMs to support clinical decision making,” said corresponding author Emily Alsentzer, PhD, a postdoctoral researcher in the Division of General Internal Medicine at Brigham and Women’s Hospital. “We wanted to systematically assess whether GPT-4 encodes racial and gender biases that impact its ability to support clinical decision making.”

Alsentzer and colleagues tested four applications of GPT-4 using the Azure OpenAI platform. First, they prompted GPT-4 to generate patient vignettes that can be used in medical education. Next, they tested GPT-4’s ability to correctly develop a differential diagnosis and treatment plan for 19 different patient cases from NEJM Healer, a medical education tool that presents challenging clinical cases to medical trainees. Finally, they assessed how GPT-4 makes inferences about a patient’s clinical presentation using eight case vignettes that were originally generated to measure implicit bias. For each application, the authors assessed whether GPT-4’s outputs were biased by race or gender.

For the medical education task, the researchers constructed ten prompts that required GPT-4 to generate a patient presentation for a supplied diagnosis. They ran each prompt 100 times and found that GPT-4 exaggerated known differences in disease prevalence by demographic group.  
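For readers who want to see what such a comparison might look like in practice, the sketch below checks whether a demographic group appears in generated vignettes more often than a reference prevalence would predict. This release does not describe the statistical methods the authors used, so the exact binomial test here is a stand-in chosen for illustration. The function names, the count of 70 vignettes out of 100, and the reference share of 0.30 are all hypothetical and are not figures from the study.

```python
from math import comb


def upper_tail_binomial(k, n, p):
    """Probability of at least k successes in n independent trials with success rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def overrepresentation(observed_count, runs, reference_share):
    """Compare how often a group appears in generated vignettes with a reference share."""
    observed_share = observed_count / runs
    return {
        "observed_share": observed_share,
        # A ratio above 1 means the group appears more often than the reference share.
        "ratio": observed_share / reference_share,
        # One-sided exact binomial p-value for "at least this many" appearances.
        "p_value": upper_tail_binomial(observed_count, runs, reference_share),
    }


# Hypothetical inputs: 70 of 100 generated vignettes describe the group,
# against a placeholder reference share of 0.30 (not a real prevalence estimate).
result = overrepresentation(observed_count=70, runs=100, reference_share=0.30)
print(result)
```

A small p-value with a ratio well above 1 would indicate that the generated vignettes overstate the group's share relative to the chosen reference. The conclusion is only as good as the reference figure supplied.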

"One striking example is when GPT-4 is prompted to generate a vignette for a patient with sarcoidosis: GPT-4 describes a Black woman 81% of the time," Alsentzer explains. "While sarcoidosis is more prevalent in Black patients and in women, it’s not 81% of all patients."  

Next, when GPT-4 was prompted to develop a list of 10 possible diagnoses for the NEJM Healer cases, changing the gender or race/ethnicity of the patient significantly affected its ability to prioritize the correct top diagnosis in 37% of cases.  
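The demographic swap described above can be sketched as follows: the same case text is rendered once per race and gender combination, and the position of the correct diagnosis in each returned list is then compared across groups. This is a minimal illustration, not the authors' code. The case template, the label wording, and the helper names `demographic_variants` and `rank_of` are assumptions, and no model call is included.

```python
from itertools import product

# Demographic labels mentioned in the release; the exact wording used in the
# study's prompts is not given here, so these strings are assumptions.
RACES = ["Asian", "Black", "Hispanic", "white"]
GENDERS = ["man", "woman"]

# Placeholder case text, not taken from NEJM Healer.
CASE_TEMPLATE = (
    "A {race} {gender} presents with [case details]. List 10 possible diagnoses."
)


def demographic_variants(template):
    """Return one prompt per race/gender pair, identical in every other respect."""
    return {
        (race, gender): template.format(race=race, gender=gender)
        for race, gender in product(RACES, GENDERS)
    }


def rank_of(correct_diagnosis, ranked_diagnoses):
    """Return the 1-based position of the correct diagnosis, or None if it is absent."""
    for position, diagnosis in enumerate(ranked_diagnoses, start=1):
        if diagnosis.lower() == correct_diagnosis.lower():
            return position
    return None


variants = demographic_variants(CASE_TEMPLATE)
print(len(variants), "prompt variants")
```

Because the variants differ only in the demographic labels, any systematic difference in `rank_of` across groups can be attributed to those labels rather than to the clinical details.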

"In some cases, GPT-4’s decision making reflects known gender and racial biases in the literature," Alsentzer said. "In the case of pulmonary embolism, the model ranked panic attack/anxiety as a more likely diagnosis for women than men. It also ranked sexually transmitted diseases, such as acute HIV and syphilis, as more likely for patients from racial minority backgrounds compared to white patients." 

 
When asked to evaluate subjective patient traits such as honesty, understanding, and pain tolerance, GPT-4 produced significantly different responses by race, ethnicity, and gender for 23% of the questions. For example, GPT-4 was significantly more likely to rate Black male patients as abusing the opioid Percocet than Asian, Black, Hispanic, and white female patients when the answers should have been identical for all the simulated patient cases. 

Limitations of the current study include testing GPT-4's responses using a limited number of simulated prompts and analyzing model performance using only a few traditional categories of demographic identities. Future work should investigate biases using clinical notes from the electronic health record. 

"While LLM-based tools are currently being deployed with a clinician in the loop to verify the model’s outputs, it is very challenging for clinicians to detect systemic biases when viewing individual patient cases," Alsentzer said. “It is critical that we perform bias evaluations for each intended use of LLMs, just as we do for other machine learning models in the medical domain. Our work can help start a conversation about GPT-4’s potential to propagate bias in clinical decision support applications.” 

Authorship: Additional BWH authors include Jorge A Rodriguez, David W Bates, and Raja-Elie E Abdulnour. Additional authors include Travis Zack, Eric Lehman, Mirac Suzgun, Leo Anthony Celi, Judy Gichoya, Dan Jurafsky, Peter Szolovits, and Atul J Butte. 

Disclosures: Alsentzer reports personal fees from Canopy Innovations, Fourier Health, and Xyla; and grants from Microsoft Research. Abdulnour is an employee of Massachusetts Medical Society, which owns NEJM Healer (NEJM Healer cases were used in the study). Additional author disclosures can be found in the paper. 

Funding: T32 NCI Hematology/Oncology Training Fellowship; Open Philanthropy and the National Science Foundation (IIS-2128145); and a philanthropic gift from Priscilla Chan and Mark Zuckerberg. 

Paper cited: Zack, T; Lehman, E et al. “Assessing the potential of GPT-4 to perpetuate racial and gender biases in healthcare: a model evaluation study” The Lancet Digital Health DOI: 10.1016/S2589-7500(23)00225-X 

 

END

