A team of Brigham researchers analyzed GPT-4’s performance in four clinical decision support scenarios: generating clinical vignettes, diagnostic reasoning, clinical plan generation, and subjective patient assessments.
When prompted to generate clinical vignettes for medical education, GPT-4 failed to model the demographic diversity of medical conditions, exaggerating known demographic prevalence differences in 89% of diseases.
When evaluating patient perception, GPT-4 produced significantly different responses by gender or race/ethnicity for 23% of cases.
Large language models (LLMs) like ChatGPT and GPT-4 have the potential to assist in clinical practice to automate administrative tasks, draft clinical notes, communicate with patients, and even support clinical decision making. However, preliminary studies suggest the models can encode and perpetuate social biases that could adversely affect historically marginalized groups. A new study by investigators from Brigham and Women’s Hospital, a founding member of the Mass General Brigham healthcare system, evaluated the tendency of GPT-4 to encode and exhibit racial and gender biases in four clinical decision support roles. Their results are published in The Lancet Digital Health.
“While most of the focus is on using LLMs for documentation or administrative tasks, there is also excitement about the potential to use LLMs to support clinical decision making,” said corresponding author Emily Alsentzer, PhD, a postdoctoral researcher in the Division of General Internal Medicine at Brigham and Women's Hospital. “We wanted to systematically assess whether GPT-4 encodes racial and gender biases that impact its ability to support clinical decision making.”
Alsentzer and colleagues tested four applications of GPT-4 using the Azure OpenAI platform. First, they prompted GPT-4 to generate patient vignettes that can be used in medical education. Next, they tested GPT-4's ability to correctly develop a differential diagnosis and treatment plan for 19 different patient cases from NEJM Healer, a medical education tool that presents challenging clinical cases to medical trainees. Finally, they assessed how GPT-4 makes inferences about a patient’s clinical presentation using eight case vignettes that were originally generated to measure implicit bias. For each application, the authors assessed whether GPT-4’s outputs were biased by race or gender.
For the medical education task, the researchers constructed ten prompts that required GPT-4 to generate a patient presentation for a supplied diagnosis. They ran each prompt 100 times and found that GPT-4 exaggerated known differences in disease prevalence by demographic group.
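The comparison described above, tallying the demographics of repeated outputs and setting them against a reference prevalence, can be illustrated with a minimal Python sketch. This is not the authors' code. The function names are hypothetical, the 81-of-100 tally simply mirrors the sarcoidosis figure quoted in this release, and the reference share is a placeholder chosen only so the example runs, not a real prevalence estimate.

```python
from collections import Counter


def demographic_shares(labels):
    """Return the fraction of generated vignettes in each demographic group."""
    counts = Counter(labels)
    total = len(labels)
    return {group: n / total for group, n in counts.items()}


def exaggeration_ratio(observed_share, reference_share):
    """Ratio of observed share to reference share; above 1 means over-representation."""
    return observed_share / reference_share


# Illustrative tally for one diagnosis across 100 runs of a single prompt.
# 81 of 100 mirrors the sarcoidosis example quoted in the release.
labels = ["Black woman"] * 81 + ["other"] * 19
shares = demographic_shares(labels)

# PLACEHOLDER reference share (assumption, not a figure from the study).
reference_share = 0.30

ratio = exaggeration_ratio(shares["Black woman"], reference_share)
print(f"observed share: {shares['Black woman']:.2f}, ratio vs. reference: {ratio:.2f}")
```

A ratio well above 1 for a group would flag the kind of exaggeration the researchers describe. A real evaluation would need published prevalence estimates and a statistical test rather than a single ratio.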
"One striking example is when GPT-4 is prompted to generate a vignette for a patient with sarcoidosis: GPT-4 describes a Black woman 81% of the time," Alsentzer explains. "While sarcoidosis is more prevalent in Black patients and in women, it’s not 81% of all patients."
Next, when GPT-4 was prompted to develop a list of 10 possible diagnoses for the NEJM Healer cases, changing the gender or race/ethnicity of the patient significantly affected its ability to prioritize the correct top diagnosis in 37% of cases.
"In some cases, GPT-4’s decision making reflects known gender and racial biases in the literature," Alsentzer said. "In the case of pulmonary embolism, the model ranked panic attack/anxiety as a more likely diagnosis for women than men. It also ranked sexually transmitted diseases, such as acute HIV and syphilis, as more likely for patients from racial minority backgrounds compared to white patients."
When asked to evaluate subjective patient traits such as honesty, understanding, and pain tolerance, GPT-4 produced significantly different responses by race, ethnicity, and gender for 23% of the questions. For example, GPT-4 was significantly more likely to rate Black male patients as abusing the opioid Percocet than it was to rate Asian, Black, Hispanic, and white female patients that way, even though the answers should have been identical for all the simulated patient cases.
Limitations of the current study include testing GPT-4's responses using a limited number of simulated prompts and analyzing model performance using only a few traditional categories of demographic identities. Future work should investigate biases using clinical notes from the electronic health record.
"While LLM-based tools are currently being deployed with a clinician in the loop to verify the model’s outputs, it is very challenging for clinicians to detect systemic biases when viewing individual patient cases," Alsentzer said. “It is critical that we perform bias evaluations for each intended use of LLMs, just as we do for other machine learning models in the medical domain. Our work can help start a conversation about GPT-4’s potential to propagate bias in clinical decision support applications.”
Authorship: Additional BWH authors include Jorge A Rodriguez, David W Bates, and Raja-Elie E Abdulnour. Additional authors include Travis Zack, Eric Lehman, Mirac Suzgun, Leo Anthony Celi, Judy Gichoya, Dan Jurafsky, Peter Szolovits, and Atul J Butte.
Disclosures: Alsentzer reports personal fees from Canopy Innovations, Fourier Health, and Xyla; and grants from Microsoft Research. Abdulnour is an employee of Massachusetts Medical Society, which owns NEJM Healer (NEJM Healer cases were used in the study). Additional author disclosures can be found in the paper.
Funding: T32 NCI Hematology/Oncology Training Fellowship; Open Philanthropy and the National Science Foundation (IIS-2128145); and a philanthropic gift from Priscilla Chan and Mark Zuckerberg.
Paper cited: Zack, T; Lehman, E et al. “Assessing the potential of GPT-4 to perpetuate racial and gender biases in healthcare: a model evaluation study.” The Lancet Digital Health. DOI: 10.1016/S2589-7500(23)00225-X
END
Study assesses GPT-4’s potential to perpetuate racial, gender biases in clinical decision making
2023-12-18