PRESS-NEWS.org - Press Release Distribution

Study reveals why AI models that analyze medical images can be biased

These models, which can predict a patient’s race, gender, and age, seem to use those traits as shortcuts when making medical diagnoses.

2024-06-28
(Press-News.org)

Artificial intelligence models often play a role in medical diagnoses, especially when it comes to analyzing images such as X-rays. However, studies have found that these models don’t always perform well across all demographic groups, usually faring worse on women and people of color. 

These models have also been shown to develop some surprising abilities. In 2022, MIT researchers reported that AI models can make accurate predictions about a patient’s race from their chest X-rays — something that the most skilled radiologists can’t do. 

That research team has now found that the models that are most accurate at making demographic predictions also show the biggest “fairness gaps” — that is, discrepancies in their ability to accurately diagnose images of people of different races or genders. The findings suggest that these models may be using “demographic shortcuts” when making their diagnostic evaluations, which lead to incorrect results for women, Black people, and other groups, the researchers say.

“It’s well-established that high-capacity machine-learning models are good predictors of human demographics such as self-reported race or sex or age. This paper re-demonstrates that capacity, and then links that capacity to the lack of performance across different groups, which has never been done,” says Marzyeh Ghassemi, an MIT associate professor of electrical engineering and computer science, a member of MIT’s Institute for Medical Engineering and Science, and the senior author of the study.

The researchers also found that they could retrain the models in a way that improves their fairness. However, their approach to “debiasing” worked best when the models were tested on the same types of patients they were trained on, such as patients from the same hospital. When these models were applied to patients from different hospitals, the fairness gaps reappeared.

“I think the main takeaways are, first, you should thoroughly evaluate any external models on your own data because any fairness guarantees that model developers provide on their training data may not transfer to your population. Second, whenever sufficient data is available, you should train models on your own data,” says Haoran Zhang, an MIT graduate student and one of the lead authors of the new paper. MIT graduate student Yuzhe Yang is also a lead author of the paper, which will appear in Nature Medicine. Judy Gichoya, an associate professor of radiology and imaging sciences at Emory University School of Medicine, and Dina Katabi, the Thuan and Nicole Pham Professor of Electrical Engineering and Computer Science at MIT, are also authors of the paper. 

Removing bias

As of May 2024, the FDA has approved 882 AI-enabled medical devices, with 671 of them designed to be used in radiology. Since 2022, when Ghassemi and her colleagues showed that these diagnostic models can accurately predict race, they and other researchers have shown that such models are also very good at predicting gender and age, even though the models are not trained on those tasks.

“Many popular machine learning models have superhuman demographic prediction capacity — radiologists cannot detect self-reported race from a chest X-ray,” Ghassemi says. “These are models that are good at predicting disease, but during training are learning to predict other things that may not be desirable.”

In this study, the researchers set out to explore why these models don’t work as well for certain groups. In particular, they wanted to see if the models were using demographic shortcuts to make predictions that ended up being less accurate for some groups. These shortcuts can arise in AI models when they use demographic attributes to determine whether a medical condition is present, instead of relying on other features of the images.

Using publicly available chest X-ray datasets from Beth Israel Deaconess Medical Center in Boston, the researchers trained models to predict whether patients had one of three different medical conditions: fluid buildup in the lungs, collapsed lung, or enlargement of the heart. Then, they tested the models on X-rays that were held out from the training data. 

Overall, the models performed well, but most of them displayed “fairness gaps” — that is, discrepancies between accuracy rates for men and women, and for white and Black patients. 
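To make the idea of a fairness gap concrete, the sketch below computes accuracy separately for each demographic group and reports the largest difference between any two groups. It is an illustration only: the function names and the toy data are hypothetical, and the study itself may measure the gap with a different metric than plain accuracy.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: accuracy} for parallel lists of labels, predictions, and group tags."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


def fairness_gap(y_true, y_pred, groups):
    """Largest difference in accuracy between any two groups."""
    acc = accuracy_by_group(y_true, y_pred, groups)
    return max(acc.values()) - min(acc.values())


# Made-up toy data: 1 = condition present, 0 = absent.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["M", "M", "M", "M", "F", "F", "F", "F"]

print(accuracy_by_group(y_true, y_pred, groups))  # {'M': 1.0, 'F': 0.5}
print(fairness_gap(y_true, y_pred, groups))       # 0.5
```

In this toy example the model is right on every male patient but only half of the female patients, so the gap is 0.5 even though overall accuracy is 75 percent.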

The models were also able to predict the gender, race, and age of the X-ray subjects. Additionally, there was a significant correlation between each model’s accuracy in making demographic predictions and the size of its fairness gap. This suggests that the models may be using demographic categorizations as a shortcut to make their disease predictions.
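One simple way to check for this kind of relationship across a collection of models is a Pearson correlation between each model's demographic-prediction score and its fairness gap. The sketch below assumes that approach; the article does not say which statistic the researchers used, and the numbers are invented purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical values, one entry per trained model (not from the study).
demographic_score = [0.70, 0.80, 0.90, 0.95]  # how well each model predicts a demographic trait
fairness_gaps = [0.02, 0.05, 0.06, 0.10]      # each model's gap between groups

r = pearson(demographic_score, fairness_gaps)
print(f"correlation across models: {r:.2f}")
```

A value near 1 would mean that the models best at predicting demographics also tend to have the largest gaps. A correlation alone does not prove the models rely on a shortcut, which is why the researchers describe the result as suggestive.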

The researchers then tried to reduce the fairness gaps using two types of strategies. They trained one set of models to optimize “subgroup robustness,” meaning that the models are rewarded for having better performance on the subgroup for which they have the worst performance, and penalized if their error rate for one group is higher than for the others.
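A minimal sketch of a worst-group objective follows, assuming a binary disease label and a predicted probability for each patient. It is one common way to express subgroup robustness, not necessarily the method used in the study, and all names and numbers are hypothetical.

```python
import math
from collections import defaultdict


def binary_cross_entropy(label, prob, eps=1e-12):
    """Log loss for one example; prob is the predicted probability of label 1."""
    prob = min(max(prob, eps), 1 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))


def group_losses(y_true, probs, groups):
    """Return {group: mean log loss} over the examples in each group."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for label, prob, group in zip(y_true, probs, groups):
        sums[group] += binary_cross_entropy(label, prob)
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}


def worst_group_loss(y_true, probs, groups):
    """Training objective: the loss of whichever group is currently served worst."""
    return max(group_losses(y_true, probs, groups).values())


# Made-up toy batch with two groups, A and B.
y_true = [1, 0, 1, 0]
probs = [0.9, 0.1, 0.6, 0.4]
groups = ["A", "A", "B", "B"]

print(group_losses(y_true, probs, groups))
print(worst_group_loss(y_true, probs, groups))
```

Minimizing this quantity instead of the average loss pushes training effort toward the group with the highest error, here group B, rather than letting strong performance on one group mask weak performance on another.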

The researchers forced another set of models to remove any demographic information from the images, using “group adversarial” approaches. Both of these strategies worked fairly well, the researchers found.

“For in-distribution data, you can use existing state-of-the-art methods to reduce fairness gaps without making significant trade-offs in overall performance,” Ghassemi says. “Subgroup robustness methods force models to be sensitive to mispredicting a specific group, and group adversarial methods try to remove group information completely.”

Not always fairer

However, those approaches only worked when the models were tested on data from the same types of patients that they were trained on — for example, only patients from the Beth Israel Deaconess Medical Center dataset. 

When the researchers took the models that had been “debiased” using the Beth Israel Deaconess Medical Center (BIDMC) data and tested them on patients from five other hospital datasets, they found that the models’ overall accuracy remained high, but some of them exhibited large fairness gaps.

“If you debias the model in one set of patients, that fairness does not necessarily hold as you move to a new set of patients from a different hospital in a different location,” Zhang says.

This is worrisome because hospitals often use models that have been developed on data from other hospitals, especially when an off-the-shelf model is purchased, the researchers say.

“We found that even state-of-the-art models which are optimally performant in data similar to their training sets are not optimal — that is, they do not make the best trade-off between overall and subgroup performance — in novel settings,” Ghassemi says. “Unfortunately, this is actually how a model is likely to be deployed. Most models are trained and validated with data from one hospital, or one source, and then deployed widely.”

The researchers found that the models that were debiased using group adversarial approaches showed slightly more fairness when tested on new patient groups than those debiased with subgroup robustness methods. They now plan to try to develop and test additional methods to see if they can create models that do a better job of making fair predictions on new datasets.

The findings suggest that hospitals that use these types of AI models should evaluate them on their own patient population before beginning to use them, to make sure they aren’t giving inaccurate results for certain groups.

The research was funded by a Google Research Scholar Award, the Robert Wood Johnson Foundation Harold Amos Medical Faculty Development Program, RSNA Health Disparities, the Lacuna Fund, the Gordon and Betty Moore Foundation, the National Institute of Biomedical Imaging and Bioengineering, and the National Heart, Lung, and Blood Institute.

###

Written by Anne Trafton, MIT News
