
How to assess a general-purpose AI model’s reliability before it’s deployed

A new technique enables users to compare several large models and choose the one that works best for their task

2024-07-16
(Press-News.org) CAMBRIDGE, MA — Foundation models are massive deep-learning models that have been pretrained on an enormous amount of general-purpose, unlabeled data. They can be applied to a variety of tasks, like generating images or answering customer questions.

But these models, which serve as the backbone for powerful artificial intelligence tools like ChatGPT and DALL-E, can offer up incorrect or misleading information. In a safety-critical situation, such as a pedestrian approaching a self-driving car, these mistakes could have serious consequences.

To help prevent such mistakes, researchers from MIT and the MIT-IBM Watson AI Lab developed a technique to estimate the reliability of foundation models before they are deployed for a specific task.

They do this by considering a set of foundation models that are slightly different from one another. Then they use their algorithm to assess the consistency of the representations each model learns about the same test data point. If the representations are consistent, it means the model is reliable.

When they compared their technique to state-of-the-art baseline methods, it was better at capturing the reliability of foundation models on a variety of downstream classification tasks.

Someone could use this technique to decide if a model should be applied in a certain setting, without the need to test it on a real-world dataset. This could be especially useful when datasets may not be accessible due to privacy concerns, like in health care settings. In addition, the technique could be used to rank models based on reliability scores, enabling a user to select the best one for their task.

“All models can be wrong, but models that know when they are wrong are more useful. The problem of quantifying uncertainty or reliability is more challenging for these foundation models because their abstract representations are difficult to compare. Our method allows one to quantify how reliable a representation model is for any given input data,” says senior author Navid Azizan, the Esther and Harold E. Edgerton Assistant Professor in the MIT Department of Mechanical Engineering and the Institute for Data, Systems, and Society (IDSS), and a member of the Laboratory for Information and Decision Systems (LIDS).

He is joined on a paper about the work by lead author Young-Jin Park, a LIDS graduate student; Hao Wang, a research scientist at the MIT-IBM Watson AI Lab; and Shervin Ardeshir, a senior research scientist at Netflix. The paper will be presented at the Conference on Uncertainty in Artificial Intelligence.

Measuring consensus

Traditional machine-learning models are trained to perform a specific task. These models typically make a concrete prediction based on an input. For instance, the model might tell you whether a certain image contains a cat or a dog. In this case, assessing reliability could be a matter of looking at the final prediction to see if the model is right.

But foundation models are different. The model is pretrained using general data, in a setting where its creators don’t know all downstream tasks it will be applied to. Users adapt it to their specific tasks after it has already been trained.

Unlike traditional machine-learning models, foundation models don’t give concrete outputs like “cat” or “dog” labels. Instead, they generate an abstract representation based on an input data point.

To assess the reliability of a foundation model, the researchers used an ensemble approach, training several models that share many properties but differ slightly from one another.

“Our idea is like measuring the consensus. If all those foundation models are giving consistent representations for any data in our dataset, then we can say this model is reliable,” Park says.

But they ran into a problem: How could they compare abstract representations?

“These models just output a vector, comprised of some numbers, so we can’t compare them easily,” he adds.

They solved this problem using an idea called neighborhood consistency.

For their approach, the researchers prepare a set of reliable reference points to test on the ensemble of models. Then, for each model, they investigate the reference points located near that model’s representation of the test point.

By looking at the consistency of neighboring points, they can estimate the reliability of the models.
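
The idea can be illustrated with a minimal sketch. This is not the researchers' implementation: the function names, the use of cosine similarity, the choice of k nearest neighbors, and the Jaccard overlap used as the consistency score are all assumptions made for illustration. Each model embeds the same reference points and the same test point; the score measures how much the models agree on which reference points sit closest to the test point.

```python
import numpy as np

def knn_indices(query, refs, k):
    """Return the indices of the k reference points most cosine-similar to query.

    Hypothetical helper: cosine similarity and a fixed k are illustrative choices.
    """
    q = query / np.linalg.norm(query)
    r = refs / np.linalg.norm(refs, axis=1, keepdims=True)
    sims = r @ q
    return set(np.argsort(-sims)[:k].tolist())

def neighborhood_consistency(test_reprs, ref_reprs, k=5):
    """Mean pairwise Jaccard overlap of neighbor sets across an ensemble.

    test_reprs: one 1-D array per model (dimensions may differ between models).
    ref_reprs:  one 2-D array per model, shape (n_refs, dim), with the
                reference points listed in the same order for every model.
    Returns a value in [0, 1]; higher means the models agree more.
    """
    neighbor_sets = [knn_indices(t, r, k) for t, r in zip(test_reprs, ref_reprs)]
    overlaps = []
    for i in range(len(neighbor_sets)):
        for j in range(i + 1, len(neighbor_sets)):
            a, b = neighbor_sets[i], neighbor_sets[j]
            overlaps.append(len(a & b) / len(a | b))
    return float(np.mean(overlaps))

# Toy data: model A uses 2-D vectors, model B uses 3-D vectors.
refs_a = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
test_a = np.array([1.0, 0.05])
refs_b = np.array([[0.0, 0.0, 1.0], [0.0, 0.1, 0.9], [1.0, 0.0, 0.0], [0.0, -1.0, 0.0]])
test_b = np.array([0.0, 0.05, 1.0])

score = neighborhood_consistency([test_a, test_b], [refs_a, refs_b], k=2)
print(score)
```

Because the score is built from the indices of neighboring reference points rather than from the raw vectors, the two models do not need to share a coordinate system or even a dimensionality.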

Aligning the representations

Foundation models map data points to what is known as a representation space. One way to think about this space is as a sphere. Each model maps similar data points to the same part of its sphere, so images of cats go in one place and images of dogs go in another.

But each model would map animals differently in its own sphere, so while cats may be grouped near the South Pole of one sphere, another model could map cats somewhere in the Northern Hemisphere.

The researchers use the neighboring points like anchors to align those spheres so they can make the representations comparable. If a data point’s neighbors are consistent across multiple representations, then one should be confident about the reliability of the model’s output for that point.
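
A small sketch shows why neighbors make useful anchors. It assumes, purely for illustration, that a second model's representation space is a rotated copy of the first model's space; real models differ in more complicated ways, and none of the names below come from the paper. Comparing the two models' vectors for the test point directly is not meaningful under this setup, yet both models pick out the same neighboring reference points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Model A: 50 reference points and one test point in an 8-D space (synthetic).
refs_a = rng.normal(size=(50, 8))
test_a = rng.normal(size=8)

# Model B: the same geometry, rotated by a random orthogonal matrix.
# (QR decomposition of a random matrix yields an orthogonal factor q.)
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
refs_b = refs_a @ q
test_b = test_a @ q

def top_k(query, refs, k=5):
    """Indices of the k references most cosine-similar to query."""
    sims = (refs @ query) / (np.linalg.norm(refs, axis=1) * np.linalg.norm(query))
    return set(np.argsort(-sims)[:k].tolist())

# Direct comparison of the two test vectors depends on the arbitrary rotation.
raw_cosine = float(test_a @ test_b / (np.linalg.norm(test_a) * np.linalg.norm(test_b)))

# Neighbor sets are unaffected, because rotation preserves angles between points.
same_neighbors = top_k(test_a, refs_a) == top_k(test_b, refs_b)

print("cosine between raw test vectors:", raw_cosine)
print("same neighboring references:", same_neighbors)
```

In this toy setting the neighbor sets match exactly. With independently trained models, the agreement would be partial, and the degree of overlap is what a consistency score would measure.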

When they tested this approach on a wide range of classification tasks, they found that it was much more consistent than baselines. Plus, it wasn’t tripped up by challenging test points that caused other methods to fail.

Moreover, their approach can be used to assess reliability for any input data, so one could evaluate how well a model works for a particular type of individual, such as a patient with certain characteristics.

“Even if the models all have average performance overall, from an individual point of view, you’d prefer the one that works best for that individual,” Wang says.

However, one limitation comes from the fact that they must train an ensemble of foundation models, which is computationally expensive. In the future, they plan to find more efficient ways to build multiple models, perhaps by using small perturbations of a single model.


This work is funded, in part, by the MIT-IBM Watson AI Lab, MathWorks, and Amazon.
