PRESS-NEWS.org - Press Release Distribution

University of Toronto Engineering study finds bigger datasets might not always be better for AI models

New research on materials science datasets shows the amount of training data can be significantly reduced with minimal impact on the performance of the model

2023-11-13
(Press-News.org) From ChatGPT to DALL-E, deep learning artificial intelligence (AI) algorithms are being applied to an ever-growing range of fields. A new study from University of Toronto Engineering researchers, published in Nature Communications, suggests that one of the fundamental assumptions of deep learning models — that they require enormous amounts of training data — may not be as solid as once thought.   

Professor Jason Hattrick-Simpers and his team are focused on the design of next-generation materials, from catalysts that convert captured carbon into fuels to non-stick surfaces that keep airplane wings ice-free.  

One of the challenges in the field is the enormous potential search space. For example, the Open Catalyst Project contains more than 200 million data points for potential catalyst materials, all of which still cover only a tiny portion of the vast chemical space that may, for example, hide the right catalyst to help us address climate change.  

“AI models can help us efficiently search this space and narrow our choices down to those families of materials that will be most promising,” says Hattrick-Simpers.  

“Traditionally, a significant amount of data is considered necessary to train accurate AI models. But a dataset like the one from the Open Catalyst Project is so large that you need very powerful supercomputers to be able to tackle it. So, there’s a question of equity; we need to find a way to identify smaller datasets that folks without access to huge amounts of computing power can train their models on.”  

But this leads to a second challenge: many of the smaller materials datasets currently available have been developed for a specific domain — for example, improving the performance of battery electrodes.  

This means that they tend to cluster around a few chemical compositions similar to those already in use today and may be missing possibilities that could be more promising, but less intuitively obvious.  

“Imagine if you wanted to build a model to predict students’ final grades based on previous test scores,” says Dr. Kangming Li, a postdoctoral fellow in Hattrick-Simpers’ lab.  

“If you trained it only on students from Canada, it might do perfectly well in that context, but it might fail to accurately predict grades for students from France or Japan. That’s the situation we are up against in the world of materials.”  

One possible solution to both challenges is to identify subsets of data within very large datasets that are easier to process but nevertheless retain the full range of information and diversity present in the original.

To better understand how the qualities of datasets affect the models they are used to train, Li designed methods to identify high-quality subsets of data from previously published materials datasets, such as JARVIS, The Materials Project, and the Open Quantum Materials Database (OQMD). Together, these databases contain information on more than a million different materials.  

Li built a computer model that predicts material properties and trained it in two ways: once on the original dataset, and once on a subset of that same data that was approximately 95% smaller.

“What we found was that when trying to predict the properties of a material that was contained within the domain of the dataset, the model that had been trained on only 5% of the data performed about the same as the one that had been trained on all the data,” says Li.  

“Conversely, when trying to predict the properties of a material that was outside the domain of the dataset, both of them did similarly poorly.”  

Li says that the findings suggest a way of measuring the amount of redundancy in a given dataset: if more data does not improve model performance, it could be an indicator that those additional data are redundant and do not provide new information for the models to learn.   
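The comparison described above can be illustrated with a toy sketch. This is not the method or the data used in the study: the synthetic sine-curve data, the straight-line model, the random 5% subset (standing in for the study's subset-selection methods) and all function names here are assumptions made purely for illustration. It trains one model on a full dataset and one on 5% of it, then scores both inside and outside the training domain.

```python
import math
import random


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, points):
    """Mean absolute error of a (slope, intercept) model on the given points."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in points) / len(points)


def sample(rng, n, lo, hi, noise=0.05):
    """Draw n noisy points from a hypothetical 'property' curve y = sin(x)."""
    pts = []
    for _ in range(n):
        x = rng.uniform(lo, hi)
        pts.append((x, math.sin(x) + rng.gauss(0.0, noise)))
    return pts


rng = random.Random(0)  # fixed seed so the run is repeatable

# Training data covers only the domain x in [0, 1].
train_full = sample(rng, 2000, 0.0, 1.0)
# Assumption: a random 5% subset stands in for a carefully chosen one.
train_subset = rng.sample(train_full, 100)

test_in = sample(rng, 500, 0.0, 1.0)   # inside the training domain
test_out = sample(rng, 500, 3.0, 4.0)  # outside the training domain

model_full = fit_line(train_full)
model_subset = fit_line(train_subset)

results = {
    "in_full": mean_abs_error(model_full, test_in),
    "in_subset": mean_abs_error(model_subset, test_in),
    "out_full": mean_abs_error(model_full, test_out),
    "out_subset": mean_abs_error(model_subset, test_out),
}

for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

In this toy setting, the two in-domain errors should come out close to each other, which is the signal of redundancy Li describes: the extra 95% of the data adds little the model can use. Both out-of-domain errors should be far larger, because neither training set contains any information about that region.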

“Our results also reveal a concerning degree of redundancy hidden within these highly sought-after large datasets,” says Li.    

The study also underlines what AI experts from many fields are finding to be true: that even models trained on relatively small datasets can perform well if the data is of high enough quality.  

“All this grew out of the fact that in terms of using AI to speed up materials discovery, we’re just getting started,” says Hattrick-Simpers.  

“What it suggests is that as we go forward, we need to be really thoughtful about how we build our datasets. That’s true whether it’s done from the top down, as in selecting a subset of data from a much larger dataset, or from the bottom up, as in sampling new materials to include.  

“We need to pay attention to the information richness, rather than just gathering as much data as we can.” 

END

