PRESS-NEWS.org - Press Release Distribution

University of Toronto Engineering study finds bigger datasets might not always be better for AI models

New research on materials science datasets shows the amount of training data can be significantly reduced with minimal impact on the performance of the model

2023-11-13
(Press-News.org) From ChatGPT to DALL-E, deep learning artificial intelligence (AI) algorithms are being applied to an ever-growing range of fields. A new study from University of Toronto Engineering researchers, published in Nature Communications, suggests that one of the fundamental assumptions of deep learning models — that they require enormous amounts of training data — may not be as solid as once thought.   

Professor Jason Hattrick-Simpers and his team are focused on the design of next-generation materials, from catalysts that convert captured carbon into fuels to non-stick surfaces that keep airplane wings ice-free.  

One of the challenges in the field is the enormous potential search space. The Open Catalyst Project, for example, contains more than 200 million data points for potential catalyst materials, yet together they cover only a tiny portion of the vast chemical space, which may hide the right catalyst to help us address climate change.

“AI models can help us efficiently search this space and narrow our choices down to those families of materials that will be most promising,” says Hattrick-Simpers.  

“Traditionally, a significant amount of data is considered necessary to train accurate AI models. But a dataset like the one from the Open Catalyst Project is so large that you need very powerful supercomputers to be able to tackle it. So, there’s a question of equity; we need to find a way to identify smaller datasets that folks without access to huge amounts of computing power can train their models on.”  

But this leads to a second challenge: many of the smaller materials datasets currently available have been developed for a specific domain — for example, improving the performance of battery electrodes.  

This means that they tend to cluster around a few chemical compositions similar to those already in use today and may be missing possibilities that could be more promising, but less intuitively obvious.  

“Imagine if you wanted to build a model to predict students’ final grades based on previous test scores,” says Dr. Kangming Li, a postdoctoral fellow in Hattrick-Simpers’ lab.  

“If you trained it only on students from Canada, it might do perfectly well in that context, but it might fail to accurately predict grades for students from France or Japan. That’s the situation we are up against in the world of materials.”  

One possible way to address both challenges is to identify subsets of data within very large datasets that are easier to process but nevertheless retain the full range of information and diversity present in the original.

To better understand how the qualities of datasets affect the models they are used to train, Li designed methods to identify high-quality subsets of data from previously published materials datasets, such as JARVIS, The Materials Project, and the Open Quantum Materials Database (OQMD). Together, these databases contain information on more than a million different materials.  

Li built a computer model that predicted material properties and trained it in two ways: once on the original dataset, and once on a subset of that same data that was approximately 95% smaller.

“What we found was that when trying to predict the properties of a material that was contained within the domain of the dataset, the model that had been trained on only 5% of the data performed about the same as the one that had been trained on all the data,” says Li.  

“Conversely, when trying to predict the properties of a material that was outside the domain of the dataset, both of them did similarly poorly.”  
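
The comparison Li describes can be illustrated with a toy experiment. The sketch below is not the study's code: the data are synthetic, the one-feature `target` function is hypothetical, the model is a simple nearest-neighbour average, and the 5% subset is drawn at random rather than with the selection methods Li designed. It only shows the shape of the experiment: train on everything, train on 5%, then score both inside and outside the range the training data covers.

```python
import heapq
import random


def target(x):
    """Hypothetical ground-truth property as a function of one feature."""
    return x * x


def make_data(rng, n, lo, hi, noise=0.05):
    """Draw n noisy (x, y) pairs with x uniform in [lo, hi]."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    return [(x, target(x) + rng.gauss(0.0, noise)) for x in xs]


def knn_predict(train, x, k=5):
    """Average the labels of the k training points nearest to x."""
    nearest = heapq.nsmallest(k, train, key=lambda p: abs(p[0] - x))
    return sum(y for _, y in nearest) / len(nearest)


def mean_abs_error(train, test_xs):
    """Mean absolute error of the k-NN model against the noise-free target."""
    errors = [abs(knn_predict(train, x) - target(x)) for x in test_xs]
    return sum(errors) / len(errors)


rng = random.Random(0)

# Stand-in for the original dataset: features confined to [0, 1].
full = make_data(rng, 2000, 0.0, 1.0)
# Assumption: a uniformly random 5% sample stands in for the curated subset.
subset = rng.sample(full, len(full) // 20)

# Test points inside the training range, and well outside it.
in_domain = [rng.uniform(0.0, 1.0) for _ in range(200)]
out_of_domain = [rng.uniform(2.0, 3.0) for _ in range(200)]

results = {
    "full data, in domain": mean_abs_error(full, in_domain),
    "5% subset, in domain": mean_abs_error(subset, in_domain),
    "full data, out of domain": mean_abs_error(full, out_of_domain),
    "5% subset, out of domain": mean_abs_error(subset, out_of_domain),
}

for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

In this toy setting both models should score similarly on the in-domain points and both should miss badly on the out-of-domain points, because neither has seen any example from that range. Real materials datasets are far higher-dimensional, so the numbers here say nothing about the study's actual results.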

Li says that the findings suggest a way of measuring the amount of redundancy in a given dataset: if more data does not improve model performance, it could be an indicator that those additional data are redundant and do not provide new information for the models to learn.   
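
One way to picture this kind of redundancy check is a learning curve: train on progressively larger nested subsets and note where the test error stops improving. The following is a minimal sketch under stated assumptions, not the metric used in the study. The data are synthetic, the model is a one-variable least-squares line, and the 5% `tolerance` that defines "no improvement" is an arbitrary illustrative choice.

```python
import math
import random


def fit_line(points):
    """Ordinary least squares for y = a*x + b, in closed form."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def rmse(model, points):
    """Root-mean-square error of the fitted line on held-out points."""
    a, b = model
    return math.sqrt(sum((a * x + b - y) ** 2 for x, y in points) / len(points))


def make_data(rng, n, noise=0.1):
    """Synthetic (x, y) pairs from a hypothetical linear relation plus noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        data.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, noise)))
    return data


rng = random.Random(0)
train = make_data(rng, 5000)
test = make_data(rng, 1000)
rng.shuffle(train)  # prefixes of the shuffled list are nested random subsets

# Learning curve: test error as a function of the fraction of data used.
fractions = [0.01, 0.05, 0.25, 1.0]
curve = []
for frac in fractions:
    n = max(2, int(frac * len(train)))
    curve.append((frac, rmse(fit_line(train[:n]), test)))

full_error = curve[-1][1]
tolerance = 0.05  # assumption: within 5% of the full-data error counts as "no worse"

# Smallest fraction that matches the full-data error within the tolerance.
smallest = min(frac for frac, err in curve if err <= full_error * (1 + tolerance))
redundancy = 1.0 - smallest

for frac, err in curve:
    print(f"fraction {frac:.2f}: test RMSE {err:.4f}")
print(f"estimated redundant share of the data: {redundancy:.0%}")
```

The estimate depends on the chosen model, the test set and the tolerance, so it is best read as a rough indicator in the spirit of Li's remark rather than as a property of the dataset alone.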

“Our results also reveal a concerning degree of redundancy hidden within these highly sought-after large datasets,” says Li.    

The study also underlines what AI experts from many fields are finding to be true: that even models trained on relatively small datasets can perform well if the data is of high enough quality.  

“All this grew out of the fact that in terms of using AI to speed up materials discovery, we’re just getting started,” says Hattrick-Simpers.  

“What it suggests is that as we go forward, we need to be really thoughtful about how we build our datasets. That’s true whether it’s done from the top down, as in selecting a subset of data from a much larger dataset, or from the bottom up, as in sampling new materials to include.  

“We need to pay attention to the information richness, rather than just gathering as much data as we can.” 

END

