
University of Toronto Engineering study finds bigger datasets might not always be better for AI models

New research on materials science datasets shows the amount of training data can be significantly reduced with minimal impact on the performance of the model

2023-11-13
(Press-News.org) From ChatGPT to DALL-E, deep learning artificial intelligence (AI) algorithms are being applied to an ever-growing range of fields. A new study from University of Toronto Engineering researchers, published in Nature Communications, suggests that one of the fundamental assumptions of deep learning models — that they require enormous amounts of training data — may not be as solid as once thought.   

Professor Jason Hattrick-Simpers and his team are focused on the design of next-generation materials, from catalysts that convert captured carbon into fuels to non-stick surfaces that keep airplane wings ice-free.  

One of the challenges in the field is the enormous potential search space. The Open Catalyst Project, for example, contains more than 200 million data points for potential catalyst materials, yet together these still cover only a tiny portion of the vast chemical space that may hide, say, the right catalyst to help us address climate change.

“AI models can help us efficiently search this space and narrow our choices down to those families of materials that will be most promising,” says Hattrick-Simpers.  

“Traditionally, a significant amount of data is considered necessary to train accurate AI models. But a dataset like the one from the Open Catalyst Project is so large that you need very powerful supercomputers to be able to tackle it. So, there’s a question of equity; we need to find a way to identify smaller datasets that folks without access to huge amounts of computing power can train their models on.”  

But this leads to a second challenge: many of the smaller materials datasets currently available have been developed for a specific domain — for example, improving the performance of battery electrodes.  

This means that they tend to cluster around a few chemical compositions similar to those already in use today and may be missing possibilities that could be more promising, but less intuitively obvious.  

“Imagine if you wanted to build a model to predict students’ final grades based on previous test scores,” says Dr. Kangming Li, a postdoctoral fellow in Hattrick-Simpers’ lab.  

“If you trained it only on students from Canada, it might do perfectly well in that context, but it might fail to accurately predict grades for students from France or Japan. That’s the situation we are up against in the world of materials.”  

One possible way to address both challenges is to identify subsets of data within very large datasets that are easier to process, but which nevertheless retain the full range of information and diversity present in the original.

To better understand how the qualities of datasets affect the models they are used to train, Li designed methods to identify high-quality subsets of data from previously published materials datasets, such as JARVIS, The Materials Project, and the Open Quantum Materials Database (OQMD). Together, these databases contain information on more than a million different materials.  

Li built a computer model that predicted material properties and trained it in two ways: once on the full original dataset, and once on a subset of that same data that was approximately 95% smaller.

“What we found was that when trying to predict the properties of a material that was contained within the domain of the dataset, the model that had been trained on only 5% of the data performed about the same as the one that had been trained on all the data,” says Li.  

“Conversely, when trying to predict the properties of a material that was outside the domain of the dataset, both of them did similarly poorly.”  

Li says that the findings suggest a way of measuring the amount of redundancy in a given dataset: if more data does not improve model performance, it could be an indicator that those additional data are redundant and do not provide new information for the models to learn.   
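The comparison described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the study: the data are synthetic, the "material property" is a made-up function, the model is a simple straight-line fit, and the 5% subset is drawn at random rather than with the selection methods Li designed.

```python
import random


def true_property(x):
    # Hypothetical ground truth standing in for a material property.
    return x * x


def make_data(n, lo, hi, rng, noise=0.05):
    # Draw n noisy samples with inputs uniformly spread over [lo, hi].
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    ys = [true_property(x) + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    # Ordinary least-squares fit of y = slope * x + intercept.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mae(model, xs, ys):
    # Mean absolute error of the fitted line on a test set.
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)


def redundancy_check(seed=0, n_train=4000, keep_fraction=0.05):
    rng = random.Random(seed)
    # Training data and the in-domain test set share the range [0, 1].
    train_x, train_y = make_data(n_train, 0.0, 1.0, rng)
    in_x, in_y = make_data(500, 0.0, 1.0, rng)
    # The out-of-domain test set lies outside the training range.
    out_x, out_y = make_data(500, 2.0, 3.0, rng)

    # Keep a random 5% of the training data (a stand-in for a real
    # subset-selection method, which this sketch does not implement).
    keep = rng.sample(range(n_train), int(n_train * keep_fraction))
    sub_x = [train_x[i] for i in keep]
    sub_y = [train_y[i] for i in keep]

    full_model = fit_line(train_x, train_y)
    small_model = fit_line(sub_x, sub_y)
    return {
        "full_in": mae(full_model, in_x, in_y),
        "small_in": mae(small_model, in_x, in_y),
        "full_out": mae(full_model, out_x, out_y),
        "small_out": mae(small_model, out_x, out_y),
    }


if __name__ == "__main__":
    for name, value in redundancy_check().items():
        print(f"{name}: {value:.3f}")
```

In this toy setting the two models score almost the same on in-domain test points, which is the signal that the extra 95% of the data was largely redundant for this model, and both do far worse on points outside the training range.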

“Our results also reveal a concerning degree of redundancy hidden within these highly sought-after large datasets,” says Li.    

The study also underlines what AI experts from many fields are finding to be true: that even models trained on relatively small datasets can perform well if the data is of high enough quality.  

“All this grew out of the fact that in terms of using AI to speed up materials discovery, we’re just getting started,” says Hattrick-Simpers.  

“What it suggests is that as we go forward, we need to be really thoughtful about how we build our datasets. That’s true whether it’s done from the top down, as in selecting a subset of data from a much larger dataset, or from the bottom up, as in sampling new materials to include.  

“We need to pay attention to the information richness, rather than just gathering as much data as we can.” 

END



