PRESS-NEWS.org - Press Release Distribution
Learning the language of molecules to predict their properties

This AI system only needs a small amount of data to predict molecular properties, which could speed up drug discovery and material development.

2023-07-07

Discovering new materials and drugs typically involves a manual, trial-and-error process that can take decades and cost millions of dollars. To streamline this process, scientists often use machine learning to predict molecular properties and narrow down the molecules they need to synthesize and test in the lab.

Researchers from MIT and the MIT-IBM Watson AI Lab have developed a new, unified framework that can simultaneously predict molecular properties and generate new molecules much more efficiently than popular deep-learning approaches.

To teach a machine-learning model to predict a molecule’s biological or mechanical properties, researchers must show it millions of labeled molecular structures — a process known as training. Due to the expense of discovering molecules and the challenges of hand-labeling millions of structures, large training datasets are often hard to come by, which limits the effectiveness of machine-learning approaches.

By contrast, the system created by the MIT researchers can effectively predict molecular properties using only a small amount of data. Their system has an underlying understanding of the rules that dictate how building blocks combine to produce valid molecules. These rules capture the similarities between molecular structures, which helps the system generate new molecules and predict their properties in a data-efficient manner.

This method outperformed other machine-learning approaches on both small and large datasets, and was able to accurately predict molecular properties and generate viable molecules when given a dataset with fewer than 100 samples. 

“Our goal with this project is to use some data-driven methods to speed up the discovery of new molecules, so you can train a model to do the prediction without all of these cost-heavy experiments,” says lead author Minghao Guo, an electrical engineering and computer science (EECS) graduate student.

Guo’s co-authors include MIT-IBM Watson AI Lab research staff members Veronika Thost, Payel Das, and Jie Chen; recent MIT graduates Samuel Song ’23 and Adithya Balachandran ’23; and senior author Wojciech Matusik, a professor of electrical engineering and computer science and a member of the MIT-IBM Watson AI Lab, who leads the Computational Design and Fabrication Group within the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). The research will be presented at the International Conference on Machine Learning.

Learning the language of molecules

To achieve the best results with machine-learning models, scientists need training datasets with millions of molecules that have similar properties to those they hope to discover. In reality, these domain-specific datasets are usually very small. So, researchers use models that have been pretrained on large datasets of general molecules, which they apply to a much smaller, targeted dataset. However, because these models haven’t acquired much domain-specific knowledge, they tend to perform poorly.

The MIT team took a different approach. They created a machine-learning system that automatically learns the “language” of molecules — what is known as a molecular grammar — using only a small, domain-specific dataset. It uses this grammar to construct viable molecules and predict their properties.

In language theory, one generates words, sentences, or paragraphs based on a set of grammar rules. You can think of a molecular grammar the same way. It is a set of production rules that dictate how to generate molecules or polymers by combining atoms and substructures.

Just like a language grammar, which can generate a plethora of sentences using the same rules, one molecular grammar can represent a vast number of molecules. Molecules with similar structures use the same grammar production rules, and the system learns to understand these similarities. 
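To make the analogy concrete, here is a minimal sketch of how a handful of production rules can generate many molecules. It is an illustration only: the rule set, the symbol names (MOL, CHAIN, END), and the use of SMILES-like strings are assumptions made for this example, whereas the researchers' grammar operates on molecular graphs and is learned from data.

```python
from itertools import product

# Hypothetical toy grammar (not the researchers' grammar).
# Keys are nonterminals; every other symbol is a terminal atom.
RULES = {
    "MOL": [["CHAIN", "END"]],           # a molecule is a carbon chain plus an end group
    "CHAIN": [["C"], ["C", "CHAIN"]],    # a chain is one carbon, or a carbon plus a chain
    "END": [["O"], ["N"], ["Cl"]],       # the end group is one of three atoms
}

def expand(symbol, depth):
    """Return every string derivable from `symbol` with at most `depth` nested rule applications."""
    if symbol not in RULES:
        return [symbol]                  # terminals expand to themselves
    if depth == 0:
        return []                        # out of budget: this branch yields nothing
    results = []
    for right_hand_side in RULES[symbol]:
        parts = [expand(s, depth - 1) for s in right_hand_side]
        for combo in product(*parts):
            results.append("".join(combo))
    return results

molecules = expand("MOL", 4)
print(len(molecules), molecules)
```

With only three rules and a small depth budget, this toy grammar already yields nine distinct strings, including CO, CCO, and CCCCl. Raising the depth budget grows the set without adding any new rules, which is the sense in which one grammar can represent a vast number of molecules.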

Since structurally similar molecules often have similar properties, the system uses its underlying knowledge of molecular similarity to predict properties of new molecules more efficiently.  
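The idea that shared production rules signal similar properties can be sketched with a very simple predictor. This is a simplified stand-in, not the method described in the paper: it assumes each molecule is summarized by the set of rule identifiers used to build it, measures overlap with Jaccard similarity, and predicts by similarity-weighted averaging. The rule names and property values below are invented purely for illustration.

```python
def jaccard(a, b):
    """Overlap between two collections of rule identifiers, from 0.0 to 1.0."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def predict(query_rules, labeled):
    """Similarity-weighted average of the labeled property values."""
    weighted = [(jaccard(query_rules, rules), value) for rules, value in labeled]
    total = sum(weight for weight, _ in weighted)
    if total == 0:
        # No rule overlap at all: fall back to the plain mean.
        return sum(value for _, value in labeled) / len(labeled)
    return sum(weight * value for weight, value in weighted) / total

# Invented toy data: (rules used to build the molecule, measured property).
labeled = [
    (["r1", "r2", "r3"], 10.0),
    (["r1", "r2", "r4"], 12.0),
    (["r5", "r6"], 50.0),
]

estimate = predict(["r1", "r2", "r3", "r4"], labeled)
print(estimate)
```

The query shares most of its rules with the first two molecules and none with the third, so the estimate lands between the first two values and ignores the unrelated third.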

“Once we have this grammar as a representation for all the different molecules, we can use it to boost the process of property prediction,” Guo says.

The system learns the production rules for a molecular grammar using reinforcement learning — a trial-and-error process where the model is rewarded for behavior that gets it closer to achieving a goal. 

However, because there could be billions of ways to combine atoms and substructures, learning the grammar production rules directly would be too computationally expensive for anything but the tiniest dataset.

To get around this, the researchers decoupled the molecular grammar into two parts. The first part, called a metagrammar, is a general, widely applicable grammar that they designed manually and gave the system at the outset. The system then only needs to learn a much smaller, molecule-specific grammar from the domain dataset. This hierarchical approach speeds up the learning process.
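The two-level split can be sketched as follows, under heavy simplification. The fixed part here is a hand-written function that produces generic skeletons (chains of empty slots), and the data-dependent part is just the list of fragments seen in a tiny dataset. All names, the chain-shaped skeletons, and the fragment labels are hypothetical, and the simple deduplication below stands in for the reinforcement learning the system actually uses.

```python
from itertools import product

def meta_skeletons(max_nodes):
    """Fixed, hand-designed part: generic chain skeletons of 1..max_nodes empty slots."""
    return [["SLOT"] * n for n in range(1, max_nodes + 1)]

def learn_fragments(dataset):
    """Data-dependent part (stand-in for learning): unique fragments, in order of first appearance."""
    seen = []
    for molecule in dataset:
        for fragment in molecule:
            if fragment not in seen:
                seen.append(fragment)
    return seen

def generate(max_nodes, fragments):
    """Fill every slot of every skeleton with every learned fragment."""
    candidates = []
    for skeleton in meta_skeletons(max_nodes):
        for choice in product(fragments, repeat=len(skeleton)):
            candidates.append("-".join(choice))
    return candidates

# Hypothetical two-molecule dataset, each molecule given as a list of fragments.
dataset = [["phenyl", "ester"], ["ester", "methyl"]]

fragments = learn_fragments(dataset)
candidates = generate(2, fragments)
print(fragments)
print(len(candidates))
```

Because the skeleton rules are supplied up front, the only thing taken from the data is a short fragment list, which is the point of the hierarchy: the search over the general structure is never repeated for each new dataset.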

Big results, small datasets

In experiments, the researchers’ new system simultaneously generated viable molecules and polymers, and predicted their properties more accurately than several popular machine-learning approaches, even when the domain-specific datasets had only a few hundred samples. Some other methods also required a costly pretraining step that the new system avoids. 

The technique was especially effective at predicting physical properties of polymers, such as the glass transition temperature, which is the temperature at which a polymer changes from a hard, glassy state to a soft, rubbery one. Obtaining this information manually is often very costly because the experiments require extremely high temperatures and pressures.

To push their approach further, the researchers cut one training set down by more than half — to just 94 samples. Their model still achieved results that were on par with methods trained using the entire dataset.

“This grammar-based representation is very powerful. And because the grammar itself is a very general representation, it can be deployed to different kinds of graph-form data. We are trying to identify other applications beyond chemistry or material science,” Guo says.

In the future, they also want to extend their current molecular grammar to include the 3D geometry of molecules and polymers, which is key to understanding the interactions between polymer chains. They are also developing an interface that would show a user the learned grammar production rules and solicit feedback to correct rules that may be wrong, boosting the accuracy of the system.

This work is funded, in part, by the MIT-IBM Watson AI Lab and its member company, Evonik.

###

Written by Adam Zewe


Paper: “Hierarchical Grammar-Induced Geometry for Data-Efficient Molecular Property Prediction”

https://openreview.net/pdf?id=SGQi3LgFnqj

