
Could LLMs help design our next medicines and materials?

A new method lets users ask, in plain language, for a new molecule with certain properties, and receive a detailed description of how to synthesize it

2025-04-09
(Press-News.org) CAMBRIDGE, MA – The process of discovering molecules that have the properties needed to create new medicines and materials is cumbersome and expensive, consuming vast computational resources and months of human labor to narrow down the enormous space of potential candidates.

Large language models (LLMs) like ChatGPT could streamline this process, but enabling an LLM to understand and reason about the atoms and bonds that form a molecule, the same way it does with words that form sentences, has presented a scientific stumbling block.

Researchers from MIT and the MIT-IBM Watson AI Lab created a promising approach that augments an LLM with other machine-learning models known as graph-based models, which are specifically designed for generating and predicting molecular structures.

Their method employs a base LLM to interpret natural language queries specifying desired molecular properties. It automatically switches between the base LLM and graph-based AI modules to design the molecule, explain the rationale, and generate a step-by-step plan to synthesize it. It interleaves text, graph, and synthesis step generation, combining words, graphs, and reactions into a common vocabulary for the LLM to consume.

When compared to existing LLM-based approaches, this multimodal technique generated molecules that better matched user specifications and were more likely to have a valid synthesis plan, raising the retrosynthetic planning success rate from 5 percent to 35 percent.

It also outperformed LLMs that are more than 10 times its size and that design molecules and synthesis routes only with text-based representations, suggesting multimodality is key to the new system’s success.

“This could hopefully be an end-to-end solution where, from start to finish, we would automate the entire process of designing and making a molecule. If an LLM could just give you the answer in a few seconds, it would be a huge time-saver for pharmaceutical companies,” says Michael Sun, an MIT graduate student and co-author of a paper on this technique.

Sun’s co-authors include lead author Gang Liu, a graduate student at the University of Notre Dame; Wojciech Matusik, a professor of electrical engineering and computer science at MIT who leads the Computational Design and Fabrication Group within the Computer Science and Artificial Intelligence Laboratory (CSAIL); Meng Jiang, associate professor at the University of Notre Dame; and senior author Jie Chen, a senior research scientist and manager in the MIT-IBM Watson AI Lab. The research will be presented at the International Conference on Learning Representations.

Best of both worlds

Large language models aren’t built to understand the nuances of chemistry, which is one reason they struggle with inverse molecular design, a process of identifying molecular structures that have certain functions or properties.

LLMs convert text into representations called tokens, which they use to sequentially predict the next word in a sentence. But molecules are “graph structures,” composed of atoms and bonds with no particular ordering, making them difficult to encode as sequential text.

On the other hand, powerful graph-based AI models represent atoms and molecular bonds as interconnected nodes and edges in a graph. While these models are popular for inverse molecular design, they require complex inputs, can’t understand natural language, and yield results that can be difficult to interpret.
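The ordering problem described above can be illustrated with a minimal Python sketch. It is not taken from the researchers' code: the helper names (make_molecule, relabel, same_graph) are hypothetical, hydrogens are omitted, and the brute-force comparison is only workable for very small molecules. It shows that renumbering the atoms changes the sequential listing of a molecule while the underlying graph stays the same.

```python
from itertools import permutations

def make_molecule(atoms, bonds):
    """atoms: list of element symbols; bonds: iterable of (i, j, bond_order)."""
    edges = {(min(i, j), max(i, j)): order for i, j, order in bonds}
    return {"atoms": list(atoms), "bonds": edges}

def relabel(mol, perm):
    """Renumber atoms so that old index i becomes perm[i]."""
    atoms = [None] * len(mol["atoms"])
    for old, new in enumerate(perm):
        atoms[new] = mol["atoms"][old]
    bonds = [(perm[i], perm[j], order) for (i, j), order in mol["bonds"].items()]
    return make_molecule(atoms, bonds)

def same_graph(a, b):
    """Brute-force graph comparison; only sensible for tiny molecules."""
    n = len(a["atoms"])
    if n != len(b["atoms"]) or len(a["bonds"]) != len(b["bonds"]):
        return False
    return any(relabel(a, perm) == b for perm in permutations(range(n)))

# Heavy atoms of ethanol (C-C-O) and the same molecule numbered as O-C-C.
ethanol = make_molecule(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
reordered = relabel(ethanol, (2, 1, 0))

# Dimethyl ether (C-O-C) has the same atoms but different connectivity.
ether = make_molecule(["C", "O", "C"], [(0, 1, 1), (1, 2, 1)])

print(ethanol["atoms"], reordered["atoms"])  # the two listings differ
print(same_graph(ethanol, reordered))        # but the graph is the same
print(same_graph(ethanol, ether))            # a different molecule is not
```

A text-only model sees the two listings of ethanol as different sequences, whereas a graph-based model treats them as one object.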

The MIT researchers combined an LLM with graph-based AI models into a unified framework that gets the best of both worlds.

Llamole, which stands for large language model for molecular discovery, uses a base LLM as a gatekeeper to understand a user’s query — a plain-language request for a molecule with certain properties.

For instance, a user might seek a molecule that can penetrate the blood-brain barrier and inhibit HIV, and that has a molecular weight of 209 and certain bond characteristics.

As the LLM predicts text in response to the query, it switches among three graph modules.

One module uses a graph diffusion model to generate the molecular structure conditioned on input requirements. A second module uses a graph neural network to encode the generated molecular structure back into tokens for the LLM to consume. The final graph module is a graph reaction predictor that takes an intermediate molecular structure as input and predicts a reaction step, searching for the exact set of steps needed to make the molecule from basic building blocks.
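The idea of searching backward from a target molecule to basic building blocks can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the researchers' planner: the reaction table, the molecule names, and the plan function are all hypothetical, and a plain depth-first search stands in for whatever search strategy Llamole actually uses.

```python
# Hypothetical one-step reactions: product -> alternative sets of precursors.
REACTIONS = {
    "target": [["intermediate", "block_A"]],
    "intermediate": [["rare_block"], ["block_B", "block_C"]],
}

# Hypothetical purchasable starting materials.
BUILDING_BLOCKS = {"block_A", "block_B", "block_C"}

def plan(molecule, depth=5):
    """Return synthesis steps as (product, precursors) pairs, or None if no route exists."""
    if molecule in BUILDING_BLOCKS:
        return []                      # nothing to make
    if depth == 0:
        return None                    # give up on overly long routes
    for precursors in REACTIONS.get(molecule, []):
        steps = []
        for precursor in precursors:
            sub_route = plan(precursor, depth - 1)
            if sub_route is None:
                break                  # this reaction leads to a dead end
            steps.extend(sub_route)
        else:
            # Every precursor is reachable: make them first, then the product.
            return steps + [(molecule, precursors)]
    return None

route = plan("target")
for product, precursors in route:
    print(" + ".join(precursors), "->", product)
```

In this sketch a route counts as valid only when every branch ends in a building block, which matches the notion of a valid synthesis plan described in the article.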

The researchers created a new type of trigger token that tells the LLM when to activate each module. When the LLM predicts a “design” trigger token, it switches to the module that sketches a molecular structure, and when it predicts a “retro” trigger token, it switches to the retrosynthetic planning module that predicts the next reaction step.

“The beauty of this is that everything the LLM generates before activating a particular module gets fed into that module itself. The module is learning to operate in a way that is consistent with what came before,” Sun says.

In the same manner, the output of each module is encoded and fed back into the generation process of the LLM, so it understands what each module did and will continue predicting tokens based on those data.
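The control flow described above, in which trigger tokens hand generation to a module and the module's output is fed back into the text stream, can be sketched as follows. Everything here is an assumption made for illustration: the token strings, the stub modules, and the scripted token list stand in for a real LLM, a graph diffusion model, a reaction predictor, and a graph encoder, and none of the names come from Llamole's code.

```python
DESIGN, RETRO = "<design>", "<retro>"   # hypothetical trigger token strings

def design_module(context):
    # Stand-in for a graph diffusion model conditioned on the text so far.
    return {"kind": "molecule", "seen": len(context)}

def retro_module(context):
    # Stand-in for a reaction predictor that proposes one synthesis step.
    return {"kind": "reaction", "seen": len(context)}

def encode(result):
    # Stand-in for the graph encoder that maps module output back to tokens.
    return f"[{result['kind']}]"

def generate(scripted_tokens):
    """Interleave text tokens with module calls driven by trigger tokens."""
    modules = {DESIGN: design_module, RETRO: retro_module}
    context, artifacts = [], []
    for token in scripted_tokens:       # a real LLM would predict these one by one
        if token in modules:
            result = modules[token](list(context))  # module sees everything so far
            artifacts.append(result)
            context.append(encode(result))          # output is fed back as tokens
        else:
            context.append(token)
    return context, artifacts

context, artifacts = generate(
    ["Here", "is", "a", "candidate", DESIGN, "made", "by", RETRO]
)
print(" ".join(context))
print(artifacts)
```

Because each module receives a copy of the context accumulated before its trigger token, the second module call in this sketch also sees the encoded output of the first.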

Better, simpler molecular structures

In the end, Llamole outputs an image of the molecular structure, a textual description of the molecule, and a step-by-step synthesis plan that provides the details of how to make it, down to individual chemical reactions.

In experiments on designing molecules that matched user specifications, Llamole outperformed 10 standard LLMs, four fine-tuned LLMs, and a state-of-the-art domain-specific method. At the same time, it boosted the retrosynthetic planning success rate from 5 percent to 35 percent by generating higher-quality molecules, meaning molecules with simpler structures and lower-cost building blocks.

“On their own, LLMs struggle to figure out how to synthesize molecules because it requires a lot of multistep planning. Our method can generate better molecular structures that are also easier to synthesize,” Liu says.

To train and evaluate Llamole, the researchers built two datasets from scratch since existing datasets of molecular structures didn’t contain enough details. They augmented hundreds of thousands of patented molecules with AI-generated natural language descriptions and customized description templates.

The dataset they built to fine-tune the LLM includes templates related to 10 molecular properties, so one limitation of Llamole is that it is trained to design molecules considering only those 10 numerical properties.

In future work, the researchers want to generalize Llamole so it can incorporate any molecular property. In addition, they plan to improve the graph modules to boost Llamole’s retrosynthesis success rate.

And in the long run, they hope to use this approach to go beyond molecules, creating multimodal LLMs that can handle other types of graph-based data, such as interconnected sensors in a power grid or transactions in a financial market.

“Llamole demonstrates the feasibility of using large language models as an interface to complex data beyond textual description, and we anticipate them to be a foundation that interacts with other AI algorithms to solve any graph problems,” says Chen.

###

This research is funded, in part, by the MIT-IBM Watson AI Lab, the National Science Foundation, and the Office of Naval Research.
