“Explainable” AI cracks secret language of sticky proteins

Researchers train AI to predict if and why proteins form sticky clumps, a mechanism linked to 50 human diseases affecting half a billion people

2025-04-30

(Press-News.org) An AI tool has made a step forward in translating the language proteins use to dictate whether they form sticky clumps similar to those linked to Alzheimer’s disease and around fifty other types of human disease. In a departure from typical “black-box” AI models, the new tool, CANYA, was designed to explain its decisions, revealing the specific chemical patterns that drive or prevent harmful protein folding.

The discovery, published today in the journal Science Advances, was possible thanks to the largest dataset on protein aggregation created to date. The study gives new insights into the molecular mechanisms underpinning sticky proteins, which are linked to diseases affecting half a billion people worldwide.

Protein clumping, or amyloid aggregation, is a health hazard that disrupts normal cell function. When certain patches in proteins stick to each other, proteins grow into dense fibrous masses that have pathological consequences.

While the study has some implications for accelerating research efforts for neurodegenerative diseases, its more immediate impact will be in biotechnology. Many drugs are proteins, and they are often hampered by unwanted clumping.

“Protein aggregation is a major headache for pharmaceutical companies,” says Dr. Benedetta Bolognesi, co-corresponding author of the study and Group Leader at the Institute for Bioengineering of Catalonia (IBEC).

“If a therapeutic protein starts aggregating, manufacturing batches can fail, costing time and money. CANYA can help guide efforts to engineer antibodies and enzymes that are less likely to stick together and reduce expensive setbacks in the process,” she adds.

Protein clumps are formed using a poorly understood language. Proteins are made of twenty different types of amino acids. Instead of the usual A, C, G, T letters that make up the language of DNA, a protein’s language has twenty different letters, different combinations of which form “words” or “motifs”.

Researchers have long sought to decipher which combinations of motifs cause clumping and which others enable proteins to fold without error. Artificial intelligence tools that treat amino acids like the alphabet of a mysterious language could help identify the precise words or motifs responsible, but the quality and volume of data about protein aggregation needed to feed models have been historically scant or restricted to very small protein fragments.

The study addressed this challenge by carrying out large-scale experiments. The authors of the study created over 100,000 completely random protein fragments, each 20 amino acids long, from scratch. The ability of each synthetic fragment to clump was tested in living yeast cells. If a particular fragment triggered clump formation, the yeast cells would grow in a certain way that could be measured by the researchers to determine cause and effect.
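To make the idea of a random fragment library concrete, here is a minimal sketch in Python. It assumes every position is drawn uniformly and independently from the 20 standard amino acids; the actual library was built experimentally, and its exact composition is not described here. The names `random_fragment` and `random_library` are hypothetical and used for illustration only.

```python
import random

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_fragment(rng, length=20):
    """Return one random peptide (assumption: uniform, independent positions)."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def random_library(n_fragments, length=20, seed=0):
    """Return a list of random peptides; the seed makes the draw repeatable."""
    rng = random.Random(seed)
    return [random_fragment(rng, length) for _ in range(n_fragments)]


# A small illustrative library; the study used more than 100,000 fragments.
library = random_library(1000)
print(library[:3])
```

In the study, each fragment's tendency to clump was then measured in yeast; nothing in this sketch models that step.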

Around one in every five protein fragments (21,936/100,000) caused clumping, while the rest did not. While previous studies might have tracked a handful of sequences, the new dataset captures a much bigger catalogue of the different protein variants which can cause amyloid aggregation.

“We created truly random protein fragments including many versions not found in nature. Evolution has explored only a fraction of all possible protein sequences, while our approach helps us peer into a much bigger galaxy of possibilities, providing lots of data points to help understand more general laws of aggregation behaviour,” explains Dr. Mike Thompson, first author of the study and postdoctoral researcher at the Centre for Genomic Regulation (CRG).

The vast amount of data generated from the experiments was used to train CANYA. The researchers decided to create it using the principles of “explainable AI”, making its decision-making processes transparent and understandable to humans. This meant sacrificing a little bit of its predictive power, which is usually higher in “black-box” AIs. Despite this, CANYA proved to be around 15% more accurate than existing models.

Specifically, CANYA is a convolution-attention model, a hybrid tool borrowing from two distinct corners of AI. Convolution models, like those used in image recognition, scan photos for features like an ear or a nose to identify a face, except in this case CANYA skims through the protein chain to find meaningful features like motifs or “words”.

Attention AI models are used by language translation tools to identify key phrases in a sentence before deciding on the best translation. The researchers incorporated this technique to help CANYA figure out which motifs matter most in the grand scheme of the entire protein.

Together, these two approaches help CANYA see local motifs up close while also spotting their bigger-picture importance. The researchers could use this information to not just predict which motifs in the protein chain encourage clumping, block it, or something in between, but also understand why.
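The combination of a convolution step and an attention step can be sketched in a few lines of Python. This is a toy illustration, not CANYA itself: the weights are random and untrained, the filter count and window width are arbitrary assumptions, and the attention shown is a simple attention-weighted pooling over windows. All names (`make_toy_model`, `predict`) are hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def make_toy_model(n_filters=4, width=3, seed=0):
    """Build random, untrained weights (illustrative assumption only)."""
    rng = random.Random(seed)

    def rand():
        return rng.uniform(-1.0, 1.0)

    return {
        "width": width,
        # filters[f][offset][aa]: weight of amino acid `aa` at `offset` in the window
        "filters": [[[rand() for _ in AMINO_ACIDS] for _ in range(width)]
                    for _ in range(n_filters)],
        "attn": [rand() for _ in range(n_filters)],  # scores each window
        "out": [rand() for _ in range(n_filters)],   # final linear layer
        "bias": rand(),
    }


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def predict(model, seq):
    """Return (clumping score in (0, 1), attention weight per window)."""
    width = model["width"]
    # Convolution: slide every filter along the chain; ReLU keeps positive matches.
    features = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        features.append([
            max(0.0, sum(filt[k][AA_INDEX[aa]] for k, aa in enumerate(window)))
            for filt in model["filters"]
        ])
    # Attention: score each window, then normalise the scores to sum to 1.
    scores = [sum(a * w for a, w in zip(acts, model["attn"])) for acts in features]
    weights = softmax(scores)
    # Pool the window features using the attention weights.
    pooled = [sum(wt * acts[f] for wt, acts in zip(weights, features))
              for f in range(len(model["out"]))]
    logit = sum(p * w for p, w in zip(pooled, model["out"])) + model["bias"]
    return 1.0 / (1.0 + math.exp(-logit)), weights


model = make_toy_model()
prob, weights = predict(model, "ACDEFGHIKLMNPQRSTVWY")
print(round(prob, 3), [round(w, 3) for w in weights])
```

The attention weights are what make a model of this shape inspectable: each weight says how much one window of the chain contributed to the final score, which is the kind of information that lets researchers ask why a prediction was made.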

For example, CANYA showed that small pockets of water-repelling amino acids are more likely to spark clumping, while some motifs have a bigger impact on clumping if they’re near the start of a protein sequence rather than at the end. The observations align with previous findings researchers have seen under the microscope in known amyloid fibrils.

But CANYA also found new rules driving protein aggregation. For instance, certain building blocks of proteins, so-called charged amino acids, are normally thought to prevent clumping. But it turns out that in the context of other specific building blocks, they can actually promote clumping.

In its current form, CANYA primarily explains protein aggregation in yes or no terms, i.e. it works as a so-called “classifier”. The researchers next want to refine the system so it can predict and compare aggregation speeds rather than just aggregation likelihood. This could help predict which protein variants form clumps quickly and which do so more slowly, a vital factor in neurodegenerative diseases where the timing of amyloid formation matters just as much as the fact that it happens at all.

“There are 1024 quintillion ways of creating a protein fragment that is 20 amino acids long. So far, we’ve trained an AI with just 100,000 fragments. We want to improve it by making more and bigger fragments. This is just the first step, but our work shows it is possible to decipher the language of protein aggregation. This is incredibly important for our understanding of human disease but also to guide synthetic biology efforts,” concludes Dr. Bolognesi.
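As a back-of-the-envelope check on the scale involved, the size of the sequence space can be computed directly. The sketch below assumes each of the 20 positions can independently hold any of the 20 amino acids, which gives 20^20 possible fragments, about 1.05 × 10^26. The variable names are illustrative.

```python
# Assumption: 20 positions, each independently one of 20 amino acids.
n_amino_acids = 20
fragment_length = 20

n_sequences = n_amino_acids ** fragment_length  # exact integer arithmetic
tested = 100_000                                # fragments in the training set

print(f"possible fragments: {n_sequences:,}")
print(f"fraction tested:    {tested / n_sequences:.2e}")
```

Under that assumption, the 100,000 tested fragments sample only a vanishingly small fraction of the possible sequences.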

“This project is a great example of how combining large-scale data generation with AI can accelerate research. It’s also a very cost-effective method to generate data,” says ICREA Research Professor Ben Lehner, co-corresponding author and Group Leader at the Centre for Genomic Regulation (CRG) and the Wellcome Sanger Institute.

“Using DNA synthesis and sequencing we can perform hundreds of thousands of experiments in a single tube, generating the data we need to train AI models. This is an approach we are applying to many difficult problems in biology. The goal is to make biology predictable and programmable,” he adds.

The study is a collaborative effort by ICREA Research Professor Ben Lehner’s lab at the Centre for Genomic Regulation (CRG) and Benedetta Bolognesi’s lab at the Institute for Bioengineering of Catalonia (IBEC). Researchers from Cold Spring Harbor Laboratory (CSHL) and the Wellcome Sanger Institute also collaborated in the study. It was funded by the “La Caixa” Research Foundation, the European Research Council and the Spanish Ministry of Science and Innovation.

END
