The search space for protein engineering grows exponentially with sequence length. A protein of just 100 amino acids has 20^100 possible sequences, more combinations than there are atoms in the observable universe. Traditional engineering methods might test hundreds of variants but limit exploration to narrow regions of the sequence space. Recent machine learning approaches enable broader searches through computational screening; however, these approaches still require tens of thousands of measurements or 5-10 iterative rounds.
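The size comparison above can be checked with a few lines of arithmetic. This is a minimal sketch; the figure of roughly 10^80 atoms in the observable universe is a commonly quoted order-of-magnitude estimate and is an assumption here, not a number from the study.

```python
import math

SEQUENCE_LENGTH = 100          # residues, as in the example above
ALPHABET_SIZE = 20             # standard amino acids
ATOMS_LOG10_ESTIMATE = 80      # assumed order-of-magnitude estimate, not from the study


def log10_sequence_space(length: int, alphabet: int = ALPHABET_SIZE) -> float:
    """Return log10 of the number of possible sequences of the given length."""
    return length * math.log10(alphabet)


log10_variants = log10_sequence_space(SEQUENCE_LENGTH)
print(f"20^100 is about 10^{log10_variants:.1f}")
print(f"That exceeds the atom estimate by about 10^{log10_variants - ATOMS_LOG10_ESTIMATE:.1f}")
```

Working in log10 keeps the numbers readable; Python integers could also hold 20**100 exactly.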
With the advent of foundational protein models, the bottleneck for protein engineering swings back to the lab: for a single protein engineering campaign, we can only efficiently build and test hundreds of variants. What is the best way to choose those hundreds to most effectively uncover an evolved protein with substantially increased function? To address this problem, we developed MULTI-evolve, a framework for efficient protein evolution that applies machine learning models trained on datasets of ~200 variants focused specifically on pairs of function-enhancing mutations.
Published today in Science, this work represents Arc Institute's first lab-in-the-loop framework for biological design, where computational prediction and experimental design are tightly integrated from the outset, reflecting our broader investment in AI-guided research.
Learning from pairwise interactions
Evolving proteins involves two fundamental steps: finding beneficial mutations, then combining them synergistically. Early in developing this approach, we realized that neural networks trained on single-mutant data alone couldn't reliably predict which multi-mutant combinations would work, because those models lack information about how mutations interact. Most large datasets of random variants aren't useful either: the vast majority of mutations don't enhance function, so testing thousands of random variants teaches models mostly about what doesn't work.
Our insight was to focus on quality over quantity. First identify ~15-20 function-enhancing mutations (using protein language models or experimental screens), then systematically test all pairwise combinations of those beneficial mutations. This generates ~100-200 measurements, and every one is informative for learning beneficial epistatic interactions.
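The library size follows directly from counting pairs. The sketch below enumerates single and pairwise double mutants for a set of placeholder mutation labels (M1, M2, ...), which are hypothetical and stand in for real function-enhancing mutations. It assumes every pair can be combined; two mutations at the same residue position could not be, which would reduce the count slightly.

```python
from itertools import combinations


def pairwise_library(mutations):
    """Return (singles, doubles): every single mutant and every pairwise double mutant."""
    singles = [(m,) for m in mutations]
    doubles = list(combinations(mutations, 2))
    return singles, doubles


for n in (15, 20):
    # Placeholder labels; a real campaign would use mutations such as A134P.
    labels = [f"M{i + 1}" for i in range(n)]
    singles, doubles = pairwise_library(labels)
    print(f"{n} mutations: {len(singles)} singles, {len(doubles)} doubles")
```

For 15 and 20 starting mutations this gives 105 and 190 double mutants, consistent with the ~100-200 measurements quoted above.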
We validated this computationally using 12 existing protein datasets from published studies. Training neural networks on only the single and double mutants, we found models could accurately predict complex multi-mutants (variants with 3-12 mutations) across all 12 diverse protein families. This result held even when we reduced training data to just 10% of what was available.
Training on double mutants works because they reveal epistasis. A double mutant might perform better than the sum of its parts (synergy), worse than expected (antagonism), or exactly as predicted (additivity). These pairwise interaction patterns teach models the rules for how mutations combine, enabling extrapolation to predict which 5-, 6-, or 7-mutation combinations will work synergistically.
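One common way to make these three cases concrete is to compare the measured double mutant with an additive expectation on a log scale. The sketch below does that under stated assumptions: activities are expressed as fold change over wild type, effects are treated as additive in log2 space, and the tolerance `tol` is an arbitrary illustrative threshold. The study's exact definition may differ, and the input numbers are made up for illustration.

```python
import math


def epistasis(fold_a: float, fold_b: float, fold_ab: float) -> float:
    """Deviation of the double mutant from the additive expectation, in log2 units."""
    expected = math.log2(fold_a) + math.log2(fold_b)
    return math.log2(fold_ab) - expected


def classify(eps: float, tol: float = 0.1) -> str:
    """Label an epistasis value; tol is an assumed noise tolerance."""
    if eps > tol:
        return "synergy"
    if eps < -tol:
        return "antagonism"
    return "additivity"


# Illustrative fold changes only: two singles at 2x and 3x, three possible doubles.
for fold_ab in (12.0, 6.0, 3.0):
    eps = epistasis(2.0, 3.0, fold_ab)
    print(f"double mutant at {fold_ab}x: epistasis {eps:+.2f} -> {classify(eps)}")
```

A model that sees many such pairwise deviations has direct evidence about which mutations reinforce or interfere with each other.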
We then applied MULTI-evolve to three new proteins: APEX (up to 256-fold improvement over wild-type, 4.8-fold beyond already-optimized APEX2), dCasRx for trans-splicing (up to 9.8-fold improvement), and an anti-CD122 antibody (2.7-fold binding improvement to 1.0 nM, 6.5-fold expression increase). For dCasRx, we started with a deep mutational scan of >11,000 variants, extracted only the function-enhancing mutations, and tested their pairwise combinations—demonstrating the value of strategic data curation for efficient engineering.
Each required experimentally testing only ~100-200 variants in a single round to train models that accurately predicted complex multi-mutants, compressing what traditionally takes 5-10 iterative cycles over many months into weeks.
The MULTI-evolve loop
MULTI-evolve integrates three innovations into an end-to-end framework.
1. Combining protein language models enables effective mutation discovery
While single mutations can improve protein function, substantial improvements in function require combining several mutations. Previous work has demonstrated the ability of protein language model zero-shot methods to predict which mutations might improve function, but any individual method identifies few mutations for generating higher-order combinatorial variants.
To identify many function-enhancing mutations, our solution was to combine predictions from several different models, some analyzing protein sequence, others 3D structure, with two scoring methods. Testing this across 73 diverse protein datasets, we found our approach identified ~20 beneficial mutations on average, compared to ~11 from any single model.
When we applied this to APEX, we identified the A134P mutation, which improves activity 53-fold. Standard protein language model-based methods systematically missed it because they penalize proline substitutions. One of our ensemble scoring strategies involves normalizing amino acid specific biases, like this bias against proline substitutions, allowing A134P to emerge as a candidate when it otherwise would have been overlooked.
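To illustrate how bias normalization can surface a penalized substitution, the sketch below z-scores raw model scores within each substituted amino acid. This is an assumption for illustration only: the release does not specify the normalization used in MULTI-evolve, and the mutation labels and scores here are toy values, not data from the study.

```python
from collections import defaultdict
from statistics import mean, pstdev


def target_residue(mutation: str) -> str:
    """Return the substituted amino acid, e.g. 'P' for 'A10P'."""
    return mutation[-1]


def normalize_by_target(scores: dict) -> dict:
    """Z-score each mutation's score against others with the same substituted residue."""
    groups = defaultdict(list)
    for mutation, score in scores.items():
        groups[target_residue(mutation)].append(score)
    normalized = {}
    for mutation, score in scores.items():
        values = groups[target_residue(mutation)]
        spread = pstdev(values)
        normalized[mutation] = (score - mean(values)) / spread if spread > 0 else 0.0
    return normalized


# Toy scores (higher is better); proline substitutions are penalized across the board.
raw = {"A10P": -4.0, "G25P": -8.0, "S40P": -9.0,
       "K12R": -1.0, "E30R": -1.5, "T50R": -2.0}
norm = normalize_by_target(raw)
print("best raw:", max(raw, key=raw.get))
print("best normalized:", max(norm, key=norm.get))
```

In this toy example the proline substitution ranks fourth on raw score but first after normalization, because it stands out relative to other proline substitutions.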
2. Neural networks predict which combinations will work best
Our next step was to determine, given a set of beneficial single mutants and their pairwise double mutants, the most effective way to combine them into multi-mutant variants with up to 7 mutations.
Through computational benchmarking, we demonstrate that fully connected neural networks can reliably predict the activity of multi-mutants by training primarily on single and double mutants. Across 12 diverse protein datasets, our models correctly identified top performers more than half the time.
In practice, we demonstrate that MULTI-evolve can identify hyperactive variants with up to 7 mutations across 3 distinct proteins. We engineer multi-mutant variants with a single round of machine learning, where models are trained on a compact training set of ~200 strategic variants, and we experimentally test as few as 9 proposed candidates.
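The selection step can be sketched with a much simpler stand-in for the neural network. The code below is not the fully connected model described above, whose architecture the release does not give; it is a pairwise-additive baseline that predicts a multi-mutant's log activity as the sum of single-mutant effects plus pairwise epistasis terms learned from double mutants, then ranks all combinations of 3 to 7 mutations and keeps the top 9. Mutation labels and effect values are hypothetical.

```python
from itertools import combinations


def fit_pairwise_terms(single_effects, double_effects):
    """Epistasis per pair: measured double effect minus the sum of its single effects."""
    return {pair: effect - sum(single_effects[m] for m in pair)
            for pair, effect in double_effects.items()}


def predict(variant, single_effects, pair_terms):
    """Predicted log activity: additive single effects plus known pairwise terms."""
    additive = sum(single_effects[m] for m in variant)
    pairwise = sum(pair_terms.get(frozenset(p), 0.0)
                   for p in combinations(variant, 2))
    return additive + pairwise


def top_candidates(mutations, single_effects, pair_terms,
                   min_size=3, max_size=7, k=9):
    """Score every combination of min_size..max_size mutations and return the best k."""
    scored = [(predict(v, single_effects, pair_terms), v)
              for size in range(min_size, max_size + 1)
              for v in combinations(mutations, size)]
    scored.sort(reverse=True)
    return scored[:k]


# Hypothetical log-scale effects relative to wild type.
singles = {"M1": 0.5, "M2": 0.4, "M3": 0.3, "M4": 0.2}
doubles = {frozenset(("M1", "M2")): -0.1,   # antagonistic pair
           frozenset(("M3", "M4")): 1.0,    # synergistic pair
           frozenset(("M1", "M3")): 0.8}    # roughly additive pair
terms = fit_pairwise_terms(singles, doubles)
best = top_candidates(list(singles), singles, terms)
print(best[0])
```

Here the top-ranked variant avoids the antagonistic M1/M2 pair and includes the synergistic M3/M4 pair, which is the kind of choice single-mutant data alone cannot inform.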
3. The MULTI-assembly method enables rapid synthesis
Another bottleneck is building and testing predicted variants. Commercial DNA synthesis is expensive and slow, especially for complex multi-mutants. Existing lab methods for multi-site mutagenesis have low efficiency and subjective oligo design that can make results unreliable.
To address this, we developed MULTI-assembly, a multi-site mutagenesis method that builds complex variants efficiently. By systematically optimizing reaction conditions, oligonucleotide designs, and assembly parameters, we achieved 40-70% assembly efficiency for variants with up to 9 mutations across several kilobases. We also developed a computational oligo designer that takes your target mutations as input and outputs primers optimized for efficient assembly. All of this can be done in days rather than weeks.
Try MULTI-evolve yourself
The MULTI-evolve framework is modular and will improve as the field advances. Better protein language models will enhance mutation discovery, and the approach integrates naturally with other design tools, refining computationally designed proteins or optimizing therapeutic candidates.
We've made MULTI-evolve available as an open-source tool that handles protein language model predictions, neural network training, and MULTI-assembly oligo design. Whether you're working on enzymes, genome editors, or therapeutic proteins, the framework provides a systematic path from initial mutations to optimized multi-mutants.
We're excited to see how the community applies MULTI-evolve to their protein engineering challenges. If you have questions about applying this to your work, please reach out.
Tran, V.Q., Nemeth, M., Bartie, L.J., Chandrasekaran, S.S., Fanton, A., Moon, H.C., Hie, B.L., Konermann, S., & Hsu, P.D. (2026). Rapid directed evolution guided by protein language models and epistatic interactions. Science. https://doi.org/10.1126/science.aea1820
Arc Institute is an independent nonprofit research organization based in Palo Alto, California, that aims to accelerate scientific progress and understand the root causes of complex diseases. Arc’s investigators are supported by long-term funding and freedom to pursue bold ideas. Its Technology Centers leverage multi-omics, genome engineering, and cellular, mammalian and computational models to advance discoveries at the intersection of biology and artificial intelligence. Founded in 2021, Arc partners with Stanford, UC Berkeley, and UCSF.
MULTI-evolve: Rapid evolution of complex multi-mutant proteins
A data-efficient lab-in-the-loop framework for machine-learning guided protein engineering
2026-02-19