MULTI-evolve Compresses Months of Protein Engineering Into Weeks With 200 Strategic Tests
The challenge in protein engineering is not finding mutations that improve a protein's function - it is finding combinations of mutations that work together synergistically. A protein of 100 amino acids has 20 raised to the 100th power possible variants, more combinations than atoms in the observable universe. Traditional approaches test hundreds of variants but explore only a tiny corner of that space. Machine learning can scan far more territory computationally, but existing methods still require tens of thousands of experimental measurements or five to ten iterative rounds - work that stretches across months.
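The size comparison is easy to check. It uses the commonly cited order-of-magnitude estimate of about 10^80 atoms in the observable universe:

```latex
20^{100} = 2^{100} \times 10^{100} \approx 1.27 \times 10^{130} \gg 10^{80}
```

The sequence space for a 100-residue protein therefore exceeds that atom count by roughly fifty orders of magnitude.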
Researchers at Arc Institute have now built a framework that collapses that timeline. MULTI-evolve, published in Science, achieves meaningful multi-mutation improvements using only about 100 to 200 experimental measurements and a single round of machine learning. The key insight is simple in principle: instead of testing random variants, test all pairwise combinations of the mutations most likely to help.
Why Pairwise Data Unlocks Multi-Mutation Prediction
Single-mutation data tells you which individual amino acid changes improve function. What it cannot tell you is whether two beneficial mutations will add together, cancel each other out, or produce an unexpected synergistic boost when combined. This interaction - called epistasis - is what makes multi-mutation prediction difficult.
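One common way to make epistasis precise is shown below. It is an illustration, not necessarily the exact formulation MULTI-evolve uses. Fitness is measured on a log scale, where independent effects simply add. For mutations a and b with single-mutant effects Δf_a and Δf_b relative to wild type, the epistasis term is how far the double mutant deviates from the additive expectation:

```latex
\varepsilon_{ab} = \Delta f_{ab} - \left(\Delta f_{a} + \Delta f_{b}\right)
```

A positive ε_ab marks a synergistic pair, a negative value an antagonistic one, and a value near zero means the two mutations act independently.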
MULTI-evolve addresses this by first using protein language models to identify roughly 15 to 20 function-enhancing single mutations, then systematically testing all pairwise combinations of those mutations. That generates approximately 100 to 200 measurements, and crucially, every one of them is informative about how mutations interact. Because each variant combines two mutations already expected to help, none of the experimental budget goes to variants built from mutations unlikely to improve function.
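The measurement count follows from simple counting. Assume each pair is built and measured once and all mutations sit at different positions. Then n mutations give n(n - 1)/2 unordered pairs:

```latex
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad \binom{15}{2} = 105, \qquad \binom{20}{2} = 190
```

So 15 to 20 starting mutations translate into roughly 100 to 200 double mutants, before any replicates or controls are added.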
Neural networks trained on these pairwise data learn the rules of epistasis for that specific protein - which pairs synergize, which antagonize, which add independently. Those learned rules then allow the model to extrapolate predictions to much more complex combinations: variants carrying 5, 6, or 7 mutations simultaneously.
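The extrapolation step can be illustrated with a deliberately simplified stand-in. MULTI-evolve trains neural networks. The sketch below instead fits an explicit pairwise (second-order) epistasis model, assuming log-scale fitness values so that effects add. It estimates each pair's epistasis term from single and double mutants. It then predicts a higher-order combination as the sum of its single effects plus every pairwise term it contains. The function names, mutation labels, and scores are hypothetical toy values, not data from the study.

```python
from itertools import combinations

# Simplified stand-in for a learned model: an explicit pairwise epistasis model.
# Assumes log-scale fitness values, so independent effects add.


def fit_pairwise_epistasis(singles, doubles):
    """Estimate an epistasis term for every measured pair.

    singles: {mutation: log-fitness change vs. wild type}
    doubles: {(a, b): log-fitness change of the double mutant}, with a < b
    """
    return {
        (a, b): observed - singles[a] - singles[b]
        for (a, b), observed in doubles.items()
    }


def predict(combo, singles, epistasis):
    """Additive single effects plus every pairwise term in the combination.

    Pairs that were never measured are treated as non-interacting (0.0).
    """
    muts = sorted(combo)
    additive = sum(singles[m] for m in muts)
    pairwise = sum(epistasis.get(pair, 0.0) for pair in combinations(muts, 2))
    return additive + pairwise


def rank_combinations(singles, epistasis, k):
    """Score every k-mutation combination and return (score, combo), best first."""
    scored = [
        (predict(combo, singles, epistasis), combo)
        for combo in combinations(sorted(singles), k)
    ]
    return sorted(scored, reverse=True)


# Hypothetical toy data: log2 fold change relative to wild type.
singles = {"A": 1.0, "B": 0.5, "C": 0.75, "D": 0.25}
doubles = {
    ("A", "B"): 2.0,   # synergistic: +0.5 beyond additive
    ("A", "C"): 1.25,  # antagonistic: -0.5 below additive
    ("A", "D"): 1.25,  # additive
    ("B", "C"): 1.25,  # additive
    ("B", "D"): 1.0,   # mildly synergistic: +0.25
    ("C", "D"): 1.0,   # additive
}

epistasis = fit_pairwise_epistasis(singles, doubles)
for score, combo in rank_combinations(singles, epistasis, k=3):
    print(combo, score)
```

In this toy case, a purely additive model would pick A, B and C (sum 2.25). Including the pairwise terms promotes A, B and D (2.5) instead. That is the kind of reranking that pairwise data makes possible. A neural network can additionally capture nonlinear effects that this second-order model cannot.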
The team validated this approach computationally using 12 existing protein datasets from published studies, training on only single and double mutant data and then testing prediction accuracy on complex multi-mutants containing 3 to 12 mutations. The models predicted top performers correctly more than half the time, across all 12 diverse protein families.
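The retrospective test has a simple shape. Hold out every variant with more than two mutations, train only on singles and doubles, then check whether the model's top-ranked held-out variants are the ones that actually measured best. The sketch below shows that split and one possible top-k hit-rate metric. The metric definition, function names, and toy values are assumptions for illustration, and the study's exact scoring may differ.

```python
def split_by_order(dataset, max_train_order=2):
    """Split {mutation_tuple: measured_score} by the number of mutations."""
    train = {v: s for v, s in dataset.items() if len(v) <= max_train_order}
    test = {v: s for v, s in dataset.items() if len(v) > max_train_order}
    return train, test


def top_k_hit_rate(predicted, measured, k):
    """Fraction of the k best-predicted variants that are among the k best measured."""
    top_predicted = set(sorted(predicted, key=predicted.get, reverse=True)[:k])
    top_measured = set(sorted(measured, key=measured.get, reverse=True)[:k])
    return len(top_predicted & top_measured) / k


# Hypothetical toy dataset: tuples of mutation labels -> measured score.
dataset = {
    ("A",): 1.0,
    ("B",): 0.5,
    ("A", "B"): 2.0,
    ("A", "B", "C"): 3.0,
    ("A", "C", "D"): 1.0,
    ("B", "C", "D"): 2.0,
}
train, test = split_by_order(dataset)

# Stand-in predictions for the held-out multi-mutants (from any model trained on `train`).
predicted = {("A", "B", "C"): 2.9, ("A", "C", "D"): 2.0, ("B", "C", "D"): 1.5}
print(top_k_hit_rate(predicted, test, k=1))  # 1.0
print(top_k_hit_rate(predicted, test, k=2))  # 0.5
```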
Three Real Proteins, Three Distinct Challenges
Applied to actual engineering campaigns, the framework produced results across three very different proteins.
For APEX, an enzyme used as a research tool in cell biology, MULTI-evolve achieved up to 256-fold improvement over the wild-type sequence and 4.8-fold improvement over APEX2, an already-optimized variant that has been widely used for years. Notably, the method identified a mutation called A134P - an alanine-to-proline substitution at position 134 - that standard protein language model methods systematically missed because they penalize proline substitutions as a class. The ensemble scoring strategy in MULTI-evolve corrects for this bias, allowing A134P to be identified as a strong candidate.
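How MULTI-evolve's ensemble scoring works is not detailed here. The following is only a hypothetical illustration of how a class-wide penalty can be neutralized. Each model's scores are standardized within groups that introduce the same residue, so a proline substitution is compared with other proline substitutions rather than with everything else. The standardized scores are then averaged across models. The function names, mutation labels, and scores are invented for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical illustration, not MULTI-evolve's actual scoring method.


def standardize_within_residue_class(scores):
    """Z-score each mutation against others that introduce the same residue.

    scores: {"A30P": score, ...}; the last character is the new residue.
    A penalty applied to a whole class (e.g. every proline) cancels out.
    """
    groups = defaultdict(list)
    for mut in scores:
        groups[mut[-1]].append(mut)
    normalized = {}
    for members in groups.values():
        values = [scores[m] for m in members]
        mu, sigma = mean(values), pstdev(values)
        for m in members:
            normalized[m] = 0.0 if sigma == 0 else (scores[m] - mu) / sigma
    return normalized


def ensemble_score(model_scores):
    """Average the class-standardized scores from several models."""
    per_model = [standardize_within_residue_class(s) for s in model_scores]
    return {m: mean(p[m] for p in per_model) for m in per_model[0]}


# Hypothetical scores from two language models; both score every proline low.
model_a = {"A30P": -1.0, "S80P": -3.0, "T5P": -2.0, "A30V": 0.75, "L20V": 0.25}
model_b = {"A30P": -0.5, "S80P": -2.5, "T5P": -1.5, "A30V": 1.0, "L20V": 0.5}

combined = ensemble_score([model_a, model_b])
print(max(model_a, key=model_a.get))    # A30V: raw scores favor the valine
print(max(combined, key=combined.get))  # A30P: best within its residue class
```

The trade-off is that standardizing within a class also discards any genuine class-level signal. A real system would need some way to decide when a class-wide penalty reflects model bias rather than biology.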
For dCasRx, a protein used for RNA editing applications, the team began with a deep mutational scan of more than 11,000 variants, then extracted only the function-enhancing subset and tested their pairwise combinations. This demonstrated the value of strategic data curation: by filtering down to informative variants first, the pairwise testing step remained tractable. The result was up to 9.8-fold improvement in trans-splicing activity.
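The curation step can be sketched as two small functions. The first keeps the variants whose deep-mutational-scan score beats wild type and caps the list at a manageable size. The second enumerates pairs, skipping any pair that would need two different substitutions at the same position, since those cannot coexist in one sequence. The score convention (wild type = 0), the cap of 20, the function names, and the toy scores are assumptions, not the study's criteria.

```python
import re
from itertools import combinations

# Single-letter notation: wild-type residue, position, new residue (e.g. "A134P").
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def position(mutation):
    """Return the residue position encoded in a mutation label."""
    match = MUTATION_PATTERN.match(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {mutation!r}")
    return int(match.group(2))


def select_beneficial(dms_scores, wild_type_score=0.0, max_keep=20):
    """Keep mutations scoring above wild type, best first, capped at max_keep."""
    beneficial = [m for m, s in dms_scores.items() if s > wild_type_score]
    beneficial.sort(key=lambda m: dms_scores[m], reverse=True)
    return beneficial[:max_keep]


def pairwise_library(mutations):
    """All pairs of mutations that sit at different positions."""
    return [
        (a, b)
        for a, b in combinations(mutations, 2)
        if position(a) != position(b)
    ]


# Hypothetical DMS scores (log fitness relative to wild type = 0).
dms = {"A10G": 0.8, "A10S": 0.6, "L25F": 0.4, "K40R": -0.3, "E55D": 0.1, "T60A": 0.0}
beneficial = select_beneficial(dms)
pairs = pairwise_library(beneficial)
print(beneficial)
print(len(pairs), "pairs to build and test")
```

Because same-position pairs are excluded, the library can come out slightly smaller than n choose 2. Here four beneficial mutations yield five testable pairs rather than six.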
For an anti-CD122 antibody, the framework achieved 2.7-fold improvement in binding affinity to 1.0 nanomolar and 6.5-fold improvement in expression - relevant metrics for therapeutic antibody development.
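This assumes the 2.7-fold figure is the ratio of the parent's dissociation constant to the improved one, where a lower K_D means tighter binding. On that reading, the starting antibody bound at roughly 2.7 nanomolar:

```latex
K_{D,\text{parent}} \approx 2.7 \times K_{D,\text{improved}} = 2.7 \times 1.0\ \text{nM} = 2.7\ \text{nM}
```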
A DNA Assembly Method to Match
Predicting which multi-mutation variants will work best is only half the problem. Building and testing those variants is the other half - a bottleneck the team also addressed. They developed MULTI-assembly, a multi-site mutagenesis method that builds complex DNA constructs with 40 to 70% efficiency for variants carrying up to 9 mutations across several kilobases. A companion computational oligonucleotide designer takes target mutations as input and outputs primers optimized for efficient assembly. The whole process takes days rather than the weeks typically required for commercial DNA synthesis.
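One practical way to read a 40 to 70% assembly efficiency is to ask how many clones must be picked to recover at least one correct construct. The sketch makes a simplifying assumption: each clone is independently correct with probability equal to the stated efficiency. The chance that none of n clones is correct is then (1 - p)^n. The helper below, a hypothetical name, finds the smallest n that pushes the success probability to a target confidence.

```python
def clones_needed(efficiency, confidence=0.95):
    """Smallest n with 1 - (1 - efficiency)**n >= confidence.

    Assumes each clone is independently correct with probability `efficiency`.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be in (0, 1)")
    n = 1
    while (1 - efficiency) ** n > 1 - confidence:
        n += 1
    return n


for p in (0.4, 0.7):
    print(p, clones_needed(p))
```

Under these assumptions, screening about six clones at the low end of the range, or three at the high end, gives a 95% chance of at least one correct variant.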
Open-Source and Modular
Arc Institute has made the MULTI-evolve framework publicly available as an open-source tool that handles protein language model predictions, neural network training, and MULTI-assembly oligonucleotide design. The framework is deliberately modular: as better protein language models emerge, they can be substituted in at the mutation-discovery step. The approach integrates naturally with other design tools and can be used to refine computationally designed proteins or optimize therapeutic candidates.
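The modular design amounts to programming against an interface rather than a specific model. The sketch below is hypothetical and does not reflect MULTI-evolve's actual code or API. Any scorer that maps candidate mutations to numbers can be plugged into the mutation-discovery step. Swapping in a newer language model then means supplying a new class, not rewriting the pipeline.

```python
from typing import Protocol

# Hypothetical interface for illustration; not MULTI-evolve's actual API.


class MutationScorer(Protocol):
    """Anything that can score candidate single mutations (higher = better)."""

    def score(self, mutations: list[str]) -> dict[str, float]: ...


def propose_mutations(scorer: MutationScorer, candidates: list[str], n: int = 20) -> list[str]:
    """Mutation-discovery step: keep the n best-scoring candidates."""
    scores = scorer.score(candidates)
    return sorted(candidates, key=lambda m: scores[m], reverse=True)[:n]


class LookupScorer:
    """Toy stand-in for a protein language model: returns precomputed scores."""

    def __init__(self, table: dict[str, float]) -> None:
        self.table = table

    def score(self, mutations: list[str]) -> dict[str, float]:
        return {m: self.table.get(m, 0.0) for m in mutations}


old_model = LookupScorer({"A30P": -1.0, "L20V": 0.4, "K40R": 0.9})
new_model = LookupScorer({"A30P": 1.2, "L20V": 0.4, "K40R": 0.1})
candidates = ["A30P", "L20V", "K40R"]
print(propose_mutations(old_model, candidates, n=1))  # ['K40R']
print(propose_mutations(new_model, candidates, n=1))  # ['A30P']
```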
The current system focuses on proteins where 15 to 20 function-enhancing single mutations can be identified. Proteins with highly complex fitness landscapes - where beneficial mutations are extremely rare or difficult to identify with existing computational methods - may require additional development before MULTI-evolve becomes applicable.