Evo 2 Trained on 128,000 Genomes Can Now Design Bacterial-Scale DNA
Evolution has been running experiments on DNA for roughly four billion years. Evo 2 has spent months reading the results. The artificial intelligence model, developed by Arc Institute and NVIDIA with collaborators at Stanford, UC Berkeley and UC San Francisco, has now been formally published in Nature after circulating as a preprint since early 2025. Its training set: 9.3 trillion nucleotides drawn from more than 128,000 whole genomes spanning bacteria, archaea, viruses, plants, animals, and humans.
Reading the whole tree of life at once
Its predecessor, Evo 1, was trained entirely on the genomes of single-celled microbes. Evo 2 adds eukaryotes - the domain of life that includes everything from yeast to people. The model processes genetic sequences of up to one million nucleotides at a time, which allows it to detect functional relationships between distant regions of a genome that shorter-context models would miss entirely.
Achieving that required rethinking the underlying architecture. The team developed a new framework called StripedHyena 2, which enabled Evo 2 to train on 30 times more data than its predecessor while reasoning over sequences eight times longer. Training ran for several months on over 2,000 NVIDIA H100 GPUs via the NVIDIA DGX Cloud platform.
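For readers who want to poke at the model directly, the open-source release can be loaded and run on a sequence in a few lines. The sketch below follows the interface shown in the evo2 repository's README; the checkpoint name, the tokenizer call, and the shape of the forward pass's return value are assumptions about that package, not details from this article.

```python
# A minimal sketch, assuming the interface shown in the open-source
# evo2 repository's README (Evo2 class, tokenizer.tokenize, forward
# pass returning logits first). A CUDA GPU is required.
import torch
from evo2 import Evo2

model = Evo2('evo2_7b')  # assumed checkpoint name; larger ones exist

sequence = 'ACGT' * 64   # toy input; contexts can reach ~1M nucleotides
input_ids = torch.tensor(
    model.tokenizer.tokenize(sequence),
    dtype=torch.int,
).unsqueeze(0).to('cuda:0')

outputs, _ = model(input_ids)
logits = outputs[0]      # next-nucleotide logits at every position
```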
"Our development of Evo 1 and Evo 2 represents a key moment in the emerging field of generative biology, as the models have enabled machines to read, write, and think in the language of nucleotides," said Patrick Hsu, Arc Institute co-founder and assistant professor of bioengineering at UC Berkeley. "Evo 2 has a generalist understanding of the tree of life that's useful for a multitude of tasks."
From predicting disease mutations to designing new genomes
In benchmark tests, Evo 2 achieved over 90% accuracy in classifying variants of the breast cancer gene BRCA1 as benign or potentially pathogenic. That level of accuracy, applied broadly, could save substantial time and money now spent on cell and animal experiments that categorize mutations one at a time.
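A common way to turn a DNA language model into a variant classifier is zero-shot delta log-likelihood scoring: score the reference sequence and the mutated sequence, and treat a large drop in likelihood as evidence that the variant disrupts function. The sketch below illustrates that general idea; the score_sequences helper and checkpoint name are assumptions about the open-source evo2 package, and the sequence window is placeholder data, not real BRCA1 sequence.

```python
# Hedged sketch of zero-shot variant scoring by likelihood comparison.
# score_sequences (one log-likelihood per input sequence) is an
# assumption about the evo2 package; the window is a placeholder.
from evo2 import Evo2

model = Evo2('evo2_7b')

def delta_log_likelihood(window: str, pos: int, alt: str) -> float:
    """Score a single-nucleotide variant inside a sequence window.

    More negative values mean the model finds the variant less
    plausible than the reference - a proxy for pathogenicity.
    """
    alt_seq = window[:pos] + alt + window[pos + 1:]
    ref_ll, alt_ll = model.score_sequences([window, alt_seq])
    return alt_ll - ref_ll

# Toy call with placeholder sequence, not real BRCA1 data.
score = delta_log_likelihood('ACGT' * 250, pos=500, alt='T')
print(f'delta log-likelihood: {score:.3f}')
```

In practice, a classifier would calibrate a threshold on these deltas against variants with known labels rather than reading them raw.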
The model can also generate novel DNA sequences. In one application, Arc researchers used Evo 2 to design functional synthetic bacteriophages - viruses that infect bacteria - demonstrating that the model's outputs aren't just predictions but can include working biological code. Bacteriophages are of interest as potential treatments for antibiotic-resistant infections.
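Generation works much as it does in text models: prompt with a seed sequence and sample a continuation one nucleotide at a time. The sketch below illustrates that workflow, not the Arc team's actual phage design pipeline; the generate() signature and its return value follow the evo2 repository's examples and should be treated as assumptions, and the seed sequence is hypothetical.

```python
# Illustrative prompted generation - not Arc's phage design pipeline.
# The generate() signature and .sequences field are assumptions based
# on the evo2 repository's examples.
from evo2 import Evo2

model = Evo2('evo2_7b')

output = model.generate(
    prompt_seqs=['ATGAAAGCATTG'],  # hypothetical seed sequence
    n_tokens=500,                  # nucleotides to sample
    temperature=1.0,
    top_k=4,                       # sample among the four bases
)
print(output.sequences[0])
```

Real design work layers filters on top of raw sampling - checking reading frames, rescoring candidates with the model, and ultimately synthesizing and testing sequences in the lab, as demonstrating "functional" phages required.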
Since the preprint appeared, researchers have applied the model to Alzheimer's genetic risk prediction and to assessing variant effects in domesticated animal species - neither of which was part of the original training objectives.
Precise genetic control and the cell-type targeting problem
One application the Arc team highlights is designing genetic regulatory elements that activate only in specific cell types. Gene therapies often cause side effects because they act throughout the body; a therapy meant for liver cells might also affect neurons. Evo 2 could help engineers design genetic switches that activate only in the intended cell type.
"If you have a gene therapy that you want to turn on only in neurons to avoid side effects, or only in liver cells, you could design a genetic element that is only accessible in those specific cells," said co-author Hani Goodarzi, an Arc Core Investigator and associate professor of biochemistry and biophysics at UC San Francisco.
Arc's Chief Technology Officer Dave Burke compared the model to an operating system kernel: a foundation on which many specialized applications can be built. Researchers are already building those applications, and the team expects uses to emerge that no one has yet anticipated.
Open source, with deliberate limits
The model's weights, training code, and training data are publicly available - making it the largest fully open-source AI model in biology to date. Arc also worked with AI interpretability lab Goodfire to build a visualizer that shows which biological features Evo 2 has learned to recognize in genomic sequences.
The researchers did exclude one category of data: pathogens that infect humans and other complex organisms were removed from the training set, and the model is designed not to return useful outputs to queries about them. Stanford Professor Tina Hernandez-Boussard and her group advised on responsible development and deployment.
Whether a model trained on DNA sequences generalizes as robustly to novel design tasks as it does to classification remains an active question. Generating new genomes is harder to evaluate than predicting known mutation effects, and the researchers are candid that the field of generative biology is still finding its footing. But 128,000 genomes and 9.3 trillion nucleotides make a formidable starting point.