Technology 2026-03-09

Machine learning narrows the search for molecular handedness to 5 to 10 reactions instead of 50 to 60

A sparse-data workflow trained on data from four published papers forecasts how asymmetric cross-coupling reactions will behave, potentially saving weeks or months of trial-and-error chemistry.

Building a drug molecule is partly an exercise in geometry. Many pharmaceutical compounds exist as mirror-image pairs, called enantiomers, and the difference between the two forms can be the difference between a medicine and a poison. Chemists spend enormous amounts of time and money testing combinations of catalysts, ligands, and substrates to find recipes that reliably produce the correct mirror image. What if a computer could narrow those options down before anyone sets foot in the lab?

A filter that speaks the language of molecular shape

Researchers at the University of Utah and the University of California, Los Angeles have built a machine-learning workflow that does exactly that. Published as an accelerated preview in Nature in February 2026, the system screens tens of thousands of chemical structures and predicts how reaction components will come together to favor one enantiomer over another. The key innovation is frugality: the model was trained on data from just four academic papers on asymmetric nickel-catalyzed cross-coupling reactions.

That is a remarkably thin training set by the standards of modern AI, which typically demands enormous datasets. But chemistry operates under different constraints. High-quality experimental data is expensive and slow to generate. A single reaction screen might require weeks of lab work and thousands of dollars in materials. The tool sidesteps this bottleneck by converting each reaction component into a set of numerical descriptors that capture its three-dimensional shape and electronic properties, creating a mathematical fingerprint that a computer can analyze without running the actual reaction.
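To make the fingerprint idea concrete, here is a minimal sketch in Python. It is not the authors' workflow. The descriptor names, every numerical value, the selectivity scores, and the choice of ridge regression are all assumptions invented for illustration. The sketch only shows the general pattern: describe each component with numbers, fit a simple model on a handful of known results, then score a combination that has not been run.

```python
import numpy as np

# Hypothetical descriptor tables. Every number below is invented for
# illustration; none of it comes from the paper or from real measurements.
# Ligand columns: [steric bulk, bite-angle proxy, electronic parameter]
LIGANDS = {
    "ligand_A": [1.2, 0.8, -0.3],
    "ligand_B": [2.1, 1.1, 0.4],
    "ligand_C": [0.7, 0.5, 0.1],
}
# Substrate columns: [size, partial-charge proxy]
SUBSTRATES = {
    "substrate_1": [0.9, -0.2],
    "substrate_2": [1.6, 0.3],
}


def featurize(ligand, substrate):
    """Concatenate ligand and substrate descriptors into one fingerprint."""
    return np.array(LIGANDS[ligand] + SUBSTRATES[substrate], dtype=float)


def fit_ridge(X, y, alpha=1.0):
    """Fit ridge regression in closed form on standardized descriptors."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    Xs = (X - mean) / std
    intercept = y.mean()
    gram = Xs.T @ Xs + alpha * np.eye(Xs.shape[1])
    weights = np.linalg.solve(gram, Xs.T @ (y - intercept))
    return {"mean": mean, "std": std, "weights": weights, "intercept": intercept}


def predict(model, X):
    """Apply the stored scaling and weights to new fingerprints."""
    Xs = (X - model["mean"]) / model["std"]
    return Xs @ model["weights"] + model["intercept"]


# Five made-up "literature" results: (ligand, substrate, selectivity score).
training = [
    ("ligand_A", "substrate_1", 1.1),
    ("ligand_A", "substrate_2", 1.4),
    ("ligand_B", "substrate_1", 1.9),
    ("ligand_B", "substrate_2", 2.2),
    ("ligand_C", "substrate_1", 0.6),
]
X_train = np.array([featurize(lig, sub) for lig, sub, _ in training])
y_train = np.array([score for _, _, score in training])
model = fit_ridge(X_train, y_train)

# Score a combination that was never "run" in the toy training set.
unseen = featurize("ligand_C", "substrate_2").reshape(1, -1)
print(predict(model, unseen))
```

With so few data points, the regularization term (alpha) keeps the fit from chasing noise, which is one common way to cope with sparse data.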

How left-handed molecules become right-handed problems

The property at stake is called chirality, or molecular handedness. Just as a left shoe does not fit a right foot, left-handed and right-handed molecules interact differently with biological receptors. In drug development, producing the wrong enantiomer can render a compound ineffective or, worse, toxic. Asymmetric reactions are designed to favor one enantiomer, ideally producing something like a 95-to-5 ratio rather than a useless 50-50 split.
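The 95-to-5 figure can be translated into two quantities chemists commonly quote: enantiomeric excess, and the free-energy gap between the two competing pathways, using the standard relation ΔΔG‡ = RT ln(major/minor). The short calculation below is a worked illustration only. The function names and the temperature of 298.15 K are assumptions, and the article does not say which of these quantities the model predicts.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def enantiomeric_excess(major, minor):
    """Percent excess of the major enantiomer over the minor one."""
    return 100.0 * (major - minor) / (major + minor)


def ddg_kcal_per_mol(major, minor, temperature_k=298.15):
    """Free-energy gap implied by an enantiomeric ratio (assumed temperature)."""
    if major <= 0 or minor <= 0:
        raise ValueError("both amounts must be positive")
    return R_KCAL * temperature_k * math.log(major / minor)


print(enantiomeric_excess(95, 5))         # 90.0
print(enantiomeric_excess(50, 50))        # 0.0
print(round(ddg_kcal_per_mol(95, 5), 2))  # 1.74
```

A 95-to-5 ratio therefore corresponds to 90 percent enantiomeric excess and an energy gap of under 2 kcal/mol at room temperature. The small size of that gap is part of why selectivity is hard to predict.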

Achieving that selectivity requires careful matching of a metal catalyst, a ligand that controls the reaction's three-dimensional orientation, and the substrate molecules being joined. The ligand is typically the most critical variable. Testing ligand-substrate combinations one by one is the traditional approach, and it is painfully slow.

Transferring predictions to reactions the model has never seen

The research team, led by co-lead authors Simone Gallarati and Erin Bucci, put the system through increasingly difficult challenges. First, they asked it to predict outcomes for new substrates used with familiar ligands. Then they tested it on entirely new ligand classes absent from the training data. At each step, the model's predictions were validated through laboratory experiments in the Doyle lab at UCLA.

The results were striking. Instead of running 50 to 60 reactions to identify the best conditions for a new transformation, the team found they could run 5 to 10, guided by the model's predictions. That translates to weeks or months of saved labor, plus significant reductions in material costs, since every reaction component must be either purchased or synthesized from scratch.

Perhaps more important than speed is the model's transparency. Unlike many AI tools that function as opaque prediction engines, this workflow allows chemists to examine which molecular features drove a particular prediction. When the model is wrong, those features still provide useful chemical insight, pointing toward factors that the researchers might not have identified through intuition alone.
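One simple way a model can be open to inspection is a linear model, where each prediction splits into one contribution per descriptor. The sketch below assumes such a model. The descriptor names, weights, and values are hypothetical, and the paper's actual model and features may differ.

```python
def contributions(weights, intercept, features):
    """Return the prediction and the per-descriptor terms, largest first."""
    parts = {name: weights[name] * value for name, value in features.items()}
    prediction = intercept + sum(parts.values())
    ranked = sorted(parts.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return prediction, ranked


# Hypothetical fitted weights and one candidate reaction's descriptor values.
weights = {
    "ligand_steric_bulk": 0.9,
    "ligand_electronic": -0.4,
    "substrate_size": 0.2,
}
features = {
    "ligand_steric_bulk": 1.5,
    "ligand_electronic": 0.5,
    "substrate_size": 2.0,
}

prediction, ranked = contributions(weights, intercept=0.3, features=features)
print(round(prediction, 2))
for name, value in ranked:
    print(f"{name}: {value:+.2f}")
```

In this toy case the ligand's steric bulk dominates the prediction. That is the kind of signal a chemist could check against intuition even when the predicted number itself turns out to be off.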

Where the tool falls short

The system was developed and tested specifically on nickel-catalyzed asymmetric cross-coupling reactions. While the researchers argue the workflow's architecture is generalizable, its performance on other reaction types, metal catalysts, or more complex molecular systems has not yet been demonstrated. Chemistry is vast, and a model trained on one corner of it may not transfer smoothly to another.

The training data came from academic laboratories with high standards for data quality and consistency. Industrial datasets tend to be noisier, with more variability in experimental conditions. Whether the sparse-data approach remains effective under those conditions is an open question.

The model also cannot currently predict entirely new reaction mechanisms. It excels at interpolating within known reaction families and extrapolating to modest extensions, but it is not designed to replace the creative judgment of a synthetic chemist confronting a genuinely novel transformation.

The pharmaceutical pitch

The immediate application is in pharmaceutical process chemistry. When a company needs to scale up production of a chiral drug candidate for clinical trials, optimizing the enantioselective reaction is often a bottleneck. A tool that narrows the search space from hundreds of possible conditions to a manageable shortlist could materially accelerate the transition from Phase I to Phase II trials.

The broader vision is a shift in how synthetic chemistry operates: fewer experiments, guided by computational prediction, with feedback loops that make the models smarter over time. We are not there yet. But a system that learns from four papers and saves months of lab work is a credible step in that direction.

Source: Gallarati, S. et al. Transferable enantioselectivity models from sparse data. Nature (2026). DOI: 10.1038/s41586-026-10239-7. Research conducted at the University of Utah and UCLA, supported by the Swiss National Science Foundation, NSF, and NIH.