Technology 2026-02-19 3 min read

DEGU Adds Calibrated Uncertainty Estimates to Deep Learning Predictions in Genomics

A new computational tool developed for genomic deep learning provides confidence scores alongside predictions, helping scientists identify when AI models are likely to be wrong.

Deep neural networks have become essential tools in genomics. They predict gene expression levels from DNA sequences, identify regulatory elements, model the effects of mutations, and guide experimental design. But most share a significant blind spot: they produce single point predictions with no indication of how confident the model is or how likely each prediction is to be wrong.

A new computational tool called DEGU addresses this gap. Developed by researchers working at the intersection of machine learning and biology, DEGU attaches calibrated uncertainty estimates to the outputs of deep neural networks applied to genomic prediction tasks. The result is a model that tells scientists not just what it predicts, but how much confidence to place in each prediction.

Why Uncertainty Quantification Matters in Biology

A scientist designing an expensive wet-lab experiment based on AI-predicted gene regulatory activity needs to know more than just the predicted value. They need to know whether the model is likely to be right. A prediction accompanied by a narrow confidence interval invites action. A prediction with a wide confidence range, or a flagged uncertainty score, suggests verification may be warranted before committing resources.

Current genomic deep learning tools almost universally lack this capability. They were trained to minimize prediction error on held-out test sets, which optimizes average accuracy but does not produce outputs that accurately reflect the model's uncertainty on individual cases. A model can achieve high average accuracy while being systematically overconfident on specific input types - particularly novel inputs that lie outside the distribution of its training data.

In genomics, this matters because researchers frequently want to apply models to sequences or conditions that differ meaningfully from their training contexts. A model trained on human regulatory sequences, when asked to predict activity for a related sequence from another species or for a synthetic regulatory element, may produce confident-sounding predictions that reflect pattern-matching on its training data rather than genuine understanding of the underlying biology.

How DEGU Produces Uncertainty Estimates

The technical approach underlying DEGU involves conformal prediction, a statistical framework that provides rigorous coverage guarantees for prediction intervals. Unlike Bayesian approaches that require assumptions about the prior distribution of model parameters, conformal prediction works with any trained model and provides provably valid prediction sets under mild distributional assumptions.

Applied to genomic neural networks, DEGU wraps around an existing trained model and uses a calibration set of held-out examples to learn how to translate the model's internal confidence signals into calibrated prediction intervals. The result is that when DEGU attaches a 90% confidence interval to a prediction, that interval actually contains the true value 90% of the time on data similar to the calibration set.
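DEGU's actual implementation isn't reproduced here, but the core mechanism described above, split conformal prediction, can be sketched in a few lines. The sketch below uses a toy stand-in for a trained model (`cal_preds`, `test_preds` would come from the real network); the function name and data are illustrative assumptions, not DEGU's API.

```python
import numpy as np

def conformal_intervals(cal_preds, cal_targets, test_preds, alpha=0.1):
    """Split conformal prediction: turn point predictions into intervals
    with roughly (1 - alpha) coverage on data like the calibration set."""
    n = len(cal_preds)
    # Nonconformity scores: absolute residuals on the held-out calibration set.
    scores = np.abs(cal_targets - cal_preds)
    # Conformal quantile with the finite-sample correction (n + 1 in the numerator).
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    qhat = np.quantile(scores, q_level, method="higher")
    return test_preds - qhat, test_preds + qhat

# Toy demo: a "model" that predicts y = 2x, with noisy ground truth.
rng = np.random.default_rng(0)
x_cal = rng.uniform(0, 1, 500)
y_cal = 2 * x_cal + rng.normal(0, 0.1, 500)
x_test = rng.uniform(0, 1, 1000)
y_test = 2 * x_test + rng.normal(0, 0.1, 1000)

lo, hi = conformal_intervals(2 * x_cal, y_cal, 2 * x_test, alpha=0.1)
coverage = np.mean((y_test >= lo) & (y_test <= hi))
print(f"empirical coverage: {coverage:.2f}")  # close to 0.90
```

The key property is the one the article describes: no assumption about the model's internals is needed, only a calibration set drawn from the same distribution as the test data.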

The developers validated DEGU across multiple established genomic deep learning architectures and benchmark prediction tasks, demonstrating both that the calibration is accurate and that the uncertainty estimates add practical predictive value - predictions flagged as highly uncertain by DEGU were indeed more likely to be incorrect than those flagged as confident.
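That last validation check, that flagged-uncertain predictions really are more error-prone, can be illustrated with a normalized variant of conformal prediction, where each interval's width scales with a per-example uncertainty signal. Everything below is a toy sketch under assumed stand-ins (`pred` for the trained model, `sigma` for its uncertainty signal), not DEGU's own validation code.

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_truth(x):
    # Heteroscedastic toy data: noise grows with x.
    return 2 * x + rng.normal(0, 0.05 + 0.3 * x)

x_cal, x_test = rng.uniform(0, 1, 2000), rng.uniform(0, 1, 2000)
y_cal, y_test = noisy_truth(x_cal), noisy_truth(x_test)

pred = lambda x: 2 * x             # stand-in "trained model"
sigma = lambda x: 0.05 + 0.3 * x   # stand-in per-example uncertainty signal

# Normalized nonconformity scores on the calibration set.
scores = np.abs(y_cal - pred(x_cal)) / sigma(x_cal)
n = len(scores)
qhat = np.quantile(scores, np.ceil((n + 1) * 0.9) / n, method="higher")

# Intervals are pred(x) +/- qhat * sigma(x), so width tracks uncertainty.
width = 2 * qhat * sigma(x_test)
err = np.abs(y_test - pred(x_test))
wide = width > np.median(width)
print(f"mean error, wide intervals:   {err[wide].mean():.3f}")
print(f"mean error, narrow intervals: {err[~wide].mean():.3f}")
```

If the uncertainty signal is informative, the wide-interval half of the test set shows larger average errors, which is exactly the "uncertain predictions are more likely to be wrong" behavior the developers reported.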

Practical Limitations

DEGU's uncertainty estimates are calibrated to data distributions similar to the calibration set. When applied to inputs very different from that calibration data - highly novel sequences or substantially different genomic contexts - the calibration may not hold. This is a fundamental limitation of conformal prediction rather than a specific flaw in DEGU's implementation, but it means users need to understand their model's training and calibration distributions to interpret uncertainty estimates appropriately.

DEGU quantifies uncertainty in predictions from existing models but does not itself improve those predictions. A deep learning model that makes poor predictions due to limited training data, a flawed architecture, or misspecified training objectives will still make poor predictions with DEGU attached; it will just make them with honest uncertainty bands rather than false confidence. Fixing prediction quality requires improving the underlying model, not the uncertainty wrapper.

The Broader Problem of AI Transparency in Biology

DEGU addresses one specific dimension of AI transparency - calibrated confidence - but the broader question of how scientists should interpret and trust deep learning predictions in biology is more complex. Understanding which features of an input sequence drive a prediction, whether those features correspond to real biological mechanisms, and how predictions generalize across different biological contexts all require additional tools and analytical approaches that DEGU does not provide.

Nevertheless, calibrated uncertainty is a prerequisite for any rigorous use of AI predictions to prioritize experiments. If the field is to move toward AI-guided research workflows that direct experimental resources efficiently, tools that flag unreliable predictions are as important as tools that improve average accuracy. DEGU fills a practical gap that existed in the genomic deep learning toolbox.

Source: DEGU research publication on uncertainty quantification for genomic deep neural networks, 2026. Developed for the computational biology and machine learning communities.