An AI model trained on energy and transport data can predict river flows it has never seen
In much of the world, river gauges are sparse or absent. Records are incomplete. Monitoring networks, where they exist, are expensive to maintain and vulnerable to the same extreme weather they are meant to track. Without reliable historical data, communities have limited capacity to forecast floods, anticipate droughts, or plan water infrastructure. Climate change is making this problem worse, not better.
A study published in Machine Learning: Earth by researchers at the University of Texas at Austin and Hydrotify LLC suggests that a new class of artificial intelligence models might help close that gap - and they were not even trained on river data.
Foundation models meet hydrology
The models in question are called time-series foundation models, or TSFMs. They are the time-series equivalent of large language models: trained on massive, diverse datasets to recognize patterns in sequential data, then applied to tasks they were not specifically designed for. The TSFMs evaluated in this study were originally trained on time-series data from energy grids, transportation networks, climate reanalyses, and other domains - not on river flow measurements.
The research team tested several TSFMs on a standard U.S. river flow dataset covering more than 500 basins with decades of daily streamflow records. The question was straightforward: how well can a model that has never seen hydrological training data predict the flow of a river?
The answer, for at least one model, was surprisingly well.
Sundial's performance against a purpose-built benchmark
The benchmark was a long short-term memory (LSTM) neural network - a type of deep learning model specifically designed for sequential data and extensively validated for streamflow prediction. The LSTM had been fully trained on decades of observed river flow data from the same basins. It represents the current standard for data-driven hydrological forecasting.
One TSFM in particular, called Sundial, came close to matching the LSTM's performance despite having been trained on entirely different types of time-series data. It produced daily streamflow predictions that captured the major patterns - seasonal cycles, peak flows, baseflow recession - with skill approaching that of the purpose-trained model.
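To make "skill" concrete: the article does not say which metric the study used, so the sketch below assumes the Nash-Sutcliffe efficiency (NSE), a score commonly used for streamflow models. An NSE of 1 means a perfect match, and 0 means the prediction is no better than always guessing the observed mean. The flow values and the names `benchmark_pred` and `zero_shot_pred` are invented for illustration and are not results from the study.

```python
def nash_sutcliffe_efficiency(observed, simulated):
    """Return the NSE of a simulated series against an observed one.

    1.0 is a perfect match; 0.0 is no better than predicting the observed mean.
    """
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("observed series is constant; NSE is undefined")
    return 1.0 - squared_error / variance


# Hypothetical daily flows in cubic meters per second (illustration only).
observed = [12.0, 15.0, 30.0, 22.0, 16.0, 13.0]
benchmark_pred = [12.5, 14.0, 28.0, 23.0, 16.5, 13.0]  # stand-in for a trained model
zero_shot_pred = [13.0, 13.5, 26.0, 24.0, 17.0, 12.0]  # stand-in for an untrained-on-rivers model

nse_benchmark = nash_sutcliffe_efficiency(observed, benchmark_pred)
nse_zero_shot = nash_sutcliffe_efficiency(observed, zero_shot_pred)
print(f"benchmark NSE: {nse_benchmark:.3f}, zero-shot NSE: {nse_zero_shot:.3f}")
```

Under this assumed metric, "coming close" to the benchmark would mean the two scores sit near each other across many basins, not just in one.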
The strongest results came in basins with clear seasonal signals, particularly those dominated by snowmelt. Snowmelt-driven rivers follow predictable annual patterns - low winter flows, a spring freshet, summer recession - and TSFMs trained on other seasonally structured data appear to transfer that pattern-recognition capability across domains.
What this means for data-scarce regions
The practical significance lies not in U.S. river basins, which already have extensive monitoring and established forecasting systems, but in the regions that do not. Large parts of Africa, Central Asia, Southeast Asia, and South America have rivers that support millions of people but lack the decades of gauging data needed to train traditional hydrological models.
If a foundation model trained on non-hydrological data can produce useful river flow predictions without any local training data, it offers something genuinely new: a forecasting capability that does not depend on having measured what you are trying to predict. For communities facing flood risk without warning systems, or water managers planning allocations without historical records, even an approximate forecast can be far better than none.
"Reliable water information is essential for communities everywhere, but many regions still lack the long-term records needed to support traditional forecasting methods," said Alexander Sun of the University of Texas at Austin and Hydrotify LLC. "Approaches like this show how new AI tools could help close that gap by giving more places access to data-driven predictions."
Where the models struggle
The TSFMs did not perform equally well everywhere. Basins with complex hydrology - multiple tributaries, regulated flows, urban runoff, or groundwater-dominated regimes - presented more difficulty. These systems produce streamflow patterns that are less regular and harder to capture with the generalized pattern recognition that foundation models provide.
The distinction makes intuitive sense. A snowmelt basin is, from a time-series perspective, similar to other seasonal phenomena: energy demand cycles, agricultural production rhythms, temperature oscillations. A foundation model trained on diverse seasonal data can recognize that template. An urban basin where flow spikes unpredictably after impervious-surface runoff events presents a pattern that may not have analogues in the model's training data.
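One rough way to see the difference between a seasonal "template" and an irregular, flashy record is to correlate a series with itself one year later. The sketch below uses synthetic data only: a sine wave stands in for a snowmelt-like regime, and randomly timed spikes on a flat baseline stand in for an urban basin. The function names and all parameters are illustrative assumptions, not the study's method.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(series, lag):
    """Correlate a series with a copy of itself shifted by `lag` steps."""
    if lag <= 0 or lag >= len(series) - 1:
        raise ValueError("lag must be positive and shorter than the series")
    return pearson(series[:-lag], series[lag:])


DAYS_PER_YEAR = 365
N_DAYS = 3 * DAYS_PER_YEAR

# Synthetic seasonal regime: a smooth annual cycle (illustration only).
seasonal = [10.0 + 8.0 * math.sin(2 * math.pi * d / DAYS_PER_YEAR) for d in range(N_DAYS)]

# Synthetic flashy regime: flat baseline with randomly timed spikes (illustration only).
rng = random.Random(0)
flashy = [5.0 + (rng.uniform(20.0, 80.0) if rng.random() < 0.03 else 0.0) for _ in range(N_DAYS)]

seasonal_r = lagged_correlation(seasonal, DAYS_PER_YEAR)
flashy_r = lagged_correlation(flashy, DAYS_PER_YEAR)
print(f"one-year lag correlation: seasonal={seasonal_r:.2f}, flashy={flashy_r:.2f}")
```

The seasonal series repeats itself almost exactly a year later, while the spiky one carries little information from one year to the next, which is the kind of structure a general pattern-recognition model has less to work with.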
Performance also varied significantly across the TSFMs the study evaluated. Sundial stood out, but other foundation models performed worse, sometimes substantially so. The choice of model matters, and not all TSFMs are equally suited to hydrological applications.
Scaling with data - and the path forward
The authors note a key property of foundation models: their performance scales with training data size. Current TSFMs were trained on datasets that include relatively little Earth science data. As future generations incorporate more climate records, reanalysis products, and hydrological observations, their capacity for water-related predictions should improve. The current results represent a floor, not a ceiling.
This trajectory is important because it means the technology will likely get better without requiring each new application to build a model from scratch. A foundation model that improves globally benefits every region where it is applied, including regions that contribute no training data of their own.
Honest boundaries of the current work
The study tested TSFMs on U.S. basins with abundant historical data - precisely the basins where they are least needed. The true test will come when these models are applied in the data-scarce regions where they could have the most impact. Performance in well-gauged American rivers does not guarantee performance on ungauged African or Asian rivers with different climate regimes, land use patterns, and hydrological characteristics.
The study also focused on daily streamflow prediction. Operational flood forecasting often requires sub-daily or even hourly resolution, and whether TSFMs can maintain skill at finer time scales remains to be demonstrated. Similarly, drought forecasting involves longer time horizons and different hydrological processes than the daily prediction tasks evaluated here.
The comparison against a single LSTM benchmark, while informative, does not cover the full landscape of hydrological modeling approaches. Process-based models, hybrid physics-ML models, and ensemble methods may outperform both TSFMs and LSTMs in specific settings. The study demonstrates that TSFMs are competitive; it does not demonstrate that they are optimal.
There is also the question of uncertainty quantification. Operational flood warning systems need not just a prediction of river flow but a measure of confidence in that prediction. A forecast that says flow will be 500 cubic meters per second is less useful than one that gives a range of likely flows at a stated confidence level. Whether TSFMs can produce reliable uncertainty estimates alongside their point predictions is an open question that operational deployment would require answering.
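As a sketch of what a forecast with a range could look like, suppose a model can return several plausible flow values for the same day instead of a single number. That is an assumption made for illustration; the article does not describe how any of the evaluated models expose uncertainty. An interval can then be read off the empirical quantiles of those values, and its reliability checked by counting how often observations fall inside it. All names and numbers below are hypothetical.

```python
def empirical_quantile(samples, q):
    """Quantile of `samples` at level q (0 to 1), using linear interpolation."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(samples)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def prediction_interval(samples, coverage=0.9):
    """Central interval intended to contain `coverage` of the sampled values."""
    tail = (1.0 - coverage) / 2.0
    return empirical_quantile(samples, tail), empirical_quantile(samples, 1.0 - tail)


def interval_coverage(intervals, observations):
    """Fraction of observations that fall inside their (low, high) interval."""
    if len(intervals) != len(observations) or not observations:
        raise ValueError("need one interval per observation")
    hits = sum(1 for (low, high), obs in zip(intervals, observations) if low <= obs <= high)
    return hits / len(observations)


# Hypothetical sampled forecasts for one day, in cubic meters per second.
ensemble_flows = [430.0, 455.0, 470.0, 480.0, 495.0, 500.0,
                  510.0, 525.0, 540.0, 560.0, 610.0]

median_flow = empirical_quantile(ensemble_flows, 0.5)
low, high = prediction_interval(ensemble_flows, coverage=0.8)
print(f"median {median_flow:.0f} m^3/s, 80% interval {low:.0f} to {high:.0f} m^3/s")
```

A well-calibrated 80% interval should contain the observed flow on roughly 80% of days; `interval_coverage` is the kind of check that would reveal whether it does.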
The equity dimension deserves attention as well. The regions that stand to benefit most from forecasting that needs no local training data are often those with the least capacity to deploy and maintain AI systems. Satellite imagery access, cloud computing resources, and local technical expertise are all prerequisites for translating a research demonstration into an operational tool. Bridging that implementation gap requires investment in infrastructure and training that goes beyond model development.
Still, the core finding opens a practical door. AI models trained on entirely non-hydrological data can predict river flows with useful skill. As these models grow more capable and as Earth science data are integrated into their training, the gap between data-rich and data-poor regions in water forecasting may begin to narrow. For the communities most exposed to water-related hazards and most underserved by existing monitoring networks, that is a development worth watching closely.