Technology 2026-03-10

Zephyrus: An AI Agent That Answers Weather Questions in Plain English

UC San Diego researchers built a framework that lets large language models interact with AI weather forecasting data through natural language queries, though it still stumbles on complex tasks.

AI-powered weather models have gotten dramatically better at forecasting in recent years. But there is a practical problem: the models speak code, not English. They produce massive numerical datasets that require specialized programming skills to analyze. A meteorologist can interpret them. A city emergency manager, a farmer, or a student probably cannot.

Researchers at the University of California San Diego have built the first AI agent designed to bridge that gap. Called Zephyrus, the system takes natural language questions about weather and climate data, translates them into executable code, runs the analysis, and returns the answer in plain language. The team will present the work at the International Conference on Learning Representations (ICLR) in Rio de Janeiro in April 2026.

How Zephyrus works

The architecture has three layers. At the bottom sit AI weather forecasting models that produce gridded numerical predictions of temperature, precipitation, wind, and other variables across space and time. In the middle is a code execution environment that can query and analyze these datasets programmatically. On top sits a large language model (LLM) that translates English queries into code and then translates the code-generated results back into English.
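To make the layering concrete, here is a minimal sketch in Python. It is not the Zephyrus code: the forecast values are made up, `fake_llm_to_code` is a hypothetical keyword rule standing in for the LLM, and `run_code` uses a bare `exec` that is nowhere near a real sandbox. All names are assumptions chosen for illustration.

```python
from typing import Dict

# Layer 1 (stand-in): a tiny "forecast" keyed by variable, then by city.
# A real system would hold gridded model output; these numbers are invented.
FORECAST: Dict[str, Dict[str, float]] = {
    "temperature_c": {"Denver": 12.5, "Chicago": 7.0},
    "precip_mm": {"Denver": 0.0, "Chicago": 18.2},
}


def run_code(code: str, data: Dict[str, Dict[str, float]]) -> object:
    """Layer 2 (stand-in): run generated code against the data and return
    whatever the code stores in a variable named `result`."""
    namespace = {"data": data}
    # Illustration only: stripping builtins is not a real security boundary.
    exec(code, {"__builtins__": {}}, namespace)
    return namespace["result"]


def fake_llm_to_code(question: str) -> str:
    """Layer 3 (stand-in): a hypothetical rule that plays the role of the LLM
    by turning a question into a line of code."""
    for city in FORECAST["temperature_c"]:
        if city.lower() in question.lower():
            return f"result = data['temperature_c']['{city}']"
    raise ValueError("question not understood")


def answer(question: str) -> str:
    """Question in, plain-language answer out."""
    code = fake_llm_to_code(question)   # English -> code
    value = run_code(code, FORECAST)    # code -> number
    return f"The forecast temperature is {value} C."  # number -> English


print(answer("What will the temperature be in Denver on Thursday?"))
```

The point of the sketch is the separation of concerns: the language layer never touches the data directly, it only writes code that the execution layer runs.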

A user might ask: "What will the temperature be in Denver on Thursday?" or "Where in the Midwest will rainfall exceed 50 millimeters this weekend?" Zephyrus parses the question, generates the appropriate code to query the weather model data, executes it, and responds with a natural language answer.
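For the rainfall question, the generated code might amount to a threshold test over a grid. The following Python sketch shows that idea under loud assumptions: the 3-by-4 grid, its coordinates, and its rainfall totals are invented, and the function name `cells_exceeding` is hypothetical rather than anything from the Zephyrus framework.

```python
from typing import List, Tuple

# Hypothetical grid of weekend rainfall totals in millimeters.
# Rows follow LATS, columns follow LONS; all values are invented.
LATS = [38.0, 40.0, 42.0]
LONS = [-96.0, -93.0, -90.0, -87.0]
RAIN_MM = [
    [12.0, 55.5, 61.0, 8.0],
    [3.0, 49.9, 70.2, 50.0],
    [0.0, 5.0, 51.3, 20.0],
]


def cells_exceeding(threshold_mm: float) -> List[Tuple[float, float, float]]:
    """Return (lat, lon, rainfall) for every cell strictly above the threshold."""
    hits = []
    for i, lat in enumerate(LATS):
        for j, lon in enumerate(LONS):
            if RAIN_MM[i][j] > threshold_mm:
                hits.append((lat, lon, RAIN_MM[i][j]))
    return hits


for lat, lon, mm in cells_exceeding(50.0):
    print(f"lat {lat}, lon {lon}: {mm} mm")
```

Note that a cell at exactly 50.0 mm is excluded here because "exceed" is read as strictly greater than; an agent would have to make that kind of interpretive choice when turning a question into code.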

The system can also, in principle, reason over text-based information such as meteorology reports and weather bulletins, something purely numerical weather models cannot do.

Strong on basics, weak on complexity

In testing, Zephyrus performed well on straightforward tasks: finding specific weather conditions at a location, generating forecasts for particular places and times, and answering factual queries about weather data. These are the kinds of questions that are simple for a human expert with data access but impractical for someone without programming skills.

But the system struggled with more complex tasks. Detecting extreme weather events, which requires integrating multiple variables across space and time to identify unusual patterns, proved difficult. Report generation, which demands structuring information into a coherent narrative format, was also a weakness.
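A toy example hints at why event detection is harder than a single lookup. The Python sketch below flags locations where two variables stay above thresholds together for several consecutive time steps. It is an illustrative assumption, not the detection method or benchmark used in the paper, and the cell names, values, and thresholds are invented.

```python
from typing import Dict, List


def longest_run(flags: List[bool]) -> int:
    """Length of the longest streak of consecutive True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def detect_events(
    temp_c: Dict[str, List[float]],
    wind_ms: Dict[str, List[float]],
    temp_min: float,
    wind_min: float,
    min_steps: int,
) -> List[str]:
    """Return cells where temperature AND wind both meet their thresholds
    for at least `min_steps` consecutive time steps."""
    events = []
    for cell in temp_c:
        flags = [
            t >= temp_min and w >= wind_min
            for t, w in zip(temp_c[cell], wind_ms[cell])
        ]
        if longest_run(flags) >= min_steps:
            events.append(cell)
    return events


# Invented time series for three hypothetical grid cells.
temp = {"A": [30, 36, 37, 38, 31], "B": [36, 29, 37, 30, 36], "C": [36, 37, 38, 39, 40]}
wind = {"A": [5, 12, 11, 10, 12], "B": [12, 12, 12, 12, 12], "C": [2, 3, 12, 2, 2]}

print(detect_events(temp, wind, temp_min=35, wind_min=10, min_steps=3))
```

Even this simplified version has to combine variables, track persistence over time, and loop over space, and each of those choices (which variables, which thresholds, how long) is something the question rarely spells out.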

The researchers tested four different frontier LLMs as the language engine powering Zephyrus. All four reached similar accuracy, suggesting that the bottleneck lies in the framework's ability to translate complex analytical tasks into code, not in the capabilities of any particular language model.

The broader ambition

Weather was chosen as the test domain deliberately. It combines large, complex datasets that change over time with the need to communicate findings in plain language. If the approach works for meteorology, the researchers believe similar agents could serve other data-intensive scientific fields, particularly climate science.

Co-author Duncan Watson-Parris of the Scripps Institution of Oceanography framed the goal as speeding up how researchers reason about multimodal data by making it easier for students and young scientists to interact with different datasets. Rose Yu of the Department of Computer Science and Engineering described the vision as democratizing earth science.

What Zephyrus cannot do yet

The system is a proof of concept, not a production tool. Its training dataset is limited, and the researchers plan to use larger datasets for the next iteration. Performance on complex tasks like extreme weather detection and report generation remains insufficient for operational use.

Zephyrus depends on the accuracy of the underlying AI weather models. If those models produce incorrect forecasts, Zephyrus will confidently translate those errors into plain English. The system adds a natural language interface but does not independently verify the data it reports.

The current implementation was tested on standard weather forecasting data. Climate science applications, which involve longer time scales, higher uncertainty, and more complex multi-variable relationships, have not been evaluated. Extending the framework to climate data will require substantial additional development.

The system also inherits the known limitations of large language models, including occasional hallucination (generating plausible-sounding but incorrect information) and sensitivity to how questions are phrased. For safety-critical applications like severe weather warnings, these failure modes would need to be thoroughly addressed before deployment.

Next steps include fine-tuning open-source models for climate-focused tasks and expanding the training datasets.

Source: Fisher, M. et al. "Zephyrus: An Agentic Framework for Weather Science." ICLR 2026. UC San Diego Department of Computer Science and Engineering, Scripps Institution of Oceanography, and Halicioglu Data Science Institute.