Medicine 2026-03-03 3 min read

TweetyBERT: How a Self-Supervised AI Trained on Canary Songs Could Help Decode the Human Brain

University of Oregon neuroscientists adapted the BERT language model architecture to automatically annotate canary vocalizations without human-labeled training data, achieving expert-level accuracy on repertoires of 30 to 40 distinct syllables.

To understand how the brain learns language, neuroscientists have spent decades studying songbirds. Canaries are particularly valuable because they learn complex, lengthy songs throughout their lives - acquiring new syllables, rearranging sequences, and modifying their vocalizations in response to social context and season. That behavioral flexibility mirrors, in important respects, the kind of learning the human brain performs when it acquires and produces speech.

The problem is scale. A canary's repertoire spans 30 to 40 distinct syllable types, and analyzing even a few hours of recorded vocalizations by hand requires expert annotators working for days. Automated methods exist, but they typically require large amounts of human-labeled training data to function - creating a bottleneck that limits how much vocal data researchers can practically analyze.

A team at the University of Oregon has built a way around that bottleneck. The result, called TweetyBERT, is a self-supervised machine learning model that can segment and classify canary vocalizations with expert-level accuracy without ever seeing a human-annotated example during training.

BERT for Birdsong

TweetyBERT adapts BERT - the transformer architecture that underpins many modern large language models - to handle the acoustic structure of birdsong. In language processing, BERT learns representations of words and sentences by training on massive text corpora, using a self-supervised objective that involves predicting masked portions of the input. TweetyBERT applies the same principle to audio: the model is trained to predict masked or hidden fragments of birdsong spectrograms from the surrounding context, learning to represent the statistical structure of canary vocalizations in the process.
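The masked-prediction objective described above can be sketched in a few lines. This is a toy illustration, not the authors' architecture: the spectrogram is random data, and a simple neighbor-averaging baseline stands in for the transformer, purely to show what quantity a BERT-style model is trained to minimize.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "spectrogram": 100 time frames x 64 frequency bins (random stand-in data).
spec = rng.standard_normal((100, 64))

# Self-supervised masking: hide a random subset of frames, then ask a model
# to reconstruct them from the surrounding, unmasked context.
mask_frac = 0.15
n_frames = spec.shape[0]
masked_idx = rng.choice(n_frames, size=int(mask_frac * n_frames), replace=False)

masked_spec = spec.copy()
masked_spec[masked_idx] = 0.0  # replace masked frames with a placeholder

def predict_masked(masked, idx, window=3):
    """Stand-in 'model': predict each masked frame as the mean of its
    unmasked neighbors within +/-`window` frames. TweetyBERT uses a
    transformer here; this baseline only illustrates the training signal."""
    preds = np.zeros((len(idx), masked.shape[1]))
    masked_set = set(idx.tolist())
    for k, t in enumerate(idx):
        lo, hi = max(0, t - window), min(masked.shape[0], t + window + 1)
        neighbors = [masked[j] for j in range(lo, hi) if j not in masked_set]
        preds[k] = np.mean(neighbors, axis=0) if neighbors else 0.0
    return preds

preds = predict_masked(masked_spec, masked_idx)

# Reconstruction error on the masked positions only - the loss a
# BERT-style model would minimize during self-supervised training.
loss = np.mean((preds - spec[masked_idx]) ** 2)
print(f"masked-frame MSE: {loss:.3f}")
```

Because the loss is computed only from the audio itself, no human labels enter the training process - the recording is its own supervision.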

"Current AI methods for analyzing animal vocalizations require human-labeled training data, a slow and labor-intensive process," said Tim Gardner, associate professor of bioengineering at the University of Oregon's Knight Campus. "We developed TweetyBERT, a self-supervised neural network for analyzing birdsongs. It can rapidly process unlabeled vocal recordings, identify communication units, and annotate sequences."

The system is trained entirely on raw audio without any human annotation, then applied to new recordings to automatically identify and classify syllables. Its accuracy on held-out test data matches that of trained human expert annotators - a level of performance that previous automated systems have generally failed to achieve without supervised training on labeled datasets.
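Turning self-supervised representations into syllable labels without annotation typically involves unsupervised clustering of the learned embeddings. The sketch below is an assumption-laden illustration of that general idea - synthetic 2-D "embeddings" and a minimal k-means, not the specific grouping step TweetyBERT uses.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic frame embeddings drawn from three well-separated "syllable types"
# (stand-ins for the representations a trained self-supervised model emits).
centers = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
frames = np.vstack([c + 0.3 * rng.standard_normal((50, 2)) for c in centers])

def kmeans(x, k, iters=20, seed=0):
    """Minimal k-means: assign each embedded frame to one of k clusters,
    i.e. one candidate syllable class, with no labels required."""
    r = np.random.default_rng(seed)
    cent = x[r.choice(len(x), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid, then recompute centroids.
        labels = np.argmin(((x[:, None] - cent[None]) ** 2).sum(-1), axis=1)
        cent = np.array([x[labels == j].mean(axis=0) if np.any(labels == j)
                         else cent[j] for j in range(k)])
    return labels, cent

labels, cent = kmeans(frames, k=3)
print("frames per cluster:", np.bincount(labels, minlength=3))
```

Once frames are grouped this way, contiguous runs with the same label can be read off as segmented, classified syllables - the annotation step that previously required human experts.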

What Researchers Can Do With It

George Vengrovski, a graduate student in Gardner's lab who led the development of TweetyBERT, points to the system's ability to track individual variation and change over time as particularly valuable for neuroscience. Because the model can quickly process large volumes of unlabeled recordings, researchers can now analyze how a single bird's song changes across weeks or months - tracking learning, aging, social influence, and brain circuit manipulation with a resolution that was previously impractical.

For understanding the neural basis of learned vocal behavior, this matters. The circuits that control song production in canaries share organizational features with the circuits that underlie speech in humans - specifically the cortico-basal ganglia-thalamic loops that support motor learning and sequence production. Studies of how those circuits function during song learning, and how they are disrupted by targeted interventions, depend on having accurate, high-throughput annotations of the vocal output they produce.

"This ability to classify and annotate songs quickly, finding differences across individuals and tracking how songs change over time, can help neuroscientists uncover the neural underpinnings of how the brain learns and produces language," Vengrovski said.

Beyond Canaries

The approach is not limited to laboratory songbirds. Gardner notes that the underlying method is already being applied to dolphins and whales, and that with appropriate modification - different acoustic preprocessing, different input representations - the same self-supervised learning framework could extend to any species with structured vocalizations.

One application Gardner highlights is ecological monitoring. Many bird species are experiencing population changes driven by habitat loss, urbanization, and climate change, and vocal behavior is one of the first things to shift when birds respond to environmental pressure. A scalable method for automatically tracking vocal patterns in wild populations could provide sensitive, continuous indicators of how those populations are faring - data that current manual annotation approaches cannot generate at the necessary scale.

"We built this for canaries, but the underlying approach isn't species-specific, and the world is full of birds whose vocal behavior we're barely tracking," Gardner said.

The study is limited in that TweetyBERT was developed and validated on canary recordings, and direct comparisons with other species will require additional validation work. The model's performance on vocalizations with different acoustic properties - shorter or more variable syllable boundaries, less stereotyped sequences - has not yet been fully characterized. The researchers present it as a platform for continued development rather than a finished tool ready for universal deployment across species.

Source: Vengrovski et al., Patterns (2026). Senior author: Tim Gardner, University of Oregon Knight Campus. Media contact: Molly Blancett, University of Oregon, blancett@uoregon.edu, 541-515-5155.