Shrinking an AI Vision Model 1,000-Fold Revealed What Individual Neurons in the Brain Are Actually Doing
For decades, neuroscientists studying vision have faced a paradox. The neural network models that best predict how individual brain cells respond to images are enormous - containing millions of parameters and operating in ways that are essentially opaque. To understand the visual system, researchers have had to choose between accuracy and interpretability. A study published in Nature suggests that trade-off may not be necessary.
The work, led by Cold Spring Harbor Laboratory's Benjamin Cowley and Carnegie Mellon's Matthew Smith, shows that a large AI model trained to predict visual cortex neuron responses in macaques can be compressed to roughly one-thousandth its original size without meaningfully sacrificing accuracy. The resulting model is small enough to attach to an email. More importantly, it is small enough to analyze.
Building Down From Big
The team began with a standard computational pipeline. Macaques were shown carefully curated sets of natural images while the researchers recorded which neurons in their visual cortex fired in response to each image. A large AI model was then trained to predict these neural responses, outperforming competing models by more than 30 percent. That accuracy, though useful, came with a cost: the model was enormous, with millions of parameters and internal representations too complex to interpret directly.
Using machine learning compression techniques, the team then reduced that large model to compact versions thousands of times simpler. The compressed models retained the predictive accuracy of their larger predecessors - they still identified which neurons would respond to which images, and with comparable precision. But now the internal computations were small enough to examine directly.
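The article does not name the specific compression method, but one standard technique consistent with this description is knowledge distillation: a small "student" network is trained to reproduce the predictions of the large "teacher" on the same inputs. A minimal sketch of that idea, in which every size, name, and the choice of a linear student are illustrative assumptions rather than the study's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the large "teacher" model: a random two-layer
# network mapping a 100-pixel "image" to predicted responses of 4 neurons.
n_pixels, n_hidden, n_neurons = 100, 512, 4
W1 = rng.normal(size=(n_hidden, n_pixels)) / np.sqrt(n_pixels)
W2 = rng.normal(size=(n_neurons, n_hidden)) / np.sqrt(n_hidden)

def teacher(x):
    return W2 @ np.maximum(W1 @ x, 0.0)  # ReLU hidden layer

# "Student": a single linear map with ~130x fewer parameters.
S = np.zeros((n_neurons, n_pixels))

# Distillation: fit the student to the teacher's outputs on random
# "images" by gradient descent on the squared error.
X = rng.normal(size=(n_pixels, 2000))
Y = teacher(X)
for _ in range(300):
    grad = (S @ X - Y) @ X.T / X.shape[1]
    S -= 0.2 * grad

# On held-out inputs, the student should track the teacher far better
# than the zero-parameter baseline of predicting nothing.
X_test = rng.normal(size=(n_pixels, 500))
Y_test = teacher(X_test)
mse_trained = float(np.mean((S @ X_test - Y_test) ** 2))
mse_baseline = float(np.mean(Y_test ** 2))
print(mse_trained < mse_baseline)
```

A linear student obviously cannot capture everything a nonlinear teacher computes; the point of the sketch is only the training setup, where the small model's targets are the big model's predictions rather than the raw neural recordings.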
"This work shows that we don't need massive, complicated networks to understand what individual neurons are doing," said Smith, professor of biomedical engineering and neuroscience at Carnegie Mellon. "By making the models smaller and interpretable, we can actually gain intuition about how the visual system works and develop hypotheses that can be tested in the lab."
What the Compact Model Revealed
When the researchers examined the internal structure of the compressed models, they found a consistent pattern. Every model neuron decomposed images into low-level features - edges, colors, orientations - and then formed what the researchers describe as "unique preferences" by combining this information in different ways. Some model neurons responded specifically to eyes within faces. Others activated for particular spatial patterns, such as dots in a regular arrangement.
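The idea of a "unique preference" built from elementary features can be made concrete with a toy example (entirely hypothetical, and much cruder than the study's model): a unit that pools two rectified edge detectors with an AND-like minimum, so it responds to a cross, which contains both horizontal and vertical edges, but not to a lone bar.

```python
import numpy as np

def edge_filter(img, axis):
    # Rectified response to intensity increases along one axis:
    # a crude stand-in for an oriented edge detector.
    return np.maximum(np.diff(img, axis=axis), 0.0).sum()

def model_neuron(img):
    # AND-like combination: fires only when BOTH edge types are present.
    return min(edge_filter(img, axis=0), edge_filter(img, axis=1))

cross = np.zeros((9, 9))
cross[4, :] = 1.0  # horizontal bar -> vertical intensity changes
cross[:, 4] = 1.0  # vertical bar   -> horizontal intensity changes

vbar = np.zeros((9, 9))
vbar[:, 4] = 1.0   # vertical bar only

print(model_neuron(cross) > model_neuron(vbar))
```

Swapping `min` for `sum`, products, or deeper combinations yields units with different "preferences" from the same elementary parts, which is the kind of structure the compact models made visible.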
This architecture suggests that the visual cortex does not process images with a single unified strategy. Instead, neurons specialize in extracting particular combinations of elementary features, and visual recognition emerges from the collective activity of many such specialists. That the same pattern appeared in compressed AI models and in real neurons suggests the simplification captured something genuine about the underlying biology rather than introducing artifacts.
The study extended earlier work by Cowley, who had previously used similar methods to model neural responses in fruit flies. Macaques represent a considerably more relevant model organism for human neuroscience - their visual systems are anatomically and functionally much closer to our own. Collaborators at Princeton, led by Jonathan Pillow, contributed statistical modeling expertise to the project.
Implications for AI Design
The finding also raises questions about how modern computer vision systems are built. Deep learning architectures used in face recognition, autonomous vehicles, and medical imaging were originally inspired by the structure of biological visual systems. They perform impressively in controlled settings but fail under conditions that human vision handles easily - unusual lighting, partial occlusion, adversarial inputs designed to fool the classifier.
Understanding the actual computational principles that biological visual systems use - rather than simply mimicking their architecture at a gross scale - might inform better designs. The compact models described here, because they are interpretable, provide a tool for extracting those principles.
Several limitations bear noting. The recordings were made in macaques, not humans, and used a specific set of curated natural images rather than arbitrary visual input. The models predict responses in the primary visual cortex and nearby areas; higher-level visual processing, which involves face recognition and object categorization, was not the focus. Whether the same compression approach would work for modeling those stages remains an open question.
The study was conducted across three institutions - Cold Spring Harbor Laboratory, Carnegie Mellon University, and Princeton University - and published in Nature.