Photonic Chips Run Neural Networks Entirely in Light, With 320-Picosecond Latency
The bottleneck in photonic computing has always been the same: light handles linear math efficiently, but when the computation requires nonlinear steps -- the kind that make learning and decision-making possible -- the signal has to be converted back to electronic form. That conversion eats up the speed and energy advantages that made photonics attractive in the first place.
A team at Xidian University in China has built a two-chip system that keeps the entire process in light.
Two chips, one all-optical pipeline
The system, described in Optica, consists of two fabricated chips working together. The first is a 16-by-16 Mach-Zehnder interferometer mesh chip designed specifically for spiking neural networks. It handles the linear computation -- the matrix operations that form the backbone of neural network processing. The second chip contains an array of distributed feedback lasers with saturable absorbers, optimized to provide the nonlinear activation functions that neural networks need for learning.
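The division of labor between the two chips can be illustrated with a toy numerical model: a lossless interferometer mesh implements a unitary linear transform, and the laser-with-saturable-absorber stage acts roughly like a thresholding nonlinearity. The sketch below is illustrative only -- the matrix, threshold, and nonlinearity are assumptions, not the authors' design.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_unitary(n, rng):
    # A hypothetical stand-in for the 16x16 MZI mesh: a lossless
    # interferometer mesh implements a unitary transform, so we draw
    # a random unitary via QR decomposition of a complex Gaussian matrix.
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))  # fix column phases so Q stays unitary

def saturable_absorber(intensity, threshold):
    # Toy spiking nonlinearity: the laser emits only when the input
    # intensity exceeds a threshold (a crude saturable-absorber model).
    return np.where(intensity > threshold, intensity, 0.0)

U = random_unitary(16, rng)               # linear stage (MZI-mesh analogue)
x = rng.random(16)                        # 16-channel input amplitudes
intensity = np.abs(U @ x) ** 2            # optical intensities after the mesh
out = saturable_absorber(intensity, 0.2)  # nonlinear stage, all-optical on chip
print(out.shape)  # (16,)
```

The point of the sketch is the pipeline shape: both the matrix multiply and the activation happen before anything would need to leave the optical domain.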
Together, the two chips form a 16-channel photonic neuromorphic computing system with 272 trainable parameters. That is modest by conventional computing standards, but it represents a significant step for photonic neuromorphic hardware, which has struggled to perform both halves of the computation in the optical domain.
Reinforcement learning in photonic hardware
To demonstrate what the system can do, the researchers implemented reinforcement learning -- a type of machine learning where a system improves through trial and error. They tested it on two standard benchmark tasks: CartPole, where a pole must be balanced on a moving cart, and Pendulum, where a pendulum must be swung upright and kept balanced.
The system used a hybrid training approach. Models were first trained globally in software, then deployed on the physical chips, and finally fine-tuned in software to compensate for chip-level manufacturing variations. Running inference on the hardware alone cost little accuracy: performance dropped by just 1.5% on CartPole and 2% on Pendulum relative to a pure-software baseline.
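The three-step workflow -- software pretraining, deployment onto imperfect hardware, software fine-tuning -- can be sketched with a toy linear model. Everything here is an illustrative assumption: the noise model stands in for fabrication variation, and the correction loop stands in for fine-tuning.

```python
import numpy as np

rng = np.random.default_rng(1)

# Step 1: a toy "policy" trained globally in software.
w_soft = rng.normal(size=(2, 4))

# Step 2: deployment -- fabrication variation perturbs the on-chip weights
# (a fixed additive offset is an assumed, simplified noise model).
fab_noise = 0.05 * rng.normal(size=w_soft.shape)

# Step 3: software fine-tuning -- learn a correction so the deployed
# weights (w_soft + correction + fab_noise) reproduce the intended output.
correction = np.zeros_like(w_soft)
for _ in range(200):
    x = rng.normal(size=4)
    y_target = w_soft @ x                           # intended output
    y_chip = (w_soft + correction + fab_noise) @ x  # what the "chip" computes
    grad = np.outer(y_chip - y_target, x)           # grad of squared error
    correction -= 0.05 * grad

# After fine-tuning, the correction cancels the fabrication offset.
residual = np.abs(correction + fab_noise).max()
print(residual < 0.01)  # True
```

The design choice mirrors the paper's motivation: rather than requiring perfect fabrication, the software layer absorbs device-to-device variation.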
Using the combined hardware-software framework, the system achieved perfect performance on CartPole and strong results on the more complex Pendulum task.
Speed and efficiency numbers
The performance metrics place the system in competitive territory. For linear operations, it achieved an energy efficiency of 1.39 tera operations per second per watt (TOPS/W) and a computing density of 0.13 TOPS per square millimeter. For nonlinear computation, the figures were 987.65 giga operations per second per watt (GOPS/W) and 533.33 GOPS per square millimeter.
Those numbers put the chips in the GPU-class range for energy efficiency and in the range of GPUs and application-specific integrated circuits for computing density. But the most striking number is latency: on-chip computation takes just 320 picoseconds -- 320 trillionths of a second. For applications like autonomous driving, where reaction time is measured in milliseconds, that margin is enormous.
What it cannot do yet
The system is a proof of concept, not a product. Sixteen channels and 272 parameters are enough to solve benchmark control tasks, but real-world applications in autonomous navigation or robotics will require much larger networks. The researchers plan to design and fabricate a 128-channel version for more complex reinforcement learning tasks.
There is also the integration challenge. The current system uses separate chips connected through an opto-electronic testing setup. A practical edge computing device would need everything on a single compact chip -- or at least a tightly integrated package. That engineering problem has not been solved yet.
The hybrid training approach, while effective, also highlights a current dependency on conventional software for initial model training. A fully autonomous photonic learning system remains a longer-term goal.
Still, the core achievement -- performing both linear and nonlinear neural computation entirely in the optical domain, at picosecond speeds and GPU-competitive efficiency -- addresses what has been a fundamental limitation in the field.