Engineering | 2026-02-24 | 3 min read

MambaAlign Detects Manufacturing Defects Missed by Standard Cameras at Near-Real-Time Speed

A new multimodal defect detection framework fuses RGB images with thermal or depth sensor data using state-space modeling, improving detection accuracy by up to 6.5 percentage points while running at close to 30 frames per second.

Manufacturing quality inspection faces a fundamental sensory mismatch. Standard RGB cameras - fast, cheap, and widely deployed - capture surface appearance but cannot see geometry, heat patterns, or subsurface material structure. A scratch that alters a part's thermal signature goes undetected. A dent that barely changes color fails to register. Missing components in electronics that affect only thermal or structural properties pass through visual inspection systems unnoticed.

Adding sensors solves part of this problem. Thermal cameras reveal heat patterns. Depth scanners capture surface geometry. But adding sensors creates a new problem: combining their outputs reliably and efficiently while maintaining the spatial precision needed to localize defects at the pixel level. Many existing fusion approaches compromise on one of these requirements - losing fine spatial detail, demanding too much computation for real-time use, or failing when sensors are not perfectly aligned, which is common in factory settings where vibration and wear gradually shift sensor positions.

The MambaAlign approach

A research team led by Associate Professor Phan Xuan Tan at Shibaura Institute of Technology, Japan, and Dr. Dinh-Cuong Hoang at FPT University, Vietnam, developed MambaAlign - a framework that addresses these alignment and efficiency challenges through a combination of state-space modeling and selective cross-sensor information exchange.

The study was published in the Journal of Computational Design and Engineering in January 2026.

Rather than processing all spatial relationships between every pair of image locations simultaneously - the approach taken by attention-based deep learning models, whose cost scales quadratically with the number of image locations - MambaAlign uses state-space recurrence to capture long-range spatial context at roughly linear computational cost. This matters for defects like fine scratches and cracks, which extend across large portions of an image and require the model to consider distant regions together, but which would be prohibitively expensive to handle with full attention at high resolution.
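The efficiency argument can be made concrete with a toy recurrence. The sketch below is illustrative only, not the paper's architecture: it runs a diagonal state-space update over a flattened sequence of image tokens, touching the previous hidden state once per step, so total cost grows linearly with sequence length instead of quadratically as with a full pairwise attention matrix. The function name `ssm_scan` and the scalar parameters `a`, `b`, `c` are invented for this example.

```python
import numpy as np

def ssm_scan(x, a, b, c):
    """Minimal diagonal state-space recurrence over a 1-D token sequence.

    h[t] = a * h[t-1] + b * x[t]   (state update, O(1) work per step)
    y[t] = c * h[t]                (readout)

    Total cost is O(T * d) for T tokens of dimension d: linear in T,
    unlike the O(T^2) score matrix of full self-attention.
    """
    T, d = x.shape
    h = np.zeros(d)
    y = np.empty_like(x)
    for t in range(T):
        h = a * h + b * x[t]   # carries long-range context forward
        y[t] = c * h
    return y

# Toy "image" of 8 tokens with 4 channels, flattened raster-scan style.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((8, 4))
out = ssm_scan(tokens, a=0.9, b=0.5, c=1.0)
print(out.shape)  # (8, 4): one output per input token
```

Because `a` is less than 1, the influence of a token on later outputs decays geometrically, which is how such a recurrence summarizes distant context without ever forming an all-pairs comparison.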

The system applies this recurrence in multiple orientations, capturing directional context that is particularly relevant for oblique defects - damage that runs at an angle and that purely horizontal or vertical processing would miss. A semantic-level information exchange between the RGB and auxiliary sensor streams happens only at high-level feature stages, avoiding the noise amplification that occurs when low-level pixel data from imperfectly aligned sensors are mixed too early. A top-down reconstruction mechanism then restores fine spatial detail for precise localization.
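The multi-orientation idea can be sketched generically: flatten the feature map in several scan orders, run a 1-D recurrence along each, then undo the flattening and merge the results. The toy code below is an assumption-laden illustration, not the authors' implementation; a simple exponential moving average stands in for a learned state-space update, and all function names are invented here.

```python
import numpy as np

def directional_flatten(fmap):
    """Flatten an H x W feature map along four scan orders:
    left-to-right rows, right-to-left rows, top-to-bottom columns,
    bottom-to-top columns. Running a 1-D recurrence along each order
    lets an oblique defect accumulate context from more than one
    direction."""
    lr = fmap.reshape(-1)            # row-major, left to right
    rl = fmap[:, ::-1].reshape(-1)   # rows reversed
    tb = fmap.T.reshape(-1)          # column-major, top to bottom
    bt = fmap.T[:, ::-1].reshape(-1) # columns reversed
    return [lr, rl, tb, bt]

def multi_directional_smooth(fmap, a=0.8):
    """Toy directional context: an exponential moving average along each
    scan order, un-flattened back to H x W and averaged."""
    H, W = fmap.shape
    outs = []
    for k, seq in enumerate(directional_flatten(fmap)):
        h, y = 0.0, np.empty_like(seq)
        for t, v in enumerate(seq):
            h = a * h + (1 - a) * v   # stand-in for a learned update
            y[t] = h
        # undo each flattening so all four maps align spatially
        if k == 0:
            outs.append(y.reshape(H, W))
        elif k == 1:
            outs.append(y.reshape(H, W)[:, ::-1])
        elif k == 2:
            outs.append(y.reshape(W, H).T)
        else:
            outs.append(y.reshape(W, H)[:, ::-1].T)
    return np.mean(outs, axis=0)
```

A defect running diagonally across the map receives partial context from every scan order, which is the intuition behind scanning in multiple orientations rather than a single raster pass.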

Performance against existing methods

Tested across three standard RGB-plus-auxiliary-modality (RGB-X) datasets - a benchmark format pairing RGB images with depth, thermal, or near-infrared sensor data - MambaAlign improved image-level area under the receiver operating characteristic curve (AUROC) by approximately 4.8 percentage points over prior methods. Pixel-level AUROC improved by about 5.0 percentage points. The area under the per-region overlap curve (AUPRO), which measures how well the system identifies the full extent of defect regions rather than just their location, improved by roughly 6.5 percentage points.
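For readers unfamiliar with the headline metric, AUROC can be computed directly from anomaly scores and ground-truth labels. This small sketch is independent of the paper's evaluation pipeline; it uses the Mann-Whitney formulation, under which AUROC is the probability that a randomly chosen defect pixel scores higher than a randomly chosen normal one.

```python
import numpy as np

def auroc(scores, labels):
    """AUROC via the Mann-Whitney U statistic: the fraction of
    (defect, normal) pairs in which the defect pixel's anomaly score
    is higher, counting ties as half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = scores[labels == 1]   # defect pixels
    neg = scores[labels == 0]   # normal pixels
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# Anomaly scores for 6 pixels; defect pixels (label 1) all score higher.
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.7]
labels = [1,   1,   0,   0,   0,   1]
print(auroc(scores, labels))  # 1.0 -> perfect separation
```

AUPRO differs in that it averages overlap per connected defect region, so a detector cannot score well by finding only the largest defects; the sketch above covers AUROC only.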

The system ran at close to 30 frames per second at moderate image resolutions with controlled memory usage. In inline inspection applications on conveyor belts, 30 fps is typically sufficient to inspect parts moving at production-line speeds. The memory efficiency also makes the system deployable on standard industrial hardware rather than requiring specialized high-memory computing infrastructure.

"MambaAlign achieves state-of-the-art localization with parameters and runtime suitable for real-time inspection. It not only provides higher detection accuracy but also tighter and less fragmented anomaly maps. This translates directly into fewer false alarms, fewer missed defects, and more actionable outputs for engineers on the factory floor," said Dr. Tan.

Application across manufacturing sectors

The research team describes MambaAlign's relevance across several industrial contexts. Electronics and printed circuit board inspection can use the combination of RGB and thermal imaging to detect micro-cracks, missing components, and solder defects that create thermal anomalies. Aerospace and composite manufacturing can combine RGB with thermal to detect subsurface delamination invisible to standard cameras. Automotive body inspection benefits from improved detection of dents, scratches, and seam irregularities. The system's real-time performance enables inline deployment rather than offline batch inspection, which reduces the gap between defect occurrence and detection.

The study acknowledges that performance was evaluated on existing benchmark datasets rather than in live industrial deployment. Real factory conditions - variable lighting, production-line vibration, accumulated sensor wear, and the full diversity of defect types encountered across a product lifecycle - may differ from benchmark test conditions in ways that affect performance. Field validation across specific industrial applications represents the next development step before broad deployment.

Source: Shibaura Institute of Technology, Japan; FPT University, Vietnam
Lead researchers: Associate Professor Phan Xuan Tan (SIT); Dr. Dinh-Cuong Hoang (FPT University)
Study: Published in Journal of Computational Design and Engineering, Volume 13, Issue 1, January 2026. DOI: https://doi.org/10.1093/jcde/qwaf143