GSA-YOLO: an enhanced underwater object detector integrating ghost module and spatial-channel synergistic attention

Our take

GSA-YOLOv11 targets two weaknesses of conventional underwater object detectors: degraded imagery and high computational cost. It replaces the C3k2 module in the YOLOv11 backbone with a Ghost module configured to expand channel width, and adds a Spatial-Channel Synergistic Attention (SCSA) mechanism to the detection head, improving the perception of autonomous underwater robots. On the DUO dataset, the model raises mAP50 by 2.73% and mAP50-95 by 3.52% over the baseline while running at 10.6 GFLOPs and 836.59 FPS. The result is better detection in complex environments at a computational cost that suits small onboard devices.

Conventional object detection algorithms for autonomous underwater robot perception face two primary challenges. First, pronounced degradation of underwater images impedes algorithm performance. Second, the diversity and complexity of underwater targets demand sophisticated algorithms, yet current methods often suffer from high computational resource consumption, low detection accuracy, and reduced efficiency. This study proposes GSA-YOLOv11, a YOLOv11-based model that enhances the perceptual capabilities of underwater robots. First, the Ghost module is integrated into the Backbone to replace the C3k2 module. Unlike conventional usage, which compresses channels, we strategically configure the module to expand channel width while exploiting its cheap operations, achieving an “expansion-moderation” balance that increases model capacity without parameter explosion. This design generates numerous Ghost feature maps that capture richer intrinsic feature information, thereby enhancing the model’s representational ability and improving detection performance and robustness in complex underwater environments. Second, the SCSA (Spatial-Channel Synergistic Attention) mechanism is integrated into the detection head to capture features in both the channel and spatial dimensions. This synergy enhances cross-scale target detection, balancing accuracy, detection speed, and model complexity. Comparative experiments on the DUO dataset show that the mean average precision at an IoU threshold of 50% (mAP50) and averaged over thresholds from 50% to 95% (mAP50-95) increased by 2.73% and 3.52%, respectively, relative to the baseline model. The model requires 10.6 GFLOPs and runs at 836.59 FPS, which is sufficient to enhance environmental perception under the computing constraints of small onboard devices. Moreover, comparative experiments on the UDID dataset demonstrate that GSA-YOLOv11 outperforms baseline models. Ablation experiments validate the contribution of each module and the synergy between the two. By implementing targeted enhancements for small-target detection in optically degraded underwater environments, this model offers insights for enhancing the environmental perception and operational capabilities of underwater robots.
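
To make the "capacity without parameter explosion" argument concrete, the sketch below counts weights for a plain convolution and for a Ghost module that produces the same number of output channels. It follows the general Ghost module formulation (a primary convolution producing a reduced set of intrinsic maps, plus cheap depthwise operations producing the remaining "ghost" maps). The abstract does not give the layer configuration used in GSA-YOLOv11, so the channel counts, kernel sizes, ratio `s`, and function names here are illustrative assumptions, not the authors' settings.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a plain k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_module_params(c_in: int, c_out: int, k: int = 3, d: int = 3, s: int = 2) -> int:
    """Weight count of a Ghost module (bias and normalization ignored).

    primary conv: c_in -> m intrinsic maps with a k x k kernel, m = ceil(c_out / s)
    cheap ops:    depthwise d x d conv producing m * (s - 1) ghost maps
    """
    m = -(-c_out // s)                # ceiling division
    primary = c_in * m * k * k        # ordinary convolution on fewer channels
    cheap = m * (s - 1) * d * d       # one d x d filter per ghost map
    return primary + cheap


# Hypothetical expanding layer: 64 input channels widened to 128 output channels.
plain = standard_conv_params(64, 128, k=3)
ghost = ghost_module_params(64, 128, k=3, d=3, s=2)
print(plain, ghost, round(plain / ghost, 2))
```

Under these assumed sizes the Ghost variant needs roughly half the weights of the plain convolution while still doubling the channel width, which is the kind of trade-off the "expansion-moderation" balance refers to.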

Tagged with

#autonomous underwater vehicles #environmental DNA #interactive ocean maps #GSA-YOLOv11 #underwater object detection #Ghost module #spatial-channel synergistic attention #autonomous underwater robots #image degradation #detection accuracy #computational resource consumption #detection speed #model complexity #cross-scale target detection #mean average precision (mAP) #DUO dataset #UDID dataset #performance optimization #environmental perception #intrinsic feature information