An attempt at underwater image lightweight super-resolution using transformer and frequency-domain learning

Our take

This study introduces the Frequency-domain Learning Transformer (FLT), a new lightweight approach to underwater image super-resolution (SR) designed for the low-resolution imagery produced in complex underwater environments. By combining spatial- and frequency-domain information, FLT improves the reconstruction of fine-grained detail while reducing parameter count and computational cost. The architecture is built from Residual Dual-domain Joint Learning Transformer Blocks (RDTBs), each of which includes a Multi-scale FeedForward Neural network (Ms-FFN) to improve visual fidelity.
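To make the spatial/frequency distinction concrete, the sketch below separates a grayscale image into a low-frequency component (coarse structure) and a high-frequency residual (edges and texture) by applying an ideal circular low-pass mask to its 2-D FFT. This is an illustrative assumption written with NumPy, not FLT's frequency-domain branch: the function name `split_frequencies` and the cutoff `radius` are hypothetical, and FLT learns its frequency-domain features rather than applying a fixed mask.

```python
import numpy as np


def split_frequencies(image: np.ndarray, radius: float) -> tuple[np.ndarray, np.ndarray]:
    """Split a 2-D image into low- and high-frequency parts.

    Hypothetical helper for illustration only: an ideal circular low-pass
    mask is applied to the centred 2-D FFT spectrum, and the high-frequency
    part is whatever the low-pass reconstruction misses.
    """
    h, w = image.shape
    # Centre the spectrum so the zero-frequency bin sits at (h // 2, w // 2).
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low_mask = dist <= radius  # keep only bins near the centre
    # Undo the shift, invert the FFT, and drop the negligible imaginary part.
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = image - low  # edges and fine texture
    return low, high


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((64, 64))
    low, high = split_frequencies(img, radius=8)
    print(low.shape, high.shape)  # (64, 64) (64, 64)
```

Because `high` is defined as `image - low`, the two parts always sum back to the input; a larger `radius` moves more of the detail into `low`.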

Lightweight image super-resolution (SR) is a computer vision technique that aims to recover high-quality image details from low-resolution images at limited computational cost. Although Transformer-based SR models have made remarkable advances, balancing edge deployment against reconstruction quality remains difficult for underwater images because of complex underwater imaging conditions and the scarcity of publicly available high-quality datasets. To address these issues, we propose a Frequency-domain Learning Transformer (FLT) for underwater image SR, which leverages complementary information from the spatial and frequency domains to enable fine-grained detail reconstruction while reducing storage and computing costs. Specifically, FLT comprises Residual Dual-domain Joint Learning Transformer Blocks (RDTBs). Each RDTB captures low-frequency structures via a spatial-domain branch and high-frequency textures via a frequency-domain branch, thereby improving fine-grained detail in lightweight SR. Furthermore, a Multi-scale FeedForward Neural (Ms-FFN) network is incorporated into each RDTB as an auxiliary detail enhancement module, improving the visual fidelity of reconstructed images through multi-scale feature aggregation. We perform visual and quantitative comparisons with state-of-the-art methods, as well as ablation studies and model analyses, on both the public UFO-120 dataset and the KLSG-II dataset. Experimental results demonstrate that FLT achieves performance comparable to or exceeding that of state-of-the-art SR models while using about 50% to 60% fewer parameters and substantially less computation. This balance between reconstruction quality and efficiency underscores FLT’s advantages for lightweight underwater SR and makes it a promising solution for resource-constrained underwater imaging applications. The code is available at https://github.com/WanghtCC/FLT.
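The multi-scale feature aggregation idea behind the Ms-FFN can be illustrated with a small NumPy sketch: the same feature map is filtered at several receptive-field sizes, the results are concatenated along the channel axis, and a 1x1 projection fuses them back to the original channel count. Everything here is an assumption for illustration, not the Ms-FFN from the FLT repository: fixed box (mean) filters stand in for learned depthwise convolutions, the kernel sizes (1, 3, 5) are arbitrary, and `box_filter` and `multiscale_aggregate` are hypothetical names.

```python
import numpy as np


def box_filter(x: np.ndarray, k: int) -> np.ndarray:
    """Depthwise k x k mean filter on a (C, H, W) array.

    Stand-in for a learned depthwise convolution (assumption).
    Edge padding keeps the output the same size as the input; k should be odd.
    """
    pad = k // 2
    padded = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    _, h, w = x.shape
    out = np.zeros(x.shape, dtype=float)
    # Sum the k * k shifted views, then divide to get the mean.
    for dy in range(k):
        for dx in range(k):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / (k * k)


def multiscale_aggregate(x: np.ndarray, kernel_sizes=(1, 3, 5), weights=None) -> np.ndarray:
    """Filter x at several scales, concatenate, and fuse with a 1x1 projection."""
    c = x.shape[0]
    branches = [box_filter(x, k) for k in kernel_sizes]
    stacked = np.concatenate(branches, axis=0)  # (len(kernel_sizes) * C, H, W)
    if weights is None:
        # Default 1x1 projection: average the corresponding channel of every branch.
        weights = np.tile(np.eye(c), len(kernel_sizes)) / len(kernel_sizes)
    # A 1x1 convolution is a matrix product over the channel axis.
    return np.einsum("oc,chw->ohw", weights, stacked)


if __name__ == "__main__":
    feats = np.random.default_rng(0).random((4, 16, 16))
    fused = multiscale_aggregate(feats)
    print(fused.shape)  # (4, 16, 16)
```

With the default weights the projection simply averages the branches; in a trained network those weights, like the filters themselves, would be learned.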
