
Research Article
Semantic Segmentation-Based Enhancement of Visual SLAM Loop Closure Detection in Dynamic Indoor Environments
@INPROCEEDINGS{10.4108/eai.21-11-2024.2354632,
  author={Lu Wang and Chao Hu and Xiaoxia Lu},
  title={Semantic Segmentation-Based Enhancement of Visual SLAM Loop Closure Detection in Dynamic Indoor Environments},
  proceedings={Proceedings of the 2nd International Conference on Machine Learning and Automation, CONF-MLA 2024, November 21, 2024, Adana, Turkey},
  publisher={EAI},
  proceedings_a={CONF-MLA},
  year={2025},
  month={3},
  keywords={loop closure detection; dynamic environment; semantic segmentation; motion intensity; centroid coordinate; dynamic weight allocation},
  doi={10.4108/eai.21-11-2024.2354632}
}
Authors: Lu Wang, Chao Hu, Xiaoxia Lu
Year: 2025
Venue: CONF-MLA
Publisher: EAI
DOI: 10.4108/eai.21-11-2024.2354632
Abstract
Current visual SLAM loop closure detection algorithms face significant challenges in dynamic environments, where moving objects such as pedestrians cause inconsistencies among feature points and compromise map accuracy. This study proposes a novel semantic-segmentation-based visual SLAM loop closure detection algorithm designed for complex indoor dynamic scenes. The approach introduces the Bottleneck with Squeeze and Excitation Block (BnSEBlock), which improves the U-Net++ semantic segmentation model by incorporating residual connections, dilated convolutions, and an adaptive attention mechanism. Dynamic weights are assigned to semantic information based on motion intensity and centroid coordinates, both derived through adaptive HDBSCAN clustering. Loop closures are identified by assessing the similarity between keyframes and candidate frames using these weighted parameters. Experimental evaluations on publicly available datasets show that the enhanced U-Net++ model achieves a Mean Intersection over Union (MIoU) of 76.9% and reduces the loss to 0.172. By contrast, the traditional bag-of-words approach yields a maximum similarity of only 0.273 for loop images. The proposed algorithm improves localization accuracy in dynamic indoor environments by 61.57%.
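
The abstract names the three ingredients of BnSEBlock (residual connections, dilated convolutions, and squeeze-and-excitation attention) without giving its exact layout. The following is a minimal sketch, assuming a standard 1x1/3x3/1x1 bottleneck; the channel widths, dilation rate, and placement inside U-Net++ are illustrative assumptions, not the paper's specification.

import torch
import torch.nn as nn

class SqueezeExcitation(nn.Module):
    """Adaptive channel attention: squeeze (global pool), then excite (small MLP)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # reweight channels adaptively

class BnSEBlock(nn.Module):
    """Hypothetical bottleneck with SE attention: 1x1 reduce -> 3x3 dilated conv
    -> 1x1 expand, squeeze-and-excitation, then a residual addition."""
    def __init__(self, channels: int, dilation: int = 2, reduction: int = 4):
        super().__init__()
        mid = channels // reduction
        self.body = nn.Sequential(
            nn.Conv2d(channels, mid, 1, bias=False),
            nn.BatchNorm2d(mid),
            nn.ReLU(inplace=True),
            # dilated 3x3 conv enlarges the receptive field at constant resolution
            nn.Conv2d(mid, mid, 3, padding=dilation, dilation=dilation, bias=False),
            nn.BatchNorm2d(mid),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.se = SqueezeExcitation(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(x + self.se(self.body(x)))  # residual connection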
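
Likewise, the dynamic weighting step could plausibly be realized as sketched below: cluster each semantic class's points with HDBSCAN to obtain centroid coordinates, estimate a motion intensity from centroid displacement between frames, down-weight highly dynamic classes, and score keyframe/candidate similarity with the re-weighted descriptors. The exponential weighting rule, cosine similarity, and all function names here are assumptions; the abstract does not state the paper's formulas.

import numpy as np
from sklearn.cluster import HDBSCAN  # available in scikit-learn >= 1.3

def class_centroids(points: np.ndarray, min_cluster_size: int = 10) -> np.ndarray:
    """Cluster one class's pixel coordinates (N x 2) and return cluster centroids."""
    labels = HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(points)
    return np.array([points[labels == k].mean(axis=0)
                     for k in set(labels) if k != -1])  # -1 marks noise

def motion_intensity(cent_prev: np.ndarray, cent_curr: np.ndarray) -> float:
    """Mean centroid displacement between consecutive frames.
    Naive pairing by index; a real system would match centroids across frames."""
    n = min(len(cent_prev), len(cent_curr))
    if n == 0:
        return 0.0
    return float(np.linalg.norm(cent_curr[:n] - cent_prev[:n], axis=1).mean())

def dynamic_weights(intensities: np.ndarray, alpha: float = 0.1) -> np.ndarray:
    """Assumed rule: a class's weight decays exponentially with its motion intensity."""
    return np.exp(-alpha * intensities)

def weighted_similarity(desc_key: np.ndarray, desc_cand: np.ndarray,
                        weights: np.ndarray) -> float:
    """Cosine similarity of per-class descriptors after dynamic re-weighting."""
    a, b = desc_key * weights, desc_cand * weights
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

Under this reading, a pedestrian class with large centroid displacement receives a weight near zero, so it contributes little to the keyframe/candidate similarity, while static classes such as walls and furniture dominate the loop closure decision.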