LDMSNet: Lightweight Dual-Branch Multi-Scale Network for Real-Time Semantic Segmentation of Autonomous Driving
Haoran Yang1, Dan Zhang1, Jiazai Liu1, Zekun Cao2, Na Wang3
1College of Information Science and Technology & Artificial Intelligence, Nanjing Forestry University, Nanjing, 210037, China
2College of Safety and Emergency Management Engineering, Taiyuan University of Technology, Taiyuan, 30024, China
3Key Laboratory of Systems Biomedicine (Ministry of Education), Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai, 200240, China

Received: June 27, 2024; Revised: September 28, 2024; Accepted: October 10, 2024; Published online: November 7, 2024.

ABSTRACT
Semantic segmentation plays a crucial role in autonomous driving systems, serving as a key technology for understanding and interpreting the road environment. Most existing semantic segmentation networks prioritize accuracy, and achieving true real-time performance without sacrificing that accuracy remains a challenge. Autonomous driving systems, however, demand extremely fast reaction and real-time processing, since any processing delay may lead to safety risks. To address this problem, this paper proposes a lightweight dual-branch multi-scale network (LDMSNet) for real-time semantic segmentation. First, the effective dilated bottleneck (EDB) is proposed to efficiently extract semantic and spatial information using a complementary dual-branch structure and depth-wise dilated convolution. Second, the multi-scale pyramid pooling module (MSPPM) is proposed, which combines a hierarchical residual structure with dilated convolution to extract detailed information from low-resolution branches. Third, the polarized self-attention (PSA) mechanism is introduced to further enhance the interaction and correlation between features and to improve the perception of global information. Experimental results show that LDMSNet achieves 74.46% MIoU at 113 FPS on the Cityscapes dataset, 71.51% MIoU at 153 FPS on the CamVid dataset, and 77.41% MIoU at 170 FPS on the StreetView dataset, effectively balancing speed and accuracy compared with state-of-the-art models.
Key Words:
Semantic segmentation · Autonomous driving · Attention mechanism · Multi-scale feature · Feature fusion
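
The accuracy figures in the abstract are reported as MIoU (mean intersection over union). As a reading aid, the following is a minimal sketch of how this metric is commonly computed from a confusion matrix; it is not taken from the paper's evaluation code. The function names `confusion_matrix` and `mean_iou` are hypothetical, and the ignore label 255 is an assumption borrowed from the usual Cityscapes convention.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Count (ground truth, prediction) pairs over flattened label maps.

    Rows index the ground-truth class, columns the predicted class.
    Pixels whose ground truth equals ignore_index are skipped
    (255 is an assumed convention, not confirmed by the paper).
    """
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average the per-class IoU = TP / (TP + FP + FN).

    Classes that appear in neither prediction nor ground truth are left out.
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                          # class c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp     # other pixels predicted as class c
        denom = tp + fp + fn
        if denom == 0:
            continue
        ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes; the last pixel is ignored.
target = [0, 0, 1, 1, 2, 255]
pred = [0, 1, 1, 1, 2, 2]
cm = confusion_matrix(pred, target, num_classes=3)
print(f"MIoU = {mean_iou(cm):.4f}")
```

In the toy example the per-class IoU values are 1/2, 2/3, and 1, so the mean is about 0.72. On a real benchmark the confusion matrix would be accumulated over every image in the validation or test set before averaging.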