ew 23(1):

Editorial

Transformer-Based Object Detection with Deep Feature Fusion Using Carafe Operator in Remote Sensing Image

  • @ARTICLE{10.4108/ew.3404,
        author={Shenao Chen and Bingqi Wang and Chaoliang Zhong},
        title={Transformer-Based Object Detection with Deep Feature Fusion Using Carafe Operator in Remote Sensing Image},
        journal={EAI Endorsed Transactions on Energy Web},
        volume={10},
        number={1},
        publisher={EAI},
        journal_a={EW},
        year={2023},
        month={8},
        keywords={Remote sensing image, transformer, target decision},
        doi={10.4108/ew.3404}
    }
    
  • Shenao Chen
    Bingqi Wang
    Chaoliang Zhong
    Year: 2023
    Transformer-Based Object Detection with Deep Feature Fusion Using Carafe Operator in Remote Sensing Image
    EW
    EAI
    DOI: 10.4108/ew.3404
Shenao Chen1, Bingqi Wang2, Chaoliang Zhong1,*
  • 1: Hangzhou Dianzi University
  • 2: Beijing Forestry University
*Contact email: chaoliang_zhong@outlook.com

Abstract

Optical remote sensing images (ORSI) are widely used in urban planning, military mapping, field surveys, and similar applications, and target detection is one of their most important uses. In the past few years, deep learning has driven a breakthrough in CNN-based target detection. However, targets in ORSI vary widely in orientation and size, so detection algorithms designed for ordinary optical images perform poorly when applied directly. Improving the performance of object detection models on ORSI therefore remains a difficult problem. To address it, this paper builds on the one-stage detector RetinaNet and proposes a more efficient and accurate network, the Transformer-Based Network with Deep Feature Fusion Using Carafe Operator (TRCNet). First, the backbone adopts a transformer-based PVT2 structure, whose multi-head attention mechanism captures global information in optical images with complex backgrounds; the backbone depth is also increased to extract features better. Second, we introduce the Carafe operator into the FPN structure of the neck to fuse high-level semantic features with low-level features more efficiently, further improving detection performance. Experiments on the well-known public datasets NWPU-VHR-10 and RSOD show mAP increases of 8.4% and 1.7%, respectively. Comparison with other advanced networks further indicates that the proposed network is effective.
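
The abstract describes placing the Carafe operator in the FPN neck so that high-level features are upsampled and fused with low-level ones. The sketch below illustrates only the general idea of content-aware reassembly in a top-down fusion step; it is not the authors' implementation. It assumes the per-pixel reassembly kernel logits are already given (in a real network they would be predicted by learned convolution layers), and the function names `carafe_reassemble` and `fuse_top_down`, the edge padding, and the element-wise sum used for fusion are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def carafe_reassemble(feat, kernel_logits, scale=2, k_up=5):
    """Content-aware upsampling of a (C, H, W) feature map.

    kernel_logits has shape (scale*H, scale*W, k_up*k_up): one reassembly
    kernel per output pixel. Here the logits are an input (assumption);
    a real model would predict them from the features.
    """
    C, H, W = feat.shape
    assert kernel_logits.shape == (scale * H, scale * W, k_up * k_up)
    # Normalize each kernel so its weights sum to 1.
    kernels = softmax(kernel_logits, axis=-1)
    r = k_up // 2
    # Edge padding is an assumption of this sketch.
    padded = np.pad(feat, ((0, 0), (r, r), (r, r)), mode="edge")
    out = np.empty((C, scale * H, scale * W), dtype=feat.dtype)
    for i in range(scale * H):
        for j in range(scale * W):
            # Source location in the low-resolution map.
            si, sj = i // scale, j // scale
            # k_up x k_up neighborhood centered on (si, sj), flattened.
            patch = padded[:, si:si + k_up, sj:sj + k_up].reshape(C, -1)
            # Weighted sum of the neighborhood with this pixel's kernel.
            out[:, i, j] = patch @ kernels[i, j]
    return out


def fuse_top_down(low, high, kernel_logits, scale=2, k_up=5):
    """Hypothetical FPN-style fusion: upsample `high`, then add to `low`."""
    return low + carafe_reassemble(high, kernel_logits, scale, k_up)


rng = np.random.default_rng(0)
high = rng.standard_normal((8, 4, 4))   # coarse, high-level feature map
low = rng.standard_normal((8, 8, 8))    # fine, low-level feature map
logits = rng.standard_normal((8, 8, 25))
fused = fuse_top_down(low, high, logits)
print(fused.shape)
```

When every kernel concentrates its weight on the center position, the reassembly reduces to nearest-neighbor upsampling, which is the fixed interpolation a plain FPN typically uses; letting the kernels vary per pixel is what makes the upsampling content-aware.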