Transformer-Based Object Detection with Deep Feature Fusion Using Carafe Operator in Remote Sensing Image

Shenao Chen; Bingqi Wang; Chaoliang Zhong

Editorial

BibTeX citation:

@ARTICLE{10.4108/ew.3404,
    author={Shenao Chen and Bingqi Wang and Chaoliang Zhong},
    title={Transformer-Based Object Detection with Deep Feature Fusion Using Carafe Operator in Remote Sensing Image},
    journal={EAI Endorsed Transactions on Energy Web},
    volume={10},
    number={1},
    publisher={EAI},
    journal_a={EW},
    year={2023},
    month={8},
    keywords={Remote sensing image, transformer, target decision},
    doi={10.4108/ew.3404}
}

Shenao Chen¹, Bingqi Wang², Chaoliang Zhong¹,*

1: Hangzhou Dianzi University
2: Beijing Forestry University

*Contact email: chaoliang_zhong@outlook.com

Abstract

Optical remote sensing images (ORSI) have recently found broad application in urban planning, military mapping, field surveys, and other areas, and target detection is one of their most important uses. Over the past few years, deep learning has driven a breakthrough in CNN-based target detection algorithms. However, because targets in ORSI vary widely in orientation and size, detectors designed for ordinary optical images perform poorly when applied to ORSI directly, so improving object detection performance on ORSI remains a difficult problem. To address these problems, this paper builds on the one-stage detector RetinaNet and proposes a more efficient and accurate network structure: a Transformer-Based Network with Deep Feature Fusion Using Carafe Operator (TRCNet). First, the backbone adopts the transformer-based PVT2 structure, whose multi-head attention mechanism captures global information in optical images with complex backgrounds; the backbone depth is also increased to extract features more effectively. Second, we introduce the CARAFE operator into the FPN in the neck so that high-level semantic features are fused with low-level ones more efficiently, further improving detection performance. Experiments on the well-known public NWPU-VHR-10 and RSOD datasets show that mAP increases by 8.4% and 1.7%, respectively, and comparisons with other advanced networks further confirm the effectiveness of the proposed network.
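The abstract names CARAFE (Content-Aware ReAssembly of FEatures) as the upsampling operator inside the FPN. Its central step is content-aware reassembly: every upsampled location receives its own k_up × k_up kernel, normalized with softmax, and that kernel is applied to the neighborhood of the corresponding source location. The following is a minimal single-channel sketch in pure Python, assuming the kernel logits have already been produced by CARAFE's kernel prediction module (channel compressor plus content encoder, not shown) and that k_up is odd. The function names `carafe_reassemble` and `carafe_upsample` are hypothetical; this is not the authors' TRCNet code and omits batching, learned weights, and FPN wiring.

```python
import math
from typing import List

Grid = List[List[float]]
Kernels = List[List[List[float]]]  # [out_row][out_col] -> k_up*k_up logits


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def carafe_reassemble(feature: Grid, kernel_logits: Kernels,
                      scale: int, k_up: int) -> Grid:
    """CARAFE-style content-aware reassembly for one channel.

    feature:       H x W input map (a single channel).
    kernel_logits: (scale*H) x (scale*W) grid of per-location logits,
                   assumed to come from a kernel prediction module.
    k_up:          reassembly kernel size, assumed odd.
    Returns the (scale*H) x (scale*W) upsampled map.
    """
    h, w = len(feature), len(feature[0])
    r = k_up // 2
    out: Grid = []
    for oi in range(scale * h):
        row = []
        for oj in range(scale * w):
            i, j = oi // scale, oj // scale           # source location
            weights = softmax(kernel_logits[oi][oj])  # normalized kernel
            acc = 0.0
            for n in range(-r, r + 1):
                for m in range(-r, r + 1):
                    y, x = i + n, j + m
                    if 0 <= y < h and 0 <= x < w:     # zero padding outside
                        acc += weights[(n + r) * k_up + (m + r)] * feature[y][x]
            row.append(acc)
        out.append(row)
    return out


def carafe_upsample(channels: List[Grid], kernel_logits: Kernels,
                    scale: int, k_up: int) -> List[Grid]:
    """Apply the same per-location kernels to every channel."""
    return [carafe_reassemble(ch, kernel_logits, scale, k_up) for ch in channels]
```

With uniform logits the operator reduces to a zero-padded mean over each source neighborhood; with all weight on the center tap it reduces to nearest-neighbor upsampling. Learned, content-dependent kernels let each output location choose its own weighting, which is what distinguishes CARAFE from the fixed nearest-neighbor or bilinear upsampling used in a standard FPN.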

Keywords: Remote sensing image, transformer, target detection

Received: 2023-05-30
Accepted: 2023-08-21
Published: 2023-08-23
Publisher: EAI

DOI: http://dx.doi.org/10.4108/ew.3404

Copyright © 2023 Chen et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0 license, which permits copying, redistributing, remixing, transforming, and building upon the material in any medium so long as the original work is properly cited.