Communications and Networking. 17th EAI International Conference, Chinacom 2022, Virtual Event, November 19-20, 2022, Proceedings

Research Article

A Reconfigurable Convolutional Neural Networks Accelerator Based on FPGA

Cite
BibTeX

    @INPROCEEDINGS{10.1007/978-3-031-34790-0_20,
        author={Yalin Tang and Haoqi Ren and Zhifeng Zhang},
        title={A Reconfigurable Convolutional Neural Networks Accelerator Based on FPGA},
        proceedings={Communications and Networking. 17th EAI International Conference, Chinacom 2022, Virtual Event, November 19-20, 2022, Proceedings},
        proceedings_a={CHINACOM},
        year={2023},
        month={6},
        keywords={convolutional neural network, depthwise convolution, quantization, hardware accelerator, EfficientNet},
        doi={10.1007/978-3-031-34790-0_20}
    }
Yalin Tang1,*, Haoqi Ren1, Zhifeng Zhang1
  • 1: School of Electronics and Information Engineering, Tongji University
*Contact email: 2030815@tongji.edu.cn

Abstract

With the development of lightweight convolutional neural networks (CNNs), these newly proposed networks are more powerful than previous conventional models [4,5] and are well suited to Internet-of-Things (IoT) and edge computing. However, they perform inefficiently on conventional hardware accelerators because of the irregular connectivity in their structures. Although some accelerators based on a unified engine (UE) architecture or a separated engine (SE) architecture handle both standard convolution and depthwise convolution, these versatile structures are still not efficient for lightweight CNNs such as EfficientNet-lite. In this paper, we propose a reconfigurable engine (RE) architecture that improves efficiency for communication scenarios such as IoT and edge computing. In addition, we adopt an integer quantization method to reduce computational complexity and memory access. A block-based calculation scheme further reduces off-chip memory access, and a unique computational mode improves the utilization of the processing elements. The proposed architecture is implemented on a Xilinx ZC706 board with a 100 MHz system clock for EfficientNet-lite0. Our accelerator achieves 196 FPS and 72.9% top-1 accuracy on ImageNet classification, a 27% and 18% speedup over the CPU and GPU of the Pixel 4, respectively.
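The sketch below is not from the paper; it is a minimal NumPy illustration of two operations the abstract refers to: symmetric int8 quantization and depthwise convolution, in which each channel is filtered by its own kernel with no reduction across input channels. The function names, the quantization scales, and the 3x3 kernel size are assumptions chosen for the example.

    # Minimal sketch (assumed scheme, not the authors' accelerator design).
    import numpy as np

    def quantize_int8(x, scale):
        """Symmetric quantization: map float values to int8 with a fixed scale."""
        return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

    def depthwise_conv2d(x_q, w_q):
        """Depthwise convolution on quantized data: one KxK kernel per channel,
        accumulated in int32 as a hardware MAC array would."""
        C, H, W = x_q.shape
        K = w_q.shape[-1]
        out = np.zeros((C, H - K + 1, W - K + 1), dtype=np.int32)
        for c in range(C):                      # each channel uses its own kernel
            for i in range(out.shape[1]):
                for j in range(out.shape[2]):
                    patch = x_q[c, i:i + K, j:j + K].astype(np.int32)
                    out[c, i, j] = np.sum(patch * w_q[c].astype(np.int32))
        return out

    # Example: quantize a feature map and weights, then run the depthwise conv.
    x = np.random.randn(8, 16, 16).astype(np.float32)   # C x H x W feature map
    w = np.random.randn(8, 3, 3).astype(np.float32)     # one 3x3 kernel per channel
    y_q = depthwise_conv2d(quantize_int8(x, 0.05), quantize_int8(w, 0.01))
    print(y_q.shape)                                     # (8, 14, 14)

Because each output value here depends on only one input channel, a processing-element array sized for standard convolution (which reduces over all input channels) is left largely idle on such layers; this is the utilization gap that a reconfigurable engine is meant to close.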

Keywords
convolutional neural network, depthwise convolution, quantization, hardware accelerator, EfficientNet
Published
2023-06-10
Appears in
SpringerLink
http://dx.doi.org/10.1007/978-3-031-34790-0_20