
Research Article
Feasibility and Design Trade-Offs of Neural Network Accelerators Implemented on Reconfigurable Hardware
@INPROCEEDINGS{10.1007/978-3-030-63083-6_9,
  author    = {Quang-Kien Trinh and Quang-Manh Duong and Thi-Nga Dao and Van-Thanh Nguyen and Hong-Phong Nguyen},
  title     = {Feasibility and Design Trade-Offs of Neural Network Accelerators Implemented on Reconfigurable Hardware},
  booktitle = {Industrial Networks and Intelligent Systems. 6th EAI International Conference, INISCOM 2020, Hanoi, Vietnam, August 27--28, 2020, Proceedings},
  series    = {INISCOM},
  publisher = {Springer},
  year      = {2020},
  month     = {11},
  keywords  = {Neural network; FPGA accelerator; Data recognition},
  doi       = {10.1007/978-3-030-63083-6_9}
}
- Quang-Kien Trinh
- Quang-Manh Duong
- Thi-Nga Dao
- Van-Thanh Nguyen
- Hong-Phong Nguyen
Year: 2020
Feasibility and Design Trade-Offs of Neural Network Accelerators Implemented on Reconfigurable Hardware
INISCOM
Springer
DOI: 10.1007/978-3-030-63083-6_9
Abstract
In recent years, neural-network-based algorithms have been widely applied in computer vision. FPGA technology has emerged as a promising choice for hardware acceleration owing to its high performance and flexibility, its energy efficiency compared with CPUs and GPUs, and its fast development cycle. As a result, FPGAs have gradually become a viable alternative to CPU/GPU platforms.
This work studies the practical implementation of neural network accelerators on reconfigurable hardware (FPGAs). It systematically analyzes the utilization-accuracy-performance trade-offs in FPGA-based neural network implementations and discusses the feasibility of deploying such designs in practice.
We have developed a highly generic architecture for implementing a single neural network layer, which in turn permits the construction of arbitrary networks. As a case study, we implemented a neural network accelerator on FPGA for the MNIST and CIFAR-10 datasets. The major results indicate that the hardware design outperforms the software implementation by at least 1,500 times when the parallel coefficient (p) is 1, and by up to 20,000 times when p is 16, while the accuracy degradation in all cases is negligible, i.e., about 0.1%. Regarding resource utilization, modern FPGAs can readily accommodate these designs; for example, the 2-layer designs with p = 4 for MNIST and CIFAR-10 occupy 26% and 32% of the LUTs on a Kintex-7 XC7K325T, respectively.
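To make the role of the parallel coefficient concrete, the following sketch (not from the paper; the function names and the assumption of one partial sum per MAC per cycle are ours for illustration) models how p trades hardware resources for latency in a fully connected layer: instantiating p multiply-accumulate units per neuron divides the per-neuron cycle count by roughly p while multiplying the MAC count by p.

```python
import math

def layer_cycles(n_inputs: int, p: int) -> int:
    """Rough cycle count for one fully connected layer when each neuron
    uses p parallel MAC units and all neurons compute concurrently
    (illustrative model: one partial sum per MAC per cycle)."""
    return math.ceil(n_inputs / p)

def mac_units(n_neurons: int, p: int) -> int:
    """Total multiply-accumulate units instantiated: p per neuron."""
    return n_neurons * p

# Hypothetical 784-input, 10-neuron layer (MNIST-sized input vector):
for p in (1, 4, 16):
    print(f"p={p:2d}  cycles={layer_cycles(784, p):3d}  MACs={mac_units(10, p)}")
```

Raising p from 1 to 16 cuts the latency of this example layer from 784 to 49 cycles at the cost of 16x the MAC hardware, which mirrors the speed-versus-utilization trade-off the abstract reports.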