Communications and Networking. 15th EAI International Conference, ChinaCom 2020, Shanghai, China, November 20-21, 2020, Proceedings

Research Article

A Deep Learning Compiler for Vector Processor

Pingping Pan1, Jun Wu2,*, Songyuan Zhao1, Haoqi Ren1, Zhifeng Zhang1
  • 1: Department of Computer Science
  • 2: School of Computer Science
*Contact email: wujun@fudan.edu.cn

Abstract

Deep learning compilers generally replace hand-optimization with automatic or semi-automatic code generation during the optimization process. This paper presents a deep learning compiler (DLCS) for a target vector processor based on the LLVM framework, which lowers deep learning (DL) models to a two-level intermediate representation (IR). The high-level IR performs target-independent optimizations, including kernel fusion, data replacement, and data simplification, while the low-level IR allows the compiler to perform target-dependent optimizations, such as eight-slot VLIW scheduling and special intrinsic functions. The proposed compiler customizes the architecture description of the target vector processor to achieve high-quality automatic code generation. We compare the performance of DLCS against hand-optimization when deploying the ResNet-18 and MobileNet models to the target vector processor. Experimental results show that DLCS exploits the multi-slot parallelism of the target vector processor and achieves speedups ranging from 1.5× to 3.0× over existing frameworks backed by hand-optimized libraries.
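The kernel fusion mentioned in the abstract — merging an element-wise operator into its producer so both execute as one kernel — can be illustrated with a minimal, hypothetical sketch. The `Op` class, the `fuse_kernels` pass, and the op names below are illustrative assumptions, not the compiler's actual IR or API:

```python
from dataclasses import dataclass, field

@dataclass
class Op:
    """A node in a toy high-level IR graph."""
    name: str                  # operator kind, e.g. "conv2d", "relu"
    inputs: list = field(default_factory=list)  # names of producer ops

# Element-wise ops are cheap to fuse into their producer.
ELEMENTWISE = {"relu", "add", "mul"}

def fuse_kernels(ops):
    """Target-independent kernel fusion: when an element-wise op
    consumes only the immediately preceding op, merge the two into
    a single fused kernel node."""
    fused = []
    for op in ops:
        if (op.name in ELEMENTWISE and fused
                and op.inputs == [fused[-1].name]):
            prev = fused.pop()
            fused.append(Op(f"{prev.name}+{op.name}", prev.inputs))
        else:
            fused.append(op)
    return fused

graph = [Op("conv2d", ["x"]), Op("relu", ["conv2d"]), Op("pool", ["relu"])]
print([op.name for op in fuse_kernels(graph)])  # ['conv2d+relu', 'pool']
```

In a real compiler the fused node would then be lowered through the low-level IR, where target-dependent passes (e.g. VLIW slot scheduling and intrinsic selection) generate code for the vector processor.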

Keywords
Deep learning compiler, Target optimization, Code generation, Vector processor
Published
2021-02-02
Appears in
SpringerLink
http://dx.doi.org/10.1007/978-3-030-67720-6_46
Copyright © 2020–2025 ICST