Communications and Networking. 17th EAI International Conference, Chinacom 2022, Virtual Event, November 19-20, 2022, Proceedings

Research Article

Optimization of Tensor Operation in Compiler

Cite
@INPROCEEDINGS{10.1007/978-3-031-34790-0_16,
    author={Chenguang Qiu and Jun Wu and Haoqi Ren and Zhifeng Zhang},
    title={Optimization of Tensor Operation in Compiler},
    proceedings={Communications and Networking. 17th EAI International Conference, Chinacom 2022, Virtual Event, November 19-20, 2022, Proceedings},
    proceedings_a={CHINACOM},
    year={2023},
    month={6},
    keywords={MLIR, Deep learning, Compiler, Vector processor},
    doi={10.1007/978-3-031-34790-0_16}
}
Chenguang Qiu1,*, Jun Wu2, Haoqi Ren1, Zhifeng Zhang1
  • 1: Department of Computer Science
  • 2: School of Computer Science
*Contact email: chenguangqcg@163.com

Abstract

This paper proposes an AI compiler architecture that compiles a trained model and deploys it on a DSP chip. The main difficulty in deploying an inference model on a DSP is multiplication between tensors: tensor multiplication is the dominant and most time-consuming operation during model inference, so its efficiency directly constrains inference performance. However, the DSP chip has no matrix computing unit, only a vector computing unit. We define a new dialect in MLIR (Multi-Level Intermediate Representation) to compile AI models efficiently, especially GEMM and convolution operations. The dialect is built on the basic features of mhlo, so it can reuse mhlo's existing optimization passes. We also add support for architecture-specific optimization, mainly lowering algorithms for operations such as GEMM and convolution. Finally, we map the dialect to the LLVM dialect and convert it into LLVM IR (intermediate representation); the advantage of converting to LLVM IR is that finer-grained instruction scheduling can be carried out at the compiler backend. We compare the efficiency of a speech model compiled by the traditional compiler clang against the code generated by our compiler, and the experimental results show that our approach greatly improves efficiency.
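As a rough illustration of the lowering problem the abstract describes, the sketch below decomposes a GEMM into fixed-width vector multiply-accumulate steps, the only shape of work a chip with a vector computing unit (and no matrix unit) can execute. This is a minimal sketch of the general technique, not the paper's dialect or generated code: the SIMD width, the function name gemm_via_vector_ops, and the NumPy implementation are all illustrative assumptions.

import numpy as np

VECTOR_WIDTH = 8  # assumed SIMD lane count of the DSP's vector unit (illustrative)

def gemm_via_vector_ops(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Compute C = A @ B using only width-VECTOR_WIDTH vector multiply-accumulates,
    mimicking how a matrix multiply must be lowered for a chip that has a
    vector computing unit but no matrix computing unit."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(M):
        for k in range(K):
            a_ik = A[i, k]  # scalar broadcast into all vector lanes
            # Walk row k of B in SIMD-width chunks: each step below is one
            # vector multiply-accumulate, the primitive the vector unit provides.
            for j0 in range(0, N, VECTOR_WIDTH):
                j1 = min(j0 + VECTOR_WIDTH, N)
                C[i, j0:j1] += a_ik * B[k, j0:j1]  # one vector MAC
    return C

# Sanity check against the reference matrix multiply.
A = np.random.rand(5, 7).astype(np.float32)
B = np.random.rand(7, 11).astype(np.float32)
assert np.allclose(gemm_via_vector_ops(A, B), A @ B, atol=1e-5)

Each inner step touches a contiguous slice of a row of C, which is the access pattern a vector unit handles well; the paper performs this kind of lowering at the MLIR level (from its mhlo-based dialect down to the LLVM dialect) rather than in source code.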

Keywords
MLIR, Deep learning, Compiler, Vector processor
Published
2023-06-10
Appears in
SpringerLink
http://dx.doi.org/10.1007/978-3-031-34790-0_16