
Research Article
An Empirical Study on Model Pruning and Quantization
@INPROCEEDINGS{10.1007/978-3-031-40467-2_7,
  author    = {Tian, Yuzhe and Luan, Tom H. and Zheng, Xi},
  title     = {An Empirical Study on Model Pruning and Quantization},
  booktitle = {Broadband Communications, Networks, and Systems: 13th EAI International Conference, BROADNETS 2022, Virtual Event, March 12--13, 2023, Proceedings},
  year      = {2023},
  month     = {7},
  keywords  = {Model compression, Deep neural network, Edge computing},
  doi       = {10.1007/978-3-031-40467-2_7}
}
- Yuzhe Tian
- Tom H. Luan
- Xi Zheng
Year: 2023
BROADNETS
Springer
DOI: 10.1007/978-3-031-40467-2_7
Abstract
In machine learning, model compression is vital for resource-constrained Internet of Things (IoT) devices, such as unmanned aerial vehicles (UAVs) and smartphones. Several state-of-the-art (SOTA) compression methods exist, but little work has evaluated these techniques across different models and datasets. In this paper, we present an in-depth study of two SOTA model compression methods: pruning and quantization. We apply these methods to AlexNet, ResNet18, VGG16BN, and VGG19BN on three well-known datasets: Fashion-MNIST, CIFAR-10, and UCI-HAR. From our study, we conclude that applying pruning and retraining preserves performance (less than 0.5% average degradation) while reducing model size (at a 10× compression rate) on spatial-domain datasets (e.g., pictures); performance on temporal-domain datasets (e.g., motion-sensor data) degrades more (about 5.0% average degradation); and the performance of quantization depends on the pruning rate and the network architecture. We also compare different clustering methods and show their impact on model accuracy and quantization ratio. Finally, we suggest some promising directions for future research.
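As context for the two techniques the abstract discusses, the following is a minimal NumPy sketch of magnitude pruning and clustering-based weight quantization. This is a generic illustration of the ideas, not the paper's exact procedure; the function names and parameters are hypothetical.

```python
import numpy as np

def magnitude_prune(weights, rate):
    """Zero out the fraction `rate` of weights with the smallest magnitudes.

    This is the classic magnitude-pruning heuristic: small weights are
    assumed to contribute least to the output and are set to zero.
    """
    flat = np.abs(weights).ravel()
    k = int(np.ceil(flat.size * rate))  # number of weights to prune
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def cluster_quantize(weights, n_clusters=16, n_iter=10):
    """Quantize weights via 1-D k-means: cluster the weight values and
    replace each weight with its cluster centroid, so only `n_clusters`
    distinct values (plus their index table) need to be stored.
    """
    flat = weights.ravel()
    # Initialize centroids uniformly over the weight range (linear init).
    centroids = np.linspace(flat.min(), flat.max(), n_clusters)
    for _ in range(n_iter):
        assign = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        for c in range(n_clusters):
            members = flat[assign == c]
            if members.size:  # keep old centroid if the cluster is empty
                centroids[c] = members.mean()
    assign = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
    return centroids[assign].reshape(weights.shape)

# Toy usage: prune 90% of a random layer, then quantize to 8 shared values.
w = np.random.default_rng(0).normal(size=(64, 64))
w_pruned = magnitude_prune(w, 0.9)
w_quant = cluster_quantize(w_pruned, n_clusters=8)
```

In a real pipeline, pruning would be followed by retraining (fine-tuning) to recover accuracy, which is the step the abstract credits for keeping the degradation small on spatial-domain datasets.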