
Research Article
Combining Feature Selection Methods with BERT: An In-depth Experimental Study of Long Text Classification
@INPROCEEDINGS{10.1007/978-3-030-67537-0_34,
  author={Kai Wang and Jiahui Huang and Yuqi Liu and Bin Cao and Jing Fan},
  title={Combining Feature Selection Methods with BERT: An In-depth Experimental Study of Long Text Classification},
  proceedings={Collaborative Computing: Networking, Applications and Worksharing. 16th EAI International Conference, CollaborateCom 2020, Shanghai, China, October 16--18, 2020, Proceedings, Part I},
  proceedings_a={COLLABORATECOM},
  year={2021},
  month={1},
  keywords={Text classification; Long text; BERT; Feature selection},
  doi={10.1007/978-3-030-67537-0_34}
}
- Kai Wang
- Jiahui Huang
- Yuqi Liu
- Bin Cao
- Jing Fan
Year: 2021
COLLABORATECOM
Springer
DOI: 10.1007/978-3-030-67537-0_34
Abstract
With the introduction of BERT by Google, a large number of pre-trained models have been proposed, and using them to solve text classification problems has become mainstream. However, BERT's complexity grows quadratically with text length, so BERT is not well suited to processing long text. Researchers subsequently proposed XLNet, a new pre-trained model, to address the long text classification problem, but XLNet requires more GPUs and longer fine-tuning time than BERT. To the best of our knowledge, no prior work has combined traditional feature selection methods with BERT for long text classification. In this paper, we use classic feature selection methods to shorten long text and then use the shortened text as the input of BERT. Finally, we conduct extensive experiments on a public dataset and a real-world dataset from China Telecom. The experimental results show that our methods are effective in helping BERT process long text.
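The pipeline described in the abstract is straightforward to prototype. Below is a minimal sketch, assuming TF-IDF as the feature selection signal (the abstract does not name which classic method is used), word-level scoring, and the bert-base-uncased checkpoint via the Hugging Face transformers and scikit-learn libraries; these are illustrative assumptions, not the authors' exact implementation.

# Shorten long documents by keeping the words that a classic feature
# selection signal scores highest, then classify the shortened text with BERT.
# NOTE: TF-IDF, word-level scoring, and "bert-base-uncased" are illustrative
# assumptions; the paper compares several classic feature selection methods.
from sklearn.feature_extraction.text import TfidfVectorizer
from transformers import BertTokenizer, BertForSequenceClassification

MAX_WORDS = 510  # leave room for BERT's [CLS] and [SEP] special tokens

def shorten(doc, vectorizer, max_words=MAX_WORDS):
    """Keep the max_words highest-scoring words, preserving original order."""
    # Map each vocabulary term to its TF-IDF score for this document.
    scores = dict(zip(vectorizer.get_feature_names_out(),
                      vectorizer.transform([doc]).toarray()[0]))
    words = doc.split()
    if len(words) <= max_words:
        return doc
    # Rank word positions by score, keep the top ones, restore text order.
    top = sorted(range(len(words)),
                 key=lambda i: scores.get(words[i].lower(), 0.0),
                 reverse=True)[:max_words]
    return " ".join(words[i] for i in sorted(top))

corpus = ["... a long document ...", "... another long document ..."]  # placeholder corpus
vectorizer = TfidfVectorizer().fit(corpus)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Feed the shortened text to BERT; truncation stays on as a safety net,
# since BERT subword tokens can outnumber whitespace-split words.
inputs = tokenizer(shorten(corpus[0], vectorizer),
                   truncation=True, max_length=512, return_tensors="pt")
logits = model(**inputs).logits

Keeping the selected words in their original document order, rather than concatenating them by rank, preserves some local context for BERT's contextual encoding.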