Collaborative Computing: Networking, Applications and Worksharing. 16th EAI International Conference, CollaborateCom 2020, Shanghai, China, October 16–18, 2020, Proceedings, Part I

Research Article

Combining Feature Selection Methods with BERT: An In-depth Experimental Study of Long Text Classification

Cite (BibTeX)
@INPROCEEDINGS{10.1007/978-3-030-67537-0_34,
    author={Kai Wang and Jiahui Huang and Yuqi Liu and Bin Cao and Jing Fan},
    title={Combining Feature Selection Methods with BERT: An In-depth Experimental Study of Long Text Classification},
    proceedings={Collaborative Computing: Networking, Applications and Worksharing. 16th EAI International Conference, CollaborateCom 2020, Shanghai, China, October 16--18, 2020, Proceedings, Part I},
    proceedings_a={COLLABORATECOM},
    year={2021},
    month={1},
    keywords={Text classification; Long text; BERT; Feature selection},
    doi={10.1007/978-3-030-67537-0_34}
}
    
Kai Wang, Jiahui Huang, Yuqi Liu, Bin Cao*, Jing Fan
    *Contact email: bincao@zjut.edu.cn

    Abstract

Since Google introduced BERT, a large number of pre-trained models have been proposed, and using them to solve text classification problems has become mainstream. However, BERT's complexity grows quadratically with text length, so it is ill-suited to processing long text. XLNet was subsequently proposed to address long text classification, but it requires more GPUs and longer fine-tuning time than BERT. To the best of our knowledge, no prior work has combined traditional feature selection methods with BERT for long text classification. In this paper, we use classic feature selection methods to shorten long text and then feed the shortened text to BERT. Finally, we conduct extensive experiments on a public dataset and a real-world dataset from China Telecom. The results show that our methods effectively help BERT process long text.
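    To make the idea concrete, here is a minimal Python sketch of the pipeline the abstract describes, assuming an IDF-based word scoring as the feature selection step (the paper evaluates several classic methods, which may differ from this choice); the helper function, model names, and parameters below are illustrative, not the authors' exact code:

        # Minimal sketch, not the authors' exact pipeline: shorten a long text by
        # keeping its highest-IDF words, then classify the shortened text with BERT.
        # Assumes scikit-learn and Hugging Face transformers are installed.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from transformers import BertTokenizer, BertForSequenceClassification

        def shorten_by_idf(text, corpus, max_words=300):
            """Keep the max_words words with the highest corpus-level IDF scores,
            preserving their original order in the document."""
            vectorizer = TfidfVectorizer().fit(corpus)  # learn IDF scores from the corpus
            idf = dict(zip(vectorizer.get_feature_names_out(), vectorizer.idf_))
            words = text.split()
            # Rank word positions by the IDF score of the word at that position.
            ranked = sorted(range(len(words)),
                            key=lambda i: idf.get(words[i].lower(), 0.0),
                            reverse=True)
            keep = sorted(ranked[:max_words])           # restore original word order
            return " ".join(words[i] for i in keep)

        corpus = ["first long training document ...", "second long training document ..."]
        short_text = shorten_by_idf("a very long document " * 500, corpus)

        tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
        model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
        # BERT's input limit is 512 subword tokens; the shortened text now fits.
        inputs = tokenizer(short_text, truncation=True, max_length=512, return_tensors="pt")
        logits = model(**inputs).logits                 # class scores for the document

    The design point is that selection happens before tokenization, so BERT's quadratic attention cost is paid only on the words deemed informative, rather than on a blind truncation to the first 512 tokens.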

Keywords
    Text classification, Long text, BERT, Feature selection
    Published
    2021-01-22
    Appears in
    SpringerLink
    http://dx.doi.org/10.1007/978-3-030-67537-0_34