Science and Technologies for Smart Cities. 6th EAI International Conference, SmartCity360°, Virtual Event, December 2-4, 2020, Proceedings

Research Article

Labeling News Article’s Subject Using Uncertainty Based Active Learning

  • @INPROCEEDINGS{10.1007/978-3-030-76063-2_15,
        author={Meet Parekh and Yash Patel},
        title={Labeling News Article’s Subject Using Uncertainty Based Active Learning},
        proceedings={Science and Technologies for Smart Cities. 6th EAI International Conference, SmartCity360°, Virtual Event, December 2-4, 2020, Proceedings},
        proceedings_a={SMARTCITY},
        year={2021},
        month={5},
        keywords={Active learning, Natural language processing, Uncertainty sampling, Na\"{i}ve Bayes, SVM, Labeling},
        doi={10.1007/978-3-030-76063-2_15}
    }
    
  • Meet Parekh
    Yash Patel
    Year: 2021
    Labeling News Article’s Subject Using Uncertainty Based Active Learning
    SMARTCITY
    Springer
    DOI: 10.1007/978-3-030-76063-2_15
Meet Parekh, Yash Patel

    Abstract

    In Natural Language Processing, labeling a text corpus is often an expensive task that requires considerable human effort and cost, whereas unlabeled text corpora in many domains are readily available. For a couple of decades, research has concentrated on algorithms that can label a corpus automatically, thus minimizing the number of articles that must be labeled manually. Semi-Supervised Learning and Active Learning have shown great promise for labeling articles with a trained model, and both families of algorithms have strong theoretical guarantees. This study aims to tag 1183 articles from The New York Times and The Wall Street Journal with their subject (i.e., the primary organization related to the news article) using an Active Learning algorithm that combines Random Sampling with Uncertainty Based Querying. This approach is used to train a Naïve Bayes classifier on Bag of Words features. The classifier is used to tag the 1183 articles, of which only 167 required manual review, thus achieving a reduction of 85.89% with 78.18% accuracy. In addition, to verify the quality of the labeled corpus, an SVM classifier using the same features was trained on the labeled corpus, giving an accuracy of 74.45% on test data.
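
    The abstract does not spell out the exact procedure, so the following is only a minimal sketch of how a randomly sampled seed set combined with uncertainty-based querying might drive a Naïve Bayes classifier over bag-of-words counts. Everything in it is an assumption made for illustration: the names `NaiveBayes`, `active_label`, `oracle`, `n_seed`, and `n_queries` are hypothetical, least-confidence is used as the uncertainty measure, and the toy headlines are invented rather than taken from the paper's corpus.

```python
import math
import random
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; the token counts form the bag of words."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative, not the paper's code)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict_proba(self, text):
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log(
                        (self.word_counts[c][word] + 1) / (total_words + len(self.vocab))
                    )
            log_scores[c] = score
        # Normalize in log space to avoid underflow.
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(exp.values())
        return {c: v / norm for c, v in exp.items()}


def active_label(texts, oracle, n_seed=2, n_queries=2, seed=0):
    """Label every text; `oracle(i)` stands in for a human reviewer (hypothetical).

    Returns (labels by index, set of indices that were reviewed manually).
    """
    rng = random.Random(seed)
    # Random sampling: a small seed set is labeled by hand.
    manual = {i: oracle(i) for i in rng.sample(range(len(texts)), n_seed)}
    pool = [i for i in range(len(texts)) if i not in manual]
    model = NaiveBayes()
    for _ in range(min(n_queries, len(pool))):
        model.fit([texts[i] for i in manual], list(manual.values()))
        # Uncertainty querying (least confidence): ask about the article whose
        # most probable class has the lowest probability.
        query = min(pool, key=lambda i: max(model.predict_proba(texts[i]).values()))
        manual[query] = oracle(query)
        pool.remove(query)
    model.fit([texts[i] for i in manual], list(manual.values()))
    labels = dict(manual)
    for i in pool:  # everything left in the pool is labeled by the model
        proba = model.predict_proba(texts[i])
        labels[i] = max(proba, key=proba.get)
    return labels, set(manual)


# Invented toy headlines; `truth` plays the role of the human annotator.
texts = [
    "apple unveils new iphone", "apple iphone sales climb",
    "google expands search ads", "google search revenue grows",
    "apple opens new store", "google updates search ranking",
]
truth = ["Apple", "Apple", "Google", "Google", "Apple", "Google"]
labels, reviewed = active_label(texts, oracle=lambda i: truth[i])
```

    In this sketch the manual effort is fixed in advance at `n_seed + n_queries` reviews, and every other article receives the model's prediction. A confidence threshold that sends only low-confidence predictions to a reviewer would be an equally plausible reading of the abstract.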

    Keywords
    Active learning, Natural language processing, Uncertainty sampling, Naïve Bayes, SVM, Labeling
    Published
    2021-05-22
    Appears in
    SpringerLink
    http://dx.doi.org/10.1007/978-3-030-76063-2_15
    Copyright © 2020–2025 ICST