Machine Learning and Intelligent Communications. 6th EAI International Conference, MLICOM 2021, Virtual Event, November 2021, Proceedings

Research Article

Automatic Detection and Classification of Anti-islamic Web Text-Contents

Cite
  • @INPROCEEDINGS{10.1007/978-3-031-04409-0_16,
        author={Rawan Abdullah Alraddadi and Moulay Ibrahim El-Khalil Ghembaza},
        title={Automatic Detection and Classification of Anti-islamic Web Text-Contents},
        proceedings={Machine Learning and Intelligent Communications. 6th EAI International Conference, MLICOM 2021, Virtual Event, November 2021, Proceedings},
        proceedings_a={MLICOM},
        year={2022},
        month={5},
        keywords={Web text mining, Text analysis, Text classification, SVM, Sentiment analysis, Fake news, Hate speech, Toxicity detection},
        doi={10.1007/978-3-031-04409-0_16}
    }
    
  • Rawan Abdullah Alraddadi
    Moulay Ibrahim El-Khalil Ghembaza
    Year: 2022
    Automatic Detection and Classification of Anti-islamic Web Text-Contents
    MLICOM
    Springer
    DOI: 10.1007/978-3-031-04409-0_16
Rawan Abdullah Alraddadi1,*, Moulay Ibrahim El-Khalil Ghembaza2
  • 1: Department of Computer Science, College of Computer Science and Engineering
  • 2: Department of Computer Science, College of Engineering and Information Technology
*Contact email: rawanalradadi3@gmail.com

Abstract

This research applies sentiment analysis techniques to a large collected corpus in order to detect and classify anti-Islamic online content. Anti-Islamic websites have spread widely over the last decade, fuelling hatred toward Muslim communities; many websites attack Islam and Muslims and insult the Messenger, blessings and peace be upon him. We gathered our own dataset from different sources into a large corpus and produced two English-language datasets, one balanced and one non-balanced. We describe the framework of the proposed methodology, which uses two approaches: the first is a supervised Machine Learning (ML) approach that uses a Support Vector Machine (SVM) classifier with Term Frequency-Inverse Document Frequency (TF-IDF) feature extraction; the second is a hybrid approach that combines a lexicon-based dictionary with TF-IDF feature extraction and the SVM algorithm. We conducted several experiments and compared the results, first using word-level TF-IDF and then improving the model with tri-gram-level TF-IDF. The experimental results show that the ML approach performs best on both datasets, reaching an accuracy of 97% on the non-balanced English dataset using SVM with tri-gram-level TF-IDF features. SVM with word-level TF-IDF also gives excellent results regardless of the dataset type.
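
To make the difference between word-level and tri-gram-level TF-IDF concrete, the following is a minimal sketch of the feature extraction step only, written in plain Python. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF variant, the reading of "tri-gram level" as word n-grams of length 1 to 3, the helper names `word_ngrams` and `tfidf`, and the neutral toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def word_ngrams(text, n_min, n_max):
    """Return all contiguous word n-grams with n_min <= n <= n_max."""
    tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf(docs, n_min=1, n_max=1):
    """Compute L2-normalised TF-IDF vectors (as dicts) and the IDF table."""
    term_lists = [word_ngrams(d, n_min, n_max) for d in docs]
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(t for terms in term_lists for t in set(terms))
    n_docs = len(docs)
    # Smoothed IDF, ln((1 + N) / (1 + df)) + 1: one common variant (assumption).
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = []
    for terms in term_lists:
        counts = Counter(terms)  # raw term frequency
        weights = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors, idf


# Neutral toy documents, purely illustrative (not from the paper's corpus).
docs = [
    "the report describes the event",
    "the report criticises the event",
    "an unrelated short note",
]

word_vecs, word_idf = tfidf(docs, n_min=1, n_max=1)  # word-level features
tri_vecs, tri_idf = tfidf(docs, n_min=1, n_max=3)    # up to word tri-grams
```

With `n_max=3` the vocabulary also contains phrases such as "the report describes", so two documents that share most individual words can still be told apart by their word order. In a full pipeline these vectors would be passed to an SVM classifier, which is not shown here.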

Keywords
Web text mining, Text analysis, Text classification, SVM, Sentiment analysis, Fake news, Hate speech, Toxicity detection
Published
2022-05-18
Appears in
SpringerLink
http://dx.doi.org/10.1007/978-3-031-04409-0_16
Copyright © 2021–2025 ICST