
Research Article
Automatic Detection and Classification of Anti-islamic Web Text-Contents
@INPROCEEDINGS{10.1007/978-3-031-04409-0_16,
    author={Rawan Abdullah Alraddadi and Moulay Ibrahim El-Khalil Ghembaza},
    title={Automatic Detection and Classification of Anti-islamic Web Text-Contents},
    proceedings={Machine Learning and Intelligent Communications. 6th EAI International Conference, MLICOM 2021, Virtual Event, November 2021, Proceedings},
    proceedings_a={MLICOM},
    year={2022},
    month={5},
    keywords={Web text mining, Text analysis, Text classification, SVM, Sentiment analysis, Fake news, Hate speech, Toxicity detection},
    doi={10.1007/978-3-031-04409-0_16}
}
- Rawan Abdullah Alraddadi
- Moulay Ibrahim El-Khalil Ghembaza
Year: 2022
MLICOM
Springer
DOI: 10.1007/978-3-031-04409-0_16
Abstract
This research applies sentiment analysis techniques to a large collected corpus in order to detect and classify anti-Islamic online content. Anti-Islamic websites have spread widely over the last decade, fueling hatred toward Muslim communities; many websites attack Islam and Muslims and insult the Messenger, blessings and peace be upon him. We gathered our own data from different sources into a large corpus and produced two English-language datasets, one balanced and one non-balanced. We describe the framework of our proposed methodology, which uses two approaches. The first is a supervised Machine Learning (ML) approach that uses a Support Vector Machine (SVM) classifier with Term Frequency-Inverse Document Frequency (TF-IDF) feature extraction; the second is a hybrid approach that combines a lexicon-based dictionary and TF-IDF feature extraction with the SVM algorithm. We conducted several experiments and compared the results, first using word-level TF-IDF and then improving the model with tri-gram-level TF-IDF. The experimental results show that the ML approach performs best on both datasets, reaching a high accuracy of 97% on the non-balanced English dataset using SVM with tri-gram-level TF-IDF feature extraction. SVM with word-level TF-IDF also gives excellent results regardless of the dataset type.
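The two approaches named in the abstract can be illustrated with a minimal scikit-learn sketch. It is not the authors' implementation. The toy corpus, the `LEXICON` set, the helper names (`build_tfidf_svm`, `lexicon_counts`, `hybrid_features`), the use of `LinearSVC`, and the reading of "tri-gram level" as unigrams through trigrams (`ngram_range=(1, 3)`) are all assumptions made for illustration; the abstract does not specify the SVM kernel, the lexicon, or how lexicon features are combined with TF-IDF.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy placeholder corpus (NOT the paper's data): 1 = flagged, 0 = neutral.
texts = [
    "those people are evil and should be hated",
    "they are a threat and deserve hate",
    "evil people spreading a dangerous threat",
    "the museum opens at nine on weekdays",
    "the lecture covers history and architecture",
    "visitors enjoyed the calligraphy exhibition",
]
labels = [1, 1, 1, 0, 0, 0]


def build_tfidf_svm(ngram_range):
    """TF-IDF feature extraction followed by a linear SVM classifier."""
    return make_pipeline(TfidfVectorizer(ngram_range=ngram_range), LinearSVC())


# ML approach, word level: unigram TF-IDF + SVM.
word_model = build_tfidf_svm((1, 1)).fit(texts, labels)

# ML approach, tri-gram level (assumed here to mean 1- to 3-grams).
trigram_model = build_tfidf_svm((1, 3)).fit(texts, labels)

# Hybrid approach: a hypothetical lexicon adds one count feature to TF-IDF.
LEXICON = {"evil", "hate", "hated", "threat"}


def lexicon_counts(docs):
    """Return an (n_docs, 1) array with the number of lexicon terms per document."""
    return np.array(
        [[sum(tok in LEXICON for tok in doc.lower().split())] for doc in docs],
        dtype=float,
    )


hybrid_vectorizer = TfidfVectorizer(ngram_range=(1, 3))


def hybrid_features(docs, fit=False):
    """Stack TF-IDF columns and the lexicon-count column into one sparse matrix."""
    if fit:
        tfidf = hybrid_vectorizer.fit_transform(docs)
    else:
        tfidf = hybrid_vectorizer.transform(docs)
    return hstack([tfidf, csr_matrix(lexicon_counts(docs))]).tocsr()


hybrid_model = LinearSVC().fit(hybrid_features(texts, fit=True), labels)

# Score an unseen document with each model.
new_docs = ["the exhibition opens on weekdays"]
print(word_model.predict(new_docs))
print(trigram_model.predict(new_docs))
print(hybrid_model.predict(hybrid_features(new_docs)))
```

On a real corpus, the three models would be compared on a held-out split, for example with `sklearn.model_selection.train_test_split` and `sklearn.metrics.accuracy_score`, for the balanced and non-balanced datasets separately. Accuracy alone can be misleading on a non-balanced dataset, so per-class precision and recall are worth reporting alongside it.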