Fake site detection using Machine Learning Algorithm and N-Gram Analysis

Asha J; Saradha S; Mohanraj A; Devipriya C

Proceedings of the 1st International Conference on Artificial Intelligence, Communication, IoT, Data Engineering and Security, IACIDS 2023, 23-25 November 2023, Lavasa, Pune, India

Research Article



Cite (BibTeX):

@INPROCEEDINGS{10.4108/eai.23-11-2023.2343171,
    author={Asha J and Saradha S and Mohanraj A and Devipriya C},
    title={Fake site detection using Machine Learning Algorithm and N-Gram Analysis},
    proceedings={Proceedings of the 1st International Conference on Artificial Intelligence, Communication, IoT, Data Engineering and Security, IACIDS 2023, 23-25 November 2023, Lavasa, Pune, India},
    publisher={EAI},
    proceedings_a={IACIDS},
    year={2024},
    month={3},
    keywords={fake sites feature extraction n-gram analysis svm knn lr},
    doi={10.4108/eai.23-11-2023.2343171}
}


Asha J¹,*, Saradha S², Mohanraj A², Devipriya C²

1: PSG Institute of Technology and Applied Research
2: Sri Eshwar College of Engineering

*Contact email: ashaantony2805@gmail.com

Abstract

The prevalence of fake websites increases as more people use the internet. The identification of fake websites has thus been the subject of increased research in recent years. Fake website detection is extremely difficult because there are not enough resources or datasets available. This study seeks to identify fake websites by integrating two independent feature extraction techniques with support vector machines (SVM), k-nearest neighbors (KNN), and logistic regression (LR) as learning algorithms. Term frequency (TF) and term frequency-inverse document frequency (TF-IDF) features are also computed using N-gram analysis. To support the research, performance metrics including accuracy, precision, and recall are analyzed. The experiments yield high accuracy with TF feature extraction when N = 1, and high accuracy with TF-IDF feature extraction when employing the KNN algorithm (97.3%).
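The pipeline outlined in the abstract (N-gram extraction, TF or TF-IDF weighting, then a classifier) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes word-level N-grams, one common smoothed IDF variant, and a KNN vote based on cosine similarity, none of which the abstract specifies. All function names and the toy training texts are hypothetical and invented purely for illustration.

```python
import math
from collections import Counter

def ngrams(text, n):
    """Word-level n-grams of a lowercased, whitespace-tokenized text (assumed tokenization)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tf_vector(text, n):
    """Term frequency: n-gram count divided by the number of n-grams in the document."""
    grams = ngrams(text, n)
    total = len(grams)
    return {g: c / total for g, c in Counter(grams).items()} if total else {}

def idf_weights(docs, n):
    """Smoothed inverse document frequency, log((1 + N) / (1 + df)) + 1 (one common variant)."""
    df = Counter()
    for doc in docs:
        df.update(set(ngrams(doc, n)))
    return {g: math.log((1 + len(docs)) / (1 + c)) + 1.0 for g, c in df.items()}

def tfidf_vector(text, n, idf):
    """TF-IDF: TF scaled by IDF; n-grams never seen in training are dropped."""
    return {g: w * idf[g] for g, w in tf_vector(text, n).items() if g in idf}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(query, train_vecs, train_labels, k=3):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(query, pair[0]), reverse=True)
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]

# Hypothetical toy corpus, made up for this sketch (not data from the paper).
TRAIN_TEXTS = [
    "verify your account now to claim free prize",
    "urgent login required verify your password now",
    "claim free prize click here now",
    "welcome to our university library catalogue",
    "read the latest research news and events",
    "contact the library help desk for support",
]
TRAIN_LABELS = ["fake", "fake", "fake", "legit", "legit", "legit"]

def classify(text, n=1, use_idf=False, k=3):
    """Vectorize with TF or TF-IDF over n-grams, then classify with KNN."""
    if use_idf:
        idf = idf_weights(TRAIN_TEXTS, n)
        train = [tfidf_vector(t, n, idf) for t in TRAIN_TEXTS]
        query = tfidf_vector(text, n, idf)
    else:
        train = [tf_vector(t, n) for t in TRAIN_TEXTS]
        query = tf_vector(text, n)
    return knn_predict(query, train, TRAIN_LABELS, k)

if __name__ == "__main__":
    print(classify("verify your account to claim prize", n=1))
    print(classify("library research news and support", n=1, use_idf=True))
```

Changing `n` switches between unigram and higher-order N-gram features, and `use_idf` toggles between the TF and TF-IDF weightings that the study compares. An SVM or logistic regression model could be substituted for the KNN step without changing the feature extraction code.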

Keywords: fake sites, feature extraction, n-gram analysis, svm, knn, lr

Published: 2024-03-07
Publisher: EAI

DOI: http://dx.doi.org/10.4108/eai.23-11-2023.2343171
