
Research Article
Removing Noise (Opinion Messages) for Fake News Detection in Discussion Forum Using BERT Model
@INPROCEEDINGS{10.1007/978-3-031-56580-9_5,
    author={Cheuk Yu Ip and Fu Kay Frankie Li and Yi Anson Lam and Siu Ming Yiu},
    title={Removing Noise (Opinion Messages) for Fake News Detection in Discussion Forum Using BERT Model},
    proceedings={Digital Forensics and Cyber Crime. 14th EAI International Conference, ICDF2C 2023, New York City, NY, USA, November 30, 2023, Proceedings, Part I},
    proceedings_a={ICDF2C},
    year={2024},
    month={4},
    keywords={Fact Opinion Text classification Check-worthy Fake news Misinformation Discussion forum Lihkg BERT},
    doi={10.1007/978-3-031-56580-9_5}
}
- Cheuk Yu Ip
- Fu Kay Frankie Li
- Yi Anson Lam
- Siu Ming Yiu
Year: 2024
ICDF2C
Springer
DOI: 10.1007/978-3-031-56580-9_5
Abstract
The exponential growth and widespread dissemination of fake news in online media have posed unprecedented threats to election results, public hygiene, and justice. With ever-growing content in online media, scrutinizing every single message could be extremely resource-intensive, if not impracticable. However, most messages express their authors' opinions rather than presenting a fact (whether fake or true), and such messages contribute a significant portion of noise. This paper suggests a cost-effective approach to identifying opinion content (noise) in discussion forums, content that cannot be classified as either fake or true news. By excluding opinion content, which is not check-worthy, in the preprocessing step, the cost of detection could be significantly reduced, especially when voluminous content must be handled in a timely manner. This paper built a dataset of opinion and factual statements, written in a mixture that includes officially written Traditional Chinese, from the most popular discussion forum in Hong Kong, namely LIHKG, relating to local Government officials. It then used the Bidirectional Encoder Representations from Transformers (BERT) model to identify opinion content, achieving 98.7% accuracy, and the model generalized well to public-hygiene-related content on which it had not been trained. This paper further discovered that some of the 15 most active LIHKG users creating discussion threads relating to local Government officials might be troll accounts with ulterior motives, and that assessing their behavior and sentiment might assist in detecting misinformation.
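The preprocessing idea in the abstract can be pictured as a filter that routes each message either to the downstream fake-news detection stage or to a discarded "noise" bucket. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: `Message`, `split_check_worthy`, `cost_saving`, the 0.5 threshold, and the keyword-based `toy_opinion_score` are all hypothetical. The toy scorer only stands in for a fine-tuned BERT opinion classifier, which would likewise map a message to a probability that it is an opinion.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Message:
    """A forum message (hypothetical structure, for illustration only)."""
    thread_id: str
    text: str


# Stand-in type for an opinion classifier: maps text to P(opinion).
OpinionScorer = Callable[[str], float]

# Hypothetical cue phrases; a real system would use a trained BERT model instead.
OPINION_CUES = ("i think", "in my opinion", "覺得")


def toy_opinion_score(text: str) -> float:
    """Toy keyword scorer: 1.0 if any opinion cue appears, else 0.0."""
    lowered = text.lower()
    return 1.0 if any(cue in lowered for cue in OPINION_CUES) else 0.0


def split_check_worthy(
    messages: List[Message],
    score: OpinionScorer,
    threshold: float = 0.5,
) -> Tuple[List[Message], List[Message]]:
    """Split messages into (check-worthy candidates, opinion noise)."""
    check_worthy: List[Message] = []
    noise: List[Message] = []
    for msg in messages:
        if score(msg.text) >= threshold:
            noise.append(msg)  # opinion: excluded before fake-news detection
        else:
            check_worthy.append(msg)  # candidate factual statement
    return check_worthy, noise


def cost_saving(total: int, removed: int) -> float:
    """Fraction of detection workload avoided, assuming cost scales with message count."""
    return removed / total if total else 0.0


messages = [
    Message("t1", "I think the official should resign."),
    Message("t2", "The official announced the new policy at a press briefing."),
    Message("t3", "In my opinion, this is a disgrace."),
]
facts, opinions = split_check_worthy(messages, toy_opinion_score)
print([m.thread_id for m in facts])  # → ['t2']
print(round(cost_saving(len(messages), len(opinions)), 3))  # → 0.667
```

Replacing `toy_opinion_score` with a function that returns a BERT classifier's opinion probability would leave the routing logic unchanged; the threshold then trades missed factual statements against residual noise passed downstream.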