
Research Article
Investigation of Imbalanced Sentiment Analysis in Voice Data: A Comparative Study of Machine Learning Algorithms
@ARTICLE{10.4108/eetsis.4805, author={Viraj Nishchal Shah and Deep Rahul Shah and Mayank Umesh Shetty and Deepa Krishnan and Vinayakumar Ravi and Swapnil Singh}, title={Investigation of Imbalanced Sentiment Analysis in Voice Data: A Comparative Study of Machine Learning Algorithms}, journal={EAI Endorsed Transactions on Scalable Information Systems}, volume={11}, number={6}, publisher={EAI}, journal_a={SIS}, year={2024}, month={4}, keywords={Audio Feature Extraction, Emotion Detection, Gradient Boosting, machine learning, Speech Emotion Recognition, Sentiment Analysis}, doi={10.4108/eetsis.4805} }
- Viraj Nishchal Shah
Deep Rahul Shah
Mayank Umesh Shetty
Deepa Krishnan
Vinayakumar Ravi
Swapnil Singh
Year: 2024
Investigation of Imbalanced Sentiment Analysis in Voice Data: A Comparative Study of Machine Learning Algorithms
SIS
EAI
DOI: 10.4108/eetsis.4805
Abstract
INTRODUCTION: Language serves as the primary conduit for human expression, extending its reach into various communication mediums like email and text messaging, where emoticons are frequently employed to convey nuanced emotions. In the digital landscape of long-distance communication, the detection and analysis of emotions assume paramount importance. However, this task is inherently challenging due to the subjectivity inherent in emotions, lacking a universal consensus for quantification or categorization. OBJECTIVES: This research proposes a novel speech recognition model for emotion analysis, leveraging diverse machine learning techniques along with a three-layer feature extraction approach. This research will also through light on the robustness of models on balanced and imbalanced datasets. METHODS: The proposed three-layered feature extractor uses chroma, MFCC, and Mel method, and passes these features to classifiers like K-Nearest Neighbour, Gradient Boosting, Multi-Layer Perceptron, and Random Forest. RESULTS: Among the classifiers in the framework, Multi-Layer Perceptron (MLP) emerges as the top-performing model, showcasing remarkable accuracies of 99.64%, 99.43%, and 99.31% in the Balanced TESS Dataset, Imbalanced TESS (Half) Dataset, and Imbalanced TESS (Quarter) Dataset, respectively. K-Nearest Neighbour (KNN) follows closely as the second-best classifier, surpassing MLP's accuracy only in the Imbalanced TESS (Half) Dataset at 99.52%. CONCLUSION: This research contributes valuable insights into effective emotion recognition through speech, shedding light on the nuances of classification in imbalanced datasets.
Copyright © 2024 V. N. Shah et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0, which permits copying, redistributing, remixing, transformation, and building upon the material in any medium so long as the original work is properly cited.