Efficient Framework for Sentiment Classification Using Apriori Based Feature Reduction

This paper proposes a novel feature selection method for Sentiment Classification. A UCI ML dataset of textual reviews from three domains (IMDB movies, Amazon products, and Yelp restaurants) is used. Text pre-processing and feature selection techniques are applied to the dataset. A novel feature selection approach using Association Rule Mining is presented, in which each sentence is converted into binary form and the Apriori algorithm is applied to reduce the dataset. Four Machine Learning algorithms, Naïve Bayes, Support Vector Machine, Random Forest and Logistic Regression, are used to conduct the experiments. The proposed approach shows accuracy improvements of 4.2%, 4.9% and 5.9% for the IMDB, Amazon and Yelp domain datasets, respectively. Compared with Genetic Algorithm, Principal Component Analysis, Chi-Square, and Relief based feature selection, the proposed method shows accuracy improvements of 9.8%, 0.4%, 0.6% and 1.9%, respectively.


Introduction
Sentiment can be considered a viewpoint about a topic or a thing. It plays an essential role in an individual's decision-making process. People tend to evaluate the world through their experiences, beliefs, and choices, so sentiment figures prominently in every aspect of human life. Sentiment Analysis, or Opinion Mining, is the technique of evaluating text documents and extracting emotions from them. Sentiments are generally classified as positive, negative, or neutral [1,24]. Recent years have seen a surge in research on Sentiment Analysis, and many researchers and organizations are applying it in a vast number of fields, such as movie reviews and product reviews [2,3]. Analyzing a small number of reviews can be carried out manually, but manual analysis is not feasible when there are thousands of reviews.
Machine Learning (ML) has been used extensively for sentiment classification. An ML model is trained on a well-labeled training dataset and then applied to a testing dataset to predict sentiment. Researchers have employed various approaches in recent years. The most common is Lexicon Analysis, which lists words by their semantic values. Different ML classification algorithms are also used to predict the opinion expressed in a sentence. In recent years, researchers have started developing hybrid models that combine lexicon techniques and ML algorithms to solve interesting research problems in the field of Sentiment Analysis [4,5].
Sentiment Classification is the process of assigning a sentence to its polarity class, and ML approaches prove very useful in improving the accuracy of the prediction model. There are three levels of Sentiment Classification: Document-level, Sentence-level and Feature-level. The contributions of this paper include the following:
iii. Four different supervised ML algorithms, namely Naïve Bayes (NB), Logistic Regression (LR), Support Vector Machine (SVM), and Random Forest (RF), were used to compare the proposed approach with existing feature selection techniques.
iv. Evaluation metrics such as Accuracy, Precision, Recall, F-measure and Area Under the ROC Curve (AUC) were used to compare the efficiency of the proposed feature selection approach.
The rest of the paper is organized as follows: Section 2 presents the related work in Sentiment Analysis. Section 3 describes the proposed methodology and the framework of Apriori based feature selection. Section 4 provides details about the experimental work. Section 5 discusses the results obtained. Finally, Section 6 presents future work and concludes the paper.

Related Work
Recently Social Media is being used to analyze data for more efficient decision making, and researchers are working on every aspect to improve the knowledge process [7]. There is much data available on Product Review Sites, Forums, Blogs, and Social Networking Websites like Facebook and Twitter. Manual analysis of this data is not possible due to its enormous size and volume. Therefore, Sentiment Classification is used to find the sentiment of the data using supervised ML algorithms. This section presents an overview of past research works based on Sentiment Classification using supervised ML and Feature Selection techniques [8,9].
Medhat et al. [10] carried out a comprehensive survey of new classification algorithms, applications, and improvements in the area of Sentiment Analysis. The authors also shed light on Emotion Detection, Transfer Learning, and Resource Building. In [11], the authors introduced a novel rule-based sentiment classification of sentences from blog comments and reviews. SentiWordNet is used to obtain the polarity score, and their work shows the effectiveness of the proposed method compared to ML-based methods. Our approach focuses on both the sentiment score and ML classifier algorithms: we calculate the sentiment score using the Vader API and apply four different supervised classifiers to predict the polarity of sentences.
In [12], Agarwal et al. use Twitter as the data source for sentiment analysis. The authors introduced Part of Speech (POS) features to carry out the classification task. They used new features and a tree kernel and showed that their approach outperforms baseline techniques. In [13], the authors used three Machine Learning algorithms, namely Naïve Bayes Multinomial (NBM), Maximum Entropy (ME) and SVM, to classify data into the respective polarity classes. They employed unigram, bigram and hybrid n-gram feature approaches, and the results show that NBM performed best in their experiments. Dave et al. [14] carried out sentiment classification on CNET and Amazon reviews. They also used the n-gram approach, but only the bigram and trigram feature sets were considered for the final evaluation. SVM and NB classifiers were used for training and testing the model.
One of the datasets used in this study is the IMDB Movie Review dataset, one of the most popular datasets for the sentiment classification task. In [15], the authors also used the IMDB dataset and incorporated the WordNet lexicon resource to extract opinions from reviews. Various ML classifiers, such as SVM, NB and Alternating Decision Tree, were used to classify the dataset with more than 75% accuracy. In [16], Zhang et al. worked on Chinese reviews of clothing products using the word2vec approach. The authors proposed a classification approach combining semantic features and SVM. They used SVMperf, an alternative structural formulation of SVM for binary classification, and extracted the semantic features of the reviews with the help of word2vec. The proposed approach shows better results in the sentiment classification task.
The authors in [17] proposed a sentiment classification approach using fuzzy logic. The dataset used for the experiment is a movie review dataset, and the results show considerable improvement in accuracy with SVM. Before the supervised ML algorithm was applied, the data was preprocessed using POS tagging, Term Frequency-Inverse Document Frequency (TF-IDF), stop-word removal and tokenization to obtain the final n-gram feature set [18]. In [19], the authors used an ensemble framework to carry out Sentiment Analysis. The three base classifiers used in the work are NB, ME and SVM, with various ensemble options: fixed combination, weighted combination and meta-classifier combination. The highest accuracy of 88.65% was achieved on the Kitchen dataset. The authors in [20,21] also proposed ensemble techniques in which SVM is chosen as the base classifier, together with Boosting, Bagging, Random Subspace and Bagging Random Subspaces. The best results were obtained with the random subspace and bagging subspace ensembles.
Earlier techniques used for feature selection include Chi-Square, Correlation, Information Gain and Relief-F. The authors in [22] used these feature selection techniques to select a feature subset based on an average-weight approach; Sentiment Classification was then carried out using SVM and BN on an Arabic review dataset. Combining features from several selection techniques is also a trending research area in Sentiment Classification. The authors in [23] combined the features selected by Chi-Square (CHI2), Information Gain (IG), Optimal Orthogonal Centroid (OCFS) and Document Frequency Difference (DFD) and implemented four classifiers to carry out Sentiment Classification on English and Turkish review datasets. Agarwal et al. [4] proposed a novel hybrid merging method using Rough Set Theory and Information Gain. The proposed model was evaluated on datasets from different domains using SVM and NB supervised ML classifiers.
From the related work above, we found that most prior work uses predefined feature selection techniques such as Information Gain, Chi-Square, PCA and others [26]. In this paper, we propose a new feature selection technique that exploits the association between the words present in a sentence and the sentence's polarity. To the best of our knowledge, Association Rule Mining (the Apriori algorithm) has not previously been applied in the sentiment classification field. A novel feature selection model using the Apriori algorithm is proposed, in which the support and confidence values of a rule are used to prune the word features.

Proposed Feature Selection
The proposed Sentiment Classification approach is summarized as follows:
i. Feature Extraction and Selection: First, the feature vector is converted into binary form so that the proposed feature selection technique can be applied. A feature value is 1 if the corresponding word is present in the sentence and 0 otherwise. The output label ('neg' or 'pos') is also included in the feature vector. The proposed Apriori-based feature selection approach is then applied to select a reduced feature set.
iv. Classification: Finally, we train the supervised Machine Learning classifiers SVM [27], RF [28], LR [29] and NB [30] on both the reduced feature dataset and the original feature dataset, using reviews from different domains.
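The binary conversion described in step i can be sketched as follows. This is a minimal illustration, not the authors' code; the helper name and the toy sentences are ours:

```python
def to_binary_vectors(sentences, labels):
    """Return a vocabulary and one binary row per sentence.

    Each row holds 1/0 for word presence, with the class label
    ('pos'/'neg') appended as an extra item, since the proposed
    approach treats the polarity label itself as part of the
    feature vector for rule mining.
    """
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    rows = []
    for sentence, label in zip(sentences, labels):
        words = set(sentence.lower().split())
        row = [1 if w in words else 0 for w in vocab]
        rows.append(row + [label])
    return vocab, rows

vocab, rows = to_binary_vectors(["great movie", "bad plot"], ["pos", "neg"])
# vocab -> ['bad', 'great', 'movie', 'plot']
# rows  -> [[0, 1, 1, 0, 'pos'], [1, 0, 0, 1, 'neg']]
```

Each row then plays the role of one transaction for the Apriori step.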

Methodology
In this paper, we focus on using word occurrences as items so that the Apriori algorithm [31] can be applied to generate association rules between the words and the sentence's polarity. The process is carried out in four steps: data collection, data preprocessing, feature selection using the proposed approach and classification using supervised ML algorithms. A generalized scheme of the proposed work is shown in Fig. 1.
The proposed approach is based on Association Rule Mining, in which the Apriori algorithm is applied to words. For rule generation, the data needs to be in binary format. Therefore, each of the three datasets is tokenized using the relation between a word Wi and the review text T. The association is shown in equation 1, and a snapshot of the dataset in the required form is shown in Fig. 2. The Apriori algorithm is then applied to the datasets, and different support and confidence values are used to generate the rules. In this work, support = 0.02 and confidence = 0.01 were used, as these values provided the optimum features. The formulae used to calculate support and confidence are shown in equations 2, 3 and 4.
where Wi denotes a tokenized word present in a review, N is the total number of reviews and NWi is the number of transactions containing word Wi. The criterion for retaining a rule for feature selection is simple: if the consequent of the rule contains either the 'pos' or 'neg' sentiment class, the rule is kept; otherwise, it is discarded. Tables 2, 3 and 4 show some of the rules produced by the proposed approach, whose algorithm is given in Algorithm 1.
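As a hedged sketch of this pruning criterion, the code below restricts rules to single-word antecedents for brevity (the paper applies the full Apriori algorithm over all itemsets). Support and confidence follow the definitions above: support(Wi) = NWi / N and confidence(Wi -> c) = support(Wi and c) / support(Wi):

```python
from collections import Counter

def select_words(rows, vocab, min_support=0.02, min_confidence=0.01):
    """Keep words Wi for which a rule Wi -> 'pos'/'neg' passes both thresholds.

    `rows` are binary transactions with the class label as the last item,
    as produced by the binary conversion step. Thresholds default to the
    support/confidence values used in the paper.
    """
    n = len(rows)
    word_count = Counter()   # N_Wi: transactions containing word Wi
    joint_count = Counter()  # transactions containing both Wi and class c
    for row in rows:
        label = row[-1]
        for word, present in zip(vocab, row[:-1]):
            if present:
                word_count[word] += 1
                joint_count[(word, label)] += 1
    selected = set()
    for word in vocab:
        if word_count[word] / n < min_support:
            continue  # the antecedent itself is infrequent
        for label in ("pos", "neg"):
            support_rule = joint_count[(word, label)] / n
            confidence = joint_count[(word, label)] / word_count[word]
            if support_rule >= min_support and confidence >= min_confidence:
                selected.add(word)  # consequent is a sentiment class: keep
    return selected

vocab = ["good", "bad"]
rows = [[1, 0, "pos"], [1, 0, "pos"], [0, 1, "neg"]]
print(select_words(rows, vocab))  # both words survive on this toy data
```

Words outside the selected set are dropped from the feature vectors before classification.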

Experiment Setup
The three datasets used in this work are processed in two phases. In the first phase, the data is preprocessed by tokenizing, removing stop words, stemming and extracting a sentiment score using the Vader API [32]. Once the data is cleaned, the TF-IDF value is calculated for each feature. In the second phase, association rules are generated using the Apriori algorithm, and all rules that do not contain either 'neg' or 'pos' in the consequent are discarded. The experiments are carried out on three datasets from the UCI ML repository using ten-fold (k = 10) cross-validation: the dataset is partitioned into k folds, of which 9 folds (k-1) are used for training the model and the remaining fold is used for testing.
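The TF-IDF and ten-fold cross-validation setup can be illustrated with scikit-learn; the tooling is an assumption (the paper does not name its implementation stack), and the toy reviews below stand in for the UCI data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB

# Toy stand-in data: 10 positive and 10 negative reviews so that each
# of the 10 stratified folds contains one example per class.
reviews = (["good film"] * 10) + (["bad film"] * 10)
labels = (["pos"] * 10) + (["neg"] * 10)

# Phase one: TF-IDF feature values per review.
X = TfidfVectorizer(stop_words="english").fit_transform(reviews)

# Ten-fold CV: each fold trains on 9 parts and tests on the remaining one.
scores = cross_val_score(MultinomialNB(), X, labels, cv=10)
print(len(scores), round(scores.mean(), 3))
```

In the actual pipeline, the proposed Apriori-based reduction would be applied to the feature matrix before the cross-validation step.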

Evaluation Parameters
In this paper, the performance of the supervised ML algorithms is evaluated using the confusion matrix. The confusion matrix has four entries: True Positive (TP), False Positive (FP), True Negative (TN) and False Negative (FN). The formula for each metric is shown in Table 5. The results are compared using the following evaluation metrics, which are based on the confusion matrix entries.
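These metrics can be computed directly from the confusion matrix counts; a minimal sketch with illustrative TP/FP/TN/FN values (the numbers are ours, not from the paper):

```python
def metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics used for the comparison."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure

acc, prec, rec, f1 = metrics(tp=80, fp=20, tn=70, fn=30)
# acc = 0.75, prec = 0.8, rec ≈ 0.727, f1 ≈ 0.762
```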

Results and Discussions
The experiments conducted in this study select features with a novel technique based on the Apriori algorithm. Experiments were conducted on three datasets taken from the UCI ML repository: IMDB Movie Reviews, Restaurant Reviews and Product Reviews. This section gives an in-depth analysis of the results obtained by the proposed feature selection approach using four supervised classifiers: SVM, RF, NB and LR. First, the data is preprocessed to make it ready for the classifiers: sentences are tokenized into unigrams, stop words are removed, stemming is performed using the Lovins Stemmer and the feature vectors are represented as TF-IDF values. The proposed feature selection approach is then applied to select the best features based on the support and confidence values of the association rules. Finally, the classifiers SVM, RF, NB and LR are applied to the original feature set (without feature selection) and to the reduced feature set (with the proposed feature selection).
Tables 6, 7 and 8 show the performance of the ML classifiers with the original features and with the reduced features (the proposed scheme) on the different domain datasets. The results show that the proposed feature selection approach improves all the evaluation parameters for Sentiment Classification. To observe its impact, ROC curves are plotted in Figures 3, 4 and 5 for the IMDB, Amazon and Yelp datasets respectively; each graph contains the ROC plots of the four ML algorithms SVM, NB, RF and LR. Figures 6, 7 and 8 show the accuracy comparison of the ML techniques on IMDB, Amazon and Yelp reviews respectively. From the graphs we observe that the Apriori-based reduced feature set improves accuracy for all classifiers. For the IMDB Movie Review and Amazon Product Review datasets, the NB classifier shows the highest accuracy, with values of 78.4% and 81.08% respectively.
For the Yelp Restaurant Review dataset, RF performs best with an accuracy of 77.6%. Apart from accuracy, a detailed comparison of the four ML classifiers was performed using the Apriori-based reduced feature set. Figure 9 shows the Precision, Recall and F-measure scores for the various classifiers. For the IMDB Movie Review dataset, it was observed from Table 6 and Fig. 9 that SVM has the maximum Precision score of 0.8193, followed by NB with a score of 0.806. From the accuracy curve, NB shows the maximum accuracy of 78.4%, which is consistent with the Precision results. For Recall, RF outperforms all other classifiers with a score of 0.88, followed by NB and LR with scores of 0.748. For F-measure, RF scores the maximum with a value of 0.7871, closely followed by the other classifiers.
For the Amazon dataset, it was observed from Table 7 and Fig. 9 that SVM has the highest Precision value of 0.8791, closely followed by the other three classifiers. For Recall and F-measure, RF shows the best results, with values of 0.8097 and 0.8085 respectively. Among the other three classifiers, SVM shows the lowest values of 0.6628 and 0.7537 for Recall and F-measure respectively.
Lastly, for the Yelp dataset, the corresponding results were observed from Table 8 and Fig. 9. The reduction in error rate is reported in Tables 9, 10 and 11 for the IMDB Movie Review, Amazon Product Review and Yelp Restaurant Review datasets respectively. The maximum reduction in error rate across all datasets considered in this paper is found for LR, with values of 6.5%, 15.01% and 16.4% for the IMDB, Amazon and Yelp datasets respectively.
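The reduction in error rate can be computed as below, assuming it is defined as the relative drop in (1 - accuracy); this definition is our assumption, as the paper does not state the formula explicitly:

```python
def error_rate_reduction(acc_before, acc_after):
    """Relative reduction (in %) of the error rate 1 - accuracy.

    Assumed definition: accuracies are fractions in [0, 1],
    with acc_before taken without feature selection and
    acc_after with the proposed feature selection.
    """
    err_before = 1 - acc_before
    err_after = 1 - acc_after
    return (err_before - err_after) / err_before * 100

# e.g. accuracy rising from 0.75 to 0.80 cuts the error rate by 20%
print(error_rate_reduction(0.75, 0.80))
```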

Comparison with Existing Approach
This section demonstrates that the proposed Apriori-based feature selection approach is more efficient than the Genetic Algorithm (GA) based approach used in [25], where the authors applied a GA-based feature reduction (GA-FS) technique to UCI ML datasets from the same domains as those used in this paper. Figures 10, 11 and 12 show the accuracy comparison between the feature selection approach proposed in this paper and GA-FS.

Comparison of Proposed Feature Selection with Other Feature Selection techniques
In the final part of our evaluation, we demonstrate that our Apriori-based feature selection technique performs better than other feature selection techniques such as PCA, Chi-Square and Relief. Figure 14 shows the accuracy of all four feature selection techniques on the three datasets, i.e. IMDB Movie Review, Amazon Product Review and Yelp Restaurant Review. The Naïve Bayes classifier is used to compare our proposed feature selection with the other techniques. The ARM-based feature reduction technique achieves better accuracy on all three datasets. For the IMDB Movie Review dataset, the same accuracy score of 78.4% is achieved by both the ARM-based and Chi-Square feature selection. For the Amazon and Yelp datasets, our proposed approach gives the highest accuracy scores of 81.08% and 77.1% respectively.

Conclusion and Future Scope
This paper proposed a novel feature selection approach based on the Apriori algorithm for Sentiment Classification. We employed four different classifiers, i.e. SVM, NB, LR and RF, to compare the dataset with the proposed feature selection against the dataset without feature selection. A detailed analysis of the results shows that the Naïve Bayes classifier achieved the maximum accuracy of 78.4% and 81.08% for the IMDB and Amazon datasets respectively, while for the Yelp dataset the Random Forest classifier outperforms the others with an accuracy of 77.6%. Further, the proposed approach was compared with [25] using six classifiers (J48, NB, PART, SMO, IB-K, and JRip). The proposed approach outperforms the GA-based approach by an average accuracy increase of 9.78% across the three datasets, with the JRip classifier achieving the maximum accuracy increases of 25.26%, 16.89%, and 15.05% for the IMDB, Amazon and Yelp datasets respectively. The proposed approach was also compared with the existing feature selection techniques PCA, CHI2 and Relief, showing average accuracy increases of 0.41%, 0.557% and 1.87% over PCA, CHI2 and Relief respectively. These results strengthen the claim that the proposed feature selection technique gives better accuracy than existing techniques and also reduces the dataset size considerably.
This work opens several interesting future directions. The proposed approach can be used to generate n-gram (bigram, trigram, etc.) feature sets from the unigram dataset, which would reduce the feature preprocessing time by a considerable amount. We also plan to incorporate other features, such as Part of Speech (POS) tags and negation handling, to build a fused dataset.