Implementation of Human Cognitive Bias on Naïve Bayes

Hidetaka Taniguchi
Tokyo Denki University, School of Science and Engineering, Hatoyama, Hiki, Saitama 350-0394, Japan
+81-49-296-5416, htdendai@gmail.com

Tomohiro Shirakawa
National Defense Academy of Japan, Department of Computer Science, 1-10-20 Hashirimizu, Yokosuka, Kanagawa 239-8686, Japan
+81-46-841-3810, sirakawa@nda.ac.jp

Tatsuji Takahashi
Tokyo Denki University, School of Science and Engineering, Hatoyama, Hiki, Saitama 350-0394, Japan
+81-49-296-5416, tatsujit@mail.dendai.ac.jp


INTRODUCTION
The Naïve Bayes classifier is one of the most successful machine-learning methods; it is widely used for spam-detection tasks, and its conditional independence assumption is well suited to text data mining. This "naive" assumption posits that all features are independent given the class, so that each distribution can be estimated as a one-dimensional distribution [1]. The parameters for each attribute are thus learned separately, which greatly simplifies learning. For this reason, the Naïve Bayes algorithm is frequently used for classification with high-dimensional feature vectors: the independence assumption simplifies the algorithm, especially when the number of attributes is large [2]. Although independence of attributes is unrealistic, the Naïve Bayes classifier shows superior performance in text classification. However, the Bayesian classifier is not optimal when attribute independence does not hold [3]. In such situations, the Naïve Bayes assumption is likely to be violated by missing data or by uncertainty in feature selection [4,5], and prediction accuracy decreases. This problem can be triggered by many factors (e.g., the number of training samples is too small, or the data are too biased for the assumption to apply), and it is difficult to identify which of them is the cause.
Meanwhile, some studies [6,7,8,9] have indicated that human-cognitively inspired biases can enhance the prediction accuracy of machine-learning algorithms. The human-cognitively inspired "Loosely Symmetric (LS) model" introduced by Shinohara et al. [7] was designed to flexibly adjust two cognitive biases, symmetry and mutual exclusivity, and exhibits an optimal property that breaks the usual trade-off between speed and accuracy. Oyo et al. [8] reported that the LS model can describe human evaluation of co-occurrence information for the inductive inference of causal relationships, and it has yielded excellent heuristics for evaluating options in two-armed bandit problems and for Naïve Bayes [9]. We therefore assume that the LS model can adjust biases between classes and features more smoothly than the conventional Naïve Bayes model. In this paper, we propose two human-cognition inspired classification models: the Loosely Symmetric Naïve Bayes (LSNB) model and a variant that incorporates stronger biases, the enhanced Loosely Symmetric Naïve Bayes (eLSNB) model.

Naïve Bayes Text Classifier
Naïve Bayes is a classification method based on Bayes' theorem.

P(c_j | d_i) = P(d_i | c_j) P(c_j) / P(d_i)

where P(d_i) can be ignored and regarded as a constant, because it takes the same value for all categories and does not affect the relative values of their probabilities [3].
Naïve Bayes requires the assumption that every feature in a text is conditionally independent given the class [3]. This assumption is clearly incorrect, because some words are likely to co-occur (e.g., the word "Roulette" is likely to co-occur with "Casino") [10]. However, this "naive" assumption improves processing speed, simplifies the algorithm, and still yields reliable performance. For spam-classification tasks, each document is represented as an n-dimensional word vector d_i = ⟨w_1, w_2, …, w_n⟩, where each element indicates a word occurring in the document.
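As a concrete illustration of the classifier described above, the following is a minimal sketch of a Bernoulli-style Naïve Bayes spam classifier over binary word features. The toy vocabulary, function names, and add-one smoothing are illustrative assumptions, not the paper's exact implementation.

```python
import math

def train_nb(docs, labels, vocab):
    """Estimate per-class word probabilities with add-one smoothing.
    docs: list of word sets; labels: parallel list of 'spam'/'ham'."""
    counts = {c: {w: 0 for w in vocab} for c in ('spam', 'ham')}
    totals = {'spam': 0, 'ham': 0}
    for words, c in zip(docs, labels):
        totals[c] += 1
        for w in words & vocab:
            counts[c][w] += 1
    probs = {c: {w: (counts[c][w] + 1) / (totals[c] + 2) for w in vocab}
             for c in counts}
    return probs, totals

def classify(words, probs, priors=(0.5, 0.5)):
    """Log-space product of independent per-word likelihoods (the 'naive' step)."""
    scores = {}
    for c, prior in zip(('spam', 'ham'), priors):
        s = math.log(prior)
        for w, p in probs[c].items():
            s += math.log(p if w in words else 1 - p)
        scores[c] = s
    return max(scores, key=scores.get)
```

The uniform priors in `classify` mirror the paper's choice of P(spam) = P(ham) = 0.5.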

Human-Cognitively Inspired NB Model

Loosely Symmetric Model
Previous research [6,7,8,9] has shown that human-cognition inspired models can be applied to machine-learning tasks. The widely used LS model flexibly adjusts two biases (symmetry and mutual exclusivity) and correlates with human cognition [7]. The LS model shows superior performance on machine-learning tasks including two-armed bandit problems [8] and Naïve Bayes [9]. Humans are known to have illogical symmetric cognitive biases that induce, from a proposition "if p then q", its converse "if q then p" and its inverse "if not p then not q". The LS model quantitatively represents these tendencies [6]. Takahashi et al. [7] suggested that the LS formula can be applied to every area that involves the use of conditional probability. In Table 2, the cells a, b, c, and d represent the co-occurrences of p and q, that is, the probabilities of the joint events pq, p¬q, ¬pq, and ¬p¬q. The LS model describes the relationship between p and q as in (4). The LS model therefore estimates each distribution as a one-dimensional distribution from a set of n-dimensional feature vectors. We adopted a probabilistic model using the LS model to enhance prediction accuracy, and applied this flexibility to a spam classifier with cognitive features.
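Equation (4) is not legible in this copy. The sketch below uses the commonly published form of the LS conditional probability (an assumption on our part, not reconstructed from this paper), where a, b, c, and d are the four joint probabilities from the 2 × 2 table:

```python
def ls(a, b, c, d):
    """Loosely Symmetric estimate of P(q|p) from a 2x2 co-occurrence table.
    a = P(p,q), b = P(p,~q), c = P(~p,q), d = P(~p,~q).
    The bd/(b+d) and ac/(a+c) terms inject the symmetric biases; removing
    them reduces the formula to the plain conditional probability a/(a+b).
    Sketch only: assumes b+d > 0 and a+c > 0."""
    num = a + (b * d) / (b + d)
    den = num + b + (a * c) / (a + c)
    return num / den
```

With a = 0.4, b = 0.1, c = 0.1, d = 0.4 this yields 0.48/0.66 ≈ 0.727, slightly below the plain conditional 0.4/0.5 = 0.8, illustrating how the bias terms moderate the estimate.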

Enhanced LSNB Model
We developed a new classification model, the enhanced LSNB (eLSNB) model, derived from the LSNB model. The eLSNB model has stronger symmetric biases than LSNB and is formalized as in (12) to (19). N(w_k ∩ c_j) is the frequency of a word, i.e., the number of "counts" of appearances of w_k in c_j. As in (12), the word density of w_k in c_j is represented by N(w_k ∩ c_j) [11]. The purpose of this modification is to enhance the probability of each word in the feature vector that co-occurs with c_j. For example, if w_k is observed much more frequently in the spam class than in the ham class, w_k should be considered a spam-related word, and vice versa. The eLSNB model is thus designed to maintain stronger biases for binary classification. Each co-occurrence is strongly biased by N(w_k ∩ c_j) as in (13)-(16), and the posterior probability calculated by LSNB is as in (17)-(19).
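Since equations (12)-(19) are not reproduced here, the following sketches only the ingredient the text defines explicitly: the count N(w_k ∩ c_j), the number of appearances of word w_k in documents of class c_j. Function and variable names are illustrative.

```python
from collections import Counter

def word_class_counts(docs, labels):
    """N(w ∩ c): how many times each word w appears in documents of class c.
    docs: list of token lists; labels: parallel list of 'spam'/'ham'."""
    counts = {'spam': Counter(), 'ham': Counter()}
    for tokens, c in zip(docs, labels):
        counts[c].update(tokens)
    return counts
```

A word whose spam count greatly exceeds its ham count would, per the text, be treated as a spam-related word and biased accordingly.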

Experimental Settings

Benchmark Corpora
We tested our LSNB models and the Naïve Bayes model on six e-mail corpora: Ling-Spam, SpamAssassin, PU1, PU2, PU3, and PUA. Table 4 shows the number of spam and ham messages and the spam ratio of each corpus.

Class Prior Probability
The prior probability is typically estimated by dividing the number of training examples of category c_j by the total number of training examples [12,13]. However, since we partly used limited numbers of training examples in the experiment, the prior probability hardly affects the classification, and assuming uniform priors can improve classification accuracy [14]. Therefore, the prior probabilities for the binary classification were set to be equal: P(spam) = 0.5, P(ham) = 0.5.

Data Preparation
For text classification, feature selection is a necessary step because of the high dimensionality of the feature vector. First, we removed from the feature vector punctuation, words that occurred only once, and words appearing in a standard stop-word list [15]. We also removed numbers from the feature vector, except for the PU corpora, in which every word is expressed as an integer. We used only white space and line breaks as separators between words. For the treatment of missing values we adopted a simple method: replacing them with the default value "missing". This is very simple; however, Robert [16] reported that models handling missing values by treating "missing" as a legitimate value gave better results than models with more elaborate rules.
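The preprocessing steps above can be sketched as follows; the stop-word set here is a small hypothetical stand-in for the standard list [15] used in the paper.

```python
import re
from collections import Counter

# hypothetical minimal stop list; the paper uses a standard published list [15]
STOP_WORDS = {'the', 'a', 'an', 'and', 'of', 'to', 'in'}

def preprocess(corpus):
    """corpus: list of raw message strings -> list of token lists.
    Splits on white space / line breaks only, strips punctuation and digits,
    then removes stop words and words occurring only once in the corpus."""
    docs = []
    for text in corpus:
        tokens = []
        for tok in text.split():            # white space and line breaks only
            tok = re.sub(r'[^\w]', '', tok).lower()
            tok = re.sub(r'\d+', '', tok)   # drop numbers (kept for PU corpora)
            if tok and tok not in STOP_WORDS:
                tokens.append(tok)
        docs.append(tokens)
    total = Counter(t for doc in docs for t in doc)
    # remove words that occurred only once across the whole corpus
    return [[t for t in doc if total[t] > 1] for doc in docs]
```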

Results and Discussion
We tested the NB, LSNB, and eLSNB models on the six corpora. For the SpamAssassin test, each model classified all of the e-mails in the easy_ham2 and spam2 directories. For the tests on Ling-Spam and PU, we combined all directories, used randomly chosen data as training samples, and classified the rest of the data. LSNB and eLSNB performed better in spam classification than NB in almost all the experiments. We suppose this is because the LSNB and eLSNB classifiers can refer to each word w_k from both categories, and this yields stronger biases between words and categories than in the NB classifier. This effect is enhanced in eLSNB, which seems to explain why eLSNB outperforms LSNB in spam classification. The F-measure scores of LSNB and eLSNB also indicate better learning efficiency than the NB classifier: LSNB and eLSNB can learn more effectively from a small number of training samples.
Meanwhile, LSNB and eLSNB did not significantly improve the prediction accuracy of ham classification on any corpus except SpamAssassin, though LSNB performed slightly better than the other two models. We suppose this is because ham documents do not contain "trigger words" the way spam documents do, so LSNB and eLSNB could not effectively adjust the biases between categories and documents. This is particularly prominent in the results for eLSNB, which presumably mis-adjusts the biases of harmless words toward the spam category, so its prediction accuracy did not improve significantly. However, the LSNB and eLSNB models substantially improved the average classification accuracy and the F-measure on every corpus, and the results indicate that implementing human cognitive bias contributes to enhanced prediction.
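Assuming the figures report the standard F-measure (the paper does not restate the definition), it can be computed per class from true positives, false positives, and false negatives as:

```python
def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall for one class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```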

Conclusion
We have introduced a modified Naïve Bayes model that implements human-like causal inference, and the model showed its effectiveness in text classification. The main purpose of this study was to test extensively the performance of the LSNB model from our previous study and to improve its prediction accuracy through several modifications. As a result, our new model, eLSNB, achieved the best scores on the Ling-Spam, SpamAssassin, and PUA classifications. Both LSNB and eLSNB scored better than NB; between the two, eLSNB was better at spam classification and LSNB was better at ham classification.
In future work, we will elucidate why our model did not improve prediction accuracy in ham classification and identify the compositional differences between spam and ham documents. We will also try to minimize the number of training samples, since LSNB and eLSNB appear to learn effectively from small samples compared with the NB classifier [9], and we will measure execution time and resource consumption. To improve the prediction accuracy for both categories, we will modify our models to adapt to any training corpus.

Table 3. A 2 × 2 co-occurrence table of LSNB

Table 3 shows the co-occurrence table of LSNB, where P(w_k | c_j) is the co-occurrence of class c_j and word w_k in a document, and P(w_k | ¬c_j) is the co-occurrence of w_k with the counterpart class of c_j. For example, if c_j is spam, P(w_k | c_j) is the word co-occurrence for spam, P(w_k | ¬c_j) is the word co-occurrence for ham, and P(¬w_k | c_j) and P(¬w_k | ¬c_j) are the probabilities that w_k was not observed in c_j or ¬c_j. Each co-occurrence is set as in the corresponding equations.

Table 4. Corpora used in the experiments

The Ling-Spam [17,18] corpus consists of 2412 non-spam (ham) messages and 481 spam messages from the Linguist list. We used the lemm version of the Ling-Spam corpus for the experiment. The SpamAssassin [19] corpus consists of 3900 non-spam messages and 1897 spam messages, divided into four directories: 2442 non-spam messages in easy_ham, 493 spam messages in spam, 1401 messages in easy_ham2, and 1397 spam messages in spam2. We used the easy_ham and spam directories as training data and the easy_ham2 and spam2 directories as test data; the spam ratio of the training data was 26%. The PU corpora [18] consist of non-spam and spam messages that have been tokenized due to the security policy, with each word expressed as a number. The number of training messages is given by n_spam = 4 + i × 8 and n_ham = 20 + i × 40, where i is an integer with 0 ≤ i ≤ 17; to simplify the experiments, the spam ratio of the training data was thus always about 17%. The scores in Figures 1-24 are averages over 30 runs. The classification results for the six corpora are shown in Figures 1-18; they indicate the accuracy of spam classification, ham classification, and their average for NB, LSNB, and eLSNB on each dataset. Figures 19-24 show the F-measure values for each test.
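Under the training-size formula above, the spam ratio is (4 + 8i)/(24 + 48i) = 1/6 ≈ 17% for every step i, consistent with the constant ratio stated in the text. A quick check (function name is illustrative):

```python
def training_sizes(i):
    """Spam/ham training-message counts at step i (0 <= i <= 17),
    as given in the text: n_spam = 4 + 8i, n_ham = 20 + 40i."""
    return 4 + i * 8, 20 + i * 40
```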