Application of Sentiment Analysis in Understanding Human Emotions and Behaviour

INTRODUCTION: Naturalistic observation is now considered critical for understanding and predicting the complexities of feelings and sentiments. Emerging social media trends have completely revolutionized the process of communication. Content from social media, microblogging, and other forms of e-communication can be extracted to decode the quality, valence, and effectiveness of communication. OBJECTIVES: In this paper, we present the urgent need for mental health assessment during a pandemic, especially in a smart city scenario. METHODS: We reviewed the role of sentiment analysis as an emerging tool for the emotional and behavioural analysis of populations affected by a pandemic. This paper examines the prospect of analysing sentiments with machine learning tools to understand, describe, and predict human behavior. CONCLUSION: Further analysis of this psychological e-content can be used to understand and predict patterns of human sentiment.


EAI Endorsed Transactions on Smart Cities
Research Article

Introduction
The Covid-19 pandemic has created an immediate, focused demand for technological assistance in all media of human communication. In a smart city, we assume that most citizens are netizens and that institutions have a soft, virtual connection with society. Today, people send text messages more often than they call or meet in person. Face-to-face interaction has declined, and young people tend to express their emotions and moods through social media, using text, emoticons, punctuation, and so on. Recently, Sentiment Analysis has been considered a powerful tool to support psychological evaluation and predict mental health conditions. The multifaceted nature of the expression of valence, especially in relation to emotion, thought, affect, choice, and decisiveness, has intrigued scientists, philosophers, psychologists, and, more recently, data analysts.
With social media surging as the primary mode of communication, both computer scientists and data crunchers have taken a keen interest in the subjective and objective analysis of information shared as written material on platforms such as Facebook, Twitter, and blogs. Through text analysis using both supervised and unsupervised machine learning, this information can offer deep insight into the minds of its users. Since a simple text can reveal many details about a person, it is pragmatic to have a clear objective from the beginning. From the same information, a psychologist will look for different constructs and variations than a business analyst would. A mental health professional would look at patterns of thought, mood, and affect, or try to identify personality type, whereas a business analyst would pay more attention to shopping patterns, holidays, or payment style. Both look for patterns and for sentiment towards their own area of interest.
Quite often, it is difficult for a client to express their innermost thoughts and feelings in the first few meetings with a practitioner, and the therapist may fail to elicit adequate responses, leading to an incomplete or incorrect diagnosis. Any unattended mental illness can have irreparable consequences that may harm the individual, their family, their peers, or society in general. Young people find it difficult to manage stress and anxiety in tough times, and the need for psychological services in stress management and optimal wellbeing is growing fast. The risk of self-harm, suicide, and violence is difficult to ascertain accurately through self-rated questionnaires and unstructured tests such as the Rorschach Inkblot Test, the Thematic Apperception Test, or Draw a Person, and the questions may elicit responses biased towards what is expected or by cultural differences. Artificial Intelligence (AI) based algorithms have the potential to help therapists understand an individual's valence, attitude, and decision-making capacity. This paper studies the application of sentiment analysis in psychological assessment through machine-based analytical tools.

Language and Emotions
Apart from gestures, postures, and facial expressions, human language is a tool for expressing emotions. It transmits information in the form of words: symbolic or acoustic representations that are semantically relevant and syntactically organized into phrases or sentences. Language and emotion are two parallel, coexisting systems in which one system affects the performance of the other, and both serve the function of communication between people (Bamberg, 1997).
The expression of emotion through language is culture-specific: Eastern populations rely more heavily on emotionally rich words than people in the West. Emotions are expressed both verbally and nonverbally, and verbal messages are easier to comprehend than non-verbal communication. Language enriches human thought and accelerates problem-solving and decision making.
Words such as joy, surprise, anger, agony, fear, and disgust can sometimes convey the intended message better than facial expressions, which are more culture-specific. Fridlund (1997) suggests that the universal facial expressions postulated by Paul Ekman do not adequately account for cultural differences or for deviations across social contexts. Paul Ekman is known for his pioneering work on emotions and their relation to facial expressions; he is also known for his work on detecting lies and developed an atlas of emotions. He has conducted crucial research on the universality of the expression of basic emotions in humans (Ekman, 1992). Ekman (1992) describes emotions as follows: "Emotions are viewed as having evolved through their adaptive value in dealing with fundamental life-tasks. Each emotion has unique features: signal, physiology, and antecedent events. Each emotion also has characteristics in common with other emotions: rapid onset, short duration, unbidden occurrence, automatic appraisal, and coherence among responses. These shared and unique characteristics are the product of our evolution and distinguish emotions from other affective phenomena."

Emotional Disorders
Mental health professionals have always attempted to understand and control deviant and undesirable behavior that is harmful to self or others and contrary to cultural beliefs and ethos. Any unusual pattern of behavior, including discrepancies in thinking and emotional disturbances, is observed and recorded. Most of the time, clients in a clinical setting present with one of two kinds of emotional response: they are either disruptive and non-adjusting, or low in activity and succumbing to stress (Table 1). Each individual has a specific coping mechanism in response to internal and external stressors. Lack of family support, peer pressure, pressure created by oneself or by organizations, the need for overachievement, perfectionism, a controlling attitude, and a lack of self-regulation create imbalance or disharmony in life. These lead to emotional imbalance, which is sometimes difficult to detect accurately and in time. Timing is crucial for a successful prognosis in any ailment, whether mental or physical.

Table 1
Obsessive-Compulsive Disorders: recurrent, extremely distressing thoughts in the form of images, ideas, or impulses; lack of self-control; high level of anxiety; restlessness; suicidal thoughts; depressed mood, etc.
Trauma- and Stressor-Related Disorders: presence of a stressor; poor coping mechanisms; lack of social support; emotional detachment; fearful dreams; loss of appetite; considerable weight loss or gain.
Dissociative Disorders: loss of integration between past and present; lack of awareness of self and of control over bodily movements; attention-seeking behaviour.
Somatic Symptom and Related Disorders: depression and anxiety related to continuously changing physical symptoms; unexplainable body pain; loss of interest in daily activities; sexual and menstrual complaints.

Assessment of Emotions, Thought and Behaviour
Psychiatric interviews may be structured or unstructured, depending on the referral setting, goals, and context of the interview. The interviewer tries to establish rapport and orients the interview towards finding the cause of the change in behavior. Its goal is to collect evidence and organize it efficiently to reach a conclusion based on the diagnostic criteria of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5). The interview is directional and considers both inclusion and exclusion criteria. Semi-structured, open-ended questions are often asked about the history of the present illness, its course, precipitating and predisposing factors, family background, personal history related to childhood and adolescence, any history of substance abuse, sexual history, etc. A mental status examination complements the case history and is integral to the psychiatric case workup. It includes the interviewer's observations of general appearance and behavior, mood and affect, perception, thinking, speech and language, attention, concentration and memory, insight, and judgment (Groth-Marnat, 2009). Behavioural assessment includes neuropsychological assessment, self-report and standardized inventories, and projective or unstructured assessment. The primary goal of these assessments is to find the reasons behind the shift from usual behavior and reach an appropriate diagnosis for the well-being of the client.
A pandemic brings real-life tragedy, and mental and emotional health challenges and psychological vulnerabilities become visible in all age groups. Managing the mental health fall-out requires a systematic approach that provides an urgent, timely support system (Meyer et al., 2014). Sentiment analysis is therefore useful when developing a social support network system to handle pandemic situations. It can become an integral part of developing and implementing an epidemiological surveillance system. The result of this hybrid analysis aims to promote the community spirit and participation needed in a pandemic. A systematic data-mining approach incorporating sentiment analysis is already used for inter-institutional coordination during pandemic threats.

Application of Sentiment Analysis in Psychological Assessment
Most of the available information about a client comes from a family member or friend; in some cases, it is obtained directly from the client. Either way, the chances of bias are high. Gender, culture, age, and economic status can also interfere with correct diagnosis (Farbstein et al., 2010). With modern technological advances, we must work towards developing strong, empirically validated measures that overcome the problems inherent in the present system. Opinion Mining, or Sentiment Analysis, gives clinicians an opportunity to identify new variables from both the subjective and objective information provided by clients. It adds a new dimension to assessment that is relevant in the present situation. With social media emerging as a new platform for general discourse, ignoring such a critical source of information, which may add new elements to diagnosis, would leave the conventional mode of diagnosis incomplete.

Sentiment Analysis
On average, people now spend more time per day in Human-Computer Interaction (HCI) than in human-to-human interaction. The prevalence of WhatsApp chats, Facebook messages, and tweets as means of expression provides real-time, high-quality data that can be analysed with Natural Language Processing (NLP). Technological advancement has broadened the application of sentiment analysis to fields such as political debates, business reviews, and medicine. Sentiment analysis uses computer-based algorithms to analyse the perspective, valence, and subjectivity of social media communication. Sentiment Analysis (SA), also known as Opinion Mining (OM), studies the opinion, attitude, or emotion expressed towards an entity, which may be an individual, an event, or a thing. Medhat et al. (2014) note that although the terms Sentiment Analysis and Opinion Mining are used interchangeably, the main task of SA is to identify the sentiment in a specific opinion and then classify its valence by polarity. SA is performed at three levels: document level, sentence level, and aspect (or entity) level. Document-level SA classifies a whole document as expressing positive, neutral, or negative sentiment, assuming that the whole document is about a single topic. Sentence-level SA first identifies whether a given sentence is subjective or objective and then determines the polarity of its valence. Aspect-level SA classifies sentiment with respect to specific aspects of an entity. Medhat et al. (2014) also discuss the opportunities that text mining and SA have opened up in related fields. Emotion Detection (ED) helps identify emotions at both the conscious and subconscious levels, extracting both the implicit and explicit meanings in a text or document. Transfer learning uses what is learned from data in one (source) domain to analyse data in another (target) domain.
Building Resources supports the development of lexicon-based dictionaries or corpora in which sentiments are organized by polarity. An opinion can be written as a quadruple O = (g, s, h, t), "where g is the sentiment target, s is the sentiment of the opinion about the target g, h is the opinion holder, and t is the time when an opinion is posted." Each component is important: if any part is missing, we lose critical information needed to weigh that particular opinion.
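The quadruple can be made concrete as a small data structure. The sketch below is a hypothetical Python representation: the class name `Opinion`, the `is_complete` helper, and the example values are our own illustrations, not part of any cited work. It flags an opinion as incomplete when any of the four components is missing, mirroring the point that a missing component loses information needed to weigh the opinion.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Opinion:
    """Hypothetical container for the opinion quadruple O = (g, s, h, t)."""
    target: Optional[str]     # g: the sentiment target
    sentiment: Optional[str]  # s: the sentiment of the opinion about g
    holder: Optional[str]     # h: the opinion holder
    time: Optional[str]       # t: when the opinion was posted

    def is_complete(self) -> bool:
        # The opinion can only be fully weighed if every component is present.
        return all(getattr(self, f.name) is not None for f in fields(self))


# Illustrative values only.
review = Opinion(target="new phone", sentiment="positive",
                 holder="user_42", time="2020-05-01")
anonymous = Opinion(target="new phone", sentiment="negative",
                    holder=None, time="2020-05-02")
print(review.is_complete())     # True
print(anonymous.is_complete())  # False
```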

Opinion Mining
The sentiments in a sentence determine the polarity of the valence (emotion) associated with the target of the opinion; when there are multiple targets, a specific sentiment is assigned to each target. Words such as amazing, awesome, and appalling express emotions and aid in developing algorithms that extract both sentiments and opinion targets. Liu (2017) defines an emotion as a quintuple (e, a, m, f, t), where e is the target entity, a is the aspect of e that is responsible for the emotion, m is the emotion type or a pair representing an emotion type and an intensity level, f is the feeler of the emotion, and t is the time when the emotion is expressed.
Consider this excerpt shared on Facebook by a young mother: "Motherhood is the greatest thing and the hardest thing. I am not a supermom… I just do the best I can… I am still struggling… because I love u both. I am a mother first… blessed to have that role in life. The world can like me, hate me, fall apart around me…. At least I have both of u around me …I'm happy. Discovering strengths, I did not know I had and dealing with fears which I did not know existed. In a lot of ways, both of u come into my life to teach me. Life is tough….But trust me will find me by your side…in every thick & thin. I am your strength and not your weakness… I am proud of u both…Hope I make you proud too." Here the entities are the two children towards whom the emotions are directed. The feeler of the emotion is "I", the mother, and the difficulties of motherhood are the target aspect related to both children. The use of superlatives and strong words such as hate, the greatest, and the hardest conveys the intensity of the author's emotion. Emotions such as hope and happiness are related to the children, whereas despair and tiredness are related to the self.
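As an illustration, the manual reading of the excerpt above can be written down as emotion quintuples. This is a sketch under stated assumptions: the `Emotion` type and `emotions_by_entity` helper are hypothetical names, the intensity label "high" is our reading of the strong wording, and the time t is left as `None` because the post date is not given.

```python
from typing import NamedTuple, Optional


class Emotion(NamedTuple):
    """Emotion quintuple (e, a, m, f, t) after Liu (2017)."""
    entity: str                # e: target entity
    aspect: str                # a: aspect of e responsible for the emotion
    emotion: tuple[str, str]   # m: (emotion type, intensity level)
    feeler: str                # f: who feels the emotion
    time: Optional[str]        # t: when expressed (unknown for this excerpt)


# Hand annotation following the reading in the text; "high" is an assumption.
annotations = [
    Emotion("children", "difficulties of motherhood", ("hope", "high"), "mother", None),
    Emotion("children", "difficulties of motherhood", ("happiness", "high"), "mother", None),
    Emotion("self", "difficulties of motherhood", ("despair", "high"), "mother", None),
    Emotion("self", "difficulties of motherhood", ("tiredness", "high"), "mother", None),
]


def emotions_by_entity(items):
    """Group emotion types by the entity they are directed at."""
    grouped = {}
    for item in items:
        grouped.setdefault(item.entity, []).append(item.emotion[0])
    return grouped


print(emotions_by_entity(annotations))
# {'children': ['hope', 'happiness'], 'self': ['despair', 'tiredness']}
```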
Psychologists such as Plutchik and Kellerman (2013) have classified discrete emotions by intensity. Such collections of high- and low-intensity emotion-laden words use language as a vehicle to communicate emotion coherently. Non-verbal patterns of communication, such as gesture, posture, and facial expression, are more culture-specific than language, and they often convey more information than words. Language, moreover, depends on a fixed set of syntactic and semantic rules and is written as phrases or sentences. Using language well requires a basic level of expertise for ready access to the mental lexicon, which is vulnerable to decay over time.

Approaches towards implementing Sentiment Analysis (SA)
There are multiple approaches to implementing Sentiment Analysis, the most popular being:
• Rule-based systems: a set of predefined rules is used to determine the sentiment expressed by a given text.
• Machine learning-based systems: these are mostly classifiers trained on existing data using various Machine Learning algorithms. They classify text either into binary sentiments (positive, negative) or into multiple classes; an example of multi-class sentiment analysis is rating a product on a scale of 1 to 5.
• Hybrid systems that combine rule-based and machine learning approaches.
Sweta Saraff et al.

Rule-Based Systems: To perform meaningful analysis, the text must first be pre-processed uniformly, because many aspects of a text that are intuitive to humans are not intuitive to machines. Typically, all punctuation is removed, along with 'meaningless' words such as conjunctions, prepositions, and articles. These are generally known as 'stop words'; they contribute little to the context or meaning of the text and can therefore be omitted. The remaining words are converted to lowercase and matched against a predefined collection of words (a sentiment lexicon) in which each word has an associated 'sentiment value'. The cumulative sum of the sentiment values of all the words indicates whether the text has an overall positive or negative sentiment. This system has low accuracy because it does not account for words taking on different meanings in combination. For example, 'happy' has a positive sentiment value, so if the phrase 'not happy' appears in the text, it would still be scored as positive, which is incorrect. We could add rules to handle such cases, e.g. negating the value of a word when it is preceded by 'not'. However, this still does not account for commonly used idioms or phrases, and many complex rules would be needed to reach even a satisfactory level of accuracy.
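The pipeline just described (lowercase, strip punctuation, drop stop words, sum lexicon values, optionally flip a value after a negator) can be sketched in a few lines. The stop-word list and lexicon below are tiny hypothetical examples, not a published resource. Note that negators such as "not" must be kept out of the stop-word list, or the negation rule has nothing to act on.

```python
import string

# Toy stop-word list and sentiment lexicon: illustrative assumptions only.
STOP_WORDS = {"a", "an", "the", "and", "or", "but", "of", "to", "in", "on",
              "at", "for", "with", "i", "am", "is", "was", "this", "it"}
LEXICON = {"happy": 2, "good": 1, "great": 2, "love": 3,
           "sad": -2, "bad": -1, "terrible": -3, "hate": -3}
NEGATORS = {"not", "never", "no"}


def preprocess(text: str) -> list[str]:
    """Lowercase, remove punctuation, and drop stop words."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [w for w in text.split() if w not in STOP_WORDS]


def score(text: str, handle_negation: bool = True) -> int:
    """Sum the lexicon values of the remaining words.

    With handle_negation=True, a word directly preceded by a negator has its
    value flipped; a positive total means positive sentiment.
    """
    tokens = preprocess(text)
    total = 0
    for i, word in enumerate(tokens):
        value = LEXICON.get(word, 0)
        if handle_negation and i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        total += value
    return total


print(score("I am happy with the service"))                             # 2
print(score("I am not happy with the service", handle_negation=False))  # 2 (wrong)
print(score("I am not happy with the service"))                         # -2
```

As the text warns, such a rule still fails on idioms and sarcasm; every new case needs another hand-written rule.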
Machine Learning-Based Systems: Sentiment analysis is a classic application of NLP (Natural Language Processing) and machine learning algorithms. It can be considered a classification problem where we input the given text into a classifier model, and it returns the category the text belongs to.

Text Data Representation for machine learning models
To train a classifier with a machine learning algorithm, text data must first be converted into a numerical representation, a process often called word or text embedding. Word embedding is a Natural Language Processing technique that maps text data to vectors of a fixed dimension. The bag-of-words, n-gram, tf-idf, and Word2vec models are a few of the representations used to extract features from text. In natural language processing, stop words are words that are too frequent to be informative, such as articles, pronouns, and prepositions. In English, several words share the same stem and nearly the same meaning, and treating them as separate features can increase overfitting. For example, "likes", "liked", and "liking" share the stem "like". Words are therefore normalized to overcome this problem; stemming is one such normalization method used in Natural Language Processing. One of the simplest and most commonly used ways to represent text for machine learning is the bag-of-words representation, which uses the frequency of each word in the text as a feature to train the classifier, disregarding grammar and word order. Computing the bag-of-words model requires three steps: tokenization, vocabulary building, and encoding. Tokenization splits a given text into smaller units called tokens. A vocabulary is then built from all the words present in the documents, and each word is assigned a number. Finally, in encoding, each document is represented by the frequency of each vocabulary word within it.
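The three bag-of-words steps can be sketched directly in plain Python. This is a minimal illustration under simplifying assumptions (a regular-expression tokenizer, no stop-word removal or stemming); libraries such as scikit-learn provide production versions of the same idea.

```python
import re


def tokenize(text: str) -> list[str]:
    """Step 1: split lowercased text into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(documents: list[str]) -> dict[str, int]:
    """Step 2: give every distinct token in the corpus an index."""
    vocab: dict[str, int] = {}
    for doc in documents:
        for token in tokenize(doc):
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def encode(document: str, vocab: dict[str, int]) -> list[int]:
    """Step 3: count how often each vocabulary word occurs in the document."""
    vector = [0] * len(vocab)
    for token in tokenize(document):
        if token in vocab:
            vector[vocab[token]] += 1
    return vector


docs = ["I liked the movie", "the movie was bad, really bad"]
vocab = build_vocabulary(docs)
print(vocab)
# {'i': 0, 'liked': 1, 'the': 2, 'movie': 3, 'was': 4, 'bad': 5, 'really': 6}
print([encode(d, vocab) for d in docs])
# [[1, 1, 1, 1, 0, 0, 0], [0, 0, 1, 1, 1, 2, 1]]
```

Each document becomes a fixed-length vector whose length equals the vocabulary size, which is what a classifier expects as input.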
The main drawback of the bag-of-words representation is that it discards word order, so two sentences with different meanings can have the same bag-of-words representation. For example, "it's bad, not good at all" and "it's good, not bad at all" have identical bag-of-words representations even though their meanings are opposite. To capture some context with a bag-of-words representation, counts of pairs or triplets of adjacent tokens are therefore also considered. Pairs of tokens are known as bigrams, triplets as trigrams, and sequences of n tokens in general as n-grams. Term frequency-inverse document frequency (tf-idf) is a method used in information retrieval and text mining that weights words to evaluate their importance: a word's weight increases with its frequency in a document and decreases with the number of documents in the collection that contain it.
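The example sentences above can be checked mechanically: their unigram counts are identical, but their bigrams are not. The sketch also computes one common tf-idf variant, (count / document length) × log(N / document frequency); other weighting and smoothing variants exist, so treat this formula as an assumption rather than the only definition. The small corpus is invented for illustration.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep only letters and apostrophes, so "bad," matches "bad"."""
    cleaned = "".join(c if c.isalpha() or c == "'" else " " for c in text.lower())
    return cleaned.split()


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return every run of n adjacent tokens (n=2 gives bigrams, n=3 trigrams)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(documents: list[list[str]]) -> list[dict[str, float]]:
    """One common variant: (count / doc length) * log(N / document frequency)."""
    n_docs = len(documents)
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n_docs / df[t])
                        for t, c in tf.items()})
    return weights


s1 = tokenize("it's bad, not good at all")
s2 = tokenize("it's good, not bad at all")
print(Counter(s1) == Counter(s2))                        # True: same bag of words
print(Counter(ngrams(s1, 2)) == Counter(ngrams(s2, 2)))  # False: bigrams differ
print(("not", "good") in ngrams(s1, 2))                  # True
print(("not", "good") in ngrams(s2, 2))                  # False

corpus = [tokenize(d) for d in
          ["the movie was good", "the movie was bad", "the plot was good"]]
w = tf_idf(corpus)
print(w[1]["the"])                  # 0.0: "the" occurs in every document
print(w[1]["bad"] > w[1]["movie"])  # True: "bad" is rarer than "movie"
```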

Naive-Bayes
Multinomial Naïve Bayes is commonly used for sentiment analysis. It determines the probability that a given text belongs to each class from the joint probabilities of classes and words, under the "naïve" assumption that words occur independently of one another given the class.
Bayes' theorem is written as P(A|B) = P(B|A) × P(A) / P(B), where P(A|B) is the probability of event A given that event B has occurred. For our purpose, P(A|B) is the probability that the text belongs to category A given that the set of words B occurs in it. For this approach to work well, the training data must contain a vast collection of words, sentences, and phrases. This is because if the text contains a word that never occurred with a category in the training data, the estimated probability of that category becomes zero regardless of the other words; smoothing techniques such as add-one (Laplace) smoothing are commonly used to avoid this.
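A minimal multinomial Naïve Bayes classifier makes the zero-probability issue concrete. This is a sketch, not a reference implementation: the class is hand-written, the four training sentences are invented, and add-one (Laplace) smoothing is controlled by a hypothetical `alpha` parameter. Log-probabilities are summed instead of multiplying raw probabilities, and P(B) is dropped because it is the same for every class.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Minimal multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # alpha=0 reproduces the zero-probability problem

    def fit(self, texts: list[str], labels: list[str]) -> "MultinomialNB":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # class -> word -> count
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_posterior(self, text: str, label: str) -> float:
        # log P(A) + sum of log P(word | A); P(B) is identical across classes.
        logp = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        counts = self.word_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        for word in text.lower().split():
            if word not in self.vocab:
                continue  # words never seen in training carry no evidence
            num = counts[word] + self.alpha
            if num == 0:
                return float("-inf")  # unsmoothed: the whole probability is zero
            logp += math.log(num / denom)
        return logp

    def predict(self, text: str) -> str:
        return max(self.class_counts, key=lambda c: self.log_posterior(text, c))


texts = ["i love this phone", "great battery and great screen",
         "i hate this phone", "terrible battery and bad screen"]
labels = ["pos", "pos", "neg", "neg"]

nb = MultinomialNB(alpha=1.0).fit(texts, labels)
print(nb.predict("great phone"))  # pos
print(nb.predict("bad screen"))   # neg

unsmoothed = MultinomialNB(alpha=0.0).fit(texts, labels)
print(unsmoothed.log_posterior("great phone", "neg"))  # -inf: "great" unseen with "neg"
```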

Logistic Regression
Logistic regression models the probability that a text belongs to a category as a function of features extracted from the text. Here the dependent variable Y is the category the text belongs to, and the independent variables X are the features extracted from the text. The model passes a weighted sum of the features through the logistic (sigmoid) function to obtain a probability between 0 and 1. The weights are learned from the training data, and a new text is assigned to the more probable category, for example positive if the predicted probability exceeds 0.5.
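The sketch below trains a two-feature logistic regression with batch gradient descent on the log-loss, in plain Python so every step is visible. The features (counts of positive and negative words per text), the toy data, the learning rate, and the epoch count are all illustrative assumptions; in practice a library implementation would be used.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps any score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log-loss w.r.t. the linear score
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(x, w, b):
    """Probability that the text with features x is positive."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical features per text: [positive-word count, negative-word count]
X = [[3, 0], [2, 1], [2, 0], [0, 2], [1, 3], [0, 1]]
y = [1, 1, 1, 0, 0, 0]  # 1 = positive, 0 = negative

w, b = train_logistic_regression(X, y)
print(predict_proba([4, 0], w, b) > 0.5)  # True: mostly positive words
print(predict_proba([0, 4], w, b) > 0.5)  # False: mostly negative words
```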

Support Vector Machines
Like logistic regression, a linear SVM is a linear model, but it is non-probabilistic: it represents the training data as points in a multidimensional feature space and learns a boundary that separates the region of each category. For example, we can have two features, X and Y, and two categories, positive and negative. We can plot the points in a two-dimensional space with X and Y as axes and tag each data point, represented by an (x, y) coordinate, as positive or negative. We then draw the line (in higher dimensions, the hyperplane) that best separates the positive and negative data points, i.e. the one with the widest margin between the two groups. Next, we extract the features from a new text and plot it accordingly. The side of the separating line on which this point lies determines its category.
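The two-feature example can be sketched as a linear SVM trained with Pegasos-style stochastic sub-gradient descent on the regularized hinge loss. This training method is one choice among several (we swap it in for illustration; the text does not specify one). As a simplifying assumption, a constant feature 1.0 is appended so the last weight acts as the bias, which means the bias is regularized too. The data points are invented.

```python
import random


def train_linear_svm(points, labels, lam=0.01, epochs=200, seed=0):
    """Linear SVM via Pegasos-style stochastic sub-gradient descent.

    points: (x, y) feature pairs; labels: +1 (positive) or -1 (negative).
    Returns weights [w_x, w_y, bias] of the separating line.
    """
    rng = random.Random(seed)
    data = [((x, y, 1.0), label) for (x, y), label in zip(points, labels)]
    w = [0.0, 0.0, 0.0]
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for features, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wj * fj for wj, fj in zip(w, features))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: move towards this point
                w = [wj + eta * label * fj for wj, fj in zip(w, features)]
    return w


def classify(point, w):
    """Return +1 or -1 depending on which side of the line the point lies."""
    x, y = point
    return 1 if w[0] * x + w[1] * y + w[2] >= 0 else -1


# Hypothetical features: X = positive-word count, Y = negative-word count.
points = [(3, 0), (2, 1), (2, 0), (4, 1), (0, 2), (1, 3), (0, 1), (1, 4)]
labels = [1, 1, 1, 1, -1, -1, -1, -1]
w = train_linear_svm(points, labels)
print(classify((5, 0), w))  # 1
print(classify((0, 5), w))  # -1
```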

Deep Learning approach
A Recurrent Neural Network (RNN) is a deep learning technique used for sequential data such as text, speech, and stock prices. RNNs are well suited to such data because of their internal memory: a hidden state that carries information from earlier elements of the sequence to later ones. In an RNN, information cycles through a loop, so the output at each step can draw on past states only. Another method is the Bidirectional RNN, in which the output layer can draw on past and future states simultaneously. Deep learning techniques loosely mimic the behavior of the human brain by processing data with artificial neural networks. One advantage is that deep learning tends to give better results as the amount of data increases, and it is often more accurate than other algorithms on large data sets.
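The "internal memory" and the bidirectional variant can be illustrated with a forward pass of a simple (Elman-style) RNN, h_t = tanh(W_x x_t + W_h h_{t-1} + b). This sketch only runs the network; the weights and one-dimensional "word vectors" are arbitrary illustrative values, not trained ones. Real models are trained with backpropagation through time, typically in a framework such as PyTorch or TensorFlow.

```python
import math


def rnn_states(inputs, W_x, W_h, b):
    """Run a simple RNN over a sequence and return every hidden state.

    h_t = tanh(W_x x_t + W_h h_{t-1} + b); the hidden state h is the
    network's memory, carrying information from earlier steps forward.
    """
    hidden = [0.0] * len(b)
    states = []
    for x in inputs:
        hidden = [
            math.tanh(
                sum(W_x[i][j] * x[j] for j in range(len(x)))
                + sum(W_h[i][k] * hidden[k] for k in range(len(hidden)))
                + b[i]
            )
            for i in range(len(b))
        ]
        states.append(hidden)
    return states


def birnn_states(inputs, forward_params, backward_params):
    """Bidirectional RNN: one RNN reads left to right, another right to left,
    and their states are concatenated so each position sees past and future."""
    fwd = rnn_states(inputs, *forward_params)
    bwd = rnn_states(inputs[::-1], *backward_params)[::-1]
    return [fh + bh for fh, bh in zip(fwd, bwd)]


# Toy 1-dimensional "word vectors" and untrained weights (illustrative only).
sentence = [[1.0], [0.0], [1.0]]
forward = ([[0.5], [-0.3]], [[0.1, 0.2], [0.0, 0.4]], [0.0, 0.1])
backward = ([[0.2], [0.4]], [[0.3, 0.0], [0.1, 0.2]], [0.05, 0.0])

states = rnn_states(sentence, *forward)
print(states[0] == states[2])  # False: same input word, different memory
both = birnn_states(sentence, forward, backward)
print(len(both), len(both[0]))  # 3 4
```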

Challenges Faced
Multiple challenges arise when trying to derive the sentiment behind words through the use of logic and algorithms because text and speech are essentially subjective at their core. Language is highly intuitive, and the human brain can interpret it clearly after years of conditioning. Hence it would be unfair to expect the same level of accuracy from a machine.
Subjectivity and context play a major part in understanding text. If the sample text is complex and has a specific context associated with it, sentiment analysis methods may not yield correct results, since they only take into consideration the logical implications of words and phrases. There are ways to incorporate the context into the algorithm, and sometimes the machine may be able to interpret the context from the text itself.
Irony and sarcasm are possibly the biggest roadblocks for sentiment analysis, because it is very difficult for a machine to recognize that a sentence means the opposite of what it literally says. Even if sarcastic sentences are included in the training data, a model that relies only on the words cannot distinguish a sincere sentence from its sarcastic use, since both contain the same words. For example, to the question 'How was your experience using our product?', the reply 'Too good to be true.' may be either sincere or sarcastic, which can be decided only if the tone or context is known. Most algorithms will still register the words 'good' and 'true' as positive, so the result may be incorrect.
Comparisons and similes may also create challenges if they are not properly accounted for in the algorithm, because the entity that each sentiment refers to has to be identified.
For example, the sentence 'Phone A is better than Phone B' is positive for company A and, at the same time, negative for company B. Hence, the mere presence of the positive word 'better' should not determine the sentiment of the sentence as a whole.
Similarly, many new nuances are appearing in modern text, especially in social media posts. Social media users regularly coin words and phrases that may have no meaning when read without context. Emojis and emoticons made of special characters also appear very frequently after sentences and can be essential to their sentiment, sometimes even more than the words. Most machine learning tools for sentiment analysis are designed to analyse English text only, so corpora in multiple languages need to be developed before these tools can be used widely for mining psychological data. Taking into consideration all these contemporary nuances, along with the innate subjectivity of language, we must be wary of the accuracy of sentiment analysis methods. Nevertheless, they remain extremely useful for many use cases.
Text mining, or SA, is qualitative and provides a promising tool in the hands of the physician for exploring, in a natural setting, avenues that would otherwise be impractical to study. The client's consent before extracting information from social media is crucial. At the same time, we must be careful not to compromise confidentiality, and all ethical guidelines must be followed.