An interactive VR platform with emotion recognition for self-attachment intervention

INTRODUCTION: Self-attachment is a new self-administrable psychotherapeutic intervention based on creating an affectional bond between the user and their childhood-self, using their childhood photos, to develop the capacity for affect self-regulation. Technological advances, such as virtual reality (VR), can enhance the procedure of this intervention and make it scalable. METHODS: We have developed a user-friendly, interactive VR platform for self-attachment featuring a virtual assistant and a customised child avatar that resembles the user in their childhood. The virtual agent interacts with the user and, using an emotion recognition algorithm, can suggest an appropriate self-attachment sub-protocol for the user to undertake. Furthermore, the platform allows user interaction with the child avatar, such as embracing the avatar. RESULTS: We show in a small preliminary trial that such a VR experience can be realistic, leading to a positive emotion change in the user.


Introduction
According to 'The Lancet Commission on global mental health and sustainable development' [1], the burden of disease attributable to mental disorders has increased over the past two decades around the world. In addition, social disadvantage and poor mental health are highly correlated, where the latter leads to poverty and violence forming a disturbing cycle that carries on from generation to generation. It is in fact common for people with serious mental disorders not to receive regular treatment or to abandon their treatment before it is completed, which means that the vicious cycle is seldom broken.

Attachment theory and mental disorders
Different factors contribute to the risk of developing mental disorders; these are often rooted in the individuals' personal experiences. Pioneered by John Bowlby in the 1960s and 1970s [2], attachment theory investigates the long-term, lifelong impact of the quality of the early relationships that infants develop with their primary care-givers. In recent decades, attachment theory has emerged as a main scientific paradigm in developmental psychology [3]. According to its basic tenets, the child-parent dyadic relationship can either lead to a secure attachment of the child or to one of the three forms of insecure attachment: avoidant, ambivalent or disorganised. The strange situation experiment [4], developed by Mary Ainsworth and conducted many times in different cultures, finds a direct link between the way a parent responds to the distress signals of a toddler and the type of attachment the child develops. While secure attachment is associated with the quick and appropriate response of the parent in distressing situations, insecure attachment is associated with parental responses that are rejecting, inconsistent or frightening for the child.
In the past decades, there have been a large number of studies on the longitudinal impact of early attachment types on the life of individuals. The main findings of Mikulincer and Shaver [5], who adopt an attachment theoretical viewpoint [6], suggest that early attachment insecurities of an individual can provide the basis for mental disorders. Their review [7] provided compelling evidence that attachment insecurities, which are due to 'early interactions with inconsistent, unreliable, or insensitive primary care-givers' [8], are a common factor for a range of mental disorders. These attachment insecurities can lead to depression, anxiety, posttraumatic stress disorder, suicidal tendencies and eating disorders, as well as to a number of personality disorders.
On the other hand, Mikulincer and Shaver [5] argue that a sense of secure attachment has healing effects and leads to improvement in affect regulation and mental health. They showed that by using security-enhancing attachment figures, such as subliminal pictures or subliminal names of people, individuals can acquire a sense of security knowing that these supporting figures are available. They called this "security priming", which can improve the mood of the individual and enable them to deal in a more effective way with stressful situations and to avoid the detrimental effects of any 'threats on positive moods' [9].
While John Bowlby was inspired by cybernetics, control theory and ethology in developing attachment theory, in recent decades there has been supporting evidence from developmental neuroscience. Allan Schore's self-regulation theory integrates attachment theory with neuroscience [10][11][12]. According to self-regulation theory, securely attached children are able to internalise the capacity for emotion self-regulation from their parents through positive dyadic interactions, leading to an optimal development of the orbitofrontal cortex (OFC) and other regions of the brain in charge of affect regulation. By appropriately mirroring, attuning to and resonating with the emotional states of the child, the parent of a securely attached child is able to regulate the child's strong emotions and restore homeostasis. Singing, dancing, playing and laughing are used to maximise the positive affects, while comforting, soothing and cuddling minimise the negative affects. The regular repetition of such optimal interactions gradually provides the child, through neuroplasticity and long-term potentiation, with the neural circuits in the brain for affect self-regulation, which will play a key role in their emotional well-being for life.
Conversely, repeated sub-optimal parent-child interactions and trauma disrupt the innate capacity for developing affect self-regulation in the child, which leads to insecure attachment.

Digitised technologies for treating mental illness
Given the increasing prevalence of mental disorders in recent years, which has been significantly accentuated during the COVID-19 pandemic (see for example [13] and [14]), the use of technology for developing effective treatment for mental illness has become an urgent need.
The 21st century is characterised as the century of technological revolution, where the widespread use and rapid development of technology have led to some of the biggest changes in our way of living, thus affecting all sectors of society, including the medical field. Medicine and technology are converging in patients' health care at ever faster rates and with innovative methods. There has been a significant growth in the use of digitised technology for treating mental health problems. In particular, a great deal of research has provided strong evidence that VR can be used to treat people with various mental disorders [15][16][17][18]; see the related work below.
A different dimension of the problem that needs to be addressed is the question of accessibility to treatment. Only a small proportion of people who suffer from mental disorders are actually receiving any kind of treatment [1]. In fact, existing psychotherapeutic methods require a great deal of interaction with a therapist, rendering them not scalable. Although digitised technologies can in principle provide a way for scalability, this can only be achieved through self-administrable therapeutic techniques.
The recently introduced self-attachment intervention [19] is a self-administrable psychotherapeutic technique which is informed by attachment theory. It aims to help the individual to tackle their early insecure attachments and create secure attachment for themselves. In this way the individual would be able to increase their capacity to cope with stressful life events and negative emotions. Self-attachment is focused on creating a secure attachment, in the form of an imaginative passionate bonding, between the adult-self of the individual, representing their logical and rational faculty, and their childhood-self, representing their inner emotional world. The bond creation is facilitated by using images, e.g., photos, of the individual's childhood. Internal secure attachment and a capacity for affect regulation are then attained by simulating the optimal parent-child dyadic interactions in a self-administrable manner.
Because the self-attachment intervention is self-administrable, it has the potential to develop through digitised technologies into an automated procedure for use by people around the world. Furthermore, we hypothesise that self-attachment can become even more efficacious if it is enhanced by virtual reality (VR) used alongside childhood photos. We analyse a platform that has been created in order to deliver an innovative type of the self-attachment intervention. The platform has emotion recognition capability with a dialogue manager and is personalised for each user, featuring their customised child avatar. The flexibility of the platform to adapt to the needs of the user is crucial as it can positively contribute to the successful completion of the treatment [20].

Related work
Virtual reality has been used extensively in psychotherapy in recent years, and many studies have suggested its potential benefits in treating a range of mental disorders.
A study by Martens et al. [21] shows that a virtual reality environment can produce a physiological and psychological stress response. This study measured the stress response of healthy individuals in a randomised controlled trial in which the two groups were exposed to different VR scenarios. After several measurements, the two groups showed a significant difference in their stress response, which suggests that a VR environment can be 'felt real'. Thus VR can be used to investigate the potential benefits of treating a range of mental disorders. However, this study also has its limitations, including the small number of participants, all of whom were healthy young adult males. Nevertheless, the huge potential of VR in healthcare is noteworthy.
A feasibility study by Loucks et al. [18] suggests that virtual reality exposure therapy (VRET) can help in the treatment of PTSD. They focused on the treatment of PTSD resulting from military sexual trauma and thus developed an appropriate virtual environment that aims to deliver the treatment safely. Their results showed a clinically significant reduction in symptoms after the therapy, and the reduction was maintained after three months. In addition to the several study limitations mentioned by the authors, the procedure is not autonomous, meaning that the therapist needs to control the environment. In our study we have therefore tried to solve this problem by creating a fully automatic procedure for the therapy.
The proof-of-concept study by Falconer et al. [15] investigated whether virtual embodiment of depressed patients within a virtual reality environment could improve their condition. The experiment included a scenario where, in the first stage, the participants suffering from depression were in the body of an adult trying to express compassion to a virtual child, while in the second stage, the participants were in the place of the child receiving compassion from themselves. The study suggested that immersive virtual reality can be beneficial to the participants, as the authors observed a reduction in the severity of depression and self-criticism and an increase in self-compassion.
A study by Freeman et al. [16] also showed promising results regarding the use of virtual reality to treat persecutory delusions. We can infer that VR allows users to experience psychologically difficult situations in a relatively safe setting and learn how to cope with the challenges through repeated exposure in a VR environment. Even though they know that the situation is simulated, their minds behave as if it is real and consequently their learning can be transferred to a real-world situation. VR can easily simulate rare situations and can reduce the need for interactions with therapists. Until recently, VR equipment was expensive and difficult to set up, which has meant that research has been making only slow progress. This is no longer the case, as the required technology is now available outside well-equipped laboratories [17].
A quantitative meta-analysis by Opris et al. [22] investigated the effectiveness of VRET in anxiety disorders compared to classical interventions such as in vivo exposure therapy. The results from the 23 final selected studies revealed that VRET can be as efficient as classical evidence-based interventions. Ling et al. [23] carried out a meta-analysis in order to investigate 'the relationship between Self-Reported Presence and Anxiety in Virtual Reality Exposure Therapy (VRET) for Anxiety Disorders' and showed that these two are correlated during VRET. A meta-analysis by Morina et al. [24] investigated whether VRET had an impact on the behaviour of patients with different kinds of phobias. They concluded that the patients' scores improved after VRET and that VRET can change the behaviour of patients in real-life scenarios.
Since this study aims to build an immersive VR platform which can actively adapt to each patient's needs, emotion recognition plays an important role. A system capable of correctly recognising the emotional state of the user can enter into an empathic dialogue with the user, thus helping them to regulate their emotions. Automatic emotion recognition is a challenging task if we consider the different and complex ways in which emotions are expressed. Auditory and visual modalities include speech, facial expressions and body movement, and can be combined to achieve a multi-dimensional analysis of emotions [25,26]. Commonly used emotion dimensions are arousal and valence, which were introduced by Russell in 1980 [27].
Tzirakis et al. [25] implemented an end-to-end deep neural network that takes audio and visual data as input and performs an emotion prediction. They used two networks to train on the audio and visual data separately, and the results were fused to train another network that produces the final prediction. The performance of their model was overall better than previous works on the same training dataset. Another application of neural networks for emotion recognition, by Tripathi et al. [26], also showed good performance. The authors' approach was based on training different networks on speech, text and motion data, finding the best architecture for each modality and then fusing the results in the final layer to recognise the emotion.
Based on Relational Frame Theory (RFT), Miner et al. [28] used NLP to provide an analysis of sentiment dynamics in conversations between two humans, as well as between a human and an agent. They showed that when humans need to respond to a negative sentiment, the probability of replying negatively to a robot is twice as high as that of replying negatively to a human. This shows that any conversational agent for healthcare use should be carefully developed. They also developed an Affective Neural Model (ANM), which is a recurrent neural network (RNN) that is trained on pairs of dialogue lines with corresponding affect labels. Their aim was to predict a probability distribution over seven affects (angry, surprise, happy, love, sad, disgust, laughter) and their reported accuracy on the test set was 90%. However, their approach cannot be generalised because the predicted emotional state is based only on specific labelled words.
Mobile technologies can be employed as platforms that can run algorithms employing real-time data, such as personal information, and can therefore deliver customised medical interventions [29]. In addition, due to their ability to collect many different types of personal data, these technologies can potentially detect mental health conditions. Hence, conversational agents that are tailored to an individual can be created to facilitate dynamic interactions. Using the results from such studies, [28] and [29], Fitzpatrick et al. [30] created Woebot, a fully automated text-based conversational agent. Their randomised controlled trial on a non-clinical population showed that participants who used Woebot for two weeks experienced a reduction in depression. However, no significant difference was observed in anxiety levels between the control group (participants who studied the National Institute of Mental Health ebook, "Depression in College Students") and the Woebot group.
The reduction in depression observed in the Woebot group shows the potential benefit that such a chatbot can provide in the treatment of mental disorders. However, to our knowledge, the efficacy of Woebot has not been compared with traditional therapist-delivered cognitive behavioural therapy (CBT) for treating depression and anxiety. Such a comparison would provide more significant results, revealing the actual efficacy of Woebot.
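The idea of combining per-modality predictions can be illustrated with a minimal sketch. Note the assumptions: the works cited above fuse modalities with trained network layers, whereas the code below swaps in the simplest possible late-fusion rule, a weighted average of class probabilities. The function names, the three-class example and the logits are all hypothetical and are not taken from [25] or [26].

```python
import math


def softmax(scores):
    """Convert raw scores (logits) into probabilities that sum to one."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(modality_probs, weights=None):
    """Weighted average of per-modality class probabilities.

    modality_probs: one probability list per modality (e.g. audio, text).
    weights: optional per-modality weights; defaults to equal weighting.
    This is a hand-written stand-in for a learned fusion layer.
    """
    n = len(modality_probs)
    if weights is None:
        weights = [1.0 / n] * n
    num_classes = len(modality_probs[0])
    fused = [
        sum(w * probs[c] for w, probs in zip(weights, modality_probs))
        for c in range(num_classes)
    ]
    total = sum(fused)  # renormalise in case the weights do not sum to one
    return [f / total for f in fused]


# Hypothetical logits from two single-modality networks over three classes.
audio_probs = softmax([2.0, 0.0, 0.0])
text_probs = softmax([0.0, 1.0, 0.0])
fused_probs = late_fusion([audio_probs, text_probs])
predicted_class = fused_probs.index(max(fused_probs))
```

In a learned fusion scheme the averaging step would be replaced by further trainable layers operating on the per-modality outputs or features.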

Self-attachment intervention
Self-attachment, introduced in [19], is informed by John Bowlby's attachment theory [6], which suggests that from the first year of life, children create an emotional attachment with their primary care-givers. The type of attachment determines the personality and emotional development of an individual later in their adult life; it also underpins the way they perceive the world, through their so-called internal working model, built on the quality of their relationships with parents and other significant people, such as siblings. The four basic types consist of secure attachment, which leads to the capacity for self-regulation of emotions, and three types of insecure attachment: avoidant, ambivalent and disorganised [31]. In [8], the role of attachment objects (such as transitional objects for children and religious beliefs for adults) that have been used by human beings to create secure attachment for emotion regulation has been reviewed. It is then suggested that the adult-self of an individual can internally play the role of an attachment object. Thus, the individual "is imagined to consist of an inner child, representing the emotional self, rooted mostly in the right brain and the limbic system, which becomes dominant under stress, and an inner adult corresponding to the logical self, rooted mostly in the left brain and the prefrontal cortex, which is dominant in the absence of stress" [19, p. 6].
The aim of the self-attachment intervention is the creation of an affectional bond between the childhood-self and the adult-self, who takes the role of a new primary care-giver. In this way, a secure attachment is created between the individual and the child that represents their emotional state. The proposed self-attachment protocol [8] contains the following four stages:
1. The user is introduced to the self-attachment intervention and its scientific and theoretical framework. This is an important stage as the user must be dedicated and motivated in order to successfully complete the protocol.
2. In this stage the relationship between the user and their childhood-self is initiated by looking at childhood photos. This is a visual stage emulating the child's early years, when vision is the main sense. Both a happy and a sad photo are chosen by the user for the intervention in order to recall the environment and the relationship with their care-givers.
3. Here the bond is established as the user falls in love with the childhood-self and vows to take care of and be a secure attachment object for the child. This process allows the user to self-regulate their emotional state and subsequently stay motivated for the rest of the protocol. In order to have a successful bonding process, the intervention employs techniques, such as singing and dancing, which are known for releasing dopamine in the brain.
4. In the final and longest stage, the user interacts with their childhood-self by emulating the optimal interactions of parents with their children, in particular during any emotional stress or crisis. These include embracing, reassuring aloud and self-administered physical cuddling, for example a self-head-massage. In this way, through many iterations, negative emotions both from past traumatic events and current everyday events are revisited, reprocessed and replaced by more positive ones.
The reiteration of the protocols in the last stage is meant to create new habits, by neuroplasticity and long-term potentiation, that enable the individual to take care of themselves: every time the individual is in a stressful situation they will be able to support and calm their childhood-self, i.e., they will be able to regulate their emotions. The details of the protocol are given in [32].
As reviewed in [8], the self-attachment intervention is a double role-playing game in which the user adopts, at the same time, the role of the adult-self (the thinking, rational agent that is the new parent) and the childhood-self (the emotion-driven agent). It is this double role-playing game that makes it a self-administrable protocol.
Self-attachment is supported by various computational models including game theory, artificial Hebbian neural (Hopfield) networks, reinforcement learning (Q-learning) and neural models of the brain [33,34].
A neural brain model for the level of empathy between oneself and others is presented in [35]. This model is extended to apply to the self-attachment intervention in order to describe how the adult-self can increase empathy towards the child during the intervention. According to [35], compassionate interactions between the adult-self and the childhood-self lead to internal secure attachment, which can reduce anxiety, depression and self-criticism. It is also argued that compassionate interactions with one's own childhood-self can provide a more effective way of attaining self-compassion than expressing compassion to a generic child as in [15].
In [36], a "free-energy" model of the formation of infant attachment types has been constructed. It is based on active inference, which provides an account of action, perception and learning in the brain by a child agent interacting with a parent agent. There are six possible game-theoretic interactions between the child (with three actions: seek proximity with the parent, guarded seek or avoid the parent) and the parent (with two actions: attend to or ignore the child). The parent is modelled simply by the probability that it responds to the distress signal of the child agent, whereas the child agent learns a payoff table for its various interactions with the parent agent and aims to choose an action that minimises the free energy, which can be viewed as "surprise", and this in turn maximises the child's payoff. In particular, it is shown that three distinct attachment types emerge from these dynamics. A low probability of attending by the parent agent produces an avoidant attachment, a high probability leads to a secure attachment and a medium-range probability creates an ambivalent attachment.
The above active inference model can simulate a successful self-attachment intervention. The two agents are both internal to the user, who will gradually increase the probability of the parent agent attending to the emotional distress of the child agent, from a value close to zero to a value that is close to one.
An immersive virtual reality mobile platform that implements parts of the self-attachment protocol was proposed by Cittern et al. [37]. This platform aims to enable the user to earn secure attachment and thus learn how to self-regulate their emotions. Their paper presented a prototype mobile app and suggested different development options for efficient implementation of the self-attachment protocol. Their current VR platform consists of different protocols of the self-attachment intervention. Ghaznavi et al. [38] provided an evaluation of that platform and identified several problems, including inconsistent navigation choices and imprecise ray-casting controls.
The main limitation of their platform is that most of the protocols are not interactive and require many manual selections by the user, making the platform not very user-friendly. For example, the user is required to take the initiative to select an appropriate sub-protocol by ray-casting and to correctly assess their own emotional state that is to be projected to the child. In practice, due to these limitations, the VR platform cannot be used in any therapeutic setting.
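The qualitative relationship described above can be made concrete with a deliberately simplified sketch. It is not the free-energy model of [36]: it only encodes the stated outcome (low, medium and high attending probabilities correspond to avoidant, ambivalent and secure attachment) as a threshold rule, and then steps the probability from near zero to near one as in the simulated intervention. The cut-off values 0.33 and 0.66, the linear schedule and all names are assumptions made purely for illustration.

```python
def attachment_type(p_attend, low=0.33, high=0.66):
    """Map the parent's attending probability to an attachment type.

    The thresholds are hypothetical; the source only states that low,
    medium and high probabilities yield avoidant, ambivalent and secure
    attachment respectively.
    """
    if not 0.0 <= p_attend <= 1.0:
        raise ValueError("p_attend must be a probability in [0, 1]")
    if p_attend < low:
        return "avoidant"
    if p_attend < high:
        return "ambivalent"
    return "secure"


def intervention_schedule(start=0.05, end=0.95, steps=10):
    """Attending probabilities rising linearly from `start` to `end`."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [start + (end - start) * i / (steps - 1) for i in range(steps)]


# Simulated course of an intervention: the internal parent agent attends
# to the internal child agent more and more reliably over time.
trajectory = [(round(p, 2), attachment_type(p)) for p in intervention_schedule()]
```

Under these assumptions the trajectory passes from avoidant through ambivalent to secure, mirroring the gradual increase in attending probability described in the text.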

New VR platform for self-attachment intervention
Our new immersive VR platform includes features and functionalities that create an interactive, user-friendly environment, where users can practise most of the self-attachment protocols using an Oculus Quest device. The platform includes a virtual agent that is controlled by a dialogue manager in order to help the user during the experience. Furthermore, a real-time emotion recognition algorithm contributes to the creation of a personalised platform by predicting the emotional state of the user based on their speech input. In order to make it easier for the user to interact with the environment, we use the hand-tracking functionality, which also allows more natural interactions with the child avatar, such as embracing.
Bearing in mind that the platform is going to be used by people with anxiety and depression, the environment must be pleasant and simple to use. The user interface is the means by which the user interacts with the computer. It is therefore very important that it is user-friendly and well designed in order to increase the user's involvement with the platform. Keeping the user engaged with the platform is vital for the successful completion of the intervention. As a consequence, this study lays emphasis on the design of the different elements and functionalities of the platform, with some of the most important listed below.

Software and Hardware
The software is implemented with Unity3D [39], which is a widely used tool and is appropriate for the implementation of the self-attachment intervention. The implementation requires a number of development tools, for example the Android SDK, the Oculus SDK and the Avatar SDK. The Oculus Quest headset [40], which is a virtual reality head-mounted display (HMD) device, is used to test the implemented software. It provides a six-degrees-of-freedom experience with the aid of two hand controllers. This device is very easy to use and does not need any further equipment to work properly, and for this reason it is also suitable for patients carrying out the self-attachment intervention.

Hand tracking
Hand tracking is the latest feature of the Oculus Quest headset which, as the name suggests, tracks the hand movement and gestures of the user in real time. Hands are the main input method for any VR application, and hand tracking allows a natural representation of the user's hands. Hand tracking is a very powerful tool since it provides an increased sense of presence and immersiveness [41], and is more user-friendly given that there is no need for the user to learn how to use the controllers. Figure 1 shows how the hands of the user are rendered in the platform. We use hand tracking because it allows the implementation of all the features required by the intervention inside the platform, such as the user's interactions with the child avatar, which can become more realistic.

Navigation control
Due to the absence of controllers, we have had to implement a functionality that allows the user to navigate and move inside the platform in order to visit the different rooms. We have therefore used the hand tracking feature to implement a gesture that allows the user to move forward. By bringing the middle finger and the thumb together (a pinch) on both hands simultaneously, as shown in Figure 1, the user is able to move.
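A two-handed pinch gesture of this kind can be sketched as follows. This is an illustrative sketch in Python rather than the platform's Unity implementation: the 2 cm distance threshold, the dictionary layout of the tracked fingertip positions, the speed and the frame time are all hypothetical values chosen for the example.

```python
import math

# Hypothetical threshold: fingertips closer than 2 cm count as a pinch.
PINCH_THRESHOLD_M = 0.02


def is_pinching(thumb_tip, middle_tip, threshold=PINCH_THRESHOLD_M):
    """True when the thumb and middle fingertips (x, y, z in metres) touch."""
    return math.dist(thumb_tip, middle_tip) < threshold


def forward_step(left_hand, right_hand, heading, speed=1.0, dt=1.0 / 72.0):
    """Ground-plane displacement for one frame.

    left_hand / right_hand: dicts with "thumb" and "middle" fingertip
    positions, an assumed stand-in for the headset's tracking data.
    heading: unit vector (x, z) of the direction the user is facing.
    The user moves only while both hands pinch simultaneously.
    """
    both_pinching = all(
        is_pinching(hand["thumb"], hand["middle"])
        for hand in (left_hand, right_hand)
    )
    if not both_pinching:
        return (0.0, 0.0)
    return (heading[0] * speed * dt, heading[1] * speed * dt)


# Example frames: one hand pinching, one hand open.
pinched = {"thumb": (0.0, 0.0, 0.0), "middle": (0.01, 0.0, 0.0)}
opened = {"thumb": (0.0, 0.0, 0.0), "middle": (0.08, 0.0, 0.0)}
moving = forward_step(pinched, pinched, heading=(0.0, 1.0), speed=1.0, dt=0.5)
stationary = forward_step(pinched, opened, heading=(0.0, 1.0), speed=1.0, dt=0.5)
```

Requiring both hands reduces the chance of accidental movement when the user pinches with one hand for another purpose.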

Buttons
The usability of the platform is vital for the success of the intervention, so there are different buttons that the user can press with the index finger to make a selection inside the platform. Using the buttons to turn the lights on and off, to access the platform tutorial and to provide feedback on the platform increases the levels of realism and engagement of the user. Additional buttons to answer questions are also provided, such as the 'Yes' or 'No' buttons, as illustrated in Figure 2, and the emotion buttons, as shown in Figure 3. These buttons are displayed in front of the user so that they can select the appropriate answer to the virtual agent's question. The emotion buttons are used to correct the prediction for the emotional state of the user.

Microphone
The microphone of the headset is an additional method for the user to communicate with the virtual agent. We record the user's voice so as to make predictions about their emotional state using the incorporated emotion recognition algorithm. The use of speech is an easy way of providing information and is very similar to psychotherapy, where the patient is involved in private sessions with their therapist.

Interactions with the child avatar
During some stages of the self-attachment protocol the user is required to interact with the customised child avatar. These interactions can be accomplished both by speech and touch. In cases where the user has to sing to the child, we have developed an algorithm that is able to detect the user's song and trigger an action. This action can be either a change in the emotional state of the child or some background music. In some other cases, the user has to approach and embrace the child avatar using their virtual hands. As before, the platform is able to recognise the user's movements so as to trigger the necessary actions. Such abilities of the platform make the intervention procedure very immersive and simulate real-life situations where the user is encouraged to act naturally; for example, to act in the same way as they would when interacting with a living child.
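One simple way to recognise an embrace from tracked hand positions is sketched below. The source does not describe how the platform detects this movement, so everything here is an assumption: the rule (both hands within a fixed radius of the avatar's torso for several consecutive frames), the 0.35 m radius, the three-frame hold and the class name are hypothetical.

```python
import math


class EmbraceDetector:
    """Detects an embrace as both hands staying close to the avatar's torso.

    radius: maximum hand-to-torso distance in metres (hypothetical value).
    hold_frames: consecutive frames required before the embrace counts,
    which filters out hands that merely pass near the avatar.
    """

    def __init__(self, radius=0.35, hold_frames=3):
        self.radius = radius
        self.hold_frames = hold_frames
        self._streak = 0  # number of consecutive frames with both hands near

    def update(self, left_hand, right_hand, torso):
        """Feed one frame of (x, y, z) positions; returns True on embrace."""
        near = (
            math.dist(left_hand, torso) <= self.radius
            and math.dist(right_hand, torso) <= self.radius
        )
        self._streak = self._streak + 1 if near else 0
        return self._streak >= self.hold_frames


# Example: the hands stay around the torso for three frames, then move away.
detector = EmbraceDetector()
torso = (0.0, 1.0, 0.5)
close_left, close_right = (-0.2, 1.0, 0.5), (0.2, 1.0, 0.5)
far_left = (-0.9, 1.0, 0.0)
results = [detector.update(close_left, close_right, torso) for _ in range(3)]
results.append(detector.update(far_left, close_right, torso))
```

When the detector returns True, the application could trigger the avatar's reaction, such as a change to a positive emotional state.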

Virtual agent
The creation of the virtual agent avatar, which acts as an assistant to the user, is an important part of the platform. This agent uses speech in order to give clear instructions to the user on how to complete the self-attachment intervention. In addition to the instructions, the agent asks the user targeted questions that allow the emotion recognition algorithm to determine the emotional state of the user. The virtual agent stands in the living room and its avatar is illustrated in both Figures 2 and 3. The dialogues that the agent needs to communicate to the user are a predetermined set of interactions and questions. The virtual agent is a humanoid character whose body avatar is created with the FAtiMA toolkit [42].

Child avatar
The self-attachment intervention is based on the interactions between the adult-self and the childhood-self of the user. As a result, a representation of the child inside the platform is necessary. This could be an arbitrary child avatar, but various studies suggest that an avatar that closely resembles one's own self has higher cognitive effects [43,44]. Furthermore, Franco et al. [45] provide evidence that demonstrates a stronger self-representation impact on participants when using avatars that look like them rather than random avatars. Given that the self-attachment intervention requires empathic care-giving behaviour by the user towards the child [35], a photo-realistic child avatar that looks like their childhood-self can motivate such behaviour [46].
With that in mind, we have created a realistic child avatar that not only resembles one's childhood-self but can also interactively change its emotional facial expressions. This aims to encourage the individual to relate more strongly to their photo-realistic child avatar than to their childhood photos. Therefore, the child avatar is customised for each user based on their favourite 2D photograph from their childhood. The head of the avatar is created by the Avatar SDK [47] as a 3D model and is attached to a standard body using an automated procedure. To achieve higher levels of realism, manual modifications can be made, such as to the style and colour of the hair and the colour of the clothes. An example avatar is depicted in Figure 4.
In addition, animations are used in order to control the body and face of the child avatar. By using animations we are able to have specific emotions displayed by the child avatar and make the avatar move and dance. We have used animations to express emotions such as 'happy', 'sad' and 'fear', as well as an animation for dancing. For example, the 'happy' animation is demonstrated in Figure 4. The self-attachment intervention is based on emotions, and for that reason the ability of the platform to display emotions through the child avatar plays a vital role. An emotionally active child avatar that looks like the childhood-self of the user gives the user a further motivation to take actions that will change the negative emotional state of the child. By singing to the child avatar or by embracing it, the user acts as a care-giver, and based on these actions the child avatar reacts and expresses positive emotions by smiling and dancing. Customisation of the child avatar together with emotion animations seeks to make the platform more user-friendly and effective, as well as to help the user to connect and bond easily with the realistic avatar.

Emotion recognition algorithm
Most importantly, the platform has the ability to recognise the emotional state of the user and thus it can provide a more personal experience and a more effective intervention to the user. Humans express emotions in many ways, for example using speech and facial expressions, which give us the auditory and visual modalities. During the intervention, the user wears the head-mounted display, which covers a large part of the face and makes it difficult to accurately use the visual modality to predict emotions. Therefore, the emotions are recognised using only the input speech of the user, which is also converted to text, so as to create two modalities: audio and text. Using only these two modalities should not be a problem, as studies have shown that audio and text outperform audio and video in different models [48].
The purpose of the emotion recognition algorithm is to predict the emotion of the user in real-time by making a prediction for every spoken utterance of the user. In this way, the predicted emotion can be used for a personalised intervention where the emotion is instantly projected to the child avatar. A member of our group initially created the emotion recognition model, which is an end-to-end neural network that uses audio and text modalities to perform multi-class emotion classification. The model was trained on the IEMOCAP dataset [49] and generated satisfactory results. The advantages of this model over other best performing models are that it can operate on live input data and that it does not require data pre-processing. Later, the model was modified and improved by another member of our group, who adapted it to perform multi-label emotion classification using the CMU-MOSEI dataset [50] for training. The model predicts the six basic emotions (happy, sad, fear, disgust, surprise and anger) and has demonstrated good performance.
The emotion recognition algorithm runs on a local server that executes the code when it is required and returns the results. The idea is to send the user's recorded audio to the server, which executes the emotion recognition algorithm and returns the classified emotion. We are planning to deploy this algorithm on a remote server, thus allowing people around the world to have access to it for the purpose of future remote trials of the platform. Also, the server can be generalised and used in other tasks and projects that require emotion recognition capabilities.
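The difference between the multi-class and the multi-label formulations mentioned above can be illustrated with a minimal sketch. It is not the authors' network: the function names, the 0.5 threshold and the example logits are assumptions chosen for illustration only. In the multi-class setting exactly one emotion is selected per utterance, whereas in the multi-label setting each of the six emotions is scored independently and several (or none) may be returned.

```python
import math

# The six basic emotions predicted by the multi-label model described above.
EMOTIONS = ["happy", "sad", "fear", "disgust", "surprise", "anger"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to one."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Map a raw score to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def multi_class(logits):
    """Multi-class: pick exactly one emotion, the most probable one."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best]


def multi_label(logits, threshold=0.5):
    """Multi-label: return every emotion whose own score reaches the threshold.

    The threshold of 0.5 is an assumption, not a value taken from the paper.
    """
    return [e for e, x in zip(EMOTIONS, logits) if sigmoid(x) >= threshold]


# Hypothetical logits for one utterance (illustrative values, not model output).
example_logits = [-2.0, 1.5, 0.3, -1.0, -3.0, -0.5]
print(multi_class(example_logits))  # sad
print(multi_label(example_logits))  # ['sad', 'fear']
```

In this hypothetical example the multi-label decision reports both 'sad' and 'fear', which a single-label classifier would collapse into 'sad' alone.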
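The request and response exchange described above could be sketched as follows. This is only an illustration under stated assumptions: the JSON field names (`audio_b64`, `emotions`, `scores`), the helper names and the stub `classify_emotions` are hypothetical and do not describe the actual server; the stub returns fixed scores in place of the trained audio-and-text model.

```python
import base64
import json

EMOTIONS = ["happy", "sad", "fear", "disgust", "surprise", "anger"]


def classify_emotions(audio_bytes):
    """Hypothetical stand-in for the trained model: returns fixed scores.

    A real server would run the audio-and-text network on the recording here.
    """
    return {e: (0.9 if e == "sad" else 0.1) for e in EMOTIONS}


def build_request(audio_bytes):
    """Client side: wrap the recorded audio in a JSON message (assumed format)."""
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return json.dumps({"audio_b64": encoded})


def handle_request(body, threshold=0.5):
    """Server side: decode the audio, classify it and return the result as JSON."""
    payload = json.loads(body)
    audio = base64.b64decode(payload["audio_b64"])
    if not audio:
        return json.dumps({"error": "empty audio"})
    scores = classify_emotions(audio)
    labels = [e for e in EMOTIONS if scores[e] >= threshold]
    return json.dumps({"emotions": labels, "scores": scores})


# Round trip with placeholder bytes standing in for a recorded utterance.
response = json.loads(handle_request(build_request(b"\x00\x01fake-pcm-samples")))
print(response["emotions"])  # ['sad']
```

Keeping the handler as a plain function independent of any transport is one possible design that would let the same code be exposed locally or, as planned, on a remote server.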

Evaluation of VR platform
We have conducted small preliminary trials of the platform on a non-clinical population in order to evaluate the effects and the usability of the platform. We received ethical approval from our institution's Ethics Committee to conduct these trials and we are planning to conduct a trial on a larger scale when the situation allows us to do so.

Hypotheses
The purpose of the trials was to test the following two hypotheses: (i) Users will find the VR platform realistic. (ii) Users will experience more intense compassion and a more significant positive change of emotion when they interact with their look-alike child avatar than with a generic child avatar.

Methods
The trials were conducted as follows. Five volunteers (3 males and 2 females in their mid-twenties) were selected based on the following inclusion and exclusion criteria: physically healthy, over 18 years, with no current psychiatric disorders. The mental health condition of the volunteers was evaluated using two psychometric tests, the Beck Depression Inventory (BDI) and the Beck Anxiety Inventory (BAI). Volunteers who satisfied the inclusion criteria were then invited to participate in the trials, where they had to follow a simple protocol inside the VR platform. The protocol we used for the trials consists of the following steps:
1. An introduction about the self-attachment intervention is presented by the virtual agent.
2. The virtual agent attempts to elicit sadness in the participants by asking them to describe a recent adverse event where they experienced sadness.
3. Using the recorded voice of the participant, the platform performs emotion recognition to predict their emotional state.
4. The participant enters the child's room where they interact with the child avatar by reassuring and embracing it, so as to change the emotional state of the child from sad to happy.
The trial for each participant was split into two half-hour sessions separated by seven to ten days. The same protocol was used in both sessions, except that during the first session participants interacted with their look-alike customised child avatar, whereas during the second session they interacted with a generic child avatar.

Results
After each session, participants were asked to complete a questionnaire to evaluate the platform. The questions were formulated in collaboration with a consultant psychiatrist. We used Likert scale questions with possible answers 'Not at all', 'A little', 'Moderately', 'Very' and 'Extremely', as well as open-ended questions, split into two types, 'Emotional impact' and 'User experience' questions. The questionnaire used for the first session is presented in Table 1 and the results are shown in Figures 5, 6, 7 and Tables 2, 3. In the second session we did not include the 'User experience' questions to avoid repetition, as presented in Table 4, and the results are shown in Figures 8, 9 and Tables 5, 6. Acknowledging the very small sample size as a limitation of these trials, the following results were obtained.
The user experience was classified as 'very' realistic and useful, which corroborates the first hypothesis. Most significantly, all of the participants found the interactions with the child avatar 'very' realistic, and the virtual agent and the hand-tracking feature were characterised as 'very' or 'extremely' useful. They commented that the interactions with the virtual agent could be more realistic, which we will improve for the full trial of the framework.
Regarding the emotional impact, during the first session where the participants interacted with their customised child avatar, most of them felt 'a little' sad after sharing their sad event with the virtual agent, but most of them felt 'very' compassionate towards the child avatar. In addition, the majority of them felt 'very' happy and empathic, and 'not at all' frustrated during the experience. They characterised the experience as "intense" and "unique", and reported that it triggered feelings such as "compassion", "affection" and "sympathy" towards the child avatar. The answers to Question 5 in Table 1, followed by those to Questions 6-9, show that the participants engaged and interacted with the child avatar with empathy, as in a real experience. Their emotions mirrored those of the avatar, changing from sadness to happiness, relief and satisfaction, which resulted in a positive change of emotion and an overall positive experience.
On the other hand, during the second session with the generic avatar, participants were less compassionate towards the child, as they felt less need to console the avatar. Also, comparing Question 10 in Table 4 with the equivalent Question 8 in Table 1 indicates that participants were less empathic with the generic child avatar. Participants' comments showed that they had similar emotions in the two sessions, but with less intensity during the second session, as they had somewhat less sympathy towards the generic child avatar than towards their photo-realistic child avatar. Bearing in mind the small sample size, we believe that these results also support the second hypothesis.
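The 'overall percentage selection' reported in the figures for the Likert scale questions can be computed per question as the share of participants choosing each answer option. The sketch below shows one way to do this; the function name is an assumption, and the responses used are hypothetical placeholders, not the trial data.

```python
from collections import Counter

# Answer options of the Likert scale used in the questionnaires.
LIKERT = ["Not at all", "A little", "Moderately", "Very", "Extremely"]


def percentage_selection(responses):
    """Return the percentage of participants who selected each Likert option."""
    if not responses:
        raise ValueError("at least one response is required")
    unknown = set(responses) - set(LIKERT)
    if unknown:
        raise ValueError(f"unknown answer options: {sorted(unknown)}")
    counts = Counter(responses)
    n = len(responses)
    # Options that nobody selected are reported as 0.0 so every bar is present.
    return {option: 100.0 * counts[option] / n for option in LIKERT}


# Hypothetical placeholder responses for one question (N=5); NOT the trial data.
placeholder = ["Very", "Very", "Extremely", "Moderately", "Very"]
for option, pct in percentage_selection(placeholder).items():
    print(f"{option}: {pct:.0f}%")
```

With N=5, each participant accounts for 20 percentage points, which is worth keeping in mind when reading the percentages.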

Limitations
Apart from the small sample size, there are a number of limitations related to the platform. The emotion recognition algorithm only uses speech and text modalities; it can be enhanced by including the visual modality. More data can be collected from wearable devices, such as a smart watch that measures the individual's heart rate, to infer the emotional state of the user more accurately. In addition, the predetermined set of dialogues does not allow the virtual agent to adapt to the conversation with the user, thus leading to less natural communication between the two parties. Thus, we need to create a virtual agent who has a more natural voice and is able to sustain an empathic conversation with the user. Currently, only part of the self-attachment protocol can be practised on our VR platform. Some of the other sub-protocols of the intervention need to be appropriately formulated in the context of the platform by augmenting the current framework with additional tools. Such tools, for example, can enable the user to revisit and reprocess their past traumas based on exposure therapy.

Table 1. Questionnaire used for the first session.
Q1 Did you feel sad after explaining the sad event to the virtual agent?
Q2 Did the virtual agent correctly recognise your emotion?
Q3 Did you see the expression of sadness on the face of child avatar?
Q4 Did you feel compassion towards the sad child avatar?
Q5 How did you feel when the child avatar's facial expression turned from sad to smiling after you moved closer to embrace it?
Q6 Did you then stop feeling sad?
Q7 Did you then feel happy?
Q8 Did you feel empathic?
Q9 Did you feel neutral?
Q10 Did you feel frustrated?
Q11 Please describe the overall emotional impact of your experience.
Q12 How realistic was the environment?
Q13 How realistic were your interactions with the virtual agent?
Q14 How realistic were your interactions with the child avatar?
Q15 Did you find the hand-tracking feature useful?
Q16 Did you find the emotion recognition useful?
Q17 Did you find the virtual agent useful?
Q18 How realistic was the whole experience?
Q19 How useful was the whole experience?

Table 2. Answers to Q5 in Table 1.
i I felt empathy towards the child avatar, affection and satisfaction.
ii It was very satisfying to see the expression of happiness clearly portrayed on the face of the child. The change on its expression made me want to embrace it even more as I was more happy than before.
iii I felt relief and satisfaction that sadness is not there any more.
iv I felt happy.
v I felt happy, it was like seeing my younger self be happy.

Table 3. Answers to Q11 in Table 1.
i It was a very intense experience that created in me feelings of compassion and affection.
ii It is a very unique experience which made me feel affection and responsibility over the child avatar, resulting in a very satisfying feeling.
iii Due to the short time that I had in there, my emotions did not change much. However, I felt a bit better after the procedure.
iv It was an emotional experience, felt both happy and sad, also sympathetic for the child.
v I was a bit worried in the beginning but by the end I felt more comfortable and happy.

Table 6. Answers to Q13 in Table 4.
i It was satisfying to make the child happy again, although I felt more compassionate and I could sympathise more with my personalised child avatar.
ii Very positive! Making the child happy was very satisfying!
iii I was kind of sad when describing the sad event but then a little satisfied with the happy child.
iv It was an emotional roller-coaster, at the beginning I felt sad about the incident in my past and the child but then felt satisfied and happy for making the child laugh.
v It was fun to have the cartoon avatar this time. I was in a really good mood after seeing the avatar.

Conclusion
The unprecedented global experience in 2020 due to the COVID-19 pandemic has made clear the need to use technology in order to deal with serious mental disorders. The forced confinement at home and social alienation, the diverse work difficulties, as well as the stress and insecurity about our physical health are all crucial outcomes of COVID-19 that have specifically exacerbated the problems of people with mental disorders. As a result, this aggravated situation has put huge pressure and strain on community mental health centres and psychiatric wards [51]. The self-attachment intervention, which can be made accessible through the platform described in this paper, can be helpful especially during such difficult times, as it aims to facilitate the process of treating people in their own home, without further burdening hospital units and specialised clinicians. All of the above highlights the need for such a platform, which can offer help and treatment to people with mental disorders in a more user-friendly and effective way. It may be argued that technology alone cannot work wonders. However, we think that such a virtual reality platform can contribute to comprehensive and high-quality mental health care.
The main achievement of this study is the first step in the creation of an immersive VR platform that can eventually help the user to practise most of the sub-protocols of the self-attachment intervention. Importantly, the procedure can be carried out by anyone, without the assistance of a human. In addition, the platform is personalised for each user, which is a great advantage. The level of immersiveness and the sense of presence are significant. This comes as a result of the combination of many small components that come together in the platform. The results of the impact evaluation are promising, but there is still a long way to go before the application can be used by patients.
In the longer term, a virtual psychotherapist can be created, which is managed by a machine learning algorithm and can eventually replace an actual psychotherapist. This virtual agent should be able to infer the patient's mental state and safely suggest the appropriate stage of the self-attachment protocol that the patient should follow. Equally important will be the ability of the virtual psychotherapist to understand whether the mental state of the patient originates from a past trauma or from recent problems that the patient is facing at the time, in order to recommend the most effective sub-protocol of the self-attachment intervention.

Figure 1. Pinch gesture. The middle finger is touching the thumb on both hands simultaneously.

EAI Endorsed Transactions on Pervasive Health and Technology Online First

Figure 4. 'Happy' emotion animation is presented on a customised avatar (real photo in the background).

Figure 5. Overall percentage selection for questions Q1 to Q4 in Table 1.

Figure 6. Overall percentage selection for questions Q6 to Q10 in Table 1.

Figure 7. Overall percentage selection for questions Q12 to Q19 in Table 1.

Table 4. Likert scale (N=5) questionnaire: Trial with generic child avatar.
Q1 Did you feel sad after explaining the sad event to the virtual agent?
Q2 Did the virtual agent correctly recognise your emotion?
Q3 Did you see the expression of sadness on the face of child avatar?
Q4 Did you feel you wanted to console the sad child avatar?
Q5 Did you feel you changed the child avatar's emotions?
Q6 Did you feel that the change in the child avatar's emotions had an impact on you?
Q7 How did you feel when the child avatar's facial expression turned from sad to smiling after you moved closer to embrace it?
Q8 Did you then stop feeling sad?
Q9 Did you then feel happy?
Q10 Did you feel empathic?
Q11 Did you feel neutral?
Q12 Did you feel frustrated?
Q13 Please describe the overall emotional impact of your experience.

Figure 8. Overall percentage selection for questions Q1 to Q6 in Table 4.

Table 5. Answers to Q7 in Table 4.
i I felt happy myself.
ii It was a very overwhelming feeling making a child, even though unfamiliar to me, to smile and dance.
iii I didn't have any intense emotions. I was kind of happy.
iv Very happy.
v I started laughing.

Figure 9. Overall percentage selection for questions Q8 to Q12 in Table 4.