Smart Feedback and the Challenges of Virtualisation

The use of audio feedback is becoming more prevalent and it would be possible to use avatars for this purpose. When audio feedback is recorded by a human tutor, the recording contains not only the text of the feedback, but also additional information associated with the intonation and manner of delivery of the voice. Experiments were conducted to investigate students' responses to the use of audio in comparison with other forms of feedback. Students were generally positive about audio feedback; results also indicated that the conveyed emotion or intent is significant and that it is perceived by the student as an important part of the feedback. We also explore this in the context of strategies for the deployment of virtual agents in the provision of feedback.


Introduction
The development of intelligent agents, affective computing and virtual spaces for training and education, together with the convergence of media platforms, is allowing the development of smart educational environments. Automated systems for providing advice and feedback could, where appropriate, provide rapid support for students in their learning. This supports, encourages and facilitates existing identified good practice, but does not place an unrealistic burden on the tutor. One of the challenges in deploying such systems is to take full advantage of the new technologies while retaining the benefits of existing tried and tested methods.
A strategy that has emerged recently and has been successfully introduced into many courses is the use of recorded audio as feedback. However, the reasons for this success are not entirely clear. In this paper we explore some factors in the use of audio feedback, including student responses to audio feedback compared to other forms and the significance of tone of voice, in order to better understand students' perceptions of this mode of feedback. This in turn allows us to consider requirements for the provision of audio in smart educational environments.
[...] 1940s, although recently the use of satellite broadband technology for this purpose has become more prevalent. The children in these programmes live in remote communities and rely on this communication both for their formal education and for socializing with their fellow pupils. The system has been shown to be at least as effective as face-to-face teaching, if not more so [1]. The main issues with these schemes appear to have been reluctance on the part of schools to engage with the material [2], preferring to do things their own way, rather than specific issues with the characteristics of the [...]
Allport and Cantril [3] point out that [...] the place of visual aids and supply the personality of the tea[...] This view is supported by Lehman [4], who considered the role of emotion in distance education and the importance of presence, and concluded: "A more complete understanding of emotion as a component of cognition and behavior and of the role of emotion in creating a sense of presence in teaching and learning can help instruct us in effective teaching, instructional design, [...]"
To provide a context for this work, we explore the nature and importance of student feedback, the use of voice in feedback and emotion analysis, and what can currently be achieved in terms of expressing emotion in artificial voices.
A flexible and useful model of the role of feedback in learning is presented by Nicol and Macfarlane-Dick [5], in which they consider the learning process to comprise both internal and external feedback cycles that are followed in an iterative manner. There is a great deal of published work on the importance of feedback in the learning cycle, and a number of heuristics for assessing the quality of feedback have emerged from identified good practice. Some of these are: timeliness; useful for improving future performance; personal; understandable; puts grade into context; encourages teacher and peer dialogue; encourages positive motivation and self-esteem; facilitates self-assessment.
Gibbs [6] explored the problem of increased workload for staff in providing feedback of appropriate quality to large cohorts of students. Previous studies have concluded that, with appropriate tools and workflow, the provision of audio feedback can reduce the time taken to provide feedback when compared to written feedback.
Speech contains information not only in that which is said, but also in the manner in which it is said, and the potential ability of smart environments to analyse for emotion and stress cues has implications for privacy in addition to potentially leading to more responsive systems. The merging of emotion and computing is an example of affective computing, which was first described by Picard [7]; it describes the potential for emotions to be both analysed and expressed by computational devices. Emotion is difficult to define, and difficult to measure, which makes it an interesting challenge [8,9].
Linnenbrink [10] explores how emotions play an integral role in education and brings together a wide range of theories and models to explore the integration of affect, motivation and cognition. It is clear that there are many challenges and this is a relatively new area of research.
Robison et al [11] developed an automated system to investigate the consequences of affective feedback in intelligent tutoring systems. The system was text based, but did identify the importance of identifying appropriate [...]
Previous studies [12,13,14,15,16] have found that the use of audio feedback had a wide range of benefits for both students and tutors. The students appreciated the feedback for a wide range of reasons, including the additional detail often provided, the tone of voice in which comments are made and the feeling that they were being exposed to a thinking process.
Kapas et al [17] differentiate between different studies and consider Emic and Etic markers, which refer to those voice parameters that can be identified by a human as characteristic of a given emotion and those that can be identified by analysis, but not by another human. With audio feedback, user interpretation of emotion and intent is based on their cultural framework, experience and the human-identifiable markers.
Issues such as the number of identifiable emotional states and how these differ ethnographically depend on the parameters chosen and the model for emotion adopted [8]. Some research has focussed on considering a limited range of emotions to suit the relevant purpose, which makes recognition more accurate [18].
Cowie et al [8] consider some of the difficulties associated with resolving emotion and the range of existing models for detecting emotion in the voice. Generating emotion-based speech is less complicated, but it still presents considerable challenges. An example is Papous the Virtual Storyteller [19], in which the use of emotion tags allows a virtual storyteller to express a range of emotions. The authors concluded that the voice was more synthetic than they had hoped for; that is, it did not sound like a human voice. Another strategy for audio feedback would be to use combinations of pre-recorded phrases, as is often used for public transport announcements. The use of pre-recorded phrases would limit the potential richness and individualisation of the feedback, but would have the advantage of sounding natural. Their use in audio systems might be similar to the use of feedback banks [20]. Tao et al [21] summarise a wide range of speech synthesis strategies and conclude that continued work is necessary to improve synthetic speech quality.
EAI Endorsed Transactions on Future Intelligent Educational Environments 09 2014 | Volume 1 | Issue | e 3

Work Undertaken
[...] responses to pre-recorded audio feedback, in terms of emotional perception and content (although these factors are not independent). Our studies take an emic approach, where we are interested in the perceptions of the students and not in any automated analysis of emotion. Three studies were carried out to obtain qualitative data on human-voice audio feedback, together with a pilot study to understand the implications of the use of virtual audio feedback. In the first study, forty students were asked for their views on the use of audio feedback in two pieces of formative coursework (towards a technical report) in a final year undergraduate I.T. module. In the second, eighty students from the same course and two independent tutors were asked to identify emotion and intent in the voice used for audio feedback in two pieces of formative coursework. The third study was in respect of summative audio feedback on a multimedia artefact for fourteen final year multimedia computing students. The students were asked the same questions as in the second survey. In each study, the audio files were recorded on a Zoom H2 recorder and compressed and processed using the batch facility in Audacity.
The purpose of the first study was to determine whether the use of audio feedback was appropriate for the task. The factors being considered were: Was it simple for the lecturer to produce the feedback? Were there any benefits for the lecturer in using audio feedback? Did the students find audio feedback as useful as written feedback?
[...] audio feedback was straightforward, once a workflow had been established. It was also possible to provide more feedback in a given amount of time using this method.
Figure 1 shows the structure of the assignment for the first two studies. The students submit two 500-word drafts before submitting a final 3000-word consultancy report. This allows them to make mistakes early on and learn from them prior to any summative work. It also allows them to develop a clear understanding of expectations and the quality required to achieve a good grade. It is important to note that the audio feedback was recorded in real time and that the audio files provided to the students were not edited or produced in any way other than basic noise reduction and compression as part of the batch processing in Audacity. One student with profound hearing loss was given their feedback as a text file.
After receiving audio feedback for their first formative assignment, the students were asked whether they wanted the same approach to be used for their second formative submission or whether they would prefer text-based feedback. All forty students chose to receive audio files and felt that they were useful and appropriate; one [...] that could be provided as text. At this stage, students were not asked for any other information.
After their second assignment, the students were asked two questions and also asked to provide further responses if they had any additional comments. The questions asked were: Was the audio feedback useful? [...] All of the students felt that the feedback had been provided earlier than previous written feedback and that it was easier to understand, a typical student comment being "we can tell what the tutor really likes by the tone in their voice when talking about a certain attribute". Students were not generally concerned that the recordings had been made in real time and contained pauses and additional noise, although one student reported that the file was very noisy and, in this case, the file was sent again. These results are in line with findings from other institutions [12,13,14,15,16].
In the second study, 80 students and the independent tutors were asked to identify emotion and intent in audio feedback for two assignments and they were also invited to comment more generally on the delivery of the feedback.
Fifty-four students responded positively to the format of the feedback, of whom 22 responded directly to the questions about emotion and intent. One student asked if they could be provided with text-based feedback, and two files had to be compressed again and resent to students as a result of noise generated in the batch conversion process.
The questions asked were: When you listen to the feedback, does my tone of voice help you with understanding what I mean? Would it be better if the feedback was written? Would it be better if I tried to keep my voice more formal? How would you describe my tone of voice? Do you think that feedback by voice allows you to understand more than text alone?

Responses indicated that:
Students felt that the audio feedback contained more detail than written feedback.
An informal tone of voice was the most appropriate.
Receiving audio feedback provided a similar experience to receiving one-to-one physical feedback from the tutor.
The tone of voice helped with understanding of the content.
Audio files should not be too long, as it is more difficult to rewind to a section.
The independent tutors felt that the feedback sounded consistently positive and supportive, and supported the idea of providing feedback in this way.
The third study used students from a different subject area, namely multimedia technology. Whilst the previous studies had involved formative feedback on written work, the third study used summative feedback on a YouTube video recording of an individual project.
Thirteen of the fourteen students surveyed felt that the tone of voice was important in understanding the feedback. All the students felt that audio feedback helped them understand more than text alone. Two students would have liked to receive additional text feedback. Students mostly preferred an informal voice to a more formal one, but two students felt that a more formal tone would have been appropriate. One student commented: "Comments also gives a feeling as if I am getting direct feedback from a [...]". It is interesting to note that the comments received were very similar to those of the second survey and that the nature of the comments was subject-independent.
For the next stage of this work, we wish to explore the effect of using an artificially generated voice, perhaps with an avatar-based interface, for providing feedback. Issues here would include the [...]ty with the voice and the extent to which appropriate emotions could be embodied in it.
A small pilot study was conducted with 10 students, who were given audio feedback provided via an artificial speaker. In order to create this effectively, the audio feedback was provided by the lecturer and transcribed before being played through a text-to-speech engine.
The students had all received audio feedback using the [...] for an earlier piece of coursework and had responded positively to its use. They were asked if the machine-generated audio feedback was as useful and whether it was preferable to written feedback.
The response was unanimous; they felt that the audio feedback via the text-to-speech engine was not as useful as that using [...] students asked if they could receive the feedback as text in preference to the text-to-speech engine.
The pilot study indicated that the emotion and sense of presence could only be provided by the voice of the lecturer and not by the artificial speaker. It is difficult to know, without further study, the role that expectation plays in student perception, as these students had become accustomed to receiving audio feedback from their tutor.
It is important to note that this was a qualitative study; we were not attempting to obtain statistical data based on a detailed questionnaire, but rather to tease out any insights as to the effectiveness of audio feedback. An example was the unanimous perception among the students that they had received feedback earlier when it was provided in audio form. This was not actually true, and the perception was probably due to the students being more ready to engage with the feedback in audio form than they had been when it was provided in text form. It appears that students often ignored or failed to remember text-based feedback, whereas the audio feedback had a greater impact on them. Of course, this could be a short-term effect, due to the novelty of the method, but only time will tell.
Of course, there are always caveats. Students sometimes tell their tutors what they want to hear and this might have skewed the results.
Although our study was concerned with emotion in verbal feedback, the overall conclusion that students preferred a friendly, cheerful voice and felt that this was appropriate does not necessarily explore the potentially complex changes in emotional state that the student might be experiencing when listening to the feedback [10], or any deep understanding of how to leverage these for optimal motivation and engagement.

Discussion of Implications
Our studies show that the use of the recorded voice for feedback provides a richer experience for the recipient, as more information can be extracted from listening than is possible with the written word alone. The same words spoken with a positive, supportive tone of voice are more motivating than they would be if the recipient were reading them from a screen. However, this is a two-edged sword, as unconscious, negative nuances in the voice of the tutor might also be picked up on by the student. People are very good at tuning in to such subtleties, and this places an onus on the provider of feedback to try to avoid intonation that might demotivate the recipient. The other side of this coin is that the student will not be able to read the visual cues that are an important part of face-to-face conversation, which makes the quality of the aural cues even more important. The recording of verbal feedback in real time does not allow the tutor as much [...] when providing written feedback and this might cause them to use their natural mode of speech, thereby revealing emotional content that they might otherwise have hidden in the interest of motivating the student. It is often said that one should emphasise the [...] rather than picking out the faults, but this strategy could be undermined in the above circumstances.
Värlander [22] [...] cannot be turned off automatically, and may last for days. In such situations, a learner may be unreceptive to [...] emotional content, whether written or verbal, can be taken as criticism of the individual rather than their work, and can arouse feelings of failure or inadequacy in the student that can persist for a long time. This emphasizes the need for care when presenting feedback. The problem can arise in written feedback, particularly when this is given in a terse style. For example, it is often noted that emails and text messages can unintentionally appear abrupt and sometimes offensive. However, with verbal feedback, the range of expressible emotions is much greater, as there is clearly more room for subtle, nuanced expression of emotion in this form of communication than in the written form. The very advantage of rapidly produced verbal feedback recordings, i.e. the impression for the student of a personal dialogue with their tutor, can also be a danger, as any perceived negative nuances will also be seen as coming directly from the tutor.
Another possible issue with recorded verbal feedback is that, when speaking, professionals will tend to use the common, shared language idioms and vocabulary of their profession. This is often the case even when they are discussing subjects not related to their discipline, as noted in the work on cognitive discourse analysis by Tenbrink et al [23]. With written feedback, tutors might moderate their language level, but with verbal feedback, they are more likely to speak in the manner that comes naturally to them. Of course, one of the things that the students are supposed to be learning is the language of their chosen field of study, so perhaps this is not always a bad thing. However, tutors operate across two domains and will be using not just language specific to their specialist subject areas but also that of education itself. Evidence from sources such as the National Student Survey suggests that students often struggle with education jargon and do not understand concepts such as "feedback", "reflective approaches", "paradigms", "heuristics", etc. It is therefore doubly important for tutors to use language appropriate to the [...]
It would clearly be desirable for virtual agents to be able to provide audio feedback. According to Ivanovic [24], a lot of evidence has been gathered to suggest that virtual agents induce positive feelings in humans during interaction, if the agents are capable of displaying emotions. Our results indicated that with audio feedback the role of emotion was critical; however, no students expressed a desire to hear a range of emotions.
Cafaro et al [25] [...] interpersonal [...] worked when one of the participants was a virtual agent that exhibited non-verbal cues. They found that it took an average of only 12.5 seconds for people to form an impression of the virtual agents; in other words, their natural reactions to the virtual agents were similar to those they would have exhibited when encountering another human. In the context of feedback, therefore, it would be important that the text-to-speech virtual avatar could accurately express the emotions implicit in the associated text (and, of course, that the latter was appropriate in terms of student motivation in the first place).
Although there has been a lot of research into creating avatars that can express human-like emotions, state-of-the-art virtual agent systems still do not allow a wide range of emotions to be accurately expressed. For example, Lee et al [26] attempted to develop an avatar capable of conveying Ekman's six classic emotional states, i.e. anger, disgust, fear, happiness, sadness and surprise, via facial features. Their avatar managed to accurately reproduce happiness and sadness, but had mixed results with the other four states. This emphasizes the difficulty of incorporating emotion into avatar-based systems.
However, our studies revealed a general consensus among our students that a cheerful, informal tone was preferred. This limited emotion would be easier to implement with a virtual agent than a system with a wide range of emotional expressions.
Even if this problem were solved, there would still be the linguistic problem of automatically and accurately interpreting the emotional content of written text, so that the avatar could respond appropriately.
Given that producing tutor-generated verbal feedback can be quick and effective (speaking the feedback does not take longer than typing it), it seems that such systems would not be appropriate for feedback provision, and indeed, one of the most positive features of verbal feedback for our students was the perception of personal contact with their tutor.
Another issue for a virtual agent would be generating the content of the feedback. In most cases this involves high-level cognitive activity on the part of the tutor, which is beyond the capabilities of current virtual agents. However, certain elements of assessment feedback do lend themselves to automation. For example, it is possible to automatically analyse documents for structure and general use of language, or to seek key words and phrases. It is also possible to automate assessment of documentation and style in computer programs submitted as assessment, to automatically test the functionality of such programs against predetermined test suites [27], or to use a model which may involve AI techniques to allow analysis of a structured response [28]. Assessment of some mathematics exercises can also be automated.
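Two of the automatable elements mentioned above, seeking key words and phrases in a document and testing a submitted program against a predetermined test suite, can be sketched in a few lines. This is only an illustrative sketch, not the approach of [27] or [28]; the function names `missing_terms` and `run_test_suite`, and the idea of reporting one message per failed case, are assumptions made for the example.

```python
import re


def missing_terms(text, required_terms):
    """Return the required terms that do not appear in the text (case-insensitive)."""
    lowered = text.lower()
    missing = []
    for term in required_terms:
        # Match the term as a whole word or phrase, not as part of a longer word.
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        if re.search(pattern, lowered) is None:
            missing.append(term)
    return missing


def run_test_suite(func, cases):
    """Run func on each (args, expected) pair and return one message per failure."""
    failures = []
    for args, expected in cases:
        try:
            actual = func(*args)
        except Exception as exc:  # a crash counts as a failed case
            failures.append(f"{func.__name__}{args} raised {type(exc).__name__}")
            continue
        if actual != expected:
            failures.append(
                f"{func.__name__}{args} returned {actual!r}, expected {expected!r}"
            )
    return failures
```

Checks of this kind only report what is present or absent; deciding what to say to the student about a missing term or a failed case would still fall to the tutor.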
Kumar [29] considered the feasibility of automated tutors that could help students learn and considered two different purposes; those that assess and those that learn.
The important distinguishing feature is the provision of feedback. The feedback may be immediate, or demand feedback provided when the problem is solved. Kumar pointed out that if an answer is incorrect, then ideally the tutor can point out why it is incorrect and how this may be fixed. Where such examples are based on logic and rules, it is simpler to code.
It is possible to provide some more general feedback from rule-based systems, although this does require significant upfront work on the part of the tutors. For example, combinations of predetermined phrases can be generated in response to combinations of answers to multiple-choice questions, but these are rather limited applications. Current virtual agent systems do not have the sophistication to produce generalised feedback in the manner of a human tutor. Furthermore, although feedback using such systems can be very fast, which is appreciated by students, the loss of the impression that the tutor is spending the time to engage with the work might reduce the impact of a virtual tutor. As we have found, students like to hear the familiar voice of their tutor; this makes the feedback feel more personal to them, and perhaps, therefore, would make them more likely to act on it. Programmed Learning approaches [30] traditionally use a linear approach and it would be possible to apply them in this context, but the feedback is often very limited in its scope, with the core concept being one of progress only when a response is correct.
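The rule-based generation of feedback from predetermined phrases, keyed on answers to multiple-choice questions, might look something like the following minimal sketch. The phrase bank, the question identifiers and the function name `compose_feedback` are all hypothetical and exist only to illustrate the idea; they also show why the upfront work is significant, since every phrase has to be written by the tutor in advance.

```python
# Hypothetical phrase bank written in advance by the tutor:
# (question id, chosen option) -> predetermined feedback phrase.
PHRASE_BANK = {
    ("q1", "a"): "You identified the main requirement correctly.",
    ("q1", "b"): "Revisit the brief: the main requirement was misread.",
    ("q2", "a"): "Your choice of method is well justified.",
    ("q2", "c"): "Consider why the alternative method might be more suitable.",
}

# Fallback used when the tutor has not written a phrase for an answer.
DEFAULT_PHRASE = "No specific comment is available for this answer."


def compose_feedback(answers):
    """Join the phrase for each (question, option) pair, in question-id order."""
    phrases = [
        PHRASE_BANK.get((question, option), DEFAULT_PHRASE)
        for question, option in sorted(answers.items())
    ]
    return " ".join(phrases)
```

For instance, `compose_feedback({"q2": "c", "q1": "a"})` joins the phrase for question 1 option a with the phrase for question 2 option c. The output can never go beyond what is already in the bank, which is the limitation noted above.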

Conclusions and Future Work
The provision of audio feedback seems to be valued by students for its timeliness and for its clarity in terms of meaning. Such feedback is viewed by students as more personal and immediate, and gives the impression that the lecturer is engaging with and interested in the students' work. The method is also advantageous for the tutor as such feedback can be recorded quickly, without too much concern for production values. The intent is to provide personalised, supportive and informative content for the student, and not to produce broadcast-quality material. The caveat is that the tutor should maintain an empathetic, supportive tone throughout in order to engage the student.
It is important to gain an understanding of how this might translate to artificial voices in the virtual world. Our pilot study with the text-to-speech system revealed that not only did students prefer feedback with voice (which might be expected), but they also preferred written feedback to the artificial voice. The text-to-speech system does not provide feedback more quickly, or save [...] from which the voice is generated still has to be produced, so at this stage it seems there is little point in pursuing this method. [...] might accrue if such a system could be implemented with a rule-based approach using a virtual agent to generate the feedback automatically, but this is currently only possible in a limited number of areas.
We have not explored the role that expectation plays in the response to feedback. If students were submitting to a virtual environment expecting automated feedback, they might respond very differently to tone and have no expectations of a personal approach. There was also an interesting suggestion from one student, that the recorded audio feedback is not only personal, but that it seems fair because every student is getting a similar share of the le[...] [...]t always feel that this was the case with face-to-face dialogue.
Another possible strategy [...] enable students to obtain their own feedback by answering a series of questions from a virtual tutor. Each ques[...]s answers to previous questions, thereby providing more personalised feedback and encouraging them to take a more reflective attitude to their work. Nicol and Macfarlane-Dick [5] considered elements internal to the student and how they are linked by paths of internal feedback, as shown in Figure 2, which is adapted from [5]. They do state that feedback might be provided from a range of sources including computer-generated feedback. The Virtual Mirror approach encourages students to reflect on their understanding of their own knowledge, goals and learning outcomes by facilitating articulation of these processes; it does not provide feedback on the students' work. We are not suggesting modification of the model proposed in [5], but the deployment of this in the development of a reflective strategy.
Lei et al [31] explored the use of agents that collect the self-reflections of learners in simulation based e-learning. Although this was a text-based approach, it allowed the use of a simple natural language processing technology to provide a path through questions developed using a semantic network approach. The problems encountered included the use of slang and the conversation database was updated to take this into account.
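The idea of a virtual tutor leading a student along a path of reflective questions can be illustrated with a very small sketch, on the assumption that the next question is chosen from the student's answer to the current one. The question graph, its prompts and the function name `question_path` are hypothetical; this is neither the Virtual Mirror implementation nor the semantic network of [31], and it uses fixed answer labels rather than natural language processing.

```python
# Hypothetical question graph: each node holds a prompt and a mapping from
# the student's answer to the id of the next question to ask.
QUESTIONS = {
    "goal": {
        "prompt": "Did you set yourself a clear goal for this work?",
        "next": {"yes": "met", "no": "plan"},
    },
    "met": {
        "prompt": "Do you think you met that goal?",
        "next": {"no": "plan"},
    },
    "plan": {
        "prompt": "What would you do differently next time?",
        "next": {},
    },
}


def question_path(answers, start="goal"):
    """Return the prompts asked, given scripted answers keyed by question id."""
    path = []
    current = start
    while current is not None:
        node = QUESTIONS[current]
        path.append(node["prompt"])
        reply = answers.get(current)
        # An answer with no onward link (or no answer at all) ends the dialogue.
        current = node["next"].get(reply)
    return path
```

A student who answers "no" to the first question is taken straight to the planning question, while one who answers "yes" is first asked whether the goal was met, so two students can follow different paths through the same small set of prompts.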

Figure 2. Virtual Mirror (Adapted from [5])
A planned future extension to this work is to employ screen capture software, to produce video feedback in which scrolling through an essay or computer program is augmented with voiceover feedback.
Another planned extension arises from the observation that the students surveyed in all cases came from diverse cultural backgrounds. We intend to explore whether there are any differences in interpretation of the emotional cues by different cultures.