Usability of serious games for the training of people with dementia

Dementia is a progressive syndrome affecting executive and motor functions. Serious gaming (SG) is an emerging treatment. However, its added benefit is difficult to establish since standardized usability evaluations are missing. We applied a recently developed observer-rated scale to determine the usability of two SG scenarios for people with dementia (PwD). Raters watched video recordings of an SG (MobiAssist) played by PwD and of a virtual city (VR city) through which healthy older adults walked. Raters completed the scale for both data sets and additionally gave a prospective rating of VR city as used by PwD. Usability was highest for MobiAssist, intermediate for VR city used by healthy older persons, and lowest for the prospective rating of VR city used by PwD. The difference between the highest and the lowest data set was statistically significant and of moderate magnitude, but it does not seem substantial enough to exclude PwD from cognitively demanding training environments (e.g. VR city).


Introduction
Dementia is a chronic and progressive syndrome that affects executive functions including memory, comprehension, planning or decision making, as well as motor functions such as balance, gait or flexibility (1,2). One of the early symptoms of dementia is spatial disorientation: people with dementia (PwD) first become lost in unfamiliar places, then in more familiar ones, such as their home, and finally become completely unaware of time and place (3)(4)(5)(6). During the past decades, the worldwide number of PwD has increased steadily and reached 47 million in 2015 (2). By 2030, around 82 million people are projected to be diagnosed with dementia (7). By 2050, the number is projected to have more than tripled relative to 2015, to over 150 million people (7). Dementia is the fifth leading cause of all global deaths, following ischaemic heart disease, stroke, chronic obstructive pulmonary disease and lower respiratory infections (8). The global costs of dementia are expected to increase from 818 billion US dollars in 2015 to more than 1 trillion US dollars in 2030, and would therefore impose an economic burden as high as that of cardiovascular diseases (9,10).
Traditional treatment of dementia consists of prescription drugs in combination with cognitive and/or physical training. Drug treatments were found to have only a small but significant effect on cognitive function (11). Cognitive training was found to have no or only moderate effects on memory (12,13). Physical activity interventions seem to be beneficial for the improvement of activities of daily living in all stages of dementia; however, studies recommend caution in interpreting these findings (14,15).
In the age of digitalization, an emerging treatment alternative in the field of dementia is training by computerized games. There are two approaches known as "gamification" and "serious games". Both strive to reach serious goals, such as educating, motivating, training, improving health or persuading users to change their behavior patterns rather than emphasizing only fun or competition (16,17). Whereas gamification refers to "the addition of game elements to non-game contexts" (16) such as scores, rewards and quests, serious games use "gaming as central and primary medium" (16). Serious games integrate "technology to combine three components which are multimedia, entertainment, and experience" (17). The common definition of serious games is "games that do not have entertainment, enjoyment, or fun as their primary purpose" (18). Serious games include, for example, cognitive and/or physical activity components and comprise computer games, training simulations and sports or board games (17).
EAI Endorsed Transactions on Serious Games 07 2019 - 08 2022 | Volume 6 | Issue 1 | e1
The present work deals with the second approach, i.e., with serious games. A range of studies administered serious games to PwD, but success was limited: a systematic review of the literature revealed no added benefit of serious games compared to traditional interventions (19). One possible reason for this outcome is that most serious games were not specifically designed for PwD, and therefore possibly overtaxed the limited cognitive abilities of these persons.
To verify and possibly overcome the above concern, it is highly desirable to assess the usability of serious games specifically for PwD. Usability is defined as the "extent to which a system, product or service can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use." (20). In the past, usability of serious games administered to PwD has only been assessed through informal questioning and qualitative observations, not through systematic and standardized instruments (19), possibly because many PwD are unable to fill out self-rating questionnaires such as the System Usability Scale (SUS) (21), the Post-Study System Usability Questionnaire (PSSUQ) (22), the Questionnaire for User Interaction Satisfaction (QUIS) (23) or the Software Usability Measurement Inventory (SUMI) (24).
To overcome this problem, we recently developed an observer-rated usability scale, called Usability Evaluation Scale for Serious Games for PwD (USeG; (25)). The new scale is rooted in the International Organization for Standardization (ISO) standard 9241-11 with its categories effectiveness, efficiency and satisfaction (20). ISO defines effectiveness as the "accuracy and completeness with which users achieve specified goals", efficiency as the "resources used in relation to the results achieved", and satisfaction as the "person's perceptions and responses that result from the use of a system, product or service" (20). The scale was developed as follows: a search for usability-related terms in the literature was conducted followed by a ranking of expressions according to ISO 9241-11, and statistical analysis including e.g. discriminant function analysis (DFA) (25). "[The] literature search yielded 105 expressions that, in the authors' view, refer to usability. Raters ranked those expressions with respect to the three ISO categories" (25). The expressions differentiated significantly between categories according to DFA and thus make USeG a valid instrument (25). In the final version of USeG, each of those categories comprises five items with the highest discriminant power. An example for the category Effectiveness is the item "performance of the player", for Efficiency the item "reliability of the game", and for Satisfaction the item "enjoyment and fulfillment". Observers can evaluate each item on a Likert scale from 1 (does not fit at all) to 7 (fits perfectly). USeG yields a score for each category, but also a Global score based on all 15 items.
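The scoring described above can be illustrated with a minimal sketch. It assumes that category and Global scores are arithmetic means of the item ratings and that the 15 items are stored in the order Effectiveness, Efficiency, Satisfaction; neither the aggregation rule nor the item order is specified here, and the function name `useg_scores` and the example ratings are hypothetical.

```python
from statistics import mean

# Assumed layout: 15 item ratings (1-7), five per ISO 9241-11 category,
# stored in the order Effectiveness, Efficiency, Satisfaction.
CATEGORIES = ("Effectiveness", "Efficiency", "Satisfaction")
ITEMS_PER_CATEGORY = 5


def useg_scores(ratings):
    """Return category means and the Global mean for one rater's item ratings.

    Assumption: scores are means of item ratings (not sums).
    """
    expected = len(CATEGORIES) * ITEMS_PER_CATEGORY
    if len(ratings) != expected:
        raise ValueError(f"expected {expected} ratings, got {len(ratings)}")
    if any(not 1 <= r <= 7 for r in ratings):
        raise ValueError("ratings must lie on the 1-7 Likert scale")

    scores = {}
    for i, name in enumerate(CATEGORIES):
        start = i * ITEMS_PER_CATEGORY
        scores[name] = mean(ratings[start:start + ITEMS_PER_CATEGORY])
    scores["Global"] = mean(ratings)  # based on all 15 items
    return scores


# Made-up ratings for illustration only; not data from the study.
example = [6, 5, 6, 7, 6, 4, 5, 4, 5, 4, 7, 6, 7, 6, 7]
print(useg_scores(example))
```

Using means rather than sums keeps every score on the original 1 to 7 scale, which makes category and Global scores directly comparable.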
The purpose of the present study was to apply USeG for assessing the usability of two serious games available at our university. One is called MobiAssist (26,27) and was specifically designed for cognitive-motor training of PwD. The other is a virtual urban environment called "VR city", and was designed for spatial navigation training of healthy older adults. To assess the usability of both serious games, we videotaped participants while they engaged in either MobiAssist or VR city. We passed those recordings on to dementia experts, and asked them to use USeG and rate (1) the usability of MobiAssist for PwD, (2) the usability of VR city for healthy older persons and (3) the prospective usability of VR city for PwD. We expected that usability scores would be high for PwD in MobiAssist, as it was specifically designed for that population. We further expected that usability scores would be low for PwD in VR city, as navigation through a virtual environment would exceed the cognitive capacity of those persons. In fact, we deliberately refrained from introducing PwD to VR city, out of concern that they might become highly frustrated, which can lead to verbal and physical aggression and, as a consequence, might compromise their safety on the treadmill (28).

Participants
The MobiAssist group was recruited among the residents of a daycare facility ("Stiftung Diakoniestation Kreuztal"). Inclusion criteria were (1) diagnosed Alzheimer's disease (AD) or vascular dementia, (2) capable of giving consent, (3) able to get up from a chair independently and walk for six meters, and (4) need for care. Exclusion criteria were (1) any other form of dementia (e.g. frontotemporal dementia), and (2) acute diseases (e.g. infections or fever). Group size was limited to five in order to ensure a well-controlled, supportive atmosphere. Before testing began, the project was explained to the facility management, the possible study participants and their family caregivers. Written informed consent forms were handed out to the study participants and their family caregivers. Study participants signed the informed consent forms, and family caregivers gave consent in a personal phone call.
The VR city group consisted of 20 healthy older persons enrolled in an orientation-training program. Data registration for the present purposes took place during their first encounter with VR city. Inclusion criteria were (1) community-living, (2) no need for care, and (3) healthy by self-report. Exclusion criteria were (1) diagnosed dementia, (2) balance impairments, (3) use of walking aids, (4) brain surgery, (5) plaster bandage on arms or legs, and (6) unable to walk for 45 minutes without pain, shortness of breath and/or abnormal increase in heart rate. Before testing began, the research project was explained and written informed consent forms were signed by the study participants.
The two projects as well as our present top-up study were all pre-approved by the ethics commission of the German Sport University.

Screening Instruments
All participants completed a demographics questionnaire (age, gender and level of education). Participants from the MobiAssist group were additionally screened with the Global Deterioration Scale (GDS; (6)), which classifies cognition into seven levels, and with the Demenz-Detektions-Test (DemTect; (29)), which classifies cognition into three levels.

MobiAssist
The MobiAssist system runs on a computer connected to a TV screen. A camera (Kinect® 1; Microsoft) on top of the screen detects the movements of a player standing in front of the screen, and transmits them in real-time to the computer. This signal is used to display, on the TV screen, an avatar which mimics the player's body movements. Body-and-avatar movements are the main means of interaction between player and computer. In addition, a remote control with different-colored buttons is used to navigate through the menu (e.g. to choose between different games and levels) and to enter responses in cognitive games where no avatar is available.
MobiAssist includes three types of games: coordination games (e.g. picking apples from a tree), cognition games (e.g. calculation tasks) and creative games (e.g. singing and dancing). Difficulty levels can be adjusted to the specific abilities of a given PwD. A reward system is incorporated to keep PwD motivated. It presents in-game cheering sounds, as well as post-game winner tunes, trophies, motivational slogans and game statistics. In the apple picking game, an avatar stands in front of an apple tree. Apples grow and change color from green to red, at which point they are ready to be picked. The player in front of the screen has to lift either their right or left arm and thereby guide the avatar's right or left arm to one of the apples on the tree. After the player has held the position for a second, the apple is transferred from the tree to the hand of the avatar. Then, the apple must be placed in one of the baskets that stand next to the right and left leg of the avatar. To do so, the player must lower their arm to the right or left side of their body. In higher levels, some apples turn gold instead of red and thus are more valuable, or stay on the tree for only a limited amount of time before they fall down. A picture of the apple picking game can be found in Figure 1.
In this particular research project, games from each category were included in every 45-minute session, scheduled twice a week for four weeks. During each session, participants sat in a semicircle around the TV screen. When playing coordination or creative games, one of the participants stepped in front of the Kinect camera and became the active player, while the other participants cheered and gave hints. When playing cognition games, one of the participants was given the remote control; that person had to enter the response once all participants had discussed and agreed upon one. The "active player" role rotated between players after each game.
During the game, the supervisor periodically encouraged some of the less engaged participants to take over the active player role by emphasizing the fun associated with the role. This encouragement was reinforced by statements of the other participants, such as "You can do it! We are going to help you. Let us do this together!". Furthermore, the supervisor assisted with navigating through the menu and handling the remote control, read the quiz questions out loud to bring structure to the session, and explained or demonstrated movements of coordination games where required. As a result, none of the PwD refused to be an active player.

VR City
Participants walked at their preferred speed on a nonmotorized treadmill. They could hold on to a handrail to their left and right, and were secured with an upper-body harness attached to the ceiling. A virtual city was rendered on three screens in front of the treadmill by custom-designed software. It consisted of numerous streets with a large number of nondescript as well as distinctive buildings, stationary objects such as bus stops and trees, and moving objects such as cars and pedestrians. Forward progress through the city was controlled by treadmill movement, and the direction of progress by two buttons fastened to the left and right handrail. Pushing the left button rotated the point of view leftwards, and pushing the right button rotated it rightwards.
Participants had to walk through the city and find four destinations: a bakery, a supermarket, a tall red building and a green meadow. They were given one ten-minute trial, followed by two trials of five minutes each. Pictures of the VR city can be found in Figures 2 and 3. A supervisor assisted with stepping on and off the treadmill and instructed participants before each trial, but did not tell them which direction to take at any intersection of any trial (i.e., free exploration). Each participant was tested individually rather than in a group. Neither the software nor the supervisor provided feedback or encouragement to participants.

Data Registration and Analysis
MobiAssist participants were given eight group sessions (a total of 360 minutes) and each VR city participant was given one session (a total of 500 minutes). All sessions were recorded in full length by video camera, and the footage was saved on an external disc immediately after a session had ended. The camera was placed at the back of the room so that the TV screens were fully captured and participants were recorded from an "over the shoulder" perspective, without their faces being registered.
The complete video footage was viewed by five raters (four female and one male) of 21 to 50 years of age. The raters qualify as so-called double experts in the field of gerontology and dementia due to their holistic knowledge base in those areas, many years of experience working with this target group and "strong familiarity about the domain-under-study" (30).
Having five raters watch the video footage is more than sufficient to ensure internal consistency: Cronbach's alpha "does not require two administrations of the scale, or two or more raters" (31). It is therefore not surprising that other studies used even fewer than five raters, e.g., three (32).
Raters watched the video footage independently of each other. They were instructed to watch at least the first, middle and last five minutes of each session, so that they could get a holistic impression of how the participants interacted with the systems. They first watched the MobiAssist recordings and then completed the USeG (see Introduction), keeping in mind the participants they had just observed. Raters subsequently watched the VR city recordings and then completed the USeG, again keeping in mind the participants they had just observed. Finally, they completed the USeG once more, this time envisaging PwD as participants of VR city. We thus obtained three data sets, one pertaining to the usability of MobiAssist for PwD, one to the usability of VR city for healthy older persons and one to the prospective usability of VR city for PwD.
Statistical analyses were run in SPSS Statistics (33). Internal consistency of ratings was quantified as Cronbach's alpha, separately for each data set and USeG category (i.e., Effectiveness, Efficiency, Satisfaction and Global). Cronbach's alpha should range between about 0.7 and 0.9; substantially lower values would reflect inconsistency between items, while higher values would indicate unnecessary redundancy (31). Ratings for each category were submitted to separate analyses of variance (ANOVA) with repeated measures on the factor 'data set' (level 1: MobiAssist with PwD; level 2: VR city with healthy older subjects; level 3: VR city with PwD). Significant effects were explored by Tukey's HSD.
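For readers without access to SPSS, the repeated-measures design can be sketched as follows. This is an illustrative sketch and not the SPSS procedure used in the study: it computes only the F statistic and its degrees of freedom for a one-way repeated-measures ANOVA, with raters as the repeated units and the three data sets as factor levels. The function name `rm_anova_f` and the example scores are hypothetical. A p-value could then be obtained from the F distribution, for instance with `scipy.stats.f.sf`.

```python
from statistics import mean


def rm_anova_f(data):
    """One-way repeated-measures ANOVA.

    data: one row per rater, one score per data set (factor level) in each row.
    Returns (F, df_effect, df_error).
    """
    n = len(data)                      # number of raters
    k = len(data[0]) if data else 0    # number of data sets
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need a complete raters x data sets table")

    grand = mean(x for row in data for x in row)
    level_means = [mean(row[j] for row in data) for j in range(k)]
    rater_means = [mean(row) for row in data]

    # Partition the total sum of squares into effect, rater and error parts.
    ss_effect = n * sum((m - grand) ** 2 for m in level_means)
    ss_rater = k * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_effect - ss_rater

    df_effect = k - 1
    df_error = (k - 1) * (n - 1)
    f_value = (ss_effect / df_effect) / (ss_error / df_error)
    return f_value, df_effect, df_error


# Made-up Global scores of five raters for three data sets; not study data.
scores = [
    [5, 4, 3],
    [6, 4, 4],
    [5, 5, 3],
    [6, 5, 4],
    [5, 4, 4],
]
print(rm_anova_f(scores))
```

Removing the between-rater sum of squares from the error term is what distinguishes this design from an ordinary one-way ANOVA: stable differences in rater leniency do not count against the effect of 'data set'.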

The roadmap of the study can be found in Figure 4.

Results
Characteristics of study participants are reported in Table 1.
No dropouts occurred in either project. Mean education level of VR city participants was higher than that of MobiAssist participants. Screening of the latter group yielded cognition levels representing "moderate impairments" (GDS; mean level 4.4 (±0.55)) and "potential dementia" (DemTect; mean score 5.2 (±1.48) = level 3), respectively.
Cronbach's alpha values are presented in Table 2. They were acceptable or close-to-acceptable for Global usability scores, but were often low or even negative for individual usability categories.
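How Cronbach's alpha can become negative may be illustrated with a short sketch. It assumes the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the sum score), applied to a table with one row per rater and one column per item; this layout is our assumption for illustration. The function name `cronbach_alpha` and both example tables are hypothetical and do not reproduce the values in Table 2.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a table with one row per rater and one column per item."""
    n = len(rows)
    k = len(rows[0]) if rows else 0
    if n < 2 or k < 2 or any(len(row) != k for row in rows):
        raise ValueError("need at least two raters and two items, no missing cells")

    # Sample variance of each item across raters, and of the per-rater sum score.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("sum scores do not vary; alpha is undefined")

    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up ratings: items rise and fall together across raters.
consistent = [[2, 3, 2], [4, 4, 5], [6, 5, 6], [3, 3, 3], [5, 6, 5]]

# Made-up ratings: the first two items move in opposite directions.
opposed = [[2, 6, 4], [6, 2, 4], [3, 5, 5], [5, 3, 3], [4, 4, 4]]

print(cronbach_alpha(consistent))  # high, items agree
print(cronbach_alpha(opposed))     # negative, items disagree
```

When items covary negatively, the variance of the sum score falls below the sum of the item variances, which drives alpha below zero. With only five raters and five items per category, a few such disagreements suffice to produce this pattern.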
Mean usability scores are presented in Figure 5. Global as well as component scores were consistently higher for MobiAssist than for VR city, and in the latter environment consistently higher for healthy older adults than (prospectively) for PwD.

[Figure 4. Roadmap of the study. Recoverable timeline entries: January - February; April - August 2018; September - October 2018; November 2018. Recoverable step labels: screening of VR city study participants; rating of video footage by the experts; data analysis.]

Figure 5. USeG scores. Bars represent across-rater means, and error indicators represent between-rater standard deviations.

Discussion
The present study compared the usability of two serious game environments for people with dementia, measured with a recently developed usability scale for observer ratings of serious games (USeG). Expert raters used this scale to provide three data sets: (1) usability of MobiAssist for PwD, (2) usability of VR city for healthy older persons and (3) prospective usability of VR city for PwD. [...] (34,35), which have indeed been reported for USeG (25), subjectivity in the scoring of items (36), and availability of only five items per category (34,35).
Given the low consistency of usability categories, the subsequent discussion will focus on Global scores.
Global usability scores were highest for MobiAssist, intermediate for VR city with healthy older participants and lowest for the prospective rating of VR city used by PwD. However, only the difference between the first and the last data set reached statistical significance. We attribute the high scores for MobiAssist to the fact that it was specifically designed as a serious game with playful elements, and with levels of difficulty that can be adjusted to players' individual abilities. We attribute the low scores for the prospective rating of VR city for PwD to the fact that game elements were only available in the form of one quest (finding destinations in the VR city), that adaptability was absent, and that the raters considered the task to be challenging for the prospective population.
When we conceived the present study, we were concerned that VR city might overtax the cognitive abilities of PwD and we therefore refrained from physically testing such persons (see Introduction). Our data suggest that we were overly cautious: although the difference between MobiAssist and VR city for PwD was statistically significant, it amounted to only 1.44 points on a 7-point scale (Introduction and Table 3). This difference does not seem dramatic enough to preclude future testing of PwD in VR city. Indeed, cognitive decline in the early stages of dementia can be difficult to notice outside standardized neuropsychological tests (6), and might therefore interfere little with VR city use. As a practical consequence, persons diagnosed with mild or moderate dementia should not be denied access to cognitively demanding training environments such as VR city. Unfortunately, only a few studies have been conducted in the field of dementia and VR navigation training. However, one study examined the feasibility of a VR outdoor park for PwD (37). PwD had to walk through the park and perform functional activities, such as finding a post box. The researchers found that the VR environment was an "appropriate medium for assessing functional behavior" (37) and that PwD did not face any adverse events, such as simulator sickness or physical discomfort (37). These findings seem promising and support our suggestion to conduct VR navigation studies with PwD in the future. Another study examined "age- and AD-related differences in route learning and memory using VR" (38). Its authors found that people with AD "made more mistakes on the recognition task in particular, being more likely to mistakenly affirm having seen an element in the city when it was in fact a foil" and suggest that VR applications "may help place the science of neuropsychology on firmer scientific grounds in terms of its validity to real world function and dysfunction" (38). A further study investigating spatial navigation strategies among people with AD found that the "preference for egocentric over allocentric strategy increased with AD severity" (39). These studies, together with the findings of our study, represent starting points for future studies that implement VR scenarios for the training of spatial navigation and route learning strategies in PwD.
Additionally, we recommend improvements of both computerized environments in order to increase their usability for PwD. MobiAssist could be improved, for example, by increasing system stability to avoid the crashes that periodically interrupted game play, by expanding the variety of quiz game questions and by reducing the difficulty gap between levels of the games. The usability of VR city could be improved by integrating more game elements such as a reward system with encouraging text lines (e.g. "Good job" or "Wonderful"), verbal feedback, winner music sequences, virtual gold medals and a brighter or more colorful game design.
One limitation of the present study is that usability was rated from video footage. Raters therefore could not fully appreciate participants' engagement in the task, i.e., their "flow". A second limitation is that raters had to rely on their past experience to judge the usability of VR city for PwD. This could have influenced the raters' evaluations and biased the scores. Nonetheless, we believe that their prospective ratings for PwD in the VR city are of excellent quality and trustworthy as the raters were true experts in the field.

Conclusion
Global usability was significantly lower for the prospective rating of VR city used by PwD than it was for MobiAssist, but the difference was not substantial enough to exclude PwD from future training in VR city or in other serious games and gamification environments of comparable complexity.