EAI Endorsed Transactions on Pervasive Health and Technology Research

We designed and evaluated a whack-a-mole (WAM) style game (see Figure 1) in a clinical randomized controlled trial (RCT) with reminder-assisted but self-initiated use over the period of a month with 43 participants from a post-lesion pool. While game play did not moderate rehabilitative progress indices of standard neuropsychological control tests, it did significantly improve in-game performance when compared to the control group. Its performance indicators and interaction data were highly accurate in predicting neglect and which hand the patients used for input. Patients found playing beneficial to their rehabilitation and attributed gains to the attention training properties of the game. The game showed potential for bedside assessment, insight support, and motivation by providing knowledge about rehabilitative progress.
Received on 23 September 2016; accepted on 02 June 2017; published on 18 July 2017


Introduction
Increasing health-care costs and ageing populations will require patients to take more responsibility to improve and maintain their health [29]. Rehabilitation is costly and leaves time for motivated patients to train if they can carry out relevant activities unassisted, find them beneficial, and muster the initiative. Much research focuses on motor recovery, but there is a greater need to address cognitive training [43], and going from proof of concept to providing evidence of effectiveness of interventions, e.g. through randomized controlled trials (RCT). Strokes are the leading cause of severe disability [27], and many patients suffer from neglect with poor insight into their inability to attend to their left visual field.
Clinicians are urged to embrace evidence based practice (EBP) and adjust treatments according to the patients' condition and progress. EBP requires digital versions of the standard paper-and-pencil tests [11,31], which in their paper form are slow, expensive to administer (Jehkonen et al. 1998), and less precise. Additionally, digital tests can record valuable temporal data to improve diagnostics and monitoring of chronic patients.
* Corresponding author. Email: hk@create.aau.dk
The trend of turning activities beneficial to patients' health into games, or gamifying them, has seen a large push to tap the intrinsic motivation that can help patients adhere to the required intense and repetitive regimen while at the same time providing data for diagnostics and monitoring. Unlike standardized tests, games can easily support varying degrees of difficulty, which matters because patients might be unwilling to complete tasks that they (due to cognitive, initiative, or motivational deficits) experience or deem too difficult [38]. Research needs to address rehabilitation games, which allow for: 1. long-term, self-initiated, and unsupported play by elastically adapting to a range of impairment degrees, 2. using performance parameters as indicators of patients' condition and progress [13], 3. improving patients' insight into their own condition [19] and progress, 4. verification that interactions originate from the patients and were carried out as prescribed.
To this end, we developed a game simple enough even for more severely affected patients but still worthwhile for the majority to play over a four-week period in a randomized controlled trial, analyzed its interaction data, and compared it to neuropsychological control measures. While playing WAM did not result in measurable gains in neuropsychological control measures, its game performance data allowed for a solid classification of neglect patients. We further used touch interaction data to classify with high accuracy which hand patients had used for input. Patients used WAM as a way to recognize progress and train their attention. Clinical staff found it a useful tool for bedside assessment and an independent measure that provided grounds for discussions with patients to improve their insight into their condition.

Background and related work
Unilateral spatial neglect (USN) is a disorder in which patients, despite functioning eyes, have difficulty attending to the left hand side of their visual field. Neglect follows right hemisphere stroke in the acute stage in about 50% of cases [8]. USN patients typically have poor insight into their own condition and exhibit poor coping strategies, e.g., they do not adopt a different head or body orientation to counter their impairment vis-a-vis their environment as, for example, a patient with hemianopia might. Mattingley et al. showed that neglect patients could exhibit motor neglect, a difficulty in initiating leftward movements [26]. Proponents of visual restoration therapy (VRT) posit that by presenting visual stimuli at the border of the field of vision patients can increase their field of vision [34].

Neuropsychological measures
Methods for USN diagnosis include copying pictures with pen and paper [6], striking off each dot in a dotted letter, judging which of two bars (left/right) appears first [33], bisecting lines, or cancellation tests.
The Line Bisection (LiBi) test requires participants to mark the middle of a series of horizontal lines [16,37]. The examiner clearly points out each end of each line. The test-retest reliability ranges from 0.84 to 0.93. Ferber used a cut-off criterion of 14% (2.6% ± 2 SD) relative displacement from the bisection center and identified 60% of documented inattention patients [12]. Control subjects had an average deviation of 2.9 mm in Schenkenberg's version. Halligan et al. used a three-line version of the test with an average sensitivity across genders of 70% to detect unilateral attention deficits [16]. Halligan & Marshall found that the bias to bisect a line to the right of the center was reversed for lines shorter than 5 cm [17]. The shortest line was 25 mm with a bisection bias of around 4 mm to the left of the center. We could not find any published work on how neglect might affect the performance in acquiring targets smaller than 25 mm wide. To this end, we devised our own line bisection test with very short lines.
In the Line Crossing or cancellation (LiCcl) tests, participants should cross out 40 lines that are arranged in seven columns but appear randomly scattered on a sheet of paper. Two or more omissions in crossing out on the three left (18 lines) or right columns (18 lines) in a non-time constrained test indicate non-normality [42].
The letter cancellation task (LetCcl) from the Behavioural Inattention Test (BIT) battery contains five rows of 34 letters of which 40% are targets (E, R) to cancel out without a time limit [16]. In a control group of 50, non-impaired people (age range 33-40) omitted 2±2.0 targets, and a cut-off score of 8 omissions correctly identified all left-sided lesion patients and 77% of right-sided lesion patients with documented inattention.
Poor performance in these three tests (LiBi, LiCcl, LetCcl), above their established cut-off scores on both the left and the right hand side, indicates a general attention deficit rather than USN.
The Catherine Bergego Scale (CBS) assesses degrees of neglect in ten daily life tasks in the personal, peripersonal, and extrapersonal space from both self- and observer-reported (CBS obs) ratings [5]. For example, it assesses whether a person exhibits no, mild, average or severe signs of neglect when shaving his face. The difference between the patients' self-assessed ratings and those from clinical staff provides a measure of the patients' insight deficit into their condition (CBS id).
The Symbol Digit Modality Test (SDMT) assesses the scanning and tracking aspect of attention similar to the ones at work in the Letter Cancellation and visual selective attention [40]. In the written SDMT used in this study, participants have 90 seconds to fill in numbers on a page of 120 symbols according to the key found at the top of the page. The maximum score is 110; averaged normative scores of men and women in the age range 60-64 are around 50 with a standard deviation of 9.76 and a test-retest correlation in healthy adults of 0.80 [40]. Scores more than 1.5 SD below the normative mean suggest cerebral dysfunction [39].
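As a worked example of the SDMT cut-off rule, the following minimal sketch computes the threshold from the normative values quoted above for the 60-64 age range (mean of about 50, SD 9.76). The function names are illustrative and not part of the study's tooling; real use would need the age-specific norms.

```python
def sdmt_cutoff(norm_mean: float, norm_sd: float, sd_multiplier: float = 1.5) -> float:
    """Score below which an SDMT result is flagged (mean minus 1.5 SD by default)."""
    return norm_mean - sd_multiplier * norm_sd


def is_flagged(score: float, norm_mean: float, norm_sd: float) -> bool:
    """True if the score lies more than 1.5 SD below the normative mean."""
    return score < sdmt_cutoff(norm_mean, norm_sd)


# Normative values quoted in the text for ages 60-64.
cutoff = sdmt_cutoff(50.0, 9.76)   # 50 - 1.5 * 9.76 = 35.36
flag_low = is_flagged(30.0, 50.0, 9.76)
flag_ok = is_flagged(42.0, 50.0, 9.76)
```

With these norms, a score of 30 would be flagged while a score of 42 would not.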
The Functional Independence Measure (FIM) battery is a more general and common measure in rehabilitation that assesses the quality of activities of daily living on two sub-scales: motor (FIM M) and cognitive (FIM C) [14]. Their scores can range from 13-91 and 5-35, and they have a mean reliability of 0.97 and 0.93, respectively [28]. Unimpaired people should attain maximum scores.

Rehabilitation progress
Typical ways of quantifying rehabilitation impact use measures obtained at admission (AD), discharge (DC), and in some cases pre-morbid (PM) or maximum scores attainable (MAX). Rehabilitation impact indices include absolute and relative gain, effectiveness (REs), and efficiency (REy); see [10] for details. In this paper we use the revised Montebello Rehabilitation Factor Score (MRFSR), a measure of relative functional gain, to measure gains in CBS, SDMT, and FIM.
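To make the notion of relative functional gain concrete, here is a minimal sketch. It assumes that the index is the achieved gain (DC minus AD) divided by the potential gain (MAX minus AD); the exact definition of MRFSR is given in [10] and may differ in detail. The function names and example scores are hypothetical.

```python
def relative_functional_gain(admission: float, discharge: float, max_score: float) -> float:
    """Achieved gain divided by potential gain (assumed reading of MRFSR)."""
    potential = max_score - admission
    if potential <= 0:
        raise ValueError("admission score already at or above the maximum")
    return (discharge - admission) / potential


def invert_score(raw: float, max_score: float) -> float:
    """Flip a scale on which lower is better (e.g. CBS) so that higher is better."""
    return max_score - raw


# Hypothetical FIM C scores on the 5-35 scale.
fim_c_gain = relative_functional_gain(admission=20, discharge=29, max_score=35)

# Hypothetical CBS scores (0-30, lower is better), inverted before use.
cbs_gain = relative_functional_gain(
    admission=invert_score(12, 30),
    discharge=invert_score(6, 30),
    max_score=30,
)
```

In the FIM C example the patient recovers 9 of 15 attainable points (0.6); in the CBS example 6 of 12 (0.5).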

Related work
Applications for (self-)rehabilitation need to be motivating, provide interactivity and progress to be used continuously [41]. Games are sought as a means to tap into the intrinsic motivation they provide to counter the repetitive and uninteresting nature of many rehabilitation activities [25]. HCI research has started to address the design, implementation and evaluation of bespoke games, e.g. for physical therapy [13], rehabilitation [15], and enjoyment for children with motor disabilities [18]. Bespoke games for this audience often need to be simple [25], require simplified control schemes or accessibility features [18], and need to adapt to player performance that can vary over the course of a day [3]. Game challenge needs to adapt to player performance or risks becoming boring within the span of one session [1] or becoming too hard and disengaging patients as neuropsychological tests do [38]. However, the adaptation of challenge has to happen gently as patients may perceive steep challenge increases as aggressive behavior on the part of the game [7]. Maintaining player motivation for long-term rehabilitation requires the player to either find fun in the games or to recognize a beneficial effect from playing the game. Alankus et al. ran a six-week physical rehabilitation program with one participant using a motion-controlled game. The participant reported only playing for fun for the first half of the study, after which she started to recognize increasing performance in everyday life, greatly increasing motivation to continue as well as the desire to set and achieve personal goals [1].
Tests of sustained attention typically measure the ability to detect events [38]. Silverstein et al. provided an extensive list of design goals for sustained attention tests, which need to: 1. be easy to administer, 2. be simple, usable even for the most impaired patients, 3. not rely on perceptual organization, working memory, context or language processing, 4. vary exposure time to targets with ability, 5. have targets appear at random intervals without alerting players, 6. cover a large sensitivity range, e.g. avoid ceiling and floor effects, 7. be replayable, 8. have a length that covers a wide range and depends on current ability.
Balaam detailed design tensions between conflicting goals of enjoyable barrier-free game play and rehabilitation needs, i.e., making actions difficult enough to yield both training effects and motivation [3]. According to motivation theory [36] a feeling of growing competence is important for developing and sustaining intrinsic motivation.
Jamieson stressed the importance of communicating the need for assistive technology to people with acquired brain injury to motivate them to engage with it [19]. Helping patients gain insight into their condition is an important concern for neglect patients mirrored by the dedicated measure of the Catherine Bergego Scale.

Design considerations
The game presented in this paper followed the idea of visual restoration therapy [34] by repeatedly presenting visual stimuli at the border of the field of vision to increase it. The design was based on well-understood measures for quantifying performance in cancellation tests (see Dalmaijer et al.'s overview [11]). We focused on performance measures that match the conceptual design of WAM. Rorden's Center of Cancellation is derived from the positions of cancelled targets from e.g. a line cancellation test [35]. This spatial measure allowed for both lateral and near-far (targets towards the bottom are closer than those on the top) neglect discrimination. Rabufetti found that the temporal measure of inter-cancellation times, i.e. the time between one and the following cancellation, was higher for patients than for controls [31], which translated well into the game time divided by the number of moles hit in WAM. In an earlier study [21], we found that empirical parameters from Fitts' law (a, b) from modelling rapid touch interactions in games can help predict neglect. Chatterjee et al. used a spatial measure for performance from logistic regression that indicated where in the left-right continuum patients had a 50% chance of detecting a target [9].
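The Fitts' law parameters (a, b) mentioned above can be estimated with an ordinary least-squares fit of hit delay against an index of difficulty. The sketch below assumes the common Shannon formulation, T = a + b * log2(D/W + 1), with the 6 mm target width of WAM as W; the formulation actually used in [21] may differ, and the data in the usage example are synthetic.

```python
import math


def index_of_difficulty(distance_mm: float, width_mm: float = 6.0) -> float:
    """Shannon formulation of Fitts' index of difficulty (assumed here)."""
    return math.log2(distance_mm / width_mm + 1.0)


def fit_fitts(distances_mm, delays_s, width_mm: float = 6.0):
    """Least-squares estimate of (a, b) in delay = a + b * ID."""
    ids = [index_of_difficulty(d, width_mm) for d in distances_mm]
    n = len(ids)
    mean_id = sum(ids) / n
    mean_t = sum(delays_s) / n
    sxx = sum((i - mean_id) ** 2 for i in ids)
    sxy = sum((i - mean_id) * (t - mean_t) for i, t in zip(ids, delays_s))
    b = sxy / sxx
    a = mean_t - b * mean_id
    return a, b


# Synthetic example: distances chosen so that ID = 1, 2, 3, 4 bits,
# and delays generated from a = 0.3 s, b = 0.2 s/bit.
distances = [6.0, 18.0, 42.0, 90.0]
delays = [0.5, 0.7, 0.9, 1.1]
a, b = fit_fitts(distances, delays)
```

Fitting left- and right-side targets separately would give one (a, b) pair per side, which is the kind of per-side parameter that can then feed a classifier.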
A number of other suggested cancellation test measures, such as Quality of search (Q), revisits of already cancelled targets, and best R, do not apply in WAM due to their spatial nature. In comparison to cancellation tests, only a few targets are visible in WAM at any given time.

Design
We implemented a tablet-based game, Whack-a-mole (WAM), in which small 6 mm targets (moles) appear and stay for three seconds before disappearing (expiring) unless the player hits them by tapping on them (see Figure 1b). When targets are present the center button is mostly white (see Figure 1b), otherwise green. To direct the player's gaze back to the center, a hit target flies back in a springing motion (see Figure 1a) to merge with the center, and a center button tap spawns new targets. Initially, targets appear close to the center and successful hits increase the radius in discrete steps of 10% from the current maximum distance hit. Expired targets reduce the radius such that the game adjusts the challenge in each session individually and USN patients play close to the border of their visual field. Targets hit fast (within one second) increase the pitch of the feedback sound, with no pitch ceiling for consecutive fast hits, but a single slow hit or expiry brings the pitch straight down to its starting value. After an initial calibration phase with 14 single targets on the right hand side, the game goes through different stages with single, sequential, and multiple targets, and multiple targets along with distractors. Advancement through stages depends on game performance and, in the case of the stage with distractors, on a fixed time schedule. While WAM was mainly developed for USN patients, it was built to support competitive play with no ceiling on the number of possible hits.
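The adaptive radius and pitch feedback described above can be sketched as a small state machine. This is an illustrative reading of the description, not the game's actual code: the starting radius, the amount by which an expiry shrinks the radius, and all names are assumptions, since the text only specifies the 10% growth step and the one-second threshold for fast hits.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveChallenge:
    radius_mm: float = 20.0           # assumption: hypothetical starting radius
    max_hit_distance_mm: float = 0.0  # farthest target hit so far
    growth: float = 0.10              # 10% step, as described in the text
    shrink: float = 0.10              # assumption: the text gives no value
    fast_hit_s: float = 1.0           # "fast" hits are within one second
    pitch_steps: int = 0              # consecutive fast hits, no ceiling

    def on_hit(self, distance_mm: float, delay_s: float) -> None:
        """Grow the radius to 10% beyond the farthest hit; update the pitch."""
        self.max_hit_distance_mm = max(self.max_hit_distance_mm, distance_mm)
        grown = self.max_hit_distance_mm * (1.0 + self.growth)
        self.radius_mm = max(self.radius_mm, grown)
        if delay_s <= self.fast_hit_s:
            self.pitch_steps += 1
        else:
            self.pitch_steps = 0      # a single slow hit resets the pitch

    def on_expiry(self) -> None:
        """Shrink the radius after a missed target and reset the pitch."""
        self.radius_mm *= 1.0 - self.shrink
        self.max_hit_distance_mm = min(self.max_hit_distance_mm, self.radius_mm)
        self.pitch_steps = 0


game = AdaptiveChallenge()
game.on_hit(distance_mm=20.0, delay_s=0.8)   # radius grows to about 22 mm
game.on_hit(distance_mm=22.0, delay_s=0.9)   # radius grows to about 24.2 mm
radius_after_hits = game.radius_mm
pitch_after_hits = game.pitch_steps
game.on_expiry()                             # radius shrinks, pitch resets
```

Capping the farthest hit at the shrunken radius keeps a later hit from immediately undoing the reduction, which is one way to let the radius settle near the edge of what a player can attend to.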
The game had been iteratively tested and evaluated with patients (both neglect and attention deficit disorder) and staff in individual sessions and a short pilot trial of 10 days [21]. We paid particular attention to feedback on in-game actions, session performance, and game challenge [7]. The tests informed design choices such as target expiry times, their placement, tactile recognition field size, and audio-visual stimuli during game-play, both for spawning targets and for feedback when hitting them. We removed extraneous information that was initially present, such as in-game point counters, so as not to cognitively overload severely impaired players.
In earlier versions of the game targets appeared at random intervals as suggested by Silverstein et al.'s design goals for attention tests. But this resulted in a rather discontinuous game-play and we decided to remove it to provide a better game flow.
The resident psychologist judged that WAM required and trained sustained attention. The occupational therapist advised keeping game play and the time to review the results to within 10 minutes. After eight minutes, the game ended and depicted the number and spatial positions of both hits and misses (see Figure 1c), and hit delays along the 10 axes, in two summary screens (Figure 1d). However, it did not provide an overview of the progress of the patient over time.

Study
We designed a field trial using the instruments in Table 1 to see whether self-decided playing of WAM: 1. was possible for a range of patient impairments, 2. could be predicted in its usage from the perceived fun, ease of use and benefit after an initial game play, 3. yielded performance indicators in line with the patients' recovery, 4. had clear outcome benefits for participants, and 5. relied on hardware and software choices that were compatible with the constraints of a clinical setting.
We used a randomized controlled trial to test (6) whether playing WAM had an effect on the rehabilitation indices of measures of:

Data collection
See Table 1 for an overview of the data collection instruments, which an occupational therapist administered at entry to and exit from the trial and, patient stay duration permitting, during further bi-weekly periodical tests in between.
The demographic questionnaire collected important control variables whose fulfilment or higher values the literature has associated with decreases of patients' rehabilitation impact indices [22], which were our dependent variables. These control variables included: age, length of trial (LoT) and length of stay in the clinic (LoS), cognitive impairment (FIM C), time delay from lesion onset to rehabilitation unit admission (admission delay), gender (female), and USN. It further recorded patients' previous experience with mobile devices coded as no use (N), use (U) and use incl. games (G), handedness, family support, and lesion details.
Along with the diagnostic tests for neglect: Line Bisection, Letter Cancellation, and Line Cancellation, we used the following control tests to measure rehabilitation impact: SDMT, CBS (both observed and self-reported), cognitive (FIM C ) and motor (FIM M ) subscores of FIM with their dates. We used a three 20cm line staircase version of the Line Bisection test along with our own version that included five randomly positioned shorter lines from 10cm down to 6mm (in half steps). Tables 2 and 3 summarize the most important variables we obtained tallied by treatment and control groups as well as the different patient types: neglect, attention deficit, and all other.
We used a video-recorded, supervised app session consisting of an eight-minute play-through to check for idiosyncrasies, e.g. which hand(s) and finger(s) the patients used. The app questionnaire focused on the perceived benefit, fun, and ease of use (experienced difficulty, inverted) of WAM.
Logging all in-game interactions with timestamps and spatial coordinates on the devices allowed for applying neglect measures such as Center of Cancellation, performance comparisons (number of hits, misses, and expires) of left and right hand sides, and temporal modelling of interaction data with Fitts' law [21]. The clinical staff logged incidents on paper forms on the patients' desks when patients required help or had problems with the app or hardware. We obtained feedback both during and after the trial from the clinical staff on further observations of and comments from patients and on how WAM worked for the clinical staff as an addition to supervised patient activities.

Participants
Patients at the clinic who volunteered to participate were excluded from the study if they could not: (a) give informed consent, (b) complete a game session with a therapist's assistance due to poor eyesight or hearing, lack of arm-hand mobility, or cognitive capabilities, or (c) respond to the alarms set on the tablet.

Table 1. Instruments for data collection. Measures marked with * were obtained at admission and discharge from clinic. Columns: Instruments, Entry, Periodical, Exit. Recoverable rows: problem logs (during WAM use, incident based); clinic staff interviews (throughout trial).
Originally, 52 patients of a rehabilitation clinic volunteered to participate and were randomly assigned to either treatment or control group. Out of these, 42 yielded complete data sets; a move to a different facility was the most common reason for such drop-outs. Thirty-one men and eleven women (63 years old on average, SD: 14.8) completed the trial. Table 2 summarizes the participant profiles by control variables.
We relied on Jehkonen et al.'s test suite (line bisection, line cancellation and letter cancellation [20]) and their cut-offs from the literature for neglect classification. A positive outcome in one of the three tests classified participants as having neglect. Four participants suffered from neglect (three in the treatment group).
Seven participants (3 in the treatment group) had above cut-off scores on both the left and right hand side of the paper tests and the health care professional in charge of conducting all tests (and co-author of this paper) classified them as attention deficit disorder cases. We refer to the patients by initials for neglect (N), attention deficit (A) and other participants (P) along with their participation number.
We found no significant differences between the control and treatment groups for the control variables in Table 2.

Procedure
An occupational therapist screened and recruited patients, who had a week to deliberate and decide with their family whether to participate. During the roughly one-hour enrollment (entry), apart from administering the instruments detailed in Table 1, the therapist instructed the patients in handling the tablet, responding to reminders (alarms scheduled on the tablet), unlocking the tablet screen, starting the game, and logging in as themselves by tapping on a button with their name (a second button was labeled guest). On the tablet, the therapist set three alarms compatible with the patient's rehab schedule.
Patients received a personal tablet (iPad 2 or iPad Air in a protective shell) during their trial period, and a patient's table was, if necessary, equipped with an anti-slip rubber sheet for the tablet to rest on while playing. The tablet contained WAM and a gamified Trail Making Test (TMT) [32], a popular instrument to measure attention and executive functioning, that was stratified with different difficulty levels. We limit reporting in this paper to the WAM game.
Both periodical and exit tests followed a procedure similar to the entry test (cf. Table 1 for the measures taken).

Data preparation
To test whether playing WAM had an effect on the patients' rehabilitation impact indices we computed the rehabilitation index (MRFSR) for our control measures: SDMT, CBS obs, CBS id, FIM C, and FIM M. We relied on modeled FIM scores from a linear regression to account for gains during non-trial times since the FIM scores were only available at admission and at discharge from the clinic. In the absence of pre-morbid test scores we used the maximum scores possible for FIM C (35), FIM M (91), and CBS (30, inverted). For the maximum score for SDMT we relied on the age-specific normative test scores plus three times the normative standard deviation (both from [23]) for each participant. In the absence of prior knowledge we assumed linear recovery regarding the patients' attention ability as measured through SDMT tests. To reduce noise in the SDMT scores (test-retest reliability is 0.8 in healthy adults) we used linear regression modeled scores from the obtained SDMT scores (entry, periodical(s), exit) wherever periodical test results were available.
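The SDMT preparation described above can be illustrated with a short sketch: fit a straight line through the entry, periodical, and exit scores, read off the modeled entry and exit values, and use the normative mean plus three SD as the attainable maximum. The test days and scores are invented, the norms are the 60-64 values quoted earlier rather than those from [23], and the final ratio assumes the relative-gain reading of MRFSR.

```python
def fit_line(days, scores):
    """Ordinary least-squares line through (day, score) points."""
    n = len(days)
    mean_day = sum(days) / n
    mean_score = sum(scores) / n
    sxx = sum((d - mean_day) ** 2 for d in days)
    sxy = sum((d - mean_day) * (s - mean_score) for d, s in zip(days, scores))
    slope = sxy / sxx
    intercept = mean_score - slope * mean_day
    return intercept, slope


def modeled_entry_exit(days, scores):
    """Modeled scores at the first and last test day."""
    intercept, slope = fit_line(days, scores)
    return intercept + slope * min(days), intercept + slope * max(days)


def sdmt_ceiling(norm_mean: float, norm_sd: float) -> float:
    """Attainable maximum: normative mean plus three standard deviations."""
    return norm_mean + 3.0 * norm_sd


# Hypothetical patient: tests at entry (day 0), day 14, and exit (day 28).
entry, exit_ = modeled_entry_exit([0, 14, 28], [30, 38, 40])
ceiling = sdmt_ceiling(50.0, 9.76)
gain = (exit_ - entry) / (ceiling - entry)
```

Here the raw scores 30, 38, 40 are smoothed to a modeled entry of 31 and exit of 41, so a single noisy test result has less influence on the computed gain.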
In Table 3, line bisection scores are the average absolute deviation from the middle of the lines: zero percent being the middle and 100% being either end of the line. The letter and line cancellation scores are computed as the imbalance of omissions on either half of the page; the percentage points of omissions, relative to the entire page, on one half of the page subtracted from the percentage points of the other half. For example, if on the left hand side 4 out of 10 and on the right 6 out of 10 targets were omitted then the scores would be 6/20-4/20=10%.
From the logged data we extracted the following parameters on a per game basis, both for targets on the left and right hand side of the screen: averages of hit delays, number of hits and expiry counts, x and y touch offsets between a target's hit coordinates and the center of the target, and Fitts' law model variables (a and b) of all mole distances and their corresponding hit delays. To obtain a center of hit measure (CoH) we summed up the average x and y distance of hits from the center in millimeters (left negative, right positive). A binary logistic regression provided the point on the x-axis where the participants had a 50% chance of missing a target. We used this x value and the R² fit of the model in the subsequent analysis.
EAI Endorsed Transactions on Pervasive Health and Technology 07 2017 - 07 2017 | Volume 3 | Issue 11 | e1
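The 50% point from the binary logistic regression can be sketched as follows. For a model P(miss | x) = 1 / (1 + exp(-(b0 + b1 * x))), the miss probability equals 0.5 where b0 + b1 * x = 0, i.e. at x = -b0 / b1. The fitting routine below is a plain Newton-Raphson implementation written for illustration; the study's actual software is not stated, and the data are synthetic.

```python
import math


def fit_logistic(xs, ys, iterations: int = 25):
    """Fit P(y=1 | x) = sigmoid(b0 + b1 * x) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w               # entries of the (negated) Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            raise ValueError("degenerate data, e.g. perfectly separable")
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def x_at_half(b0: float, b1: float) -> float:
    """x position at which the modeled miss probability is 50%."""
    return -b0 / b1


# Synthetic data: four targets per x position (mm, left negative),
# with misses (1) becoming rarer towards the right.
xs = [-60] * 4 + [-40] * 4 + [-20] * 4 + [0] * 4 + [20] * 4
ys = [1, 1, 1, 1] + [1, 1, 1, 0] + [1, 1, 0, 0] + [1, 0, 0, 0] + [0, 0, 0, 0]
b0, b1 = fit_logistic(xs, ys)
x50 = x_at_half(b0, b1)
```

Because the synthetic data are mirror-symmetric around x = -20 mm, the fitted 50% point lands there; for a neglect patient one would expect this point to be shifted away from the far left of the screen.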

Results
After having experienced the apps for the first time, the participants in the treatment group found WAM easy to play (mean five-point Likert score 4.0, inverted from difficulty, SD=1.3), fun (3.6, SD=1.3) and beneficial for their rehabilitation (3.6, SD=1.3). The initial opinions were similar for patients with and without neglect. All patients' initial assessments of WAM's difficulty were negatively correlated to a moderate degree with the number of hits scored in WAM (r s =0.52).
At exit participants in the treatment group found WAM easier to play (4.3), similar in fun (3.5) but less beneficial (2.8, SD=1.4) than at entry. In comparison to the treatment group, the control group found WAM at entry easier to play (4.2), similarly fun, but less beneficial (3.3). The control group's opinions of WAM remained similar at exit in terms of fun and easy to play, and with a smaller reduction in perceived benefit (3.0).
The most common problems for which participants required assistance during the trial related not to playing the game but charging the tablet, waking it from sleep and disabling the set alarms, which annoyed roommates. Some patients felt stressed by the alarms, since they were not able to act on the alarm or could sometimes not remember its purpose -to remind them to initiate their self-training. A few participants needed support up to the first five times to start up the tablet, open and start the game.

I never thought I could figure it out. But after having been shown it 3-4 times I could even start the game up and play when I had time -and I actually ended up finding it pretty fun -I asked my children for an iPad for Christmas. (P37)
All but one patient (P30, f, 71yrs) managed to play WAM by themselves. Neither P30 nor P21 could perform the SDMT test at their entry test, but P21 had no problems playing WAM. Usage occurred generally between 8:00 and 21:00, mostly during the morning (9-11), early afternoon (15-16) and early evening (18-19), throughout the week (7.3min per day) with a reduction over the weekends (4.2min). Some patients went home over the weekend and could take along the tablet, but we did not track who did.
On average, the treatment participants played WAM for 5.6 minutes per day (SD 1.8). Usage varied hugely between participants (see Figures 2 and ?? for an overview). We used Jenks natural breaks optimization [30] to classify their per day use in minutes into four (no, low, medium, high) levels of use: 1. up to 0.6: one participant (P30 in Figure 2) [...] In other words, the medium use group played a little more than one game and the high use group two games per day. With the exception of P21, the neglect and attention deficit patients, unfortunately, played much less than the other patients. The neglect patients played a median of 0.23 games per day and the attention deficit patients 0.29, a fraction of the use of the other patients (1.4).
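Natural breaks classification looks for class boundaries that minimize the variation within each class. The sketch below is one way to compute such breaks for a small one-dimensional sample with an exact dynamic program over the sorted values; it is not necessarily the algorithm or software used in the study, and the usage values are invented minutes per day.

```python
def natural_breaks(values, n_classes: int):
    """Upper bound of each class in a partition of the sorted values that
    minimizes the total within-class sum of squared deviations."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums allow the squared deviation of any slice in O(1).
    prefix, prefix_sq = [0.0], [0.0]
    for v in data:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations from the mean for data[i:j]."""
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    # best[k][j]: minimal cost of splitting data[:j] into k classes.
    best = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    start = [[0] * (n + 1) for _ in range(n_classes + 1)]
    best[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                cost = best[k - 1][i] + ssd(i, j)
                if cost < best[k][j]:
                    best[k][j] = cost
                    start[k][j] = i   # last class is data[i:j]

    # Walk back through the stored split points to collect class bounds.
    bounds, j = [], n
    for k in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = start[k][j]
    return bounds[::-1]


# Invented daily use in minutes for eight participants.
usage = [0.1, 0.2, 0.3, 2.0, 2.1, 5.0, 5.2, 9.0]
breaks = natural_breaks(usage, 4)
```

For this sample the upper bounds of the four classes are 0.3, 2.1, 5.2, and 9.0 minutes per day.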
We tested for potential novelty effects and their wearing off. Due to varying participation duration in the trial, we compared the first nine days after enrolment with the subsequent nine days in the trial to investigate possible novelty effects. We found no reductions in average daily usage after the beginning of the trial, but a few participants who had been playing clearly stopped after a while (P20, P25, P12, P35).
Usage of WAM was positively correlated to a moderate degree with numerical values of the perceived fun (r=0.43, N=23) and benefit (r=0.48) reported at entry to the trial. The more fun the participants reported and the more they deemed to benefit the more they would later play. The perceived ease of use did not correlate with playtime. Treatment group participants without neglect whose perceived benefit from playing WAM dropped significantly during the course of the trial did not play less than those whose perceived benefit remained the same or improved. One participant (P37) reported having played only out of obligation to the trial and not due to any perceived or assumed health benefits.
Participants derived fun from quick successions of hits that were possible in multi-target stages and from competing with themselves and others. "It is fun when you go faster and faster" (P2). "There is some competition to the game. It is fun" (P4). "Monotonous, but good that you have to be quick" (N1). Becoming faster in the game became a goal in itself for some. "It is good to practice getting quicker" (P24). For some participants the difficulty level was just right: "I have played this the most, have an easier time remembering how to play it" (AD20). The more able participants found that the degree of challenge could have been higher. "The game is easy to get started with - though . . . I might have wanted more challenges - it got harder, but also got a little boring over time" (P32). The participants associated difficulty with the number of targets and distractors on the screen and not the speed at which they could hit these as long as it was within the expiration time. "Was fun after a while . . . the difficulty ramps up too slowly" (P43). "There could be more dots onscreen, it could be more difficult" (P51).
The result screen was instrumental in judging performance and progress. Many patients found it sufficient to compare the outcome of a game session with their current remembered best score to gauge their improvements. "I like to compete with myself . . . I can try to reach more hits next time" (P2).
A couple of participants desired more support to compare game performance with historical scores as a manifestation of progress and as motivation. "I lack being able to see my progress from game to game. It would be a carrot for me" (P52). Another one inquired about normative data on performance and recovery. "The result screen is good, but you need an explanation of what you are working towards and how you have been doing so far" (P27).
One participant discovered differences depending on the time of day by comparing his attained scores to remembered ones. "The results screen is nice. You can see whether you are quicker at certain times of day. I am most fit before noon" (P52). A linear mixed-effects analysis tested whether the number of hit targets was moderated by time of day. The model that included [...]. For this analysis we removed data from times of day that had too few instances (between 10pm and 7am), and each participant contributed their hit count average at each time of day. Figure 4 provides an overview of patients' performance by time of day. In the morning hit performance was slightly better than average, around noon and early afternoon roughly equal to it, and in the afternoon and early evening slightly below it. After eight o'clock performance improved but with large deviations between the few patients who contributed data during that time window.
For rehabilitation impact, we used linear regressions including age, gender, length of stay, admission delay, having neglect, and cognitive impairment (FIM C) as predictors of the rehabilitation impact index (MRFSR) of FIM C, FIM M, SDMT, CBS id, and CBS obs. We found no significant effects of WAM play time on any of these. The rehabilitation impact index scores of FIM M were moderated by age, having neglect, and length of stay; those of FIM C and CBS obs by length of stay. Higher values of these predictors were associated with reductions of their rehabilitation indices as known from the literature. Patients who were older, those who had neglect, and those who stayed longer in the clinic had lower rehabilitation indices.
We found a moderate positive correlation (r=0.68, N=40) between number of hit targets in WAM and the participants' SDMT scores on the day of enrolment. However, there were significant gains in WAM hitting performance that were not mirrored by SDMT gains at the end of the trial and the correlation between number of hits and the SDMT scores diminished (r=0.44, N=40).
We found a significant difference in the change of hitting performance over time between the treatment and control groups: by the end of the trial, the treatment group participants had improved their performance by 90 additional targets on average, more than the control group (33 additional targets), t(30.9)=2.7, p<0.05, indicating a training effect. Some patients attributed their performance gains in WAM to improved concentration from playing WAM: "It is concentration that is trained" (P9). "I think it helps me a lot. It helps you think faster" (P18). In this context one participant specifically valued the game not requiring reading: "Being able to practice concentration, without being able to read and write, which can often be an obstacle" (P43).
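The fractional degrees of freedom in t(30.9) suggest an unequal-variance (Welch) t-test, although the text does not name the test; the following sketch rests on that assumption. The function name and the sample gains are hypothetical and are not the study data.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of the two group means.
    se_a = variance(sample_a) / n_a
    se_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical gains in hit targets (exit minus entry) per participant.
treatment_gain = [10, 12, 14, 16, 18]
control_gain = [5, 6, 7, 8, 9]
t_value, df_value = welch_t(treatment_gain, control_gain)
print(round(t_value, 2), round(df_value, 1))
```

Because the degrees of freedom are estimated from the group variances, they are generally not whole numbers, which matches the form of the reported statistic.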
Given the importance of the number of hit targets in tracking progress, we tested whether the patients' improvement in hit targets (hits at exit minus hits at entry to the study) correlated with the perceived benefit reported at the exit of the trial in the treatment group. We found only a very weak correlation between these two (r s=0.12, N=23).

Figure 5. ROC curve of the neglect classifier
The clinical staff liked that the patients were able to engage in meaningful activities that did not require their supervision. The occupational therapist found WAM results a useful point of reference in discussions with the patients. Specifically, patients could not blame poor performance on exogenous factors, as they might in a mundane situation, where they could explain away their inability to attend to important events by blaming distractions. The game by design provided no additional stimuli. Therefore, the game served as, and was understood by the patients to be, a reliable and objective measure. After the end of the trial, the therapist in charge continued using WAM in her day-to-day work with patients, and three participants asked for the game so that they could continue playing after discharge.
We ran a cross-validation on the per-session data of WAM to measure how well game play performance predicted neglect. From all WAM sessions in which participants had hit at least ten targets after the calibration phase, we randomly selected half as the training data set for a stepwise binary logistic regression. Three parameters were significant predictors of the binary outcome variable: CoH x in mm from the centre of the screen, p<0.01, Fitts' a left (a L), p<0.001, and Fitts' b left (b L), p<0.01. These predictors (see Eq. 2) yielded a classification accuracy of 98.3% on the test set and 98.2% on the training set. LiCcl and in LetCcl had an above cut-off score only on the right side. N1 recovered during the trial, which was mirrored by improvements in CoH x. Two of N21's four misclassified sessions appeared to be outliers in terms of performance in CoH x and hits, and the therapist had on occasion seen his wife helping, who might have tried the game under his login.
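The split-half evaluation procedure can be illustrated with a deliberately simplified sketch. In place of the stepwise logistic regression on CoH x, a L, and b L, it fits a single midpoint threshold on a hypothetical CoH x value, so it shows only the train/test mechanics and not the model of Eq. 2. All names, the synthetic data, and the assumption that neglect sessions have a rightward-shifted CoH x are illustrative.

```python
import random
from statistics import mean


def split_half(sessions, seed=0):
    """Randomly assign half of the sessions to training and the rest to testing."""
    shuffled = sessions[:]
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def fit_threshold(train):
    """Stand-in model: midpoint between the class means of CoH x (not a logistic regression)."""
    neglect = [coh_x for coh_x, label in train if label == 1]
    other = [coh_x for coh_x, label in train if label == 0]
    return (mean(neglect) + mean(other)) / 2


def accuracy(threshold, sessions):
    """Share of sessions whose predicted class (CoH x above threshold) matches the label."""
    correct = sum((coh_x > threshold) == bool(label) for coh_x, label in sessions)
    return correct / len(sessions)


# Synthetic, well-separated sessions: (CoH x in mm, 1 = neglect, 0 = no neglect).
sessions = ([(float(x), 0) for x in range(-4, 6)] * 2
            + [(float(x), 1) for x in range(16, 26)] * 2)
train, test = split_half(sessions)
threshold = fit_threshold(train)
train_acc = accuracy(threshold, train)
test_acc = accuracy(threshold, test)
print(train_acc, test_acc)
```

Reporting accuracy on both halves, as in the text, makes it easy to see whether the model merely memorized the training sessions.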
Another way of spotting input anomalies was through lateral touch bias. While right-handed input had a consistent rightward bias, left-handed entry showed a leftward bias. Most patients with left-side lesions had to resort to using their left hand for input. Figure 6 shows the distribution of touch bias from the mole centers. With a simple classification based on a WAM session's average touch x-bias (bi x) being positive (right hand) or negative (left hand), we were able to correctly classify the input hand in 95.1% of the 707 sessions. A ten-fold cross-validation using the x-bias as a predictor in a logistic regression classified on average 95.5% of sessions correctly. We excluded data from an ambidextrous participant (P44) from this set.
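The sign-based rule above is simple enough to sketch directly. The data layout, function names, and sample values below are hypothetical; a session is assumed to be a list of (touch x, mole centre x) pairs in mm, and how a bias of exactly zero was handled in the study is not stated, so the tie-break here is arbitrary.

```python
from statistics import mean


def mean_x_bias(touches):
    """Average horizontal offset (mm) of touches from the centre of the mole they hit."""
    return mean(touch_x - mole_x for touch_x, mole_x in touches)


def classify_hand(touches):
    """'right' for a positive session bias, otherwise 'left' (zero falls to 'left' arbitrarily)."""
    return "right" if mean_x_bias(touches) > 0 else "left"


def accuracy(sessions):
    """Share of sessions whose classified hand matches the recorded hand."""
    correct = sum(classify_hand(touches) == hand for touches, hand in sessions)
    return correct / len(sessions)


# Hypothetical sessions: ([(touch_x, mole_x), ...], recorded input hand).
sessions = [
    ([(102.0, 100.0), (51.5, 50.0)], "right"),  # mean bias +1.75 mm
    ([(98.0, 100.0), (49.0, 50.0)], "left"),    # mean bias -1.5 mm
    ([(100.5, 100.0), (48.5, 50.0)], "right"),  # mean bias -0.5 mm, misclassified
]
print(accuracy(sessions))
```

The third session shows how a right-handed session with a slightly negative average bias ends up among the misclassified cases.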
During our initial testing we had observed a large number of 'unintentional' touch events, which were too far away from targets and typically clustered on the ipsi-lateral side of the input hand (cf. Figure 8). We attributed these entries to the patients' hand making contact with the touch screen. We included a sharp-sounding notification as feedback for players when touch input did not result in a hit. One patient pointed out that he found it difficult to keep his hand hovering above the tablet and that "Using a pen helps as to avoid unintentionally touching the screen" (P43).
We conducted a follow-up analysis on unintentional touch input, defined as touches further than 20 mm away from any current target's center. Figure 8 shows an overview of the spatial distribution of unintentional touches aggregated per session (the x, y positions represent the averages of all unintentional touches in that session) and Figure 9 the corresponding density distribution. We can see that the majority of Adding the average unintentional touch position (uit x) of each session improved the prediction of input hand. The ten-fold cross-validation predicted input hand with 96.3% accuracy on average. The logistic regression
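The 20 mm criterion and the per-session aggregation of unintentional touches can be sketched as follows. The event layout and function names are hypothetical, and treating a touch made while no target is visible as unintentional is an assumption of this sketch, not something the text specifies.

```python
from math import hypot
from statistics import mean

THRESHOLD_MM = 20.0  # distance beyond which a touch counts as unintentional


def is_unintentional(touch, targets, threshold=THRESHOLD_MM):
    """True if the touch is further than `threshold` mm from every current target centre."""
    touch_x, touch_y = touch
    # With no current targets, all() is True: such touches count as unintentional here.
    return all(hypot(touch_x - x, touch_y - y) > threshold for x, y in targets)


def session_average(events, threshold=THRESHOLD_MM):
    """Average (x, y) of a session's unintentional touches, or None if there are none."""
    stray = [touch for touch, targets in events
             if is_unintentional(touch, targets, threshold)]
    if not stray:
        return None
    return mean(x for x, _ in stray), mean(y for _, y in stray)


# Hypothetical events: (touch position, centres of targets visible at that moment), in mm.
events = [
    ((10.0, 10.0), [(12.0, 10.0)]),                 # 2 mm from a target: intentional
    ((100.0, 50.0), [(12.0, 10.0)]),                # far from the only target
    ((120.0, 70.0), [(12.0, 10.0), (60.0, 60.0)]),  # far from both targets
]
print(session_average(events))
```

The x component of the returned pair corresponds to the per-session average that the text calls uit x.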

Discussion
We found no evidence that playing WAM in the self-administered amounts observed in the study had measurable effects on FIM M, FIM C, CBS obs, CBS id, or SDMT. However, at 5.6 minutes on average per day, a fraction of the patients' supervised rehabilitative efforts (around 4 hours per day), we should not expect to find measurable effects, especially since the length of stay at the clinic, which correlated directly with supervised rehabilitation, had no significant effect on SDMT or CBS id either. Nevertheless, 5.6 minutes of training per day yielded statistically significant in-game performance gains for the treatment group compared to the control group.
Regardless of having neglect or not, the patients initially found playing WAM beneficial, easy to use, and fun, but WAM usage was not related to gender, age, or mobile use, nor to perceived benefit, ease of use, or fun at entry into the trial. Whether the drop in perceived benefit from entry to exit in the treatment group was due to the game becoming easier to use and less challenging (cf. [4]), disappointment in experienced vs. expected training gains, or less benefit from the game at a later stage in rehabilitation remained unclear. Responding to the alarms appropriately and using them to start self-training had to be learned and in some cases assisted several times, both verbally and, in more difficult cases, physically. Future studies and interventions need to budget for these activities.
Participants found playing WAM helpful for their ability to concentrate, and the statistical analysis of game performance that compared the treatment to the control group confirmed that playing WAM yielded training gains. While a resident psychologist had assessed and attested that WAM required sustained attention, we had not explicitly designed the game for attention training. The absence of significant effects of playing WAM on SDMT could be due to the fact that the scanning and tracking attention measured by the SDMT test was too dissimilar to the attention required for better performance in and trained by WAM.

Figure 9. Distribution density of unintentional touch x-position session averages from Figure 8
The clinical staff welcomed this form of self-initiated and self-administered rehabilitation. However, they had hoped patients would make more use of the app and keep to the suggested three sessions per day for an overall involvement of 30 minutes daily (including retrieving the tablet and reviewing the results). But given differences in interest and in what people find motivating, we should not assume that WAM will be a good fit for all people to self-rehabilitate [4], especially in its current state. While little research has focused on user needs in rehabilitative games, the field of personalized informatics and quantified-self has identified user needs in terms of questions that people who collect data about themselves seek to answer [24], which might equally apply to games like WAM. Patients and clinical staff used WAM scores and the result screens depicting hits and misses as a way to assess the current status. For better motivation and documentation of progress, participants sought access to their performance history, which matches findings in other rehab applications [2,7,41]. In a shorter ten-day pilot trial [21] with four participants (two neglect and two attention deficit patients) this concern had not emerged. Some patients competed with themselves and some of them with one another, and used the scores to this end by comparing current with recently achieved scores. WAM did not provide goals, e.g. by showing normative scores from healthy or rehab patients or typical improvements in scores over time. Our patients used the scores to compete with themselves and others as goals related to the game. Due to the absence of explicit goals, WAM could not show discrepancies between goals and the patients' current status either, and therefore the patients could not reflect on these.
While WAM did not collect nor provide contextual information about scores some patients did figure out that e.g. time of day had an effect on their ability to play.
While our quantitative analysis did not provide evidence of WAM improving insight measures (CBS id), the therapist found WAM useful for bedside assessment and as a neutral point of reference for illustrating to patients their weaknesses in order to improve their insight. During the trial we did not require patients to train their weaker side. But the very high accuracy (>96%) of inferring the input hand from touch bias and unintentional touches in the WAM interaction data was of particular interest to the therapist, who is often confronted with or worried about non-compliance in exercises targeting the patient's weaker side and would value such support for monitoring compliance in future versions.
We found that the WAM performance indicators Center of Hit and the Fitts' law b component had high predictive accuracy for neglect classification. But the model was not sensitive enough to detect the mild neglect case of N16, who did not test positively on LiBi, LiCcl, and LetCcl either. While earlier research found that this type of rapid touch interaction game yielded Fitts' law data that allowed comparisons with healthy people, we found that in many cases sessions yielded negative Fitts' b values, meaning that hitting targets further away took (after an initial reaction time) less time than hitting targets closer by, which indicates a longer initial response time. Still, this model component was a significant predictor of neglect, as it summarizes the temporal performance of hitting targets along with the Fitts' a component.
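How a session can produce a negative Fitts' b value is easiest to see in a small worked fit. The sketch below assumes the Shannon formulation of the index of difficulty, ID = log2(D/W + 1), and an ordinary least-squares fit of MT = a + b * ID; the text does not state which formulation or fitting procedure was used, and the function names and trial values are hypothetical.

```python
from math import log2
from statistics import mean


def index_of_difficulty(distance_mm, width_mm):
    """Shannon formulation of Fitts' index of difficulty (an assumption of this sketch)."""
    return log2(distance_mm / width_mm + 1)


def fit_fitts(trials):
    """Least-squares fit of MT = a + b * ID over (distance, width, movement time) trials."""
    ids = [index_of_difficulty(d, w) for d, w, _ in trials]
    times = [t for _, _, t in trials]
    id_mean, time_mean = mean(ids), mean(times)
    sxx = sum((x - id_mean) ** 2 for x in ids)
    sxy = sum((x - id_mean) * (t - time_mean) for x, t in zip(ids, times))
    b = sxy / sxx               # slope: extra time per bit of difficulty
    a = time_mean - b * id_mean  # intercept: time at zero difficulty
    return a, b


# Hypothetical trials: (distance to target in mm, target width in mm, movement time in ms).
# Far targets are hit faster than near ones, as described for some patient sessions.
trials = [(20.0, 20.0, 900.0), (60.0, 20.0, 850.0), (140.0, 20.0, 800.0)]
a, b = fit_fitts(trials)
print(a, b)
```

With these values the slope is negative and the intercept correspondingly large, mirroring the interpretation in the text that a long initial response time can dominate the distance effect.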

Limitations
Unlike typical randomized controlled trials, our study could not control the exposure to the game, which poses a threat to internal validity due to self-selection bias. Controlling exposure, however, would go against the study's aim of investigating rehabilitative gains in self-initiated self-rehabilitation, and the current setup allowed for disambiguating WAM training gains from rehabilitative gains.
Controlling for activities the control group engaged in would have been helpful but was beyond the study's budget. Running the study on the clinical side already required roughly two months more of clinical staff time than had been planned for. This was due to more time being required for signing up participants (e.g. relatives and/or patients repeatedly wanted more information), retrieving the patients for enrolment, periodical, and exit tests, and helping with initial problems or changing schedules.

Future Work
As discussed, WAM needs to provide more support in terms of providing and visualizing historical data and providing goals through normative scores or improvements over time of similar patients. Rather than having a fixed time limit, the game should adapt the length based on the user's ability to concentrate. Given the large fluctuations in game performance, future work needs to address how to measure concentration and the effort the patients put into a game, build multi-dimensional outlier detection, and better tune the challenge in an elastic way for each participant depending on, e.g. time of day (cf. [4]), and specifically for severely impaired patients and high performers. This was especially important for sessions in which participants hit only very few targets, which increased the likelihood of a misclassification.

Conclusion
We turned an understood neuropsychological measuring concept (Center of Hit) in neglect quantification into a game, which was simple enough to be played by all but one participant, showed a higher sensitivity range than, e.g. the SDMT test, and allowed patients to realize and become aware of performance gains. Playing the game did not result in measurable gains in SDMT, CBS, or FIM measures, but the patients ascribed in-game performance improvements to concentration training gains from the game. Playing the game did not harm the participants; nor did the physical setup conflict with the clinical routine, apart from the auditory alarms. We found potential for WAM and similar solutions for bedside assessment, insight support, and providing knowledge and documentation of performance and its progress as motivation for rehabilitation activities.