Self-rehabilitation of acquired brain injury patients including neglect and attention deficit disorder with a tablet game in a clinical setting

We designed and evaluated a whack-a-mole (WAM) style game (see Figure 1) in a clinical randomized controlled trial (RCT) with reminder-assisted but self-initiated use over the period of a month with 43 participants from a post-lesion pool. While game play did not moderate rehabilitative progress indices of standard neuropsychological control tests, it did significantly improve in-game performance when compared to the control group. Its performance indicators and interaction data were highly accurate in predicting neglect and which hand the patients used for input. Patients found playing beneficial to their rehabilitation and attributed gains to the attention training properties of the game. The game showed potential for bedside assessment, insight support, and motivation by providing knowledge about rehabilitative progress.


Introduction
Increasing health-care costs and ageing populations will require patients to take more responsibility to improve and maintain their health [31]. Rehabilitation is costly and leaves time for motivated patients to train if they can carry out relevant activities unassisted, find them beneficial, and muster the initiative. Much research focuses on motor recovery, but there is a greater need to address cognitive training [44], and going from proof of concept to providing evidence of effectiveness of interventions, e.g. through randomized controlled trials (RCT). Strokes are the leading cause of severe disability [29], and many patients suffer from neglect with poor insight into their inability to attend to or slow reactions towards stimuli in their left visual field.
Health care professionals (HCP) are urged to embrace evidence based practice (EBP) and adjust treatments according to the patients' condition and progress. EBP requires digital versions of standard tests based on paper and pencil [11,33], which are slow, expensive to administer (Jehkonen et al. 1998), and less precise. Additionally, digital tests can record valuable temporal data to improve diagnostics and monitoring of chronic patients.
* Corresponding author. Email: hk@create.aau.dk
The trend in turning activities beneficial to patients' health into games or gamifying them has seen a large push to tap the intrinsic motivation that can help patients adhere to or increase their required regimen while at the same time providing data for diagnostics and monitoring. Unlike standardized tests, games can easily support varying degrees of difficulty, as patients might be unwilling to complete tasks that they experience or deem too difficult due to cognitive, initiative, or motivational deficits [39]. Research needs to address rehabilitation games that allow for: 2. using performance parameters as indicators of patients' condition and progress [13], 3. improving patients' insight into their own condition [20] and progress, 4. providing benefits in non-game scenarios, while 5. verifying that interactions originate from patients, and are carried out as prescribed.
To this end, we developed a game simple enough even for severely affected patients but still worthwhile for the majority to play over a four week period in a randomized controlled trial, analyzed its interaction data, and compared it to neuropsychological control measures. While playing WAM did not result in measurable gains in neuropsychological control measures, its game performance data allowed for a solid classification of neglect patients. Further, we used touch interaction data to classify with high accuracy which hand patients had used for input. Patients used WAM as a way to recognize progress and train their attention. Clinical staff found it a useful tool for bedside assessment and as an independent measure that provided grounds for discussions with patients to improve their insight into their condition.

Background and related work
Unilateral spatial neglect (USN) is a disorder in which patients, despite functioning eyes, have difficulty attending to the left hand side of their visual field. Neglect follows right hemisphere stroke in the acute stage in about 50% of cases [8]. USN patients typically have poor insight into their own condition and exhibit poor coping strategies, e.g., they do not adopt a different head-body orientation to counter their impairment vis-a-vis their environment as, for example, a patient with hemianopia might. Mattingley et al. showed that neglect patients could exhibit motor neglect, a difficulty in initiating leftward movements towards targets on the left side of their visual field [28].

Neuropsychological measures
Methods for USN diagnosis include copying pictures with pen and paper [6], striking off each dot in a dotted letter, judging which of two bars (left/right) appears first [35], bisecting lines, or cancellation tests.
The Line Bisection (LiBi) test requires participants to mark the middle of a series of horizontal lines [16,38]. The examiner clearly points out each end of each line. The test-retest reliability ranges from 0.84 to 0.93. Ferber used a cut-off criterion of 14% (2.6% ± 2 SD) relative displacement from the bisection center and identified 60% of documented attention deficit patients [12]. Control subjects had an average of 2.9mm deviations in Schenkenberg's version. Halligan et al. used a three line version of the test with an average sensitivity across genders of 70% to detect unilateral attention deficits [16]. Halligan & Marshall found that the bias to bisect a line to the right of the center was reversed for lines shorter than 5cm [17]. The shortest line was 25mm with a bisection bias of around 4mm to the left of the center. We could not find any published work on how neglect might affect the performance in acquiring targets smaller than 25mm wide. To this end, we devised our own line bisection test with very short lines.
In the Line Crossing or cancellation (LiCcl) tests, participants should cross out 40 lines that are arranged in seven columns but appear randomly scattered on a sheet of paper. Two or more omissions in crossing out on the three left (18 lines) or right columns (18 lines) in a non-time constrained test indicate nonnormality [43]. The letter cancellation task (LetCcl) from the Behavioural Inattention Test (BIT) battery contains five rows of 34 letters of which 40% are targets (E, R) to cancel out without a time limit [16]. In a control group of 50, non-impaired people (age range 33-40) omitted 2±2.0 targets and a cut-off score of 8 omissions correctly identified all left-sided lesion patients and 77% of right-sided lesion patients with documented inattention.
Poor performance in these three tests (LiBi, LiCcl, LetCcl) above their established cut-off scores on both left and the right hand side indicate a general attention deficit rather than USN.
The Catherine Bergego Scale (CBS) assesses degrees of neglect in ten daily life tasks in the personal, peripersonal, and extrapersonal space from both self- and observer-reported (CBS_obs) ratings [5]. For example, it assesses whether a person exhibits no, mild, average or severe signs of neglect when shaving his face. The difference between the patients' self-assessed ratings and those from clinical staff provides a measure of the patients' insight deficit into their condition (CBS_id).
EAI Endorsed Transactions on Pervasive Health and Technology 07 2017 - 07 2017 | Volume 3 | Issue 11 | e1
The Symbol Digit Modality Test (SDMT) assesses the scanning and tracking aspect of attention similar to the ones at work in the Letter Cancellation and visual selective attention [41]. In the written SDMT used in this study participants have 90 seconds to fill in numbers on a page of 120 symbols according to the key found on the top of the page. The maximum score is 110; averaged normative scores of men and women in the age range 60-64 are around 50 with a standard deviation of 9.76, and the test-retest correlation in healthy adults is 0.80 [41]. Scores more than 1.5 SD below the normative mean suggest cerebral dysfunction [40].
The Functional Independence Measure (FIM) battery is a more general and common measure in rehabilitation that assesses the level of independence in activities of daily living on two sub-scales: motor (FIM_M) and cognitive (FIM_C) [14]. Their scores can range from 13-91 (FIM_M) and 5-35 (FIM_C) and they have a mean reliability of 0.97 and 0.93 respectively [30]. Unimpaired people should attain maximum scores.

Rehabilitation progress
Typical ways of quantifying rehabilitation impact use measures obtained at admission (AD), discharge (DC), and in some cases pre-morbid (PM) or maximum scores attainable (MAX). Rehabilitation impact indices include absolute and relative gain, effectiveness (REs) and efficiency (REy); see [10] for details. In this paper we use the revised Montebello Rehabilitation Factor Score (MRFSR), a measure of relative functional gain, to measure gains in CBS, SDMT, and FIM.
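The indices above can be illustrated with a minimal sketch. It assumes the commonly used definitions: absolute gain as DC minus AD, relative functional gain as the achieved share of the attainable gain, (DC - AD) / (MAX - AD), and efficiency as absolute gain per day of stay. Whether MRFSR matches this form exactly is an assumption here; [10] remains the authority. All function names and example values are hypothetical.

```python
def absolute_gain(ad, dc):
    """Raw score change between admission (AD) and discharge (DC)."""
    return dc - ad


def relative_functional_gain(ad, dc, max_score):
    """Share of the attainable improvement that was achieved.

    Assumed form: (DC - AD) / (MAX - AD). A patient admitted at the
    maximum score has no room to improve; we return 0.0 in that case
    (an assumption, not taken from the source).
    """
    potential = max_score - ad
    if potential == 0:
        return 0.0
    return (dc - ad) / potential


def efficiency(ad, dc, days):
    """Absolute gain per day of rehabilitation."""
    return (dc - ad) / days


# Hypothetical cognitive FIM scores: admitted at 20, discharged at 29,
# maximum 35, 30 days in the clinic.
gain = absolute_gain(20, 29)                   # 9 points
share = relative_functional_gain(20, 29, 35)   # 9 of 15 attainable points
per_day = efficiency(20, 29, 30)               # 0.3 points per day
```

A relative measure of this kind makes gains comparable across patients who start at different levels, which is why it suits scales with a fixed ceiling such as FIM.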

Related work
Applications for (self-)rehabilitation need to be motivating and provide interactivity and progress to be used continuously [42]. Games are sought as a means to tap into the intrinsic motivation they provide to counter the repetitive and uninteresting nature of many rehabilitation activities [27]. HCI research has started to address the design, implementation and evaluation of bespoke games, e.g. for physical therapy [13], rehabilitation [15], and enjoyment for children with motor disabilities [19]. Bespoke games for this audience often need to be simple [27], require simplified control schemes or accessibility features [19], and need to adapt to player performance that can vary over the course of a day [3]. Game challenge needs to adapt to player performance. Otherwise it risks becoming boring within the span of one session [1] or too hard, disengaging patients as neuropsychological tests do [39]. However, games need to adapt challenge gently, as patients can interpret abrupt changes in challenge as aggressive behaviour towards them [7]: they might perceive the system as a person or actor, e.g. as a pushy therapist. Maintaining player motivation for long-term rehabilitation requires the player to either find fun in the games or to recognize a beneficial effect from playing the game. Alankus et al. ran a six-week physical rehabilitation program with one participant using a motion-controlled game. The participant reported only playing for fun for the first half of the study, after which she started to recognize increasing capabilities in everyday life, greatly increasing motivation to continue as well as the desire to set and achieve personal goals [1].
Tests of sustained attention typically measure the ability to detect events [39]. Silverstein et al. provided an extensive list of design goals for sustained attention tests, which need to: 1. be easy to administer, 2. be simple, usable even for most impaired patients, 3. not rely on perceptual organization, working memory, context or language processing, 4. vary exposure time to targets with ability, 5. have targets appear at random intervals without alerting players, 6. cover a large sensitivity range, e.g. avoid ceiling and floor effects, be replayable, and have a length that covers a wide range and depends on current ability.
Balaam detailed design tensions between conflicting goals of enjoyable barrier-free game play and rehabilitation needs, i.e., making actions difficult enough to yield both training effects and motivation [3]. According to motivation theory [37] a feeling of growing competence is important for developing and sustaining intrinsic motivation.
Jamieson stressed the importance of communicating the need for assistive technology to people with acquired brain injury to motivate them to engage with it [20]. Gaining insight into their condition is an important concern for neglect patients, mirrored by the dedicated insight measure of the Catherine Bergego Scale.
targets from e.g. a line cancellation test [36]. This spatial measure allowed for both lateral and near-far (targets towards the bottom are closer than those on the top) neglect discrimination. Rabufetti found that the temporal measure of inter-cancellation times (the time between one and the following cancellation) was higher for patients than for controls [33], which translated well into the game time divided by the number of targets hit in WAM. In an earlier study [22], we found that empirical parameters from Fitts' law (a, b) from modelling rapid touch interactions in games can help predict neglect. Chatterjee et al. used a spatial performance measure obtained from logistic regression that indicated where in the left-right continuum patients had a 50% chance of detecting a target [9].
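To make the Fitts' law parameters concrete, the following is a minimal sketch of how a and b could be estimated from hit data. It assumes the Shannon formulation, delay = a + b * log2(D / W + 1), with target distance D and target width W; the source does not state which formulation [22] used. The function names, the default width, and the sample data are hypothetical.

```python
import math


def index_of_difficulty(distance_mm, width_mm):
    """Shannon formulation of Fitts' index of difficulty (in bits)."""
    return math.log2(distance_mm / width_mm + 1.0)


def fit_fitts(distances_mm, delays_s, width_mm=6.0):
    """Ordinary least-squares fit of delay = a + b * ID.

    Returns (a, b): a is the intercept in seconds, b the slope in
    seconds per bit. width_mm defaults to an assumed 6 mm target.
    """
    ids = [index_of_difficulty(d, width_mm) for d in distances_mm]
    n = len(ids)
    mean_id = sum(ids) / n
    mean_delay = sum(delays_s) / n
    sxx = sum((x - mean_id) ** 2 for x in ids)
    sxy = sum((x - mean_id) * (t - mean_delay)
              for x, t in zip(ids, delays_s))
    b = sxy / sxx
    a = mean_delay - b * mean_id
    return a, b


# Synthetic hits generated from a = 0.3 s and b = 0.2 s/bit:
# distances of 6, 18, 42 and 90 mm give IDs of 1, 2, 3 and 4 bits.
a, b = fit_fitts([6.0, 18.0, 42.0, 90.0], [0.5, 0.7, 0.9, 1.1])
```

Fitting the left and right halves of the screen separately yields side-specific parameters, so a slower or steeper fit on one side can be compared against the other.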

Design considerations
Other cancellation test measures such as Quality of search (Q), revisits of already cancelled targets, and best R were not compatible with the design of WAM due to their spatial nature. In comparison to cancellation tests, only a few targets were visible in WAM at any given time.

Design
We implemented a tablet-based game, Whack-a-mole (WAM), in which small 6mm targets (moles) appeared and stayed for three seconds before disappearing (expiring), and which the player hit by tapping on them (see Figure 1b). When targets were present the center button was mostly white (see Figure 1b), otherwise green. To direct the player's gaze back to the center, a hit target flew back in a springing motion (see Figure 1a) to merge with the center, and a center button tap spawned new targets. Initially, targets appeared close to the center and successful hits increased the radius in discrete steps of 10% from the current maximum distance hit. Expired targets reduced the radius such that the game adjusted the challenge in each session individually. This was based on the assumption that targets further away from the body midline on the neglected side would be more challenging than those closer to the center. Targets hit fast (within one second) increased the pitch of the feedback sound with no pitch ceiling for consecutive fast hits. But a single slow hit or expiry brought the pitch straight down to its starting value. After an initial calibration phase with 14 single targets on the right-hand side, the game went through different stages with single, sequential, multiple targets, and multiple targets along with distractors. Advancement through stages depended on game performance and, in the case of the stage with distractors, on a fixed time schedule. While WAM was mainly developed for USN patients, it was built to support competitive play with no ceiling on the number of possible hits.
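The challenge adaptation and audio feedback described above can be sketched as a small state machine. This is an illustration under stated assumptions, not the game's actual code: the starting radius, the size of the reduction on expiry (10% here), and the rule that a hit only widens the radius when it lands near the current edge are all hypothetical, as are the class and method names.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveChallenge:
    radius: float = 20.0       # spawn radius in mm (hypothetical start)
    min_radius: float = 20.0   # never shrink below the start (assumption)
    step: float = 0.10         # 10% steps, as described in the text
    fast_hit_s: float = 1.0    # hits within one second count as fast
    pitch_level: int = 0       # 0 is the starting pitch, no ceiling

    def on_hit(self, distance, delay_s):
        """Widen the radius to 10% beyond the hit and update the pitch."""
        self.radius = max(self.radius, distance * (1.0 + self.step))
        if delay_s <= self.fast_hit_s:
            self.pitch_level += 1   # consecutive fast hits keep rising
        else:
            self.pitch_level = 0    # one slow hit resets the pitch

    def on_expiry(self):
        """Shrink the radius (assumed 10%) and reset the pitch."""
        self.radius = max(self.min_radius, self.radius * (1.0 - self.step))
        self.pitch_level = 0


game = AdaptiveChallenge()
game.on_hit(distance=20.0, delay_s=0.8)   # radius grows, pitch rises
game.on_expiry()                          # radius shrinks, pitch resets
```

Because the radius reacts to every hit and expiry, the difficulty settles around the distance at which a given player starts to miss, which is the per-session individual adjustment the text describes.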
The game had been iteratively tested and evaluated with patients (both neglect and attention deficit disorder) and staff in individual sessions and a short pilot trial of 10 days [22]. We paid particular attention to feedback on in-game actions and session performance, and to game challenge [7]. The tests informed design choices such as target expiry times, their placement, tactual recognition field size, and audio-visual stimuli during game-play, both for spawning and feedback when hitting them. We removed extraneous information that was initially present, such as in-game point counters, so as not to cognitively overload severely impaired players.
In earlier versions of the game targets appeared at random intervals as suggested by Silverstein et al.'s design goals for attention tests. But this resulted in a rather discontinuous game-play and we removed it to provide a better game flow. The resident psychologist judged that WAM required and trained sustained attention nevertheless.
The occupational therapist advised keeping game play and the time to review the results to within 10 minutes. After eight minutes, the game ended and depicted the number and spatial positions of both hits and misses (see Figure 1c), and hit delays along the 10 axes in two summary screens (Figure 1d) to allow patients to gain insight into their shortcomings and progress through remembering previous high scores. However, it did not directly provide an overview of score progress over time.

Study
We designed a field trial using the instruments in Table 1 to see whether self-decided playing of WAM: 1. was possible for a range of patient impairments, 2. usage could be predicted from the perceived fun, ease of use and benefit after an initial game play, 2. an attentional measure (SDMT) to test whether WAM in-game performance improvements due to training transferred to this related but untrained measure, and 3. more general motor and cognitive skills (FIM).

Data collection
See Table 1 for an overview of the data collection instruments, which an occupational therapist administered at entry and exit to the trial and, patient stay duration permitting, during further bi-weekly periodical tests in between. The demographic questionnaire collected important control variables whose fulfillment or higher values the literature has associated with decreases of patients' rehabilitation impact indices [24], which were our dependent variables. These control variables included: age, length of trial (LoT) and of stay in the clinic (LoS), cognitive impairment (FIM_C), time delay from lesion onset to rehabilitation unit admission (admission delay), gender (female), and USN. It further recorded patients' previous experience with mobile devices coded as no use (N), use (U) and use incl. games (G), handedness, family support, and lesion details.
Along with the diagnostic tests for neglect (Line Bisection, Letter Cancellation, and Line Cancellation), we used the following control tests to measure rehabilitation impact: SDMT, CBS (both observed and self-reported), cognitive (FIM_C) and motor (FIM_M) subscores of FIM with their dates. We used a three 20cm line staircase version of the Line Bisection test along with our own version that included five randomly positioned shorter lines from 10cm down to 6mm (in half steps). Tables 2 and 3 summarize the most important variables we obtained tallied by test and control groups as well as the different patient types: neglect, attention deficit, and all other.
We used video-recorded supervised app use consisting of an eight-minute play-through to check for idiosyncrasies, e.g. which hand(s) and finger(s) the patients used. The app questionnaire focused on the perceived benefit, fun, and ease of use (experienced difficulty inversed) of WAM.
Logging all in-game interactions with time stamps and spatial coordinates on the devices allowed for applying neglect measures such as Center of Cancellation, performance comparisons (number of hits, misses, and expires) of left and right hand sides, and temporal modelling of interaction data with Fitts' law [22]. The clinical staff logged incidents on paper forms on the patients' desks when patients required help or had problems with the app or hardware. We obtained feedback both during and after the trial from the clinical staff on further observations of and comments from patients and how WAM worked for the clinical staff as an addition to supervised patient activities.

Table 1. Instruments for data collection. Measures marked with * were obtained at admission and discharge from clinic.
[Table 1 columns: Instruments | Entry | Periodical | Exit. During WAM use: problem logs (incident based), clinical staff interviews (throughout trial).]

Participants
Patients at the clinic who volunteered to participate were excluded from the study if they could not: (a) give informed consent, (b) complete a game session with a therapist's assistance due to poor eyesight or hearing, lack of arm-hand mobility, or cognitive capabilities, or (c) respond to the alarms set on the tablet.
Originally, 52 patients of a rehabilitation clinic volunteered to participate and were randomly assigned to either test or control group. Out of these, 42 yielded complete data sets; a move to a different facility was the most common reason for drop-outs. Thirty-one men and eleven women (63 years old on average, SD: 14.8) completed the trial. Table 2 summarizes the participant profiles by control variables and Table 3 by their neuropsychological test scores at entry.
We relied on Jehkonen et al.'s test suite (line bisection, line cancellation and letter cancellation [21]) and their cut-offs from the literature for neglect classification. A positive outcome in one of the three tests classified participants as having neglect. Four participants suffered from neglect (three in the test group). Eight participants (three in the test group) had above cut-off scores on both the left and right hand side of the paper tests. The health care professional in charge of conducting all tests (and co-author of this paper) classified them as attention deficit disorder cases. We refer to these patient groups by names and their members through initials along with their participation number: neglect (N), attention deficit (A) and other participants (P). We found no significant differences between the control and test groups for the control variables in Table 2.

Procedure
An occupational therapist screened and recruited patients who had a week to deliberate and decide with their family whether to participate or not.
During the roughly one hour enrollment (entry), apart from administering the instruments detailed in Table 1, the therapist instructed the patients in handling the tablet, responding to reminders (alarms scheduled on the tablet), unlocking the tablet screen, starting the game, and logging in as themselves by tapping on a button with their name (a second button was labeled guest). On the tablet, the therapist set three alarms compatible with the patients' rehab schedule.
Patients received a personal tablet (iPad2 or iPadAir in a protective shell) during their trial period, and a patient's table was, if necessary, equipped with an anti-slip rubber sheet for the tablet to rest on while playing. The tablet contained WAM and a gamified Trail Making Test (TMT) [34], a popular instrument to measure attention and executive functioning, that was stratified with different difficulty levels. We limit reporting in this paper to the WAM game.
Both periodical and exit tests followed a procedure similar to the entry test (see Table 1 for the measures taken).

Data preparation
To test whether playing WAM had an effect on the patients' rehabilitation impact indices we computed the rehabilitation index (MRFSR) for our control measures: SDMT, CBS_obs, CBS_id, FIM_C, and FIM_M. We relied on modeled FIM scores from a linear regression to account for gains during non-trial times since the FIM scores were only available at admission and at discharge from the clinic. In the absence of pre-morbid test scores we used the maximum scores possible for FIM_C (35), FIM_M (91), and CBS (30, inverted). For the maximum score for SDMT we relied on the age-specific normative test scores plus three times the normative standard deviation (both from [25]) for each participant. In absence of prior knowledge we assumed linear recovery regarding the patients' attention ability as measured through SDMT tests. To reduce noise in the SDMT scores (test-retest reliability is 0.8 in healthy adults) we used linear regression modeled scores from the obtained SDMT scores (entry, periodical(s), exit) wherever periodical test results were available.
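The two SDMT preparation steps can be sketched as follows: deriving a per-participant ceiling from normative data, and replacing the noisy observed scores with values from a straight line fitted through them. This is a minimal sketch; the function names are hypothetical, the test days and scores are invented, and the normative values (mean 50, SD 9.76) are the illustrative 60-64 age range figures quoted earlier in the text rather than the age-specific values from [25].

```python
def sdmt_ceiling(norm_mean, norm_sd):
    """Assumed maximum: normative mean plus three standard deviations."""
    return norm_mean + 3.0 * norm_sd


def fit_line(days, scores):
    """Least-squares line through (day, score); returns (slope, intercept)."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, scores))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def modeled_score(days, scores, at_day):
    """Score predicted by the fitted line at a given day."""
    slope, intercept = fit_line(days, scores)
    return intercept + slope * at_day


# Hypothetical participant tested at entry (day 0), once in between
# (day 14) and at exit (day 28).
days, scores = [0, 14, 28], [30, 36, 36]
entry = modeled_score(days, scores, 0)    # about 31
exit_ = modeled_score(days, scores, 28)   # about 37
ceiling = sdmt_ceiling(50.0, 9.76)        # about 79.3
```

Using the fitted entry and exit values instead of the raw ones spreads the measurement noise of any single test over all available tests, at the cost of assuming linear recovery.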
In Table 3 on the right 6 out of 10 targets were omitted then the scores would be 6/20-4/20=10%.
From the logged data we extracted the following parameters on a per game basis, both for targets on the left and right-hand side of the screen: averages of hit delays, number of hits and expiry counts, x and y touch offsets between a target's hit coordinates and the center of the target, and Fitts' law model variables (a and b) of all mole distances and their corresponding hit delays. To obtain a center of hit measure (CoH) we summed up the average x and y distance of hits from the center in millimeters (left negative, right positive). A binary logistic regression provided the point on the x-axis where the participants had a 50% chance of missing a target. We used this x value and the R² fit of the model in the subsequent analysis.
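The centre-of-hit measure and the 50% point of the logistic model can be sketched in a few lines. This is a sketch under assumptions: CoH is read here as the mean signed offset of hits from the screen centre per axis, and the logistic model P(miss) = 1 / (1 + exp(-(b0 + b1 * x))) is fitted with plain Newton-Raphson, so the 50% point is x = -b0 / b1. The function names and the sample data are hypothetical.

```python
import math


def center_of_hit(hits_mm):
    """Mean signed (x, y) offset of hits from the screen centre in mm.

    hits_mm is a list of (x, y) tuples; left of centre is negative.
    """
    n = len(hits_mm)
    return (sum(x for x, _ in hits_mm) / n,
            sum(y for _, y in hits_mm) / n)


def fit_logistic_1d(xs, missed, iterations=25):
    """Newton-Raphson fit of P(miss) = 1 / (1 + exp(-(b0 + b1 * x)))."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, missed):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p              # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w                 # negative Hessian entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def fifty_percent_point(b0, b1):
    """x position at which the modeled miss probability is 0.5."""
    return -b0 / b1


# Hypothetical session: ten targets at each x position (mm), with more
# misses (1) towards the left side of the screen.
xs, missed = [], []
for x, n_missed in [(-80, 9), (-60, 7), (-40, 5), (-20, 3), (0, 1)]:
    xs += [x] * 10
    missed += [1] * n_missed + [0] * (10 - n_missed)
x50 = fifty_percent_point(*fit_logistic_1d(xs, missed))
```

In this made-up session the miss rates are symmetric around -40 mm, so the fitted 50% point lands there; a value further to the right would indicate that a larger part of the left side is being missed.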

Results
All but one patient (P30, f, 71yrs) were able to play WAM by themselves. Neither P30 nor P21 could perform the SDMT test at their entry test but P21 had no problems playing WAM. The most common problems for which participants required assistance during the trial related not to playing the game but charging the tablet, waking it from sleep and disabling the set alarms, which annoyed roommates. Some patients felt stressed by the alarms, since they were not able to act on the alarm or could sometimes not remember its purpose -to remind them to initiate their self-training.

Impressions at entry and exit
After having experienced the apps for the first time, the participants in the test group found WAM easy to play (mean five-point Likert score 3.9 inversed from difficulty, SD=1.3), fun (3.6, SD=1.2) and beneficial for their rehabilitation (3.9, SD=1.3); see Figure 2 for an overview. All patients' initial assessments of WAM's difficulty were negatively correlated to a moderate degree with the number of hits scored in WAM (r_s = 0.52) during the entry session.
At exit, participants in the test group found WAM easier to play (4.3), similar in fun (3.5) but less beneficial (2.8, SD=1.4) than at entry. In comparison to the test group, the control group found WAM at entry slightly easier to play (4.2), similarly fun (3.5), but less beneficial (3.7). The control group's opinions of WAM remained similar at exit compared to entry in terms of fun, but the group found it easier to play (4.4) and reported a small reduction in perceived benefit (3.0).
Compared to other patients, the initial opinions of neglect and attention deficit patients were similar in terms of perceived benefit and difficulty, but they found the game less fun at entry (see Figure 2). The above results constitute revised and corrected versions of the values reported earlier in [23].

Usage
On average, the test participants played WAM for 5.6 minutes per day (SD 1.8). Usage occurred generally between 8:00 and 21:00, mostly during the morning (9-11), early afternoon (15-16) and early evening (18-19) throughout the week (7.3 minutes per day) with a reduction over the weekends (4.2min). Some of them went home over the weekend and could take along the tablet but we did not track who did. Usage varied hugely between participants (see Figures 3 and 5 for an overview). We used Jenks natural breaks optimization [32] to classify their per day use in minutes into four (no, low, medium, high) levels of use: 1. up to 0.6 -one participant (P30 in Figure 3 In other words, the medium use group played a little more than one game and the high use group two games per day. With the exception of P21, the neglect and attention deficit patients, unfortunately, played much less than the other patients. The neglect patients played a median of 0.23 games per day and the attention deficit patients 0.29, a fraction of the use of the other patients (1.4). The mean median hit count during a game of neglect (93) and attention deficit (115) patients was half that of the other patients (228) (see Figure 4).
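Jenks natural breaks groups sorted values into classes so that the sum of squared deviations from the class means is as small as possible. For a sample as small as one trial group this can be sketched by simply trying every way of cutting the sorted list; this brute-force search is a stand-in chosen for clarity, not the dynamic-programming procedure usually used, and the function names and usage values are hypothetical.

```python
from itertools import combinations


def sum_sq_dev(values):
    """Sum of squared deviations from the mean of one class."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def natural_breaks(values, n_classes):
    """Split sorted values into n_classes contiguous groups that
    minimise the total within-class sum of squared deviations."""
    data = sorted(values)
    best_cost, best_groups = None, None
    # Each combination lists the indices at which a new class starts.
    for cuts in combinations(range(1, len(data)), n_classes - 1):
        bounds = (0, *cuts, len(data))
        groups = [data[i:j] for i, j in zip(bounds, bounds[1:])]
        cost = sum(sum_sq_dev(g) for g in groups)
        if best_cost is None or cost < best_cost:
            best_cost, best_groups = cost, groups
    return best_groups


# Hypothetical minutes of play per day for nine participants.
usage = [5.3, 0.2, 2.4, 9.5, 2.1, 5.0, 10.1, 2.6, 5.8]
levels = natural_breaks(usage, 4)   # no, low, medium, high use
```

With 23 participants and four classes the search covers only 1540 candidate splits, so the exhaustive approach stays practical at this scale.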
We tested for potential novelty effects and their wear offs. Due to varying participation duration in the trial we compared the first nine days after enrolment with the subsequent nine days in the trial to investigate possible novelty effects. We found no reductions in average daily usage after the beginning of the trial but a few participants who had been playing clearly stopped after a while (P20, 25, 12, 35).

Usage motivation
Usage averages of WAM of the test group were positively correlated to a weak degree with the numerical values of the perceived fun (r_s = 0.36, N=23) and benefit (r_s = 0.33) reported at entry to the trial. The more fun and benefit the participants judged to derive at entry, the more they played during their trial. The perceived difficulty did not correlate with playtime for the whole test group but correlated negatively to a weak degree for the neglect and attention deficit patients (r_s = -0.39, N=5). Test group participants without neglect whose perceived benefit from playing WAM dropped significantly during the course of the trial did not play less than those whose perceived benefit remained the same or improved. One participant (P37) reported having played only out of obligation to the trial and not due to any perceived or assumed health benefits.
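The rank correlations reported here can be reproduced with a short sketch of Spearman's coefficient: both variables are converted to ranks (tied values share their average rank, which matters for five-point Likert ratings) and the Pearson correlation of the ranks is taken. The function names and the sample ratings are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson product-moment correlation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical fun ratings at entry and minutes of play per day.
fun = [1, 2, 3, 4, 5]
minutes = [1.0, 2.5, 2.0, 6.0, 9.0]
r_s = spearman(fun, minutes)
```

Working on ranks makes the coefficient insensitive to the skewed distribution of play time, where a few heavy users would otherwise dominate a Pearson correlation.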
Participants derived fun from quick successions of hits that were possible in multi-target stages and competing with themselves and others. "It is fun when you go faster and faster" (P2). "There is some competition to the game. It is fun" (P4). "Monotonous, but good that you have to be quick" (N1). Becoming faster in the game became a goal in itself for some. "It is good to practice getting quicker" (P24). For some participants the difficulty level was just right: "I have played this the most, have an easier time remembering how to play it" (AD20). But the more able participants found that the degree of challenge could have been higher. "The game is easy to get started with -though . . . I might have wanted more challenges -it got harder, but also got a little boring over time" (P32). The participants associated difficulty with the number of targets and distractors on the screen and not the speed at which they could hit these as long as it was within the expiration time. "Was fun after a while . . . the difficulty ramps up too slowly" (P43). "There could be more dots on-screen, it could be more difficult" (P51).
The result screen was instrumental in judging performance and progress. Many patients found it sufficient to compare the outcome of a game session with their current remembered best score to gauge their improvements. "I like to compete with myself . . . I can try to reach more hits next time" (P2). A couple of participants desired more support to compare game performance with historical scores as a manifestation of progress and as motivation. "I lack being able to see my progress from game to game. It would be a carrot for me" (P52). Another one inquired about normative data on performance and recovery. "The result screen is good, but you need an explanation of what you are working towards and how you have been doing so far" (P27).

Factors affecting performance
One participant discovered differences depending on the time of day by comparing her attained scores to remembered ones. "The results screen is nice. You can see whether you are quicker at certain times of day. I am most fit before noon" (P52). A linear mixed effect analysis with participants as random effects showed that hour of day (from 8:00 to 21:00) affected the number of hit targets, χ²(1) = 5.26, p = .022, lowering them by 1.9 ± 0.82 (standard errors) per hour of day. For this analysis we removed data from times of day that had too few instances (between 10pm and 7am), and each participant contributed their hit count average at each time of day. Figure 6 provides an overview of patients' performance by time of day. In the morning hit performance was slightly better, around noon and early afternoon roughly equal to, and in the afternoon and early evening slightly below average. After eight o'clock performance improved but with large deviations between the few patients who contributed data during that time window.
neglect, and length of stay, FIM_C and CBS_obs by length of stay. Higher values of these predictors were associated with reductions of their rehabilitation indices as known from the literature. Patients who were older, those who had neglect, and those who stayed longer in the clinic had lower rehabilitation indices.
We found a moderate positive correlation (r=0.68, N=40) between number of hit targets in WAM and the participants' SDMT scores at entry. However, there were significant gains in WAM hitting performance not mirrored by SDMT gains at the end of the trial and the correlation between number of hits and the SDMT scores diminished (r=0.44, N=40).
We found a significant difference in the change of hitting performance over time between the test and control group: by the end of the trial the test group participants had improved their performance by 90 additional targets on average, more than the control group (33 additional targets), t(30.9)=2.7, p<0.05, indicating a training effect.
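The fractional degrees of freedom in t(30.9) are typical of an unequal-variance (Welch) t-test, although the text does not name the test used. The following is a minimal sketch of that computation, assuming one gain score per participant (hits at exit minus hits at entry). The function name `welch_t` and the sample values are invented for illustration and are not study data.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    nx, ny = len(x), len(y)
    # Squared standard errors of the two sample means.
    sx, sy = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / sqrt(sx + sy)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (sx + sy) ** 2 / (sx ** 2 / (nx - 1) + sy ** 2 / (ny - 1))
    return t, df


# Made-up gain scores, NOT the study's data.
test_gains = [120, 95, 60, 110, 85, 70]
control_gains = [40, 25, 35, 30, 45, 20]
t, df = welch_t(test_gains, control_gains)
```

Because the degrees of freedom are estimated from the two sample variances, they are generally not an integer, which is consistent with the reported value of 30.9.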
Some patients attributed their performance gains in WAM to improved concentration from playing WAM. "It is concentration that is trained" (P9). "I think it helps me a lot. It helps you think faster" (P18). In this context one participant specifically valued that the game did not require reading. "Being able to practice concentration, without being able to read and write, which can often be an obstacle" (P43). Given the importance of the number of hit targets in tracking progress, we tested whether the patients' improvement in hit targets (hits at exit minus hits at entry to the study) correlated with the perceived benefit reported at the exit of the trial in the test group. We found only a very weak correlation between the two (r_s=0.12, N=23).

WAM for neglect prediction
To measure how well game play performance predicted neglect, we ran a cross-validation on the per-session data of WAM. From all WAM sessions in which participants had hit at least ten targets after the calibration phase, we randomly selected half as the training data set for a stepwise binary logistic regression. Three parameters were significant predictors of the binary outcome variable: CoH_x in mm from the centre of the screen (p<0.01) and, based on targets hit on the left (L) side of the screen, Fitts' a (a_L, p<0.001) and Fitts' b (b_L, p<0.01). These predictors (see Eq. 2) yielded a classification accuracy of 98.3% on the test set and 98.2% on the training set. Figure 7 depicts the corresponding ROC curve, with an area under the curve (AUC) of 0.939. Misclassified sessions are depicted in red for each parameter in Figure 5. Misclassifications were mostly (11/14) false negatives of sessions from USN patients. For example, all three of N16's and 3/5 of N1's sessions were misclassified. N16 had not tested positive for neglect in either LiBi or LiCcl, and in LetCcl had an above cut-off score only on the right side. N1 recovered during the trial, which was mirrored by improvements in CoH_x. Two of N21's four misclassified sessions appeared to be outliers in terms of CoH_x and hits; the therapist had seen his wife helping on occasion, and she might have tried out the game under his login.
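As a rough illustration of how the three per-session predictors could be derived from raw hit data, the sketch below computes a horizontal Center of Hit and a least-squares Fitts' law fit. It rests on several assumptions that the text does not confirm: CoH_x is taken as the mean x-position of hit targets with positive values to the right of the screen centre, the index of difficulty uses the Shannon formulation log2(D/W + 1), and all function and parameter names are hypothetical. The coefficients of Eq. 2 are not reproduced here.

```python
from math import log2
from statistics import mean


def centre_of_hit_x(hit_xs_mm):
    """Mean horizontal position of hit targets, in mm from the screen centre."""
    return mean(hit_xs_mm)


def fitts_fit(distances_mm, width_mm, times_s):
    """Least-squares fit of MT = a + b * ID, with ID = log2(D / W + 1).

    Returns the intercept a (seconds) and slope b (seconds per bit).
    """
    ids = [log2(d / width_mm + 1) for d in distances_mm]
    id_bar, mt_bar = mean(ids), mean(times_s)
    sxx = sum((i - id_bar) ** 2 for i in ids)
    sxy = sum((i - id_bar) * (t - mt_bar) for i, t in zip(ids, times_s))
    b = sxy / sxx
    a = mt_bar - b * id_bar
    return a, b


# Synthetic example: hit times that follow MT = 0.3 + 0.2 * ID exactly.
a_L, b_L = fitts_fit([15, 45, 105], 15, [0.5, 0.7, 0.9])
coh_x = centre_of_hit_x([-40, 10, 30, 60])
```

Under these assumptions, a session in which far targets are hit faster than near ones produces a negative slope b, and restricting the input to targets on the left side of the screen gives the left-side variants a_L and b_L.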

WAM measures for input hand adherence
Another way of spotting input anomalies was through lateral touch bias. While right-handed input had a consistent rightward bias, left-handed input showed a leftward bias. Most patients with left-side lesions had to resort to using their left hand for input. Figures 8 and 9 show the distribution of touch bias from the mole centers.
With a simple classification based on a WAM session's average touch x-bias (bi_x) being positive (right hand) or negative (left hand), we were able to correctly classify the input hand in 95.1% of the 707 sessions. A ten-fold cross-validation using the x-bias as a predictor in a logistic regression classified on average 95.5% of sessions correctly. We excluded the data of an ambidextrous participant (P44) from this set.
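The sign-based rule described above is simple enough to sketch directly. The data layout (touch and target x-positions in mm, one pair per hit) and the function names are assumptions made for illustration. The text does not say how a bias of exactly zero was handled, so the sketch arbitrarily assigns it to the left hand.

```python
from statistics import mean


def session_x_bias(touch_xs_mm, target_xs_mm):
    """Average horizontal offset of touches from the centres of the moles hit."""
    return mean(t - c for t, c in zip(touch_xs_mm, target_xs_mm))


def predict_hand(x_bias_mm):
    """Sign rule from the text: positive bias -> right hand, negative -> left.

    A bias of exactly zero falls to "left" here (an arbitrary assumption).
    """
    return "right" if x_bias_mm > 0 else "left"


def accuracy(sessions):
    """Share of (x_bias_mm, true_hand) pairs that the sign rule gets right."""
    correct = sum(predict_hand(bias) == hand for bias, hand in sessions)
    return correct / len(sessions)


# Made-up sessions, NOT study data: the third one is misclassified.
demo = [(1.2, "right"), (-0.8, "left"), (0.3, "left"), (-2.0, "left")]
demo_accuracy = accuracy(demo)
```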
During our initial testing we had observed a large number of 'unintentional' touch events, which were too far away from targets and typically clustered on the ipsilateral side of the input hand (cf. Figure 10). We attributed these entries to the patients' hand making contact with the touch screen. We had included a sharp-sounding notification as feedback for players when touch input did not result in a hit. One patient found it difficult to keep his hand hovering above the tablet and noted that "Using a pen helps as to avoid unintentionally touching the screen" (P43). We conducted a follow-up analysis on unintentional touch input, defined as touches further than 20mm away from any current target's center. Figure 10 shows an overview of the spatial distribution of unintentional touches aggregated per session (the x, y positions represent the averages of all unintentional touches in that session) and Figure 11 the corresponding density distribution. We can see that the majority of unintentional touches happened on the lower half of the screen and roughly lay on the diagonal from the respective lower corner of the screen to slightly beyond the center.
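The 20mm criterion and the per-session aggregation can be sketched as follows. The event format (a touch position paired with the centres of the targets visible at that moment, all in mm) and the function names are assumptions for illustration. The sketch also treats a touch made while no target is visible as unintentional, which the text does not specify.

```python
from math import hypot
from statistics import mean

UNINTENTIONAL_MM = 20.0  # distance threshold stated in the text


def is_unintentional(touch, targets, threshold_mm=UNINTENTIONAL_MM):
    """True if the touch is further than threshold_mm from every target centre.

    With an empty target list this returns True (an assumption of this sketch).
    """
    tx, ty = touch
    return all(hypot(tx - cx, ty - cy) > threshold_mm for cx, cy in targets)


def mean_unintentional_position(events):
    """Average (x, y) of a session's unintentional touches, or None if none.

    events: list of (touch, targets) pairs, where touch is an (x, y) tuple
    and targets is a list of (x, y) centres of the currently visible targets.
    """
    stray = [touch for touch, targets in events
             if is_unintentional(touch, targets)]
    if not stray:
        return None
    return mean(x for x, _ in stray), mean(y for _, y in stray)


# Made-up events, NOT study data: the first touch is close enough to count.
events = [
    ((0, 0), [(5, 5)]),
    ((50, -40), [(5, 5)]),
    ((30, -60), [(5, 5), (100, 100)]),
]
session_position = mean_unintentional_position(events)
```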
Adding the average unintentional touch x-position (uit_x) of each session further improved the prediction of input hand. A ten-fold cross-validation predicted input hand with 96.3% accuracy on average. The logistic regression found x-bias (p<0.001), unintentional touch x-position (p<0.01), and their interaction (p<0.001) to be significant predictors of input hand (see Equation 3).
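For orientation, a logistic regression with two predictors and their interaction has the general form below. The coefficients are placeholders, since the fitted values belong to Equation 3 and are not reproduced here, and coding the right hand as the positive outcome is an assumption.

```latex
\operatorname{logit} P(\text{right hand})
  = \ln\frac{P(\text{right hand})}{1 - P(\text{right hand})}
  = \beta_0 + \beta_1\, bi_x + \beta_2\, uit_x + \beta_3\,(bi_x \cdot uit_x)
```

A session is then assigned to the right hand when the predicted probability exceeds a chosen cut-off, commonly 0.5.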

WAM for patient HCP interactions
The clinical staff valued that the patients were able to engage in meaningful activities that did not require their supervision. The occupational therapist found WAM results a useful point of reference in discussions with the patients. Specifically, patients could not blame poor performance on exogenous factors, as they might in a mundane situation where they explain away their inability to attend to important events by blaming distractions. The game by design did not provide additional stimuli. Therefore, the game served as, and was understood by the patients to be, a reliable and objective measure. After the end of the trial, the therapist in charge continued using WAM in her day-to-day work with patients, and three participants asked for the game so that they could continue playing after discharge.
EAI Endorsed Transactions on Pervasive Health and Technology 07 2017 - 07 2017 | Volume 3 | Issue 11 | e1

Discussion
We found no evidence that playing WAM in the self-administered amounts observed in the study had measurable effects on FIM_M, FIM_C, CBS_obs, CBS_id, or SDMT. However, at an average usage of 5.6 minutes per day, a fraction of the patients' supervised rehabilitative efforts (around 4 hours per day), we should not expect to find measurable effects, especially since the length of stay at the clinic, which correlated directly with supervised rehabilitation, had no significant effect on SDMT or CBS_id either. Nevertheless, 5.6 minutes of training per day yielded statistically significant in-game performance gains for the test group compared to the control group.
A resident psychologist had assessed WAM and attested that it requires sustained attention. Our participants found playing WAM helpful for their ability to concentrate, and the statistical analysis of game performance that compared the test group to the control group confirmed that playing WAM yielded training gains. The absence of significant effects of playing WAM on SDMT could be due to the scanning and tracking attention measured by the SDMT being too dissimilar to the attention that WAM requires and trains.
The WAM performance indicators Center of Hit and the Fitts' law components had high predictive accuracy for neglect classification. But the model was not sensitive enough to detect the mild neglect case of N16, who did not test positively on LiBi, LiCcl, and LetCcl either. While earlier research found that this type of rapid touch interaction game yielded Fitts' law data that allowed comparisons with healthy people, we found that in many cases sessions without HCP supervision yielded negative Fitts' b values, meaning that hitting targets further away from the center took (after an initial reaction time) less time than hitting targets closer by. This indicates that these interactions did not conform to the model (cf. [18]). Still, the model components a and b were significant predictors of neglect when obtained from the temporal performance of hitting targets on the left side. Future research could shed more light on this phenomenon.
Regardless of whether they had neglect, the patients initially found WAM beneficial and easy to use. WAM usage during the trial was not related to gender, age, previous experience with digital platforms (mobile use), or perceived difficulty. Whether the drop in perceived benefit from entry to exit in the test group was due to the game becoming easier to use and less challenging (cf. [4]), disappointment in experienced vs. expected training gains, or less benefit from the game at a later stage in rehabilitation remained unclear.
The clinical staff welcomed this form of self-initiated and self-administered rehabilitation. However, they had hoped patients would make more use of the app and keep to the suggested three sessions per day for an overall involvement of 30 minutes daily (including retrieving the tablet and reviewing the results). While our quantitative analysis did not provide evidence of WAM improving insight measures (CBS_id), the therapist found WAM useful for bedside assessment and as a neutral point of reference for showing patients their weaknesses and thereby improving their insight. However, the study setup resulted in self-initiated play for only one out of three neglect patients and none of the attention deficit patients in the test group. The setup included a) presumed benefits from an intervention they had signed up for, b) bedside assessments, c) alarm-based reminders, and d) entry, periodical, and exit play-throughs. These patients played almost entirely during the HCP-facilitated (entry, exit, and periodical) sessions and bedside assessments.
One explanation could simply be sampling, as we only had three patients of each kind (neglect and attention deficit) in the test group. Another explanation could lie in opinions about WAM. Both groups initially found WAM less fun than other participants did, and fun was positively correlated with usage. The perceived benefit among neglect patients diminished substantially during the trial, and they found WAM very easy to play at the end. However, these two reasons could not explain the low usage in the attention deficit group, whose perceived benefit remained steady and whose perceived difficulty increased during this period. While WAM did not provide much variation and, for some patients, increased the challenge too gradually, this did not stop most of the other patients from using it.
One often cited reason could be a lack of insight and initiative in these patient groups. Lacking insight into their deficits and into the benefit of a treatment for improving them might lead patients to engage in rehabilitation work only out of courtesy to their therapist and not because they feel the work benefits them. The neglect patients performed poorly in terms of WAM hit counts and expired targets when compared to the other patient group but found WAM not difficult at all at exit (cf. Figure 2). This provides some evidence for low insight from game and opinion measures. We drew on the scheduled weekly rehabilitation hours spent with HCPs on improving insight, initiation, and memory. The neglect patients in the test group trained initiative (15.0 hrs/week) and insight (16.8) much more than the attention deficit patients (5.0; 13.3) and the other patients (2.7; 10.5). From this, a lack of insight and initiation seems a plausible explanation for the low WAM usage of neglect patients. Attention deficit participants had similarly low WAM hit counts but fewer expired targets, and found WAM more difficult at exit than at entry. While they received less support on initiation (5.0) and insight (13.3), they received the most memory training (10.0) of all three groups. For the attention deficit patients in the test group, a combination of memory problems and low insight, rather than a lack of initiative, could explain the low WAM usage.
Given differences in interest and in what people find motivating, especially when handicapped by low insight and initiative, we should not assume that WAM will be a good fit for all people to self-rehabilitate [4], especially in its current state, which leaves the task of gaining insight to the patient actively engaging with the result page. While little research has focused on user needs in rehabilitative games, the field of personal informatics and quantified self has identified user needs in terms of the following questions that people who collect data about themselves seek to answer [26] and that might equally apply to games like WAM that generate data in each session:  [2,7,42]. In the shorter ten-day pilot trial [22] with four participants (two neglect and two attention deficit patients) this concern had not emerged. Some patients competed with themselves and some with one another, comparing current with recently achieved scores to this end. WAM did not provide goals, e.g. by showing normative data from healthy or rehabilitation patients or typical improvements in scores over time. Instead, our patients used their remembered scores as goals when competing with themselves and others. Due to the absence of explicit goals, WAM could not show discrepancies between goals and the patients' current status, and therefore the patients could not easily reflect on these.

Limitations
Unlike typical randomized controlled trials, our study did not control the exposure to the game, which poses a threat to internal validity due to self-selection bias. For example, participants who played WAM more than others could have higher than average rehabilitation outcomes. Controlling exposure, however, would have gone against the study's aim of investigating rehabilitative gains in self-initiated self-rehabilitation in a clinical setting. The current setup with a control and a test group allowed for disambiguating WAM training gains from rehabilitative gains, since we used the amount of time spent playing WAM as a continuous predictor of the rehabilitation indices. But in general, it might be beneficial for future work to study self-rehabilitation and novel treatment approaches such as WAM separately, especially with target groups that are known to have low regimen adherence, low initiative, or poor insight into their conditions. Another limitation of the study is the low number of neglect and attention deficit patients who participated, which limited the possibilities for statistical analysis.
Controlling for the activities the control group engaged in during their own time would have been helpful but was beyond the study's budget. Running the study on the clinical side already required roughly two months more of clinical staff time than had been planned. This was due to more time being required for signing up participants (e.g. relatives and/or patients repeatedly asked for more information), retrieving the patients for enrolment, periodical, and exit tests, and helping with initial problems or changing schedules.

Future Work
As discussed, WAM needs to better support interactions with historical data and link them with progress tracking and goal setting, e.g. through normative scores or the improvements over time of similar patients. The app needs to improve people's insight into their shortcomings when they are not assisted by HCPs. Rather than having a fixed time limit, the game should adapt its length to the user's ability to concentrate. Given the large fluctuations in game performance, future work needs to address how to measure concentration and the effort the patients put into a game, build multi-dimensional outlier detection, and better tune the challenge in an elastic way for each participant depending on, e.g., time of day (cf. [4]), and specifically for severely impaired patients and high performers. This is especially important for sessions in which participants hit only very few targets, which increased the likelihood of being misclassified as having neglect. During the trial we did not require patients to train their weaker side. However, the ability to infer input hand from touch bias and unintentional touches in the WAM interaction data with very high accuracy (>96%) was of particular interest to the therapist, who is often confronted with or worried about non-compliance in exercises targeting the patient's weaker side. She valued such support for monitoring compliance in future versions.

Conclusion
We turned a well-understood neuropsychological measuring concept (Center of Hit) in neglect quantification into a game, which was simple enough to be played by all but one participant, showed a higher sensitivity range than, e.g., the SDMT, and allowed patients to become aware of performance gains. Playing the game for six minutes a day did not result in measurable gains in SDMT, CBS, or FIM measures, but the patients ascribed in-game performance improvements to concentration training gains from the game. Moreover, we found potential for WAM and similar solutions in bedside assessment, in insight support, and in providing knowledge about performance and progress as motivation for rehabilitation activities. The neglect and attention deficit patients, who had the most to gain from using WAM either for training attention or for improving their insight, did not use it sufficiently despite electronic reminders and up to two mandatory (periodical) sessions observed by an HCP. Our results to some degree call into question the very tenet of self-rehabilitation for patients with poor insight. Research in self-rehabilitation needs to focus more on how applications or rehabilitation systems can improve insight and initiative in the absence of continuous support from HCPs.
Touch bias and unintentional touches can be used to predict input hand with very high accuracy. Future apps for rehabilitation should consider verifying that tasks are carried out as designed and need to take into account that stroke patients might create considerable amounts of unintentional touch data.
Playing the game did not harm the participants, nor did the physical setup conflict with the clinical routine apart from the auditory alarms. Responding to the alarms appropriately and using them to start self-training, including all required steps with the tablet device, had to be learned and in some cases assisted several times, both verbally and, in more difficult cases, physically. Future studies and interventions need to budget sufficient resources for these activities.