Analysis of Targeted Mouse Movements for Gender Classification
N. Van Balen, H. Wang, C. Ball

Gender is one of the essential characteristics of personal identity, and it is often misrepresented by online impostors for malicious purposes. This paper proposes a naturalistic approach to identity protection, with a specific focus on using mouse biometrics for accurate gender identification. Our underpinning rationale is that men and women differ in their natural aiming movements of a hand-held object in two-dimensional space because of anthropometric, biomechanical, and perceptual-motor control differences between the genders. Although some research has been done on classifying users by gender using biometrics, to the best of our knowledge, no research has provided a comprehensive list of which movement metrics (features) are actually relevant to gender classification, or a method by which these metrics may be chosen. As a result, researchers may make unguided decisions about which metrics to extract from the data, choosing them for convenience or personal preference. Such choices can reduce the accuracy of the model by including metrics with little relevance to the problem and excluding metrics of high relevance. In this paper, we outline a method for choosing metrics based on empirical evidence of natural differences between the genders, and make recommendations on the choice of metrics. The efficacy of our method is then tested using a logistic regression model.
Received on 29 November 2017; accepted on 02 December 2017; published on 07 December 2017


Introduction
The popularity of online social networks, online forums, and various online dating sites has significantly increased the visibility of online users' personal information. However, these online sites also allow a great deal of anonymity in the sense that a user's identity is tied to the user's account but not personally to the user. This anonymity has been exploited by impostors, such as sexual predators, who lie about their gender or age for malicious purposes, while a victim user has little way of verifying that the provided information is valid. To date, very little has been done to address this problem of fake online personal identity. A strict registration policy, such as providing legal documents, is simply not feasible for regulating this problem.
One promising alternative involves the use of physical or behavioral biometrics, such as keystroke dynamics or mouse dynamics, to enhance user authentication. These biometrics are non-invasive and can be used actively as a confirmation step or passively through continuous re-authentication to determine the demographic characteristics of a user. However, previous soft biometric systems tend to take a heavily data-driven approach based on simple aggregate measures (e.g., averages) of behavioral metrics. In this paper, we present a new naturalistic approach to using behavioral biometrics for verifying an online user's demographics. We illustrate the advantages of this approach by applying mouse biometrics to discriminate a user's gender. Our approach takes advantage of intra-user variability in mouse movements, and has the potential to overcome generalizability issues when using mouse biometrics for user verification.
The proposed approach is mainly based on two important assumptions regarding naturally occurring mouse movements: (1) Gender differences naturally exist when performing two-dimensional aiming movements of a hand-held device. The support for this assumption comes from a variety of basic and applied research domains, which include occupational health, physical therapy, public health, ergonomics, human anatomy, and perceptual-motor control theory. (2) The gender differences alluded to in the first assumption can be further elaborated by tracking the changes to naturally occurring mouse movements that are imposed by different target parameters. These target parameters are defined by the horizontal and vertical distances between the start and endpoint target locations, and by the size of the endpoint target. All three task parameters are known to affect aiming movements [12,34,37], while recent research in perceptual-motor control has highlighted that gender can also mediate these effects [5,30,31].
As a result of these two assumptions, this approach incorporates a much wider array of mouse movement metrics than those used in previous security applications of mouse biometrics. Consequently, the data analysis of these metrics required a different statistical approach from that used in traditional investigations of mouse biometrics. Twenty-one different mouse movement metrics (temporal, spatial, and accuracy) were extracted from the movements recorded, and then each metric was expressed as a vector of four variables. The four variables correspond to the intercept and three unstandardized regression coefficients that are obtained from a multiple regression equation formulated to predict each metric using the three target parameters (vertical distance, horizontal distance, and target size). Binary logistic regressions were then employed to predict each participant's gender using an optimal subset of the multiple regression coefficients.
In this paper, the model described above is built and validated in order to test its viability for use in classifiers. The goal of this validation is to test whether the model can perform better than guessing gender based on the distribution of the population. The proposed model was validated with mouse movement data collected from 94 participants (45 male and 49 female) who each performed 256 movement trials. The model's accuracy was tested on both labeled and unlabeled data. The labeled data are used as a verification step to test our method's ability to accurately fit the model to the real data and to identify a user with uncommon mouse movement characteristics as an outlier, while the unlabeled data are used to test the ability to accurately classify a user who has not been seen before. Based on the evaluation results for both labeled and unlabeled data, an analysis was further performed to test the impact that outliers, i.e., users with mouse movement characteristics greatly different from the average, would have on the model. The maximum accuracy achieved is 89.4% for the full labeled data set and 100% after removing outliers, and 72.4% for the unlabeled data set and 75.9% after removing outliers.
The remainder of the paper is structured as follows. Section 2 describes the logic behind the naturalistic approach, along with a summary of related work. Section 3 details the methodology used to collect data, filter data, and extract the metrics from the data to be used for gender classification. Section 4 presents the two analysis steps used in building the statistical models for predicting the gender of each participant. Section 5 reports the results of testing the statistical models. Section 6 reviews the findings and limitations of the study and describes future directions for this naturalistic approach. Finally, Section 7 summarizes the paper.

Background
In this section, we first highlight gender differences in anthropometrics, including the differences they induce in movement behaviors and grip postures. We then present the background of using behavioral biometrics for user authentication.

Gender Difference in Anthropometrics
Men and women clearly differ in their physical dimensions as described by anthropometric data recorded in many countries for the purposes of monitoring public health and designing ergonomically sound work environments. Figure 1 illustrates the important anthropometric attributes of an individual working with a typical computer system. Maneuvering a computer mouse across a two-dimensional work space requires the complex coordination of the upper and lower arms in combination with the wrist and fingers. As shown in Figure 1, the anthropometric data for the upper arm length (reported by the United States Health Department [1]) reveal large, consistent gender differences in the physical dimensions of a key limb component for moving a mouse on a table top. Physical differences like these arguably underlie many of the movement and grip differences that will be described in the remainder of this section [20].
Moving a computer mouse is classified as an aiming movement by researchers in the field of motor behavior, and aiming movements are generally composed of consistent temporal and spatial characteristics. An aiming movement typically includes a ballistic component (a single phase of acceleration followed by deceleration) that corresponds to the main movement of the hand into the general area of the target location. The ballistic component is followed by a sequence of sub-movements (multiple phases of acceleration and deceleration) that consist of small spatial corrections of the hand to reach the final target destination [27]. The field of motor behavior suggests that men and women differ in their aiming movements, with men tending to move faster than women and with less accuracy [5,7,10,30,36]. For example, Barral and Dabu [5] reported gender differences in aiming movements between two target plates with both preferred and non-preferred hands. Their results showed that men had higher peak velocities and higher absolute error (the absolute value of the difference between the end point and the target). It was also reported that the location of the target in relation to the hand being used affected the accuracy of movements made by men, but showed no significant effect on women's movements [30]. This highlights the importance of including target parameters (target size, horizontal distance, and vertical distance) when examining gender differences in mouse biometrics. Rohr [30] conducted a mouse pointing task where participants moved the mouse from a starting point near the bottom edge of the monitor to targets with different values of the index of difficulty (ID) provided by Fitts' law [12]. Rohr found that women produced greater accuracy and longer movement deceleration times for the ballistic component of these movements. Interestingly, male participants in this study were less accurate as ID increased, whereas the female participants' accuracy remained unchanged. However, female participants increased the deceleration times of the ballistic component of their movements as ID increased, but the male participants did not. These results not only highlight gender differences in movement behavior again, but also stress the importance of incorporating target parameter effects when investigating these gender differences. Here the target parameters include target size, horizontal distance, and vertical distance.
Research in physical therapy that has examined the effects of mouse use on wrist and arm pain in computer users has shown gender differences in hand and arm postures when performing movements with a mouse. A study on the finger postures of mouse users showed that men more frequently adopted a lifted finger posture, in which the middle portion of the finger used for mouse clicking was not in contact with the mouse [22]. Male participants in this study were also more likely to show an extended finger posture with a flexion angle of less than 15 degrees when gripping the mouse (refer to Figure 2 for an illustration of relevant movement terms). These different grip postures may not only affect mouse movement characteristics, but also influence mouse button presses, which can be an important component of mouse biometrics. Johnson et al. [18] found that women exerted more relative force on the mouse when gripping it, while Wahlstrom et al. [39] reported that women exerted more force on the mouse button while pressing it. Johnson and colleagues also revealed different wrist postures between men and women when moving the mouse, with women showing higher wrist extensions, larger ulnar deviations (refer to Figure 2), a larger range of motion in the wrist, and higher wrist velocities. A similar study by Yang and Cho [41] reported larger elbow flexion angles in men as well as different ulnar deviations, but in this study it was the men who exhibited the larger ulnar deviation angles. The results of these studies suggest that mouse biometrics should consider not only the movement characteristics of aiming movements, but also the movement characteristics unique to the physical manipulation of gripping a computer mouse.

Behavioral Biometrics
The use of biometrics is an attractive option for user authentication since it is inherently based on "who you are," and unlike other conventional methods cannot be lost, forgotten, or stolen. A large variety of user characteristics are used in biometric identification, with some involving physiological recording, such as iris scanning, fingerprint scanning, facial recognition, and pulse recording [29] (which records the response at the palm of one hand while a low-voltage electrical current is sent through the body from the other palm), and some involving behavioral recording, such as keystroke and mouse dynamics [40]. The behavioral biometric systems, however, have the distinct advantage of not requiring specialized hardware to record the user behaviors. Research interest in behavioral biometrics started in the 1990s with the study of keystroke dynamics [24], which eventually led to research involving keystroke dynamics combined with mouse dynamics [2].
Behavioral biometrics have been used in the past to predict the gender of a user, but these studies have primarily focused on keystroke dynamics. Fairhurst and Da Costa-Abreu [11] conducted a study using a multi-classifier system on the GREYC-keystroke database [13]. They combined three simple classifiers (K-Nearest Neighbors, Decision Trees, and Naive Bayesian Learning) into a multi-classifier system by incorporating three fusion techniques (Dynamic Classifier Selection based on local accuracy, Majority Vote, and Sum). They reported an accuracy for gender prediction of 95%. Giot et al. [14] conducted a similar study using fixed-text input for gender prediction and reported an accuracy of 91%. They also reported that traditional keystroke authentication systems had an accuracy increase of 20% when combined with the user's gender prediction model. These studies achieve impressive accuracy for gender classification, but further research is required to determine if these results can be generalized to different sets of keyboard data that are not fixed, as well as to different types of keyboard interfaces. In addition, authentication systems based on keyboard dynamics may not be suited to new graphical password interfaces (see Biddle et al. for a survey of these interfaces [6]). Another study in keystroke dynamics shows that it is possible to protect against bot attacks that attempt to mimic human behaviors [35].
There is, however, still merit in studying mouse biometrics for gender prediction. There are certain systems and applications in which the mouse is the predominant source of input as opposed to the keyboard. Should one wish to perform continuous authentication with a behavioral biometrics component on such a system or application, it would be better to use the mouse input. Additionally, the movement profiles derived from targeted mouse movements allow for the extraction of a richer set of features than a keyboard. These features could be used to complement a keyboard system and increase its accuracy by capturing characteristics of the user that were not apparent in keyboard features.
The study of mouse biometrics has mostly involved authentication of the user's identity, and although steady improvement has been seen with this approach, it has not achieved the same level of success as keyboard dynamics [19]. In one of the successful studies, mouse dynamics were employed as a means of re-authentication to discriminate the identities of web browser users [28]. Ahmed et al. [3] used neural networks to learn a user's mouse dynamics in a specific environment while performing continuous identity authentication. Hamdy and Traore [15] combined mouse dynamics with cognitive measures of visual search capability and short-term memory to create a static user verification system. Their system presented the user with an on-screen keyboard with randomly shuffled characters, which the user needed to click with the mouse, and used both simple sum and weighted sum statistical fusion methods. These studies highlight the utility of using mouse biometrics in user re-authentication; however, their findings are limited to identity authentication and have not been generalized to other purposes. To the best of our knowledge, no previous studies have reported the use of mouse biometrics to classify users' gender.

Methodology
This section describes the apparatus and method used for data collection, as well as the data analysis procedures.

Data Collection
Ninety-four participants (45 men and 49 women) aged between 17 and 48 years took part in this study. The participants consisted of students, faculty, and staff who were all experienced computer mouse users. The male and female participants did not differ statistically with respect to prior computer use experience or age.
All participants were seated in a static non-reclining chair in front of a computer monitor with the right hand resting comfortably on the same mouse and table surface used by all participants. Participants were instructed to find a seating location and arm posture in which moving the mouse would feel the most natural to them. They were requested to maintain this posture while conducting all experiment trials.
Raw mouse movement data were collected using an application implemented in the Processing programming language. The same home (starting point) target was used on all trials and was displayed within an application window. Once a participant positioned the cursor on the home target and clicked the mouse button, this target was hidden and a new endpoint target was displayed. The screen position of the mouse was recorded at a rate of approximately 100 Hz, with each data point consisting of a timestamp, the x screen coordinate, the y screen coordinate, and a tag that identified what type of movement event was recorded. The movement events consisted of a standard movement event (mouse stationary or in motion without the left button being depressed), a target click event (left mouse button depressed while the mouse cursor is located inside the target area), a click event (left mouse button depressed while the cursor is outside of the target area), and a new target event (a new target is displayed and the location and size of the target are recorded instead of the mouse location).
The display window consisted of a rectangular frame (1680 px × 1050 px) displayed on a 45 × 30 cm computer monitor. As Figure 3 shows, the home target consisted of a blue 30 px radius circle located in the center of the display window. All endpoint targets were displayed as red circles and consisted of one of two possible target sizes (30 px or 60 px radius) located at one of 16 possible locations. The endpoint target locations varied in their direction of approach and in their distance from the starting target position.
Each participant was instructed to move the mouse cursor from the home target to the endpoint target. Once the participants had located the cursor in the home target circle, they were requested to click the mouse button to start the trial. The participants were instructed to pick up the mouse only when readjusting the starting position of their hands on the table, that is, while moving the screen cursor back to the home target. Each participant conducted a sequence of 32 practice trials that consisted of all 32 possible combinations of target size, target distance, and angle of approach as described above. After successfully completing the practice trials, each participant then performed four blocks of 64 movement trials, with each block consisting of a random sequence of two trials for each combination of the 16 target locations and 2 target sizes. The participants were allowed to take a short rest after completing each block of movement trials.
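The block structure described above (4 blocks × 64 trials = 256 trials per participant) can be sketched as follows. This is only an illustration under stated assumptions: the 16 endpoint locations are represented by hypothetical indices 0 to 15, since their coordinates are not given here, and the function names are ours, not those of the original Processing application.

```python
import random

TARGET_LOCATIONS = range(16)   # hypothetical indices for the 16 endpoint locations
TARGET_RADII_PX = (30, 60)     # the two endpoint target sizes given in the text
REPEATS_PER_BLOCK = 2          # two trials per location/size combination
BLOCKS = 4                     # four blocks per participant


def make_block(rng):
    """Return one randomly ordered block of (location, radius) trials."""
    trials = [
        (location, radius)
        for location in TARGET_LOCATIONS
        for radius in TARGET_RADII_PX
        for _ in range(REPEATS_PER_BLOCK)
    ]
    rng.shuffle(trials)
    return trials


def make_session(seed=None):
    """Return the four experimental blocks for one participant."""
    rng = random.Random(seed)
    return [make_block(rng) for _ in range(BLOCKS)]


session = make_session(seed=0)
total_trials = sum(len(block) for block in session)
```

Each block contains 16 × 2 × 2 = 64 trials, so `total_trials` matches the 256 movement trials per participant reported in the paper.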

Movement Metrics
The profiles of distance and velocity were extracted from the raw data of each movement trial. These profiles were used to calculate 10 temporal metrics that distinguish aiming movements and button presses. Each movement was then smoothed, and 6 spatial metrics were calculated to highlight differences in the trajectory. Five accuracy metrics were also calculated for each mouse movement. Following the naturalistic approach, the choice of these metrics was guided by previous empirical research on gender differences in aiming movements that has used the same or similar metrics [5,7,10,16,30,31,36]. For example, researchers have reported that men are quicker at perceiving object location, faster in their movements, rely less on visual guidance of the ballistic component of the movement, perform fewer visual corrections towards the endpoint of the movement, and are less accurate when they reach the endpoint of the movement. Some additional metrics were calculated because prior empirical research implies that gender differences are possible for these mouse metrics even if they were not reported in the actual studies. For example, males and females differ in their grip postures of the mouse and positioning of the finger over the mouse button [18,22,41], implying that gender differences could exist for metrics influenced by these grip postures.
Profiles. The distance profile was calculated from the Euclidean distance traveled between consecutive movement events, and smoothed using a Kolmogorov-Zurbenko (KZ) filter. The KZ filter belongs to the low-pass filter class, and is a series of k iterations of a moving average with a window size of m, which is a positive odd integer. The KZ filter repeatedly runs a moving average filter, with the initial input being the original data and the result of the previous run of the moving average as each subsequent input. With this in mind, the first iteration of a KZ filter over a process X(t) can be defined as:
$$KZ_{m,1}[X(t)] = \sum_{s=-(m-1)/2}^{(m-1)/2} \frac{1}{m}\, X(t+s),$$
the second iteration as:
$$KZ_{m,2}[X(t)] = \sum_{s=-(m-1)/2}^{(m-1)/2} \frac{1}{m}\, KZ_{m,1}[X(t+s)],$$
and so on.
In this study, we set m to 11 and k to 3. The value of m = 11 was chosen such that the window over which the data are averaged corresponds to 100 milliseconds or more. The window can thus cover a period of time containing an intentional movement, since shorter windows are likely to capture only jitter. The value 11 was chosen instead of 10 because the value of m needs to be odd. The value k = 3 was chosen because 3 was the smallest value that produced a smooth curve.
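As an illustration of the smoothing step, the following minimal sketch applies k passes of a centered moving average of width m. The treatment of the series edges (shrinking the window where fewer than m samples are available) is our assumption, as the text does not state how the boundaries were handled, and the function names are hypothetical.

```python
def moving_average(values, m):
    """One pass of a centered moving average with odd window size m.

    Assumption: near the edges the window is truncated to the samples
    that exist, rather than padded.
    """
    if m < 1 or m % 2 == 0:
        raise ValueError("window size m must be a positive odd integer")
    half = m // 2
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half):min(len(values), i + half + 1)]
        smoothed.append(sum(window) / len(window))
    return smoothed


def kz_filter(values, m=11, k=3):
    """Kolmogorov-Zurbenko filter: k iterations of the moving average."""
    result = list(values)
    for _ in range(k):
        result = moving_average(result, m)
    return result


# A single spike is spread over its neighbours after one pass with m = 3.
example = kz_filter([0, 0, 3, 0, 0], m=3, k=1)
```

With samples arriving at roughly 100 Hz, the default m = 11 spans about 110 ms, which is consistent with the "100 milliseconds or more" criterion stated above.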
The velocity profile was then calculated from sets of pairs (t, v_t), where v_t is the average velocity in pixels per millisecond (px/ms) over the time interval between t and the time at which the previous data point was recorded.
Aimed movements generally produce velocity profiles that are composed of one large peak (peak velocity) called the ballistic component, followed by zero or more smaller peaks that reflect sub-movements used to position the cursor on the target (Figure 4). The velocity profile was used to calculate the temporal features of the mouse dynamics recorded.
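The velocity profile described above can be sketched as below. The sample layout `(timestamp_ms, x, y)` and the function name are assumptions made for illustration, and this sketch works on raw positions; in the study the distance profile was smoothed with the KZ filter before the velocities were derived.

```python
import math


def velocity_profile(samples):
    """Return (t, v_t) pairs from (timestamp_ms, x, y) samples.

    v_t is the average velocity in px/ms between a sample and the one
    recorded immediately before it. Samples with a non-increasing
    timestamp are skipped (an assumption, to avoid division by zero).
    """
    profile = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue
        distance = math.hypot(x1 - x0, y1 - y0)
        profile.append((t1, distance / elapsed))
    return profile


# Three samples 10 ms apart: a 5 px move followed by no movement.
example = velocity_profile([(0, 0, 0), (10, 3, 4), (20, 3, 4)])
```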

Temporal movement and button press metrics.
• Reaction time (RT): the time from when the endpoint target appears on screen until the movement towards it is initiated. The onset of the movement was defined as the point at which movement velocity exceeded 7% of the peak velocity (Figure 4). Various methods were tested for determining the beginning point of movements, including measuring the slope of the velocity profile, the pixels moved during consecutive time steps, and the percentage of peak velocity exceeded, using a visual inspection of a randomly selected group of trials and a set of known edge cases. Through this testing, we found that using the percentage of peak velocity exceeded, with a value of 7%, was the most effective solution.
• Peak velocity (PV): the maximum velocity in the movement (Figure 4).
• Time to peak velocity (TPV): the time interval from the beginning of the movement until the peak velocity was reached (Figure 4).
• Duration of ballistic component (DB): the time interval from the beginning of the movement until the first local minimum on the velocity profile following the peak velocity (Figure 4).
• Time to click (TC): the time interval between the arrival at the endpoint of the movement and the pressing of the mouse button.
• Hold time (HT): the amount of time the user held the mouse button down after the endpoint of the movement was reached.
• Movement time (MT): the time interval from the beginning of the movement until the endpoint of the movement.
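To make the definitions of RT, PV, TPV, and DB concrete, the following sketch derives them from a velocity profile given as `(t, v_t)` pairs. The function name, the `target_shown_at` parameter, and the exact local-minimum test are our assumptions; the remaining temporal metrics (TC, HT, MT) depend on click events and on the movement endpoint, which this sketch does not model.

```python
def temporal_metrics(profile, target_shown_at, onset_fraction=0.07):
    """Compute RT, PV, TPV and DB from a list of (t_ms, velocity) pairs."""
    peak_index = max(range(len(profile)), key=lambda i: profile[i][1])
    peak_time, peak_velocity = profile[peak_index]
    if peak_velocity <= 0:
        raise ValueError("no movement recorded in this trial")

    # Movement onset: first sample whose velocity exceeds 7% of the peak.
    threshold = onset_fraction * peak_velocity
    onset_time = next(t for t, v in profile[:peak_index + 1] if v > threshold)

    # End of the ballistic component: first local minimum after the peak.
    # Assumption: if none is found, the last sample is used.
    end_index = len(profile) - 1
    for i in range(peak_index + 1, len(profile) - 1):
        if profile[i][1] <= profile[i - 1][1] and profile[i][1] < profile[i + 1][1]:
            end_index = i
            break

    return {
        "RT": onset_time - target_shown_at,
        "PV": peak_velocity,
        "TPV": peak_time - onset_time,
        "DB": profile[end_index][0] - onset_time,
    }


example_profile = [(10, 0.0), (20, 0.01), (30, 0.5), (40, 1.0),
                   (50, 0.6), (60, 0.2), (70, 0.3), (80, 0.0)]
example = temporal_metrics(example_profile, target_shown_at=0)
```

In this toy profile the movement starts at 30 ms, peaks at 40 ms, and the ballistic component ends at the dip at 60 ms, before a small corrective sub-movement.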
Spatial movement metrics. These metrics are calculated from the spatial trajectory traveled by the mouse cursor to reach the endpoint of the movement.
• Path length (PL): the total distance traveled by the mouse cursor during the trial. It is calculated by summing the change in distance for each time step, starting at time 1 and finishing at the final time step, as follows:
$$PL = \sum_{t=1}^{T} \Delta d_t,$$
where T is the total number of time steps in the trial, and ∆d_t represents the distance traveled between time t-1 and time t.
• Path length to best path ratio (PLR): the value of the path length divided by the length of the shortest path between the start and endpoints of the movement.
• Task axis crossings (TXC): the number of times that the movement path crossed the task axis. The task axis is defined as a straight line between the home target and the endpoint (Figure 5).
• Movement direction changes (MDC): the number of times the movement changed direction perpendicular to the task axis (Figure 5).
• Orthogonal movement changes (OMC): the number of times the movement changed direction parallel to the task axis (Figure 5).
• Movement variability (MV): the standard deviation of the distance of the movement path to the task axis. This metric measures the spatial consistency of the movement path.
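A minimal sketch of four of the spatial metrics (PL, PLR, TXC, and MV) is given below. Several details are assumptions on our part: distances to the task axis are taken as signed perpendicular offsets, samples lying exactly on the axis are ignored when counting crossings, and the population standard deviation is used for MV. MDC and OMC are omitted, and the function name is hypothetical.

```python
import math
import statistics


def spatial_metrics(path, start, end):
    """Compute PL, PLR, TXC and MV for a path of (x, y) cursor positions."""
    best_path = math.dist(start, end)
    if best_path == 0 or len(path) < 2:
        raise ValueError("need a non-degenerate task axis and at least two samples")

    path_length = sum(math.dist(p, q) for p, q in zip(path, path[1:]))

    # Signed perpendicular offset of every sample from the task axis.
    axis_x, axis_y = end[0] - start[0], end[1] - start[1]
    offsets = [
        ((x - start[0]) * axis_y - (y - start[1]) * axis_x) / best_path
        for x, y in path
    ]

    # A crossing is counted whenever the path switches sides of the axis.
    crossings = 0
    previous_side = 0
    for offset in offsets:
        side = (offset > 0) - (offset < 0)
        if side == 0:
            continue
        if previous_side != 0 and side != previous_side:
            crossings += 1
        previous_side = side

    return {
        "PL": path_length,
        "PLR": path_length / best_path,
        "TXC": crossings,
        "MV": statistics.pstdev(offsets),
    }


example = spatial_metrics([(0, 0), (1, 1), (2, -1), (3, 0)], start=(0, 0), end=(3, 0))
```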
Movement accuracy metrics. These metrics represent how closely a participant came to clicking the center of the endpoint target.
• Absolute error (AE): absolute error corresponds to the Euclidean distance between the movement endpoint and the center of the endpoint target.
• Horizontal error (HE): the difference in the horizontal (x) coordinates between the movement endpoint and the center of the endpoint target. Negative errors reflect undershooting the target location, whereas positive errors reflect overshooting the target location.
• Vertical error (VE): the difference in the vertical (y) coordinates between the movement endpoint and the center of the endpoint target. Negative errors reflect undershooting the target location, whereas positive errors reflect overshooting the target location.
• Absolute horizontal error (AHE): the absolute value of the difference in the horizontal coordinates between the movement endpoint and the center of the endpoint target.
• Absolute vertical error (AVE): the absolute value of the difference in the vertical coordinates between the movement endpoint and the center of the endpoint target.
These errors are illustrated in Figure 6. The absolute error is the Euclidean distance between the end of a movement and the center of the endpoint target. The horizontal error corresponds to the difference in the x coordinates of the movement endpoint and the center of the endpoint target. The vertical error corresponds to the difference in the y coordinates of the movement endpoint and the center of the endpoint target. In both cases, a negative value indicates undershooting and a positive value indicates overshooting.
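The five accuracy metrics can be sketched as follows. The text states that negative errors mean undershooting, which requires the signed errors to be oriented along the direction of travel from the home target; multiplying the raw coordinate difference by that direction is our assumption about how this was done, and the function name is hypothetical.

```python
import math


def _direction(delta):
    """Return +1, -1 or 0 according to the sign of delta."""
    return (delta > 0) - (delta < 0)


def accuracy_metrics(home, target, click):
    """Compute AE, HE, VE, AHE and AVE for one trial.

    home, target and click are (x, y) positions of the home target
    center, the endpoint target center and the movement endpoint.
    """
    dx = click[0] - target[0]
    dy = click[1] - target[1]
    # Assumption: when there is no travel along an axis, keep the raw sign.
    sign_x = _direction(target[0] - home[0]) or 1
    sign_y = _direction(target[1] - home[1]) or 1
    return {
        "AE": math.hypot(dx, dy),
        "HE": dx * sign_x,   # negative = undershoot, positive = overshoot
        "VE": dy * sign_y,
        "AHE": abs(dx),
        "AVE": abs(dy),
    }


# Moving right and up-screen (negative y): stops 10 px short horizontally
# and 5 px beyond the target vertically.
example = accuracy_metrics(home=(0, 0), target=(100, -50), click=(90, -55))
```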

Data filtering
Before calculating the movement metrics for each participant as described above, the movement data were filtered to remove invalid trials in which mouse movements did not meet the criteria for successful movement recording. Trials in which mouse movements clearly left the designated screen window were rejected, as were trials in which the reaction time was less than 150 ms. The lower end of human reaction time is 100 ms; however, the method of determining the start of the movement is not perfect and causes some false positives, so a threshold of 150 ms was chosen. The same visual testing used for determining the movement onset was used here, and we found that a value of 150 ms struck a good balance between the false positive ratio and the false negative ratio when determining whether a reaction time value was realistic. Only 4% of the more than 24,000 recorded trials were rejected for these reasons.
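The two rejection criteria can be sketched as a simple predicate. The assumptions here are that cursor coordinates are expressed relative to the top-left corner of the 1680 px × 1050 px display window, and that any sample outside the window counts as having "clearly left" it; the study may have applied a more lenient test.

```python
WINDOW_WIDTH_PX = 1680        # display window size given in the text
WINDOW_HEIGHT_PX = 1050
MIN_REACTION_TIME_MS = 150    # reaction-time threshold given in the text


def is_valid_trial(path, reaction_time_ms):
    """Return True if a trial passes both filtering criteria.

    path is a list of (x, y) cursor positions in window coordinates
    (an assumption; see the note above).
    """
    if reaction_time_ms < MIN_REACTION_TIME_MS:
        return False
    return all(
        0 <= x < WINDOW_WIDTH_PX and 0 <= y < WINDOW_HEIGHT_PX
        for x, y in path
    )


trials = [
    {"path": [(840, 525), (900, 500)], "rt": 240},   # kept
    {"path": [(840, 525), (900, 500)], "rt": 90},    # reaction time too short
    {"path": [(840, 525), (1700, 500)], "rt": 240},  # left the window
]
kept = [t for t in trials if is_valid_trial(t["path"], t["rt"])]
```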

Model design
The gender classification model results from a two-step procedure of statistical analyses. The first step involves conducting least-squares multiple regressions to determine the effects of target parameters (target size, horizontal distance, and vertical distance) on movement metrics for each participant. The resulting unstandardized regression coefficients provide a movement signature for each participant, which will be used to distinguish the corresponding participant's gender. The second step involves conducting logistic regressions to select the statistical model that most accurately classifies participants by gender.

Mouse signatures
Traditional analyses of mouse biometrics usually rely on a single aggregate indicator (e.g., average) for each metric. Unfortunately, previous studies have shown that this approach may be ineffective. For example, in the study conducted by Rohr [30], men's accuracy decreased as the target was made smaller and placed farther away, whereas women's accuracy remained more consistent. Simply taking the average accuracy would diminish or lose the effect that target size and distance had on men, since the lower values would counteract the higher values. It is therefore necessary to produce features that capture not only the actual values observed in the data, but also the amount of change that the target parameters caused. Our approach involves a more detailed analysis that incorporates the effects of target parameters on these mouse metrics.
The effects of target parameters on the mouse metrics were quantified by unstandardized regression coefficients obtained from a multiple linear regression analysis with least squares fitting conducted for each metric. Multiple regression analyses predict the scores of a dependent variable y by fitting a straight line defined by a set of independent variables {x_1, x_2, x_3, ...} to a set of known data points (y_i, x_{1,i}, x_{2,i}, ...) such that it satisfies the equation:
$$y_i = a + \sum_k b_k x_{k,i} + \varepsilon_i,$$
where a and b_k are unknown constants that are estimated, and ε_i is the residual, defined as the vertical deviation of the known data from the estimated line. If the estimated line is a perfect fit, all values of ε_i are zero.
The least squares fitting method estimates the values of a and b_k by minimizing the sum of the squared residuals:
$$\sum_i \varepsilon_i^2 = \sum_i \Big( y_i - a - \sum_k b_k x_{k,i} \Big)^2.$$
Three target parameters were chosen as predictor variables for these multiple regressions: the size of the endpoint target, the vertical distance between the home and endpoint targets, and the horizontal distance between the home and endpoint targets. The target distance was measured in separate horizontal and vertical components because prior research suggests that these components should be more influential on aiming movements than more complex combinations of the angle of approach and distance moved [38]. Absolute values were used for the distances traversed because previous research also suggests that the direction of movement (left vs. right and up vs. down) does not affect movement metrics as much as whether the movement is vertical or horizontal [8,9]. Consequently, the size and sign of the regression coefficients for the distance variables simply represent how much of an effect moving vertically or moving horizontally had on the predictability of a metric.
For each metric recorded, three regression coefficients and the intercept value were provided to highlight the effect of these target parameters on the metric. For example, if the peak velocity (PV) was used as the dependent variable, four values were provided for this metric: the intercept value PV_const, the regression coefficient for horizontal distance moved PV_horz, the regression coefficient for vertical distance moved PV_vert, and the regression coefficient for target size PV_size. This results in a metric vector for the peak velocity that specifies the following equation:
$$PV = PV_{const} + PV_{horz} \cdot d_{horz} + PV_{vert} \cdot d_{vert} + PV_{size} \cdot s,$$
where $d_{horz}$ and $d_{vert}$ are the absolute horizontal and vertical distances between the home and endpoint targets, and $s$ is the size of the endpoint target. It was expected that these regression variables would better reveal gender differences in the metrics. This assumption is supported by 4-way ANOVAs (gender × target size × distance × angle of approach) that were conducted for each metric. The significant results of these ANOVAs are summarized in Table 1. These results clearly show that many of the metrics revealed consistent target parameter effects, and these effects could be mediated by gender.
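As an illustration only, the following Python sketch shows how the four features of one metric could be computed for one participant by ordinary least squares. The function names, the trial tuple layout, and the use of the normal equations are assumptions made for this example; the text does not state which software or solver was used.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides at once.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: target parameters are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def metric_features(trials):
    """Return [const, horz, vert, size] for one metric of one participant.

    trials: list of (horizontal_distance, vertical_distance, target_size,
    metric_value) tuples; this layout is hypothetical.
    """
    # Absolute distances are used, as described in the text.
    rows = [[1.0, abs(h), abs(v), s] for h, v, s, _ in trials]
    ys = [y for *_, y in trials]
    p = 4  # intercept plus three target parameters
    # Normal equations (X^T X) b = X^T y minimize the sum of squared residuals.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)
```

Applying `metric_features` once per metric and concatenating the results would give the 84-element feature vector (21 metrics × 4 features) for a participant. A library least squares routine would normally be preferred over hand-written elimination; the explicit version is used here only to keep the sketch self-contained.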

Gender prediction model
The second step in developing a gender prediction model is to enter the metric variables obtained from each participant into a logistic regression that predicts the participant's gender. Logistic regression is often used for classification when the dependent variable is binary. The curve used in this type of regression is an S-shaped curve bounded asymptotically between 0 and 1, and is derived from the following linear relation:
$$\mathrm{logit}(P) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots,$$
where $\mathrm{logit}(P)$ refers to the natural logarithm of the odds, defined as follows:
$$\mathrm{logit}(P) = \ln\left(\frac{P}{1 - P}\right).$$
This function can then be substituted into the original linear relation and solved for $P$, giving the formula:
$$P = \frac{e^{\alpha + \beta_1 x_1 + \beta_2 x_2 + \dots}}{1 + e^{\alpha + \beta_1 x_1 + \beta_2 x_2 + \dots}},$$
where $P$ is the probability that the dependent variable has the outcome coded as 1 given the values of $x_i$.
The values of the constant $\alpha$ and the coefficients $\beta_i$ are determined by maximizing the conditional probability of the observed data, given the parameters used as predictors. An initial model is constructed with arbitrary values for the coefficients, and the conditional probability is evaluated. The coefficients are then modified in order to increase this probability, and the procedure is repeated until the model converges or a maximum number of iterations is reached. A maximum of 20 iterations was allowed to determine the values of the coefficients, and the resulting probabilities were classified with a threshold value of 0.5 (i.e., values above 0.5 were classified as male and values no larger than 0.5 were classified as female). This model also makes no assumptions about continuity, distribution, and correlation of the independent variable data.
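The fitting and thresholding steps can be sketched in Python as follows. This is a minimal illustration, not the procedure used in the study: the optimizer here is plain gradient ascent on the log-likelihood, which typically needs far more than 20 iterations, and the function names, learning rate, and iteration limit are assumptions chosen for the example.

```python
import math


def sigmoid(z):
    """Numerically stable form of e^z / (1 + e^z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_probability(alpha, betas, x):
    """P(outcome coded as 1 | x) under the logistic model."""
    return sigmoid(alpha + sum(b * xi for b, xi in zip(betas, x)))


def classify(probability, threshold=0.5):
    """Return 1 (male) above the threshold, otherwise 0 (female)."""
    return 1 if probability > threshold else 0


def fit_logistic(xs, ys, learning_rate=0.1, max_iter=2000, tol=1e-8):
    """Estimate alpha and betas by gradient ascent on the log-likelihood.

    xs: list of feature vectors; ys: list of 0/1 labels.
    """
    n_features = len(xs[0])
    alpha, betas = 0.0, [0.0] * n_features
    for _ in range(max_iter):
        errors = [y - predict_probability(alpha, betas, x)
                  for x, y in zip(xs, ys)]
        # Gradient of the log-likelihood with respect to each parameter.
        grad_alpha = sum(errors)
        grad_betas = [sum(e * x[j] for e, x in zip(errors, xs))
                      for j in range(n_features)]
        alpha += learning_rate * grad_alpha
        betas = [b + learning_rate * g for b, g in zip(betas, grad_betas)]
        if max(abs(grad_alpha), *(abs(g) for g in grad_betas)) < tol:
            break  # converged
    return alpha, betas
```

In practice the features would usually be standardized before gradient ascent, and a statistics package with a Newton-type solver would converge in far fewer iterations.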

Evaluation
The accuracy of the proposed approach for classifying a user's gender was evaluated on both labeled and unlabeled data. The labeled data consisted of the full data set, while the unlabeled data test was performed with 70% of the participants used as the training set and the remaining 30% of participants used as the test set.

Labeled data analysis
In this section, we verify how well a model can be fit to the data and how accurate such a model is on users who have been seen before. We also use this step to identify any users with unusual characteristics as outliers. The purpose of this step is primarily to identify the prevalence of outliers in the population, but it also serves as a proof of concept to ensure that a regression model can be fit to the features used. The logistic regression model was tested on all 94 participants, but given the very large number of predictor variables (21 metrics × 4 metric features = 84 predictor variables), only smaller subsets of predictor variables were actually tested.
The first subset of predictor variables was determined by testing each metric separately. The four features of each metric were tested as a single group, separate from the features of the other metrics. A metric's variables were included in the first subset if they were statistically significant (p < 0.05) predictors of gender. The significant predictors included in this subset were: {HT_const, PV_horz, PB_size, TC_const, TC_horz, MDC_const, MDC_horz, MDC_size, AE_const}. To improve the overall accuracy of this model, additional predictor variables were included if they reached a moderate level of statistical significance (p < 0.1) in predicting gender when each metric was tested separately. Two additional variables were added to this subset of predictor variables: PB_const and PLR_vert. The amount of explained variance in gender classification using these two subsets of variables was 0.532 according to the Nagelkerke pseudo r-squared measure, and the classification accuracy based on this model was 75.5%.
The first subset of predictor variables was reduced from a total of 84 to 9 by examining each metric's predictive power one metric at a time. However, a better subset of predictors may be possible if multiple metrics are included in the initial logistic regression model. One way to reduce the number of tested metrics is to include only those metrics that showed significant gender effects in the previously conducted 4-way ANOVAs. These findings highlight the metrics that show consistent gender differences or interactions of gender with target parameters. We also included those metrics for which other researchers have published significant gender effects. The logistic regression model was tested again with a new subset of predictors that included the four variables for each of these metrics: {RT, HT, TPV, PB, PL, MV, AE, HE, TC, PV, AHE, AVE, VE}. The 52 predictor variables in this subset were added to the original subset with a stepwise method, and the following 10 new variables were revealed as significant predictors: {RT_size, RT_horz, RT_vert, TPV_vert, MV_const, MV_vert, MV_horz, PV_const, PV_vert, VE_const}. It is interesting to note that the eight metrics (RE, HT, TC, PV, AHE, AVE, HE, and VE) suggested to have gender effects in previously published results largely overlapped with the first subset (HT_const, PV_horz, TC_const, TC_horz, and RE_const), and it is also worth noting that three of these metrics also showed gender interactions in the ANOVAs. The remaining metrics chosen from the ANOVAs alone only overlapped with the PB metric (PB_size). The amount of explained variance after the addition of these variables to the final model was 0.676, and the resulting classification accuracy was 89.4%.
We now test the effects that outliers had on the model. Five users were identified as having scores that were more than two standard deviations away from the mean. These are likely users with mouse movement characteristics that do not entirely fit the average for their gender, since there can be an overlap of physical characteristics between the two populations and such an overlap affects the features being used. After the removal of these outliers, our model can discriminate the gender of the remaining 89 participants with an accuracy of 100%. It is difficult to uncover the actual causes for these outliers, and they can occur for a variety of reasons including, but not limited to, distraction or injury. In a real application, one would likely test for outliers at input time, and if an outlier is detected, the user would be asked to redo the input trials in the case of a one-time authentication. However, identifying the best methods to handle outliers is beyond the scope of this paper.
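The two-standard-deviation rule can be sketched as below. The text does not specify which per-user score was screened, so the `scores` list is a hypothetical stand-in (for example, a residual from the fitted model), and the use of the sample standard deviation is likewise an assumption.

```python
import statistics


def flag_outliers(scores, n_sd=2.0):
    """Return indices of scores more than n_sd standard deviations from the mean.

    scores: one numeric score per user (hypothetical; see lead-in).
    """
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation (assumed)
    if sd == 0:
        return []  # all scores identical, nothing can be an outlier
    return [i for i, s in enumerate(scores) if abs(s - mean) > n_sd * sd]
```

Note that in small samples a single extreme value inflates the standard deviation, so this rule can miss outliers; a robust alternative such as the median absolute deviation may be worth considering.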

Unlabeled data analysis
To evaluate the accuracy of our approach on unlabeled data, the movement data from 65 randomly selected participants were used as the training set to create the logistic regression model. The model was then tested on the movement data from the remaining 29 participants, who comprised the test set. The same variable selection procedure was followed with the unlabeled data as the one used for the labeled data, except that substantially fewer participants were involved in these selections.
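A participant-level split of this kind can be sketched as follows. The function name and the fixed seed are assumptions for illustration; the text only states that 65 participants were selected at random. Splitting by participant rather than by trial keeps all movements of a given user on one side of the split, so the test set contains only previously unseen users.

```python
import random


def split_participants(participant_ids, n_train, seed=0):
    """Randomly split participant IDs into a training set and a test set."""
    ids = list(participant_ids)
    if not 0 < n_train < len(ids):
        raise ValueError("n_train must leave at least one participant per set")
    # A seeded generator makes the split reproducible.
    random.Random(seed).shuffle(ids)
    return ids[:n_train], ids[n_train:]


# 94 participants: 65 for training and the remaining 29 for testing.
train_ids, test_ids = split_participants(range(94), n_train=65)
```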
The statistically significant predictors determined for subset one were: HT_const, TC_horz, MDC_const, MDC_size, MDC_horz, AE_const, AHE_const, AHE_horz, RT_const, PB_size, and VE_vert. Six of these predictor variables were consistent with the selections based on the full data set (labeled data). The fit of this model was tested on the training set and accounted for 0.449 of the explained variance in predicting gender, with a correct classification of 76.9% of the participants in the set. The second subset included the following predictor variables: {PV_const, PV_vert, PV_horz, MV_vert, RT_size, RT_vert, RT_horz}. All seven variables were included in the subset of the predictors obtained previously with the full data set (labeled data). This overlap shows that this feature selection method produces a set of features close to what is expected based on research in other fields. On the other hand, the features selected over the entire set may still be sensitive to the choice of training set, which one should be careful of when fitting the model. The fit of this model with the combined subsets was tested on the training set and accounted for 0.579 of the explained variance in predicting gender. This final model was tested on the test set and was able to achieve a gender classification accuracy of 72.4%. After removing the outliers identified previously in the labeled data analysis, the test set was then classified with an accuracy of 75.9%. These results suggest that outliers have a visible effect on the classifier, but the negative impact is relatively small.

Discussion
Men and women differ naturally, both physically and psychologically. The development of computer security tools can take advantage of these natural differences by focusing authentication procedures on these differences. This study used the naturalistic approach to successfully classify male and female participants by measuring the temporal, spatial, and accuracy characteristics of their mouse movements while evaluating how these mouse metrics were affected by target parameters. In this section, we will discuss (1) the impacts of our approach on accuracy and (2) the implications of our feature extraction method.

Accuracy
The measurement of one such metric, movement accuracy, will be used to exemplify this approach to the biometric analysis of mouse dynamics. Previous research with aiming movements has revealed gender differences in the spatial accuracy of these movements, with women being on average more accurate than men [5, 30]. However, this gender difference is actually more complicated than one suggested by simply comparing average errors, because target parameters (target size, distance moved, and direction of movement) can also differentially affect the movement accuracy of men and women [30]. In support of this premise, our study also found complex interaction effects of gender and target parameters on spatial error. Consequently, rather than just recording the mean accuracy of each participant's movements, a multiple regression analysis was conducted to predict spatial error using target parameters (size, horizontal distance, vertical distance) as predictor variables.
This novel approach to biometric analysis comes with some cost, because there are now four variables representing each metric's potential contribution to the prediction model. Given the relatively large number of movement features already required by our approach, a large number of predictor variables could be introduced to discriminate the gender of a participant using logistic regressions. Therefore, two criteria were followed to reduce the set of predictor variables for testing: (1) each metric was tested individually, and only those variables that were significant predictors of gender in these tests were included in the first subset of predictors; (2) all the metrics that produced significant ANOVA gender effects and those with gender effects suggested in prior research were included in a second subset. Our logistic regressions produced correct classification of a participant's gender at a rate of 89.4% to 100% for the labeled data and 72.4% to 75.9% for the unlabeled data. These results are very promising given the limited range of values provided for each target parameter in this study. These values are an improvement over the baseline in which one simply guesses that all participants belong to the larger group, which in this case would be female. Guessing by that distribution alone would be correct 52% of the time.
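The majority-class baseline mentioned above can be computed as in the following sketch. The function names are illustrative, and the 49/45 split used in the example is an assumption chosen only because it is consistent with the reported 52% for 94 participants; the exact gender counts are not stated in this section.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def majority_baseline(labels):
    """Accuracy obtained by always guessing the most common label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Hypothetical label distribution (assumed, see lead-in).
labels = ["F"] * 49 + ["M"] * 45
baseline = majority_baseline(labels)
```

A classifier is only informative to the extent that its accuracy on unseen users exceeds this baseline.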
The optimal classification accuracy was achieved after removing outliers from the labeled data set and from the training data set for the analysis of unlabeled data. It is unclear why a few participants had such discrepant mouse metrics, and further research is needed to rule out the possibility of introducing user behavioral outliers into data collection and evaluation. However, their effects on the unlabeled data were minor, indicating that they do not have a large impact on classifying previously unseen users.
Once the recording accuracies of the movement metrics have been established, the current procedure has very low computational overhead because it relies on simple statistical models for computing predictor variables and gender classification. A client machine can collect the raw movement data and then send it to a server for feature extraction and prediction of gender with minimal overhead and relatively low latency for the client. Consequently, static and continuous authentications are viable options with this approach. In fact, real-life mouse movements that are not constrained to an experimental manipulation, as was the case in the current study, should provide a larger range of target parameters and therefore better predictive accuracy. A larger, more diverse data set of participants would also facilitate the testing of this approach, because the majority of the participants in the current study were highly educated undergraduate college students.

Features
One major advantage of the naturalistic approach to biometric analysis is that the features used to create the predictive model are based on natural differences. Thus, the naturalistic approach has a universal, biological basis, and should be more generalizable than traditional data-driven approaches. The biological basis assumes the model to be independent of users' cultural backgrounds and applicable across different groups of computer users, such as those from different countries or different educational backgrounds. In other words, it would be more applicable to large multi-cultural populations than approaches using non-biological behaviors such as browsing habits.
To a lesser extent, the biological basis of the model can also provide generalizability across different physical computing environments, such as those with different table heights, table surfaces, or types of mice. In these different environments, the same biological processes (e.g., the same muscle groups) are used for moving a mouse, and mouse movements can be captured by the same techniques. However, the different environmental factors may still affect the outcome of the movements, producing differences in the data and in the values of aggregate metrics extracted from it. For example, some surfaces may allow much higher maximum velocities for a mouse than others, depending on their reflectivity and roughness. Unfortunately, this would require building a new model for each different environment with different outcomes, making it difficult to apply a generic model to domains such as the Web, where there is no control over the user's physical environment.
The problem of different physical environments can be addressed through the innovative feature extraction method discussed in Section 4. By using regression weights as the features for classification, the model no longer measures the aggregate value of the outcome of the movement. Instead, these features provide a measure of the interaction between the target and the movement rather than measuring the movement itself, and are far less affected by a changing physical environment than features based directly on aggregate values. This method for feature extraction can also prove useful when applied to other problems, in which it can determine if the groups are affected differently when interacting with certain parameters. In some cases, it may even be beneficial to standardize the data set in order to remove any impacts the raw values may have on the model fitting for the classifier. The downside to this method is that it requires more data and is more computationally intensive. A regression with an intercept and three predictors needs at least four data points per individual to be determined at all, and would likely require more to yield accurate values for the coefficients. It would also be necessary to run curve fitting every time the data of a new individual needs to be classified. These limitations should be considered before using this method to address a specific problem.

Future Work
A direct application for this method of feature extraction would be to explore the generalization across different hardware platforms and physical environments as discussed in Section 6.1. This method of feature extraction shows promise for producing a model that is hardware agnostic, because (1) its features are based on biological processes that are common to all targeted movements and (2) the features measure the interactions with targets rather than the raw movement values. This suggests that the model would remain applicable even if a user has a touch pad or is using the mouse on an unusual surface such as a carpeted floor. The feature extraction method can be applied to traditional mouse dynamics classifiers that distinguish individuals to see if it produces similar results on different platforms and environments. It would also provide extra information on the effect that these environments have on mouse movements.
Another extension to this work is to apply these classification methods to mobile devices. Mobile devices provide a unique opportunity for the study of gender classification, ballistic movements by human users, and their applications for authentication. When mobile users move their fingers across the screen, it provides a way to directly measure the hand movements, instead of measuring them through a proxy device such as a mouse. Mobile devices are also equipped with a set of sensors, such as the accelerometer and gyroscope, which can be used to gain more data about the movements or can provide a more direct measure of the size of a user's hand to strengthen the gender classification with the approximation of a finger size. Moreover, mobile users regularly interact with their devices through finger movements on the screen, allowing for the measurement of ballistic movements. The Android pattern unlock screen, the swipe keyboard, and swipe unlock all provide sources of movement data.
Finally, the current naturalistic approach can be extended to other uses of biometrics in computer security applications, such as predicting the age of a computer user from their mouse movements. Significant risks are associated with adult content web sites, and currently there is no proven biometrics-based procedure for authenticating a computer user's age. Analyzing the mouse movements provided by children could provide a solution to this problem. Developmental research has revealed consistent changes in the anthropometrics of children as they age (refer to Figure 7) that also parallel maturational changes in perceptual-motor control during this age period [4, 18, 23, 26]. However, this research has focused on young children, and basic motor control research is needed to determine if these developmental trends also apply to adolescent computer users. At the very least, distinguishing a child's mouse metrics from an older adult's metrics should be achievable and still very helpful. A similar approach could prove successful at the other end of the age spectrum, where older computer users are increasing in number while also being more susceptible to security breaches. Recent research with the elderly has revealed natural changes in the anthropometrics, biomechanics, and perceptual-motor control of these individuals [10, 17, 21, 25, 27, 32, 33, 36] that could be used to predict their age status (e.g., over 65 years of age) using this naturalistic approach to biometric analysis.

Conclusion
This paper proposes a naturalistic approach to computer security that focuses on the naturally occurring differences that are fundamental to biometric authentication success. The current paper provides a test of this approach for the gender classification of computer users based solely on mouse movements. The design rationale of our approach lies in the observation that men and women differ naturally in how they make mouse movements. We defined a series of temporal, spatial, and accuracy metrics to quantify the mouse movement differences between male and female users. In particular, we found that the metrics related to peak velocity, length of the deceleration phase, target accuracy, finger posture, and reaction time are relevant to gender classification. Ninety-four volunteers participated in this study, and a mouse signature was created for each participant. We evaluated the efficacy of our approach for gender classification by conducting binary logistic regression tests, and achieved promising results.
We also developed an innovative method of extracting features based on multiple regression. Instead of using the direct aggregate values of the data, this method uses coefficients from multiple regression to capture the interactions between the target parameters and the movement. This reduces the dependency on the raw values of the movement data and can make the model applicable to movements at different scales and possibly on different platforms. This method deserves further exploration in other fields in which it may be applicable. However, it also suffers from some limitations: it requires more data to fit the curve of the multiple regression, and it has an extra computational step that makes the entire process more computationally expensive.

Figure 1. Illustration of the major anatomical measurements relevant to using a mouse from a seated position. Graph of gender differences in upper limb length (data taken from Anthropometric Reference Data for Children and Adults: United States, 2007-2010; U.S. Department of Health and Human Services) [1].

Figure 2. Anatomical terms for motions of upper limb, wrist, and joints.

Figure 3. Illustration of screen target positions for movements of mouse cursor.Home target is shown in blue, and all endpoint targets in red.

Figure 4. Example of a velocity profile with various temporal metrics illustrated.
• Shape of the velocity profile (SV): a measure of the symmetry of the ballistic component, which is calculated by dividing the time to the peak velocity by the duration of the ballistic component (refer to Figure 4).
• Proportion of the ballistic component (PB): the proportion of the movement time taken up by the ballistic component, which is calculated by dividing the ballistic component duration by the movement time (refer to Figure 4).
• Number of movement corrections (NC): the total number of local maxima in the velocity profile after the ballistic component (refer to Figure 4).
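The three definitions above can be turned into a short Python sketch. Several details are assumptions made for illustration: the velocity profile is given as parallel lists of sample times and speeds, the ballistic component is taken to start at the first sample, and its end is supplied by the caller as an index (`ballistic_end`), since how that boundary is detected is not described here.

```python
def velocity_profile_metrics(times, speeds, ballistic_end):
    """Compute SV, PB, and NC from a sampled velocity profile.

    times: sample times in increasing order; speeds: speed at each sample;
    ballistic_end: index of the last sample of the ballistic component
    (assumed to be known, see lead-in).
    """
    start = times[0]
    movement_time = times[-1] - start
    ballistic_duration = times[ballistic_end] - start
    if movement_time <= 0 or ballistic_duration <= 0:
        raise ValueError("movement and ballistic durations must be positive")
    # Peak velocity is searched within the ballistic component only.
    peak_index = max(range(ballistic_end + 1), key=lambda i: speeds[i])
    sv = (times[peak_index] - start) / ballistic_duration
    pb = ballistic_duration / movement_time
    # Count strict local maxima after the ballistic component.
    nc = sum(
        1
        for i in range(ballistic_end + 1, len(speeds) - 1)
        if speeds[i - 1] < speeds[i] > speeds[i + 1]
    )
    return {"SV": sv, "PB": pb, "NC": nc}
```

With real recordings the speed signal would usually be smoothed first, because sensor noise produces spurious local maxima that inflate NC.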

Figure 5. Example of a mouse trajectory to illustrate differences between three movement change metrics with task axis drawn in a dashed line.

Figure 7. Graph of age differences in upper limb length for children aged 8-17 (data taken from Anthropometric Reference Data for Children and Adults: United States, 2007-2010; U.S. Department of Health and Human Services).

Table 1. Significant main effects and interactions found for 4-way ANOVAs (Gender × Distance × Angle of approach × Target size) conducted for each metric.

Table 2. Accuracy of predicted results. Labeled set refers to the full data set used in Section 4.1. Labeled 70% and unlabeled 30% refer to the training set and test set used in Section 4.2, respectively.