Mauka-Mauka: Measuring and Predicting Opportunities for Webcam-based Heart Rate Sensing in Workplace Environment

Prolonged sitting and physical inactivity at the workplace often lead to health risks such as diabetes, heart attack, and cancer. Many organizations are investing in wellness programs to ensure the well-being of their employees. Wearable devices are generally used in such wellness programs to detect health problems of employees, but studies have shown that wearables do not result in sustained adoption. Heart rate measurement has emerged as an effective tool to detect various ailments such as anxiety, stress, and cardiovascular diseases. Existing techniques use a webcam feed to sense heart rate, subject to experimental constraints such as stillness of the face and sufficient illumination. In this paper, we show that in-situ opportunities for webcam-based heart rate sensing in the workplace environment can be found and predicted by analyzing data from unobtrusive sensors in a pervasive manner.


INTRODUCTION
People are spending an increasing amount of time in workplace environments. With the evolution of computers, the number of desk jobs has increased phenomenally worldwide in the last few decades. In the United States, less than 20% of private sector jobs involve moderate levels of physical activity, a decrease of nearly 30% compared to the early 1960s. In the United Kingdom, nearly 4 out of 5 people have desk jobs [1].
Inactivity in the workplace is becoming a high risk factor for health problems such as diabetes, heart attack, and stroke, and for increased mortality, among others [2,3]. These health problems need to be detected early, and prescriptive messaging needs to be put in place for affected employees.
Organizations are investing in wellness programs to improve the social, mental, and physical health of their employees. These programs aim to improve quality of life, reduce medical expenditure, and improve the productivity of employees [4,5]. Wellness programs have started incorporating emerging wearable devices such as Fitbit, Nike+ FuelBand, and Jawbone UP for activity and physiological sensing in workplaces. It is estimated that more than 13 million wearable devices will be part of employee wellness programs by 2018 [5]. Wellness programs use these devices to sense physical activity, sleeping patterns, heart rate, etc., and employees are then encouraged to be more active in their day-to-day life through personalized recommendations/prescriptions, gamification, and various incentive programs. However, many recent studies have shown that wearable devices do not result in sustained adoption, and many people stop using them after some time. For example, a study in the US shows that more than half of those who bought a wearable device stopped using it [6]. Some of the known problems with wearable devices are that they are obtrusive and that users often forget to wear them. Therefore, there is a need for non-obtrusive and pervasive sensing techniques that can drive sustained adoption in the modern workplace environment. We envision that these pervasive sensing techniques can prove to be a low-cost alternative to wearable devices, or that they could complement existing solutions and thereby increase sensing coverage.
Recently, there have been research efforts to develop non-invasive and non-obtrusive sensing techniques that make opportunistic use of computing devices and peripherals used in day-to-day life [7]. Among these techniques, the webcam has been used to estimate the heart rate of a person, which is an important vital parameter for detecting various ailments such as anxiety, stress, and cardiovascular disease. The key idea behind webcam-based heart rate sensing is to track tiny changes in the color of the face caused by blood flow through the vessels and, from these, to estimate the heart rate of a person [8,9]. Similarly, there have been advancements in tracking respiratory rate from video feed captured by a webcam.
Due to the easy availability of webcam-equipped personal computing devices in workplaces, heart rate sensing could be very effective in the early detection of various ailments. Although there have been advancements in building mathematical and physiological models to increase the accuracy of heart rate sensing from webcam-based video feed [10], some practical challenges remain: these approaches require the face region to be detected, do not work in the presence of motion, and require sufficient illumination on the facial region. Also, little is known about their applicability in a typical workplace environment, i.e., how many opportunities exist in the day-to-day life of an employee where heart rate tracking is feasible. In essence, there can be three different sensing approaches to perform regular heart rate measurements in a workplace environment.
1. Always-on Sensing: In this approach, the webcam is always on and continuously records the user. In the continuous video feed, sessions with minimal motion of the facial region are detected and further processed to estimate heart rate.
2. Manually-triggered Sensing: This approach prompts the user once in a while to switch on the webcam and directs her to remain static for some time so that the heart rate can be estimated.
3. Opportunity-based Sensing: This approach tries to identify or predict the time intervals in which the user is likely to have limited motion of the facial region and triggers the webcam to record a video feed. This video feed is processed to estimate the heart rate.
Among the approaches described above, 1 and 2 are obtrusive because they may change user behavior and could intrude on day-to-day activities. For example, a continuously recording webcam could make users self-conscious, whereas a notification asking the user to manually trigger the webcam recording could interrupt her work. There is a need for a pervasive and non-obtrusive system that can identify or predict sensing opportunities and then automatically record the video feed to estimate heart rate in workplace environments.
To study the feasibility, we recruited 16 participants from a corporate organization. We deployed a system to perform continuous webcam-based video recording and subsequent sensing of heart rate for nearly a week across all the participants. An external off-the-shelf webcam was mounted on their laptop, and participants were asked to go about their normal work. We logged their system usage history (i.e., applications, number of keys pressed, number of mouse clicks) along with heart rate measurements. Our user study was guided by the following research questions:
1. How many in-situ opportunities for webcam-based heart rate sensing exist in a typical workplace environment where measurement can be performed automatically without involving the participants?
2. Do these sensing opportunities vary across different participants? What is the impact of motion and illumination on heart rate sensing?
3. Is there any correlation between these sensing opportunities and the system usage logs? Can these opportunities be predicted in advance using system usage patterns?
Contributions of the Paper: This paper answers the above research questions using the data collected in a user study.
Our findings suggest that nearly 120 in-situ opportunities per hour per user exist for webcam-based heart rate sensing in the workplace environment. We characterized the number of sensing opportunities with respect to motion and found that when motion is limited, the share of valid opportunities is much higher, at 86%, compared to 50% when motion is not considered. These opportunities can be predicted with the help of alternate unobtrusive sensing mechanisms such as mouse, keyboard, and system usage, with an accuracy of 81%.

RELATED WORK
We characterize the related work in three directions, i.e., invasive sensing, non-invasive but obtrusive sensing, and pervasive sensing. A similar characterization is presented in a taxonomy of wearable devices based on the deployment of sensors on the human body [11]. Note that the term invasive in the context of this paper refers to sensors that need to be in physical contact with the human body, while obtrusive describes the discomfort or intrusiveness associated with a sensing technique.

Invasive Sensing
Invasive sensing techniques use sensors that touch the skin, penetrate it, or are implanted inside the body to collect physiological measurements. Typical medical ECG equipment is an example of invasive sensing [12]. These techniques are highly accurate but also highly obtrusive for the subject being monitored, and they rely on external devices to sense vital parameters such as heart rate. It is not feasible to use these sensors to monitor physiological parameters in day-to-day life.

Non-invasive but Obtrusive Sensing
Non-invasive sensing can be performed with sensors that may or may not be in physical contact with the body [13].
Commercial wearable devices like Fitbit, Nike+ FuelBand, Moto360, and Samsung Gear are examples of non-invasive wearable sensors with physical contact. These approaches require an extra device for heart rate and activity monitoring, which many people consider an overhead because they are not in the habit of wearing one. Also, these devices are obtrusive, and many people stop wearing them after some time, resulting in high drop-out rates [6].
Similarly, there are many smartphone-based non-invasive techniques that depend on the user to initiate heart rate sensing by placing a fingertip on the camera with the built-in LED (flash) switched on, so that a photoplethysmograph signal can be detected from the fingertip and the heart rate sensed from it [14,15,16]. This methodology is an example of non-invasive sensing with physical contact, and the user has to explicitly place a fingertip on the camera, which makes it obtrusive. Other video-recording-based heart rate sensing techniques described in [8,9] can be classified as contact-less and non-invasive sensing methods. Although these techniques sense the heart rate by capturing the user's face remotely, they still require the user to launch the application explicitly and position the camera accurately to capture a static face, again making them obtrusive for the user. Our goal in this paper is to establish the feasibility of a non-obtrusive system that senses heart rate with little or no intervention from the user.

Pervasive Sensing
These approaches include sensing methodologies that use existing sensors in computing systems of daily use (i.e., keyboard, mouse, webcam), mobile devices (accelerometers, location, camera), and objects of daily use (chair, drinkware, etc.) to sense physiological signals such as heart rate. These sensing approaches are non-invasive as well as non-obtrusive because they do not require any user intervention. For instance, the authors in [7,17,18] describe mechanisms to detect stress using non-obtrusive sensors such as keyboard and mouse. Even though there is direct contact with the human body in the aforementioned approaches, this interaction is not forced on the user, unlike the approaches described in Section 2.2. However, these approaches are limited to sensing stress only, whereas heart rate is an important vital parameter for detecting other health risks such as cardiovascular diseases too.
Recently, there have been new developments in which pervasive devices or objects are augmented with extra hardware to measure physiological attributes unobtrusively. For instance, the authors in [19] proposed a mobile electrocardiogram (ECG) monitoring system that monitors the user's ECG opportunistically during the use of a sensor-augmented smartphone. Erin et al. [20] instrumented a general sitting chair with conductive fabric on the armrests to sense heart rate and with pressure sensors on the back of the chair to sense respiratory rate. In controlled trials, the instrumented chair was able to determine heart rate nearly 32% of the time and respiratory rate nearly 52% of the time. Similarly, the authors in [21] use the surface of drinkware to monitor the heart rate of a person, and Ravicharan et al. [22] use WiFi signals with a dedicated transmitter and receiver to estimate breathing rate.
As described above, most previous research on vital parameter sensing with pervasive technologies requires augmenting objects of everyday use, which is neither scalable nor inexpensive. Our paper explores the feasibility of a pervasive and contact-less webcam-based heart rate sensing mechanism in a day-to-day environment without any user intervention or attention.

PRELIMINARIES: HEART RATE ESTIMATION USING WEBCAM-BASED VIDEO FEED
As discussed in earlier sections, existing research has focused on building techniques that improve heart rate measurements by minimizing the impact of motion artifacts and raising the accuracy to the level of wearable physiological sensors [23,8]. However, the focus of this paper is not on improving existing heart rate measurement techniques; rather, it aims to discover and predict existing opportunities for heart rate sensing in day-to-day life. This section provides a brief introduction to the underlying approach used to extract heart rate from a video feed captured with a webcam [23].
Figure 1 presents a block diagram of the steps involved in extracting heart rate from a webcam-based video feed. In the first step, the video feed is captured, and an automatic face tracker based on the Viola-Jones face detection algorithm is used to detect the region of interest (ROI) in subsequent frames. The resulting ROI pixels are spatially averaged to create time-series signals for a batch. Batches are created by sliding a 15-second window in steps of 1 second, so that each new batch replaces the oldest second of frames from the previous batch and a heart rate value is produced every second. A batch size of 15 seconds is the length adopted by most medical instruments. At the end of each batch, the face tracker is used to detect whether the face of the subject is still, or not present in the video feed. The extracted time-series signal of a batch is then subjected to a series of signal preprocessing steps: de-trending the signal to remove low-frequency components, bandpass filtering in a specified frequency range to retain the cardiac frequencies, and normalizing the signal. The pre-processed signal is then converted to the frequency domain using FFT, and a power spectral density (PSD) curve is obtained, similar to the one shown in Figure 2. The peak frequency in the PSD curve gives the estimated heart rate value. The signal-to-noise ratio (SNR) of the signal, as shown in Equation 1, is calculated to assess the validity of the heart rate value produced by this extraction process.

\[ \mathrm{SNR} = \frac{\text{power of the region around the highest point}}{\text{power of the entire signal}} \qquad (1) \]

The approach for heart rate extraction explained above gives the heart rate of the user and an SNR to validate the heart rate, but it does not validate the feasibility of webcam-based heart rate sensing in the workplace environment.
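The extraction steps and the SNR of Equation 1 can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch under stated assumptions, not the implementation used in [23]: the function names, the cardiac band of 0.75 to 4.0 Hz, the 0.1 Hz half-width of the "region around the highest point", the linear de-trending, and the choice to apply the band-pass by masking FFT bins are all assumptions made here for illustration.

```python
import numpy as np


def sliding_batches(samples, fps, window_s=15, step_s=1):
    """Yield 15-second batches that advance by 1 second, as in the text."""
    win, step = int(window_s * fps), int(step_s * fps)
    for start in range(0, len(samples) - win + 1, step):
        yield samples[start:start + win]


def estimate_heart_rate(signal, fps, band=(0.75, 4.0), peak_halfwidth_hz=0.1):
    """Return (heart rate in bpm, SNR) for one batch of ROI averages.

    band and peak_halfwidth_hz are assumed values, not taken from the paper.
    """
    x = np.asarray(signal, dtype=float)
    t = np.arange(x.size)
    # De-trend: subtract a least-squares line (assumed de-trending method).
    slope, intercept = np.polyfit(t, x, 1)
    x = x - (slope * t + intercept)
    # Normalize to zero mean and unit variance.
    x = (x - x.mean()) / (x.std() + 1e-12)
    # Power spectrum via FFT.
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fps)
    power = np.abs(np.fft.rfft(x)) ** 2
    # Band-pass: keep only the bins inside the assumed cardiac band.
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    band_freqs, band_power = freqs[in_band], power[in_band]
    # The highest peak in the band gives the heart rate estimate.
    peak_freq = band_freqs[np.argmax(band_power)]
    # SNR: power near the peak divided by the power of the whole band.
    near_peak = np.abs(band_freqs - peak_freq) <= peak_halfwidth_hz
    snr = band_power[near_peak].sum() / band_power.sum()
    return peak_freq * 60.0, snr
```

For a synthetic 15-second batch at 30 fps containing a 1.2 Hz sinusoid plus a slow drift, the sketch should return a heart rate close to 72 bpm with an SNR near 1; real ROI signals are much noisier, so the SNR is what separates usable batches from unusable ones.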

USER STUDY
The user study focuses on assessing the feasibility and effectiveness of webcam-based heart rate sensing in the workplace environment. First, we provide statistics showing that opportunities for heart rate sensing exist when we record video continuously, i.e., that the user is detected still in front of the camera in some instances. Then we show that information-rich data for heart rate detection can be predicted in advance. By information-rich data, we mean video in which the user is present and, with high probability, still.

Data Collection Setup
We designed our data collection application for training as well as validating our system. It collects a continuous video feed along with system usage logs. The video feed is captured with an external off-the-shelf Logitech webcam, which records at 30 fps with a resolution of 1600×1200. We used an external webcam to bring uniformity to the video data collection and to minimize any biases due to heterogeneous hardware. In addition, the application collects system usage logs, including keyboard and mouse activity and the foreground application. The system usage logs are sampled cumulatively at an interval of 10 seconds.
The data collection application is developed on the Microsoft .NET Framework and runs as a background service on Windows-based desktops and laptops. The video feed is saved every half an hour in AVI format, whereas system usage logs are captured in JSON object format, as shown in the Sensing Module section.
For data collection, we recruited 16 users (Female: 5, Male: 11) from a corporate organization through an email asking for voluntary participation. The average age of female participants was 25 years (SD: 3.5), whereas it was 24 years for male participants. The only selection criterion was possession of a Windows machine. The participants were not paid for their efforts and were given the choice to opt out of the study or temporarily stop the data collection service at any time.
The participants were told that the goal of this study was to collect facial videos and usage logs to better understand their behavior with the computer. All the participants signed an informed consent form before participating in the study. The participants were instructed to carry out their usual work without being disturbed by the mounted webcam. We distributed external webcams to all the participants and asked them to mount the webcam on their laptop during office hours for one week. All the participants had their own sitting space, and no changes were made to lighting conditions or seating arrangements. The data collection service was installed on their laptops, and system usage logs and videos were collected manually due to their large size.
Figure 3 shows our data collection setup with the webcam mounted on a laptop. For participants who used an external monitor, the webcam was mounted on the monitor to capture the frontal face. The data collection application was configured to run during office working hours only, i.e., 9:00 AM to 6:00 PM.
The user study ran for a week, and in total we were able to collect about 168 hours of data from all the participants. On average, participants contributed 10 hours (SD: 6.0) of data, with a minimum of 2 hours and a maximum of 23 hours. The size of the collected data was more than 200 GB. A few participants contributed very little data for reasons such as frequent movement to meeting rooms or forgetting to switch the data collection service back on after stopping it temporarily. Data is filtered out of our evaluation if the face detection algorithm detects multiple faces or no face in the video.

Analysis of Opportunity Sensing
The validity criteria for a heart rate value, as discussed in Section 3, are an SNR above some threshold and the presence of a single face. In our analysis, we fixed the SNR threshold at 0.6. We call the part of the video that satisfies these constraints information-rich data. This data is information-rich because its high SNR shows that the calculated heart rate is valid, and the presence of the face shows that the user is in front of the webcam.
Data is split into two categories, i.e., when the user is detected still and when the user is in motion. The user is considered still if little or no motion is detected in the video, and in motion otherwise. Across the overall data collection, 64% of the data belongs to the still category. The user-specific distribution of data in these categories is shown in Figure 5.
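The validity rule and the still/in-motion split described above can be sketched as a simple labeling step over analysis windows. The `Window` record and the function names are hypothetical and introduced only for illustration; the threshold of 0.6 is the value stated in the text, and the sketch assumes that each window already carries an SNR, a face count, and a stillness flag.

```python
from dataclasses import dataclass

SNR_THRESHOLD = 0.6  # threshold stated in the analysis


@dataclass
class Window:
    """One analysis window (hypothetical record layout)."""
    snr: float   # SNR of the heart rate estimate for this window
    faces: int   # number of faces detected in the window
    still: bool  # True if little or no motion was detected


def is_information_rich(w, threshold=SNR_THRESHOLD):
    """A window is information-rich if exactly one face is present
    and the SNR is above the threshold."""
    return w.faces == 1 and w.snr > threshold


def summarize(windows):
    """Share of still windows, share of information-rich windows,
    and share of information-rich windows among the still ones."""
    total = len(windows)
    still = [w for w in windows if w.still]
    rich = [w for w in windows if is_information_rich(w)]
    rich_still = [w for w in still if is_information_rich(w)]
    return {
        "still_share": len(still) / total if total else 0.0,
        "rich_share": len(rich) / total if total else 0.0,
        "rich_share_given_still": len(rich_still) / len(still) if still else 0.0,
    }
```

Keeping the conditional share separate from the overall share makes it possible to compare valid heart rate readings with and without the stillness condition.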
To verify our hypothesis that heart rate validity depends on the user's motion, we calculated the distribution of valid heart rate values while the user is static and while the user is in motion. The results support this hypothesis, showing that 86% of the still data has an SNR value above the threshold. The user-specific distribution of valid heart rate values, both given that the user is still and without considering motion, is presented in Figure 6, which shows that 55% of the collected data is information-rich. If we take a window of 15 seconds as the unit for one valid heart rate value, the approach detects a valid heart rate 30-175 times per hour per user. On average, we could detect heart rate 120 times per hour, which shows that opportunities for extracting heart rate with a webcam-based approach exist. Consecutive windows with high SNR and limited motion give higher confidence in the heart rate value. The distribution of consecutive instances with valid heart rate is shown in Figure 4.
To further evaluate the feasibility of our opportunistic heart rate prediction system and to validate our assumption that such a system can achieve sustained adoption, we conducted a small survey among the participants. We asked them for their preference among the three types of systems described in Section 1, namely Always-on Sensing, Manually-triggered Sensing, and Opportunity-based Sensing. The users marked their preference for each of the three categories on a Likert scale of 1 to 5, with 1 being the least preferred and 5 the most preferred. The average preference recorded for Always-on Sensing was 2.3, while it was 1.9 for Manually-triggered Sensing. Opportunity-based Sensing came out as the most preferred mechanism, with an average score of 4.

SYSTEM FOR OPPORTUNITY PREDICTION
The focus here is to provide information-rich video as input to the heart rate extraction algorithm. The key components of the system are sensing, classification, and actuation of the webcam for video recording. The sensing and prediction modules show how the system uses unobtrusive sensors in a pervasive manner. The recorded video is used to extract heart rate information. Figure 7 presents a flow diagram of the key components of the system.

Sensing Module
In this module, a program running on the user's laptop or desktop collects usage statistics over a given time interval (say, 10 seconds). These include keyboard activity, mouse activity, and the name of the foreground application. Specifically, for the keyboard, it logs whether the user has been active on the keyboard (0/1) as well as the number of keys pressed in the past 10 seconds. Similarly, the user's mouse activity is tracked as the number of left and right clicks during the 10 seconds. The program also keeps track of the foreground application being run by the user. A snapshot of a JSON object generated by the sensing module is presented in Figure 8. The collection of this data is both pervasive and unobtrusive, as the data comes from sources (keyboard, mouse, applications) that are implicitly present in workplace environments.
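As an illustration of the kind of record the sensing module produces, the sketch below serializes one 10-second usage sample to JSON. The field names, the timestamp field, and the example values are hypothetical; they follow the quantities listed above and are not the actual schema of the object shown in Figure 8.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class UsageSample:
    """One 10-second usage sample (hypothetical field names)."""
    timestamp: str         # end of the sampling interval (assumed field)
    keyboard_active: int   # 1 if any key was pressed in the interval, else 0
    keys_pressed: int      # number of keys pressed in the interval
    left_clicks: int       # number of left mouse clicks in the interval
    right_clicks: int      # number of right mouse clicks in the interval
    foreground_app: str    # name of the foreground application


def to_json(sample):
    """Serialize a usage sample as a JSON object string."""
    return json.dumps(asdict(sample), sort_keys=True)


example = UsageSample(
    timestamp="2016-01-01T09:00:10",  # illustrative value only
    keyboard_active=1,
    keys_pressed=23,
    left_clicks=2,
    right_clicks=0,
    foreground_app="Eclipse",
)
print(to_json(example))
```

A flat record of this kind is easy to append to a log file every interval and to turn into a feature vector for the prediction module.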

Iterative Classification and Prediction Module
The intent of this module is to predict still moments so that information-rich data can be collected for heart rate extraction. The usage logs (mouse and keyboard characteristics, active foreground application) generated by the sensing module and the validity of the heart rate value in a given instance are the inputs to the prediction module. The data sensed in a given 10-20 seconds is used to predict the stillness of the user in the following 10-20 seconds. Training data for the classification is generated by mapping the usage logs of each 10-20 seconds to the sensing opportunity in the following 10-20 seconds. A supervised binary classifier is used to generate the predictive model (a decision tree). This model is applied to usage logs to predict opportunities for heart rate sensing.
To improve the sensitivity of the system, the predictive model is trained iteratively. The iterative training generates additional rules to handle a new type of application or changed user behavior. The steps in Figure 9 show the process of generating the predictive model iteratively using newly collected data.
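The mapping from usage logs to future sensing opportunities can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function name is hypothetical, features are assumed to be numeric vectors recorded once per 10-second step, features within the sensing interval are simply summed, and a future interval is labeled as an opportunity only if every step in it had a valid heart rate. All three choices are assumptions.

```python
def make_training_pairs(features, opportunity, sense_steps=1, predict_steps=2):
    """Pair each sensing interval with the label of the interval after it.

    features:    list of per-step feature vectors (lists of numbers),
                 one per 10-second step.
    opportunity: list of 0/1 flags, one per step; 1 means a valid heart
                 rate was obtained in that step.
    sense_steps / predict_steps: interval lengths in steps; the defaults
                 correspond to sensing 10 s and predicting the next 20 s.
    """
    if len(features) != len(opportunity):
        raise ValueError("features and opportunity must have equal length")
    pairs = []
    last_start = len(features) - sense_steps - predict_steps
    for start in range(last_start + 1):
        sensed = features[start:start + sense_steps]
        future = opportunity[start + sense_steps:start + sense_steps + predict_steps]
        # Aggregate the sensed steps by summing each feature column (assumption).
        x = [sum(column) for column in zip(*sensed)]
        # Label 1 only if the whole future interval was an opportunity (assumption).
        y = 1 if all(future) else 0
        pairs.append((x, y))
    return pairs
```

The resulting `(x, y)` pairs could then be passed to any supervised binary classifier, such as a decision tree learner.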

Opportunity prediction
The focus here is to predict opportunities for heart rate sensing using unobtrusive sensors. We used the J48 classifier in WEKA [24], which is an implementation of the C4.5 decision tree algorithm. We used 10-fold cross-validation to evaluate our model. Sensing and prediction intervals were varied to find feasible combinations.
For instance, in the 10-20 scheme, 10 is the sensing time in seconds: using the data of these 10 seconds, the decision tree predicts whether there is an opportunity to switch on the webcam in the next 20 seconds to record video and calculate heart rate. On average, the proposed system predicted an opportunity correctly in nearly 81% of instances with a sensing interval of 10 seconds and in nearly 79% of instances with a sensing interval of 20 seconds. The average sensitivity of the system in predicting opportunities is 86% and 79% for sensing intervals of 10 and 20 seconds, respectively. Each user has a different decision tree according to usage patterns and thus a different sensitivity, as shown in Figure 10. It is important to train a decision tree for each user independently, as each user has a different pattern of keyboard and mouse use, and even the applications vary across users. If we train a decision tree on the data of n − 1 users in our user set and apply it to the n-th user, the average accuracy degrades to 44%, and prediction is penalized for all users. With a proper system deployment this is helpful: because the data of each user is independent, the decision tree can be trained on the user's machine, and the privacy of the user is preserved because user-specific data remains local.
All features, i.e., keyboard, mouse, and foreground application, are essential for the prediction. The decision tree model of a given user may not use all features, but Figure 11 shows that each feature contributes to the prediction for one user or another. We grouped the applications used by users into 4 major categories, i.e., Coding (Eclipse, Visual Studio, etc.), Document (Word, Notepad, Excel, etc.), PDF (Adobe, Foxit, etc.), and Internet (IE, Chrome, etc.). Correlating this data, we found that some people are still while coding, while others are still while browsing the Internet. Similarly, each user has a different style of using the keyboard and mouse, and the way a user interacts with an application determines how useful that application is for prediction.
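The two evaluation measures used above, and the leave-one-user-out split behind the cross-user experiment, can be written down compactly. The sketch below is illustrative only: the function names are hypothetical, labels are assumed to be 0/1 with 1 meaning an opportunity, and the classifier itself (J48 in the paper) is deliberately left out.

```python
def accuracy_and_sensitivity(y_true, y_pred):
    """Return (accuracy, sensitivity) for binary labels, 1 = opportunity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    # Accuracy: share of all instances predicted correctly.
    accuracy = (tp + tn) / total if total else 0.0
    # Sensitivity: share of true opportunities that were predicted.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, sensitivity


def leave_one_user_out(user_ids):
    """Return (training users, held-out user) splits, one per user."""
    return [([u for u in user_ids if u != held_out], held_out)
            for held_out in user_ids]
```

Accuracy counts both correctly predicted opportunities and correctly predicted non-opportunities, whereas sensitivity looks only at how many real opportunities were caught, which is why the two figures can differ for the same sensing interval.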

CONCLUSION
In this paper, we presented a proof of concept of a system that can predict in-situ opportunities for webcam-based heart rate sensing in the workplace environment by analyzing unobtrusive sensors in a pervasive manner. We conducted a user study with 16 participants for a week to capture and analyze their video feed while working and found that, on average, we can detect heart rate 120 times per hour per user. We also correlated the heart rate sensing intervals with usage patterns and predicted opportunities for webcam-based heart rate sensing with an accuracy of 81%.

Figure 1: Process diagram depicting different steps of extraction of heart rate using the webcam feed.

Figure 2: Power spectral density (PSD) of the video feed. The peak frequency in the curve gives the estimated heart rate value, and the noise around the signal is used to calculate SNR.

Figure 3: Setup for data collection, showing an external webcam mounted on a laptop to collect video feeds while the user is working.

Figure 4: Distribution of consecutive valid heart rate samples. The distribution shows that 70% of the data have more than 20 consecutive valid heart rate values.

Figure 5: User-specific distribution of data in the still and in-motion categories.

Figure 6: The distribution of data with SNR above the threshold, given that the user is still and without considering motion.

Figure 7: Flow diagram of the system depicting key components.

Figure 8: The data sensed from the unobtrusive sensors in a pervasive manner to associate with heart rate validity.

Figure 9: Classification module showing the iterative training of decision tree using data collected in prediction phase.

Figure 10: Sensitivity across all participants with different variants. The first number (i.e., 10) is the period of sensing data (in seconds) used for prediction, whereas the second number (i.e., 10) is the prediction period.

Figure 11: Sensitivity across all participants when a feature is removed. Only one feature is removed at a time. The significance of each feature is different for each user.