Research Article
Scaling Health Analytics to Millions without Compromising Privacy using Deep Distributed Behavior Models
@INPROCEEDINGS{10.1145/3154862.3154873,
  author={Petar Veličković and Nicholas Lane and Sourav Bhattacharya and Angela Chieh and Otmane Bellahsen and Matthieu Vegreville},
  title={Scaling Health Analytics to Millions without Compromising Privacy using Deep Distributed Behavior Models},
  proceedings={11th EAI International Conference on Pervasive Computing Technologies for Healthcare},
  publisher={ACM},
  proceedings_a={PERVASIVEHEALTH},
  year={2018},
  month={1},
  keywords={deep learning, digital health, user privacy},
  doi={10.1145/3154862.3154873}
}
- Petar Veličković
- Nicholas Lane
- Sourav Bhattacharya
- Angela Chieh
- Otmane Bellahsen
- Matthieu Vegreville
Year: 2018
Venue: PERVASIVEHEALTH
Publisher: ACM
DOI: 10.1145/3154862.3154873
Abstract
People are naturally sensitive about sharing the health data collected by connected consumer devices (e.g., smart scales, sleep trackers) with third parties. However, sharing this data to compute aggregate statistics and comparisons is a basic building block for a range of medical studies built on large-scale consumer device deployments; such studies have the potential to transform how we study disease and behavior. Furthermore, informing users of how their health measurements and activities compare with those of friends, demographic peers and the global population has been shown to be a powerful tool for behavior change and management in individuals. While experienced organizations can safely perform aggregate user health analysis, there is a significant need for new privacy-preserving mechanisms that allow people to engage in the same way even with untrusted third parties (e.g., small or recently established organizations).
In this work, we propose and evaluate a new approach to this problem grounded in the use of distributed behavior models: discriminative deep learning models that approximate the calculation of various aggregate functions. Models are bootstrapped with training data from a modestly sized cohort and then distributed directly to personal devices, so a user's own data never has to leave their device. This opens a powerful new paradigm for privacy-preserving analytics under which user data largely remains on personal devices, overcoming a variety of potential privacy risks.
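To make the idea concrete, the following is a minimal, hypothetical sketch (in PyTorch) of the general pattern described in the abstract, not the authors' actual architecture or data: a small discriminative model is trained server-side on a bootstrap cohort to approximate one aggregate function, here the percentile of a measurement among demographic peers, and the trained weights are then shipped to the personal device, where the comparison is computed locally. The `BehaviorModel` class, the feature set, and the synthetic cohort are illustrative placeholders.

```python
# Hypothetical sketch of a distributed behavior model: train on a small cohort
# server-side, then run inference on-device so raw user data never leaves it.
import torch
import torch.nn as nn

class BehaviorModel(nn.Module):
    """Maps (measurement, demographic features) -> estimated peer percentile."""
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),  # percentile in [0, 1]
        )

    def forward(self, x):
        return self.net(x)

# --- Server side: bootstrap training on a modestly sized consenting cohort ---
torch.manual_seed(0)
cohort_x = torch.randn(1000, 4)   # e.g. weight, age, height, sex (synthetic here)
cohort_y = torch.rand(1000, 1)    # ground-truth percentiles (placeholder labels)
model = BehaviorModel(n_features=4)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(model(cohort_x), cohort_y)
    loss.backward()
    opt.step()

# Distribute only the trained weights to personal devices.
torch.save(model.state_dict(), "behavior_model.pt")

# --- Device side: purely local inference; the user's measurements stay on-device ---
local_model = BehaviorModel(n_features=4)
local_model.load_state_dict(torch.load("behavior_model.pt"))
local_model.eval()
with torch.no_grad():
    user_features = torch.tensor([[0.3, -1.2, 0.8, 1.0]])  # the user's own data
    print("Estimated peer percentile:", local_model(user_features).item())
```

In this sketch, only model parameters move between server and device; the aggregate comparison is approximated by the network, so no individual measurements need to be uploaded to compute it.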