sesa 18: e1

Research Article

How data-sharing nudges influence people's privacy preferences: A machine learning-based analysis

  • @ARTICLE{10.4108/eai.21-12-2021.172440,
        author={Yang Lu and Shujun Li and Alex Freitas and Athina Ioannou},
        title={How data-sharing nudges influence people's privacy preferences: A machine learning-based analysis},
        journal={EAI Endorsed Transactions on Security and Safety: Online First},
        volume={},
        number={},
        publisher={EAI},
        journal_a={SESA},
        year={2021},
        month={12},
        keywords={Privacy, Nudging, Persuasive Technology, Data Sharing, User Segmentation, User Profiling, Machine Learning},
        doi={10.4108/eai.21-12-2021.172440}
    }
    
  • Yang Lu
    Shujun Li
    Alex Freitas
    Athina Ioannou
    Year: 2021
    How data-sharing nudges influence people's privacy preferences: A machine learning-based analysis
    SESA
    EAI
    DOI: 10.4108/eai.21-12-2021.172440
Yang Lu1,*, Shujun Li2, Alex Freitas2, Athina Ioannou3
  • 1: School of Science, Technology and Health, York St John University, UK
  • 2: School of Computing, University of Kent, UK
  • 3: School of Hospitality and Tourism Management, University of Surrey, UK
*Contact email: y.lu@yorksj.ac.uk

Abstract

INTRODUCTION: Many online services use data-sharing nudges to solicit personal data from their customers for personalized services.

OBJECTIVES: This study investigates people’s privacy preferences in sharing different types of personal data under different nudging conditions, how digital nudging can change their willingness to share data, and whether people’s data-sharing preferences can be predicted from their responses to a questionnaire.

METHODS: This paper reports a machine learning-based analysis of people’s privacy preference patterns under four different data-sharing nudging conditions (without nudging, monetary incentives, non-monetary incentives, and privacy assurance). The analysis is based on data collected from 685 UK residents who participated in a panel survey. Their self-reported levels of willingness to share 23 different types of personal data were analyzed using both unsupervised (clustering) and supervised (classification) machine learning algorithms.

RESULTS: The results led to a better understanding of people’s privacy preference patterns across different data-sharing nudging conditions. For example, our participants’ preferences were distributed more sparsely than we expected across a space of 48 possible profiles. Unexpectedly, all three data-sharing nudging strategies had an overall negative effect: they reduced the self-reported willingness of more participants, compared with the case of no nudging at all. Our experiments with supervised machine learning models also showed that people’s privacy (data-sharing) preference profiles can be predicted automatically with good accuracy, even when a short questionnaire with just seven questions is used.

CONCLUSION: Our work revealed a more complicated structure of people’s privacy preference profiles, which depend in part on the type of data-sharing nudge and the type of personal data shared. Such complicated privacy preference profiles can be effectively analyzed using machine learning methods, including automatic prediction based on a short questionnaire. The negative results on the overall effect of different data-sharing nudges imply that service providers should consider whether and how to use such mechanisms to incentivise their consumers to share personal data. We believe that more consumer-centric and transparent methods and tools should be used to help improve trust between consumers and service providers.
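
The comparison between a nudging condition and the no-nudging baseline reported in RESULTS can be illustrated with a minimal sketch. The abstract does not state how the comparison was computed, so everything here is an assumption: the function name `nudge_effect`, the use of each participant's mean rating across data types, the reading of "overall negative effect" as more participants decreasing than increasing, and the toy ratings, which are invented and are not data from the study.

```python
from collections import Counter


def nudge_effect(baseline, nudged):
    """Count participants whose mean willingness fell, rose, or stayed the same.

    baseline, nudged: dicts mapping a participant id to a list of willingness
    ratings (one per data type) on the same hypothetical numeric scale.
    Comparing per-participant means is an assumption, not the paper's method.
    """
    counts = Counter()
    for pid, base_ratings in baseline.items():
        base_mean = sum(base_ratings) / len(base_ratings)
        nudged_mean = sum(nudged[pid]) / len(nudged[pid])
        if nudged_mean < base_mean:
            counts["reduced"] += 1
        elif nudged_mean > base_mean:
            counts["increased"] += 1
        else:
            counts["unchanged"] += 1
    return counts


# Toy ratings, invented for illustration only (not data from the study).
baseline = {"p1": [4, 3, 5], "p2": [2, 2, 3], "p3": [5, 4, 4]}
monetary = {"p1": [3, 3, 4], "p2": [2, 3, 3], "p3": [4, 4, 3]}

effect = nudge_effect(baseline, monetary)
print(dict(effect))
```

Under this reading, a nudge would count as having an overall negative effect when the "reduced" count exceeds the "increased" count for that condition.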