Security and Privacy in New Computing Environments. Second EAI International Conference, SPNCE 2019, Tianjin, China, April 13–14, 2019, Proceedings

Research Article

Privacy Disclosures Detection in Natural-Language Text Through Linguistically-Motivated Artificial Neural Networks

  • @INPROCEEDINGS{10.1007/978-3-030-21373-2_14,
        author={Nuhil Mehdy and Casey Kennington and Hoda Mehrpouyan},
        title={Privacy Disclosures Detection in Natural-Language Text Through Linguistically-Motivated Artificial Neural Networks},
        proceedings={Security and Privacy in New Computing Environments. Second EAI International Conference, SPNCE 2019, Tianjin, China, April 13--14, 2019, Proceedings},
        proceedings_a={SPNCE},
        year={2019},
        month={6},
        keywords={Privacy, Security, Natural language processing, Machine learning},
        doi={10.1007/978-3-030-21373-2_14}
    }
    
  • Nuhil Mehdy, Casey Kennington, Hoda Mehrpouyan (2019). Privacy Disclosures Detection in Natural-Language Text Through Linguistically-Motivated Artificial Neural Networks. SPNCE. Springer. DOI: 10.1007/978-3-030-21373-2_14
Nuhil Mehdy1,*, Casey Kennington1,*, Hoda Mehrpouyan1,*
  • 1: Boise State University
*Contact email: akmnuhilmehdy@boisestate.edu, caseykennington@boisestate.edu, hodamehrpouyan@boisestate.edu

Abstract

An increasing number of people share information through text messages, emails, and social media without adequate privacy checks, which in many situations can lead to serious privacy threats. This paper presents a methodology that provides additional safeguards without being intrusive to users. We develop and evaluate a model that helps users take control of the information they share by automatically identifying text (i.e., a sentence or a transcribed utterance) that might contain personal or private disclosures. We apply off-the-shelf natural language processing tools to derive linguistic features such as part-of-speech tags, syntactic dependencies, and entity relations. From these features, we build and train a multichannel convolutional neural network as a classifier that identifies short texts containing personal or private disclosures. We show how our model can notify users when a piece of text discloses personal or private information, and we evaluate our approach on a binary classification task, achieving 93% accuracy on our own labeled dataset and 86% on a dataset of ground truth. Unlike document classification tasks in natural language processing, our framework takes sentence-level context into consideration.
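
To make the multichannel idea concrete, the following is a minimal sketch of a forward pass, not the authors' implementation. It assumes three input channels (words, part-of-speech tags, dependency labels), hand-written annotations in place of a real NLP pipeline, and random weights in place of trained ones. All names and sizes (`SENTENCE`, `EMBED_DIM`, `NUM_FILTERS`, `KERNEL`, `disclosure_score`) are hypothetical and chosen only for illustration.

```python
import math
import random

# Hypothetical, hand-written annotations for one sentence. In practice these
# would come from an off-the-shelf NLP pipeline; the tags are illustrative.
SENTENCE = {
    "word": ["i", "was", "diagnosed", "with", "diabetes"],
    "pos": ["PRON", "AUX", "VERB", "ADP", "NOUN"],
    "dep": ["nsubjpass", "auxpass", "ROOT", "prep", "pobj"],
}

EMBED_DIM = 4    # assumed embedding size per symbol
NUM_FILTERS = 3  # assumed number of convolution filters per channel
KERNEL = 2       # assumed filter width, in tokens


def make_embeddings(symbols, rng):
    """Give each distinct symbol a random vector (stand-in for learned embeddings)."""
    return {s: [rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
            for s in sorted(set(symbols))}


def conv1d_relu(seq, filters):
    """Slide each filter over the token sequence; return one feature map per filter."""
    maps = []
    for filt in filters:
        fmap = []
        for start in range(len(seq) - KERNEL + 1):
            # Flatten the KERNEL consecutive token vectors into one window.
            window = [x for vec in seq[start:start + KERNEL] for x in vec]
            act = sum(w * x for w, x in zip(filt, window))
            fmap.append(max(0.0, act))  # ReLU
        maps.append(fmap)
    return maps


def channel_features(symbols, rng):
    """Embed, convolve, then max-over-time pool a single input channel."""
    emb = make_embeddings(symbols, rng)
    seq = [emb[s] for s in symbols]
    filters = [[rng.uniform(-1, 1) for _ in range(KERNEL * EMBED_DIM)]
               for _ in range(NUM_FILTERS)]
    return [max(fmap) for fmap in conv1d_relu(seq, filters)]


def disclosure_score(sentence, seed=0):
    """Concatenate pooled features from all channels and apply a sigmoid output unit."""
    rng = random.Random(seed)
    merged = []
    for channel in ("word", "pos", "dep"):
        merged.extend(channel_features(sentence[channel], rng))
    weights = [rng.uniform(-1, 1) for _ in merged]  # untrained output weights
    logit = sum(w * f for w, f in zip(weights, merged))
    return merged, 1.0 / (1.0 + math.exp(-logit))


features, score = disclosure_score(SENTENCE)
print(len(features), round(score, 3))
```

Because the weights are random, the printed score carries no meaning; the sketch only shows how each linguistic view of the same sentence passes through its own convolution and pooling before the channels are merged for a single binary decision.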