Mobile Networks and Management. 9th International Conference, MONAMI 2017, Melbourne, Australia, December 13-15, 2017, Proceedings

Research Article

Anonymizing k-NN Classification on MapReduce

  • @INPROCEEDINGS{10.1007/978-3-319-90775-8_29,
        author={Sibghat Bazai and Julian Jang-Jaccard and Ruili Wang},
        title={Anonymizing k-NN Classification on MapReduce},
        proceedings={Mobile Networks and Management. 9th International Conference, MONAMI 2017, Melbourne, Australia, December 13-15, 2017, Proceedings},
        proceedings_a={MONAMI},
        year={2018},
        month={5},
        keywords={MapReduce, Data anonymization, k-anonymity, k-NN classification},
        doi={10.1007/978-3-319-90775-8_29}
    }
    
  • Sibghat Bazai
    Julian Jang-Jaccard
    Ruili Wang
    Year: 2018
    Anonymizing k-NN Classification on MapReduce
    MONAMI
    Springer
    DOI: 10.1007/978-3-319-90775-8_29
Sibghat Bazai1,*, Julian Jang-Jaccard1,*, Ruili Wang1,*
  • 1: Massey University
*Contact email: s.bazai@massey.ac.nz, j.jang-jaccard@massey.ac.nz, r.wang@massey.ac.nz

Abstract

Data analytics tasks such as classification play an important role in data mining: a classification algorithm identifies the category of a new observation and is often used to derive new knowledge. However, classification algorithms on big data analytics platforms such as MapReduce and Spark often run on plain text without an appropriate privacy protection mechanism. This leaves users' data vulnerable to unauthorized access and puts the data at great privacy risk. To address this concern, we propose a novel k-NN classifier that can run on an anonymized dataset on the MapReduce platform. We describe new Map and Reduce algorithms that produce different anonymized datasets for the k-NN classifier. We also detail the experiments we performed on multiple anonymized datasets to understand the trade-off between the level of privacy protection (data privacy) and high-value insights (data utility) before and after data anonymization.
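
The idea in the abstract can be illustrated with a minimal, single-process sketch. It is not the authors' algorithm: the function names (`map_generalize`, `reduce_k_anonymous`, `knn_classify`), the single numeric attribute, the fixed-width generalization, and the suppression rule are all assumptions made for illustration. A map step generalizes each age into a range, a reduce step keeps only ranges shared by at least `k_anon` records, and a k-NN vote then runs on the range midpoints instead of the exact values.

```python
from collections import Counter, defaultdict

def map_generalize(record, width):
    """Map step (assumed): replace the exact age with the lower bound of its range."""
    age, label = record
    low = (age // width) * width
    return (low, label)  # key is the generalized age bucket

def reduce_k_anonymous(pairs, k_anon):
    """Reduce step (assumed): keep only buckets holding at least k_anon records."""
    groups = defaultdict(list)
    for low, label in pairs:
        groups[low].append(label)
    return {low: labels for low, labels in groups.items() if len(labels) >= k_anon}

def knn_classify(anonymized, width, query_age, k):
    """k-NN majority vote, measuring distance to each bucket's midpoint."""
    points = [(abs(low + width / 2 - query_age), label)
              for low, labels in anonymized.items() for label in labels]
    nearest = sorted(points, key=lambda p: p[0])[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy (age, class) records, invented for this example.
records = [(23, "low"), (27, "low"), (25, "low"),
           (41, "high"), (44, "high"), (48, "high"), (65, "high")]
pairs = [map_generalize(r, width=10) for r in records]
anonymized = reduce_k_anonymous(pairs, k_anon=2)  # the lone record in 60-69 is suppressed
prediction = knn_classify(anonymized, width=10, query_age=30, k=3)
```

Widening `width` or raising `k_anon` hides more detail about individual records but gives the classifier coarser inputs, which is the privacy versus utility trade-off the abstract refers to.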