
Research Article
A Multimodal Approach to Synthetic Personal Data Generation with Mixed Modelling: Bayesian Networks, GAN’s and Classification Models
@INPROCEEDINGS{10.1007/978-3-030-94822-1_55,
  author={Irina Deeva and Andrey Mossyayev and Anna V. Kalyuzhnaya},
  title={A Multimodal Approach to Synthetic Personal Data Generation with Mixed Modelling: Bayesian Networks, GAN’s and Classification Models},
  proceedings={Mobile and Ubiquitous Systems: Computing, Networking and Services. 18th EAI International Conference, MobiQuitous 2021, Virtual Event, November 8-11, 2021, Proceedings},
  proceedings_a={MOBIQUITOUS},
  year={2022},
  month={2},
  keywords={Synthetic personal data, Bayesian networks, Generative adversarial networks, Multimodal approach, Classification models},
  doi={10.1007/978-3-030-94822-1_55}
}
- Irina Deeva
- Andrey Mossyayev
- Anna V. Kalyuzhnaya
Year: 2022
MOBIQUITOUS
Springer
DOI: 10.1007/978-3-030-94822-1_55
Abstract
Personal data is multimodal: it spans tabular data, images, and text. Generating synthetic personal data therefore requires many interconnected datasets, but it is often very difficult to collect tabular data, images, and texts for the same people. This need for interconnected datasets can be avoided by training a separate model for each data type and combining the models into a single pipeline. This paper presents a multimodal approach to generating synthetic personal data for a social network user. It generates the socio-demographic information in the user’s profile (tabular data), an image for the user’s avatar, and content images that correlate with the user’s interests. The approach combines Bayesian networks, generative adversarial networks, and a discriminative model. Because the models are trained independently, the approach does not require interconnected datasets (profile information plus photos), and it can also be used, for example, to anonymize medical data. A quantitative assessment shows that the resulting synthetic profiles are quite plausible.
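
The pipeline idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the discriminative model is used, so the rejection step below is one assumed way to link the modalities. The probability tables, the function names (`sample_profile`, `generate_image`, `classify_topic`, `generate_user`), and the stand-ins for the GAN and the classifier are all hypothetical.

```python
import random

# Hypothetical conditional probability tables for a two-node Bayesian network:
# age_group -> interest. The values are illustrative, not taken from the paper.
P_AGE = {"18-25": 0.5, "26-40": 0.3, "41+": 0.2}
P_INTEREST_GIVEN_AGE = {
    "18-25": {"music": 0.5, "sport": 0.3, "travel": 0.2},
    "26-40": {"music": 0.2, "sport": 0.3, "travel": 0.5},
    "41+": {"music": 0.3, "sport": 0.2, "travel": 0.5},
}


def sample_categorical(dist, rng):
    """Draw one key from a {value: probability} table."""
    values = list(dist)
    weights = [dist[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]


def sample_profile(rng):
    """Ancestral sampling: draw the parent, then the child given the parent."""
    age = sample_categorical(P_AGE, rng)
    interest = sample_categorical(P_INTEREST_GIVEN_AGE[age], rng)
    return {"age_group": age, "interest": interest}


def generate_image(rng):
    """Stand-in for an unconditional image generator such as a GAN.

    Returns a dict with a 'topic' field instead of pixels (assumption).
    """
    topic = rng.choice(["music", "sport", "travel"])
    return {"id": rng.randrange(10**6), "topic": topic}


def classify_topic(image):
    """Stand-in for a trained classifier that labels an image's topic."""
    return image["topic"]


def generate_user(rng, max_tries=100):
    """Combine the independently built models into one pipeline."""
    profile = sample_profile(rng)
    # Rejection step (assumed): keep generating images until the
    # classifier's label agrees with the sampled interest.
    for _ in range(max_tries):
        image = generate_image(rng)
        if classify_topic(image) == profile["interest"]:
            return {"profile": profile, "content_image": image}
    raise RuntimeError("no matching image found within max_tries")


rng = random.Random(0)
user = generate_user(rng)
print(user)
```

In this sketch the three components share no training data. The tabular model and the image generator could each be fitted on unrelated datasets, and only the classifier's label connects an image to a profile.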