
Research Article
Data Augmentation for Tabular Datasets Using Generative Adversarial Networks (GANs)
@INPROCEEDINGS{10.4108/eai.28-4-2025.2357951,
    author={Ummaneni Dinesh Kumar and J Rajasekhar and Pillala Lakshman and Kolanti Manoj Kumar},
    title={Data Augmentation for Tabular Datasets Using Generative Adversarial Networks (GANs)},
    proceedings={Proceedings of the 4th International Conference on Information Technology, Civil Innovation, Science, and Management, ICITSM 2025, 28-29 April 2025, Tiruchengode, Tamil Nadu, India, Part I},
    publisher={EAI},
    proceedings_a={ICITSM PART I},
    year={2025},
    month={10},
    keywords={generative adversarial networks, data augmentation, data imbalance, privacy},
    doi={10.4108/eai.28-4-2025.2357951}
}
Ummaneni Dinesh Kumar
J Rajasekhar
Pillala Lakshman
Kolanti Manoj Kumar
Year: 2025
ICITSM PART I
EAI
DOI: 10.4108/eai.28-4-2025.2357951
Abstract
In the data-hungry era of machine learning, dataset scarcity, class imbalance, and privacy constraints impede model effectiveness and ethical compliance. In this work, we propose a new approach that uses Generative Adversarial Networks (GANs) to enhance data privacy. The method addresses both the need to protect sensitive data and the need for effective augmentation. It uses the synthetic data generation capability of GANs to increase the diversity and representativeness of the training data while preserving privacy and preventing re-identification of actual records. The model is a hybrid that combines a conditional GAN (CGAN), which targets underrepresented classes, with a differential privacy (DP) mechanism that anonymizes the synthetic records. Our framework is built on TensorFlow and uses adversarial learning to generate realistic yet private data without sacrificing either model quality or privacy. Experiments on imbalanced benchmark data show that the generated data improves downstream models (e.g., in classification accuracy and F1-score), resists membership inference attacks, and can satisfy data protection regulations. This research bridges the gap between data augmentation and ethical AI, offering scalable solutions for domains such as healthcare and finance. Code, results, and comparisons are provided to encourage reproducibility and follow-up work.
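The two ingredients named in the abstract, class conditioning and differential privacy, can be illustrated with a minimal standard-library sketch. It is not the authors' TensorFlow implementation. It assumes that conditioning is done by concatenating a one-hot class label to the generator's noise vector, and that privacy is enforced DP-SGD style (per-example gradient clipping plus Gaussian noise). All function names and parameters (`conditional_generator_input`, `dp_average_gradient`, `clip_norm`, `noise_multiplier`) are hypothetical.

```python
import math
import random


def one_hot(label, num_classes):
    """Return a one-hot list encoding `label` among `num_classes` classes."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def conditional_generator_input(label, num_classes, noise_dim, rng):
    """Assumed CGAN-style input: Gaussian noise concatenated with the class label."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return noise + one_hot(label, num_classes)


def clip_gradient(grad, clip_norm):
    """Scale one per-example gradient so its L2 norm is at most `clip_norm`."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_average_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Assumed DP-SGD-style aggregate: clip each gradient, sum, add noise, average."""
    batch_size = len(per_example_grads)
    dim = len(per_example_grads[0])
    clipped = [clip_gradient(g, clip_norm) for g in per_example_grads]
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * clip_norm  # noise scale tied to the clipping bound
    return [(s + rng.gauss(0.0, sigma)) / batch_size for s in summed]


rng = random.Random(0)
# Request a sample of the (hypothetical) minority class 2 out of 3 classes.
z = conditional_generator_input(label=2, num_classes=3, noise_dim=4, rng=rng)
# Toy per-example gradients; the first exceeds the clipping bound, the second does not.
grads = [[3.0, 4.0], [0.3, 0.4]]
noisy_avg = dp_average_gradient(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

In a real training loop the noisy average would replace the ordinary batch gradient in the discriminator update, and the overall privacy guarantee would depend on the noise multiplier, the sampling rate, and the number of steps, none of which are specified in the abstract.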