2511.01054


MedEqualizer: A Framework Investigating Bias in Synthetic Medical Data and Mitigation via Augmentation

Authors: Sama Salarian, Yue Zhang, Swati Padhee, Srinivasan Parthasarathy

Synthetic healthcare data generation is a viable way to improve data accessibility and support research by overcoming the limitations of real-world medical datasets. However, ensuring fairness across protected attributes in synthetic data is critical to avoid biased or misleading results in clinical research and decision-making. In this study, we assess the fairness of synthetic data generated by multiple generative adversarial network (GAN)-based models using the MIMIC-III dataset, with a focus on representativeness across protected demographic attributes. We measure subgroup representation using the logarithmic disparity metric and observe significant imbalances: many subgroups are either underrepresented or overrepresented in the synthetic data relative to the real data. To mitigate these disparities, we introduce MedEqualizer, a model-agnostic augmentation framework that enriches the underrepresented subgroups prior to synthetic data generation. Our results show that MedEqualizer significantly improves demographic balance in the resulting synthetic datasets, offering a path towards more equitable and representative healthcare data synthesis.
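
The abstract does not define the logarithmic disparity metric, so the following is only a minimal sketch under one common reading: for each subgroup, take the log of the ratio between its share of the synthetic data and its share of the real data. The function name `log_disparity`, the smoothing constant `eps`, and the toy subgroup labels are all assumptions for illustration and are not taken from the paper.

```python
import math
from collections import Counter

def log_disparity(real_groups, synth_groups, eps=1e-9):
    """Return {subgroup: log(p_synth / p_real)} for each subgroup in the real data.

    Assumed reading of the metric (not confirmed by the abstract):
    a value below 0 means the subgroup is underrepresented in the
    synthetic data, a value above 0 means it is overrepresented.
    Both inputs are assumed to be non-empty lists of subgroup labels.
    """
    real_counts = Counter(real_groups)
    synth_counts = Counter(synth_groups)
    n_real = len(real_groups)
    n_synth = len(synth_groups)
    result = {}
    for group, count in real_counts.items():
        p_real = count / n_real
        p_synth = synth_counts.get(group, 0) / n_synth
        # eps keeps the log finite when a subgroup is missing from the synthetic data
        result[group] = math.log((p_synth + eps) / (p_real + eps))
    return result

# Toy labels (hypothetical): each record is tagged with its demographic subgroup.
real = ["F|white"] * 50 + ["M|white"] * 40 + ["F|black"] * 10
synth = ["F|white"] * 60 + ["M|white"] * 38 + ["F|black"] * 2
scores = log_disparity(real, synth)
```

In this toy example the `F|black` subgroup falls from 10% of the real records to 2% of the synthetic ones, so its score is negative; under this reading, those are the subgroups an augmentation step would enrich before training the generator.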

Subject: Machine Learning

Publish: 2025-11-02 19:16:50 UTC