Abstract:
Sepsis and diabetes are complex medical conditions that pose substantial challenges to healthcare systems globally. Timely detection and precise diagnosis are critical for effective treatment and better patient outcomes. This study investigated machine learning-based data augmentation approaches for biomarker discovery in a multicenter dataset of patients with sepsis and diabetes. Specifically, it compared three approaches, the Gaussian Mixture Model (GMM), Bayesian Network (BN), and Conditional Tabular Generative Adversarial Network (CTGAN), for augmenting microarray expression data in the XpressionSuite tool developed by the TavLab at IIIT-Delhi. Differential Gene Expression Analysis (DGEA) was performed on the augmented data, and statistical significance was compared across the three approaches. CTGAN-generated data exhibited higher statistical significance than data from the other two approaches and was therefore chosen for further analysis. Interestingly, Myc targets were identified as a hallmark with all three models, suggesting the potential involvement of Myc in sepsis in patients with diabetes. The differentially expressed genes (DEGs) identified through CTGAN-based DGEA were then subjected to functional enrichment analysis, which highlighted the involvement of several cytosolic components, including secretory vesicles and secretory granules, as well as dysregulation of stem cell differentiation, in the pathogenesis of sepsis in patients with diabetes. These results underscore the potential of data augmentation to enhance the statistical power of gene expression analysis and suggest that CTGAN-based data augmentation is a promising approach for biomarker discovery in patients with sepsis and diabetes.
The identified immune system pathways could also serve as targets for developing therapeutic interventions for sepsis in diabetic patients. Overall, the study provides insights into the effectiveness of different data augmentation approaches for biomarker discovery in sepsis patients with diabetes, with implications for advancing clinical research in this area.
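
The augment-then-test workflow summarized above can be illustrated with a minimal sketch. It is not the XpressionSuite pipeline and does not implement GMM, BN, or CTGAN: as a deliberately simpler stand-in, it fits a single Gaussian to each group for one gene, draws synthetic samples from it, and computes Welch's t-statistic before and after augmentation. The function names (`welch_t`, `augment`) and the expression values are hypothetical and chosen only for illustration.

```python
import random
import statistics
from math import sqrt


def welch_t(group_a, group_b):
    """Welch's t-statistic for two independent samples (unequal variances)."""
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    std_err = sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / std_err


def augment(values, n_new, rng):
    """Return `values` plus n_new synthetic draws from a Gaussian fitted to them.

    Assumption: one Gaussian per group, a simplified stand-in for the
    generative models (GMM, BN, CTGAN) compared in the study.
    """
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return list(values) + [rng.gauss(mu, sigma) for _ in range(n_new)]


# Hypothetical log2 expression values for one gene (illustrative only).
cases = [7.9, 8.4, 8.1, 8.8, 8.3]
controls = [7.2, 7.6, 7.1, 7.8, 7.4]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
cases_aug = augment(cases, 20, rng)
controls_aug = augment(controls, 20, rng)

print("t (original): ", welch_t(cases, controls))
print("t (augmented):", welch_t(cases_aug, controls_aug))
```

In a full analysis, a test of this kind would be applied to every gene and followed by multiple-testing correction. Here the single-gene comparison only shows where augmentation enters the workflow.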