Abstract:
Ovarian cancer is a highly aggressive malignancy with poor survival rates, largely due to late detection and extensive tumor heterogeneity. This study introduces a computational framework, NIGAM (Normalize, Identify, Generate, Authenticate, Meta-analyze), to overcome the limitations posed by small sample sizes in transcriptomic datasets. Gene expression data from ovarian cancer microarray studies were preprocessed (normalized), the key underlying dimensions were identified, and synthetic data were generated and then authenticated through statistical testing and biological enrichment analysis. Finally, the key findings were meta-analyzed to derive gene signatures. In the ovarian cancer case study, a comparative analysis of the original and augmented data revealed significant improvements in the detection of biologically relevant signals. Pathways that emerged only after augmentation included those associated with key cancer hallmarks such as uncontrolled proliferation, genomic instability, and angiogenesis. From these datasets, eight genes (AURKA, DAPK1, MCM2, WNT2B, CNRIP1, CXXC5, PEX5L, and SEL1L2) were identified as novel candidates, with 50% of them supported by existing literature and pathway databases. AURKA and MCM2, in particular, showed strong alignment with known ovarian cancer biology. The remaining genes, which lack a prior association with ovarian cancer, may represent unexplored therapeutic or diagnostic targets. Observed trends in p-values and fold changes showed that increasing the augmented sample size enhanced statistical robustness. NIGAM and its application to ovarian cancer demonstrate how an end-to-end framework that integrates AI, biology, and data science can enable biomarker discovery, accelerate the development of diagnostic panels, and support precision treatment.