Please use this identifier to cite or link to this item: http://repository.iiitd.edu.in/xmlui/handle/123456789/1710
Full metadata record
DC Field | Value | Language
dc.contributor.author | R Chandra, Omkar | -
dc.contributor.author | Kumar, Vibhor (Advisor) | -
dc.date.accessioned | 2025-01-03T11:28:56Z | -
dc.date.available | 2025-01-03T11:28:56Z | -
dc.date.issued | 2024-07-01 | -
dc.identifier.uri | http://repository.iiitd.edu.in/xmlui/handle/123456789/1710 | -
dc.description.abstract | There are thousands of genes with incomplete functional annotations, particularly non-coding genes. Understanding the functional roles of genes is crucial for dissecting the complex genomic regulatory mechanisms underlying biological processes, which in turn provides control over cellular processes such as the immune response and cell cycle for potential clinical interventions. Over the years, numerous computational methods have emerged to link genes with biological processes and molecular functions. However, these methods often fail to account for non-coding genes and rarely provide interpretations of their predictions. To address this problem, a computational framework has been developed that incorporates features of non-coding genes at the promoter level using epigenome profiles, open-chromatin profiles, and transcription factor (TF) binding profiles of gene promoters. This approach allows for reliable predictions of gene functions, which are independently validated using available CRISPR screens and PubMed abstract mining. The explainable machine learning algorithms used for the prediction of gene function allowed for post hoc analysis using the top predictors of the learned models, yielding latent clusters of functions that collectively contribute to larger cellular processes. Additionally, downstream analysis using only transcription factors as top predictors provided insights into their synergy and pleiotropy in regulating various biological functions. The entire computational framework is built into an R package, "GFPredict," which can be used to predict biologically similar genes to user-defined query genes. Further analysis utilizing TF binding and epigenome profiles as features identified novel disease-gene associations. The predicted associations of coding and non-coding genes with diseases were validated using GWAS data and PubMed abstract mining. The genomic regulation analysis using top predictors of individual disease gene-sets revealed associations of divergent cell types in diseases. These association insights were validated with evidence from the literature, providing a basis for generating putative hypotheses for developing strategies for diagnosis, prognosis, and potential therapeutics. | en_US
dc.language.iso | en_US | en_US
dc.publisher | IIIT-Delhi | en_US
dc.subject | Functional genomics | en_US
dc.subject | Regulatory genomics | en_US
dc.subject | Gene function prediction | en_US
dc.subject | Transcription factors | en_US
dc.subject | Epigenome | en_US
dc.title | Explainable machine learning with epigenomic features for insights into regulatory and functional genomics | en_US
dc.type | Thesis | en_US
Appears in Collections: Year-2024

Files in This Item:
File | Description | Size | Format
omkar_thesis_signed_final.pdf | | 3.56 MB | Adobe PDF

