Explainable machine learning with epigenomic features for insights into regulatory and functional genomics

R Chandra, Omkar; Kumar, Vibhor (Advisor)

Please use this identifier to cite or link to this item: http://repository.iiitd.edu.in/xmlui/handle/123456789/1710

Full metadata record

DC Field	Value	Language
dc.contributor.author	R Chandra, Omkar	-
dc.contributor.author	Kumar, Vibhor (Advisor)	-
dc.date.accessioned	2025-01-03T11:28:56Z	-
dc.date.available	2025-01-03T11:28:56Z	-
dc.date.issued	2024-07-01	-
dc.identifier.uri	http://repository.iiitd.edu.in/xmlui/handle/123456789/1710	-
dc.description.abstract	Thousands of genes, particularly non-coding genes, still have incomplete functional annotations. Understanding the functional roles of genes is crucial for dissecting the complex genomic regulatory mechanisms underlying biological processes. This understanding may in turn enable control over cellular processes such as the immune response and the cell cycle for potential clinical interventions. Over the years, numerous computational methods have emerged to link genes with biological processes and molecular functions. However, these methods often fail to account for non-coding genes and rarely provide interpretations of their predictions. To address this problem, a computational framework has been developed that incorporates promoter-level features of non-coding genes, derived from epigenome profiles, open-chromatin profiles, and transcription factor (TF) binding profiles of gene promoters. This approach enables reliable predictions of gene function, which were independently validated using available CRISPR screens and PubMed abstract mining. The explainable machine learning algorithms used for gene function prediction allowed post hoc analysis of the top predictors of the learned models, yielding latent clusters of functions that collectively contribute to larger cellular processes. Additionally, downstream analysis using only TFs as top predictors provided insights into their synergy and pleiotropy in regulating various biological functions. The entire computational framework is implemented as an R package, "GFPredict," which can be used to predict genes that are biologically similar to user-defined query genes. Further analysis using TF binding and epigenome profiles as features identified novel disease-gene associations. The predicted associations of coding and non-coding genes with diseases were validated using GWAS data and PubMed abstract mining.
Genomic regulation analysis using the top predictors of individual disease gene sets revealed associations of divergent cell types with diseases. These associations were validated with evidence from the literature, providing a basis for generating hypotheses toward strategies for diagnosis, prognosis, and potential therapeutics.	en_US
dc.language.iso	en_US	en_US
dc.publisher	IIIT-Delhi	en_US
dc.subject	Functional genomics	en_US
dc.subject	Regulatory genomics	en_US
dc.subject	Gene function prediction	en_US
dc.subject	Transcription factors	en_US
dc.subject	Epigenome	en_US
dc.title	Explainable machine learning with epigenomic features for insights into regulatory and functional genomics	en_US
dc.type	Thesis	en_US
Appears in Collections:	Year-2024

Files in This Item:

File	Description	Size	Format
omkar_thesis_signed_final.pdf		3.56 MB	Adobe PDF

