Abstract:
This study presents a novel approach that uses machine learning to analyze epigenomic and transcription factor binding patterns in order to identify key genes, including non-coding genes, associated with rare diseases. We use the comprehensive gene lists from "Rare_Diseases_GeneRIF_Gene_Lists" together with the "Disease_Perturbations_from_GEO_up" and "Disease_Perturbations_from_GEO_down" data sets, all downloaded from the Enrichr database. We demonstrate the efficacy of the method with Area Under the Curve (AUC) plots for each disease-specific gene set, which provide a quantitative measure of the model's performance. These AUC plots not only underscore the accuracy of our predictions but also reveal distinct epigenetic and transcriptional signatures characteristic of various rare diseases. Our findings associate novel genes with rare diseases and pave the way for further investigation into targeted therapies. This work highlights the potential of machine learning to transform our understanding of rare genetic disorders and to aid the discovery of novel therapeutic targets.
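As an illustration of the per-gene-set evaluation mentioned above, the following minimal sketch computes a rank-based AUC for each disease-specific gene set. It is not the implementation used in this study: the function names `roc_auc` and `auc_per_gene_set`, the placeholder gene identifiers, and the assumption that the model outputs one score per gene are all hypothetical and introduced here for illustration only.

```python
from typing import Dict, Set


def roc_auc(scores: Dict[str, float], positives: Set[str]) -> float:
    """Rank-based ROC AUC (Mann-Whitney formulation).

    Returns the fraction of (positive, negative) gene pairs in which the
    positive gene receives the higher score; ties count as half a win.
    """
    pos = [s for gene, s in scores.items() if gene in positives]
    neg = [s for gene, s in scores.items() if gene not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_per_gene_set(
    scores_by_set: Dict[str, Dict[str, float]],
    gene_sets: Dict[str, Set[str]],
) -> Dict[str, float]:
    """Compute one AUC per disease-specific gene set.

    scores_by_set maps a gene-set name to hypothetical model scores per gene;
    gene_sets maps the same name to the genes annotated to that disease.
    """
    return {
        name: roc_auc(scores_by_set[name], genes)
        for name, genes in gene_sets.items()
    }


if __name__ == "__main__":
    # Placeholder identifiers, not real genes or real model output.
    toy_scores = {"GENE_A": 0.9, "GENE_B": 0.8, "GENE_C": 0.3, "GENE_D": 0.1}
    toy_sets = {"toy_disease": {"GENE_A", "GENE_C"}}
    print(auc_per_gene_set({"toy_disease": toy_scores}, toy_sets))
```

The pairwise formulation is quadratic in the number of genes, which keeps the sketch dependency-free and easy to check by hand; for genome-scale gene lists a sort-based or library implementation would be the more practical choice.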