Abstract:
Effective treatment decisions should account for the genetic variability present across different cells of the human body, taking advantage of the large-scale omics data already accessible in genomics, epigenomics, proteomics, and metabolomics databases. Recently, cellular heterogeneity in phenotypic conditions (such as cancer, neurodegenerative diseases, bone disease, metabolic disorders, and immune-related disorders) has been inferred using genomic and epigenetic biomarkers for clinical diagnosis, patient stratification, prognosis, and treatment monitoring. To understand the regulatory changes that disease and external stimuli cause in a cell, it is important to consider the role of chromatin structure, because it regulates gene expression. However, existing chromatin-interaction datasets are derived from only a few cell types and therefore provide limited insight into many others. Single-cell open-chromatin profiles can be used to infer the pattern of chromatin interaction in a cell type. To study chromatin interaction in more cell types, we developed a method called “single-cell epigenome-based chromatin-interaction analysis (scEChIA)”, which uses imputation of read-counts and refined L1 regularization to predict interactions among genomic sites from single-cell open-chromatin profiles. Unlike other methods, scEChIA is not biased toward short-range interactions; it opens avenues for studying long-range chromatin interactions using single-cell open-chromatin profiles. Using scEChIA to predict chromatin interactions from the single-cell open-chromatin profiles of seven human brain cell types led to the identification of almost 0.7 million cis-regulatory interactions. Further analysis helped identify the cell types in which known expression quantitative trait loci (eQTLs) could be connected to their target genes in the human brain.
It also led to the identification of possible target genes of human-accelerated elements and disease-associated mutations.
Because of the low amount of relevant DNA available and stochasticity, single-cell open-chromatin profiles have a high drop-out rate and noise. To tackle this challenge, we developed a method called forest of imputation trees (FITs) to restore original signals from noisy and sparse single-cell open-chromatin profiles. FITs is designed to avoid bias during the restoration of read-count matrices; for this purpose, it builds a forest of multiple imputation trees. FITs resolves the challenging issue of recovering single-cell epigenome profiles without compromising the information at genomic sites with cell-type-specific activity. FITs-based imputation not only improved accuracy in detecting enhancers but also increased reliability in estimating pathway enrichment scores for every single cell and in predicting chromatin interactions. To utilize the knowledge of chromatin interactions, we propose an approach to study the activity of topologically associating domains (TADs) in cancer cell lines. A TAD is a self-interacting genomic region; DNA sequences within a TAD physically interact more frequently with each other than with sequences outside the TAD. TAD boundaries contribute to complex-trait heritability, especially for immunologic, hematologic, and metabolic traits. We analyzed the variation in the activity of TADs in different phenotypic conditions across cell types, creating a resource for understanding the role of chromatin interactions in different phenotypic contexts.
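The definition of a TAD given above (sequences inside the domain contact each other more often than they contact sequences outside it) can be illustrated with a minimal sketch. This is not the TAD-activity measure used in this work; the function name `tad_contact_enrichment`, the toy contact matrix, and the bin ranges are all hypothetical and chosen only for illustration.

```python
def tad_contact_enrichment(contacts, tad_start, tad_end):
    """Ratio of the mean contact frequency inside a candidate TAD to the
    mean contact frequency between TAD bins and bins outside it.

    contacts: symmetric square list of lists of contact frequencies
              between genomic bins (hypothetical toy input).
    tad_start, tad_end: half-open bin range [tad_start, tad_end).
    """
    n = len(contacts)
    inside = range(tad_start, tad_end)
    outside = [k for k in range(n) if k < tad_start or k >= tad_end]

    # Off-diagonal pairs of bins that both lie inside the candidate TAD.
    intra = [contacts[i][j] for i in inside for j in inside if i < j]
    # Pairs with one bin inside the candidate TAD and one bin outside.
    inter = [contacts[i][k] for i in inside for k in outside]

    if not intra or not inter:
        raise ValueError("need at least two bins inside and one bin outside")

    mean_intra = sum(intra) / len(intra)
    mean_inter = sum(inter) / len(inter)
    if mean_inter == 0:
        return float("inf")
    return mean_intra / mean_inter


# Toy 6-bin contact matrix with two domains, bins [0, 3) and [3, 6):
# 10 on the diagonal, 8 within a domain, 2 across domains.
contacts = [
    [10 if i == j else (8 if (i < 3) == (j < 3) else 2) for j in range(6)]
    for i in range(6)
]

print(tad_contact_enrichment(contacts, 0, 3))  # 4.0
```

A ratio well above 1 is consistent with the self-interacting behaviour described in the definition, while a range that straddles the two toy domains (for example bins 1 to 4) gives a ratio below 1.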
Our proposed methods can help use chromatin structure to highlight the regulatory elements and genes that influence the disease state and drug response of cells, informing hypothesis-driven therapeutics.