Year-2022

Year-2022 http://repository.iiitd.edu.in/xmlui/handle/123456789/1247 2026-07-16T04:23:55Z 2026-07-16T04:23:55Z Advancing graph-based computational approaches to decipher omic signature of diseases Mishra, Shreya Kumar, Vibhor (Advisor) http://repository.iiitd.edu.in/xmlui/handle/123456789/1300 2023-08-18T22:00:30Z 2022-10-31T00:00:00Z

Advancing graph-based computational approaches to decipher omic signature of diseases Mishra, Shreya; Kumar, Vibhor (Advisor) Omic signatures of disease are important for personalized treatment because of the heterogeneity of diseases. Despite the advancement of computational tools, there are limited methods that can capture the latent inter-relationships between the individual components (amino acids, genes) of proteins and transcriptomic profiles. This gap may be addressed by graph-based learning approaches, both supervised and unsupervised, which enable the creation of scientifically driven learning problems on graphs. We used graph signal processing, which provides a range of tools for processing graph signals, that is, functions defined over the nodes of a graph. These functions represent the individual components of a biological unit. Further, the data points at the nodes are transformed into different spaces in order to bring out the latent features of the biological unit for downstream analysis. These tools extend traditional signal processing and provide access to several functionalities, including filtering and frequency analysis. In the first contribution, we devised an approach to address the noise in gene-expression profiles based on graph-wavelet driven gene-expression filtering to enhance gene-network inference. Using this approach, we demonstrated how the gene regulatory networks of young and elderly lung cells differ. Additionally, we contrasted differences in gene expression in lungs infected with COVID-19 with the pattern of changes in the effect of genes brought on by ageing. In the second contribution, we proposed a smart graph-based embedding system in our search engine (scEpiSearch), which is capable of embedding and providing an integrative visualization of single-cell ATAC-seq profiles from various sources regardless of the species from which they originated and of batch effects. 
Our method (scEpiSearch) calculates the distance between query cells on the basis of their similarity with reference expression and epigenome cells. Here, reference cells are selected from a large pool of cells based on the statistical significance of their match. We demonstrated the utility of our method in studying the lineage of cancer cells (mixed phenotype acute leukaemia), understanding their multipotent behaviour, and emphasizing unique regulatory patterns in subpopulations of stem cells. In our third contribution, we developed a novel graph signal processing based methodology to predict biophysical properties of proteins. The model uses graph wavelets of physicochemical signals of amino acids in protein residue networks to model a protein's biophysical properties. We demonstrate how our approach using graph wavelets can help in estimating the possible effect of disease-associated mutations on proteins, using examples of prediction of globularity and folding rate.
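The graph-signal filtering idea in this abstract can be illustrated with a much simpler operator than the graph wavelets used in the thesis. The sketch below is an assumption-laden stand-in: it applies a polynomial low-pass filter, (I - alpha * L)^steps with L = D - A, to a signal on a toy four-node path graph. The function name `laplacian_smooth`, the parameters `alpha` and `steps`, and the toy data are all hypothetical and are not taken from the thesis.

```python
def laplacian_smooth(adjacency, signal, alpha=0.2, steps=3):
    """Apply (I - alpha * L)^steps to a graph signal, where L = D - A.

    adjacency: symmetric matrix (list of lists) of edge weights.
    signal:    one value per node, e.g. a noisy expression value per gene.
    """
    n = len(signal)
    x = list(signal)
    for _ in range(steps):
        updated = []
        for i in range(n):
            degree = sum(adjacency[i])
            neighbor_sum = sum(adjacency[i][j] * x[j] for j in range(n))
            # (L x)_i = degree_i * x_i - sum_j A_ij * x_j
            laplacian_term = degree * x[i] - neighbor_sum
            updated.append(x[i] - alpha * laplacian_term)
        x = updated
    return x


# Toy path graph 0 - 1 - 2 - 3 carrying an oscillating (high-frequency) signal.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
noisy = [1.0, 3.0, 1.0, 3.0]
smoothed = laplacian_smooth(adjacency, noisy)
print(smoothed)
```

With a symmetric adjacency matrix this filter preserves the sum of the signal while damping differences between neighbouring nodes. A wavelet-based filter would instead act on selected bands of the graph spectrum, which this sketch does not attempt.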

2022-10-31T00:00:00Z Feature engineering for low dimensional representation of genes’ expression and pathological activities across diverse human tissues Rai, Priyadarshini Sengupta, Debarka (Advisor) Majumdar, Angshul (Advisor) http://repository.iiitd.edu.in/xmlui/handle/123456789/1299 2023-08-14T22:00:15Z 2022-10-01T00:00:00Z

Feature engineering for low dimensional representation of genes’ expression and pathological activities across diverse human tissues Rai, Priyadarshini; Sengupta, Debarka (Advisor); Majumdar, Angshul (Advisor) The advent of tissue and single cell based transcriptomic profiling technologies has allowed precise characterization of tissue specific gene activities in the context of development and disease. Human cells express about 20,000 genes whose interplay enables all physical activities that define our life. However, along with expression signals, most transcriptomic platforms also produce bewildering levels of noise. This has become more prominent in the case of single cell transcriptomic experiments. As such, it is important to represent cells and tissues with the help of minimal gene sets. This poses the classical challenge of dimension reduction. To reduce this feature space, we developed a de novo feature selection algorithm, SelfE (self expression), a novel l2,0-minimization algorithm that determines an optimal subset of feature vectors (genes) that preserves subspace structures as observed in single cell RNA-sequencing data. We compared SelfE with the commonly used feature selection methods for single-cell expression data analysis. Unlike bulk RNA sequencing data, single cell gene expression readouts feature excessive dropout events, thereby confounding downstream bioinformatic analyses. Keeping these limitations in mind, we proposed a method that employs deep dictionary learning for the clustering of single cell data. This is the first piece of the effort to create a deep learning-based approach for clustering. We render the framework clustering-compatible by introducing a cluster-aware loss (K-means and sparse subspace) into the learning problem. The potential of our method is demonstrated by comparison with general deep learning-based clustering techniques and with specially designed single-cell RNA clustering techniques. 
In an effort to provide a comprehensive resource to understand tissue specific pathological activities of the genes, we developed Pathomap. It allows querying a gene to visualize, on a human body template, the intensity of pathological activities of a certain gene in a tissue specific manner. While the Human Cell Atlas project is still consolidating the gene expression patterns across healthy human tissues, Pathomap, as a parallel, provides insights into tissue specific pathological activity of genes. To achieve this, we searched 18 million PubMed papers published through May 2019 and automatically selected 4.5 million abstracts describing certain genes' functions in disease development. In addition, we fine-tuned the pretrained Bidirectional Encoder Representations from Transformers (BERT) for text modeling from the field of Natural Language Processing (NLP) in order to learn embeddings of entities such as genes, diseases, tissues, cell types, etc., in a way that preserves their relationship in a vector space. The reprogrammed BERT predicted disease-gene relationships not present in the training data, demonstrating the viability of in-silico formulation of hypotheses relating to diverse biological entities such as genes and disorders. Taken together, our works bring feature engineering approaches to bear in representing biological entities in low-dimensional space.
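The self-expression idea behind the feature selection described in this abstract (choose a small set of genes from which the remaining genes can be linearly reconstructed) can be sketched with a greedy column-selection loop. This is not the SelfE implementation: it is a minimal illustration under the assumption that a greedy pick-and-project scheme is an acceptable stand-in for the l2,0-minimization solver. The function names and the toy matrix are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def greedy_self_expression(columns, k):
    """Greedily pick up to k column indices that best reconstruct all columns.

    columns: list of gene vectors (one value per cell).
    Each round picks the residual column that explains the most remaining
    energy across all columns, then projects that direction out of every
    residual so that redundant genes are not picked again.
    """
    residuals = [list(c) for c in columns]
    selected = []
    for _ in range(k):
        best, best_score = None, 0.0
        for j, candidate in enumerate(residuals):
            norm_sq = dot(candidate, candidate)
            if j in selected or norm_sq < 1e-12:
                continue
            score = sum(dot(candidate, r) ** 2 for r in residuals) / norm_sq
            if score > best_score:
                best, best_score = j, score
        if best is None:
            break  # nothing left to explain
        selected.append(best)
        q = list(residuals[best])
        q_norm_sq = dot(q, q)
        residuals = [
            [ri - dot(r, q) / q_norm_sq * qi for ri, qi in zip(r, q)]
            for r in residuals
        ]
    return selected


# Toy data: genes 0 and 1 are collinear, as are genes 2 and 3 (4 cells each).
genes = [
    [1.0, 0.0, 1.0, 0.0],
    [2.0, 0.0, 2.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 3.0, 0.0, 3.0],
]
selected = greedy_self_expression(genes, k=2)
print(selected)
```

On this toy matrix the loop should return one gene from each collinear pair, since two genes are enough to reconstruct all four.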

2022-10-01T00:00:00Z Unraveling cellular heterogeneity and phenotypic drug responses using chromatin profiles Neetesh Kumar, Vibhor (Advisor) http://repository.iiitd.edu.in/xmlui/handle/123456789/1258 2023-05-26T22:00:33Z 2022-08-01T00:00:00Z

Unraveling cellular heterogeneity and phenotypic drug responses using chromatin profiles Neetesh; Kumar, Vibhor (Advisor) For effective treatment regimens, decisions should be based on the specific genetic variability present across different human body cells, taking advantage of already accessible large-scale omics data such as genomics, epigenomics, proteomics, and metabolomics databases. Lately, cellular heterogeneity in phenotypic conditions (such as cancer, neurodegenerative diseases, bone disease, metabolic disorders, and immune-related disorders) has been inferred using genomic and epigenetic biomarkers for clinical diagnosis, patient stratification, prognosis and treatment monitoring. To understand regulatory changes due to disease and external stimuli in a cell, it is important to consider the role of chromatin structure, as it regulates the expression of genes. However, existing datasets on chromatin interaction are derived from only a few cell-types, thereby providing limited insights for many cell-types. “Single-cell open-chromatin profiles” can be used to infer the pattern of chromatin interaction in a cell-type. To study chromatin-interaction data for more cell-types, we developed a method called “single-cell epigenome-based chromatin-interaction analysis (scEChIA)” that uses imputation of read-counts and refined L1 regularization to predict interactions among genomic sites from “single-cell open-chromatin profiles”. Unlike other methods, scEChIA is not biased towards short-range interactions; it opens avenues for studying long-range chromatin interactions using “single-cell open-chromatin profiles”. Using scEChIA to predict chromatin interactions from the “single-cell open-chromatin profiles” of seven human brain cell types led to the identification of almost 0.7 million cis-regulatory interactions. 
Further analysis helped in finding the cell-types in which known expression quantitative trait loci (eQTLs) are connected to their target genes in the human brain. It also led to the identification of possible target genes of human-accelerated elements and disease-associated mutations. Due to the low amounts of relevant DNA available and to stochasticity, “single-cell open-chromatin profiles” have high drop-out rates and noise. To tackle this challenge, we developed a method called forest of imputation trees (FITs) to restore original signals from noisy and sparse single-cell open-chromatin profiles. Our algorithm, FITs, is designed in such a way that it avoids bias during the restoration of read-count matrices. For this purpose, it builds a forest of multiple imputation trees. FITs has resolved the challenging issue of recovering single-cell epigenome profiles without compromising the information at genomic sites with cell-type-specific activity. FITs-based imputation has not only improved the accuracy of enhancer detection but has also increased reliability in estimating pathway enrichment scores for every single cell as well as in predicting chromatin interactions. To utilize the knowledge of chromatin interaction, we propose an approach to study the activity of topologically associating domains (TADs) in cancer cell lines. A TAD is a self-interacting genomic region; DNA sequences within a TAD physically interact more frequently with each other than with sequences outside the TAD. TAD boundaries contribute to complex-trait heritability, especially for immunologic, hematologic, and metabolic traits. 
We have analyzed the variation in the activity of TADs in different phenotypic conditions across cell-types, creating a resource for understanding the role of chromatin interactions in different phenotypic contexts. Our proposed methods can help utilize chromatin structure to highlight regulatory elements and genes that influence the disease state and drug response of cells, supporting decisions on hypothesis-driven therapeutics.
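The role of an L1 penalty in keeping only well-supported interactions between genomic sites can be illustrated with soft-thresholding, the proximal operator of the L1 norm. The sketch below is a deliberately simplified stand-in and not the scEChIA procedure: it assumes that pairwise Pearson correlation of peak accessibility across cells is a reasonable proxy for co-accessibility, and it shrinks each correlation towards zero by a fixed penalty. The peak names, the penalty value, and all function names are hypothetical.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / sqrt(var_u * var_v)


def soft_threshold(value, penalty):
    """Proximal operator of the L1 norm: shrink towards zero by `penalty`."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def sparse_coaccessibility(peaks, penalty):
    """Return {(peak_a, peak_b): shrunken correlation} for surviving pairs only."""
    names = sorted(peaks)
    edges = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            weight = soft_threshold(pearson(peaks[a], peaks[b]), penalty)
            if weight != 0.0:
                edges[(a, b)] = weight
    return edges


# Toy binary accessibility of three peaks across six cells.
peaks = {
    "p1": [1, 0, 1, 0, 1, 0],
    "p2": [1, 0, 1, 0, 1, 1],
    "p3": [1, 1, 0, 0, 1, 0],
}
edges = sparse_coaccessibility(peaks, penalty=0.4)
print(edges)
```

Raising the penalty removes weaker pairs first, which is the qualitative behaviour an L1-regularized estimator relies on. A graphical-model estimator would additionally account for indirect correlations, which this sketch ignores.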

2022-08-01T00:00:00Z Interpreting single cell transcriptomes in the pathway space and its applications in cancer Chawla, Smriti Sengupta, Debarka (Advisor) Kumar, Vibhor (Advisor) http://repository.iiitd.edu.in/xmlui/handle/123456789/1257 2023-05-26T22:00:39Z 2022-09-01T00:00:00Z

Interpreting single cell transcriptomes in the pathway space and its applications in cancer Chawla, Smriti; Sengupta, Debarka (Advisor); Kumar, Vibhor (Advisor) Single-cell transcriptomics is a powerful technique that has revolutionized our approach to dissecting cellular phenotypes and diversity in complex tissues at an unprecedented resolution. The emergence of this groundbreaking technology has dramatically enhanced our understanding of cellular heterogeneity, interactions, and cell fate decisions during the development and progression of cancer. These new technologies have shown promise in the field of cancer genomics. Despite these advances, many computational challenges remain. Human cells express about 20,000 genes, which dynamically carry out a multitude of biophysical activities. Statistical and machine learning-based methods treat genes as independent variables in the process of characterizing intra-tumoral heterogeneity and developing insights into cancer progression, pathogenesis, and clinical outcomes. This approach is quite limiting, since constantly accumulating somatic genomic alterations are often manifested through the dysregulation of molecular pathways or cancer-relevant gene signatures. Thus, exploiting gene set and pathway scores to decipher heterogeneity at the single-cell level will aid many applications in cancer genomics. We propose a statistically robust method called UniPath to represent single cells in terms of pathway or gene set enrichment scores. UniPath projects gene expression readouts and single-cell ATAC-seq profiles into pathway scores while accounting for dropouts and sequencing depth. Further, it allows pseudotemporal ordering of single cells in pathway space. Visualization of gradients and distribution of pathways on a pseudotemporally ordered tree helps understand the lineage potency of cells. 
Another application of UniPath is that it helps enumerate differences between two cell populations through the exploitation of pathway co-occurrences. In a connected work, we introduce Precily, a deep learning framework that leverages pathway scores of gene expression profiles and drug descriptors for anti-cancer drug response predictions. We thoroughly validated our proposed approach using bulk and single-cell gene expression profiles. We also assessed the performance of our approach on several in-house generated prostate cancer datasets. Finally, we interrogated the transcriptomic profiles of triple-negative breast cancer tumor and natural killer (NK) cell doublets and their physical distance captured at single-cell resolution. We discovered that physical distances are governed by the activities of regulatory modules, pinpointing the presence of transcriptional memory. In addition, our investigation into interactions between ligand-protein pairs, which convey messages into cells by activating signaling pathways, revealed inflated activities of some specific pairs in NK-immune cell doublets. We concluded that intercellular communications in tumors play an essential role in deciphering the underlying mechanisms operating in cancer. Our approach of capturing and profiling single-cell doublets will aid in the understanding of the complex tumor microenvironment and cellular interactions.
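Representing each cell by pathway scores instead of individual genes, as this abstract describes, can be sketched with a simple average of per-gene z-scores. This is a simplified stand-in and not the UniPath method: it assumes plain z-scoring across cells and does not account for dropouts or sequencing depth. The gene names, gene sets, and function name below are hypothetical.

```python
from statistics import mean, pstdev


def pathway_scores(expression, gene_sets):
    """Score each cell for each gene set as the mean z-score of member genes.

    expression: {gene: [value per cell]}
    gene_sets:  {pathway name: [gene, ...]}
    Returns     {pathway name: [score per cell]}
    """
    # Standardize every gene across cells; constant genes get a z-score of 0.
    z = {}
    for gene, values in expression.items():
        mu, sigma = mean(values), pstdev(values)
        z[gene] = [(v - mu) / sigma if sigma > 0 else 0.0 for v in values]

    n_cells = len(next(iter(expression.values())))
    scores = {}
    for name, genes in gene_sets.items():
        present = [g for g in genes if g in z]  # ignore genes not measured
        scores[name] = [
            mean([z[g][c] for g in present]) if present else 0.0
            for c in range(n_cells)
        ]
    return scores


# Toy data: three genes measured in four cells.
expression = {
    "A": [0.0, 0.0, 4.0, 4.0],
    "B": [1.0, 1.0, 3.0, 3.0],
    "C": [5.0, 5.0, 1.0, 1.0],
}
gene_sets = {"up": ["A", "B"], "down": ["C", "not_measured"]}
scores = pathway_scores(expression, gene_sets)
print(scores)
```

In this toy example the first two cells score low on the "up" set and high on the "down" set, and the last two cells show the opposite pattern, so the two groups separate cleanly in pathway space.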

2022-09-01T00:00:00Z