IIIT-Delhi Institutional Repository

Feature engineering for low dimensional representation of genes’ expression and pathological activities across diverse human tissues

dc.contributor.author Rai, Priyadarshini
dc.contributor.author Sengupta, Debarka (Advisor)
dc.contributor.author Majumdar, Angshul (Advisor)
dc.date.accessioned 2023-08-14T10:20:17Z
dc.date.available 2023-08-14T10:20:17Z
dc.date.issued 2022-10
dc.identifier.uri http://repository.iiitd.edu.in/xmlui/handle/123456789/1299
dc.description.abstract The advent of tissue and single cell based transcriptomic profiling technologies has allowed precise characterization of tissue specific gene activities in the context of development and disease. Human cells express about 20,000 genes whose interplay enables all physical activities that define our life. However, along with expression signals, most transcriptomic platforms also produce bewildering levels of noise. This has become more prominent in the case of single cell transcriptomic experiments. As such, it is important to represent cells and tissues with the help of minimal gene sets. This poses the classical challenge of dimension reduction. To reduce this feature space, we developed a de novo feature selection algorithm, SelfE (self expression), a novel l2,0-minimization algorithm that determines an optimal subset of feature vectors (genes) that preserves subspace structures as observed in single cell RNA-sequencing data. We compared SelfE with the commonly used feature selection methods for single-cell expression data analysis. Unlike bulk RNA sequencing data, single cell gene expression readouts feature excessive dropout events, which confound downstream bioinformatic analyses. Keeping these limitations in mind, we proposed a method that employs deep dictionary learning for the clustering of single cell data. This is a first step in the effort to create a deep learning-based approach for clustering. We make the framework compatible with clustering by introducing a cluster-aware loss (K-means and sparse subspace) into the learning problem. The potential of our method is demonstrated by comparison with general deep learning-based clustering techniques and with specially designed single-cell RNA clustering techniques. In an effort to provide a comprehensive resource to understand tissue specific pathological activities of the genes, we developed Pathomap. It allows querying a gene to visualize, on a human body template, the intensity of that gene's pathological activities in a tissue specific manner. While the Human Cell Atlas project is still consolidating the gene expression patterns across healthy human tissues, Pathomap, as a parallel, provides insights into tissue specific pathological activity of genes. To achieve this, we searched 18 million PubMed papers published through May 2019 and automatically selected 4.5 million abstracts describing certain genes' functions in disease development. In addition, we fine-tuned the pretrained Bidirectional Encoder Representations from Transformers (BERT) model, a text modeling approach from the field of Natural Language Processing (NLP), in order to learn embeddings of entities such as genes, diseases, tissues, cell types, etc., in a way that preserves their relationships in a vector space. The fine-tuned BERT predicted disease-gene relationships not present in the training data, demonstrating the viability of in-silico formulation of hypotheses relating to diverse biological entities such as genes and disorders. Taken together, our works bring feature engineering approaches to bear in representing biological entities in low-dimensional space. en_US
dc.language.iso en_US en_US
dc.publisher IIIT-Delhi en_US
dc.subject cell data analysis en_US
dc.subject Dimensional representation of cells en_US
dc.subject Datasets used for validation en_US
dc.title Feature engineering for low dimensional representation of genes’ expression and pathological activities across diverse human tissues en_US
dc.type Thesis en_US

