IIIT-Delhi Institutional Repository

Statistical and machine learning-based approaches to precise characterization of cellular phenotypes

dc.contributor.author Gupta, Krishan
dc.contributor.author Sengupta, Debarka (Advisor)
dc.contributor.author Ghosh, Abhik (Advisor)
dc.contributor.author Ahuja, Gaurav (Advisor)
dc.date.accessioned 2021-11-13T05:27:18Z
dc.date.available 2021-11-13T05:27:18Z
dc.date.issued 2021-10
dc.identifier.uri http://repository.iiitd.edu.in/xmlui/handle/123456789/944
dc.description.abstract Delineating the complex layers of biological systems requires a cumulative effort from multiple scientific disciplines. The present thesis takes an interdisciplinary approach, combining the automation and accuracy of computation with in-depth concepts of biology. In my thesis, I have addressed three fundamental biological problems. In one of my initial projects, I developed a computational framework that uses a machine learning-based approach to build a classification model for the detection of Circulating Tumor Cells (CTCs). I validated the model on a large number of publicly available scRNA-seq datasets and on a newly generated CTC dataset of breast tumour cells, captured using a newly developed microfluidic system for label-free enrichment of CTCs. In my second project, I used a single-cell genomics approach, coupled with stringent statistical and structural biology frameworks, to dissect the cellular basis of the loss of smell in COVID-19 patients. Of note, loss of smell and taste was a prevalent but largely ignored symptom during the early COVID-19 pandemic. Our work used the known information about the viral entry proteins and the viral-human protein-protein interaction map. Our integrative analysis suggests that the non-sensory cell types (sustentacular cells, Globose Basal Cells and Bowman's gland cells) are vulnerable to SARS-CoV-2 infection. In my third project, I explored the potential of modelling expression ranks as robust surrogates for transcript abundance. Here I examined the performance of the Discrete Generalized Beta Distribution (DGBD) on real data and devised a Wald-type test to compare gene expression between two phenotypically divergent groups of single cells. We carried out a comprehensive assessment of the proposed method to understand its advantages over some of the current best-practice approaches. In addition to striking a reasonable balance between Type 1 and Type 2 errors, we concluded that, with increasing sample size, Rank Order-Sequencing (ROSeq), the proposed differential expression test, is remarkably robust to expression noise and scales rapidly. en_US
dc.language.iso en_US en_US
dc.publisher IIIT-Delhi en_US
dc.subject RNA species en_US
dc.subject Single-cell expression studies en_US
dc.subject Pseudotemporal analysis en_US
dc.subject Single Circulating Tumor Cells (CTCs) en_US
dc.title Statistical and machine learning-based approaches to precise characterization of cellular phenotypes en_US
dc.type Thesis en_US

