Abstract:
Delineating the complex layers of biological systems requires a cumulative effort from multiple scientific disciplines. This thesis takes an interdisciplinary approach, combining the automation and accuracy of computation with in-depth concepts from biology. In it, I address three fundamental biological problems. In the first project, I developed a machine learning-based computational framework to build a classification model for the detection of Circulating Tumor Cells (CTCs). I validated the model on a large number of publicly available scRNA-seq datasets and on a newly generated CTC dataset of breast tumour cells, captured using a newly developed microfluidic system for label-free enrichment of CTCs. In the second project, I combined single-cell genomics with stringent statistical and structural biology frameworks to dissect the cellular basis of the loss of smell in COVID-19 patients. Loss of smell and taste was a prevalent but largely ignored symptom during the early COVID-19 pandemic. Our work drew on known information about the viral entry proteins and the viral-human protein-protein interaction map. Our integrative analysis suggests that the non-sensory cell types (sustentacular cells, globose basal cells and Bowman’s gland cells) are vulnerable to SARS-CoV-2 infection. In the third project, I explored the potential of modelling expression ranks as robust surrogates for transcript abundance. I examined the performance of the Discrete Generalized Beta Distribution (DGBD) on real data and devised a Wald-type test to compare gene expression between two phenotypically divergent groups of single cells. We carried out a comprehensive assessment of the proposed method to understand its advantages over some of the current best-practice approaches.
We concluded that Rank Order-Sequencing (ROSeq), the proposed differential expression test, strikes a reasonable balance between Type 1 and Type 2 errors, is remarkably robust to expression noise, and scales rapidly with increasing sample size.