dc.description.abstract |
Data analytics and computational techniques applied to the biological sciences aid rapid technological advances, swift discoveries, and reliable analysis. A broad range of tools and algorithms has played pivotal roles in a variety of biological applications. One such class of algorithms, matrix completion, motivated by recommender systems, has been used to solve different kinds of biological problems. This dissertation proposes novel low-rank matrix completion algorithms and their variants as a contribution to two fields: scRNA-sequencing and drug repositioning. Specifically, biological problems such as scRNA-seq imputation, drug-target interaction prediction, drug-disease association prediction, and the most motivating one, virus-drug association prediction (proposed to contribute towards a cure for COVID-19), have been modeled as matrix completion problems. This bridges the gap between two seemingly disjoint research fields, collaborative filtering and bioinformatics, and initiates a symbiotic or deeply collaborative relationship between the two. Firstly, this dissertation proposes one of the early tools for the imputation of scRNA-seq gene expression data. Single-cell RNA-seq technology allows the measurement of gene expression at single-cell resolution but has the disadvantage of a low amount of mRNA in individual cells. This leads to dropouts in the single-cell gene expression data, hindering downstream single-cell analysis. We handle the dropout problem by modeling scRNA-seq imputation as a missing-value prediction problem, employing a novel deep matrix completion framework.
The second contribution is largely incremental in terms of biological application but novel from an algorithmic perspective. With the aim of drug repositioning/drug repurposing (predicting new targets/diseases for existing drugs), we propose techniques for drug-disease association and drug-target interaction prediction. Both take into account the side information associated with the drug and target entities and deploy graph-regularized matrix completion frameworks for these tasks. The third application has grown out of the algorithmic contributions of this thesis and maps directly to predicting antiviral treatments effective against SARS-CoV-2. We put forward a matrix completion framework based on a manually curated drug-virus association dataset, which uses variants of matrix completion methods (including the proposed ones) for virus-drug association prediction. This work covers the entire spectrum of tasks, ranging from data curation to algorithms and biological implications. The fourth and last contribution of this dissertation is a new framework that performs matrix completion collaboratively, with application to imputation on combined proteomics and transcriptomics data obtained from RNA sequencing methods such as CITE-seq, in which the RNA data is expected to have relatively more dropouts (due to higher amounts of protein in a cell). |
en_US |