Abstract:
Cancer has become the second leading cause of mortality worldwide, and early detection and adequate treatment are crucial in reducing the cancer burden. Metastasis, which involves malignant cells detaching from the primary tumor and colonizing distant organs, is the leading cause of cancer-related deaths. The microenvironment, immune cells, stromal cells, and drug selection pressures influence tumor heterogeneity and dynamics, making it challenging to select the most effective treatment approach throughout the entire course of the disease. Liquid biopsy and single-cell transcriptomics have emerged as promising techniques for cancer detection. Bodily fluids such as blood, urine, and saliva are rich sources of biomarkers. Circulating tumor cells (CTCs) and other tumor-associated cell products have been identified in the bloodstream, providing potential biomarkers for cancer detection. Through serial blood analysis, liquid biopsy techniques can help track spatial and temporal heterogeneity in tumor biology. Characterizing CTCs provides essential biological information about the disease, as they are the primary live tumor cells responsible for metastasis. Existing CTC detection methods rely on surface markers, which may be shed during the epithelial-to-mesenchymal transition (EMT) or due to various stressors in the blood. Therefore, marker-free detection and characterization of CTCs are necessary. To achieve the best possible outcomes, it is crucial to manage cancer and any clinical factors that may impact treatment response or contribute to disease relapse. By identifying and addressing these factors, healthcare providers can develop effective treatment plans and improve overall cancer management. This approach can help patients achieve longer-term remission and better quality of life.

Over the past two decades, machine learning (ML) has shown tremendous potential in enhancing the accuracy and efficiency of cancer diagnosis and treatment.
Our research leverages the power of ML to address the pressing need for timely cancer detection and optimal management of the disease. By employing advanced ML algorithms, we aim to improve the accuracy and speed of cancer diagnosis, identify the most effective treatment options, and enable personalized cancer care.

For marker-free detection and characterization of CTCs, we created a novel unsupervised clustering algorithm, unCTC, which leverages single-cell transcriptomic data to detect and characterize CTCs. unCTC integrates a wide range of computational and statistical modules, such as a novel Deep Dictionary Learning with k-means Clustering Cost (DDLK) approach for scRNA-Seq clustering, expression-based inference of copy number variation (CNV), and combinatorial, marker-based validation of malignant phenotypes. DDLK provides a robust separation of CTCs and white blood cells (WBCs) in the pathway space, in contrast to the gene expression space. The utility of unCTC was validated on single-cell RNA sequencing (scRNA-Seq) profiles of breast CTCs from six patients.
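
The idea of clustering cells in pathway space rather than gene expression space can be illustrated with a minimal sketch. This is not the DDLK method or the unCTC implementation: plain k-means stands in for DDLK, pathway scores are assumed to be the mean z-score of each gene set, and the function names (zscore_genes, pathway_scores, kmeans), the two toy gene sets, and the expression values are all hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev

def zscore_genes(expr):
    """Z-score each gene across cells. expr: list of {gene: value}, one dict per cell."""
    genes = list(expr[0])
    stats = {}
    for g in genes:
        vals = [cell[g] for cell in expr]
        sd = pstdev(vals)
        stats[g] = (mean(vals), sd if sd > 0 else 1.0)  # guard against constant genes
    return [{g: (cell[g] - stats[g][0]) / stats[g][1] for g in genes} for cell in expr]

def pathway_scores(expr, gene_sets):
    """Assumed scoring: one score per cell and pathway, the mean z-score of member genes."""
    z = zscore_genes(expr)
    names = sorted(gene_sets)
    return [[mean([cell[g] for g in gene_sets[p]]) for p in names] for cell in z]

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k=2, iters=50):
    """Plain k-means (a stand-in for DDLK) with deterministic farthest-point init."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sqdist(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [mean(col) for col in zip(*members)]
    return labels

# Hypothetical toy gene sets and expression values (not real data).
gene_sets = {"epithelial": ["EPCAM", "KRT8"], "immune": ["PTPRC", "CD3E"]}
cells = [
    {"EPCAM": 9, "KRT8": 8, "PTPRC": 0, "CD3E": 1},  # CTC-like
    {"EPCAM": 8, "KRT8": 9, "PTPRC": 1, "CD3E": 0},  # CTC-like
    {"EPCAM": 7, "KRT8": 9, "PTPRC": 0, "CD3E": 0},  # CTC-like
    {"EPCAM": 0, "KRT8": 1, "PTPRC": 9, "CD3E": 8},  # WBC-like
    {"EPCAM": 1, "KRT8": 0, "PTPRC": 8, "CD3E": 9},  # WBC-like
    {"EPCAM": 0, "KRT8": 0, "PTPRC": 9, "CD3E": 7},  # WBC-like
]

labels = kmeans(pathway_scores(cells, gene_sets), k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

Aggregating genes into set-level scores before clustering means that no single gene decides the grouping, which is the intuition behind working in pathway space; a real analysis would use curated pathway collections and far more cells than this toy input.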