| dc.description.abstract |
Cell-free DNA (cfDNA) has emerged as a promising biomarker for non-invasive cancer diagnostics, offering a window into tumor-derived genetic and epigenetic information through simple blood sampling. However, the biological signals embedded in cfDNA, such as end motifs and nucleosome positioning, remain underutilized compared to conventional mutation-based assays. This thesis develops and validates computational frameworks that harness fragmentomic and chromatin features of cfDNA to enhance cancer detection, tissue-of-origin classification, and cross-platform model generalizability. First, the study demonstrates that short sequence motifs at cfDNA fragment ends, particularly 6-mer patterns, encode robust cancer-specific signatures. Machine learning classifiers applied to these motifs achieved high accuracy in distinguishing cancer patients from healthy individuals and exhibited strong performance in differentiating among multiple cancer types. Second, to address platform variability and limited sample sizes, a model-based transfer learning strategy was implemented. Using decision tree adaptation techniques (SER and STRUT, with class imbalance-aware variants), models trained on data from the Illumina sequencing platform were successfully transferred to data generated by alternative sequencing platforms, such as Nanopore sequencing, improving cross-domain prediction without retraining from scratch. Third, this work investigates nucleosome occupancy patterns around transcription factor binding sites as an informative layer for tissue-of-origin inference. A custom computational pipeline quantified nucleosome positioning and chromatin accessibility signatures, which, when integrated with machine learning and feature reduction, provided accurate tumor-type classification and revealed biologically interpretable chromatin features relevant to cancer progression.
Collectively, this thesis advances the field of cfDNA diagnostics by demonstrating that shallow, cost-effective sequencing combined with robust computational pipelines can deliver high diagnostic accuracy, interpretability, and adaptability across sequencing technologies. The developed approaches lay the groundwork for scalable, minimally invasive multi-cancer early detection and tissue-specific monitoring, supporting future implementation of cfDNA-based liquid biopsies in precision oncology. |
en_US |