Please use this identifier to cite or link to this item: http://repository.iiitd.edu.in/xmlui/handle/123456789/1850
Title: Computational methods for cell-free DNA based diagnostics
Authors: Siraj, Mariyam
Kumar, Vibhor (Advisor)
Keywords: Cell-free DNA
machine learning (ML)
Issue Date: Jul-2025
Publisher: IIIT-Delhi
Abstract: Cell-free DNA (cfDNA) has emerged as a promising biomarker for non-invasive cancer diagnostics, offering a window into tumor-derived genetic and epigenetic information through simple blood sampling. However, the biological signals embedded in cfDNA, such as end motifs and nucleosome positioning, remain underutilized compared to conventional mutation-based assays. This thesis develops and validates computational frameworks that harness fragmentomic and chromatin features of cfDNA to enhance cancer detection, tissue-of-origin classification, and cross-platform model generalizability. Firstly, the study demonstrates that short sequence motifs at cfDNA fragment ends, particularly 6-mer patterns, encode robust cancer-specific signatures. Machine learning classifiers applied to these motifs achieved high accuracy in distinguishing cancer patients from healthy individuals and exhibited strong performance in differentiating among multiple cancer types. Secondly, to address platform variability and limited sample sizes, a model-based transfer learning strategy was implemented. Using decision tree adaptation techniques (SER and STRUT, with class imbalance-aware variants), models trained on data from the Illumina sequencing platform were successfully transferred to data generated by alternative sequencing platforms, such as Nanopore sequencing, improving cross-domain prediction without retraining from scratch. Thirdly, this work investigates nucleosome occupancy patterns around transcription factor binding sites as an informative layer for tissue-of-origin inference. A custom computational pipeline quantified nucleosome positioning and chromatin accessibility signatures, which, when integrated with machine learning and feature reduction, provided accurate tumor-type classification and revealed biologically interpretable chromatin features relevant to cancer progression.
Collectively, this thesis advances the field of cfDNA diagnostics by demonstrating that shallow, cost-effective sequencing combined with robust computational pipelines can deliver high diagnostic accuracy, interpretability, and adaptability across sequencing technologies. The developed approaches lay the groundwork for scalable, minimally invasive multi-cancer early detection and tissue-specific monitoring, supporting future implementation of cfDNA-based liquid biopsies in precision oncology.
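As an illustration of the first contribution described in the abstract, the following is a minimal sketch of how 6-mer end-motif frequencies might be turned into a fixed-length feature vector for a classifier. It is not the thesis pipeline: the function name `end_motif_frequencies`, the toy fragment strings, and the choice to read the motif from the 5' end of each fragment sequence are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

K = 6  # motif length; the abstract highlights 6-mer patterns


def end_motif_frequencies(fragments, k=K):
    """Return normalised frequencies of the k-mer at the 5' end of each fragment.

    `fragments` is an iterable of fragment sequences written 5'->3'
    (a hypothetical input format chosen for this sketch).
    """
    counts = Counter()
    for seq in fragments:
        motif = seq[:k].upper()
        # Skip fragments that are too short or contain ambiguous bases.
        if len(motif) == k and set(motif) <= set("ACGT"):
            counts[motif] += 1
    total = sum(counts.values())
    # Enumerate all 4**k motifs in a fixed order so that every sample
    # produces a feature vector with the same layout.
    all_motifs = ["".join(p) for p in product("ACGT", repeat=k)]
    return {m: (counts[m] / total if total else 0.0) for m in all_motifs}


# Toy fragments (made up for this example, not real cfDNA data).
fragments = ["CCCAGTTTGACA", "CCCAGTAGGCTT", "ACGTTGCAAGGT", "NNACGTACGTAC", "ACG"]
freqs = end_motif_frequencies(fragments)
```

Each sample then yields a vector of 4096 frequencies that could be passed to any standard classifier. Which classifier and which normalisation the thesis actually uses is not stated in this record.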
URI: http://repository.iiitd.edu.in/xmlui/handle/123456789/1850
Appears in Collections: Year-2025

Files in This Item:
File: MT23248_Mariyam Siraj.pdf
Size: 2.03 MB
Format: Adobe PDF