Abstract:
This project aims to unravel the complex genomic dynamics of COVID-19, which are critical for understanding its virulence and developing targeted therapeutic interventions. Our approach focuses on analyzing the genomic sequences of the SARS-CoV-2 virus using a transformer model trained on approximately 2 million real-world SARS-CoV-2 sequences. The trained model generates attention scores for genomic codons. The same 2 million sequences were aligned using the MAFFT tool. The DNA sequences were then divided into codons to facilitate mapping between the aligned and real-world sequences. Using this mapping, we examined the distribution of attention scores across the mutated and non-mutated regions of the sequences.
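
The codon-level mapping described above can be illustrated with a minimal sketch. It is not the project's actual pipeline: the function names, the toy sequences, and the attention values are all hypothetical, and it assumes that a codon counts as "mutated" when any of its alignment columns differs from an aligned reference sequence, and that the model yields one attention score per codon of the unaligned (real-world) sequence.

```python
from statistics import mean


def split_codons(seq):
    """Split an ungapped sequence into non-overlapping triplets,
    dropping any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def raw_to_aligned_columns(aligned):
    """Return the alignment column of each non-gap character, so that
    index i of the raw sequence maps to column result[i]."""
    return [col for col, base in enumerate(aligned) if base != "-"]


def mutated_codon_flags(aligned_ref, aligned_seq):
    """Flag each codon of the raw sequence as mutated (True) if any of its
    alignment columns differs from the reference (assumed definition)."""
    cols = raw_to_aligned_columns(aligned_seq)
    n_codons = len(cols) // 3
    flags = []
    for k in range(n_codons):
        codon_cols = cols[3 * k:3 * k + 3]
        flags.append(any(aligned_ref[c] != aligned_seq[c] for c in codon_cols))
    return flags


def mean_attention_by_group(flags, scores):
    """Mean attention over mutated and non-mutated codons
    (None for an empty group)."""
    mutated = [s for f, s in zip(flags, scores) if f]
    unchanged = [s for f, s in zip(flags, scores) if not f]
    return (mean(mutated) if mutated else None,
            mean(unchanged) if unchanged else None)


# Toy data (hypothetical): the sample carries a substitution in its second
# codon and a three-base deletion relative to the reference.
aligned_ref = "ATGGCTAAATAA"
aligned_seq = "ATGGAT---TAA"
codons = split_codons(aligned_seq.replace("-", ""))  # codons of the raw sequence
attention = [0.25, 0.75, 0.5]                        # one made-up score per codon

flags = mutated_codon_flags(aligned_ref, aligned_seq)
mutated_mean, unchanged_mean = mean_attention_by_group(flags, attention)
```

In this sketch the gapped alignment is used only to decide which codons are mutated, while the attention scores stay indexed by the codons of the unaligned sequence, which is one plausible way to relate the two coordinate systems.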