Abstract:
The field of bioinformatics is evolving rapidly as it confronts an escalating volume of genomic data. One emerging challenge is the rapid growth in the number of sequenced Escherichia coli genomes, which calls for genomic epidemiology approaches that are both efficient and accurate. Of the more than 53,000 genomic sequences housed in the PATRIC database, only around 8,000 are annotated for antibiotic susceptibility or resistance using traditional laboratory methods. Our current work applies a hidden Markov model (HMM) approach to E. coli genome annotation, complementing PATRIC's machine learning (ML)-based AdaBoost classifier. In parallel, we are exploring an ML approach based on convolutional neural networks (CNNs), using a dataset drawn from a published research paper to broaden the diversity and coverage of the genomic data under investigation. Although PATRIC has successfully applied computational methods to annotate Mycobacterium tuberculosis and Staphylococcus aureus genomes, it has not yet applied them to Escherichia coli genomes. Our research underscores the potential of computational methods for bacterial genome annotation and contributes to the broader field of bacterial genomics.