Abstract:
This project aims to develop computational pipelines that cluster genomes by the similarity of their functional signatures and identify functionally redundant genome clusters. As part of this initiative, an extensive database of functional feature count matrices has been built for approximately 70,000 high-quality metagenomes, derived from their annotation files. For different microbial species, the database records functional features such as carbohydrate-active enzymes (CAZymes), orthologous groups, and KEGG reaction pathways. Locality-sensitive hashing (LSH) is used to find clusters of functionally similar genomes or taxa. The framework is intended to support the development of better diagnostic methods for the human gut microbiome.
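The abstract does not specify which LSH scheme the pipeline uses. As a minimal sketch, assuming each genome's count vector is reduced to a presence/absence set of features (count > 0) and compared by Jaccard similarity, MinHash signatures with banded LSH can group genomes that share most of their functional repertoire. The function names, the 64-permutation/16-band setting, and the example profiles below are hypothetical illustrations, not the project's implementation.

```python
import hashlib
import random
from collections import defaultdict

PRIME = (1 << 61) - 1  # Mersenne prime used for universal hashing

def make_hash_funcs(num_perm, seed=42):
    # Hypothetical helper: draw (a, b) pairs for h(x) = (a*x + b) mod PRIME.
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_perm)]

def feature_id(name):
    # Stable 64-bit integer id for a feature name (built-in hash() of str is salted per process).
    return int.from_bytes(hashlib.sha1(name.encode()).digest()[:8], "big")

def minhash_signature(features, hash_funcs):
    # One minimum hash value per hash function; an empty set yields all-PRIME values.
    ids = [feature_id(f) for f in features]
    return tuple(min(((a * x + b) % PRIME for x in ids), default=PRIME) for a, b in hash_funcs)

def lsh_clusters(profiles, num_perm=64, bands=16):
    # profiles: {genome_id: {feature_name: count}} (assumed input layout).
    rows = num_perm // bands
    hash_funcs = make_hash_funcs(num_perm)
    # Binarize counts: a feature is "present" in a genome if its count is > 0.
    sigs = {
        g: minhash_signature({f for f, c in counts.items() if c > 0}, hash_funcs)
        for g, counts in profiles.items()
    }
    parent = {g: g for g in profiles}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Genomes whose signatures agree on every row of any band become candidates and are merged.
    for b in range(bands):
        buckets = defaultdict(list)
        for g, sig in sigs.items():
            buckets[sig[b * rows:(b + 1) * rows]].append(g)
        for members in buckets.values():
            for other in members[1:]:
                parent[find(other)] = find(members[0])

    clusters = defaultdict(list)
    for g in profiles:
        clusters[find(g)].append(g)
    return sorted(sorted(c) for c in clusters.values())

# Hypothetical toy profiles: A and B share the same feature set with different counts.
example = {
    "genome_A": {"GH13": 5, "K00001": 2, "COG0001": 1},
    "genome_B": {"GH13": 1, "K00001": 7, "COG0001": 3},
    "genome_C": {"GT2": 4, "K00845": 1, "COG0002": 2},
}
print(lsh_clusters(example))
```

With 16 bands of 4 rows, pairs become likely candidates once their Jaccard similarity exceeds roughly (1/16)^(1/4) ≈ 0.5; more rows per band raise that threshold. Binarizing discards abundance, so a weighted MinHash or SimHash over the raw counts would be an alternative if copy number matters for redundancy.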