Abstract:
Influenza A, a viral infection of the respiratory tract, is a significant public health concern. The virus has caused four pandemics in the past, and some strains now circulate seasonally. Influenza A is zoonotic: it is transmitted to humans from birds, usually aquatic birds, with swine and other mammals serving as intermediate hosts. Because infection in aquatic birds is typically asymptomatic, predicting which zoonotic strains have the potential to cause an outbreak in humans is difficult. Over time, the virus acquires host-adaptive mutations or undergoes genome reassortment, producing variants that might trigger global health emergencies. Identifying zoonotic strains capable of causing outbreaks in humans, along with their origin, is therefore an urgent need. In this study, we developed a machine learning method to predict which Influenza A strains from avian or mammalian hosts can infect humans. Models for 15 proteins were trained and validated on sequence data obtained from the Influenza Research Database. Random forest-based models using composition-based features achieved maximum AUCs ranging from 0.93 to 0.98 across the 15 proteins on the validation dataset. The haemagglutinin (HA) protein achieved the highest AUC (0.98) on the training and validation datasets. We have developed an in silico tool for predicting infectious strains from protein sequences as a service to the scientific community. The best models were incorporated into our web server, FluSPred, which is freely accessible at “https://webs.iiitd.edu.in/raghava/fluspred/”. We expect that this work will help prioritize high-risk viral strains in the future and assess the risk of a novel influenza virus emerging. The tool can be integrated with early warning systems and can support pandemic preparedness, disease surveillance, and assessment of overall public health impact.
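The abstract does not specify which composition-based features were used. As an assumption for illustration only, the sketch below computes amino acid composition (AAC), a common composition-based feature for protein sequences. It turns a sequence of any length into a fixed-length 20-dimensional vector that a classifier such as a random forest can take as input. The function name and the toy sequence are hypothetical and are not taken from FluSPred.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector's columns line up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the percentage of each standard amino acid in `sequence`.

    Hypothetical helper (assumed feature set, not FluSPred's code).
    Non-standard residues (e.g. X, B, Z) are ignored, so the 20 values
    sum to 100 whenever at least one standard residue is present.
    """
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        # No standard residues: return an all-zero vector of the same length.
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(residues)
    total = len(residues)
    return [100.0 * counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy fragment, not a real influenza protein sequence.
features = amino_acid_composition("MKAILVVLLYTFATANA")
print(dict(zip(AMINO_ACIDS, (round(v, 2) for v in features))))
```

One such vector per protein sequence could then be passed to any random forest implementation. Fixing the residue order matters so that each column refers to the same amino acid across all samples.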