| dc.description.abstract |
Microbiome data analysis is often complicated by the fact that the observed compositional profile may be neither the true one nor the ideal one. First, certain species may go undetected, either because they are too low in abundance or because their genomic content is inefficiently extracted during sample processing. This results in artefactual zero values. Second, the microbiome may be in a sub-optimal state, where its actual state differs from the ideal state expected given the community composition of the resident members. While artefactual zeros may be more pronounced for rarer, low-abundance taxa, the prevalent or core gut-associated taxa may be more affected by sub-optimality. Addressing the first issue can improve biological interpretation of microbiome data, while investigating the second can have multiple benefits, ranging from identifying unstable microbiome states and response to probiotic therapies to discerning microbiome-disease associations. Here, we propose a deep learning-based imputation framework designed explicitly for microbiome data, using a Denoising Autoencoder (DAE) to address both issues. This approach leverages neural networks to capture non-linear patterns and complexity within species abundance distributions and to effectively predict missing values for rarer species and sub-optimal or over-represented values for core-associated species. The framework was developed and trained on a large microbiome dataset (abundance profiles) with 44,943 samples and 354 features representing microbial species. For the imputation function, the framework's performance was evaluated using statistical measures, including Bray-Curtis similarity and Spearman correlation, and compared with existing imputation methods, including GemIMP and DeepImpute, using both a validation dataset and simulated microbiome datasets.
The DAE consistently outperformed these alternative strategies and obtained the strongest correlation. To investigate microbiome instability and responsiveness to probiotic interventions, we extended our analysis by introducing species-specific receptive scores and overall microbiome state-stability scores, both derived from the DAE framework. Using data from an intervention trial involving a Bifidobacterium longum probiotic, we show that the DAE framework is able to distinguish between ‘Persisters’ (or Responders) and ‘Non-Persisters’ (or Non-Responders) using B. longum receptive scores. Furthermore, using data from a population-level longitudinal Swedish cohort, we show that the microbiome state-stability scores can differentiate between stable and unstable microbiome states. This research presents a practical deep learning approach to categorize single time-point microbiome states and facilitate imputation of rarer species. It is scalable, can enhance data quality for subsequent studies, and can contribute to more reliable and precise microbiome analyses. |
en_US |