Abstract:
The growing volume and complexity of healthcare and social care datasets call for unconventional feature learning strategies. Analysing health and biological datasets makes it possible to assess existing computational techniques and encourages the development of new algorithms and methodologies that can also be applied to problems in other fields. Following this approach, we developed new algorithms and carried out detailed analyses to address problems in healthcare and social care. The methodologies and analysis processes can be adapted to additional datasets with diverse data types and formats. First, we present a novel manifold learning algorithm, "Topological Preservation and Distance Scaling" (TPDS), which aims to improve the classification and visualization of high-dimensional datasets and thereby addresses the "curse of dimensionality". TPDS aims to maintain the hierarchical structure of the data by preserving both local topology and distances during linear and non-linear dimension reduction, an approach designed to prevent the collapse of data points in the visualization. Second, we present a novel matrix factorization-based manifold learning algorithm, "Network Inference in Reduced Dimensions" (NIRD), for inferring very large regulatory networks with a very large number of features. The objective was to infer complex dependency and regulatory networks spanning a very large number of dimensions while capturing non-linear dependencies among random variables. NIRD outperformed the existing methods GENIE3 and GRNBoost2 in both the computational time required to infer the network and the accuracy of the estimated edges.
Subsequently, we conducted two causal discovery analyses on high-dimensional healthcare datasets to infer "explainable" associations and to estimate public health concerns such as the prevalence of mental health disorders. We hypothesized that generative probabilistic graphical models, specifically Bayesian networks and Markov networks, combined with Markov blanket-based feature learning, may yield greater interpretability. The first analysis used survey data collected from a diverse group of American adults spanning age groups, genders, and socioeconomic statuses. The second used data from the Longitudinal Ageing Study in India (LASI) Wave-1 survey, which focuses on elderly individuals residing in India. In both studies, the methodology identified the most relevant attributes (driver factors) that could effectively represent the incidence of mental health disorders. The selected features are relevant for designing actionable interventions to promote mental well-being among adults during pandemic-induced lockdowns and among elderly individuals in India.
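
To make the Markov blanket concept concrete, the following is a minimal sketch and not the procedure used in these studies. It assumes a Bayesian network whose structure is already known and stored as a mapping from each node to its parents; the function name `markov_blanket` and every variable name in the toy network are hypothetical and are not taken from either survey. In a directed acyclic graph, the Markov blanket of a target consists of its parents, its children, and the other parents of its children.

```python
def markov_blanket(dag, target):
    """Return the Markov blanket of `target`.

    `dag` maps each node to the set of its parents (an assumed,
    illustrative representation of a Bayesian network structure).
    """
    parents = set(dag[target])
    # Children are the nodes that list the target among their parents.
    children = {node for node, ps in dag.items() if target in ps}
    # Spouses (co-parents) share at least one child with the target.
    spouses = {p for child in children for p in dag[child]} - {target}
    return parents | children | spouses


# Hypothetical toy structure; the variable names are illustrative only.
toy_dag = {
    "age": set(),
    "income": {"age"},
    "isolation": set(),
    "chronic_pain": {"age"},
    "depression": {"income", "isolation"},
    "sleep_quality": {"depression", "chronic_pain"},
}

blanket = markov_blanket(toy_dag, "depression")
print(sorted(blanket))
```

In this toy network, "age" influences "depression" only through "income", so it falls outside the blanket: given the blanket variables, the target is conditionally independent of every other node, which is why a blanket can serve as a compact, interpretable feature set.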