IIIT-Delhi Institutional Repository

Learning from high dimensional healthcare data to improve interpretability and insights

dc.contributor.author Jha, Indra Prakash
dc.contributor.author Kumar, Vibhor (Advisor)
dc.date.accessioned 2024-10-10T09:54:38Z
dc.date.available 2024-10-10T09:54:38Z
dc.date.issued 2023-05
dc.identifier.uri http://repository.iiitd.edu.in/xmlui/handle/123456789/1699
dc.description.abstract The growing volume and complexity of healthcare and social care datasets call for unconventional feature learning strategies. Examining health and biological datasets makes it possible to assess existing computational techniques and fosters the creation of novel algorithms and methodologies that can be applied to problems in other fields. Following these ideas, we have not only developed novel algorithms but also performed careful analyses to address concerns in healthcare and social care. The methodologies and analysis processes can be adapted to additional datasets with diverse data types and formats. First, the authors present a novel manifold learning algorithm, "Topological Preservation and Distance Scaling" (TPDS), which aims to enhance classification and visualization of high-dimensional datasets and addresses the "curse of dimensionality". The method aims to maintain the hierarchical structure of data by preserving both local topology and distances during linear and non-linear dimension reduction, an approach designed to prevent the collapse of data points in visualization. In the second study, the authors present a novel matrix factorization-based manifold learning algorithm, "Network Inference in Reduced Dimensions" (NIRD), for inferring very large regulatory networks with a very large number of features. The objective was to infer intricate dependency and regulatory networks spanning a vast number of dimensions while capturing non-linear dependencies among random variables. The study found that the proposed approach outperformed existing methods, namely GENIE3 and GRNBoost2, in both the computational time required to infer the network and the accuracy of the estimated edges. 
Subsequently, two causal discovery analyses were conducted on high-dimensional healthcare datasets to infer "explainable" associations and estimate public health concerns, such as the prevalence of mental health disorders. The hypothesis is that generative probabilistic graphical models, specifically Bayesian networks and Markov networks, used together with the Markov blanket concept of feature learning, may yield greater interpretability. The first investigation used survey data collected from a diverse group of American adults spanning various age groups, genders, and socioeconomic statuses. The second used data from the Longitudinal Ageing Study in India (LASI) Wave-1 survey, which focused on elderly individuals residing in India. In both studies, the methodology identified the most relevant attributes (driver factors) that could effectively represent the incidence of mental health disorders. The features chosen by our approach are relevant to actionable interventions aimed at promoting mental well-being among adults during pandemic-induced lockdowns, as well as among elderly individuals in India. en_US
dc.language.iso en_US en_US
dc.publisher IIIT-Delhi en_US
dc.subject Health care en_US
dc.title Learning from high dimensional healthcare data to improve interpretability and insights en_US
dc.type Thesis en_US
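
The abstract refers to the Markov blanket concept of feature learning. As a rough illustration only, and not the thesis's actual method or code, the sketch below computes the Markov blanket of a node in a small Bayesian network: its parents, its children, and the other parents of its children. The network, its variable names, and the function name markov_blanket are all hypothetical, chosen purely for illustration.

```python
from typing import Dict, List, Set


def markov_blanket(parents: Dict[str, List[str]], target: str) -> Set[str]:
    """Return the Markov blanket of `target` in a DAG given as child -> parents."""
    # Start with the direct parents of the target.
    blanket: Set[str] = set(parents.get(target, []))
    for child, child_parents in parents.items():
        if target in child_parents:
            blanket.add(child)             # a child of the target
            blanket.update(child_parents)  # the child's other parents (co-parents)
    blanket.discard(target)                # the target is not in its own blanket
    return blanket


# Hypothetical toy network (assumed for illustration, not taken from the thesis).
toy_dag = {
    "sleep_quality": [],
    "isolation": [],
    "chronic_illness": [],
    "income": [],
    "depression": ["sleep_quality", "isolation"],
    "healthcare_visits": ["depression", "chronic_illness"],
    "housing": ["income"],
}

print(sorted(markov_blanket(toy_dag, "depression")))
# ['chronic_illness', 'healthcare_visits', 'isolation', 'sleep_quality']
```

Given its Markov blanket, a node is conditionally independent of every other variable in the network, which is why the blanket is a natural candidate set of "driver factors" for a target such as a mental health outcome. In this toy example, income and housing fall outside the blanket of depression and would not be selected.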

