IIIT-Delhi Institutional Repository

LLM-assisted ontology mapping for semantic interoperability in structured biomedical data

dc.contributor.author S, Prasanna Kumar
dc.contributor.author Sethi, Tavpritesh (Advisor)
dc.date.accessioned 2026-04-17T10:21:58Z
dc.date.available 2026-04-17T10:21:58Z
dc.date.issued 2025-06
dc.identifier.uri http://repository.iiitd.edu.in/xmlui/handle/123456789/1912
dc.description.abstract The exponential growth of biomedical data promises new insights, but semantic heterogeneity and inconsistent metadata limit reuse. In practice, many publicly available datasets (e.g., tabular datasets from Figshare or Zenodo) are annotated with non-standardized field names, violating the FAIR (Findable, Accessible, Interoperable, Reusable) criteria. To bridge this gap, we propose a framework for FAIR Assessment using Ontology Mapping and large language models (LLMs) that assesses and enhances the interoperability of such “not-so-FAIR” datasets. First, we quantify dataset FAIRness by mapping variables to standard clinical terms in the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), a comprehensive ontology widely used for semantic interoperability. Then we explore the use of LLMs, specifically Mistral and LLaMA, to improve SNOMED CT term mapping coverage and disambiguation for dataset fields. We prompt these models with field context and compare their predicted SNOMED CT terms to ground-truth concepts (baseline: Medical Concept Annotation Tool). Our experiments on diverse clinical datasets show that LLMs can significantly augment automated ontology mapping and reduce semantic mismatches. Taken together, this work presents a principled approach that integrates ontology-based FAIR assessment with LLM-driven harmonization to close the semantic gap in biomedical data integration. en_US
dc.language.iso en_US en_US
dc.publisher IIIT-Delhi en_US
dc.subject Biomedical Data en_US
dc.subject Semantic Interoperability en_US
dc.subject FAIR Data Principles en_US
dc.subject Data Standardization en_US
dc.subject SNOMED CT en_US
dc.subject Ontology Mapping en_US
dc.subject Large Language Models en_US
dc.title LLM-assisted ontology mapping for semantic interoperability in structured biomedical data en_US
dc.type Thesis en_US
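
The abstract describes two measurements: how many dataset fields can be mapped to an ontology concept, and how well predicted concepts agree with ground-truth concepts. The following is a minimal illustrative sketch of how such scores could be computed. It is not the thesis's implementation; the function names, the field names, and the concept identifiers (`C001`, `C002`, ...) are hypothetical placeholders and are not real SNOMED CT codes.

```python
from typing import Mapping, Optional


def mapping_coverage(field_to_concept: Mapping[str, Optional[str]]) -> float:
    """Fraction of dataset fields that received some concept ID (None = unmapped)."""
    if not field_to_concept:
        return 0.0
    mapped = sum(1 for concept in field_to_concept.values() if concept is not None)
    return mapped / len(field_to_concept)


def mapping_accuracy(
    predicted: Mapping[str, Optional[str]],
    ground_truth: Mapping[str, str],
) -> float:
    """Fraction of ground-truth fields whose predicted concept ID matches exactly."""
    if not ground_truth:
        return 0.0
    hits = sum(
        1 for field, concept in ground_truth.items()
        if predicted.get(field) == concept
    )
    return hits / len(ground_truth)


# Hypothetical example: four column names from a tabular dataset.
# Concept IDs are placeholders, not real SNOMED CT identifiers.
predicted = {
    "pt_age": "C001",
    "sys_bp": "C002",
    "hb_lvl": None,      # the mapper found no candidate concept
    "dx_code": "C009",   # mapped, but to the wrong concept
}
ground_truth = {
    "pt_age": "C001",
    "sys_bp": "C002",
    "hb_lvl": "C003",
    "dx_code": "C004",
}

coverage = mapping_coverage(predicted)
accuracy = mapping_accuracy(predicted, ground_truth)
print(f"coverage={coverage:.2f} accuracy={accuracy:.2f}")
```

Under these assumptions, coverage counts any mapped field regardless of correctness, while accuracy counts only exact agreement with the reference mapping, so the two scores can diverge when a mapper assigns a wrong concept.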

