Please use this identifier to cite or link to this item: http://repository.iiitd.edu.in/xmlui/handle/123456789/1914
Full metadata record
DC Field | Value | Language
dc.contributor.author | Srivastava, Anushka | -
dc.contributor.author | Shah, Rajiv Ratn (Advisor) | -
dc.date.accessioned | 2026-04-17T11:09:22Z | -
dc.date.available | 2026-04-17T11:09:22Z | -
dc.date.issued | 2024-11-27 | -
dc.identifier.uri | http://repository.iiitd.edu.in/xmlui/handle/123456789/1914 | -
dc.description.abstract | We aim to leverage LLMs in drug discovery to generate new molecules from descriptions of their properties. The dataset is built from ZINC-250K, and several LLMs and state-of-the-art (SOTA) models are selected to run the baseline tasks on vanilla LLMs. These models are evaluated on three types of loss: token-level loss, structural loss, and property-level loss. Evaluation metrics such as validity, fragment similarity, and scaffold similarity are studied and chosen for this task, and are then used to evaluate the LLMs. We observe that the LLMs do not preserve the structure of the molecule well and cannot generate syntactically valid notations. They perform decently at generating molecules that match the specified properties. | en_US
dc.language.iso | en_US | en_US
dc.publisher | IIIT-Delhi | en_US
dc.subject | Drug Discovery | en_US
dc.subject | Large Language Models | en_US
dc.subject | Natural Language Processing | en_US
dc.title | Molecular generation and property prediction using LLMs with chemical feedback and alignment | en_US
dc.type | Other | en_US
Appears in Collections: Year-2024

Files in This Item:
File | Description | Size | Format
2022086_BTP_Report - Anushka Srivastava.pdf (Restricted Access) | - | 777.62 kB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.