Please use this identifier to cite or link to this item: http://repository.iiitd.edu.in/xmlui/handle/123456789/1914
Title: Molecular generation and property prediction using LLMs with chemical feedback and alignment
Authors: Srivastava, Anushka
Shah, Rajiv Ratn (Advisor)
Keywords: Drug Discovery
Large Language Models
Natural Language Processing
Issue Date: 27-Nov-2024
Publisher: IIIT-Delhi
Abstract: We aim to leverage LLMs in drug discovery to generate new molecules from descriptions of their properties. The dataset is built from the Zinc-250K dataset, and several LLMs and SOTA models are selected to run the baseline tasks on vanilla LLMs. The models are evaluated on three types of loss: token-level loss, structural loss, and property-level loss. Evaluation metrics such as validity, fragment similarity, and scaffold similarity are studied and chosen for this task, and are then used to evaluate the LLMs. We observe that the LLMs do not preserve the structure of the molecule well and cannot generate syntactically valid notations, but they perform reasonably well at generating molecules that match the given properties.
URI: http://repository.iiitd.edu.in/xmlui/handle/123456789/1914
Appears in Collections: Year-2024
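
The abstract names validity, fragment similarity, and scaffold similarity as evaluation metrics but does not define them. The sketch below is one common reading, assumed here rather than taken from the report: validity as the fraction of generated strings that pass a validity check, and fragment or scaffold similarity as the cosine similarity between the frequency vectors of fragments or scaffolds in the generated and reference sets. The function names and the `balanced_parens` checker are hypothetical; in practice the check would come from a cheminformatics toolkit (for example, RDKit's `Chem.MolFromSmiles` returns `None` for strings it cannot parse), and the fragment or scaffold labels would be computed by that toolkit as well.

```python
from collections import Counter
from math import sqrt
from typing import Callable, Iterable


def validity(smiles: Iterable[str], is_valid: Callable[[str], bool]) -> float:
    """Fraction of generated strings accepted by the supplied validity check."""
    items = list(smiles)
    if not items:
        return 0.0
    return sum(1 for s in items if is_valid(s)) / len(items)


def cosine_similarity(generated: Iterable[str], reference: Iterable[str]) -> float:
    """Cosine similarity between the frequency vectors of two label collections.

    The labels are assumed to be precomputed fragment or scaffold strings,
    one per occurrence, so repeated labels raise the corresponding count.
    """
    g, r = Counter(generated), Counter(reference)
    dot = sum(g[k] * r[k] for k in g.keys() & r.keys())
    norm = sqrt(sum(v * v for v in g.values())) * sqrt(sum(v * v for v in r.values()))
    return dot / norm if norm else 0.0


def balanced_parens(s: str) -> bool:
    """Toy stand-in for a real SMILES parser (assumption for illustration only).

    It only checks that the string is non-empty and its parentheses balance;
    it says nothing about chemical validity.
    """
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return bool(s) and depth == 0
```

For example, `validity(["CCO", "CC(=O)O", "CC(C"], balanced_parens)` counts two of the three strings as passing the toy check, and `cosine_similarity` applied to two lists of scaffold strings returns a value between 0 (no shared scaffolds) and 1 (identical scaffold distributions).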

Files in This Item:
File: 2022086_BTP_Report - Anushka Srivastava.pdf (Restricted Access)
Size: 777.62 kB
Format: Adobe PDF

