Molecular generation and property prediction using LLMs with chemical feedback and alignment

Please use this identifier to cite or link to this item: http://repository.iiitd.edu.in/xmlui/handle/123456789/1914

Title:	Molecular generation and property prediction using LLMs with chemical feedback and alignment
Authors:	Srivastava, Anushka; Shah, Rajiv Ratn (Advisor)
Keywords:	Drug Discovery; Large Language Models; Natural Language Processing
Issue Date:	27-Nov-2024
Publisher:	IIIT-Delhi
Abstract:	We aim to leverage LLMs in drug discovery to generate new molecules from a description of their properties. The Zinc-250K dataset is used as the source for building the task dataset, and a range of LLMs and state-of-the-art (SOTA) models are selected to run the baseline tasks on the vanilla LLMs. These models are evaluated on three types of loss: token-level loss, structural loss, and property-level loss. Evaluation metrics such as validity, fragment similarity, and scaffold similarity are studied and chosen for this task and then used to evaluate the LLMs. We observe that the LLMs do not preserve molecular structure well and cannot generate syntactically valid notations, although they perform reasonably well at generating molecules that match the specified properties.
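The abstract names validity, fragment similarity, and scaffold similarity without defining them. The sketch below assumes the definitions used by the MOSES benchmark: validity is the fraction of generated strings a parser accepts, and fragment or scaffold similarity is the cosine similarity between pooled frequency vectors of fragments (or scaffolds) in the generated and reference sets. The report itself is restricted, so its exact formulation may differ. The parser and decomposition functions are passed in as arguments; in practice they could be RDKit's `Chem.MolFromSmiles` (which returns `None` for unparseable input) and a fragment or scaffold extractor. `toy_parse` and the `split`-based decomposer are hypothetical stand-ins that only make the example runnable.

```python
from collections import Counter
from math import sqrt
from typing import Callable, Iterable, List, Optional


def validity(samples: Iterable[str], parse: Callable[[str], Optional[object]]) -> float:
    """Fraction of generated strings that `parse` accepts (returns a non-None value)."""
    samples = list(samples)
    if not samples:
        return 0.0
    return sum(parse(s) is not None for s in samples) / len(samples)


def cosine_counts(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse frequency vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def set_similarity(
    generated: Iterable[str],
    reference: Iterable[str],
    decompose: Callable[[str], List[str]],
) -> float:
    """Pool the units (fragments or scaffolds) of each set and compare their frequency vectors."""
    gen_counts = Counter(u for s in generated for u in decompose(s))
    ref_counts = Counter(u for s in reference for u in decompose(s))
    return cosine_counts(gen_counts, ref_counts)


# Hypothetical toy parser: accepts a string only if its parentheses balance.
# It is NOT a SMILES parser; swap in a real one (e.g. RDKit) in practice.
def toy_parse(s: str) -> Optional[str]:
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return None
    return s if depth == 0 else None


if __name__ == "__main__":
    generated = ["CCO", "C(C", "c1ccccc1"]
    print(validity(generated, toy_parse))  # 2 of 3 strings pass the toy check

    # Hypothetical decomposer: treat '.'-separated pieces as "fragments".
    def split_fragments(s: str) -> List[str]:
        return s.split(".")

    print(set_similarity(["x.y", "x"], ["x.y"], split_fragments))
```

For scaffold similarity, `decompose` would return a one-element list holding each molecule's scaffold, so the same `set_similarity` function covers both metrics. In MOSES-style evaluation these similarities are typically computed only over valid molecules, which can be done by filtering the generated set with `parse` first.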
URI:	http://repository.iiitd.edu.in/xmlui/handle/123456789/1914
Appears in Collections:	Year-2024

Files in This Item:

File	Description	Size	Format
2022086_BTP_Report - Anushka Srivastava.pdf (Restricted Access)		777.62 kB	Adobe PDF
