Molecular generation and property prediction using LLMs with chemical feedback and alignment

Please use this identifier to cite or link to this item: http://repository.iiitd.edu.in/xmlui/handle/123456789/1914

Full metadata record

DC Field	Value	Language
dc.contributor.author	Srivastava, Anushka	-
dc.contributor.author	Shah, Rajiv Ratn (Advisor)	-
dc.date.accessioned	2026-04-17T11:09:22Z	-
dc.date.available	2026-04-17T11:09:22Z	-
dc.date.issued	2024-11-27	-
dc.identifier.uri	http://repository.iiitd.edu.in/xmlui/handle/123456789/1914	-
dc.description.abstract	We aim to leverage LLMs in drug discovery to generate new molecules from descriptions of their properties. The dataset is built from Zinc-250K, and several LLMs and state-of-the-art (SOTA) models are selected to run inference on the baseline tasks with vanilla LLMs. These models are evaluated on three types of loss: token-level loss, structural loss, and property-level loss. Evaluation metrics such as validity, fragment similarity, and scaffold similarity are carefully studied and chosen for this task, and are then used to evaluate the LLMs. We observe that the LLMs do not perform well in preserving the structure of the molecule and cannot generate syntactically valid notations. They perform decently in generating molecules that match the given properties.	en_US
dc.language.iso	en_US	en_US
dc.publisher	IIIT-Delhi	en_US
dc.subject	Drug Discovery	en_US
dc.subject	Large Language Models	en_US
dc.subject	Natural Language Processing	en_US
dc.title	Molecular generation and property prediction using LLMs with chemical feedback and alignment	en_US
dc.type	Other	en_US
Appears in Collections:	Year-2024

Files in This Item:

File	Description	Size	Format
2022086_BTP_Report - Anushka Srivastava.pdf Restricted Access		777.62 kB	Adobe PDF
