Pixel to plate

Awasthy, Ieshaan; Dabas, Gunjan; Bagler, Ganesh (Advisor)

Please use this identifier to cite or link to this item: http://repository.iiitd.edu.in/xmlui/handle/123456789/1975

Title:	Pixel to plate
Authors:	Awasthy, Ieshaan; Dabas, Gunjan; Bagler, Ganesh (Advisor)
Keywords:	Object Detection; YOLO; Recipe Generation; Large Language Models; Natural Language Processing
Issue Date:	13-Dec-2024
Publisher:	IIIT-Delhi
Abstract:	This project, Pixel to Plate, aims to bridge computer vision and natural language processing to automate recipe generation from images of ingredients. The first phase of the project focuses on object detection, employing state-of-the-art YOLO models to accurately identify ingredients in the AI-Cook dataset. Comprehensive exploratory data analysis (EDA) was conducted to address dataset quality, class imbalance, and object co-occurrence patterns. Among the tested models, YOLOv8x demonstrated superior performance with a precision of 0.970, making it the chosen model for ingredient detection. The second phase evaluates four large language models (LLaMA, Falcon, GEMMA, and Phi) for recipe generation based on detected ingredients. Models were assessed in a zero-shot setting for coherence, completeness, and relevance. The analysis revealed that LLaMA outperformed the others, producing recipes with logical structure, meaningful use of ingredients, and balanced food combinations. This interdisciplinary effort highlights the potential of combining advanced computer vision and language models for culinary applications, paving the way for automated recipe generation systems that could transform personalized cooking experiences. The findings underscore the importance of model selection, data quality, and task-specific evaluation metrics in achieving reliable results.
URI:	http://repository.iiitd.edu.in/xmlui/handle/123456789/1975
Appears in Collections:	Year-2024

Files in This Item:

File	Description	Size	Format
btp_report - Gunjan Dabas.pdf (Restricted Access)		2.13 MB	Adobe PDF
