Abstract:
This project develops a Named Entity Recognition (NER) system tailored to recipe ingredient phrases, a core task in computational gastronomy. The proposed approach uses transformer-based models built with the spaCy and Flair frameworks and relies on BIO encoding to handle multi-word entities. Experiments were conducted on manually annotated datasets and BIO-encoded entries, and the best configuration, RoBERTa-large trained on oversampled data, achieved a macro F1 score of 91.20. Dataset inconsistencies and imbalanced tag distributions were addressed with strategies that include large language models (LLMs) for encoding and hyperparameter tuning. The results demonstrate the efficacy of the methodology and offer insights for future applications to culinary datasets and beyond.
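As an illustration of how BIO encoding represents multi-word entities, here is a minimal sketch in Python. The function name `bio_encode`, the example phrase, and the labels `QUANTITY`, `UNIT`, and `NAME` are assumptions made for this example and are not necessarily the tag set used in the project.

```python
from typing import List, Tuple


def bio_encode(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Assign BIO tags to tokens.

    Each span is (start, end_exclusive, label) in token indices.
    The first token of a span gets B-<label>, the rest get I-<label>,
    and every token outside all spans gets O.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


# Hypothetical ingredient phrase with illustrative labels.
tokens = ["2", "cups", "extra", "virgin", "olive", "oil"]
spans = [(0, 1, "QUANTITY"), (1, 2, "UNIT"), (2, 6, "NAME")]
tags = bio_encode(tokens, spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

The four-token ingredient name is marked by one `B-NAME` tag followed by `I-NAME` tags, which is what lets a token-level classifier recover the full entity boundary.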