IIIT-Delhi Institutional Repository

Collaborative and cross-modal distillation for large language models


dc.contributor.author Dixit, Shantanu
dc.contributor.author Akhtar, Md Shad (Advisor)
dc.date.accessioned 2025-06-23T09:46:27Z
dc.date.available 2025-06-23T09:46:27Z
dc.date.issued 2023-11-29
dc.identifier.uri http://repository.iiitd.edu.in/xmlui/handle/123456789/1755
dc.description.abstract Knowledge distillation is a technique for transferring knowledge from a larger teacher model to a smaller student model. Recent developments in meta-learning-based knowledge distillation (MetaKD) emphasize fine-tuning the teacher model with the student's needs in mind to achieve better distillation. Nevertheless, current MetaKD methods frequently fail to provide incentives for the teacher model to improve itself. We introduce a meta-policy distillation technique that fosters both collaboration and competition during the fine-tuning of the teacher model in the meta-learning phase. Additionally, we propose a curriculum learning framework tailored to the student model in a competitive setting, in which the student model strives to surpass the teacher model through self-training on a diverse range of tasks. We conduct extensive experiments on two NLU benchmarks, GLUE and SuperGLUE [74, 75], and validate our methodology's effectiveness against various KD techniques. As an extension of this work, we further explore the setting where the teacher and student modalities differ (e.g., a text teacher with a vision student, and vice versa). Existing cross-modal distillation approaches predominantly utilize modality-dependent features for knowledge distillation and therefore fail to adaptively learn the abstractions in different modalities. We propose a generic, modality-agnostic cross-modal distillation technique that can distil knowledge from any arbitrary open or closed cross-modal teacher model to any arbitrary student model in a different modality. Our empirical studies encompass eight natural language understanding tasks and an image classification task, showcasing the efficacy of cross-modal distillation in enhancing the performance of student models. en_US
dc.language.iso en_US en_US
dc.publisher IIIT-Delhi en_US
dc.subject Model Compression en_US
dc.subject Knowledge Distillation en_US
dc.subject Meta Knowledge Distillation en_US
dc.subject Cross Modal Knowledge Distillation en_US
dc.title Collaborative and cross-modal distillation for large language models en_US
dc.type Other en_US
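
For context on the baseline objective that the abstract's meta-learning and cross-modal extensions build on, the sketch below shows the standard soft-target knowledge distillation loss: a temperature-scaled KL divergence between teacher and student output distributions blended with cross-entropy on the ground-truth labels. This is a minimal generic illustration, not the meta-policy or cross-modal method proposed in the thesis; the function name distillation_loss and the hyperparameter values T=2.0 and alpha=0.5 are illustrative assumptions.

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: temperature-softened teacher distribution.
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    # KL divergence between student and teacher, rescaled by T^2 so that
    # gradient magnitudes remain comparable across temperature settings.
    kd_term = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
    # Standard cross-entropy against the ground-truth labels.
    ce_term = F.cross_entropy(student_logits, labels)
    # Blend the two terms; alpha controls how much the student follows the teacher.
    return alpha * kd_term + (1 - alpha) * ce_term

In practice, student_logits and teacher_logits are the pre-softmax outputs of the two models on the same batch, and the loss is minimized with respect to the student's parameters only.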

