Please use this identifier to cite or link to this item: http://repository.iiitd.edu.in/xmlui/handle/123456789/1475
Title: Uncovering intellectual property threats to language models
Authors: Singh, Srishti
Goyal, Vikram (Advisor)
Keywords: Natural Language Models
Security
Model extraction
Adversarial attacks
Active learning
Semi-supervised Learning
Issue Date: 29-Nov-2023
Abstract: Commercial use of Natural Language Processing (NLP) has grown significantly in recent years. Many companies train and deploy language models for tasks such as sentiment classification and machine translation. These models are exposed as black-box APIs that charge the user per query. However, such models are vulnerable to model stealing attacks, in which an attacker repeatedly queries the API and uses the resulting input-output pairs as a dataset to train a thief model. The thief model can closely replicate the input-output behaviour of the original model. The attack thus poses a serious intellectual property risk and compromises the accuracy and reliability of the original model. Previous work in this domain has focused primarily on image classification models. In this study, we show that text classification models can be stolen using the same techniques. Our primary focus was conducting experiments to understand the impact of domain mismatch, model architecture variation, and query budget on extraction accuracy.
URI: http://repository.iiitd.edu.in/xmlui/handle/123456789/1475
Appears in Collections: Year-2023
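
The attack loop summarised in the abstract (query the black-box API, collect its answers, train a thief model, measure how closely it matches) can be illustrated with a minimal sketch. This is not the thesis's code or method: the keyword-based `victim_api`, the naive Bayes thief, the example texts, and the `budget` parameter are all hypothetical stand-ins, chosen so that the example runs with the Python standard library alone.

```python
import math
from collections import Counter

# Hypothetical stand-in for a paid black-box API: the attacker sees only labels.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "awful", "hate", "terrible"}

def victim_api(text):
    """Return 1 (positive) or 0 (negative); internals are hidden from the attacker."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return 1 if score > 0 else 0

def build_transfer_set(pool, budget):
    """Query the victim on at most `budget` texts and record its answers."""
    return [(text, victim_api(text)) for text in pool[:budget]]

class NaiveBayesThief:
    """Tiny multinomial naive Bayes with add-one smoothing (illustrative thief model)."""

    def fit(self, labelled):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.class_counts = Counter()
        for text, label in labelled:
            self.class_counts[label] += 1
            self.word_counts[label].update(text.lower().split())
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = 0, -math.inf
        for label in (0, 1):
            # Smooth the prior as well, so an unseen class never produces log(0).
            score = math.log((self.class_counts[label] + 1) / (total + 2))
            denom = sum(self.word_counts[label].values()) + len(self.vocab) + 1
            for w in text.lower().split():
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def agreement(thief, texts):
    """Fraction of texts on which the thief and the victim return the same label."""
    return sum(thief.predict(t) == victim_api(t) for t in texts) / len(texts)

# Hypothetical unlabeled query pool and held-out texts (made up for this sketch).
pool = [
    "great movie love it",
    "awful plot bad acting",
    "excellent cast good story",
    "terrible pacing hate it",
    "good fun great music",
    "bad script awful ending",
]
held_out = ["love the great cast", "hate the awful script"]

thief = NaiveBayesThief().fit(build_transfer_set(pool, budget=6))
print(agreement(thief, held_out))
```

Varying `budget`, swapping the thief for a different classifier, or drawing `pool` from a different domain than the victim's training data would correspond, in toy form, to the three factors the abstract names: query budget, model architecture variation, and domain mismatch.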

Files in This Item:
File: BTP_Sem2_Report - Srishti Singh.pdf (Restricted Access)
Size: 1.48 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.