IIIT-Delhi Institutional Repository

Uncovering intellectual property threats to language models

dc.contributor.author Singh, Srishti
dc.contributor.author Goyal, Vikram (Advisor)
dc.date.accessioned 2024-05-16T08:27:56Z
dc.date.available 2024-05-16T08:27:56Z
dc.date.issued 2023-11-29
dc.identifier.uri http://repository.iiitd.edu.in/xmlui/handle/123456789/1475
dc.description.abstract Commercial use of Natural Language Processing (NLP) has grown significantly in recent years. Many companies train and deploy language models for tasks such as sentiment classification and machine translation. These models are published as black-box APIs that charge the user per query. However, such models are vulnerable to model stealing attacks, in which the attacker repeatedly queries the API and uses the resulting query-response dataset to train a thief model. The thief model can closely replicate the input-output behaviour of the original model. This attack thus poses a serious intellectual property risk and compromises the accuracy and reliability of the original model. Previous work in this domain has focused primarily on image classification models. In this study, we show that it is possible to steal text classification models using the same techniques. Our primary focus was conducting experiments to understand the impact of domain mismatch, model architecture variation, and query budget on extraction accuracy. en_US
dc.language.iso en_US en_US
dc.subject Natural Language Models en_US
dc.subject Security en_US
dc.subject Model extraction en_US
dc.subject Adversarial attacks en_US
dc.subject Active learning en_US
dc.subject Semi-supervised Learning en_US
dc.title Uncovering intellectual property threats to language models en_US
dc.type Other en_US
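
Illustrative sketch (not part of the deposited work): the query-then-train loop described in the abstract can be outlined in a few lines of Python. Everything named here is an assumption made for illustration only: victim_api is a hypothetical keyword-based stand-in for a paid black-box sentiment API, the word lists are invented, and the thief is a small naive Bayes classifier rather than the architectures studied in the thesis. The budget parameter plays the role of the query budget, and agreement is one simple way to express how closely the thief reproduces the victim's labels.

```python
from collections import Counter, defaultdict
import math

# Hypothetical stand-in for a paid black-box API: the attacker sees labels only.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "awful", "hate", "poor"}

def victim_api(text):
    """Return 'pos' or 'neg' for a text; internals are hidden from the attacker."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "pos" if score >= 0 else "neg"

def train_thief(queries, labels):
    """Fit a multinomial naive Bayes model on (query, victim label) pairs."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    vocab = set()
    for text, label in zip(queries, labels):
        words = text.lower().split()
        word_counts[label].update(words)
        vocab.update(words)
    return word_counts, class_counts, vocab

def thief_predict(model, text):
    """Predict the most likely label using add-one smoothed log probabilities."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.lower().split():
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def extract(query_pool, budget):
    """Spend at most `budget` queries on the victim, then train the thief."""
    queries = query_pool[:budget]
    labels = [victim_api(q) for q in queries]
    return train_thief(queries, labels)

def agreement(model, texts):
    """Fraction of texts on which the thief and the victim give the same label."""
    matches = sum(thief_predict(model, t) == victim_api(t) for t in texts)
    return matches / len(texts)
```

To try it, pass a list of attacker-chosen texts and a budget to extract, then call agreement on texts that were not used as queries. Varying budget, or drawing the query pool from a different domain than the held-out texts, mirrors in miniature the factors the abstract lists.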

