IIIT-Delhi Institutional Repository

Information extraction in user's complaints

dc.contributor.author T G, Narayanan
dc.contributor.author Akhtar, Md. Shad (Advisor)
dc.date.accessioned 2023-04-03T10:47:10Z
dc.date.available 2023-04-03T10:47:10Z
dc.date.issued 2022-08
dc.identifier.uri http://repository.iiitd.edu.in/xmlui/handle/123456789/1067
dc.description.abstract Named Entity Recognition (NER) is one of the most explored tasks in Natural Language Processing, and numerous existing works have examined different aspects of it. The NER task has been extended to a variety of domains (such as social media, judiciary, medical, or the general domain) and languages (English, European languages, Chinese, Hindi, etc.). However, each domain and language poses its own core challenges, particularly due to syntactic complexities. In this research, we explore named-entity recognition in the legal domain. Given a user’s complaint reporting a crime, we aim to efficiently extract all relevant and necessary information concerning the crime, victim, accused, etc. We collect publicly available user complaints and develop an entity- and relationship-annotated dataset, the Legal Document Processing (LDP) dataset. The instances of this dataset are densely annotated with more than fifty labels broadly revolving around the victim and other crime details. Subsequently, we benchmark the dataset using multiple pre-trained language models and information extraction-based baselines. In particular, we fine-tune Multi-lingual BERT, Hindi-BERT, and HindiBERTa on the LDP dataset. Our evaluation shows that Multi-lingual BERT achieves the best performance among all baselines. Potential future work includes entity- and relation-level knowledge graph creation as well as converting a user complaint into a technical and legal document. en_US
dc.language.iso en_US en_US
dc.publisher IIITD en_US
dc.subject Information extraction en_US
dc.subject Named entity recognition en_US
dc.subject Natural language processing en_US
dc.subject Criminal activities en_US
dc.subject LDPQuAD en_US
dc.subject BERT models en_US
dc.title Information extraction in user's complaints en_US
dc.type Thesis en_US

