Please use this identifier to cite or link to this item: http://repository.iiitd.edu.in/xmlui/handle/123456789/1067
Full metadata record
DC Field | Value | Language
dc.contributor.author | T G, Narayanan | -
dc.contributor.author | Akhtar, Md. Shad (Advisor) | -
dc.date.accessioned | 2023-04-03T10:47:10Z | -
dc.date.available | 2023-04-03T10:47:10Z | -
dc.date.issued | 2022-08 | -
dc.identifier.uri | http://repository.iiitd.edu.in/xmlui/handle/123456789/1067 | -
dc.description.abstract | The task of Named Entity Recognition (NER) is one of the most explored fields in the Natural Language Processing domain. Numerous existing works have tried to uncover different aspects of this common yet unique field. The NER task has been extended to a variety of domains (such as social media, judiciary, medical, or the general domain) and languages (English, European, Chinese, Hindi, etc.). However, each domain and language poses its own core challenges, particularly due to syntactic complexities. In this research, we explore named-entity recognition in the legal domain. Given a user’s complaint reporting a crime, we aim to efficiently extract all relevant and necessary information regarding the crime, victim, accused, etc. We collect publicly available user complaints and develop an entity- and relationship-annotated dataset, which we call the Legal Document Processing (LDP) dataset. The instances of this dataset are densely annotated with more than fifty labels broadly revolving around the victim and other crime details. Subsequently, we benchmark the dataset using multiple pre-trained language models and information extraction-based baselines. In particular, we fine-tune Multi-lingual BERT, Hindi-BERT, and HindiBERTa on the LDP dataset. Our evaluation shows that Multi-lingual BERT achieves the best performance among all baselines. Potential future work includes entity- and relation-level knowledge graph creation, as well as converting a user complaint into a technical and legal document. | en_US
dc.language.iso | en_US | en_US
dc.publisher | IIITD | en_US
dc.subject | Information extraction | en_US
dc.subject | Named entity recognition | en_US
dc.subject | Natural language processing | en_US
dc.subject | Criminal activities | en_US
dc.subject | LDPQuAD | en_US
dc.subject | BERT models | en_US
dc.title | Information extraction in user's complaints | en_US
dc.type | Thesis | en_US
Appears in Collections: Year-2022

Files in This Item:
File | Description | Size | Format
MTP_T G Narayanan_MT20027.pdf | | 1.13 MB | Adobe PDF

