Information extraction in user's complaints

T G, Narayanan; Akhtar, Md. Shad (Advisor)

Please use this identifier to cite or link to this item: http://repository.iiitd.edu.in/xmlui/handle/123456789/1067

Full metadata record

DC Field	Value	Language
dc.contributor.author	T G, Narayanan	-
dc.contributor.author	Akhtar, Md. Shad (Advisor)	-
dc.date.accessioned	2023-04-03T10:47:10Z	-
dc.date.available	2023-04-03T10:47:10Z	-
dc.date.issued	2022-08	-
dc.identifier.uri	http://repository.iiitd.edu.in/xmlui/handle/123456789/1067	-
dc.description.abstract	Named Entity Recognition (NER) is one of the most explored tasks in Natural Language Processing. Numerous existing works have examined different aspects of it, and the task has been extended to a variety of domains (such as social media, judiciary, medical, or the general domain) and languages (English, European languages, Chinese, Hindi, etc.). However, each domain and language poses its own core challenges, particularly due to syntactic complexities. In this research, we explore named-entity recognition in the legal domain. Given a user’s complaint reporting a crime, we aim to efficiently extract all relevant and necessary information concerning the crime, victim, accused, etc. We collect publicly available user complaints and develop an entity- and relationship-annotated dataset, the Legal Document Processing (LDP) dataset. The instances of this dataset are densely annotated with more than fifty labels broadly revolving around the victim and other crime details. Subsequently, we benchmark the dataset using multiple pre-trained language models and information extraction-based baselines. In particular, we fine-tune Multi-lingual BERT, Hindi-BERT, and HindiBERTa on the LDP dataset. Our evaluation shows that Multi-lingual BERT achieves the best performance among all baselines. Potential future work includes entity- and relation-level knowledge graph creation as well as converting a user complaint into a technical and legal document.	en_US
dc.language.iso	en_US	en_US
dc.publisher	IIITD	en_US
dc.subject	Information extraction	en_US
dc.subject	Named entity recognition	en_US
dc.subject	Natural language processing	en_US
dc.subject	Criminal activities	en_US
dc.subject	LDPQuAD	en_US
dc.subject	BERT models	en_US
dc.title	Information extraction in user's complaints	en_US
dc.type	Thesis	en_US
Appears in Collections:	Year-2022

Files in This Item:

File	Description	Size	Format
MTP_T G Narayanan_MT20027.pdf		1.13 MB	Adobe PDF
