Information extraction in user's complaints

T G, Narayanan; Akhtar, Md. Shad (Advisor)

dc.contributor.author	T G, Narayanan
dc.contributor.author	Akhtar, Md. Shad (Advisor)
dc.date.accessioned	2023-04-03T10:47:10Z
dc.date.available	2023-04-03T10:47:10Z
dc.date.issued	2022-08
dc.identifier.uri	http://repository.iiitd.edu.in/xmlui/handle/123456789/1067
dc.description.abstract	The task of Named Entity Recognition (NER) is one of the most explored fields in the Natural Language Processing domain. Numerous existing works have tried to uncover different aspects of this common yet unique field. The NER task has been extended to a variety of domains (such as social media, judiciary, medical, or the general domain) and languages (English, European, Chinese, Hindi, etc.). However, each domain and language poses its own core challenges, particularly due to syntactic complexities. In this research, we explore named-entity recognition in the legal domain. Given a user’s complaint reporting a crime, we intend to extract all relevant and necessary information concerning the crime, victim, accused, etc. in an efficient manner. We collect publicly available user complaints and develop an entity- and relation-annotated dataset, the Legal Document Processing (LDP) dataset. The instances of this dataset are densely annotated with more than fifty labels broadly revolving around the victim and other crime details. Subsequently, we benchmark the dataset using multiple pre-trained language models and information extraction-based baselines. In particular, we fine-tune Multi-lingual BERT, Hindi-BERT, and HindiBERTa on the LDP dataset. Our evaluation shows that Multi-lingual BERT achieves the best performance among all baselines. The potential scope for future work includes entity- and relation-level knowledge graph creation as well as converting a user complaint into a technical and legal document.	en_US
dc.language.iso	en_US	en_US
dc.publisher	IIITD	en_US
dc.subject	Information extraction	en_US
dc.subject	Named entity recognition	en_US
dc.subject	Natural language processing	en_US
dc.subject	Criminal activities	en_US
dc.subject	LDPQuAD	en_US
dc.subject	BERT models	en_US
dc.title	Information extraction in user's complaints	en_US
dc.type	Thesis	en_US