dc.description.abstract |
Named Entity Recognition (NER) is one of the most explored tasks in Natural Language Processing. Numerous existing works have tried to uncover different aspects of this common yet distinctive field. The NER task has been extended to a variety of domains (such as social media, judiciary, medical, and the general domain) and languages (English, European languages, Chinese, Hindi, etc.). However, each domain and language poses its own challenges, particularly due to syntactic complexities. In this research, we explore named entity recognition in the legal domain. Given a user's complaint reporting a crime, we aim to efficiently extract all relevant and necessary information regarding the crime, victim, accused, etc. We collected publicly available user complaints and developed an entity- and relationship-annotated dataset, called the Legal Document Processing (LDP) dataset. The instances of this dataset are densely annotated with more than fifty labels, broadly covering the victim and other crime details. Subsequently, we benchmark the dataset using multiple pre-trained language models and information extraction-based baselines. In particular, we fine-tune Multilingual BERT, Hindi-BERT, and HindiBERTa on the LDP dataset. Our evaluation shows that Multilingual BERT achieves the best performance among all baselines. Potential directions for future work include creating entity- and relation-level knowledge graphs, as well as converting a user complaint into a technical legal document. |
en_US |