IIIT-Delhi Institutional Repository

Indian legal case judgment document mining : semantic segmentation and information extraction


dc.contributor.author Das, Antara
dc.contributor.author Goyal, Vikram (Advisor)
dc.date.accessioned 2024-09-21T10:31:37Z
dc.date.available 2024-09-21T10:31:37Z
dc.date.issued 2024-05-01
dc.identifier.uri http://repository.iiitd.edu.in/xmlui/handle/123456789/1678
dc.description.abstract Developing NLP-based techniques to automate tasks in the Indian legal domain is in high demand because of the rapidly growing volume of legal text documents, intricate legal terminology, and legal professionals' need for efficient information retrieval and document analysis. Such techniques streamline the large-scale processing, extraction, and understanding of legal information, increasing productivity within the judicial framework. In this work, we experiment with two tasks. Task 1 extracts eight legal domain-specific named entities from Indian court judgment texts. Task 2 is the semantic segmentation of Indian case judgment documents into functional or rhetorical components such as Facts, Arguments, and the Judgment statement. We introduce two new large corpora for these tasks, which enabled us to experiment with different transformer-based models. For Task 1, we propose a hybrid approach that combines a BERT-CRF model for token classification with a custom-designed rule-based information extraction component. The semantic segmentation task can be modelled in two ways: a high-level approach automatically splits a given document into multiple functional chunks using a subtask called Label Shift Prediction (LSP), and a more detailed approach classifies the rhetorical roles (RR) of those chunks. For Task 2, we experimented extensively with different ways of incorporating the LSP task into the hierarchical BERT-based approach for rhetorical role identification, with the aim of improving on prior research. Also in Task 2, we worked with a dataset that has more fine-grained RR labels and severe label imbalance, and we significantly improved performance on the rare labels using a dynamically weighted loss.
Furthermore, we evaluated the cross-domain performance of the RR and LSP prediction models and showed that fine-tuning a model on a small corpus from a target domain can efficiently handle cases from an unseen domain. en_US
dc.language.iso en_US en_US
dc.publisher IIIT-Delhi en_US
dc.subject NER en_US
dc.subject Pretraining Models en_US
dc.subject Baselines: Legal NER en_US
dc.title Indian legal case judgment document mining : semantic segmentation and information extraction en_US
dc.type Thesis en_US
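The abstract does not define how the Label Shift Prediction targets or the dynamically weighted loss are computed. The Python sketch below shows one plausible reading, under stated assumptions: an LSP target is 1 when a sentence's rhetorical role differs from that of the preceding sentence, the resulting shifts split a document into contiguous functional chunks, and the loss weights are recomputed from each batch's label frequencies (inverse frequency). All function names and the RR label strings are hypothetical illustrations, not the thesis's implementation.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def label_shift_targets(rr_labels: Sequence[str]) -> List[int]:
    """Assumed LSP target: 1 where a sentence's RR label differs from the previous one."""
    if not rr_labels:
        return []
    # The first sentence has no predecessor; it is marked 0 (no shift) by assumption.
    return [0] + [int(curr != prev) for prev, curr in zip(rr_labels, rr_labels[1:])]


def segments_from_shifts(shifts: Sequence[int]) -> List[List[int]]:
    """Group sentence indices into contiguous chunks, starting a new chunk at each shift."""
    chunks: List[List[int]] = []
    for i, shift in enumerate(shifts):
        if i == 0 or shift == 1:
            chunks.append([])
        chunks[-1].append(i)
    return chunks


def dynamic_class_weights(batch_labels: Sequence[str]) -> Dict[str, float]:
    """Hypothetical per-batch inverse-frequency weights: total / (num_classes * count)."""
    counts = Counter(batch_labels)
    total = len(batch_labels)
    num_classes = len(counts)
    return {label: total / (num_classes * c) for label, c in counts.items()}


def weighted_nll(probs: Sequence[Dict[str, float]], gold: Sequence[str],
                 weights: Dict[str, float]) -> float:
    """Weighted negative log-likelihood, normalised by the summed weights of the gold labels."""
    numerator = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, gold))
    denominator = sum(weights[y] for y in gold)
    return numerator / denominator


# Hypothetical sentence-level RR labels for one short judgment.
rr = ["Facts", "Facts", "Arguments", "Arguments", "Arguments", "Judgment"]
shifts = label_shift_targets(rr)        # [0, 0, 1, 0, 0, 1]
chunks = segments_from_shifts(shifts)   # [[0, 1], [2, 3, 4], [5]]
weights = dynamic_class_weights(rr)     # the rarest label ("Judgment") gets the largest weight
```

Recomputing the weights per batch, rather than once over the whole corpus, is one way a loss could be called "dynamically" weighted; a corpus-level or smoothed weighting scheme would be an equally valid reading of the abstract.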
