IIIT-Delhi Institutional Repository

Corpora evaluation and system bias detection in multi document summarization


dc.contributor.author Dey, Alvin
dc.contributor.author Chakraborty, Tanmoy (Advisor)
dc.date.accessioned 2021-03-24T07:14:01Z
dc.date.available 2021-03-24T07:14:01Z
dc.date.issued 2020-06
dc.identifier.uri http://repository.iiitd.edu.in/xmlui/handle/123456789/852
dc.description.abstract Multi-document summarization (MDS) is the task of condensing the key points of a set of documents into a concise text paragraph. It has been used to aggregate news, tweets, product reviews, and similar content from various sources. Owing to the lack of a standard definition of the task, we encounter a plethora of datasets with varying levels of overlap and conflict between participating documents. There is also no standard for what constitutes summary information in MDS. Adding to the challenge, new systems report results on a selected set of datasets, and these results may not correlate with their performance on other datasets. In this paper, we study this heterogeneous task using a few widely used MDS corpora and a suite of state-of-the-art models. We attempt to quantify the quality of a summarization corpus and prescribe a list of points to consider when proposing a new MDS corpus. Next, we analyze why no MDS system achieves superior performance across all corpora. Finally, we examine the extent to which corpus properties influence system metrics and propagate bias. en_US
dc.language.iso en_US en_US
dc.publisher IIIT-Delhi en_US
dc.subject DUC, TAC, TextRank, LexRank en_US
dc.title Corpora evaluation and system bias detection in multi document summarization en_US
dc.type Thesis en_US

