IIIT-Delhi Institutional Repository

Finding influential people from a historical news repository


dc.contributor.author Gupta, Aayushee
dc.contributor.author Dutta, Haimonti (Advisor)
dc.date.accessioned 2014-09-05T10:04:02Z
dc.date.available 2014-09-05T10:04:02Z
dc.date.issued 2014-09-05
dc.identifier.uri https://repository.iiitd.edu.in/jspui/handle/123456789/166
dc.description.abstract Historical newspaper archives provide a wealth of information. They are of particular interest to genealogists, historians and scholars for People Search. In this thesis, we design a People Gazetteer from the noisy OCR text of historical newspapers and identify "influential" people from it. A People Gazetteer is a dictionary of personal names; each entry of the gazetteer is a tuple containing a person name and a list of articles in which his name occurs along with the corresponding topic associated with each article. To build the People Gazetteer, we first spell correct the noisy text using an edit distance based algorithm. A novel N-gram based evaluation algorithm is designed for measuring the performance of the spell corrector. Next, a Named Entity Recognizer is run on the text of each article to identify person entities and an LDA-based topic detector to assign categories to articles. To identify influential people across each category of People Gazetteer, we define the notion of an Influential Person Index (IPI) and rank based on it. Our corpus is a sample of 14020 OCR newspaper articles (roughly two months' data) obtained from "The Sun" newspaper in the Chronicling America project. We present results on the top-K influential people obtained from our algorithm by varying its parameters and verify results using Wikipedia. en_US
dc.language.iso en_US en_US
dc.publisher IIIT Delhi en_US
dc.subject Gazetteer en_US
dc.subject Text Mining en_US
dc.subject Information Retrieval en_US
dc.subject OCR en_US
dc.subject Spelling Correction en_US
dc.subject Historical data en_US
dc.subject Influential people detection en_US
dc.title Finding influential people from a historical news repository en_US
dc.type Thesis en_US
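
The gazetteer entry described in the abstract (a person name plus the articles mentioning it, each paired with a topic) and the edit distance underlying the spell corrector can be sketched as follows. This is a minimal illustration, not the thesis implementation: the names GazetteerEntry, build_gazetteer and edit_distance are hypothetical, the person names and topics are assumed to come from upstream NER and topic-detection steps that are not shown, and plain Levenshtein distance stands in for whatever edit-distance variant the thesis uses. The Influential Person Index is not sketched because the abstract does not define it.

```python
from dataclasses import dataclass, field


@dataclass
class GazetteerEntry:
    """One gazetteer entry: a person name plus (article_id, topic) pairs."""
    name: str
    articles: list = field(default_factory=list)


def build_gazetteer(articles):
    """Group articles by each person name mentioned in them.

    `articles` is an iterable of (article_id, topic, person_names) triples.
    The names and topics are assumed to be produced by an upstream NER and
    topic-detection step (hypothetical input format).
    """
    gazetteer = {}
    for article_id, topic, person_names in articles:
        for name in set(person_names):  # record each article once per person
            entry = gazetteer.setdefault(name, GazetteerEntry(name))
            entry.articles.append((article_id, topic))
    return gazetteer


def edit_distance(a, b):
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]
```

A spell corrector built on this would typically pick, for each out-of-vocabulary OCR token, the dictionary word with the smallest edit distance; how ties and thresholds are handled is left open here.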

