Please use this identifier to cite or link to this item: http://repository.iiitd.edu.in/xmlui/handle/123456789/166
Full metadata record
DC Field | Value | Language
dc.contributor.author | Gupta, Aayushee | -
dc.contributor.author | Dutta, Haimonti (Advisor) | -
dc.date.accessioned | 2014-09-05T10:04:02Z | -
dc.date.available | 2014-09-05T10:04:02Z | -
dc.date.issued | 2014-09-05 | -
dc.identifier.uri | https://repository.iiitd.edu.in/jspui/handle/123456789/166 | -
dc.description.abstract | Historical newspaper archives provide a wealth of information. They are of particular interest to genealogists, historians and scholars for People Search. In this thesis, we design a People Gazetteer from the noisy OCR text of historical newspapers and identify "influential" people from it. A People Gazetteer is a dictionary of personal names; each entry of the gazetteer is a tuple containing a person name and a list of articles in which his name occurs along with the corresponding topic associated with each article. To build the People Gazetteer, we first spell correct the noisy text using an edit distance based algorithm. A novel N-gram based evaluation algorithm is designed for measuring the performance of the spell corrector. Next, a Named Entity Recognizer is run on the text of each article to identify person entities and an LDA-based topic detector to assign categories to articles. To identify influential people across each category of People Gazetteer, we define the notion of an Influential Person Index (IPI) and rank based on it. Our corpus is a sample of 14020 OCR newspaper articles (roughly two months' data) obtained from "The Sun" newspaper in the Chronicling America project. We present results on the top-K influential people obtained from our algorithm by varying its parameters and verify results using Wikipedia. | en_US
dc.language.iso | en_US | en_US
dc.publisher | IIIT Delhi | en_US
dc.subject | Gazetteer | en_US
dc.subject | Text Mining | en_US
dc.subject | Information Retrieval | en_US
dc.subject | OCR | en_US
dc.subject | Spelling Correction | en_US
dc.subject | Historical data | en_US
dc.subject | Influential people detection | en_US
dc.title | Finding influential people from a historical news repository | en_US
dc.type | Thesis | en_US
Appears in Collections: Year-2014

Files in This Item:
File | Description | Size | Format
MT12030.pdf | | 1.56 MB | Adobe PDF

