Abstract:
Keyphrase extraction is the task of automatically extracting from a document a set of phrases that represent its overall content. Such keyphrases have many applications, including information retrieval, recommendation systems, and document clustering. For scientific papers, all existing work on keyphrase extraction uses datasets consisting of titles and abstracts. However, because the title and abstract do not fully represent an entire paper, these datasets have a major limitation. To overcome this limitation, we introduce a dataset of over 1.3M full-text scientific papers, each paired with its keyphrases, for the automatic keyphrase extraction task. We also present the results of initial experiments on this dataset using popular unsupervised and supervised techniques, and we experiment with a new semi-supervised approach.