Long document keyphrase extraction

Gautam, Dibya; Agrawal, Navneet

dc.contributor.author	Gautam, Dibya
dc.contributor.author	Agrawal, Navneet
dc.date.accessioned	2023-04-14T10:56:57Z
dc.date.available	2023-04-14T10:56:57Z
dc.date.issued	2021-11
dc.identifier.uri	http://repository.iiitd.edu.in/xmlui/handle/123456789/1145
dc.description.abstract	Keyphrase extraction is the task of automatically extracting from a document a set of phrases that represent its overall context. Such keyphrases have many uses, including information retrieval, recommendation systems, and document clustering. For scientific papers, all existing keyphrase extraction work uses datasets consisting only of titles and abstracts. Because a title and abstract do not fully represent an entire paper, these datasets have a major limitation. To address it, we introduce a dataset of over 1.3M full-text scientific papers with their keyphrases for automatic keyphrase extraction. We present the results of initial experiments on this dataset using popular unsupervised and supervised techniques. In addition, we experimented with a new semi-supervised approach on this dataset.	en_US
dc.language.iso	en_US	en_US
dc.publisher	IIIT-Delhi	en_US
dc.subject	keyphrase extraction	en_US
dc.subject	long document dataset	en_US
dc.subject	unsupervised keyphrase extraction	en_US
dc.subject	semi-supervised keyphrase extraction	en_US
dc.title	Long document keyphrase extraction	en_US