Abstract:
In some machine learning problems, only positive labels are available and the remaining data points are unlabelled. In our setting, the members of a given disease-gene set are positive data points labelled as disease genes, but no gene can be treated as a true negative (a non-disease gene) unless this has been shown empirically. This setup is therefore a positive-unlabelled problem, where the goal is to identify additional positive data points (disease genes in our context) without any labelled negative data points. Our method performs well with sparse features and a small set of positive data points.