Abstract:
Locality sensitive hashing (LSH) is a hashing scheme that maps objects into buckets such that, with high probability, similar objects hash to the same bucket and dissimilar objects hash to different buckets. LSH provides sub-linear query time for the near neighbor search problem. The nearness of two objects is measured using a similarity measure, e.g. cosine similarity or Jaccard similarity. Inner product similarity is a ubiquitous measure in many data mining and machine learning problems. Even though inner product similarity appears in many important problems, it is hard to construct a locality sensitive hashing scheme for it. A few transformations alleviate this problem and provide locality sensitive hashing for inner product similarity. In this thesis we analyze the performance of locality sensitive hashing for inner product similarity. To the best of our knowledge, there is no proper study that estimates the number of hash function evaluations required to achieve particular performance measures (true positive rate and false positive rate). This analysis provides a guideline for anyone who wants to use locality sensitive hashing for inner product similarity search problems. We also investigate and propose a new technique that reduces the required number of hash function evaluations. This technique bands the hash values in a hierarchical, tree-like structure. The idea behind this technique is that locality sensitive hashing performs better when the difference between the collision probabilities of similar and dissimilar objects is larger.
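To make the link between hash function evaluations and the two performance measures concrete, the following is a minimal sketch of the standard flat banding (AND-OR) construction, not the hierarchical technique proposed in this thesis. It assumes independent hash functions; the per-hash collision probabilities `p_similar` and `p_dissimilar`, the band width `r`, and the target rate are illustrative values chosen for the example, not results from this work.

```python
import math


def band_collision_probability(p, r, b):
    """Probability that two objects agree on all r hashes in at least one of b bands.

    Assumes independent hash functions, each colliding with probability p.
    """
    return 1.0 - (1.0 - p ** r) ** b


def min_bands(p_similar, r, target_tpr):
    """Smallest number of bands b whose true positive rate reaches target_tpr."""
    if not (0.0 < p_similar < 1.0 and 0.0 < target_tpr < 1.0):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    miss_one_band = 1.0 - p_similar ** r
    return math.ceil(math.log(1.0 - target_tpr) / math.log(miss_one_band))


# Illustrative (assumed) per-hash collision probabilities.
p_similar, p_dissimilar = 0.9, 0.5
r = 8  # hashes per band (assumed)

b = min_bands(p_similar, r, target_tpr=0.95)
tpr = band_collision_probability(p_similar, r, b)
fpr = band_collision_probability(p_dissimilar, r, b)
hash_evaluations = r * b  # total hash function evaluations per object

print(f"bands={b}, evaluations={hash_evaluations}, TPR={tpr:.3f}, FPR={fpr:.3f}")
```

Under these assumptions, shrinking the gap between `p_similar` and `p_dissimilar` forces a larger `r` (and therefore more bands) to keep the false positive rate low, which is why a larger collision probability difference requires fewer hash function evaluations.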