Abstract:
Spatio-textual similarity join retrieves pairs of objects that are close in both
the spatial and the textual dimensions. Joins in the spatial dimension alone have
been studied extensively, but no work has been done on spatio-textual joins.
Meanwhile, the ubiquity of GPS-enabled devices is generating huge volumes of
spatio-textual data, which demand new methods to query and operate on this data type.
We study the join operation for spatio-textual data and incorporate several
optimizations/heuristics: efficient grid partitioning for the spatial dimension,
a specific prefix length for textual vectors, and ordering of the elements of each
textual vector by their TF-IDF scores. We also design and study algorithms that use
these heuristics to perform spatio-textual joins on the MapReduce framework.
Experimental results on two real-life datasets, Flickr and Foursquare, show the
effectiveness of these optimizations in terms of both computation time and
pruning of non-candidates.
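To make the heuristics concrete, here is a minimal single-machine sketch in Python. It is an illustration under our own assumptions, not the algorithm evaluated in the paper. Each object is a hypothetical `(id, x, y, tokens)` tuple. Spatial closeness means Euclidean distance of at most `eps`, and textual similarity is Jaccard over token sets with threshold `t`. The abstract orders vector elements by TF-IDF scores; this sketch substitutes a global document-frequency order (rarest token first, i.e. highest IDF), because plain prefix filtering needs one consistent order across all vectors. The grid uses cells of side `eps`, so a matching partner can only lie in an object's own cell or one of its eight neighbours.

```python
import math
from collections import defaultdict

# Records are hypothetical (id, x, y, tokens) tuples with unique, comparable ids.

def global_token_order(records):
    """Order tokens rarest-first (highest IDF) using document frequency."""
    df = defaultdict(int)
    for _, _, _, tokens in records:
        for tok in set(tokens):
            df[tok] += 1
    # Ties are broken lexicographically so the order is total and global.
    return lambda tok: (df[tok], tok)

def prefix(tokens, t, order):
    """Prefix that every partner with Jaccard >= t must overlap (prefix filtering)."""
    toks = sorted(set(tokens), key=order)
    # Subtract a tiny epsilon so float error in t * n cannot shorten the prefix.
    p = len(toks) - math.ceil(t * len(toks) - 1e-9) + 1
    return set(toks[:p])

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def spatio_textual_join(records, eps, t):
    """Return sorted (id_r, id_s, jaccard) for pairs within eps and Jaccard >= t."""
    order = global_token_order(records)
    # Grid partitioning: assign each non-empty record to a cell of side eps.
    cells = defaultdict(list)
    for rec in records:
        if rec[3]:
            cells[(math.floor(rec[1] / eps), math.floor(rec[2] / eps))].append(rec)
    prefixes = {rec[0]: prefix(rec[3], t, order)
                for recs in cells.values() for rec in recs}

    results = []
    for (cx, cy), home in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for r in home:
                    for s in cells.get((cx + dx, cy + dy), []):
                        if r[0] >= s[0]:
                            continue  # emit each unordered pair only once
                        if math.hypot(r[1] - s[1], r[2] - s[2]) > eps:
                            continue  # too far apart spatially
                        if not prefixes[r[0]] & prefixes[s[0]]:
                            continue  # pruned by the prefix filter
                        sim = jaccard(r[3], s[3])
                        if sim >= t:  # verification step
                            results.append((r[0], s[0], sim))
    return sorted(results)

records = [
    (1, 0.0, 0.0, ["cafe", "coffee", "latte"]),
    (2, 0.5, 0.5, ["coffee", "latte", "cafe", "wifi"]),
    (3, 0.4, 0.1, ["museum", "art"]),
    (4, 10.0, 10.0, ["cafe", "coffee", "latte"]),
]
print(spatio_textual_join(records, eps=1.0, t=0.5))  # [(1, 2, 0.75)]
```

In this example, object 3 is spatially close to objects 1 and 2 but is pruned by the prefix filter without computing Jaccard, and object 4 is never compared because its cell is not adjacent. In a MapReduce setting, the grid cell id would be one plausible shuffle key, with objects replicated to neighbouring cells; that distributed step is omitted here.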