Abstract:
High-utility itemset mining is an emerging research area in the field
of data mining. Several algorithms have been proposed to find high-utility
itemsets in transaction databases, and they rely on a data structure
called the UP-tree. However, algorithms based on the UP-tree generate
many candidates because the UP-tree holds too little information to
compute accurate utility estimates for itemsets. In this paper, we
present a data structure named the UP-Hist tree, which maintains a
histogram of item quantities at each node of the tree. The histogram
allows the computation of tighter utility estimates, enabling more
effective pruning of the search space. Extensive experiments on real as
well as synthetic datasets
show that our algorithm based on the UP-Hist tree outperforms
state-of-the-art algorithms in terms of both the total number of
candidate high-utility itemsets generated and the total execution time.
The UP-Hist tree has a small memory footprint, ranging from a few KB to
a few MB.
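The abstract does not spell out how the per-node histogram yields better estimates. The Python below is a minimal sketch under stated assumptions, not the paper's algorithm: each node is assumed to keep a Counter mapping quantity to number of transactions, and the names UPHistNode, insert_transaction, quantity_bounds and itemset_bounds are hypothetical. It illustrates one valid bounding rule: the c transactions passing through a node each carry one of the quantities recorded in every ancestor's histogram, so summing the c smallest (or largest) entries gives a lower (or upper) bound on the itemset's utility along that path.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class UPHistNode:
    """Hypothetical UP-Hist tree node: a UP-tree node plus a quantity histogram."""
    item: Optional[str]
    parent: Optional["UPHistNode"] = field(default=None, repr=False, compare=False)
    count: int = 0          # transactions whose sorted prefix passes through this node
    node_utility: int = 0   # UP-tree style estimate: summed prefix utilities
    hist: Counter = field(default_factory=Counter)  # quantity -> number of transactions
    children: Dict[str, "UPHistNode"] = field(default_factory=dict)


def insert_transaction(root: UPHistNode,
                       transaction: List[Tuple[str, int]],
                       profit: Dict[str, int]) -> None:
    """Insert (item, quantity) pairs, assumed already sorted in the tree's item order."""
    node, prefix_utility = root, 0
    for item, qty in transaction:
        prefix_utility += profit[item] * qty
        child = node.children.get(item)
        if child is None:
            child = UPHistNode(item=item, parent=node)
            node.children[item] = child
        child.count += 1
        child.node_utility += prefix_utility
        child.hist[qty] += 1  # record this transaction's quantity of the item
        node = child


def quantity_bounds(node: UPHistNode, c: int) -> Tuple[int, int]:
    """Smallest and largest total quantity any c transactions through `node` can have."""
    if c <= 0:
        return 0, 0
    qs = sorted(node.hist.elements())
    return sum(qs[:c]), sum(qs[-c:])


def itemset_bounds(node: UPHistNode, itemset: Set[str],
                   profit: Dict[str, int]) -> Tuple[int, int]:
    """Lower/upper utility of `itemset` over the node.count transactions through `node`.

    Every transaction through `node` also passes through each ancestor, so its
    quantity of that ancestor's item is one of the ancestor's histogram entries.
    """
    c, low, high = node.count, 0, 0
    cur: Optional[UPHistNode] = node
    while cur is not None and cur.item is not None:
        if cur.item in itemset:
            q_low, q_high = quantity_bounds(cur, c)
            low += profit[cur.item] * q_low
            high += profit[cur.item] * q_high
        cur = cur.parent
    return low, high


# Toy example: hypothetical unit profits and three transactions.
profit = {"a": 5, "b": 2, "c": 1}
root = UPHistNode(item=None)
for t in ([("a", 1), ("b", 2), ("c", 3)],
          [("a", 4), ("b", 1)],
          [("a", 2), ("b", 3), ("c", 1)]):
    insert_transaction(root, t, profit)

node_c = root.children["a"].children["b"].children["c"]
print(node_c.node_utility)                         # 29
print(itemset_bounds(node_c, {"b", "c"}, profit))  # (10, 14)
```

In this toy run the UP-tree style node utility at node c is 29, whereas the histogram caps the utility of {b, c} on that path at 14, the kind of tighter upper estimate that lets more candidates be pruned. Both values are valid upper bounds, so a miner could take their minimum; whether UP-Hist combines them this way is an assumption of this sketch.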