Beyond Independence: Probabilistic Models for Query Approximation on Binary Transaction Data (2001)
Citations: 34 (6 self-citations)
BibTeX
@MISC{Pavlov01beyondindependence:,
author = {Dmitry Pavlov and Heikki Mannila and Padhraic Smyth},
title = {Beyond Independence: Probabilistic Models for Query Approximation on Binary Transaction Data},
year = {2001}
}
Abstract
We investigate the problem of generating fast approximate answers to queries for large sparse binary data sets. We focus in particular on probabilistic model-based approaches to this problem and develop a number of techniques that are significantly more accurate than a baseline independence model. In particular, we introduce a novel technique for building probabilistic models from frequent itemsets. The itemsets are treated as constraints on the distribution of the query variables and the maximum entropy principle is used online to build a joint probability model for attributes in the query. We show that the resulting probability model defines a Markov random field (MRF) and that the time taken to answer a query scales exponentially as a function of the induced width of the associated MRF graph. We empirically compare the MRF model to other probabilistic models, such as the independence model, the Chow-Liu tree model, the Bernoulli mixture model, and the ADTree model. Experimental resu...
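The abstract describes the online maximum-entropy step only at a high level. The sketch below illustrates the idea under our own assumptions; it is not the authors' implementation. It enumerates all 2^k states of the k query variables and fits the maximum-entropy distribution to itemset-frequency constraints by iterative scaling (iterative proportional fitting). The paper instead exploits the structure of the resulting MRF, which is why its cost depends on the induced width. The function names and example frequencies are hypothetical.

```python
from itertools import product
from math import prod


def covers(state, itemset):
    """True if every variable in the itemset equals 1 in this state."""
    return all(state[i] == 1 for i in itemset)


def max_entropy_joint(n_vars, itemset_freqs, n_iters=500):
    """Fit a max-entropy distribution over n_vars binary query variables.

    itemset_freqs maps a tuple of variable indices (hypothetical encoding)
    to the observed frequency P(all those variables equal 1).
    Frequencies must lie strictly in (0, 1).
    """
    states = list(product((0, 1), repeat=n_vars))
    # With no constraints, the uniform distribution has maximum entropy.
    p = {s: 1.0 / len(states) for s in states}
    for _ in range(n_iters):
        for itemset, target in itemset_freqs.items():
            if not 0.0 < target < 1.0:
                raise ValueError("itemset frequencies must lie in (0, 1)")
            current = sum(p[s] for s in states if covers(s, itemset))
            # One iterative-scaling step: rescale so this itemset's
            # probability matches its observed frequency while the
            # total probability mass stays 1.
            for s in states:
                if covers(s, itemset):
                    p[s] *= target / current
                else:
                    p[s] *= (1.0 - target) / (1.0 - current)
    return p


def conjunctive_query(joint, query_vars):
    """P(all query variables equal 1) under the fitted joint distribution."""
    return sum(prob for s, prob in joint.items() if covers(s, query_vars))


def independence_estimate(itemset_freqs, query_vars):
    """Baseline: multiply single-item frequencies, ignoring correlations."""
    return prod(itemset_freqs[(i,)] for i in query_vars)


# Hypothetical frequencies: items 0 and 1 co-occur more often than chance.
freqs = {(0,): 0.5, (1,): 0.5, (2,): 0.5, (0, 1): 0.4}
joint = max_entropy_joint(3, freqs)
maxent_answer = conjunctive_query(joint, (0, 1, 2))
indep_answer = independence_estimate(freqs, (0, 1, 2))
print(round(maxent_answer, 4))  # → 0.2
print(indep_answer)             # → 0.125
```

Here the pairwise itemset (0, 1) pushes the estimate for the three-item query from 0.125 to 0.2, because the maximum-entropy model keeps the observed co-occurrence of items 0 and 1 and treats item 2 as independent. Brute-force enumeration is exponential in the number of query variables. Exploiting the MRF graph structure is what ties the cost to its induced width instead.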