Evaluation methods for topic models
In ICML, 2009
Cited by 22 (5 self)
A natural evaluation metric for statistical topic models is the probability of held-out documents given a trained model. While exact computation of this probability is intractable, several estimators for this probability have been used in the topic modeling literature, including the harmonic mean method and empirical likelihood method. In this paper, we demonstrate experimentally that commonly used methods are unlikely to accurately estimate the probability of held-out documents, and propose two alternative methods that are both accurate and efficient.
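The harmonic mean method named in this abstract can be illustrated on a toy problem. The sketch below is not the paper's experiment: it assumes a Beta-Bernoulli coin model (chosen only because its marginal likelihood is known in closed form), and the function names are hypothetical. It estimates the evidence as the harmonic mean of likelihoods at posterior samples, so the result can be compared with the exact value.

```python
import math
import random


def log_likelihood(theta, heads, tails):
    """Log-probability of one specific flip sequence given coin bias theta."""
    return heads * math.log(theta) + tails * math.log(1.0 - theta)


def exact_log_evidence(heads, tails):
    """Exact log marginal likelihood under a uniform Beta(1, 1) prior."""
    return (math.lgamma(heads + 1) + math.lgamma(tails + 1)
            - math.lgamma(heads + tails + 2))


def harmonic_mean_log_evidence(heads, tails, n_samples=5000, seed=0):
    """Harmonic mean estimate: 1 / mean(1 / p(D | theta_s)), theta_s ~ posterior."""
    rng = random.Random(seed)
    eps = 1e-12  # guard against log(0) at the interval boundaries
    neg_logliks = []
    for _ in range(n_samples):
        # Posterior of a Beta(1, 1) prior after the observed flips.
        theta = rng.betavariate(heads + 1, tails + 1)
        theta = min(max(theta, eps), 1.0 - eps)
        neg_logliks.append(-log_likelihood(theta, heads, tails))
    # log of the mean inverse likelihood, computed stably (log-sum-exp).
    m = max(neg_logliks)
    log_mean_inverse = (m + math.log(sum(math.exp(v - m) for v in neg_logliks))
                        - math.log(n_samples))
    return -log_mean_inverse


if __name__ == "__main__":
    print("exact    :", math.exp(exact_log_evidence(7, 3)))
    print("harmonic :", math.exp(harmonic_mean_log_evidence(7, 3)))
```

In this toy model the inverse likelihood has infinite variance under the posterior, so repeated runs with different seeds can disagree noticeably. That instability is one commonly cited weakness of the harmonic mean estimator.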
SampleSearch: Importance Sampling in Presence of Determinism
2009
Cited by 4 (1 self)
The paper focuses on developing effective importance sampling algorithms for mixed probabilistic and deterministic graphical models. The use of importance sampling in such graphical models is problematic because it generates many useless zero-weight samples, which are rejected, yielding an inefficient sampling process. To address this rejection problem, we propose the SampleSearch scheme that augments sampling with systematic constraint-based backtracking search. We characterize the bias introduced by the combination of search with sampling, and derive a weighting scheme which yields an unbiased estimate of the desired statistics (e.g. probability of evidence). When computing the weights exactly is too complex, we propose an approximation which has a weaker guarantee of asymptotic unbiasedness. We present results of an extensive empirical evaluation demonstrating that SampleSearch outperforms other schemes in the presence of a significant amount of determinism.
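The rejection problem described in this abstract can be seen in a few lines. The sketch below is plain importance sampling, not SampleSearch. It assumes a hypothetical toy model of independent fair binary variables with one hard constraint (exactly one variable is set) and uses the prior as the proposal. Every sample that violates the constraint receives weight zero and contributes nothing to the estimate of the probability of evidence.

```python
import random


def satisfies_constraint(x):
    """Hard (deterministic) constraint: exactly one variable is set."""
    return sum(x) == 1


def importance_sample(n_vars, n_samples, seed=0):
    """Estimate P(constraint holds) and report the fraction of zero-weight samples."""
    rng = random.Random(seed)
    total_weight = 0.0
    zero_weight = 0
    for _ in range(n_samples):
        # Proposal = prior: each variable is 1 with probability 0.5.
        x = [1 if rng.random() < 0.5 else 0 for _ in range(n_vars)]
        # Weight = prior(x) * constraint(x) / proposal(x) = constraint(x).
        weight = 1.0 if satisfies_constraint(x) else 0.0
        if weight == 0.0:
            zero_weight += 1
        total_weight += weight
    return total_weight / n_samples, zero_weight / n_samples


if __name__ == "__main__":
    n_vars = 10
    estimate, wasted = importance_sample(n_vars, 20000)
    print("exact    :", n_vars / 2 ** n_vars)
    print("estimate :", estimate)
    print("fraction of zero-weight samples:", wasted)
```

With ten variables, only 10 of the 1024 assignments satisfy the constraint, so roughly 99% of the samples are wasted. The estimate remains unbiased, but its variance is driven by the few samples that survive.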
On Combining Graph-based variance reduction schemes
In: 13th International Conference on Artificial Intelligence and Statistics, 2010
Cited by 1 (1 self)
In this paper, we consider two variance reduction schemes that exploit the structure of the primal graph of the graphical model: Rao-Blackwellised w-cutset sampling and AND/OR sampling. We show that the two schemes are orthogonal and can be combined to further reduce the variance. Our combination yields a new family of estimators which trade time and space with variance. We demonstrate experimentally that the new estimators are superior, often yielding an order of magnitude improvement over previous schemes on several benchmarks.
Importance Sampling based Estimation over AND/OR Search Spaces for Graphical Models
2009
The paper introduces a family of approximate schemes that extend the process of computing sample mean in importance sampling from the conventional OR space to the AND/OR search space for graphical models. All the sample means are defined on the same set of samples and trade time with variance. At one end is the AND/OR sample tree mean which has the same time complexity as the conventional OR sample tree mean but has lower variance. At the other end is the AND/OR sample graph mean which requires more time to compute but has the lowest variance. The paper provides theoretical analysis as well as empirical evaluation demonstrating that the AND/OR sample tree and graph means are far closer to the true mean than the OR sample tree mean.

