Results 1 - 10 of 23
Learning with mixtures of trees
 Journal of Machine Learning Research
, 2000
Cited by 146 (2 self)
This paper describes the mixtures-of-trees model, a probabilistic model for discrete multidimensional domains. Mixtures-of-trees generalize the probabilistic trees of Chow and Liu [6] in a different and complementary direction to that of Bayesian networks. We present efficient algorithms for learning mixtures-of-trees models in maximum likelihood and Bayesian frameworks. We also discuss additional efficiencies that can be obtained when data are “sparse,” and we present data structures and algorithms that exploit such sparseness. Experimental results demonstrate the performance of the model for both density estimation and classification. We also discuss the sense in which tree-based classifiers perform an implicit form of feature selection, and demonstrate a resulting insensitivity to irrelevant attributes.
Learning from labeled and unlabeled data: An empirical study across techniques and domains
Journal of Artificial Intelligence Research
Cited by 33 (3 self)
There has been increased interest in devising learning techniques that combine unlabeled data with labeled data – i.e. semi-supervised learning. However, to the best of our knowledge, no study has been performed across various techniques and different types and amounts of labeled and unlabeled data. Moreover, most of the published work on semi-supervised learning techniques assumes that the labeled and unlabeled data come from the same distribution. It is possible for the labeling process to be associated with a selection bias such that the distributions of data points in the labeled and unlabeled sets are different. Not correcting for such bias can result in biased function approximation with potentially poor performance. In this paper, we present an empirical study of various semi-supervised learning techniques on a variety of datasets. We attempt to answer various questions such as the effect of independence or relevance amongst features, the effect of the size of the labeled and unlabeled sets and the effect of noise. We also investigate the impact of sample-selection bias on the semi-supervised learning techniques under study and implement a bivariate probit technique particularly designed to correct for such bias.
Estimating dependency structure as a hidden variable
 In NIPS
, 1998
Cited by 28 (4 self)
This publication can be retrieved by anonymous ftp to publications.ai.mit.edu. This paper introduces a probability model, the mixture of trees, that can account for sparse, dynamically changing dependence relationships. We present a family of efficient algorithms based on the EM and the Minimum Spanning Tree algorithms that learn mixtures of trees in the ML framework. The method can be extended to take into account priors and, for a wide class of priors that includes the Dirichlet and the MDL priors, it preserves its computational efficiency. Experimental results demonstrate the excellent performance of the new model both in density estimation and in classification. Finally, we show that a single tree classifier acts like an implicit feature selector, thus making the classification performance insensitive to irrelevant attributes.
Comparing Predictive Inference Methods for Discrete Domains
In Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics
, 1997
Cited by 23 (17 self)
Predictive inference is seen here as the process of determining the predictive distribution of a discrete variable, given a data set of training examples and the values for the other problem domain variables. We consider three approaches for computing this predictive distribution, and assume that the joint probability distribution for the variables belongs to a set of distributions determined by a set of parametric models. In the simplest case, the predictive distribution is computed by using the model with the maximum a posteriori (MAP) posterior probability. In the evidence approach, the predictive distribution is obtained by averaging over all the individual models in the model family. In the third case, we define the predictive distribution by using Rissanen's new definition of stochastic complexity. Our experiments performed with the family of Naive Bayes models suggest that when using all the data available, the stochastic complexity approach produces the most accurate prediction...
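To make the difference between the first two approaches concrete, here is a minimal Python sketch. It is not taken from the paper: it assumes a single Bernoulli variable with a Beta(a, b) prior rather than the Naive Bayes family used in the experiments, and the function names are hypothetical. The stochastic complexity approach is not shown.

```python
def map_predictive(ones, zeros, a=2.0, b=2.0):
    """P(next = 1) using only the single MAP parameter value.

    The posterior is Beta(a + ones, b + zeros); its mode is used as a
    plug-in estimate. Valid when both posterior parameters exceed 1.
    """
    return (ones + a - 1) / (ones + zeros + a + b - 2)


def evidence_predictive(ones, zeros, a=2.0, b=2.0):
    """P(next = 1) averaged over all parameter values (posterior mean)."""
    return (ones + a) / (ones + zeros + a + b)


def evidence_predictive_grid(ones, zeros, a=2.0, b=2.0, steps=10_000):
    """Same average, computed literally as a weighted sum over a grid of models."""
    weighted_sum = 0.0
    weight_total = 0.0
    for i in range(steps):
        theta = (i + 0.5) / steps  # midpoint of the i-th grid cell
        # Unnormalized posterior density of this parameter value.
        weight = theta ** (ones + a - 1) * (1 - theta) ** (zeros + b - 1)
        weighted_sum += weight * theta
        weight_total += weight
    return weighted_sum / weight_total


if __name__ == "__main__":
    # Three observations, all equal to 1, under an assumed Beta(2, 2) prior.
    print(map_predictive(3, 0))
    print(evidence_predictive(3, 0))
    print(evidence_predictive_grid(3, 0))
```

With few observations the two predictions differ noticeably, because the averaged prediction keeps weight on parameter values away from the posterior mode; the gap shrinks as the sample grows.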
Efficient Computation of Stochastic Complexity
 Proceedings of the Ninth International Conference on Artificial Intelligence and Statistics
, 2003
Cited by 18 (11 self)
Stochastic complexity of a data set is defined as the shortest possible code length for the data obtainable by using some fixed set of models. This measure is of great theoretical and practical importance as a tool for tasks such as model selection or data clustering. Unfortunately, computing the modern version of stochastic complexity, defined as the Normalized Maximum Likelihood (NML) criterion, requires computing a sum with an exponential number of terms. Therefore, in order to be able to apply the stochastic complexity measure in practice, in most cases it has to be approximated. In this paper, we show that for some interesting and important cases with multinomial data sets, the exponentiality can be removed without loss of accuracy. We also introduce a new computationally efficient approximation scheme based on analytic combinatorics and assess its accuracy, together with earlier approximations, by comparing them to the exact form.
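The exponential sum mentioned in the abstract can be illustrated with a short Python sketch for a single multinomial variable. This is an illustration under stated assumptions, not the algorithm or the approximation scheme from the paper: the function names are hypothetical, and the second function only uses the standard observation that the maximized likelihood depends on a sequence through its symbol counts, so sequences with equal counts can be grouped.

```python
from collections import Counter
from itertools import product
from math import comb, prod


def max_likelihood(counts, n):
    """Maximized multinomial likelihood of a sequence with these symbol counts."""
    # (0 / n) ** 0 evaluates to 1.0, so unused symbols contribute a factor of 1.
    return prod((h / n) ** h for h in counts)


def nml_sum_brute_force(n, k):
    """NML normalizing sum over all k**n sequences of length n (exponential in n)."""
    total = 0.0
    for seq in product(range(k), repeat=n):
        total += max_likelihood(Counter(seq).values(), n)
    return total


def compositions(n, k):
    """Yield every k-tuple of non-negative integers that sums to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def multinomial_coefficient(counts):
    """Number of distinct sequences that share the given symbol counts."""
    coeff, remaining = 1, sum(counts)
    for h in counts:
        coeff *= comb(remaining, h)
        remaining -= h
    return coeff


def nml_sum_by_counts(n, k):
    """Same sum, grouped by count vector: polynomially many terms for fixed k."""
    return sum(
        multinomial_coefficient(c) * max_likelihood(c, n)
        for c in compositions(n, k)
    )


if __name__ == "__main__":
    print(nml_sum_brute_force(2, 2))  # four sequences: 1 + 1 + 0.25 + 0.25
    print(nml_sum_by_counts(2, 2))
```

Both functions return the same value; the grouped version visits one term per count vector instead of one per sequence, which is what makes exact computation feasible for moderate sample sizes.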
Comparing Bayesian Model Class Selection Criteria by Discrete Finite Mixtures
Information, Statistics and Induction in Science, pages 364-374, Proceedings of the ISIS'96 Conference
, 1996
Cited by 10 (5 self)
We investigate the problem of computing the posterior probability of a model class, given a data sample and a prior distribution for possible parameter settings. By a model class we mean a group of models which all share the same parametric form. In general this posterior may be very hard to compute for high-dimensional parameter spaces, which is usually the case with real-world applications. In the literature several methods for computing the posterior approximately have been proposed, but the quality of the approximations may depend heavily on the size of the available data sample. In this work we are interested in testing how well the approximative methods perform in real-world problem domains. In order to conduct such a study, we have chosen the model family of finite mixture distributions. With certain assumptions, we are able to derive the model class posterior analytically for this model family. We report a series of model class selection experiments on real-world data sets, w...
NML Computation Algorithms for Tree-Structured Multinomial Bayesian Networks
, 2007
Cited by 6 (5 self)
Typical problems in bioinformatics involve large discrete datasets. Therefore, in order to apply statistical methods in such domains, it is important to develop efficient algorithms suitable for discrete data. The minimum description length (MDL) principle is a theoretically well-founded, general framework for performing statistical inference. The mathematical formalization of MDL is based on the normalized maximum likelihood (NML) distribution, which has several desirable theoretical properties. In the case of discrete data, straightforward computation of the NML distribution requires exponential time with respect to the sample size, since the definition involves a sum over all the possible data samples of a fixed size. In this paper, we first review some existing algorithms for efficient NML computation in the case of multinomial and naive Bayes model families. Then we proceed by extending these algorithms to more complex, tree-structured Bayesian networks.
Computing the Regret Table for Multinomial Data
, 2005
Cited by 6 (2 self)
Stochastic complexity of a data set is defined as the shortest possible code length for the data obtainable by using some fixed set of models. This measure is of great theoretical and practical importance as a tool for tasks such as model selection or data clustering. In the case
Prior Information and Generalized Questions
, 1996
Cited by 6 (4 self)
In learning problems available information is usually divided into two categories: examples of function values (or training data) and prior information (e.g. a smoothness constraint).