Results 1–10 of 16
On predictive distributions and Bayesian networks
 Statistics and Computing
, 2000
Abstract

Cited by 39 (30 self)
In this paper we are interested in discrete prediction problems for a decision-theoretic setting, where the ...
Constructing Bayesian finite mixture models by the EM algorithm
, 1997
Abstract

Cited by 23 (13 self)
In this paper we explore the use of finite mixture models for building decision support systems capable of sound probabilistic inference. Finite mixture models have many appealing properties: they are computationally efficient in the prediction (reasoning) phase, they are universal in the sense that they can approximate any problem domain distribution, and they can handle multimodality well. We present a formulation of the model construction problem in the Bayesian framework for finite mixture models, and describe how Bayesian inference is performed given such a model. The model construction problem can be seen as missing data estimation, and we describe a realization of the Expectation-Maximization (EM) algorithm for finding good models. To prove the feasibility of our approach, we report cross-validated empirical results on several publicly available classification problem datasets, and compare our results to corresponding results obtained by alternative techniques, such as neural networks ...
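To make the EM recipe described in this abstract concrete, here is a minimal sketch (not the authors' code; the function name, random initialization, and the Laplace-style smoothing in the M-step are this note's own assumptions) of EM for a mixture of independent discrete distributions:

```python
import numpy as np

def em_discrete_mixture(X, K, n_vals, iters=50, seed=0):
    """EM for a K-component mixture of independent discrete distributions.

    X: (N, D) int array with values in {0, ..., n_vals - 1}.
    Returns mixing weights pi (K,) and per-component tables theta (K, D, n_vals).
    """
    rng = np.random.default_rng(seed)
    N, D = X.shape
    pi = np.full(K, 1.0 / K)
    theta = rng.dirichlet(np.ones(n_vals), size=(K, D))  # random start
    for _ in range(iters):
        # E-step: responsibilities r[n, k] proportional to pi[k] * prod_d theta[k, d, X[n, d]]
        log_r = np.tile(np.log(pi), (N, 1))
        for d in range(D):
            log_r += np.log(theta[:, d, X[:, d]]).T  # (N, K)
        log_r -= log_r.max(axis=1, keepdims=True)    # stabilize before exp
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from weighted counts,
        # with add-one smoothing standing in for a simple Dirichlet prior
        pi = r.mean(axis=0)
        for k in range(K):
            for d in range(D):
                counts = np.bincount(X[:, d], weights=r[:, k], minlength=n_vals)
                theta[k, d] = (counts + 1.0) / (counts.sum() + n_vals)
    return pi, theta
```

A fully Bayesian treatment would also place a prior on the mixing weights; the sketch keeps the M-step deliberately simple.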
Comparing Predictive Inference Methods for Discrete Domains
 In Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics
, 1997
Abstract

Cited by 23 (17 self)
Predictive inference is seen here as the process of determining the predictive distribution of a discrete variable, given a data set of training examples and the values for the other problem domain variables. We consider three approaches for computing this predictive distribution, and assume that the joint probability distribution for the variables belongs to a set of distributions determined by a set of parametric models. In the simplest case, the predictive distribution is computed by using the model with the maximum a posteriori (MAP) posterior probability. In the evidence approach, the predictive distribution is obtained by averaging over all the individual models in the model family. In the third case, we define the predictive distribution by using Rissanen's new definition of stochastic complexity. Our experiments performed with the family of Naive Bayes models suggest that when using all the data available, the stochastic complexity approach produces the most accurate prediction...
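The contrast between the MAP and evidence (model-averaging) predictive distributions compared in this abstract is easiest to see in the one-parameter Bernoulli case with a Beta prior, where both have closed forms. A minimal sketch; the function names and the uniform Beta(1, 1) default prior are this note's assumptions:

```python
def map_predictive(heads, tails, a=1.0, b=1.0):
    """P(next = 1) by plugging in the MAP estimate of theta under a
    Beta(a, b) prior, i.e. the posterior mode (requires heads + tails > 0
    when a = b = 1)."""
    return (heads + a - 1.0) / (heads + tails + a + b - 2.0)

def evidence_predictive(heads, tails, a=1.0, b=1.0):
    """P(next = 1) by averaging over all theta weighted by the Beta
    posterior: the posterior mean, i.e. Laplace's rule for a = b = 1."""
    return (heads + a) / (heads + tails + a + b)
```

With a uniform prior the MAP route reduces to the relative frequency, while averaging gives Laplace's rule of succession; the two differ most on small samples, which is exactly the regime where the choice of approach matters.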
Predictive Data Mining with Finite Mixtures
 In Proceedings of The Second International Conference on Knowledge Discovery and Data Mining
, 1996
Abstract

Cited by 10 (5 self)
In data mining the goal is to develop methods for discovering previously unknown regularities from databases. The resulting models are interpreted and evaluated by domain experts, but some model evaluation criterion is needed also for the model construction process. The optimal choice would be to use the same criterion as the human experts, but this is usually impossible as the experts are not capable of expressing their evaluation criteria formally. On the other hand, it seems reasonable to assume that any model possessing good predictive capability also captures some structure of the reality. For this reason, in predictive data mining the search for good models is guided by the expected predictive error of the models. In this paper we describe the Bayesian approach to predictive data mining in the finite mixture modeling framework. The finite mixture model family is a natural choice for domains where the data exhibits a clustering structure. In many real-world domains this seems to be the case, as is demonstrated by our experimental results on a set of public domain databases.

Data mining aims at extracting useful information from databases by discovering previously unknown regularities from data (Fayyad et al. 1996). In the most general context, finding such interesting regularities is a process (often called knowledge discovery in databases) which includes the interpretation of the extracted patterns based on the domain knowledge available. Typically the pattern extraction phase is performed by a structure searching program, and the interpretation phase by a human expert. The various proposed ap ...
Comparing Bayesian Model Class Selection Criteria by Discrete Finite Mixtures
 Information, Statistics and Induction in Science, pages 364–374, Proceedings of the ISIS'96 Conference
, 1996
Abstract

Cited by 10 (5 self)
We investigate the problem of computing the posterior probability of a model class, given a data sample and a prior distribution for possible parameter settings. By a model class we mean a group of models which all share the same parametric form. In general this posterior may be very hard to compute for high-dimensional parameter spaces, which is usually the case with real-world applications. In the literature several methods for computing the posterior approximately have been proposed, but the quality of the approximations may depend heavily on the size of the available data sample. In this work we are interested in testing how well the approximate methods perform in real-world problem domains. In order to conduct such a study, we have chosen the model family of finite mixture distributions. With certain assumptions, we are able to derive the model class posterior analytically for this model family. We report a series of model class selection experiments on real-world data sets, w...
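The kind of analytic posterior computation this abstract refers to can be illustrated in the simplest complete-data case: a single multinomial with a symmetric Dirichlet prior, whose marginal likelihood has a closed form. A sketch under that assumption (the function name is hypothetical, and this is not the paper's mixture derivation, which additionally handles latent cluster assignments):

```python
from math import lgamma

def log_marginal_multinomial(counts, alpha):
    """log P(data | model class) for multinomial counts n_1..n_K under a
    symmetric Dirichlet(alpha, ..., alpha) prior, integrated analytically:

        Gamma(K*alpha) / Gamma(N + K*alpha) * prod_i Gamma(n_i + alpha) / Gamma(alpha)

    computed in log space via lgamma for numerical stability."""
    K = len(counts)
    N = sum(counts)
    out = lgamma(K * alpha) - lgamma(N + K * alpha)
    for n in counts:
        out += lgamma(n + alpha) - lgamma(alpha)
    return out
```

Because the integral is exact, scores like this can serve as a ground truth against which approximate model class posteriors are compared.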
Bayes Optimal Instance-Based Learning
 Machine Learning: ECML-98, Proceedings of the 10th European Conference, Volume 1398 of Lecture ...
, 1998
Abstract

Cited by 9 (2 self)
In this paper we present a probabilistic formalization of the instance-based learning approach. In our Bayesian framework, moving from the construction of an explicit hypothesis to a data-driven instance-based learning approach is equivalent to averaging over all the (possibly infinitely many) individual models. The general Bayesian instance-based learning framework described in this paper can be applied with any set of assumptions defining a parametric model family, and to any discrete prediction task where the number of simultaneously predicted attributes is small, which includes for example all classification tasks prevalent in the machine learning literature. To illustrate the use of the suggested general framework in practice, we show how the approach can be implemented in the special case with the strong independence assumptions underlying the so-called Naive Bayes classifier. The resulting Bayesian instance-based classifier is validated empirically with public domain data sets...
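For the Naive Bayes special case mentioned above, averaging over the parameters under Dirichlet priors can be done analytically and reduces to prediction with smoothed counts. A minimal sketch (the function name and the symmetric prior are this note's assumptions, not the paper's exact formulation):

```python
import numpy as np

def naive_bayes_predict(X, y, x_new, n_vals, n_classes, alpha=1.0):
    """Predictive class distribution for x_new under Naive Bayes with
    symmetric Dirichlet(alpha) priors on all parameters; the parameter
    average has a closed form equivalent to alpha-smoothed counts."""
    N, D = X.shape
    log_p = np.zeros(n_classes)
    for c in range(n_classes):
        Xc = X[y == c]
        # class prior, averaged over its Dirichlet posterior
        log_p[c] = np.log((len(Xc) + alpha) / (N + n_classes * alpha))
        # per-attribute conditionals, likewise averaged
        for d in range(D):
            count = np.sum(Xc[:, d] == x_new[d])
            log_p[c] += np.log((count + alpha) / (len(Xc) + n_vals * alpha))
    p = np.exp(log_p - log_p.max())  # normalize in a stable way
    return p / p.sum()
```

The classifier is "instance-based" in the sense that prediction works directly from the stored training examples; no explicit hypothesis is ever selected.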
On the Accuracy of Stochastic Complexity Approximations
 In A. Gammerman (Ed.), Causal ...
, 1997
Abstract

Cited by 7 (3 self)
Stochastic complexity of a data set is defined as the shortest possible code length for the data obtainable by using some fixed set of models. This measure is of great theoretical and practical importance as a tool for tasks such as determining model complexity, or performing predictive inference. Unfortunately, for cases where the data has missing information, computing the stochastic complexity requires marginalizing (integrating) over the missing data, which results, even in the discrete data case, in computing a sum with an exponential number of terms. Therefore in most cases the stochastic complexity measure has to be approximated. In this paper we will investigate empirically the performance of some of the most common stochastic complexity approximations in an attempt to understand their small sample behavior in the incomplete data framework. In earlier empirical evaluations the problem of not knowing the actual stochastic complexity for incomplete data was circumvented either by us...
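One of the most widely used approximations of the kind evaluated here is the BIC-style two-part code length. A minimal sketch, assuming the usual form (negative log-likelihood at the fitted parameters plus a (k/2) log N complexity penalty); the function name is hypothetical:

```python
from math import log

def bic_code_length(neg_log_likelihood, n_params, n_samples):
    """BIC-style approximation to the stochastic complexity of a data set:
    -log P(data | ML parameters) + (k / 2) * log N, in nats."""
    return neg_log_likelihood + 0.5 * n_params * log(n_samples)
```

Its appeal is that it needs only a single maximum-likelihood fit, which is why its small-sample accuracy, the question studied empirically in this paper, matters in practice.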
Bayesian Modeling in an Adaptive On-Line Questionnaire for Education and Educational Research
 PEG 2001: Intelligent Computer and Communications Technology - Learning in Online Communities, Proceedings of the Tenth International PEG Conference, 2001
, 2001
Abstract

Cited by 7 (2 self)
Bayesian modeling can be used for providing adaptation in an on-line questionnaire. In our research, adaptation means selecting the questions presented to the user in such a way that the total amount of answers required for profiling the user is minimized. In the article, we present the motivation to use Bayesian modeling as a basis for the adaptation and introduce our adaptive on-line questionnaire EDUFORM which employs these modeling principles. A preliminary empirical study of EDUFORM showed time savings of 3 to 17 minutes (35 to 64 percent) and 9 to 32 fewer propositions (36 to 80 percent) to answer per questionnaire. Since EDUFORM is an open system in the sense that the content is not fixed, we discuss a range of possible uses of EDUFORM, including learner self-evaluation and testing by quizzes to provide assessment information for teachers.
A Bayesian Approach to Discretization
 Proceedings of the European Symposium on Intelligent Techniques
, 1997
Abstract

Cited by 6 (0 self)
The performance of many machine learning algorithms can be substantially improved with a proper discretization scheme. In this paper we describe a theoretically rigorous approach to discretization of continuous attribute values, based on a Bayesian clustering framework. The method produces a probabilistic scoring metric for different discretizations, and it can be combined with various types of learning algorithms working on discrete data. The approach is validated by demonstrating empirically the performance improvement of the Naive Bayes classifier when Bayesian discretization is used instead of the standard equal frequency interval discretization.

1 INTRODUCTION

Many algorithms developed in the machine learning and uncertain reasoning community focus on learning in nominal feature bases. On the other hand, many real-world tasks involve continuous attribute domains. Consequently, in order to be able to use such algorithms, a discretization process is needed. Continuous variable d...
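As a rough illustration (not the paper's actual metric) of how a probabilistic score can rank discretizations, one can compare candidate cut points by the Dirichlet-multinomial log marginal likelihood of the class counts inside each induced bin; higher is better, and cuts that keep classes pure score above cuts that mix them. All names here are hypothetical:

```python
from math import lgamma
import bisect

def discretize(values, cuts):
    """Map each continuous value to a bin index given sorted cut points."""
    return [bisect.bisect_right(cuts, v) for v in values]

def log_score(values, labels, cuts, n_classes, alpha=1.0):
    """Toy Bayesian score for a discretization: sum over bins of the
    Dirichlet-multinomial log marginal likelihood of the class counts
    falling into that bin."""
    bins = discretize(values, cuts)
    n_bins = len(cuts) + 1
    score = 0.0
    for b in range(n_bins):
        counts = [0] * n_classes
        for x, y in zip(bins, labels):
            if x == b:
                counts[y] += 1
        N = sum(counts)
        score += lgamma(n_classes * alpha) - lgamma(N + n_classes * alpha)
        for c in counts:
            score += lgamma(c + alpha) - lgamma(alpha)
    return score
```

Because the score is a proper marginal likelihood, it penalizes both overly coarse and needlessly fine discretizations without a separate complexity term.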