Results 1–10 of 23
On predictive distributions and Bayesian networks
 Statistics and Computing
, 2000
Abstract

Cited by 39 (30 self)
... this paper we are interested in discrete prediction problems for a decision-theoretic setting, where the ...
Minimum Encoding Approaches for Predictive Modeling
 Proceedings of the 14th International Conference on Uncertainty in Artificial Intelligence (UAI'98)
, 1998
Abstract

Cited by 20 (13 self)
We analyze differences between two information-theoretically motivated approaches to statistical inference and model selection: the Minimum Description Length (MDL) principle, and the Minimum Message Length (MML) principle. Based on this analysis, we present two revised versions of MML: a pointwise estimator which gives the MML-optimal single parameter model, and a volume-wise estimator which gives the MML-optimal region in the parameter space. Our empirical results suggest that with small data sets, the MDL approach yields more accurate predictions than the MML estimators. The empirical results also demonstrate that the revised MML estimators introduced here perform better than the original MML estimator suggested by Wallace and Freeman.
1 INTRODUCTION
Two related but distinct approaches to statistical inference and model selection are the Minimum Description Length (MDL) principle (Rissanen, 1978, 1987, 1996), and the Minimum Message Length (MML) principle (Wallace & Boulton, 1968; W...
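Both principles score a model by the length of the message needed to transmit it together with the data. As a toy illustration of that "shortest message" idea (a simplified sketch, not the estimators compared in the paper), a two-part code length for a binary sequence can be taken as the data cost under the maximum-likelihood Bernoulli parameter plus roughly (1/2) log2 n bits for stating that parameter:

```python
import math

def two_part_code_length(data):
    """Two-part MDL-style code length, in bits, for a 0/1 sequence:
    data cost under the ML Bernoulli parameter plus (1/2) log2(n)
    bits for encoding that single parameter (illustrative form)."""
    n = len(data)
    k = sum(data)
    nll = 0.0
    for p, count in ((k / n, k), ((n - k) / n, n - k)):
        if count:                      # 0 * log(0) treated as 0
            nll -= count * math.log2(p)
    return nll + 0.5 * math.log2(n)
```

For the sequence [1, 0, 1, 0] this gives 4 bits of data cost plus 1 bit of parameter cost, i.e., 5 bits in total.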
Supervised model-based visualization of high-dimensional data
, 2000
Abstract

Cited by 19 (9 self)
When high-dimensional data vectors are visualized on a two- or three-dimensional display, the goal is that two vectors close to each other in the multidimensional space should also be close to each other in the low-dimensional space. Traditionally, closeness is defined in terms of some standard geometric distance measure, such as the Euclidean distance, based on a more or less straightforward comparison between the contents of the data vectors. However, such distances do not, in general, properly reflect the properties of complex problem domains, where changing one bit in a vector may completely change the relevance of the vector. What is more, in real-world situations the similarity of two vectors is not a universal property: even if two vectors can be regarded as similar from one point of view, from another point of view they may appear quite dissimilar. In order to capture these requirements for building a pragmatic and flexible similarity measure, we propose a data visualization scheme where the similarity of two vectors is determined indirectly by using a formal model of the problem domain; in our case, a Bayesian network model. In this scheme, two vectors are considered similar if they lead to similar predictions when given as input to a Bayesian network model. The scheme is supervised in the sense that different perspectives can be taken into account by using different predictive distributions, i.e., by changing what is to be predicted. In addition, the modeling framework can also be used for validating the rationality of the resulting visualization. This model-based visualization scheme has been implemented and tested on real-world domains with encouraging results.
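The prediction-based notion of closeness described above can be sketched minimally: treat any function mapping a vector to a class distribution as the "model", and measure dissimilarity as the L1 distance between the two predictive distributions. The toy model and the L1 metric below are illustrative assumptions, not the paper's Bayesian network model:

```python
def predictive_distance(model, x1, x2):
    """Dissimilarity of two vectors = L1 distance between the class
    distributions the model predicts for them (illustrative metric)."""
    p, q = model(x1), model(x2)
    return sum(abs(p[c] - q.get(c, 0.0)) for c in p)

# Hypothetical stand-in for a learned model: it predicts class "pos"
# from the first component only, so vectors differing elsewhere are
# geometrically far apart yet lead to identical predictions.
def toy_model(x):
    p_pos = 0.9 if x[0] > 0 else 0.1
    return {"pos": p_pos, "neg": 1.0 - p_pos}
```

Under this model, [1, 5] and [1, -5] have distance 0 despite being far apart in Euclidean terms, which is exactly the kind of domain-dependent closeness the scheme is after.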
BAYDA: Software for Bayesian Classification and Feature Selection
 Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD-98)
, 1998
Abstract

Cited by 14 (9 self)
BAYDA is a software package for flexible data analysis in predictive data mining tasks. The mathematical model underlying the program is based on a simple Bayesian network, the Naive Bayes classifier. It is well-known that the Naive Bayes classifier performs well in predictive data mining tasks, when compared to approaches using more complex models. However, the model makes strong independence assumptions that are frequently violated in practice. For this reason, the BAYDA software also provides a feature selection scheme which can be used for analyzing the problem domain, and for improving the prediction accuracy of the models constructed by BAYDA. The scheme is based on a novel Bayesian feature selection criterion introduced in this paper. The suggested criterion is inspired by the Cheeseman-Stutz approximation for computing the marginal likelihood of Bayesian networks with hidden variables. The empirical results with several widely used data sets demonstrate that the automated Bayesian...
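The model family in question can be written down in a few lines. The following generic discrete Naive Bayes classifier with Laplace smoothing is an illustrative sketch of that family, not BAYDA's exact estimator or its feature selection criterion:

```python
from collections import Counter, defaultdict
import math

def train_nb(X, y):
    """Fit a discrete Naive Bayes model: class frequencies plus
    per-(feature, class) value counts (generic sketch)."""
    class_counts = Counter(y)
    value_counts = defaultdict(Counter)   # (feature, class) -> value counts
    values = defaultdict(set)             # feature -> observed values
    for xs, c in zip(X, y):
        for i, v in enumerate(xs):
            value_counts[(i, c)][v] += 1
            values[i].add(v)
    return class_counts, value_counts, values

def predict_nb(model, x):
    """Return the class maximizing the (log) posterior under the
    independence assumption, with Laplace smoothing."""
    class_counts, value_counts, values = model
    n = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for c, nc in class_counts.items():
        lp = math.log(nc / n)                               # class prior
        for i, v in enumerate(x):
            cnt = value_counts[(i, c)][v]
            lp += math.log((cnt + 1) / (nc + len(values[i])))
        if lp > best_lp:
            best, best_lp = c, lp
    return best
```

Trained on four two-feature examples, the classifier recovers the class associated with each pattern; the independence assumptions the abstract warns about are visible in the per-feature product inside `predict_nb`.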
Bayes Optimal Instance-Based Learning
 Machine Learning: ECML-98, Proceedings of the 10th European Conference, Volume 1398 of Lecture ...
, 1998
Abstract

Cited by 9 (2 self)
In this paper we present a probabilistic formalization of the instance-based learning approach. In our Bayesian framework, moving from the construction of an explicit hypothesis to a data-driven instance-based learning approach is equivalent to averaging over all the (possibly infinitely many) individual models. The general Bayesian instance-based learning framework described in this paper can be applied with any set of assumptions defining a parametric model family, and to any discrete prediction task where the number of simultaneously predicted attributes is small, which includes for example all classification tasks prevalent in the machine learning literature. To illustrate the use of the suggested general framework in practice, we show how the approach can be implemented in the special case with the strong independence assumptions underlying the so-called Naive Bayes classifier. The resulting Bayesian instance-based classifier is validated empirically with public domain data sets...
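Averaging over all the (possibly infinitely many) individual models has a familiar one-parameter special case: for a single binary attribute under a uniform prior, integrating the Bernoulli likelihood over the whole parameter space yields Laplace's rule of succession. This standard result is shown here only as a minimal instance of the averaging idea, not as the paper's full framework:

```python
def posterior_predictive(data, value):
    """P(next observation == value | data) for a binary attribute,
    obtained by integrating over ALL Bernoulli models under a
    uniform prior: the rule of succession, (k + 1) / (n + 2)."""
    n = len(data)
    k = sum(1 for d in data if d == value)
    return (k + 1) / (n + 2)
```

Note that no single "best" parameter is ever selected: the prediction is a weighted average over every model in the family, which is the move from an explicit hypothesis to Bayes optimal prediction.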
On the Accuracy of Stochastic Complexity Approximations
 In A. Gammerman (Ed.), Causal ...
, 1997
Abstract

Cited by 7 (3 self)
Stochastic complexity of a data set is defined as the shortest possible code length for the data obtainable by using some fixed set of models. This measure is of great theoretical and practical importance as a tool for tasks such as determining model complexity, or performing predictive inference. Unfortunately, in cases where the data has missing information, computing the stochastic complexity requires marginalizing (integrating) over the missing data, which even in the discrete data case results in computing a sum with an exponential number of terms. Therefore in most cases the stochastic complexity measure has to be approximated. In this paper we investigate empirically the performance of some of the most common stochastic complexity approximations in an attempt to understand their small sample behavior in the incomplete data framework. In earlier empirical evaluations the problem of not knowing the actual stochastic complexity for incomplete data was circumvented either by us...
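The exponential-size sum referred to above can be written down directly for small discrete data sets. The sketch below brute-forces every completion of the missing (None) binary entries, with `likelihood` standing in for whatever scores a completed data set; the interface is hypothetical and serves only to show why the exact computation blows up:

```python
from itertools import product

def marginalize_missing(rows, likelihood):
    """Sum `likelihood` over all completions of the missing (None)
    binary entries in `rows` -- the exponential sum that makes exact
    stochastic complexity intractable with incomplete data."""
    missing = [(r, c) for r, row in enumerate(rows)
               for c, v in enumerate(row) if v is None]
    total = 0.0
    for fill in product([0, 1], repeat=len(missing)):
        completed = [list(row) for row in rows]
        for (r, c), v in zip(missing, fill):
            completed[r][c] = v
        total += likelihood(completed)
    return total
```

With m missing binary entries the loop runs 2^m times, which is precisely why the approximations evaluated in the paper are needed.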
A Bayesian Approach to Discretization
 Proceedings of the European Symposium on Intelligent Techniques
, 1997
Abstract

Cited by 6 (0 self)
The performance of many machine learning algorithms can be substantially improved with a proper discretization scheme. In this paper we describe a theoretically rigorous approach to discretization of continuous attribute values, based on a Bayesian clustering framework. The method produces a probabilistic scoring metric for different discretizations, and it can be combined with various types of learning algorithms working on discrete data. The approach is validated by demonstrating empirically the performance improvement of the Naive Bayes classifier when Bayesian discretization is used instead of the standard equal frequency interval discretization.
1 INTRODUCTION
Many algorithms developed in the machine learning and uncertain reasoning community focus on learning in nominal feature bases. On the other hand, many real-world tasks involve continuous attribute domains. Consequently, in order to be able to use such algorithms, a discretization process is needed. Continuous variable d...
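A generic Bayesian ingredient of such a probabilistic scoring metric is the multinomial-Dirichlet marginal likelihood of the bin counts a candidate discretization induces, under a uniform prior. The paper's clustering-based metric is more elaborate; the formula below is only an illustrative assumption:

```python
import math

def log_marginal(counts):
    """Log marginal likelihood of bin counts under a uniform
    Dirichlet(1, ..., 1) prior on the multinomial parameters:
    Gamma(k) / Gamma(n + k) * prod_i Gamma(c_i + 1)."""
    n, k = sum(counts), len(counts)
    lg = math.lgamma
    return lg(k) - lg(n + k) + sum(lg(c + 1) for c in counts)
```

Under this score, a discretization whose cut points separate four data points cleanly (counts [4, 0]) scores higher than one that splits the same points evenly ([2, 2]), which is the sense in which the metric prefers informative cut points.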
A Bayesian Approach for Retrieving Relevant Cases
 Artificial Intelligence Applications (Proceedings of the EXPERSYS-97 Conference)
, 1997
Abstract

Cited by 5 (1 self)
The problem of finding the set of most relevant cases from a given database, with respect to the decision making situation at hand, is frequently encountered in many real-world domains. In the case-based reasoning framework this task is commonly known as the case matching problem. Case matching is an important problem in several commercially significant application areas, such as industrial configuration and manufacturing problems. Earlier approaches to the case matching problem typically rely on some distance measure, e.g., the Euclidean distance, although there is no a priori guarantee that such measures really reflect the useful similarities and dissimilarities between the cases. In this paper we introduce a novel approach to the case matching problem based on Bayesian probability theory, and propose a Bayesian case matching measure for scoring the cases with respect to a given decision making situation. The Bayesian case matching score discussed is currently being applied in a real...
Using Bayesian Networks for Visualizing High-Dimensional Data
, 1999
Abstract

Cited by 4 (2 self)
A Bayesian (belief) network is a representation of a probability distribution over a set of random variables. One of the main advantages of this model family is that it offers a theoretically solid machine learning framework for constructing accurate domain models from sample data efficiently and reliably. As the parameters of a Bayesian network have a precise semantic interpretation, the learned models can be used for data mining purposes, i.e., for examining regularities found in the data. In addition to this type of direct examination of the model, we suggest that the learned Bayesian networks can also be used for indirect data mining purposes through a visualization scheme which can be used for producing 2D or 3D representations of high-dimensional problem domains. Our visualization scheme is based on the predictive distributions produced by the Bayesian network model, which means that the resulting visualizations can also be used as a postprocessing tool for visual inspection of ...