Results 1–9 of 9
On discriminative Bayesian network classifiers and logistic regression
 Machine Learning
Cited by 24 (1 self)
Abstract. Discriminative learning of the parameters in the naive Bayes model is known to be equivalent to a logistic regression problem. Here we show that the same fact holds for much more general Bayesian network models, as long as the corresponding network structure satisfies a certain graph-theoretic property. The property holds for naive Bayes but also for more complex structures such as tree-augmented naive Bayes (TAN) as well as for mixed diagnostic-discriminative structures. Our results imply that for networks satisfying our property, the conditional likelihood cannot have local maxima, so that the global maximum can be found by simple local optimization methods. We also show that if this property does not hold, then in general the conditional likelihood can have local, non-global maxima. We illustrate our theoretical results by empirical experiments with local optimization in a conditional naive Bayes model. Furthermore, we provide a heuristic strategy for pruning the number of parameters and relevant features in such models. For many data sets, we obtain good results with heavily pruned submodels containing many fewer parameters than the original naive Bayes model.
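The concavity result above can be sketched in code (a hypothetical illustration, not code from the paper): because the conditional log-likelihood of logistic regression is concave in the weights, plain gradient ascent cannot get stuck in a local, non-global maximum. The toy data, step size, and iteration count below are arbitrary choices.

```python
import math
import random

random.seed(0)
TRUE_W = (2.0, -1.0)  # illustrative "true" weights used to generate labels

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy data drawn from the logistic model itself.
data = []
for _ in range(200):
    x = (random.gauss(0, 1), random.gauss(0, 1))
    p = sigmoid(TRUE_W[0] * x[0] + TRUE_W[1] * x[1])
    data.append((x, 1 if random.random() < p else 0))

def cond_log_lik(w):
    # Conditional log-likelihood sum_i [ y_i z_i - log(1 + exp(z_i)) ];
    # this objective is concave in w.
    total = 0.0
    for (x1, x2), y in data:
        z = w[0] * x1 + w[1] * x2
        total += y * z - math.log1p(math.exp(z))
    return total

# Simple local optimization: gradient ascent with a small fixed step.
w = [0.0, 0.0]
for _ in range(300):
    g = [0.0, 0.0]
    for (x1, x2), y in data:
        r = y - sigmoid(w[0] * x1 + w[1] * x2)  # residual = gradient weight
        g[0] += r * x1
        g[1] += r * x2
    w[0] += 0.005 * g[0]
    w[1] += 0.005 * g[1]
```

With a concave surface, any such local method that converges has found the global maximum; the recovered weights share the signs of the generating weights.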
On discriminative joint density modeling
 In 16th European Conference on Machine Learning (ECML)
, 2005
Cited by 9 (3 self)
Abstract. We study discriminative joint density models, that is, generative models for the joint density p(c, x) learned by maximizing a discriminative cost function, the conditional likelihood. We use the framework to derive generative models for generalized linear models, including logistic regression, linear discriminant analysis, and discriminative mixture of unigrams. The benefits of deriving the discriminative models from joint density models are that it is easy to extend the models and interpret the results, and missing data can be treated using justified standard methods.
Selection of generative models in classification
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2006
Cited by 6 (1 self)
Abstract—This paper is concerned with the selection of a generative model for supervised classification. Classical criteria for model selection assess the fit of a model rather than its ability to produce a low classification error rate. A new criterion, the Bayesian Entropy Criterion (BEC), is proposed. This criterion takes into account the decisional purpose of a model by minimizing the integrated classification entropy. It provides an interesting alternative to the cross-validated error rate, which is computationally expensive. The asymptotic behavior of the BEC criterion is presented. Numerical experiments on both simulated and real data sets show that BEC performs better than the BIC criterion at selecting a model minimizing the classification error rate, and provides performance analogous to the cross-validated error rate. Index Terms—Generative classification, integrated likelihood, integrated conditional likelihood, classification entropy, cross-validated error rate, AIC and BIC criteria.
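The quantity a BEC-style criterion targets can be sketched as follows (an illustrative toy, not the authors' implementation; the model, equal class priors, and data below are all assumptions): for a two-class Gaussian generative classifier fit by joint maximum likelihood, the conditional term log p(c | x) = log p(c, x) − log p(x) scores classification fit, whereas classical criteria such as BIC score the joint term log p(c, x).

```python
import math
import random

random.seed(1)

# Toy generative model: two 1-D Gaussian classes, equal priors p(c) = 1/2,
# shared variance, fit by joint maximum likelihood.
data = [(random.gauss(-1.0, 1.0), 0) for _ in range(100)] + \
       [(random.gauss(+1.0, 1.0), 1) for _ in range(100)]

mu = [sum(x for x, c in data if c == k) / 100 for k in (0, 1)]
var = sum((x - mu[c]) ** 2 for x, c in data) / len(data)

def log_joint(x, c):
    # log p(c, x) = log p(c) + log p(x | c), with Gaussian class densities.
    return math.log(0.5) - 0.5 * math.log(2 * math.pi * var) \
        - (x - mu[c]) ** 2 / (2 * var)

def log_conditional(x, c):
    # log p(c | x) = log p(c, x) - log p(x), via a stable log-sum-exp.
    joint = [log_joint(x, k) for k in (0, 1)]
    m = max(joint)
    log_px = m + math.log(sum(math.exp(j - m) for j in joint))
    return joint[c] - log_px

joint_fit = sum(log_joint(x, c) for x, c in data)        # what BIC penalizes
cond_fit = sum(log_conditional(x, c) for x, c in data)   # BEC-style term
```

The conditional sum isolates how well the fitted joint model discriminates the classes, which is the decisional purpose the abstract refers to.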
Supervised Learning of Bayesian Network Parameters Made Easy
, 2002
Cited by 5 (2 self)
Bayesian network models are widely used for supervised prediction tasks such as classification. Usually the parameters of such models are determined using 'unsupervised' methods such as maximization of the joint likelihood. In many cases, the reason is that it is not clear how to find the parameters maximizing the supervised (conditional) likelihood. We show how the supervised learning problem can be solved efficiently for a large class of Bayesian network models, including the Naive Bayes (NB) and tree-augmented NB (TAN) classifiers. We do this by showing that under a certain general condition on the network structure, the supervised learning problem is exactly equivalent to logistic regression. Hitherto this was known only for Naive Bayes models. Since logistic regression models have a concave log-likelihood surface, the global maximum can be easily found by local optimization methods.
Supervised Naive Bayes Parameters
, 2002
Cited by 4 (2 self)
In this paper we show how this supervised learning problem can be solved efficiently. We introduce an alternative parametrization in which the supervised likelihood becomes concave. From this result it follows that there can be at most one maximum, which is easily found by local optimization methods. We present test results showing that this is feasible and highly beneficial.
A Bayesian Approach to Learning in Fault Isolation
"... Fault isolation is the art of localizing faults in a process, given observations from it. To do this, a model describing the relation between faults and observations is needed. In this paper we focus on learning such models both from training data and from prior knowledge. There are several challeng ..."
Fault isolation is the art of localizing faults in a process, given observations from it. To do this, a model describing the relation between faults and observations is needed. In this paper we focus on learning such models both from training data and from prior knowledge. There are several challenges in learning fault isolators. The number of data, as well as the available computing resources, are often limited, and there may be previously unobserved fault patterns. To meet these challenges we take a Bayesian approach. We compare five different methods for learning in fault isolation, and evaluate their performance on a real fault isolation problem: the diagnosis of an automotive engine.
Cross-Analysis of Gulf of Bothnia Wild Salmon Rivers Using Bayesian Networks
, 2002
"... We present a methodology allowing the transfer of knowledge from a wild salmon river to another via a predictive model for the chosen population status indicator. From the management point of view, the production of wild smolts is the most important of such indicators. However, in our realworld d ..."
We present a methodology allowing the transfer of knowledge from one wild salmon river to another via a predictive model for the chosen population status indicator. From the management point of view, the production of wild smolts is the most important of such indicators. However, in our real-world data from Finnish and Swedish Gulf of Bothnia rivers, the number of wild smolts is available for only two of the rivers, making direct empirical learning and validation of models impossible for the other rivers; the suggested methodology, however, can be used to transfer knowledge from the two rivers to the others. To validate the suggested approach, we also apply the methodology to the prediction of parr density, in which case the results can be validated, and check by strict empirical procedures our success in the transfer of knowledge. Our framework is probabilistic and our approach Bayesian, allowing us to handle uncertainty in a consistent and well-defined fashion. Our model family is Bayesian networks, a class of models with a simple graphical representation allowing visualization of the obtained knowledge, and also the state-of-the-art classifier in many domains. Our emphasis is on empirical modeling: our aim is to see what can be learned from the existing real-world data. With the needs of fisheries management in mind, we highlight the role of the loss function in modeling, evaluating our models also in a setting where it is a greater error to over- than to underestimate the size of a population.
Predicting the Wild Salmon Production Using Bayesian Networks
, 2002
"... From the management point of view, the production of wild smolts is the most important indicator of the status of a river's salmon population. We present a methodology allowing the prediction of the number of wild smolts in a river in a consistent and welldefined fashion. Our framework is pr ..."
From the management point of view, the production of wild smolts is the most important indicator of the status of a river's salmon population. We present a methodology allowing the prediction of the number of wild smolts in a river in a consistent and well-defined fashion. Our framework is probabilistic and our approach Bayesian. Our models are Bayesian networks, which have a simple graphical representation allowing visualization of the obtained knowledge. Being the state-of-the-art classifier in many domains, they also possess predictive power. We emphasize empirical modeling, studying what can be learned from the existing real-world data for two Gulf of Bothnia rivers, Simo and Tornio (the Finnish side). To ensure that our models generalize well, we employ strict validation procedures, where care is taken to inhibit leakage of information from the validation set to the training set. Furthermore, with the needs of fisheries management in mind, we highlight the role of the loss function in modeling, evaluating our models also in a setting where it is a greater error to over- than to underestimate the size of a population.