Results 1–10 of 93
Learning Bayesian networks: The combination of knowledge and statistical data
Machine Learning, 1995
Cited by 1158 (35 self)
We describe scoring metrics for learning Bayesian networks from a combination of user knowledge and statistical data. We identify two important properties of metrics, which we call event equivalence and parameter modularity. These properties have been mostly ignored, but when combined, greatly simplify the encoding of a user’s prior knowledge. In particular, a user can express his knowledge—for the most part—as a single prior Bayesian network for the domain.
Learning when Training Data are Costly: The Effect of Class Distribution on Tree Induction
2002
Cited by 173 (9 self)
For large, real-world inductive learning problems, the number of training examples often must be limited due to the costs associated with procuring, preparing, and storing the data and/or the computational costs associated with learning from the data. One question of practical importance is: if n training examples are going to be selected, in what proportion should the classes be represented? In this article we analyze the relationship between the marginal class distribution of training data and the performance of classification trees induced from these data, when the size of the training set is fixed. We study twenty-six data sets and, for each, determine the best class distribution for learning. Our results show that, for a fixed number of training examples, it is often possible to obtain improved classifier performance by training with a class distribution other than the naturally occurring class distribution. For example, we show that to build a classifier robust to different misclassification costs, a balanced class distribution generally performs quite well. We also describe and evaluate a budget-sensitive progressive-sampling algorithm that selects training examples such that the resulting training set has a good (near-optimal) class distribution for learning.
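To make the setup the abstract describes concrete — a fixed budget of n training examples drawn in a chosen class proportion — here is a minimal, hypothetical sketch. The function name and the 50/50 split are illustrative; this is not the paper's budget-sensitive progressive-sampling algorithm:

```python
import random

def sample_with_class_ratio(positives, negatives, n, pos_fraction, seed=0):
    """Draw a fixed-budget training set of n examples with a chosen
    positive-class fraction. Illustrative sketch only; not the paper's
    progressive-sampling algorithm."""
    rng = random.Random(seed)
    n_pos = min(int(round(n * pos_fraction)), len(positives))
    n_neg = min(n - n_pos, len(negatives))
    sample = rng.sample(positives, n_pos) + rng.sample(negatives, n_neg)
    rng.shuffle(sample)
    return sample

# Hypothetical pools: 500 positive and 1500 negative example ids,
# sampled into a balanced (50/50) training set of 100 examples.
train = sample_with_class_ratio(list(range(500)), list(range(500, 2000)), 100, 0.5)
```

Varying `pos_fraction` while holding `n` fixed is exactly the experimental knob the study turns.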
The Effect of Class Distribution on Classifier Learning: An Empirical Study
2001
Cited by 107 (2 self)
In this article we analyze the effect of class distribution on classifier learning. We begin by describing the different ways in which class distribution affects learning and how it affects the evaluation of learned classifiers. We then present the results of two comprehensive experimental studies. The first study compares the performance of classifiers generated from unbalanced data sets with the performance of classifiers generated from balanced versions of the same data sets. This comparison allows us to isolate and quantify the effect that the training set's class distribution has on learning and contrast the performance of the classifiers on the minority and majority classes. The second study assesses what distribution is "best" for training, with respect to two performance measures: classification accuracy and the area under the ROC curve (AUC). A tacit assumption behind much research on classifier induction is that the class distribution of the training data should match the "natural" distribution of the data. This study shows that the naturally occurring class distribution often is not best for learning, and often substantially better performance can be obtained by using a different class distribution. Understanding how classifier performance is affected by class distribution can help practitioners to choose training data; in real-world situations the number of training examples often must be limited due to computational costs or the costs associated with procuring and preparing the data.
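Since the second study evaluates class distributions by AUC as well as accuracy, a compact rank-based computation of AUC (the Mann–Whitney form: the probability that a random positive example outscores a random negative one) may help fix ideas. The scores below are illustrative:

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via the Mann-Whitney statistic:
    the fraction of (positive, negative) pairs in which the positive
    example receives the higher score, counting ties as half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

result = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

Unlike accuracy, this quantity is insensitive to where a single decision threshold falls, which is why the study treats the two measures separately.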
Inductive and Bayesian learning in medical diagnosis
Applied Artificial Intelligence, 1993
Cited by 90 (13 self)
Although successful in medical diagnostic problems, inductive learning systems were not widely accepted in medical practice. In this paper two different approaches to machine learning in medical applications are compared: the system for inductive learning of decision trees, Assistant, and the naive Bayesian classifier. Both methodologies were tested on four medical diagnostic problems: localization of primary tumor, prognostics of recurrence of breast cancer, diagnosis of thyroid diseases, and rheumatology. The accuracy of automatically acquired diagnostic knowledge from stored data records is compared, and the interpretation of the knowledge and the explanation ability of the classification process of each system are discussed. Surprisingly, the naive Bayesian classifier is superior to Assistant in classification accuracy and explanation ability, while the interpretation of the acquired knowledge seems to be equally valuable. In addition, two extensions to the naive Bayesian classifier are briefly described: dealing with continuous attributes, and discovering dependencies among attributes.
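For readers unfamiliar with the second method in the comparison, a minimal naive Bayesian classifier over discrete attributes looks like the sketch below. It uses Laplace (add-one) smoothing; the toy data and function names are illustrative, not the authors' implementation or their medical data sets:

```python
from collections import Counter, defaultdict
import math

def train_nb(rows, labels):
    """Fit a naive Bayes classifier over discrete attributes with
    Laplace (add-one) smoothing; returns a predict function.
    A generic sketch of the method, not the authors' code."""
    classes = Counter(labels)
    counts = defaultdict(Counter)  # (attr_index, class) -> value counts
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            counts[(i, y)][v] += 1
    values = [set(r[i] for r in rows) for i in range(len(rows[0]))]

    def predict(row):
        best, best_lp = None, -math.inf
        for c, nc in classes.items():
            # log prior plus smoothed log conditional for each attribute
            lp = math.log(nc / len(labels))
            for i, v in enumerate(row):
                lp += math.log((counts[(i, c)][v] + 1) / (nc + len(values[i])))
            if lp > best_lp:
                best, best_lp = c, lp
        return best

    return predict

# Toy training set: two discrete attributes, two classes.
predict = train_nb([("a", "x"), ("a", "y"), ("b", "y")], ["pos", "pos", "neg"])
```

The per-attribute conditional probabilities double as an explanation of each prediction, which is the explanation-ability property the abstract highlights.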
Model-based clustering and visualization of navigation patterns on a web site
Data Mining and Knowledge Discovery, 2003
Cited by 74 (0 self)
We present a new methodology for exploring and analyzing navigation patterns on a web site. The patterns that can be analyzed consist of sequences of URL categories traversed by users. In our approach, we first partition site users into clusters such that users with similar navigation paths through the site are placed into the same cluster. Then, for each cluster, we display these paths for users within that cluster. The clustering approach we employ is model-based (as opposed to distance-based) and partitions users according to the order in which they request web pages. In particular, we cluster users by learning a mixture of first-order Markov models using the Expectation-Maximization algorithm. The runtime of our algorithm scales linearly with the number of clusters and with the size of the data, and our implementation easily handles hundreds of thousands of user sessions in memory. In the paper, we describe the details of our method and a visualization tool based on it called WebCANVAS. We illustrate the use of our approach on user-traffic data from msnbc.com. Keywords: model-based clustering, sequence clustering, data visualization, Internet, web
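The core computation behind such a mixture of first-order Markov models is the per-component likelihood of a session. The sketch below scores a category sequence under each component and assigns it to the best one (a hardened version of the E-step; the paper fits the mixture with full soft EM). The two toy components and category names are invented for illustration:

```python
import math

def seq_loglik(seq, init, trans):
    """Log-likelihood of a category sequence under one first-order
    Markov component: P(first state) times the chain of transitions."""
    lp = math.log(init[seq[0]])
    for a, b in zip(seq, seq[1:]):
        lp += math.log(trans[a][b])
    return lp

def assign(seq, components, weights):
    """Hard cluster assignment: pick the component with the highest
    mixture-weighted likelihood. A simplified sketch of the E-step."""
    scores = [math.log(w) + seq_loglik(seq, init, trans)
              for w, (init, trans) in zip(weights, components)]
    return max(range(len(scores)), key=scores.__getitem__)

# Two toy components over URL categories "news" and "sport".
news_like = ({"news": 0.9, "sport": 0.1},
             {"news": {"news": 0.8, "sport": 0.2},
              "sport": {"news": 0.5, "sport": 0.5}})
sport_like = ({"news": 0.1, "sport": 0.9},
              {"news": {"news": 0.3, "sport": 0.7},
               "sport": {"news": 0.2, "sport": 0.8}})
components = [news_like, sport_like]
```

Because each component only needs an initial-state vector and a transition matrix, scoring a session is linear in its length, consistent with the linear scaling the abstract claims.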
Probabilistic models for relational data
2004
Cited by 65 (0 self)
We introduce a graphical language for relational data called the probabilistic entity-relationship (PER) model. The model is an extension of the entity-relationship model, a common model for the abstract representation of database structure. We concentrate on the directed version of this model—the directed acyclic probabilistic entity-relationship (DAPER) model. The DAPER model is closely related to the plate model and the probabilistic relational model (PRM), existing models for relational data. The DAPER model is more expressive than either existing model, and also helps to demonstrate their similarity. In addition to describing the new language, we discuss important facets of modeling relational data, including the use of restricted relationships, self relationships, and probabilistic relationships. Many examples are provided.
Optimization by learning and simulation of Bayesian and Gaussian networks
1999
Cited by 56 (7 self)
Estimation of Distribution Algorithms (EDAs) constitute an example of stochastic heuristics based on populations of individuals, each of which encodes a possible solution to the optimization problem. These populations of individuals evolve in successive generations as the search progresses, organized in the same way as most evolutionary computation heuristics. Unlike most evolutionary computation paradigms, which consider the crossover and mutation operators essential tools for generating new populations, EDAs replace those operators with the estimation and simulation of the joint probability distribution of the selected individuals. In this work, after reviewing the different approaches based on EDAs for problems of combinatorial optimization as well as for problems of optimization in continuous domains, we propose new approaches based on the theory of probabilistic graphical models to solve problems in both domains. More precisely, we propose to adapt algorit...
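The estimate-and-sample loop that replaces crossover and mutation is easiest to see in the simplest univariate member of the EDA family (UMDA), shown below on the OneMax problem. This treats every bit as independent; the Bayesian- and Gaussian-network EDAs the paper proposes instead model dependencies between variables. All parameter values here are illustrative:

```python
import random

def umda_onemax(n_bits=20, pop=60, select=20, gens=30, seed=1):
    """Univariate EDA (UMDA) sketch on OneMax: each generation,
    estimate the per-bit marginal distribution of the selected
    individuals and sample the next population from it, instead of
    applying crossover/mutation operators."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits          # initial uniform model
    best = [0] * n_bits
    for _ in range(gens):
        population = [[1 if rng.random() < p else 0 for p in probs]
                      for _ in range(pop)]
        population.sort(key=sum, reverse=True)   # OneMax fitness = sum of bits
        elite = population[:select]
        if sum(elite[0]) > sum(best):
            best = elite[0]
        # re-estimate marginals, lightly bounded away from 0 and 1
        probs = [min(0.95, max(0.05, sum(ind[i] for ind in elite) / select))
                 for i in range(n_bits)]
    return best

best = umda_onemax()
```

Replacing the independent per-bit marginals with a learned Bayesian network over the bits yields the graphical-model EDAs the abstract refers to.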
Bayesian network and nonparametric heteroscedastic regression for nonlinear modeling of genetic network
Proc. 1st IEEE Computer Society Bioinformatics Conference, 2002
Cited by 51 (19 self)
We propose a new statistical method for constructing a genetic network from microarray gene expression data by using a Bayesian network. An essential point of Bayesian network construction is the estimation of the conditional distribution of each random variable. We consider fitting nonparametric regression models with heterogeneous error variances to the microarray gene expression data to capture the nonlinear structures between genes. A problem still remains to be solved in selecting an optimal graph, which gives the best representation of the system among genes. We theoretically derive a new graph selection criterion from a Bayes approach in general situations. The proposed method includes previous methods based on Bayesian networks. We demonstrate the effectiveness of the proposed method through the analysis of Saccharomyces cerevisiae gene expression data newly obtained by disrupting 100 genes.
Bayesian perspectives for epidemiological research: Foundations and basic methods.
International Journal of Epidemiology, 2006
Cited by 50 (5 self)
I present some extensions of Bayesian methods to situations in which biases are of concern. First, a basic misclassification problem is illustrated using data from a study of sudden infant death syndrome. Bayesian analyses are then given. These analyses can be conducted directly, or by converting actual-data records to incomplete records and prior distributions to complete-data records, then applying missing-data techniques to the augmented data set. The analyses can easily incorporate any complete ('validation' or second-stage) data that might be available, as well as adjustments for confounding and selection bias. The approach illustrates how conventional analyses depend on implicit certainty that bias parameters are null, and how these implausible assumptions can be replaced by plausible priors for bias parameters.
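The basic misclassification problem the abstract starts from can be sketched with the standard back-correction of an observed exposure count given assumed sensitivity and specificity. The numbers below are purely illustrative; the paper's point is precisely that Se and Sp should be treated as bias parameters with priors rather than as fixed known constants:

```python
def correct_counts(observed_exposed, total, sensitivity, specificity):
    """Back-correct an observed exposed count for nondifferential
    misclassification. Solves
        E[observed] = Se * true + (1 - Sp) * (total - true)
    for the true count."""
    return ((observed_exposed - (1 - specificity) * total)
            / (sensitivity + specificity - 1))

# Illustrative numbers: 300 observed exposed out of 1000, Se=0.9, Sp=0.95.
true_exposed = correct_counts(300, 1000, 0.9, 0.95)
```

A conventional analysis implicitly sets Se = Sp = 1 here, which is the "implicit certainty that bias parameters are null" the abstract criticizes; a Bayesian analysis would instead average this correction over priors on Se and Sp.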