Results 1  10
of
575
Bayesian Data Analysis
, 1995
"... I actually own a copy of Harold Jeffreys’s Theory of Probability but have only read small bits of it, most recently over a decade ago to confirm that, indeed, Jeffreys was not too proud to use a classical chisquared pvalue when he wanted to check the misfit of a model to data (Gelman, Meng and Ste ..."
Abstract

Cited by 2194 (63 self)
 Add to MetaCart
I actually own a copy of Harold Jeffreys’s Theory of Probability but have only read small bits of it, most recently over a decade ago to confirm that, indeed, Jeffreys was not too proud to use a classical chisquared pvalue when he wanted to check the misfit of a model to data (Gelman, Meng and Stern, 2006). I do, however, feel that it is important to understand where our probability models come from, and I welcome the opportunity to use the present article by Robert, Chopin and Rousseau as a platform for further discussion of foundational issues. 2 In this brief discussion I will argue the following: (1) in thinking about prior distributions, we should go beyond Jeffreys’s principles and move toward weakly informative priors; (2) it is natural for those of us who work in social and computational sciences to favor complex models, contra Jeffreys’s preference for simplicity; and (3) a key generalization of Jeffreys’s ideas is to explicitly include model checking in the process of data analysis.
ModelBased Clustering, Discriminant Analysis, and Density Estimation
 JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
, 2000
"... Cluster analysis is the automated search for groups of related observations in a data set. Most clustering done in practice is based largely on heuristic but intuitively reasonable procedures and most clustering methods available in commercial software are also of this type. However, there is little ..."
Abstract

Cited by 573 (29 self)
 Add to MetaCart
Cluster analysis is the automated search for groups of related observations in a data set. Most clustering done in practice is based largely on heuristic but intuitively reasonable procedures and most clustering methods available in commercial software are also of this type. However, there is little systematic guidance associated with these methods for solving important practical questions that arise in cluster analysis, such as \How many clusters are there?", "Which clustering method should be used?" and \How should outliers be handled?". We outline a general methodology for modelbased clustering that provides a principled statistical approach to these issues. We also show that this can be useful for other problems in multivariate analysis, such as discriminant analysis and multivariate density estimation. We give examples from medical diagnosis, mineeld detection, cluster recovery from noisy data, and spatial density estimation. Finally, we mention limitations of the methodology, a...
Analyzing Developmental Trajectories: A Semiparametric, GroupBased Approach
 Psychological Methods
, 1999
"... A developmental trajectory describes the course of a behavior over age or time. A groupbased method for identifying distinctive groups of individual trajectories within the population and for profiling the characteristics of group members is demonstrated. Such clusters might include groups of & ..."
Abstract

Cited by 232 (14 self)
 Add to MetaCart
A developmental trajectory describes the course of a behavior over age or time. A groupbased method for identifying distinctive groups of individual trajectories within the population and for profiling the characteristics of group members is demonstrated. Such clusters might include groups of &quot;increasers. &quot; &quot;decreasers,&quot; and &quot;no changers. &quot; Suitably defined probability distributions are used to handle 3 data types—count, binary, and psychometric scale data. Four capabilities are demonstrated: (a) the capability to identify rather than assume distinctive groups of trajectories, (b) the capability to estimate the proportion of the population following each such trajectory group, (c) the capability to relate group membership probability to individual characteristics and circumstances, and (d) the capability to use the group membership probabilities for various other purposes such as creating profiles of group members. Over the past decade, major advances have been made in methodology for analyzing individuallevel developmental trajectories. The two main branches of methodology are hierarchical modeling (Bryk &
On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach
 Data Mining and Knowledge Discovery
, 1997
"... Abstract. An important component of many data mining projects is finding a good classification algorithm, a process that requires very careful thought about experimental design. If not done very carefully, comparative studies of classification and other types of algorithms can easily result in stati ..."
Abstract

Cited by 224 (0 self)
 Add to MetaCart
(Show Context)
Abstract. An important component of many data mining projects is finding a good classification algorithm, a process that requires very careful thought about experimental design. If not done very carefully, comparative studies of classification and other types of algorithms can easily result in statistically invalid conclusions. This is especially true when one is using data mining techniques to analyze very large databases, which inevitably contain some statistically unlikely data. This paper describes several phenomena that can, if ignored, invalidate an experimental comparison. These phenomena and the conclusions that follow apply not only to classification, but to computational experiments in almost any aspect of data mining. The paper also discusses why comparative analysis is more important in evaluating some types of algorithms than for others, and provides some suggestions about how to avoid the pitfalls suffered by many experimental studies.
A Guide to the Literature on Learning Probabilistic Networks From Data
, 1996
"... This literature review discusses different methods under the general rubric of learning Bayesian networks from data, and includes some overlapping work on more general probabilistic networks. Connections are drawn between the statistical, neural network, and uncertainty communities, and between the ..."
Abstract

Cited by 204 (0 self)
 Add to MetaCart
This literature review discusses different methods under the general rubric of learning Bayesian networks from data, and includes some overlapping work on more general probabilistic networks. Connections are drawn between the statistical, neural network, and uncertainty communities, and between the different methodological communities, such as Bayesian, description length, and classical statistics. Basic concepts for learning and Bayesian networks are introduced and methods are then reviewed. Methods are discussed for learning parameters of a probabilistic network, for learning the structure, and for learning hidden variables. The presentation avoids formal definitions and theorems, as these are plentiful in the literature, and instead illustrates key concepts with simplified examples. Keywords Bayesian networks, graphical models, hidden variables, learning, learning structure, probabilistic networks, knowledge discovery. I. Introduction Probabilistic networks or probabilistic gra...
Probabilistic independence networks for hidden Markov probability models
, 1996
"... Graphical techniques for modeling the dependencies of random variables have been explored in a variety of different areas including statistics, statistical physics, artificial intelligence, speech recognition, image processing, and genetics. Formalisms for manipulating these models have been develop ..."
Abstract

Cited by 193 (13 self)
 Add to MetaCart
(Show Context)
Graphical techniques for modeling the dependencies of random variables have been explored in a variety of different areas including statistics, statistical physics, artificial intelligence, speech recognition, image processing, and genetics. Formalisms for manipulating these models have been developed relatively independently in these research communities. In this paper we explore hidden Markov models (HMMs) and related structures within the general framework of probabilistic independence networks (PINs). The paper contains a selfcontained review of the basic principles of PINs. It is shown that the wellknown forwardbackward (FB) and Viterbi algorithms for HMMs are special cases of more general inference algorithms for arbitrary PINs. Furthermore, the existence of inference and estimation algorithms for more general graphical models provides a set of analysis tools for HMM practitioners who wish to explore a richer class of HMM structures. Examples of relatively complex models to handle sensor fusion and coarticulation in speech recognition are introduced and treated within the graphical model framework to illustrate the advantages of the general approach.
Trajectories of boys’ physical aggression, opposition, and hyperactivity on the path to physically violent and nonviolent juvenile delinquency.
 Child Development,
, 1999
"... A semiparametric mixture model was used with a sample of 1,037 boys assessed repeatedly from 6 to 15 years of age to approximate a continuous distribution of developmental trajectories for three externalizing behaviors. Regression models were then used to determine which trajectories best predicte ..."
Abstract

Cited by 176 (22 self)
 Add to MetaCart
(Show Context)
A semiparametric mixture model was used with a sample of 1,037 boys assessed repeatedly from 6 to 15 years of age to approximate a continuous distribution of developmental trajectories for three externalizing behaviors. Regression models were then used to determine which trajectories best predicted physically violent and nonviolent juvenile delinquency up to 17 years of age. Four developmental trajectories were identified for the physical aggression, opposition, and hyperactivity externalizing behavior dimensions: a chronic problem trajectory, a high level neardesister trajectory, a moderate level desister trajectory, and a no problem trajectory. Boys who followed a given trajectory for one type of externalizing problem behavior did not necessarily follow the same trajectory for the two other types of behavior problem. The different developmental trajectories of problem behavior also led to different types of juvenile delinquency. A chronic oppositional trajectory, with the physical aggression and hyperactivity trajectories being held constant, led to covert delinquency (theft) only, while a chronic physical aggression trajectory, with the oppositional and hyperactivity trajectories being held constant, led to overt delinquency (physical violence) and to the most serious delinquent acts.
Comparing Dynamic Causal Models
 NEUROIMAGE
, 2004
"... This article describes the use of Bayes factors for comparing Dynamic Causal Models (DCMs). DCMs are used to make inferences about effective connectivity from functional Magnetic Resonance Imaging (fMRI) data. These inferences, however, are contingent upon assumptions about model structure, that is, ..."
Abstract

Cited by 114 (37 self)
 Add to MetaCart
(Show Context)
This article describes the use of Bayes factors for comparing Dynamic Causal Models (DCMs). DCMs are used to make inferences about effective connectivity from functional Magnetic Resonance Imaging (fMRI) data. These inferences, however, are contingent upon assumptions about model structure, that is, the connectivity pattern between the regions included in the model. Given the current lack of detailed knowledge on anatomical connectivity in the human brain, there are often considerable degrees of freedom when defining the connectional structure of DCMs. In addition, many plausible scientific hypotheses may exist about which connections are changed by experimental manipulation, and a formal procedure for directly comparing these competing hypotheses is highly desirable. In this article, we show how Bayes factors can be used to guide choices about model structure, both with regard to the intrinsic connectivity pattern and the contextual modulation of individual connections. The combined use of Bayes factors and DCM thus allows one to evaluate competing scientific theories about the architecture of largescale neural networks and the neuronal interactions that mediate perception and cognition.
Bayesian Networks for Data Mining
, 1997
"... A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest. When used in conjunction with statistical techniques, the graphical model has several advantages for data modeling. One, because the model encodes dependencies among all variables, it readil ..."
Abstract

Cited by 103 (0 self)
 Add to MetaCart
A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest. When used in conjunction with statistical techniques, the graphical model has several advantages for data modeling. One, because the model encodes dependencies among all variables, it readily handles situations where some data entries are missing. Two, a Bayesian network can be used to learn causal relationships, and hence can be used to gain understanding about a problem domain and to predict the consequences of intervention. Three, because the model has both a causal and probabilistic semantics, it is an ideal representation for combining prior knowledge (which often comes in causal form) and data. Four, Bayesian statistical methods in conjunction with Bayesian networks offer an efficient and principled approach for avoiding the overfitting of data. In this paper, we discuss methods for constructing Bayesian networks from prior knowledge and summarize Bayesian statistical methods for using data to improve these models. With regard to the latter task, we describe methods for learning both the parameters and structure of a Bayesian network, including techniques for learning with incomplete data. In addition, we relate Bayesiannetwork methods for learning to techniques for supervised and unsupervised learning. We illustrate the graphicalmodeling approach using a realworld case study.