Convergence results for the EM Approach to Mixtures of Experts Architectures
- Neural Networks, 1995
Cited by 89 (6 self)
The Expectation-Maximization (EM) algorithm is an iterative approach to maximum likelihood parameter estimation. Jordan and Jacobs recently proposed an EM algorithm for the mixture of experts architecture of Jacobs, Jordan, Nowlan and Hinton (1991) and the hierarchical mixture of experts architecture of Jordan and Jacobs (1992). They showed empirically that the EM algorithm for these architectures yields significantly faster convergence than gradient ascent. In the current paper we provide a theoretical analysis of this algorithm. We show that the algorithm can be regarded as a variable metric algorithm with its searching direction having a positive projection on the gradient of the log likelihood. We also analyze the convergence of the algorithm and provide an explicit expression for the convergence rate. In addition, we describe an acceleration technique that yields a significant speedup in simulation experiments.
Estimating mixtures of regressions
Cited by 19 (1 self)
In this paper, we show how Bayesian inference for switching regression models and their generalisations can be achieved by the specification of loss functions which overcome the label switching problem common to all mixture models. We also derive an extension to models where the number of components in the mixture is unknown, based on the birth-and-death technique developed in Stephens (2000a). The methods are illustrated on various real datasets.
Curve clustering with random effects regression mixtures
- Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, 2003
Cited by 18 (6 self)
In this paper we address the problem of clustering sets of curve or trajectory data generated by groups of objects or individuals. The focus is to model curve data directly using a set of model-based curve clustering algorithms referred to as mixtures of regressions or regression mixtures (DeSarbo and Cron, 1988; Gaffney and Smyth, 1999). Our proposed methodology concerns an extension to regression mixtures that we shall call random effects regression mixtures which integrates linear random effects models (Laird and Ware, 1982) with standard regression mixtures. We develop an explicit maximum a posteriori or MAP-based EM algorithm for random effects regression mixtures and show an application to the clustering of cyclone data using these methods.
The unicorn, the normal curve, and other improbable creatures
- Psychological Bulletin, 1989
Cited by 17 (0 self)
An investigation of the distributional characteristics of 440 large-sample achievement and psychometric measures found all to be significantly nonnormal at the α = .01 significance level. Several classes of contamination were found, including tail weights from the uniform to the double exponential, exponential-level asymmetry, severe digit preferences, multimodalities, and modes external to the mean/median interval. Thus, the underlying tenets of normality-assuming statistics appear fallacious for these commonly used types of data. However, findings here also fail to support the types of distributions used in most prior robustness research suggesting the failure of such statistics under nonnormal conditions. A reevaluation of the statistical robustness literature appears appropriate in light of these findings.
During recent years a considerable literature devoted to robust statistics has appeared. This research reflects a growing concern among statisticians regarding the robustness, or insensitivity, of parametric statistics to violations of their underlying assumptions. Recent findings suggest that the most commonly used of these statistics exhibit varying degrees of nonrobustness to certain violations of the normality assumption. Although the importance of such findings is underscored by numerous empirical studies documenting nonnormality in a variety of fields, a startling lack of such evidence exists for achievement
Semiparametric estimation of a two-component mixture model
- Annals of Statistics, 2006
Cited by 8 (0 self)
Suppose that univariate data are drawn from a mixture of two distributions that are equal up to a shift parameter. Such a model is known to be nonidentifiable from a nonparametric viewpoint. However, if we assume that the unknown mixed distribution is symmetric, we obtain the identifiability of this model, which is then defined by four unknown parameters: the mixing proportion, two location parameters and the cumulative distribution function of the symmetric mixed distribution. We propose estimators for these four parameters when no training data is available. Our estimators are shown to be strongly consistent under mild regularity assumptions and their convergence rates are studied. Their finite-sample properties are illustrated by a Monte Carlo study and our method is applied to real data.
Interpretation and inference in mixture models: Simple MCMC works
- Journal of Econometrics, 2007
Cited by 7 (0 self)
The mixture model likelihood function is invariant with respect to permutation of the components of the mixture. If functions of interest are permutation sensitive, as in classification applications, then interpretation of the likelihood function requires valid inequality constraints and a very large sample may be required to resolve ambiguities. If functions of interest are permutation invariant, as in prediction applications, then there are no such problems of interpretation. Contrary to assessments in some recent publications, simple and widely used Markov chain Monte Carlo (MCMC) algorithms with data augmentation reliably recover the entire posterior distribution.
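The permutation invariance mentioned in this abstract can be checked numerically. The sketch below is illustrative only and is not taken from the paper: it assumes a three-component univariate Gaussian mixture, and the data values, component parameters, and function names (`normal_pdf`, `mixture_loglik`) are made up for the example.

```python
import math
from itertools import permutations


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_loglik(data, components):
    """Log likelihood of a finite mixture.

    components is a list of (weight, mean, sd) tuples; the weights sum to 1.
    """
    total = 0.0
    for x in data:
        total += math.log(sum(w * normal_pdf(x, m, s) for w, m, s in components))
    return total


# Hypothetical data and parameters, chosen only for illustration.
data = [-1.2, 0.3, 0.8, 2.5, 3.1, 4.0]
components = [(0.3, 0.0, 1.0), (0.5, 3.0, 0.5), (0.2, 1.5, 2.0)]

# Evaluate the log likelihood under every relabelling of the components.
logliks = [mixture_loglik(data, list(p)) for p in permutations(components)]

# All relabellings give the same value, up to floating-point rounding.
print(max(logliks) - min(logliks))
```

Because relabelling only reorders the terms inside the sum over components, all 3! = 6 orderings agree up to rounding. This is why a label-sensitive quantity, such as "the mean of component 1", is not determined by the likelihood alone.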
Probabilistic Curve-Aligned Clustering and Prediction with Regression Mixture Models
- Ph.D. Dissertation, Laboratoire MAS, 2004
"... in quality ..."
MCMC Estimation of Classical and Dynamic Switching and Mixture Models
- 1998
Cited by 4 (2 self)
In the present paper we discuss Bayesian estimation of a very general model class where the distribution of the observations is assumed to depend on a latent mixture or switching variable taking values in a discrete state space. This model class covers e.g. finite mixture modelling, Markov switching autoregressive modelling and dynamic linear models with switching. Joint Bayesian estimation of all latent variables, model parameters and parameters determining the probability law of the switching variable is carried out by a new Markov Chain Monte Carlo method called permutation sampling. Estimation of switching and mixture models is known to be faced with identifiability problems as switching and mixture are identifiable only up to permutations of the indices of the states. For a Bayesian analysis the posterior has to be constrained in such a way that identifiability constraints are fulfilled. The permutation sampler is designed to sample efficiently from the constrained posterior, by first sam...
Models And Methods For Clusterwise Linear Regression
- Proceedings in Computational Statistics, 1999
Cited by 4 (2 self)
Three models for linear regression clustering are given, and corresponding methods for classification and parameter estimation are developed and discussed: The mixture model with fixed regressors (ML-estimation), the fixed partition model with fixed regressors (ML-estimation), and the mixture model with random regressors (Fixed Point Clustering). The number of clusters is treated as unknown. The approaches are compared via an application to Fisher's Iris data. By the way, a broadly ignored feature of these data is discovered.
1 Introduction
Cluster analysis problems based on stochastic models can be divided into two classes: 1. A cluster is considered as a subset of the data points, which can be modeled adequately by a distribution from a class of cluster reference distributions (c.r.d.). These distributions are chosen to reflect the meaning of homogeneity with respect to the certain data analysis problem. Therefore c.r.d. are often unimodal. If the class of c.r.d. is parametric, th...
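For readers unfamiliar with ML estimation in a mixture of linear regressions, the following is a minimal textbook-style EM sketch. It is not the paper's procedure: the paper treats the number of clusters as unknown and also covers Fixed Point Clustering. The sketch assumes two components with a known count, one regressor plus intercept, and Gaussian errors. The synthetic data, starting values, and all function names are hypothetical.

```python
import math
import random


def normal_pdf(y, mean, sd):
    """Density of a univariate normal distribution at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def weighted_line_fit(xs, ys, ws):
    """Weighted least squares for y = a + b*x, plus the weighted residual sd."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    a = my - b * mx
    var = sum(w * (y - a - b * x) ** 2 for w, x, y in zip(ws, xs, ys)) / sw
    # Small floor (an assumption of this sketch) to avoid a zero variance.
    return a, b, math.sqrt(max(var, 1e-8))


def em_mixture_of_lines(xs, ys, params, weights, n_iter=50):
    """EM for a mixture of simple linear regressions.

    params is a list of (intercept, slope, sd); weights are mixing proportions.
    Returns final weights, final params, and the log likelihood per iteration.
    """
    history = []
    n_comp = len(params)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        loglik = 0.0
        for x, y in zip(xs, ys):
            dens = [p * normal_pdf(y, a + b * x, s)
                    for p, (a, b, s) in zip(weights, params)]
            total = sum(dens)
            loglik += math.log(total)
            resp.append([d / total for d in dens])
        history.append(loglik)  # log likelihood at the current parameters
        # M-step: update mixing proportions and refit each weighted line.
        weights = [sum(r[k] for r in resp) / len(xs) for k in range(n_comp)]
        params = [weighted_line_fit(xs, ys, [r[k] for r in resp])
                  for k in range(n_comp)]
    return weights, params, history


# Synthetic data from two hypothetical lines: y = 1 + 2x and y = 8 - x.
random.seed(0)
xs, ys = [], []
for i in range(100):
    x = random.uniform(0.0, 5.0)
    mean = 1.0 + 2.0 * x if i % 2 == 0 else 8.0 - x
    xs.append(x)
    ys.append(mean + random.gauss(0.0, 0.3))

init_params = [(0.0, 1.0, 2.0), (6.0, 0.0, 2.0)]  # arbitrary starting values
weights, params, history = em_mixture_of_lines(xs, ys, init_params, [0.5, 0.5])
print(weights, params)
```

Each EM iteration should not decrease the log likelihood, so `history` is a convenient convergence check. The result depends on the starting values, and the fitted components are identified only up to relabelling.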

