Results 1 - 10 of 9,407
Hierarchical mixtures of experts and the EM algorithm
, 1993
"... We present a tree-structured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIM’s). Learning is treated as a maximum likelihood ..."
Cited by 885 (21 self)
Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties
, 2001
"... Variable selection is fundamental to high-dimensional statistical modeling, including nonparametric regression. Many approaches in use are stepwise selection procedures, which can be computationally expensive and ignore stochastic errors in the variable selection process. In this article, penalized ..."
Cited by 948 (62 self)
likelihood approaches are proposed to handle these kinds of problems. The proposed methods select variables and estimate coefficients simultaneously. Hence they enable us to construct confidence intervals for estimated parameters. The proposed approaches are distinguished from others in that the penalty
A maximum likelihood approach to continuous speech recognition
 IEEE Trans. Pattern Anal. Machine Intell
, 1983
"... Speech recognition is formulated as a problem of maximum likelihood decoding. This formulation requires statistical models of the speech production process. In this paper, we describe a number of statistical models for use in speech recognition. We give special attention to determining the ..."
Cited by 477 (9 self)
Minimum Error Rate Training in Statistical Machine Translation
, 2003
"... Often, the training procedure for statistical machine translation models is based on maximum likelihood or related criteria. A general problem of this approach is that there is only a loose relation to the final translation quality on unseen text. In this paper, we analyze various training cri ..."
Cited by 757 (7 self)
Additive Logistic Regression: a Statistical View of Boosting
 Annals of Statistics
, 1998
"... Boosting (Freund & Schapire 1996, Schapire & Singer 1998) is one of the most important recent developments in classification methodology. The performance of many classification algorithms can often be dramatically improved by sequentially applying them to reweighted versions of the input dat ..."
Cited by 1750 (25 self)
data, and taking a weighted majority vote of the sequence of classifiers thereby produced. We show that this seemingly mysterious phenomenon can be understood in terms of well known statistical principles, namely additive modeling and maximum likelihood. For the two-class problem, boosting can
A Maximum Entropy approach to Natural Language Processing
 COMPUTATIONAL LINGUISTICS
, 1996
"... The concept of maximum entropy can be traced back along multiple threads to Biblical times. Only recently, however, have computers become powerful enough to permit the wide-scale application of this concept to real world problems in statistical estimation and pattern recognition. In this paper we des ..."
Cited by 1366 (5 self)
describe a method for statistical modeling based on maximum entropy. We present a maximum-likelihood approach for automatically constructing maximum entropy models and describe how to implement this approach efficiently, using as examples several problems in natural language processing.
A View of the EM Algorithm that Justifies Incremental, Sparse, and Other Variants
 Learning in Graphical Models
, 1998
"... The EM algorithm performs maximum likelihood estimation for data in which some variables are unobserved. We present a function that resembles negative free energy and show that the M step maximizes this function with respect to the model parameters and the E step maximizes it with respect to the d ..."
Cited by 993 (18 self)
estimation problem. A variant of the algorithm that exploits sparse conditional distributions is also described, and a wide range of other variant algorithms are also seen to be possible. 1. Introduction The Expectation-Maximization (EM) algorithm finds maximum likelihood parameter estimates in problems
A gentle tutorial on the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models
, 1997
"... We describe the maximum-likelihood parameter estimation problem and how the Expectation-Maximization (EM) algorithm can be used for its solution. We first describe the abstract form of the EM algorithm as it is often given in the literature. We then develop the EM parameter estimation procedure for two applications: 1) finding the parameters of a mixture of Gaussian densities, and 2) fi ..."
Cited by 693 (4 self)
Approximating discrete probability distributions with dependence trees
 IEEE TRANSACTIONS ON INFORMATION THEORY
, 1968
"... A method is presented to approximate optimally an n-dimensional discrete probability distribution by a product of second-order distributions, or the distribution of the first-order tree dependence. The problem is to find an optimum set of n - 1 first order dependence relationship among the n variables ..."
Cited by 881 (0 self)
variables. It is shown that the procedure derived in this paper yields an approximation of a minimum difference in information. It is further shown that when this procedure is applied to empirical observations from an unknown distribution of tree dependence, the procedure is the maximum-likelihood estimate
The Evolution of Social and Economic Networks
 JOURNAL OF ECONOMIC THEORY 106, 265–295
, 2002
"... We examine the dynamic formation and stochastic evolution of networks connecting individuals. The payoff to an individual from an economic or social activity depends on the network of connections among individuals. Over time individuals form and sever links connecting themselves to other individuals ..."
Cited by 889 (37 self)
individuals based on the improvement that the resulting network offers them relative to the current network. In addition to intended changes in the network there is a small probability of unintended changes or errors. Predictions can be made regarding the likelihood that the stochastic process will lead