37 citations found.
G. McLachlan and K. Basford. Mixture Models. Marcel Dekker, New York, 1988.

This paper is cited in the following contexts:
Heuristic Classifier Performance Bounds in High Dimensional.. - Baggenstoss   (Correct)

.... data leading some researchers to avoid it [17] or constrain the covariances of the kernels to be identical [19] or of uniform size with variable rotation [14]. Adding to the covariance estimates based on a Bayesian prior density argument is the preferred method of dealing with the problem [20], [21]. This involves simply adding a diagonal matrix, representing an independent measurement noise prior, to the kernel covariances at each iteration. We have obtained excellent results with this method. When the PDF is estimated by optimizing the likelihood function, such as an EM algorithm, the ....

G. J. McLachlan, Mixture Models. Dekker, 1988.
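
The excerpt above describes adding a diagonal matrix to each kernel covariance at every iteration. A minimal sketch of that single step follows, assuming a covariance stored as a list of lists and a hypothetical scalar `noise_var` for the measurement-noise prior (the name and the value used are illustrative, not taken from the cited paper):

```python
def regularize_covariance(cov, noise_var):
    """Return a copy of a square covariance matrix with noise_var
    added to every diagonal entry (off-diagonals are unchanged)."""
    n = len(cov)
    return [
        [cov[i][j] + (noise_var if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


def det2(m):
    """Determinant of a 2x2 matrix, used here only to show the effect."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


# A singular 2x2 kernel covariance (perfectly correlated components).
cov = [[1.0, 1.0], [1.0, 1.0]]

# Hypothetical noise variance; in an EM loop this would be applied
# to every kernel covariance after each M step.
reg = regularize_covariance(cov, 0.01)
```

Here `det2(cov)` is zero while `det2(reg)` is strictly positive, so the regularized matrix can be inverted when evaluating the kernel density.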


Using Unlabeled Data to Improve Text Classification - Nigam (2001)   (10 citations)  (Correct)

....(Little, 1977) was recognized immediately. Since then, this approach continues to be used and studied (McLachlan & Ganesalingam, 1982; Ganesalingam, 1989; Shahshahani & Landgrebe, 1994). Three excellent surveys of the history of EM and its application to mixture modeling are the books by McLachlan and Basford (1988), McLachlan and Krishnan (1997) and McLachlan and Peel (2000). Using likelihood maximization of mixture models for combining labeled and unlabeled data for classification has only recently made its way to the machine learning community (Miller & Uyar, 1996; Nigam et al., 1998; Baluja, 1999) ....

McLachlan, G., & Basford, K. (1988). Mixture models. New York: Marcel Dekker.


Learning with Labeled and Unlabeled Data - Seeger (2001)   (28 citations)  (Correct)

....xju. The functional relationships are represented either by linear or by more powerful nonlinear models; in the latter case the model family is tightly regularized by an appropriate prior on the model parameter. The noise model is usually a Gaussian. Other examples are mixture models (e.g. [58], [90], [69]) where the latent variable is a grouping variable from a finite set (similar to the class label in supervised classification) and the conditional models come from simple families such as Gaussians with structurally restricted covariance matrices. Combinations of mixture and latent ....

....EM equations then parallels very closely the case of mixture models, which can be found in many textbooks, e.g. [11]. Indeed, using EM to fill in labels on D_u has already been suggested very early, namely in a note by R. J. Little in the discussion of [26]. Chapter 1.11 of [58] gives the idea and further references; however, it is not clear whether the authors suggest using the approach for classification or merely for partially unsupervised learning, where unsupervised fitting of a mixture model to P(x) is aided by a few labeled points D_l. It has been used to attack ....

[Article contains additional citation context not shown here]

G. McLachlan and K. Basford. Mixture Models. Marcel Dekker, New York, 1988.


Mixture Codebook Classification - Part 1: Method Outline - Langaas (1995)   (Correct)

....finite mixtures. The problem of constructing classification rules is not addressed in this section. 3.1 One and Two Level Mixture Models. We will in this section define one and two level mixtures and the mixture codebook, M. General references to mixture models are Everitt and Hand (1981), McLachlan and Basford (1988) and Titterington et al. (1985). 3.1.1 One Level Mixture Models. Assume that X has a finite mixture distribution defined as $p(x) = \sum_{l=1}^{K} p(c_l) f_l(x \mid \theta_l)$ (1) with mixing weights $p(c_1), \ldots, p(c_K)$ and component densities $f_1(x \mid \theta_1)$ ....

....The problem of choosing K is not addressed in this report. 3.2 Estimating the Parameters and Weights of a Mixture. The problem of estimating the parameters and weights of a one level mixture is one of the oldest estimation problems in the statistical literature; see Everitt and Hand (1981) and McLachlan and Basford (1988) for an historical account. Estimation is difficult even for a one level mixture of two univariate Gaussian component distributions, where the likelihood surface is littered with singularities. Karl Pearson used in 1894 the method of moments to estimate the five parameters and weights in a mixture ....

McLachlan, G. J. and Basford, K. E. (1988), Mixture Models, Inference and Applications to Clustering, New York: Marcel Dekker.
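
The finite mixture density quoted in the first excerpt above is a weighted sum of component densities. A minimal sketch of evaluating such a density follows, assuming univariate Gaussian components; the function names and the choice of Gaussian components are illustrative assumptions, not taken from the cited report:

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, weights, means, variances):
    """Weighted sum of component densities: sum over l of w_l * f_l(x).

    The weights are assumed to be non-negative and to sum to one."""
    return sum(
        w * gaussian_pdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    )


# Two equally weighted unit-variance components centred at -1 and +1.
density_at_zero = mixture_pdf(0.0, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```

With a single component of weight one, `mixture_pdf` reduces to `gaussian_pdf`, which is a quick sanity check on any implementation of the sum.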


Feature Subset Selection and Order Identification for.. - Dy, Brodley (2000)   (16 citations)  (Correct)

.... Subset Selection wrapped around EM clustering) and FSSEM k (FSSEM with order identification). In this paper, the term EM clustering represents the expectation maximization (EM) algorithm (Dempster et al., 1977) applied to estimating the maximum likelihood parameters of a finite Gaussian mixture (McLachlan & Basford, 1988). Although we apply the wrapper approach to EM clustering, it can be applied to any clustering method. 2. Unsupervised Feature Selection Literature. To maintain the wrapper filter model distinction used to characterize feature subset selection in supervised learning, we define the wrapper ....

....In the experiments reported, we applied sequential forward search. In the future, we plan to explore the effect of other search methods on FSSEM. Note that EM is initialized for each new feature subset. In this paper, we assume that the data comes from a mixture model of multivariate Gaussians (McLachlan & Basford, 1988). We apply the EM algorithm to estimate the maximum likelihood mixture model parameters and the cluster probabilities of each data point. EM clustering results in soft clusters (i.e. each data point belongs to every cluster with some probability). Note that the framework introduced in this ....

[Article contains additional citation context not shown here]

McLachlan, G. J., & Basford, K. E. (1988). Mixture models, inference and applications to clustering.
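
The soft clusters mentioned in the excerpt above come from the E step of EM, which gives each data point a membership probability for every component. A minimal sketch of that computation follows, assuming univariate Gaussian components for brevity (the cited paper works with multivariate Gaussians; the function names here are illustrative):

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def responsibilities(x, weights, means, variances):
    """E step for one point: posterior probability of each component.

    Each entry is w_k * f_k(x) divided by the sum of those terms over
    all components, so the returned list sums to one. This sketch does
    not guard against every term underflowing to zero."""
    joint = [
        w * gaussian_pdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    total = sum(joint)
    return [j / total for j in joint]


# Two equally weighted unit-variance clusters centred at -1 and +1.
r_mid = responsibilities(0.0, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
r_right = responsibilities(2.0, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```

A point halfway between two symmetric clusters is shared equally, while a point to the right of both means is assigned mostly, but not entirely, to the right-hand cluster.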


Performance Comparison of Smoothing and Gamma Priors for.. - Hsiao, Wang, Gindi (1999)   (Correct)

....optimization in (9) is to use an alternating algorithm: At each iteration k, solve for given current estimates of , then solve for given the latest estimate. This calculation is intractable, but may be made tractable by using an EM algorithm and an appropriate complete data space [10]. The complete data is given by Z aj , which may be interpreted as the probability that pixel j belongs to class a. Note that the complete data satisfies 0 Z aj 1, P a Z aj = 1 and a = 1=N P j Z aj . In terms of objective functions, the problem becomes: L ( jg) mix P ( ....

G.J. McLachlan and K.E. Basford, Mixture Models, Marcel Dekker, 1987.


Joint-MAP Reconstruction/Segmentation for Transmission.. - Hsiao, Rangarajan, Gindi (1998)   (Correct)

....the form of an indicator variable Z = fZ ai g #a=1; L;i=1; N# is introduced, where Z ai = # 1 : pixel i belongs to class a 0 : otherwise (8) with P a Z ai = 1. Note that Z ai can be also viewed as a segmentation of #. The log likelihood function for the complete data turns out to be [11] log L c = X i X a Z ai log## a p## i j# a ## (9) Now, let s perform the E step of EM by taking the expectation of (9) over Z ai by given # k =## k ;# k # and #, E z #log L c j# k ;## = X ai log## k a p## i j# k a ##EZ #Z ai j# i ; # k # = Q##j# k # (10) where EZ #Z ....

G.J. McLachlan and K.E. Basford, Mixture Models, Marcel Dekker, 1987.


Automating the Construction of Internet Portals with.. - McCallum, Nigam..   (29 citations)  (Correct)

....computer science hierarchy would allow the unlabeled documents to benefit classification more. However, even without a complete hierarchy, we could use these documents if we could identify these outliers. Some techniques for robust estimation with EM are discussed by McLachlan and Basford (McLachlan & Basford, 1988). One specific technique for these text hierarchies is to add extra leaf nodes containing uniform word distributions to each interior node of the hierarchy in order to capture documents not belonging in any of the predefined topic leaves. This should allow EM to perform well even when a large ....

McLachlan, G., & Basford, K. (1988). Mixture Models. Marcel Dekker, New York.


MDL-Based Selection of the Number of Components in Mixture .. - Tenmoto, Kudo, Shimbo (1998)   (9 citations)  (Correct)

.... [1]. In this situation, two problems arise: 1) How should we select the number of components? and 2) How should we construct initial components? The first problem is crucial but difficult, and there have been many efforts made to resolve this problem (these efforts are discussed in reference [2]). For example, Ichimura [3] has proposed a method for selecting the number of components based on information criteria. In this method, the optimal number of components is selected on the basis of the MDL principle [4] in a trade-off between the likelihood of the model to the training samples and ....

McLachlan, G. J., Basford, K. E.: Mixture Models. Marcel Dekker, Inc., New York (1988) 21--29


From Isolation to Cooperation: An Alternative View of a.. - Schaal, Atkeson (1995)   (15 citations)  (Correct)

....learning rather slowly. In incremental learning, another potential danger arises when the input distribution of the data changes. The expert selection system usually makes either implicit or explicit prior assumptions about the input data distribution. For example, in the classical mixture model (McLachlan & Basford, 1988), which was employed in several local expert approaches, the prior probabilities of each mixture model can be interpreted as the fraction of data points each expert expects to experience. Therefore, a change in input distribution will cause all experts to change their domains of expertise in order ....

McLachlan, G. J., & Basford, K. E. (1988). Mixture models. New York: Marcel Dekker.


Analysis of Three-Dimensional Protein Images - Leherte et al. (1997)   (3 citations)  (Correct)

.... analysis approach to the problem, we decided to treat the estimation and combination issues together by fitting mixtures of continuous distributions to the data for each class, under the conditional independence assumption commonly used in mixture model approaches to classification and clustering (McLachlan & Basford, 1988). In a latent class analysis approach to finding structure in a set of datapoints, one begins with an underlying parameterized model. For example, one might posit that a set of points represented by a 2D scatterplot was generated by a 2D Gaussian (normal) distribution, with means μ1, μ2 and ....

McLachlan, G., & Basford, K. (1988). Mixture Models. Inference and Applications to Clustering.


Bootstrapping for Text Learning Tasks - Jones, McCallum, Nigam, Riloff (1999)   (12 citations)  (Correct)

....estimation. A more inclusive computer science hierarchy would allow the unlabeled documents to benefit classification more. However, even without a complete hierarchy, we could use these documents if we could identify these outliers. Some techniques for robust estimation with EM are discussed by McLachlan and Basford [1988]. One specific technique for these text hierarchies is to add extra leaf nodes containing uniform word distributions to each interior node of the hierarchy in order to capture documents not belonging in any of the predefined topic leaves. This should allow EM to perform well even when a large ....

G.J. McLachlan and K.E. Basford. Mixture Models. Marcel Dekker, New York, 1988.


SIGMA: Integrating Learning Techniques in Computational.. - Grigoris Karakoulas   (1 citation)  (Correct)

....in coping with the intricacies of the IF task may inherently be limited. Within the statistics and the computational learning communities, the idea of distributing a learning problem among a set of local experts has been proposed for robustness against partial observability and non-stationarity (McLachlan & Basford 1988; Jordan & Jacobs 1994). These local experts compete with each other in order to acquire local expertise in regions of the input space which may be overlapping. In the models so developed, gating of the experts is fixed and often depends on prior assumptions about the input data distributions. ....

McLachlan, G.; and Basford, K. 1988. Mixture models. Marcel Dekker.


MML mixture modelling of multi-state, Poisson, von Mises.. - Wallace, Dowe (1997)   (Correct)

.... Bernoulli sampling). 6 Alternative mixture modelling programs. The first Snob program (since outdated) [37] was possibly the first program for Gaussian mixture modelling, although many statistical and machine learning approaches to this problem have been developed since (e.g. McLachlan et al. [23, 22], D. Fisher's CobWeb [17]). Discussions of early alternative algorithms for Gaussian mixture modelling have been given by Boulton [3]. 6.1 Comparison with AutoClass II. Like Snob, AutoClass II [10] assumes a prior distribution over the number of classes and independent prior densities over the ....

G.J. McLachlan and K.E. Basford. Mixture Models. Marcel Dekker, New York, 1988.


Continuous Gaussian Mixture Modeling - Stephen Aylward   (Correct)

....FGMMs offer poor consistency. This inconsistency is aggravated by the reliance on the user to specify the number of components. While much research has focused on automatically determining an appropriate number of components for a given problem, a generally applicable approach has not been found [9, 18]. An FGMM's expected accuracy does not vary monotonically as a function of the number of components. Additionally, MLEM's non-optimal maxima can lead to poorly utilized components; the effective number of components in an FGMM may be less than the user-specified number of components. GPGDs are ....

McLachlan, G.J. and Basford, K.E., Mixture Models. Marcel Dekker, Inc., New York, vol. 84, 1988 p. 253


Neural Networks and Statistical Models - Sarle (1994)   (31 citations)  (Correct)

.... algorithms have been developed in statistics, numerical taxonomy, and many other fields, as described in countless articles and numerous books such as Everitt (1980), Massart and Kaufman (1983), Anderberg (1973), Sneath and Sokal (1973), Hartigan (1975), Titterington, Smith, and Makov (1985), McLachlan and Basford (1988), Kaufmann and Rousseeuw (1990) and Spath (1980). In adaptive vector quantization (AVQ) the inputs are acknowledged to be target values that are predicted by the means of the cluster to which a given observation belongs. This network is therefore essentially the same as that in Figure 12 except ....

McLachlan, G.J. and Basford, K.E. (1988), Mixture Models, New York: Marcel Dekker, Inc.


Empirical Risk Approximation: An Induction Principle for.. - Buhmann (1998)   (4 citations)  (Correct)

....can be answered with a robust probabilistic model of the data. The ultimate success of unsupervised learning is achieved if the data can be generated from the learned model, i.e. when the learning algorithm has inferred a generative model of the data [Hinton and Ghahramani, 1997]. Mixture models [McLachlan and Basford, 1988], e.g. Gaussian mixtures, are prominent examples of this concept of unsupervised learning. Helmholtz machines [Dayan et al., 1995], Generative Topographic Maps [Bishop et al., 1998] and various other neural network models [Poggio and Girosi, 1990, Jordan and Jacobs, 1994, Bishop, 1995] also belong ....

McLachlan, G. J. and Basford, K. E. (1988). Mixture Models. Marcel Dekker, INC, New York, Basel.


Using EM to Classify Text from Labeled and Unlabeled Documents - Nigam (1998)   (4 citations)  (Correct)

....l| = ∞. If there is no labeled data, unlabeled data cannot improve classification, as shown in [Castelli and Cover, 1995]. If there are infinite amounts of labeled data, all parameters can be recovered with probability 1 from the labeled data and the resulting classifier is Bayes optimal [McLachlan and Basford, 1988]; thus, further unlabeled data cannot improve the classification accuracy. Note that our argument does not immediately motivate an algorithm for extracting the information from the unlabeled data. Additionally, it does not show that better parameter estimation will yield better classification. The ....

....From this it follows that the parameter estimation error converges to zero at the rate O(1/√|D_l|). • Infinite unlabeled data. If infinite amounts of unlabeled data are available, however, the parameters of the mixture components can be recovered from the unlabeled data [McLachlan and Basford, 1988], but not the assignment of mixture components to classes. Thus, the estimation problem reduces to the problem of learning a permutation matrix, which assigns labels to the different mixture components. Without any labeled data, this permutation cannot be found, and thus, although the parameters ....

G.J. McLachlan and K.E. Basford. Mixture Models. Marcel Dekker, New York, 1988.


From Isolation to Cooperation: An Alternative View of a.. - Schaal, Atkeson (1996)   (15 citations)  (Correct)

....rather slowly. In incremental learning, another potential danger arises when the input distribution of the data changes. The expert selection system usually makes either implicit or explicit prior assumptions about the input data distribution. For example, in the classical mixture model (McLachlan & Basford, 1988), which was employed in several local expert approaches, the prior probabilities of each mixture model can be interpreted as the fraction of data points each expert expects to experience. Therefore, a change in input distribution will cause all experts to change their domains of expertise in order ....

McLachlan, G. J., & Basford, K. E. (1988). Mixture models. New York: Marcel Dekker.


Joint-MAP Reconstruction/Segmentation for Transmission.. - Hsiao, Rangarajan, Gindi (1998)   (Correct)

....in the form of an indicator variable Z = fZ ai g (a=1; L;i=1; N) is introduced, where Z ai = ae 1 : pixel i belongs to class a 0 : otherwise (8) with P a Z ai = 1. Note that Z ai can be also viewed as a segmentation of . The log likelihood function for the complete data turns out to be [11] log L c = X i X a Z ai log( a p( i j a ) 9) Now, let s perform the E step of EM by taking the expectation of (9) over Z ai by given Phi k = k ; k ) and , E z [log L c j Phi k ; X ai log( k a p( i j k a ) EZ [Z ai j i ; Phi k ] Q[ Phij Phi k ] 10) ....

G.J. McLachlan and K.E. Basford, Mixture Models, Marcel Dekker, 1987.


Learning to Classify Text from Labeled and Unlabeled Documents - Nigam (1998)   (61 citations)  (Correct)

....cases: |D_l| = 0 and |D_l| = ∞. If there is no labeled data, unlabeled data cannot improve classification, as shown in [1]. If there are infinite amounts of labeled data, all parameters can be recovered with probability 1 from the labeled data and the resulting classifier is Bayes optimal [29]; thus, further unlabeled data cannot improve the classification accuracy. Note that our argument does not immediately motivate an algorithm for extracting the information from the unlabeled data. Additionally, it does not show that better parameter estimation will yield better classification. The ....

....rate O(1/√|D_l|). Likewise, the classification error converges to the Bayes optimal classifier at the same rate. • Infinite unlabeled data. If infinite amounts of unlabeled data are available, however, the parameters of the mixture components can be recovered from the unlabeled data [29], but not the assignment of mixture components to classes. Thus, the estimation problem reduces to the problem of learning a permutation matrix, which assigns labels to the different mixture components. Without any labeled data, this permutation cannot be ....

G.J. McLachlan and K.E. Basford. Mixture Models. Marcel Dekker, New York, 1988.


Learning to Classify Text from Labeled and Unlabeled.. - Nigam, McCallum, Thrun, .. (1998)   (61 citations)  (Correct)

....However, this does not show that unlabeled data aids the reduction of classification error. For example, unlabeled data does not help if there is already an infinite amount of labeled data; all parameters can be recovered from just the labeled data and the resulting classifier is Bayes optimal (McLachlan & Basford 1988). With an infinite amount of unlabeled data and no labeled data, the parameters can be estimated except classes cannot be matched with components, so classification error remains unimproved. But, with infinite unlabeled data and finite labeled data, there is classification improvement. With ....

McLachlan, G., and Basford, K. 1988. Mixture Models.


Text Classification from Labeled and Unlabeled.. - Nigam, Mccallum.. (1999)   (119 citations)  (Correct)

....error at the rate 1/√n. The picture changes at the other extreme when unlimited unlabeled data are available, along with the n labeled samples. It is well known that unlabeled data alone, when generated from a mixture of two Gaussians, are sufficient to recover the original mixture components (McLachlan & Basford, 1988). Thus, the means and covariance matrices of the Gaussians can be recovered along with the mixing parameter between both Gaussians; however, it is impossible to assign class labels to each of the Gaussians without any labeled data. Thus, the remaining learning problem is the problem of assigning ....

....text classification. Some of the results above have been shown to generalize to the multinomial case as well. When there is an infinite amount of data, as above, the multinomial mixture model parameters can be recovered with certainty, except for the mapping between mixture components and classes (McLachlan & Basford, 1988). This implies that the exponential learning rate discussed by Castelli and Cover (1995) still holds. In the case where there are no labeled data, the rate of convergence of the parameter estimation error is also O(1/√n) (Devroye, Gyorfi, & Lugosi, 1996). We do not know at what rate the ....

McLachlan, G., & Basford, K. (1988). Mixture Models. Marcel Dekker, New York.


Text Classification from Labeled and Unlabeled.. - Nigam, McCallum.. (1999)   (119 citations)  (Correct)

....to improve a classifier by treating the unclassified data as incomplete is mentioned by R. J. A. Little among the published responses to the original EM paper (Dempster et al., 1977). A discussion of this partial classification paradigm and descriptions of further references can be found in McLachlan and Basford's book on mixture models (1988, page 29). Two recent studies in the machine learning literature have used EM to combine labeled and unlabeled data for classification (Miller & Uyar, 1997; Shahshahani & Landgrebe, 1994). Instead of naive Bayes, Shahshahani and Landgrebe use a mixture of ....

McLachlan, G., & Basford, K. (1988). Mixture Models. Marcel Dekker, New York.


Document Preprocessing for Naive Bayes.. - Pavlov.. (2004)   (Correct)

No context found.

G. McLachlan and K. Basford. Mixture Models. Marcel Dekker, New York, 1988.


Sequence Modeling with Mixtures of Conditional Maximum.. - Dmitry Pavlov Yahoo (2003)   (Correct)

No context found.

G. McLachlan and K. Basford. Mixture Models. Marcel Dekker, New York, 1988.


Unsupervised Learning Using MML - Jonathan Oliver Computer (1996)   (21 citations)  (Correct)

No context found.

G.J. McLachlan and K.E. Basford. Mixture Models. Marcel Dekker, New York, 1988.


Mixture Modeling for Digital Mammogram Display and Analysis - Aylward, Hemminger, Pisano (1998)   (Correct)

No context found.

McLachlan GJ, Basford KE (1988) Mixture Models. New York, Marcel Dekker, Inc.


Continuous Gaussian Mixture Modeling - Stephen Aylward And (1997)   (Correct)

No context found.

McLachlan, G.J. and Basford, K.E., Mixture Models. Marcel Dekker, Inc., New York, vol. 84, 1988 p. 253


Joint-MAP Reconstruction/Segmentation for Transmission.. - Hsiao, Rangarajan, Gindi (1998)   (Correct)

No context found.

G.J. McLachlan and K.E. Basford, Mixture Models, Marcel Dekker, 1987.


An Empirical Comparison of Four Initialization Methods.. - Pena, Lozano, Larranaga (1999)   (15 citations)  (Correct)

No context found.

McLachlan, G.J. and Basford, K.E. (1988). Mixture Models. Marcel Dekker, Inc., New York, NY.


Annealed Competition of Experts for a Segmentation and .. - Pawelzik, Kohlmorgen, .. (1996)   (30 citations)  (Correct)

No context found.

McLachlan, G.J., Basford, K.J. (1988), Mixture models, Marcel Dekker, NY and Basel.


Unsupervised Learning Using MML - Oliver, Baxter, Wallace (1996)   (21 citations)  (Correct)

No context found.

G.J. McLachlan and K.E. Basford. Mixture Models. Marcel Dekker, New York, 1988.


Text Classification by Bootstrapping with Keywords, EM and.. - McCallum, Nigam (1999)   (10 citations)  (Correct)

No context found.

G.J. McLachlan and K.E. Basford. 1988. Mixture Models.


Inference in Model-Based Cluster Analysis - Bensmail, Celeux, Raftery, Robert (1997)   (8 citations)  (Correct)

No context found.

McLachlan, G. J. and Basford K. E. (1989), Mixture Models, Inference and Applications to Clustering, New York, Marcel Dekker.


A Theory of Proximity Based Clustering: Structure.. - Puzicha, Hofmann.. (1999)   (9 citations)  (Correct)

No context found.

G. McLachlan and K. Basford. Mixture Models. Marcel Dekker, INC, New York, Basel, 1988.

CiteSeer.IST - Copyright Penn State and NEC