27 citations found.
J. J. Oliver, R. A. Baxter, and C. S. Wallace. Unsupervised Learning using MML. In Machine Learning: Proceedings of the Thirteenth International Conference (ICML 96), pages 364--372. Morgan Kaufmann Publishers, 1996.

This paper is cited in the following contexts:

Unsupervised Selection and Estimation of Finite Mixture Models - Figueiredo, Jain (2000)   (1 citation)  (Correct)

....for FM obtain a set of candidate models (usually by EM) for a range of values of k, and then select one according to $\hat{k} = \arg\min_k \{ C(\hat{\theta}(k), k),\ k = 1, \ldots, k_{\max} \}$ (3), where $C(\hat{\theta}(k), k)$ is some model selection criterion. Several of these methods (see references/comparisons in [5, 6]) have good model selection performance, but a major drawback remains: a whole set of $k_{\max}$ candidate models has to be obtained, and well-known problems associated with EM emerge. a) EM is highly dependent on initialization; a common (time-consuming) solution uses several random starts, and then ....

....zero and the corresponding component becomes singular (unbounded likelihood); when the value of k is larger than the optimal true one, this may happen frequently. 2. Proposed Approach 2.1. The Proposed Criterion The minimum description length (MDL, [7]) and minimum message length (MML, [8, 6]) are two well-known criteria which have been successfully used for FM model selection [5, 6]. However, the approach has been the one in Eq. (3), suffering from the drawbacks mentioned above. To bypass these difficulties, we propose a shift of approach: we use a selection criterion that can be ....

[Article contains additional citation context not shown here]

J. Oliver, R. Baxter, and C. Wallace, "Unsupervised learning using MML," in Proc. of the 13th Int. Conf. on Machine Learning, (San Francisco), pp. 364--372, 1996.


On Fitting Mixture Models - Figueiredo, Leitão, Jain (1999)   (5 citations)  (Correct)

....the evidence-based Bayesian (EBB) criterion [25], the approximate weight of evidence (AWE) [1], and Schwarz's Bayesian inference criterion (BIC) [5]. Although derived in a different framework, BIC formally coincides with MDL and is also given by Eq. (11). The minimum message length (MML) criterion [20], Akaike's information criterion (AIC) [35], and Bezdek's partition coefficient (PC) [3] are other approaches in this class. As pointed out in [25], EBB, MDL/BIC, and MML perform comparably and outperform all other methods against which they were tested. Concerning AWE, it is argued in [5] that ....

....approaches in this class. As pointed out in [25], EBB, MDL/BIC, and MML perform comparably and outperform all other methods against which they were tested. Concerning AWE, it is argued in [5] that MDL/BIC provides a better approximation to the true Bayes factor. The AIC and PC criteria were shown in [20] (based on tests on 20 different mixtures) to be outperformed by MML and MDL/BIC. Accordingly, any new method in this class need only be compared against EBB, MDL/BIC, or MML. Finally, drawbacks of MML and EBB are: MML cannot be used for certain values of d (for example d = 9 and d 24) [25]; both EBB ....

[Article contains additional citation context not shown here]

J. Oliver, R. Baxter, and C. Wallace. Unsupervised learning using MML. In Proceedings of the Thirteenth International Conference on Machine Learning, pages 364--372. Morgan Kaufmann, San Francisco, CA, 1996.


Feature Selection for Temporal Health Records - Graham   (Correct)

....may have k measurements on a set of patients and so the measurements on each individual i are represented as a k-dimensional vector. For vector-form data, well-known and widely applied clustering techniques can be applied. Such techniques are generally model-based methods, including mixture modelling [6], or distance-based methods [3]. Much real-world data is actually in non-vector form, consisting of observations of an individual, recording information at particular time points. Such variable-length event sequence data is described in Sect. 2, but examples include a patient's usage of medical ....

J.J. Oliver, Baxter R.A., and Wallace C.S. Unsupervised Learning using MML. In Machine Learning: Proceedings of the Thirteenth International Conference (ICML 96), pages 364--372. Morgan Kaufmann Publishers, San Francisco, CA, 1996.


Fast Full-Search Equivalent Nearest-Neighbour Search Algorithms - Chua (1999)   (Correct)

....must first consider the question of how relevance should be measured. Our current research on this problem aims at finding an information-theoretic definition of relevance. Our approach is based on the Minimum Message Length inference method developed by Prof. Chris Wallace at Monash University [174, 13, 175, 173, 176, 126]. Appendix A: The k-means Algorithm. The k-means algorithm is a well-known clustering technique in pattern recognition because of its usually good behaviour and simplicity. It is an iterative scheme which starts from an initial distribution of cluster centres in data space. At each iteration, a ....

J.J. Oliver, R.A. Baxter, and C.S. Wallace. Unsupervised learning using MML. In Machine Learning: Proceedings of the Thirteenth International Conference (ICML'96), pages 364--372. Morgan Kaufmann Publishers, 1996.
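
The k-means scheme described in the Chua excerpt above (start from an initial set of cluster centres, then iterate) can be sketched as follows. This is a minimal sketch of the standard assignment/update iteration, not the cited author's implementation; the function name `kmeans`, the `max_iter` parameter, and the sample points are assumptions introduced for illustration.

```python
def kmeans(points, centres, max_iter=100):
    """Alternate assignment and centre-update steps until the centres stop moving.

    `points` and `centres` are sequences of equal-length coordinate tuples.
    The initial `centres` are supplied by the caller (hypothetical interface).
    """
    centres = [tuple(c) for c in centres]
    clusters = [[] for _ in centres]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre (squared Euclidean distance).
        clusters = [[] for _ in centres]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centres]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centre to the mean of its cluster; keep it if the cluster is empty.
        new_centres = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centres)
        ]
        if new_centres == centres:
            break
        centres = new_centres
    return centres, clusters


# Illustrative data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centres, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

The result depends on the initial centres, which is why the number of clusters and the starting configuration are usually treated as separate choices.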


Clustering of Sparse Binary Data using a Minimum Description.. - Plumbley (2002)   (Correct)

....an encoding scheme that could be used to transmit the information in the data X. We will then search for an encoding that gives us the shortest code length. This approach is known as the minimum description length (MDL) approach [4] and is conceptually similar to minimum message length (MML) [7] and complexity minimization [8]. In fact, the MDL/MML approach is a Bayesian method: code lengths and code structures in the coding model are equivalent to negative log probabilities and probability structure assumptions in the Bayesian approach [7]. Nevertheless, we find the coding model more ....

.... similar to minimum message length (MML) [7] and complexity minimization [8]. In fact, the MDL/MML approach is a Bayesian method: code lengths and code structures in the coding model are equivalent to negative log probabilities and probability structure assumptions in the Bayesian approach [7]. Nevertheless, we find the coding model more appealing in that we do not need to assume that the data has any given probability structure or generative model. We will know, however, that the encoding scheme will have a good fit to the data if our coding model does correspond to the process whereby ....

J. J. Oliver, R. A. Baxter, and C. S. Wallace, "Unsupervised learning using MML," in Machine Learning: Proceedings of the Thirteenth International Conference (ICML 96). 1996, pp. 364--372, Morgan Kaufmann Publishers.


Towards a Simple Clustering Criterion Based on Minimum Length.. - Ludl, Widmer (2002)   (Correct)

....maps, MML/MDL, mixture modeling. [KH99] gives an excellent overview of clustering techniques which can be adopted for use with large data sets. MDL-based approaches are attractive because they provide a non-parametric way of automatically deciding on the optimal number of clusters [OBW96] by trading off model complexity and fit on the data. However, up to now MDL (or MML) criteria have been used in clustering predominantly in mixture modelling algorithms (e.g. Snob [WD94] or AutoClass [CKS 88]), where they proved to be very effective. The downside, however, is that ....

J.J. Oliver, R.A. Baxter, and C.S. Wallace. Unsupervised learning using MML. In Proceedings of the 13th International Conference on Machine Learning, pages 364--372, San Francisco, CA, USA, 1996. Morgan Kaufmann.


Learning Simple Relations: Theory and Applications - Berkhin, Becher (2002)   (7 citations)  (Correct)

....negative effect that this could result in a quite non-optimal local minimum of the objective function, there are strategies [4] to compensate for this effect. Regarding the second question, different indicators (such as F statistic, Marriott index, coefficients of separation, MDL, AWD and BIC criteria [29, 17, 31]) are used to derive the most appropriate k. In the case of the IR clustering, a more straightforward criterion can be suggested. Consider the user-defined threshold 8, say 0.05, which specifies a percentage (e.g. 5%) of the original information the user is willing to sacrifice for the ....

Oliver, J.O., Baxter, R.A., Wallace, C.S., Unsupervised Learning Using MML, Machine Learning, ICML '96, 1996.


Survey Of Clustering Data Mining Techniques - Berkhin (2002)   (18 citations)  (Correct)

....L, number of clusters k, number of parameters per cluster, total number of estimated parameters p, and different flavors of Fisher information matrix. For example, $\mathrm{MDL}(k) = -L + p/2 - \log(p)$, $k_{best} = \arg\min_k \mathrm{MDL}(k)$; $\mathrm{BIC}(k) = L - (p/2)\log(n)$, $k_{best} = \arg\max_k \mathrm{BIC}(k)$. See Oliver et al. [OBW96] and Fraley & Raftery [FR98] for an introduction of the subject and for further references. Some examples include: MCLUST and X-means use BIC criterion, SNOB uses MML criterion. The MDL principle is used in an evolutionary approach to k determination in Lee & Antonsson [LA00]. Significant ....

Oliver, J., Baxter, R. and Wallace, C. Unsupervised learning using MML. Machine Learning: ICML ' 96, 1996.
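
The BIC-based choice of k mentioned in the Berkhin excerpt above can be illustrated with a short sketch. It assumes the standard "larger is better" form BIC(k) = L - (p/2) log(n), with L the maximized log-likelihood, p the number of estimated parameters, and n the number of data points. The function names `bic` and `select_k` and the log-likelihood values are hypothetical, chosen only to show the arithmetic; they are not results from any of the cited papers.

```python
import math


def bic(log_likelihood, num_params, num_points):
    """BIC in 'larger is better' form: L - (p / 2) * log(n)."""
    return log_likelihood - 0.5 * num_params * math.log(num_points)


def select_k(fits, num_points):
    """Return the k with the largest BIC; `fits` maps k -> (log_likelihood, num_params)."""
    return max(fits, key=lambda k: bic(fits[k][0], fits[k][1], num_points))


# Made-up log-likelihoods for 1-D Gaussian mixtures with k components.
# A 1-D mixture with k components has 3k - 1 free parameters
# (k means, k variances, k - 1 independent mixing weights).
fits = {1: (-520.0, 2), 2: (-450.0, 5), 3: (-447.0, 8), 4: (-445.5, 11)}
best_k = select_k(fits, num_points=200)
```

With these made-up numbers the likelihood gain from k = 2 to k = 3 is smaller than the added penalty, so `best_k` is 2. Each candidate still has to be fitted separately, which is the cost the Figueiredo and Jain excerpts point to.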


Unsupervised Learning of Finite Mixture Models - Figueiredo, Jain (2000)   (21 citations)  (Correct)

....Methods As was mentioned above, the vast majority of the deterministic algorithms for fitting mixtures with unknown numbers of components are supported on the EM algorithm. Although several of these methods exhibit good model selection performance (see references/comparisons in [17], [35]), a major drawback remains: a whole set of candidate models has to be obtained, and the following well-known problems associated with EM emerge. a) EM is highly dependent on initialization. Common (time-consuming) solutions include one (or sometimes even a combination of several) of the ....

....the whole set of available models, $\bigcup_{k=k_{\min}}^{k_{\max}} \mathcal{M}_k$, rather than selecting one among a set of candidate models $\{\hat{\theta}(k),\ k = k_{\min}, \ldots, k_{\max}\}$. Previous uses of MML for mixtures do not strictly adhere to this perspective, and end up using MML as a model class selection criterion [35]. Rather than using EM to compute a set of candidate models (which has the drawbacks mentioned above), we will be able to directly implement the MML criterion using a variant of EM. This new algorithm turns out to be much less initialization dependent than standard EM, and has a built-in behavior ....

[Article contains additional citation context not shown here]

J. Oliver, R. Baxter, and C. Wallace, "Unsupervised learning using MML," in Proc. of the 13th Int. Conf. on Machine Learning, (San Francisco), pp. 364--372, 1996.


Unsupervised Learning of Finite Mixture Models - Figueiredo, Jain (2000)   (21 citations)  (Correct)

.... evidence-based Bayesian (EBB) criterion [17], the approximate weight of evidence (AWE) [18], and Schwarz's Bayesian inference criterion (BIC) [19], [20]. Other criteria include Rissanen's minimum description length (MDL) [21] (which formally coincides with BIC), the minimum message length (MML) [22], [23], [24], Akaike's information criterion (AIC) [25], and Bezdek's partition coefficient (PC) [26]. As reported in [17], EBB, MDL/BIC, and MML perform comparably and outperform all other methods against which they were tested. Concerning AWE, it is shown in [20] that MDL/BIC provides a better ....

....coefficient (PC) [26]. As reported in [17], EBB, MDL/BIC, and MML perform comparably and outperform all other methods against which they were tested. Concerning AWE, it is shown in [20] that MDL/BIC provides a better approximation to the true Bayes factor. The AIC and PC criteria were shown in [22] to be outperformed by MML and MDL/BIC. Resampling-based schemes [27] and cross-validation approaches [28] have also been used to estimate the number of mixture components. However, in terms of computational load, these methods are much closer to stochastic techniques (see below) than to ....

J. Oliver, R. Baxter, and C. Wallace, "Unsupervised learning using MML," in Proceedings of the Thirteenth International Conference on Machine Learning, pp. 364--372, San Francisco, CA: Morgan Kaufmann, 1996.


Hybrid Genetic Algorithms are Better for Spatial Clustering - Estivill-Castro   (Correct)

....competitions on a data set with 30% noise. Data sets with n = 300 and n = 3,000 were used. Both data sets had 10 clusters, so the correct value of k is 10. However, we tested smaller and larger values of k because typically k is unknown and several k are tested for the evaluation of indicators [17] to decide on the value of k. Thus, it is of utmost importance that the optimisation method gives good solutions even when the .... [Table header residue: Provider of best solution (noise = 30%); Method; 12 competitions each with a budget of 4,166 unitary operations; 3 competitions each with a budget of 29,162 unitary operations; number k of ....]

J.J. Oliver, R.A. Baxter, and C.S. Wallace. Unsupervised learning using MML. In 13th International Conference on Machine Learning, pages 364--372, San Mateo, CA, 1996. Morgan Kaufmann Publishers.


A Fast and Robust General Purpose Clustering Algorithm - Estivill-Castro, Yang   (Correct)

....of algorithm, referred to as k d median. Our C implementation of k d median is compared with our C implementation of k-Means and Expectation Maximization. For Expectation Maximization we used the assumption that the covariance matrix $\Sigma_j$ of each component, although unknown, is diagonal [42]. The assumption that $\Sigma_j$ is diagonal implies that the component densities are aligned with the axes. In a sense, the clouds of points that are the clusters are ellipsoids with axes parallel to the coordinate axes. If the assumption on diagonal form is removed, then other assumptions are ....

....multidimensional data set. Because it is faster than Expectation Maximization, it can be applied in combination with criteria for determining the number k of clusters. Recall that the most robust criteria find a robust estimate of the value of k by repeated clustering with different values of k [42]. Further research will see the incorporation of methods that find discrete Euclidean medians as well as numerical approximations to the Fermat-Weber center in order to extend our method to categorical data and improve its efficiency in examples like R.N. Neal's 10-dimensional binary data. ....

J.J. Oliver, R.A. Baxter, and C.S. Wallace. Unsupervised learning using MML. In L. Saitta, editor, Proceedings of the 13th Machine Learning Conference, pages 364--372, San Mateo, CA, July 1996. Morgan Kaufmann Publishers.


Bayesian Approaches to Gaussian Mixture Modelling - Roberts, Husmeier, Rezek, Penny (1998)   (12 citations)  (Correct)

....distributions and that a reference prior should be sought in which the mean and variance are assumed to be initially independent. In this case we would choose a prior distribution of the form $P(\mu, \sigma) = P(\mu) P(\sigma) \propto \sigma^{-2}$. Independent flat (improper) priors are advocated by Oliver et al. [18] and preferred over the Jeffreys case by Lee [13]. Taking a set of such improper priors leads to a simplified analytic solution, as the prior becomes independent of the parameter values. We choose, for this reason, to use such flat priors and consider each component of the mean vector for any ....

.... Gaussian are taken to have a flat prior in the range $(0, \beta\sigma_{pop})$. This gives rise to a prior density for the internal parameters of K Gaussians in a d-dimensional space of $P(\theta_{internal}) = 1 / (2\alpha\beta\sigma_{pop}^2)^{Kd}$ (11). We note that this is the same prior distribution taken by Oliver et al. [18]. We furthermore follow the suggestion made in [18] to allow the prior distribution of the external model parameters (the set of model priors, or mixing fractions) to be of simple Dirichlet form such that: $P(\{P(k)\}) = (K - 1)!$ (12). Combining Equations (11) and (12) and taking natural ....

[Article contains additional citation context not shown here]

J.J. Oliver, Baxter R.A., and Wallace C.S. Unsupervised Learning using MML. In Machine Learning: Proceedings of the Thirteenth International Conference (ICML 96), pages 364--372. Morgan Kaufmann Publishers, San Francisco, CA, 1996. Available on the WWW from http://www.cs.monash.edu.au/~jono.


Unsupervised Data Partitioning: a Bayesian Approach - Stephen Roberts   (Correct)

....(6) and combining it with Equation (3) gives $\ln p(X) \approx L(X \mid \hat{\theta}) + \ln P(\hat{\theta}) + \frac{N_p}{2} \ln(2\pi) - \frac{1}{2} \ln |H|$ (7). We adopt non-informative reference priors $P(\theta)$ for the parameter set. This leads naturally to the notion of independent flat priors, which are also advocated by Oliver et al. [5] and Lee [4]. We consider each component of the mean vector for any of the K Gaussians to have a flat distribution in the range .... [Footnote 1: We will, furthermore, keep to the standard convention of denoting a probability using upper case P and a probability density function using lower case p.]

.... covariance elements ($\sigma_{ii}$) of each Gaussian are taken to have a flat prior in the range $(0, \beta\sigma_{pop})$. This gives rise to a prior density for the internal parameters of K Gaussians in a d-dimensional space of $P(\theta_{internal}) = 1 / (2\alpha\beta\sigma_{pop}^2)^{Kd}$ (8). We follow the suggestion made in [5] to allow the prior distribution of the external model parameters (the set of model priors, or mixing fractions) to be of the form $P(\{P(k)\}) = (K - 1)!$. Combining this with Equation (8) and taking natural logarithms gives $\ln P(\theta) = -Kd \ln(2\alpha\beta\sigma_{pop}^2) + \ln(K$ ....

[Article contains additional citation context not shown here]

J.O. Oliver, R. A. Baxter, and C.S. Wallace. Unsupervised Learning using MML. Technical report, Computer Science Dept. Monash University, jono@cs.monash.edu.au, 1996.


Minimum Message Length Segmentation - Jonathan Oliver Rohan (1998)   (4 citations)  Self-citation (Oliver Wallace)   (Correct)

No context found.

J.J. Oliver, Baxter R.A., and Wallace C.S. Unsupervised Learning using MML. In Machine Learning: Proc. of the Thirteenth International Conference (ICML 96), pages 364--372. Morgan Kaufmann Publishers, San Francisco, CA, 1996. Available on the WWW from http://www.cs.monash.edu.au/~jono.


A Comparative Study of RNN for Outlier Detection in.. - Williams, Baxter, He.. (2002)   Self-citation (Baxter)   (Correct)

No context found.

J. J. Oliver, R. A. Baxter, and C. S. Wallace. Unsupervised Learning using MML. In Proc. of the Thirteenth Int. Conf. (ICML 96), pages 364--372. Morgan Kaufmann Publishers, San Francisco, CA, 1996.


Minimum Message Length Inference: Theory and Applications - Baxter (1996)   (2 citations)  Self-citation (Baxter)   (Correct)

....There are many approaches to unsupervised learning. Within AI there have been systems such as (a) CLUSTER This chapter contains joint work with Jon Oliver and Chris Wallace. Experimental work based on the results in this chapter was presented at the International Conference on Machine Learning[106], in a paper with Jon Oliver and Chris Wallace. Related work, not included here, was presented at the Sydney International Statistical Congress[13] with Jon Oliver and David Hand. I was second author in the first paper and first author in the second paper. The C program to produce the results ....

....for the simple mixture models considered. I experimentally find how many data points are required for MML to find the correct number of components with high probability for different separation of components. The MML estimates use approximations requiring the assumption of well-separated components [106]. Two alterations are made to the existing MML estimates in order to improve its performance on overlapping distributions. Experiments are performed with the new estimates to confirm that they are effective. The results here will assist mixture modellers. They will help mixture modeller ....

[Article contains additional citation context not shown here]

J.J. Oliver, R.A. Baxter, and C.S. Wallace. Unsupervised Learning using MML. In Machine Learning: Proceedings of the Thirteenth International Conference (ICML 96), pages 364--372. Morgan Kaufmann Publishers, 1996.


Bayesian Approaches to Segmenting a Simple Time Series - Oliver, Forbes (1997)   (3 citations)  Self-citation (Oliver)   (Correct)

....2 $N(\,\cdot\,, \sigma_j^2/q)$ (2), $\sigma_j^2 \sim \Gamma^{-1}(\alpha, \beta)$ (3), where $\alpha$, $\beta$, and $q$ are hyperparameters for the prior distribution. 3.1.3 Prior #2 (An Improper Prior) We considered using improper prior distributions analogous to the distributions used in the context of mixture modelling [25, 15]. We considered using a uniform prior for each $\sigma_j$ and each $\mu_j$ inversely proportional to the standard deviation of the differences: $h(\sigma_j) \propto 1/\sigma_{\Delta y}$ for $\sigma_j > 0$, $j = 0, \ldots, C$; $h(\mu_j) \propto 1/\sigma_{\Delta y}$ for $\mu_j \in [-\infty, \infty]$, $j = 0, \ldots, C$. We assume that the parameters are ....

J.J. Oliver, Baxter R.A., and Wallace C.S. Unsupervised Learning using MML. In Machine Learning: Proceedings of the Thirteenth International Conference, pages 364--372, 1996.


MML mixture modelling of multi-state, Poisson, von Mises.. - Wallace, Dowe (1997)   Self-citation (Wallace)   (Correct)

....numerically, to the quantity $h(V) \times P(X \mid V) / \sqrt{F}$, which is what MML (in general) and Snob (in particular) endeavour to maximise. Thus, although AutoClass II is differently motivated from Snob, in practice it gives almost identical results. 6.2 Comparison with other methods. Oliver et al. [25] re-wrote the Gaussian mixture modelling part of Snob [41, 42] by modifying the Bayesian priors and introducing lattice constants ([43, 39]; see Section 2.5) and then empirically showed a successful performance of (this slightly modified) Snob against AIC (Akaike's Information Criterion), BIC [28] ....

J. Oliver, Baxter R., and Wallace C. Unsupervised learning using MML. In Proc. 13th International Conf. Machine Learning (ICML 96), pages 364--372. Morgan Kaufmann, San Francisco, CA, 1996.


Algorithms for Clustering High Dimensional and - Tao   (Correct)

No context found.

J. J. Oliver, R. A. Baxter, and C. S. Wallace. Unsupervised Learning using MML. In Machine Learning: Proceedings of the Thirteenth International Conference (ICML 96), pages 364--372. Morgan Kaufmann Publishers, 1996.


Message Length Estimators, Probabilistic Sampling and Optimal.. - Davidson, Yin   (Correct)

No context found.

Oliver, J., Baxter, R., and Wallace, C.S., Unsupervised Learning Using MML, International Conference on Machine Learning: Proceedings of the Thirteenth International Conference, 1996


In Search of the Horowitz Factor: Interim Report on a Musical.. - Widmer (2002)   (2 citations)  (Correct)

No context found.

Oliver, J., Baxter, R., and Wallace, C. (1996). Unsupervised Learning Using MML. In Proceedings of the 13th International Conference on Machine Learning (ICML'96). San Francisco, CA: Morgan Kaufmann.


Research Report, IDIAP, Dalle Molle Institut.. - Pe Cep Ua (2002)   (Correct)

No context found.

J. Oliver, R. Baxter, and C. Wallace, "Unsupervised learning using MML," Proc. 13th Int'l Conf. Machine Learning, pp. 364-372, 1996.


Unknown - (2002)   (Correct)

No context found.

J. J. Oliver, R. A. Baxter, and C. S. Wallace, "Unsupervised learning using MML," Proc. 13th Int. Conf. on Machine Learning, 1996, pp. 1-10.

CiteSeer.IST - Copyright Penn State and NEC