A. Rao, D. Miller, K. Rose, and A. Gersho, "Mixture of Experts Regression Modeling by Deterministic Annealing," IEEE Trans. on Signal Processing, vol. 45, no. 11, pp. 2811--2820, 1997.
....development of regression models realized as an aggregation of several, each partially explicative of the problem. Modularization of regression tasks (Sharkey, this volume) has been described in the switching regression framework [32, 22] and well developed within the mixture of experts approach [1, 30, 40]. As in principle learning strategies are available for combining the most complex models [30, 40], it might be surprising that very simple architectures like piecewise linear models are often reported to outperform more complex network combinations on difficult regression problems [34]. It is a ....
....of the problem. Modularization of regression tasks (Sharkey, this volume) has been described in the switching regression framework [32, 22] and well developed within the mixture of experts approach [1, 30, 40]. As in principle learning strategies are available for combining the most complex models [30, 40], it might be surprising that very simple architectures like piecewise linear models are often reported to outperform more complex network combinations on difficult regression problems [34]. It is a fact that the structure of the training data distribution and the type of learning procedure have ....
A. V. Rao, D. Miller, K. Rose, and A. Gersho. Mixture of experts regression modeling by deterministic annealing. IEEE Trans. on Signal Processing, 45(11):2811--2820, November 1997.
....the original cost function. Deterministic annealing has been applied to a wide variety of optimization problems including deformable models and matching problems [130, 132, 131, 36, 37, 114, 115], nonmetric multidimensional scaling [55, 67], supervised learning for classification and regression [88, 98], the travelling salesman problem [30, 29, 105, 52], and data clustering [103, 104, 15, 16, 14, 56, 59, 57, 58, 54, 86, 87], the application on which this work is going to focus. The technique of deterministic annealing only works for a particular class of cost functions, because it is necessary ....
A. Rao, D. Miller, K. Rose, and A. Gersho. Mixture of experts regression modeling by deterministic annealing. IEEE Transactions on Signal Processing, accepted for publication, 1997.
.... an effective optimization tool for similar tasks [4]. Derived from fundamental principles of statistical physics and information theory, DA was first proposed for clustering and related problems [5], [6] and later applied to pattern classifiers [7], source coding systems [8], regression functions [9], etc. Most recently, DA has been successfully applied in the design of discrete observation HMM (DHMM) [10], [11] and continuous density HMM (CHMM) recognizers [12] and was shown to substantially outperform both the ML-based EM algorithm and the MCE-based GPD algorithm. In this paper, we propose a new ....
A. V. Rao, D. Miller, K. Rose, and A. Gersho, "Mixture of experts regression modeling by deterministic annealing", IEEE Trans. on Signal Processing, vol. 45, no. 11, pp. 2811-2820, 1997.
....on the number of clusters is imposed, then at zero temperature a hard clustering solution, or a quantizer, is obtained. The basic DA approach to clustering has since inspired modifications, extensions, and related work by numerous researchers including [6], [14], [47], [64], [70], [72], [73], [82], [91], [103], [106]. This paper begins with a tutorial review of the basic DA approach to clustering, and then goes into some of its most significant extensions to handle various partition structures [69] as well as hard supervised learning problems including classifier design [70], piecewise ....
.... with a tutorial review of the basic DA approach to clustering, and then goes into some of its most significant extensions to handle various partition structures [69] as well as hard supervised learning problems including classifier design [70], piecewise regression [78], and mixture of experts [82]. Another important theoretical aspect is the connection with Shannon's rate-distortion (RD) theory, which leads to better understanding of the method's contribution to quantization and yields additional contributions to information theory itself [87]. Some of the currently investigated ....
A. Rao, D. Miller, K. Rose, and A. Gersho, "Mixture of experts regression modeling by deterministic annealing," IEEE Trans. Signal Processing, vol. 45, pp. 2811--2820, Nov. 1997.
....but does so while employing a powerful optimization tool. DA was first proposed for clustering and related problems [12, 13] and later extended to solve problems which require structural constraints on the clustering rule [6] and applied to certain source coding systems [7], regression functions [8, 10], pattern classifiers [6], etc. For a tutorial on DA see [14]. Most recently, DA was successfully applied in the design of discrete observation HMM classifiers [9, 11] and was shown to substantially outperform both ML and GPD. In this paper, we propose a generalization of the DA method to the ....
....case of g , the random classification rule reverts to the non-random best-path classifier that assigns all the winning probability to the path with the highest score. The Gibbs parametric form of this distribution is not arbitrary, but is derivable from information-theoretic principles [6, 8, 14]. We should re-emphasize that the random classifier paradigm is adopted only during the training phase. The DA algorithm ultimately produces a regular, non-random HMM-based classifier. The expected misclassification rate of the random classifier is given by: ....
Rao A.V., Miller D., Rose K. and Gersho A. (1997), Mixture of experts regression modeling by deterministic annealing, IEEE Trans. on Signal Processing, Vol. 45, No. 11, Nov 1997, pp. 2811-2820.
.... we propose here, an alternative method based on the technique of deterministic annealing (DA). DA was first proposed in the context of clustering [6], later extended to solve structurally constrained clustering problems such as the design of pattern classifiers [7] and regression functions [8], and recently applied to time series classification [9]. We will show here that the DA method for HMM classifier design offers substantial gains by combining the right criterion of MCE with the optimization power of DA. 2. THE HMM DESIGN PROBLEM In a typical isolated word speech recognition ....
....on the cost surface. We refer to the Lagrange parameter T as the temperature because of interesting connections to statistical physics. The process of reducing T to zero is similar in principle to the phenomenon of annealing in physical systems. For more insights into the physical analogy, see [6, 7, 8]. The minimization of the Lagrangian cost function L is achieved by a series of gradient descent steps at each temperature. An important aspect of the proposed method is the discovery of an efficient forward-backward algorithm to determine the gradient parameters for the optimization. The ....
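The annealing loop described in the excerpt above (a temperature T lowered toward zero, with gradient descent steps at each temperature) can be illustrated on the simpler clustering problem for which DA was first proposed. The following is a minimal sketch, not the HMM design method of the citing paper: the function name da_cluster, the squared-error distortion, the geometric cooling schedule, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def da_cluster(x, n_codes=2, t_start=10.0, t_min=1e-3, cool=0.8,
               steps=50, lr=0.1, seed=0):
    """Illustrative deterministic-annealing clustering (hypothetical helper).

    Minimizes the free energy F = -T * sum_i log sum_j exp(-d_ij / T),
    with d_ij = ||x_i - y_j||^2, by gradient descent at each temperature.
    """
    rng = np.random.default_rng(seed)
    # Start all codevectors at the data mean, with a tiny perturbation
    # so that they can separate as the temperature drops.
    y = x.mean(axis=0) + 1e-3 * rng.standard_normal((n_codes, x.shape[1]))
    t = t_start
    while t > t_min:
        for _ in range(steps):
            # Squared distances between every point and every codevector.
            d = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)
            # Gibbs association probabilities p(j | x_i), computed stably.
            logits = -d / t
            logits -= logits.max(axis=1, keepdims=True)
            p = np.exp(logits)
            p /= p.sum(axis=1, keepdims=True)
            # dF/dy_j = sum_i p_ij * 2 * (y_j - x_i)
            grad = 2.0 * (p.sum(axis=0)[:, None] * y - p.T @ x)
            y -= lr * grad / len(x)
        t *= cool  # assumed geometric cooling schedule
    return y
```

Calling da_cluster on an array of shape (N, D) returns n_codes codevectors. As T approaches zero the Gibbs probabilities become nearly 0 or 1, so the result approaches a hard clustering of the data.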
A. Rao, D. Miller, K. Rose, A. Gersho, "Mixture of Experts Regression Modeling by Deterministic Annealing", IEEE Trans. Signal Processing, Nov. 1997.
A. Rao, D. Miller, K. Rose, and A. Gersho, "Mixture of Experts Regression Modeling by Deterministic Annealing," IEEE Trans. on Signal Processing, vol. 45, no. 11, pp. 2811--2820, 1997.
Rao, A., Miller, D., Rose, K. & Gersho, A. (1997), `Mixture of Experts Regression Modeling by Deterministic Annealing', IEEE Trans. on Signal Processing 45(11), 2811--2820.