J. Hampshire and A. Waibel, A novel objective function for improved phoneme recognition using time-delay neural networks, IEEE Transactions on Neural Networks 1 (1990): 216-228.
....yield equivalent classification performance, and that all MLPs are in effect no more than exotic estimators of Bayesian a posteriori probabilities. In fact, neither conclusion is correct. A broad class of objective functions called N-monotonic Classification Figures of Merit ($CFM_{mono}$) [8] are shown to approximate Bayesian classification performance under the same conditions for which the reasonable error measures yield Bayesian performance. However, the $CFM_{mono}$ class of functions does not produce MLP output activations that reflect a posteriori probabilities $P(\omega_i \mid x)$; instead ....
.... class of functions does not produce MLP output activations that reflect a posteriori probabilities $P(\omega_i \mid x)$; instead it asymptotically identifies the maximum a posteriori probability for a given input, $P(\omega_{max} \mid x_p)$, as long as $P(\omega_{max} \mid x_p) \geq 0.5$ (see section 4). Despite this limitation, [8] indicates that $CFM_{mono}$-trained MLPs can be more robust approximations to the Bayesian discriminant than their reasonable error measure counterparts, given small training sample sizes. While the findings of [8] are not broad enough to be considered conclusive, they do argue against the maxim ....
J. B. Hampshire II and A. H. Waibel, "A Novel Objective Function for Improved Phoneme Recognition Using Time-Delay Neural Networks", IEEE Trans. Neural Networks, vol. 1, pp. 216-228, June, 1990.
....Soo Young Lee (phone: 82 42 869 3431, e-mail: sylee4eelatist.ac3a) I. INTRODUCTION. Multilayer perceptron (MLP) is the most popular neural network model, with wide application areas such as mobile telecommunications [1], [2], ATM networks [3], [4], pattern recognition [5], speech recognition [6], time series prediction [7], and nonlinear control [8]. In particular, theoretical analyses of MLPs in mathematical or statistical aspects support the applications and research efforts for MLPs [9] [12]. Training of MLPs is usually done by the error backpropagation (EBP) algorithm [13] in which ....
....a weak error signal needs to be generated for correctly saturated output nodes so that the weight update associated with a training pattern scarcely perturbs the weights trained for all training patterns. The weak error signal is also necessary to prevent overfitting of learning for training patterns [6]. In this sense, an nCE error function [18] was proposed as $E_{nCE} = -\sum_{j} \int \frac{t_j^{n+1} (t_j - x_j^{(L)})^n}{2^{n-2} (1 - (x_j^{(L)})^2)} \, dx_j^{(L)}$ (5), where $t_j = \pm 1$ and $n = 1, 2, \ldots$ Using the above error function, the error signal of the output layer becomes $\delta_j^{(L)} = \frac{t_j^{n+1} (t_j - x_j^{(L)})^n}{2^{n-1}}$ (6). The nCE error function with n = 1 ....
J. B. Hampshire II and A. H. Waibel, "A Novel Objective Function for Improved Phoneme Recognition Using Time-Delay Neural Networks," IEEE Trans. Neural Networks, Vol. 1, June 1990, pp. 216-228.
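The context above argues that correctly saturated outputs should produce only a weak error signal. A minimal sketch of that behaviour follows, assuming the garbled equation (6) reads as t^(n+1) (t - x)^n / 2^(n-1) with targets of +1 or -1. That reading is an interpretation of the damaged extraction, and the function name `nce_error_signal` is hypothetical.

```python
def nce_error_signal(target, output, n):
    """Output-layer error signal under the ASSUMED form
    t^(n+1) * (t - x)^n / 2^(n-1), with target t in {+1, -1}
    and output x in (-1, 1). Larger n weakens the signal for
    outputs that are already close to their target."""
    return target ** (n + 1) * (target - output) ** n / 2 ** (n - 1)


# An output close to its target: the signal shrinks as n grows.
near_n1 = nce_error_signal(1.0, 0.9, 1)
near_n2 = nce_error_signal(1.0, 0.9, 2)

# An output saturated on the wrong side still gets a strong signal.
far_n2 = nce_error_signal(1.0, -0.9, 2)
```

Under this reading, raising n damps updates from patterns that are already learned while leaving a large correction for badly misclassified ones.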
....or boolean values representing designated classes). Cross entropy (CE) assumes idealized class outputs (i.e. target values of zero or one for a sigmoid activation) [13] and is therefore more appropriate to classification problems. However, error values using SSE and cross entropy have been shown [9] to be inconsistent with ultimate sample classification accuracy. That is, minimizing CE or SSE is not necessarily correlated to high recognition rates. Numerous experiments in the literature provide examples of networks that achieve little error on the training set but fail to achieve the best ....
....and generalizing accurately over the entire population. B. Shortcomings of search methodologies More fundamentally, the above objective functions provide mechanisms that do not reflect the true goal of classification learning, which is to achieve high recognition rates on unseen data. In [9], a new objective function, the classification figure of merit (CFM) is introduced for which minimizing error remains consistent with increasing classification accuracy. Networks that use the CFM as their criterion function in phoneme recognition are introduced in [9] and further considered in ....
Hampshire II, John B., "A Novel Objective Function for Improved Phoneme Recognition Using Time-Delay Neural Networks", IEEE Transactions on Neural Networks, Vol. 1, No. 2, June 1990.
....and moreover greater for a correct classification than for an incorrect classification. One can therefore construct examples where the MSE minimum for a given training set and a given MLP structure does not correspond to the classification error minimum for that structure [Brady & Raghavan, 1988; Hampshire & Waibel, 1990]. However, it is important to note that with sufficient parameters, minimizing squared error will give as output an estimate of the posterior probabilities $P(C_k \mid x)$. Several proofs have been presented for this [Geman et al. 1992; Richard & Lippmann, 1991]. The reasoning is that the posterior ....
....comparison of the correct category with the best matching incorrect category; the sign of this measure reflects the correctness of the classification. This choice corresponds to one of the choices described for the Classification Figure of Merit (CFM) criterion proposed for feed-forward MLPs [Hampshire & Waibel, 1990; Hampshire, 1993]. It is interesting to note that with = 1, all categories are weighted equally, and the misclassification measure resembles the MMI criterion, $f_{MMI}$ (equation (2.17)). The similarity between MMI and MCE GPD will be discussed in greater detail in Chapter 5. Definition of MCE ....
Hampshire, J. and Waibel, A. (1990). A Novel Objective Function for Improved Phoneme Recognition using Time-Delay Neural Networks. IEEE Transactions on Neural Networks, Vol. 1, No. 2, pp. 216-228.
....on bootstrap sampling [8]. It generates several training sets from the original training set and then trains an individual neural network from each generated training set. There are also many other approaches for training individual neural networks. Examples are as follows. Hampshire and Waibel [15] utilize different objective functions to train different individual neural networks. Cherkauer [6] trains individual networks with different numbers of hidden units. Maclin and Shavlik [25] initialize individual networks at different points of the weight space. Krogh and Vedelsby [23] employ ....
J. Hampshire and A. Waibel, A novel objective function for improved phoneme recognition using time-delay neural networks, IEEE Transactions on Neural Networks 1 (1990): 216-228.
....corresponding pattern is included in the training set. Selective Updating has the drawback of assuming uncorrelated input units, which is often not the case for practical applications. Another approach to selective learning is simply to discard those patterns that have been classified correctly [40, 41]. The effect of such an approach is that the training set will include those patterns that lie close to decision boundaries. If the candidate set contains outlier patterns, these patterns will, however, also be selected. This error selection (ES) approach therefore requires a robust estimator ....
Hampshire, J.B., Waibel, A.H.: A Novel Objective Function for Improved Phoneme Recognition using Time-Delay Neural Networks, IEEE Transactions on Neural Networks, 1(2), 1990, 216-228.
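The error selection idea quoted above, discarding patterns that are already classified correctly, can be sketched in a few lines. The function name `select_misclassified` and the `predict` callable are hypothetical stand-ins for whatever classifier is being trained, not part of any cited method.

```python
def select_misclassified(patterns, labels, predict):
    """Keep only the (pattern, label) pairs the current classifier gets wrong.

    `predict` is any callable mapping a pattern to a predicted label.
    Outliers are kept too whenever they are misclassified, which is the
    weakness the quoted passage points out.
    """
    return [(x, y) for x, y in zip(patterns, labels) if predict(x) != y]


# Toy usage: a threshold classifier on scalar inputs.
patterns = [-1, 2, 3]
labels = [False, False, True]
selected = select_misclassified(patterns, labels, lambda x: x > 0)
```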
....network. Cohn, Atlas and Ladner define active learning as any form of learning in which the learning algorithm has some control over which part of the input space it receives information [2] From this definition, two main approaches to active learning can be defined, namely selective learning [3, 5, 7, 8] and incremental learning. Selective learning selects a completely new training subset from the candidate training set at each subset selection interval, based on some measure of pattern informativeness. Each original candidate pattern is eligible for selection at each subset selection interval, ....
J.B. Hampshire, A.H. Waibel, "A Novel Objective Function for Improved Phoneme Recognition using Time-Delay Neural Networks", IEEE Transactions on Neural Networks, 1(2), 1990, pp. 216-228.
....between the correct digit class and the probabilities assigned to the various classes by the digit models. The maximum mutual information criterion emphasizes correct discrimination rather than correct modeling of the image data, and it generally leads to better discriminative performance [40], although the advantage of discriminative learning vanishes if the generative model is correct and the fitting process produces the true probability of the data given the model [41] Early experiments showed that, for our generative models, maximum likelihood learning was just as effective as ....
J. Hampshire and A. Waibel, "A novel objective function for improved phoneme recognition using time-delay neural networks", Technical Report CMU-CS-89-118, Carnegie-Mellon, Pittsburgh, PA, 1989.
.... sequence generated by the i th handwritten version of the symbol to the recognizer and analysing the generation probabilities obtained by the HMMs or their combination, a measurement for the reliability R of this classification result can be calculated by using the cost function C presented in [7][8] By distinguishing between correct and wrong classification results achieving independency from the Top 1 recognition rates, two reliabilities R c and R w are calculated by using . Based on the feature extraction algorithms and their combination, the average results of the reliability ....
J.B. Hampshire II, A.H. Waibel, A Novel Objective Function for Improved Phoneme Recognition Using Time-Delay Neural Networks, IEEE Trans. on Neural Networks Vol.1 No.2, pp. 216-228, June 1990.
....modularized architecture of AME, we will only explore constant weighting functions in this paper. See [Jacobs, 1995] for a review of different types of non-constant combination. Others also find that averaging is the best strategy rather than the more complicated strategies suggested in theory [Hampshire & Waibel, 1990; J. Ghosh & Beck, 1992]. 3 Document Filtering. Document filtering requires a decision to accept or reject documents as they appear one at a time independently of previously or subsequently examined documents. To make these judgements, the system must produce some measure of value, or likelihood ....
Hampshire, J. B., & Waibel, A. H. (1990). A Novel Objective Function for Improved Phoneme Recognition Using Time-Delay Neural Networks. IEEE Transactions on Neural Networks, 1(2), 216-228.
....implicit segmentation is evaluated. A complete mathematical formulation of both approaches is given. 2. METHODS 2.1. Discriminative training within a given phoneme segmentation. For the discriminative training we apply the objective function of the Generalized Probabilistic Descent (GPD) approach [7], [1], [2]. The total cost y with the parameters g and h is given in equations (1) and (2) [equations (1) and (2)] ....
Hampshire J.B. and Waibel A.H., A Novel Objective Function for Improved Phoneme Recognition Using Time-Delay Neural Networks, IEEE Trans. on Neural Networks, vol. 1, no. 2, June 1990
....that input is assigned to model . However, the training of these parameters is performed to minimize the regression error directly rather than maximize the likelihood objective. We note that in the closely related problem of pattern classification, there has been a renewed research interest [7] [11], [18] [21] in the optimization of the true, yet complex, cost misclassification probability rather than a mismatched but simpler cost function. This approach has found applications in various fields, particularly in speech recognition [19] [29]. At this point, we must reconcile our argument ....
J. B. Hampshire and A. H. Waibel, "A novel objective function for improved phoneme recognition using time-delay neural networks," IEEE Trans. Neural Networks, vol. 1, pp. 216-228, 1990.
....theories and neural network architectures have been developed conforming to the Speech Recognition Schools of Thought. Some examples of these neural networks are the well-known TDNNs (Time Delay Neural Networks) [Waibel et al. 86], [Lang et al. 90], [Lee and Rabiner, 93], [Hataoka and Waibel, 90], [Hampshire and Waibel, 90]; TDNNs with Dynamic Time Warping [Botros and Premnah, 92]; the LPNNs (Linked Predictive Neural Networks) [Tebelskis and Waibel, 90], [Tebelskis et al. 91]; and the Dynamic Programming Neural Networks [Sakoe et al. 89]. Even though there is a strong interest and support for the development of ....
....are the TDNNs (Time Delay Neural Networks) of Waibel et al. Their TDNN architecture (Figure 2.7) is a feedforward multilayer neural network with time-delayed connections at each hidden layer to relate and compare current input to the past history of events [Waibel et al. 86], [Lang et al. 90], [Hampshire and Waibel, 90]. Figure 2.7: The TDNN Architecture (from [Waibel et al., 86]). In TDNNs the speech input (spectrogram) is partitioned into frames (of 10 ms each). It requires a pre-labeling of the elements to be learned and recognized and the output of the network is assumed to depend on the information ....
J.B. Hampshire, A. Waibel, "A Novel Objective Function for Improved Phoneme Recognition Using Time-Delay Neural Networks", IEEE Transactions on Neural Networks, Vol. 1, No. 2, June 1990.
....points in weight space is to vary the training process to produce networks that are independent and therefore useful for combination. Examples of this approach include Reilly et al.'s multi-resolution architectures [1987], Schapire's [1990] and Drucker et al.'s [1994] boosting algorithms, Hampshire and Waibel's [1990] use of different objective functions, Baxt's [1992] method of training networks on different tasks, and Perrone's [1992] tree-structured neural networks. Our method of combining competitive (unsupervised) and backpropagation (supervised) learning is similar to a number of other approaches [ ....
J. Hampshire and A. Waibel. A novel objective function for improved phoneme recognition using time-delay neural networks. IEEE Trans. on Neural Networks, 1:216-228, 1990.
....one specific value or range of values can effectively be eliminated by training multiple models with alternative values and taking a vote. In the case of Pearlmutter and Rosenfeld [11], this parameter is the set of initial random weights in backpropagation. In the case of Hampshire and Waibel ([5], reported in [11]), it is the objective function. In Alpaydin's case [1], this parameter is the order of patterns in the training set. One can also divide the training set into N and train N separate models and take a vote. 3 Weighted Voting. When we take a simple majority voting, each network ....
Hampshire, J., Waibel, A. (1989) "A novel objective function for improved phoneme recognition using time delay neural networks," CMU, TR CS-89-118.
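The voting scheme described above, training several models that differ in one setting and taking a simple majority, might look like the following sketch. The function name `majority_vote` is illustrative, and the tie-breaking rule (first label seen wins) is an assumption that the quoted text does not specify.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most models.

    `predictions` holds one label per model, e.g. from networks trained
    with different objective functions or different initial weights.
    Ties go to the label that appears first (an arbitrary choice here).
    """
    counts = Counter(predictions)
    return counts.most_common(1)[0][0]


# Three hypothetical models disagree on one input.
winner = majority_vote(["a", "b", "a"])
```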
....does not yield zero error. Then non-separating suboptimal solutions can exist with lower error than the optimal solution. So a solution with minimum least mean square error does not necessarily have a minimum number of misclassifications. The situation can be illustrated by the following simple figure [Hampshire and Waibel 90]. We have a network with two output units having output between 0 and 1. The outputs are mapped onto the x and y axis respectively. If the desired target pattern is (1, 0) then all outputs to the right of the line y = x can be considered correct. If and only if the contours of equal error are ....
....function does not prevent the weights from converging into flat regions in weight space but makes it easier to escape from these regions. 5.1 The CFM error function. The classification figure of merit error function (CFM) applies to classification problems with an orthogonal output representation [Hampshire and Waibel 90]. This new approach to training neural networks, also often referred to as differential learning, maximizes a function of the minimum difference between the output from the unit representing the right class and the other units' outputs. In this way, learning focuses most on the reduction of ....
J.B. Hampshire and A.H. Waibel (1990), A Novel Objective Function for Improved Phoneme Recognition Using Time-Delay Neural Networks, IEEE Transactions on Neural Networks, Vol. 1, No. 2, pp. 216-228.
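The two contexts above make a concrete point: with target (1, 0), an output on the wrong side of y = x can have lower squared error than an output on the right side, whereas a score built on the difference between the correct output and the best competing output tracks correctness. The sketch below illustrates this with made-up numbers. The sigmoid and the steepness `beta` are assumptions chosen for illustration and are not the exact CFM of the cited paper.

```python
import math


def squared_error(outputs, targets):
    """Sum of squared differences between outputs and targets."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets))


def cfm_like_score(outputs, correct_index, beta=4.0):
    """Sigmoid of (correct output - highest other output).

    Above 0.5 exactly when the correct unit has the largest output.
    The sigmoid shape and `beta` are illustrative assumptions.
    """
    best_other = max(o for i, o in enumerate(outputs) if i != correct_index)
    delta = outputs[correct_index] - best_other
    return 1.0 / (1.0 + math.exp(-beta * delta))


target = (1.0, 0.0)
right_but_weak = (0.1, 0.0)     # x > y: classified correctly
wrong_but_close = (0.6, 0.65)   # y > x: classified incorrectly

sse_right = squared_error(right_but_weak, target)
sse_wrong = squared_error(wrong_but_close, target)
score_right = cfm_like_score(right_but_weak, 0)
score_wrong = cfm_like_score(wrong_but_close, 0)
```

Here the misclassified output has the smaller squared error, while the difference-based score ranks the correctly classified output higher.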
....between the correct digit class and the probabilities assigned to the various classes by the digit models. The maximum mutual information criterion emphasizes correct discrimination rather than correct modeling of the image data, and it generally leads to better discriminative performance [42], although the advantage of discriminative learning vanishes if the generative model is correct and the fitting process produces the true probability of the data given the model [43] Early experiments showed that, for our generative models, maximum likelihood learning was just as effective as ....
J. Hampshire and A. Waibel, "A novel objective function for improved phoneme recognition using time-delay neural networks", Technical Report CMU-CS-89-118, Carnegie-Mellon, Pittsburgh, PA, 1989.
....solutions. Rather than choosing to approximate the discriminant function, a number of researchers have proposed alternative cost objectives and learning algorithms which better match the goal of minimizing misclassification error (or minimizing risk, if errors are not weighed equally), e.g. [7], [4], [6], [11]. Typically, these methods descend on an energy surface, using either a batch or a sequential optimization technique. While these approaches optimize MLPs and other network models to effectively minimize classification error, a legitimate concern is the potential to fall into poor local ....
J. B. Hampshire and A. H. Waibel. A novel objective function for improved phoneme recognition using time-delay neural networks. IEEE Trans. on Neural Net., 1:216-228, 1990.
No context found.
J. Hampshire and A. Waibel. A Novel Objective Function for Improved Phoneme Recognition Using Time-Delay Neural Networks. In Proceedings of the
.... $- (y_i - t_i)^2)$ which (like cross entropy) punishes outliers with an error approaching infinity for $|t_i - y_i|$ approaching 1.0. For the word level training, we have achieved best results with an objective function similar to the Classification Figure of Merit (CFM) [6], which tries to maximize the distance $d = y_c - y_{hi}$ between the correct score $y_c$ and the highest incorrect score $y_{hi}$ instead of using absolute targets 1.0 and 0.0 for correct and incorrect word units: $E_{CFM}(T, Y) = f(y_c - y_{hi}) = f(d) = (1 - d)^2$. The philosophy here is ....
J. Hampshire and A. Waibel. A Novel Objective Function for Improved Phoneme Recogn. Using Time Delay Neural Networks. IEEE Trans. on Neural Networks, June 1990.
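The word-level objective quoted above can be sketched as follows, reading the damaged formula as E = (1 - d)^2 with d = y_c - y_hi. That reading is an assumption drawn from the garbled extraction, and the function name `word_level_cfm_error` is hypothetical.

```python
def word_level_cfm_error(scores, correct_index):
    """Error (1 - d)^2 where d is the margin between the correct score
    and the highest incorrect score (ASSUMED reading of the formula).

    The error is zero only when the correct score beats every other
    score by a full margin of 1.0.
    """
    y_c = scores[correct_index]
    y_hi = max(s for i, s in enumerate(scores) if i != correct_index)
    d = y_c - y_hi
    return (1.0 - d) ** 2


# Perfect separation, a tie, and a confident mistake.
perfect = word_level_cfm_error([1.0, 0.0, 0.0], 0)
tie = word_level_cfm_error([0.5, 0.5], 0)
mistake = word_level_cfm_error([0.0, 1.0], 0)
```

Because the error depends only on the margin, no absolute targets of 1.0 and 0.0 are imposed on the individual word units.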
No context found.
J. Hampshire and A. Waibel, A novel objective function for improved phoneme recognition using time-delay neural networks, IEEE Transactions on Neural Networks 1 (1990): 216-228.
No context found.
J. Hampshire and A. Waibel, A novel objective function for improved phoneme recognition using time-delay neural networks, IEEE Trans. Neural Networks 1 (1990) 216-228.
No context found.
J. B. Hampshire II and A. Waibel, "A novel objective function for improved phoneme recognition using time-delay neural networks," IEEE Transactions on Neural Networks 1, pp. 216-228, June 1990.